* [PATCH V2 00/23] perf: Add Icelake support
@ 2019-03-21 20:56 kan.liang
  2019-03-21 20:56 ` [PATCH V2 01/23] perf/x86: Support outputting XMM registers kan.liang
                   ` (22 more replies)
  0 siblings, 23 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

This patch series adds Icelake support to Linux perf.

PATCH 1-19: Kernel patches to support Icelake.
 - 1-5: Support the adaptive PEBS feature
 - 6-7: Enable core support with some new features, e.g. 8 generic
   counters, new event constraints, a new fixed counter.
 - 8-11: Enable cstate, rapl, msr and uncore support on Icelake
 - 12-18: Support hardware metrics counters and the SLOTS fixed counter
   for Topdown events.
 - 19: Support CPUID 10.ECX to disable fixed counters

PATCH 20-23: Perf tool patches to support XMM registers, Topdown metrics and the event list.

Andi Kleen (12):
  perf/x86/intel: Extract memory code PEBS parser for reuse
  perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them
  perf/x86: Support constraint ranges
  perf/core: Support a REMOVE transaction
  perf/x86/intel: Basic support for metrics counters
  perf/x86/intel: Support overflows on SLOTS
  perf/x86/intel: Set correct weight for topdown subevent counters
  perf/x86/intel: Export new top down events for Icelake
  perf/x86/intel: Support CPUID 10.ECX to disable fixed counters
  perf, tools: Add support for recording and printing XMM registers
  perf, tools, stat: Support new per thread TopDown metrics
  perf, tools: Add documentation for topdown metrics

Kan Liang (11):
  perf/x86: Support outputting XMM registers
  perf/x86/intel/ds: Extract code of event update in short period
  perf/x86/intel: Support adaptive PEBSv4
  perf/x86/intel: Add Icelake support
  perf/x86/intel/cstate: Add Icelake support
  perf/x86/intel/rapl: Add Icelake support
  perf/x86/msr: Add Icelake support
  perf/x86/intel/uncore: Add Intel Icelake uncore support
  perf/x86/intel: Support hardware TopDown metrics
  perf/x86/intel: Disable sampling read slots and topdown
  perf vendor events intel: Add JSON files for Icelake

 arch/x86/events/core.c                        |  81 +-
 arch/x86/events/intel/core.c                  | 422 ++++++++-
 arch/x86/events/intel/cstate.c                |   2 +
 arch/x86/events/intel/ds.c                    | 495 ++++++++--
 arch/x86/events/intel/lbr.c                   |  35 +-
 arch/x86/events/intel/rapl.c                  |   2 +
 arch/x86/events/intel/uncore.c                |   6 +
 arch/x86/events/intel/uncore.h                |   1 +
 arch/x86/events/intel/uncore_snb.c            |  91 ++
 arch/x86/events/msr.c                         |   1 +
 arch/x86/events/perf_event.h                  |  94 +-
 arch/x86/include/asm/intel_ds.h               |   2 +-
 arch/x86/include/asm/msr-index.h              |   4 +
 arch/x86/include/asm/perf_event.h             |  79 +-
 arch/x86/include/uapi/asm/perf_regs.h         |  26 +-
 arch/x86/kernel/perf_regs.c                   |  18 +-
 include/linux/perf_event.h                    |   7 +
 kernel/events/core.c                          |   5 +
 tools/arch/x86/include/uapi/asm/perf_regs.h   |  26 +-
 tools/perf/Documentation/perf-stat.txt        |   9 +-
 tools/perf/Documentation/topdown.txt          | 223 +++++
 tools/perf/arch/x86/include/perf_regs.h       |  29 +-
 tools/perf/arch/x86/util/perf_regs.c          |  16 +
 tools/perf/builtin-stat.c                     |  24 +
 .../pmu-events/arch/x86/icelake/cache.json    | 552 +++++++++++
 .../arch/x86/icelake/floating-point.json      |  90 ++
 .../pmu-events/arch/x86/icelake/frontend.json | 424 +++++++++
 .../pmu-events/arch/x86/icelake/memory.json   | 410 ++++++++
 .../pmu-events/arch/x86/icelake/other.json    | 133 +++
 .../pmu-events/arch/x86/icelake/pipeline.json | 892 ++++++++++++++++++
 .../arch/x86/icelake/virtual-memory.json      | 236 +++++
 tools/perf/pmu-events/arch/x86/mapfile.csv    |   1 +
 tools/perf/util/perf_regs.h                   |   1 +
 tools/perf/util/stat-shadow.c                 |  89 ++
 tools/perf/util/stat.c                        |   4 +
 tools/perf/util/stat.h                        |   8 +
 36 files changed, 4421 insertions(+), 117 deletions(-)
 create mode 100644 tools/perf/Documentation/topdown.txt
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/cache.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/floating-point.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/frontend.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/memory.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/pipeline.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json

-- 
2.17.1



* [PATCH V2 01/23] perf/x86: Support outputting XMM registers
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 02/23] perf/x86/intel: Extract memory code PEBS parser for reuse kan.liang
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Starting from Icelake, XMM registers can be collected in the PEBS
record, but the current code only outputs the pt_regs.

Add a new struct x86_perf_regs that holds both the pt_regs and the
xmm_regs. XMM registers are 128 bits wide. To simplify the code, each
one is handled like two 64-bit registers, which means setting two bits
in the register bitmap. This also allows sampling only the lower 64
bits of an XMM register.
The index of the XMM registers starts at 32. There are 16 XMM
registers, so all the reserved space for registers is used.
PERF_REG_X86_MAX stands for the maximum number of all x86 registers,
including XMM.
PERF_REG_GPR_X86_MAX stands for the maximum number of x86 general
purpose registers, which does not include XMM.
PERF_REG_GPR_X86_32_MAX and PERF_REG_GPR_X86_64_MAX are introduced to
replace PERF_REG_X86_32_MAX and PERF_REG_X86_64_MAX for the x86 general
purpose registers.

REG_RESERVED is also updated to allow the XMM registers.
XMM is not supported on all platforms. Add has_xmm_regs to indicate
which platforms support it. Also add checks in x86_pmu_hw_config() to
reject an invalid config of regs_user and regs_intr.
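
As an illustration (not part of this patch), a minimal user-space
sketch of how such an event could be configured. The attr values are
hypothetical; only the PERF_REG_X86_* indices come from the enum
changed below:

	/* Sample AX plus XMM0 and XMM1 (two bitmap bits each) via PEBS. */
	struct perf_event_attr attr = {
		.type             = PERF_TYPE_HARDWARE,
		.config           = PERF_COUNT_HW_CPU_CYCLES,
		.sample_period    = 100003,	/* arbitrary example period */
		.precise_ip       = 2,		/* XMM collection requires PEBS */
		.sample_type      = PERF_SAMPLE_REGS_INTR,
		.sample_regs_intr = (1ULL << PERF_REG_X86_AX) |
				    (3ULL << PERF_REG_X86_XMM0) |	/* bits 32,33 */
				    (3ULL << PERF_REG_X86_XMM1),	/* bits 34,35 */
	};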

Originally-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V1:
- Avoid the interface changes for perf_reg_value() and
  perf_output_sample_regs().
- Remove the extra_regs in struct perf_sample_data.
- Add struct x86_perf_regs
- Add has_xmm_regs to indicate the specific platform which support XMM
  registers collection.
- Add check in x86_pmu_hw_config() to reject invalid config of regs_user
  and regs_intr.

 arch/x86/events/core.c                | 10 ++++++++++
 arch/x86/events/perf_event.h          |  2 ++
 arch/x86/include/asm/perf_event.h     |  5 +++++
 arch/x86/include/uapi/asm/perf_regs.h | 26 ++++++++++++++++++++++++--
 arch/x86/kernel/perf_regs.c           | 18 ++++++++++++++----
 5 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index e2b1447192a8..9378c6b2128f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -560,6 +560,16 @@ int x86_pmu_hw_config(struct perf_event *event)
 			return -EINVAL;
 	}
 
+	if (event->attr.sample_regs_user & ~PEBS_REGS)
+		return -EINVAL;
+	/*
+	 * Besides the general purpose registers, XMM registers may
+	 * be collected in PEBS on some platforms, e.g. Icelake
+	 */
+	if ((event->attr.sample_regs_intr & ~PEBS_REGS) &&
+	    (!x86_pmu.has_xmm_regs || !event->attr.precise_ip))
+		return -EINVAL;
+
 	return x86_setup_perfctr(event);
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a75955741c50..6428941a5073 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -657,6 +657,8 @@ struct x86_pmu {
 	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
 	 */
 	int (*check_period) (struct perf_event *event, u64 period);
+
+	unsigned int	has_xmm_regs : 1; /* support XMM regs */
 };
 
 struct x86_perf_task_context {
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8bdf74902293..d9f5bbe44b3c 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -248,6 +248,11 @@ extern void perf_events_lapic_init(void);
 #define PERF_EFLAGS_VM		(1UL << 5)
 
 struct pt_regs;
+struct x86_perf_regs {
+	struct pt_regs	regs;
+	u64		*xmm_regs;
+};
+
 extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
 extern unsigned long perf_misc_flags(struct pt_regs *regs);
 #define perf_misc_flags(regs)	perf_misc_flags(regs)
diff --git a/arch/x86/include/uapi/asm/perf_regs.h b/arch/x86/include/uapi/asm/perf_regs.h
index f3329cabce5c..b33995313d17 100644
--- a/arch/x86/include/uapi/asm/perf_regs.h
+++ b/arch/x86/include/uapi/asm/perf_regs.h
@@ -28,7 +28,29 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
 
-	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
-	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+	/* These all need two bits set because they are 128bit */
+	PERF_REG_X86_XMM0  = 32,
+	PERF_REG_X86_XMM1  = 34,
+	PERF_REG_X86_XMM2  = 36,
+	PERF_REG_X86_XMM3  = 38,
+	PERF_REG_X86_XMM4  = 40,
+	PERF_REG_X86_XMM5  = 42,
+	PERF_REG_X86_XMM6  = 44,
+	PERF_REG_X86_XMM7  = 46,
+	PERF_REG_X86_XMM8  = 48,
+	PERF_REG_X86_XMM9  = 50,
+	PERF_REG_X86_XMM10 = 52,
+	PERF_REG_X86_XMM11 = 54,
+	PERF_REG_X86_XMM12 = 56,
+	PERF_REG_X86_XMM13 = 58,
+	PERF_REG_X86_XMM14 = 60,
+	PERF_REG_X86_XMM15 = 62,
+
+	/* This does not include the XMM registers */
+	PERF_REG_GPR_X86_32_MAX = PERF_REG_X86_GS + 1,
+	PERF_REG_GPR_X86_64_MAX = PERF_REG_X86_R15 + 1,
+
+	/* All registers, including the XMM registers */
+	PERF_REG_X86_MAX = PERF_REG_X86_XMM15 + 2,
 };
 #endif /* _ASM_X86_PERF_REGS_H */
diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
index c06c4c16c6b6..421d76895565 100644
--- a/arch/x86/kernel/perf_regs.c
+++ b/arch/x86/kernel/perf_regs.c
@@ -10,14 +10,14 @@
 #include <asm/ptrace.h>
 
 #ifdef CONFIG_X86_32
-#define PERF_REG_X86_MAX PERF_REG_X86_32_MAX
+#define PERF_REG_GPR_X86_MAX PERF_REG_GPR_X86_32_MAX
 #else
-#define PERF_REG_X86_MAX PERF_REG_X86_64_MAX
+#define PERF_REG_GPR_X86_MAX PERF_REG_GPR_X86_64_MAX
 #endif
 
 #define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
 
-static unsigned int pt_regs_offset[PERF_REG_X86_MAX] = {
+static unsigned int pt_regs_offset[PERF_REG_GPR_X86_MAX] = {
 	PT_REGS_OFFSET(PERF_REG_X86_AX, ax),
 	PT_REGS_OFFSET(PERF_REG_X86_BX, bx),
 	PT_REGS_OFFSET(PERF_REG_X86_CX, cx),
@@ -59,13 +59,23 @@ static unsigned int pt_regs_offset[PERF_REG_X86_MAX] = {
 
 u64 perf_reg_value(struct pt_regs *regs, int idx)
 {
+	struct x86_perf_regs *perf_regs;
+
+	if (idx >= PERF_REG_X86_XMM0 && idx < PERF_REG_X86_MAX) {
+		perf_regs = container_of(regs, struct x86_perf_regs, regs);
+		if (!perf_regs->xmm_regs)
+			return 0;
+		return perf_regs->xmm_regs[idx - PERF_REG_X86_XMM0];
+	}
+
 	if (WARN_ON_ONCE(idx >= ARRAY_SIZE(pt_regs_offset)))
 		return 0;
 
 	return regs_get_register(regs, pt_regs_offset[idx]);
 }
 
-#define REG_RESERVED (~((1ULL << PERF_REG_X86_MAX) - 1ULL))
+#define REG_RESERVED \
+	(PERF_REG_X86_MAX == 64 ? 0 : ~((1ULL << PERF_REG_X86_MAX) - 1ULL))
 
 #ifdef CONFIG_X86_32
 int perf_reg_validate(u64 mask)
-- 
2.17.1



* [PATCH V2 02/23] perf/x86/intel: Extract memory code PEBS parser for reuse
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
  2019-03-21 20:56 ` [PATCH V2 01/23] perf/x86: Support outputting XMM registers kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 03/23] perf/x86/intel/ds: Extract code of event update in short period kan.liang
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Extract some code related to memory profiling from the PEBS record
parser into separate functions, so it can be reused by the upcoming
adaptive PEBS parser. No functional changes.
Rename intel_hsw_weight to intel_get_tsx_weight, and
intel_hsw_transaction to intel_get_tsx_transaction, because the input
is no longer the HSW PEBS record format.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V1:
- Rename intel_hsw_weight and intel_hsw_transaction
- Add missed inline

 arch/x86/events/intel/ds.c | 63 ++++++++++++++++++++------------------
 1 file changed, 34 insertions(+), 29 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 10c99ce1fead..c02cd19fe640 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1125,34 +1125,50 @@ static int intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
 	return 0;
 }
 
-static inline u64 intel_hsw_weight(struct pebs_record_skl *pebs)
+static inline u64 intel_get_tsx_weight(u64 tsx_tuning)
 {
-	if (pebs->tsx_tuning) {
-		union hsw_tsx_tuning tsx = { .value = pebs->tsx_tuning };
+	if (tsx_tuning) {
+		union hsw_tsx_tuning tsx = { .value = tsx_tuning };
 		return tsx.cycles_last_block;
 	}
 	return 0;
 }
 
-static inline u64 intel_hsw_transaction(struct pebs_record_skl *pebs)
+static inline u64 intel_get_tsx_transaction(u64 tsx_tuning, u64 ax)
 {
-	u64 txn = (pebs->tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;
+	u64 txn = (tsx_tuning & PEBS_HSW_TSX_FLAGS) >> 32;
 
 	/* For RTM XABORTs also log the abort code from AX */
-	if ((txn & PERF_TXN_TRANSACTION) && (pebs->ax & 1))
-		txn |= ((pebs->ax >> 24) & 0xff) << PERF_TXN_ABORT_SHIFT;
+	if ((txn & PERF_TXN_TRANSACTION) && (ax & 1))
+		txn |= ((ax >> 24) & 0xff) << PERF_TXN_ABORT_SHIFT;
 	return txn;
 }
 
+#define PERF_X86_EVENT_PEBS_HSW_PREC \
+		(PERF_X86_EVENT_PEBS_ST_HSW | \
+		 PERF_X86_EVENT_PEBS_LD_HSW | \
+		 PERF_X86_EVENT_PEBS_NA_HSW)
+
+static u64 get_data_src(struct perf_event *event, u64 aux)
+{
+	u64 val = PERF_MEM_NA;
+	int fl = event->hw.flags;
+	bool fst = fl & (PERF_X86_EVENT_PEBS_ST | PERF_X86_EVENT_PEBS_HSW_PREC);
+
+	if (fl & PERF_X86_EVENT_PEBS_LDLAT)
+		val = load_latency_data(aux);
+	else if (fst && (fl & PERF_X86_EVENT_PEBS_HSW_PREC))
+		val = precise_datala_hsw(event, aux);
+	else if (fst)
+		val = precise_store_data(aux);
+	return val;
+}
+
 static void setup_pebs_sample_data(struct perf_event *event,
 				   struct pt_regs *iregs, void *__pebs,
 				   struct perf_sample_data *data,
 				   struct pt_regs *regs)
 {
-#define PERF_X86_EVENT_PEBS_HSW_PREC \
-		(PERF_X86_EVENT_PEBS_ST_HSW | \
-		 PERF_X86_EVENT_PEBS_LD_HSW | \
-		 PERF_X86_EVENT_PEBS_NA_HSW)
 	/*
 	 * We cast to the biggest pebs_record but are careful not to
 	 * unconditionally access the 'extra' entries.
@@ -1160,17 +1176,13 @@ static void setup_pebs_sample_data(struct perf_event *event,
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct pebs_record_skl *pebs = __pebs;
 	u64 sample_type;
-	int fll, fst, dsrc;
-	int fl = event->hw.flags;
+	int fll;
 
 	if (pebs == NULL)
 		return;
 
 	sample_type = event->attr.sample_type;
-	dsrc = sample_type & PERF_SAMPLE_DATA_SRC;
-
-	fll = fl & PERF_X86_EVENT_PEBS_LDLAT;
-	fst = fl & (PERF_X86_EVENT_PEBS_ST | PERF_X86_EVENT_PEBS_HSW_PREC);
+	fll = event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT;
 
 	perf_sample_data_init(data, 0, event->hw.last_period);
 
@@ -1185,16 +1197,8 @@ static void setup_pebs_sample_data(struct perf_event *event,
 	/*
 	 * data.data_src encodes the data source
 	 */
-	if (dsrc) {
-		u64 val = PERF_MEM_NA;
-		if (fll)
-			val = load_latency_data(pebs->dse);
-		else if (fst && (fl & PERF_X86_EVENT_PEBS_HSW_PREC))
-			val = precise_datala_hsw(event, pebs->dse);
-		else if (fst)
-			val = precise_store_data(pebs->dse);
-		data->data_src.val = val;
-	}
+	if (sample_type & PERF_SAMPLE_DATA_SRC)
+		data->data_src.val = get_data_src(event, pebs->dse);
 
 	/*
 	 * We must however always use iregs for the unwinder to stay sane; the
@@ -1281,10 +1285,11 @@ static void setup_pebs_sample_data(struct perf_event *event,
 	if (x86_pmu.intel_cap.pebs_format >= 2) {
 		/* Only set the TSX weight when no memory weight. */
 		if ((sample_type & PERF_SAMPLE_WEIGHT) && !fll)
-			data->weight = intel_hsw_weight(pebs);
+			data->weight = intel_get_tsx_weight(pebs->tsx_tuning);
 
 		if (sample_type & PERF_SAMPLE_TRANSACTION)
-			data->txn = intel_hsw_transaction(pebs);
+			data->txn = intel_get_tsx_transaction(pebs->tsx_tuning,
+							      pebs->ax);
 	}
 
 	/*
-- 
2.17.1



* [PATCH V2 03/23] perf/x86/intel/ds: Extract code of event update in short period
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
  2019-03-21 20:56 ` [PATCH V2 01/23] perf/x86: Support outputting XMM registers kan.liang
  2019-03-21 20:56 ` [PATCH V2 02/23] perf/x86/intel: Extract memory code PEBS parser for reuse kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4 kan.liang
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

drain_pebs() can be called twice in a short period for an auto-reload
event in pmu::read(). In that case intel_pmu_save_and_restart_reload()
must be called to update the event->count.
The same case must be handled on Icelake as well, so extract the code
into a helper for later reuse.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

New patch

 arch/x86/events/intel/ds.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index c02cd19fe640..efc054aee3c1 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1491,6 +1491,26 @@ static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
 	__intel_pmu_pebs_event(event, iregs, at, top, 0, n);
 }
 
+static void intel_pmu_pebs_event_update_no_drain(struct cpu_hw_events *cpuc,
+						 int size)
+{
+	struct perf_event *event;
+	int bit;
+
+	/*
+	 * drain_pebs() can be called twice in a short period
+	 * for an auto-reload event in pmu::read(), with no
+	 * overflows having happened in between.
+	 * intel_pmu_save_and_restart_reload() needs to be called
+	 * to update the event->count for this case.
+	 */
+	for_each_set_bit(bit, (unsigned long *)&cpuc->pebs_enabled, size) {
+		event = cpuc->events[bit];
+		if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
+			intel_pmu_save_and_restart_reload(event, 0);
+	}
+}
+
 static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -1518,19 +1538,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
 	}
 
 	if (unlikely(base >= top)) {
-		/*
-		 * The drain_pebs() could be called twice in a short period
-		 * for auto-reload event in pmu::read(). There are no
-		 * overflows have happened in between.
-		 * It needs to call intel_pmu_save_and_restart_reload() to
-		 * update the event->count for this case.
-		 */
-		for_each_set_bit(bit, (unsigned long *)&cpuc->pebs_enabled,
-				 size) {
-			event = cpuc->events[bit];
-			if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
-				intel_pmu_save_and_restart_reload(event, 0);
-		}
+		intel_pmu_pebs_event_update_no_drain(cpuc, size);
 		return;
 	}
 
-- 
2.17.1



* [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (2 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 03/23] perf/x86/intel/ds: Extract code of event update in short period kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 21:13   ` Peter Zijlstra
                     ` (2 more replies)
  2019-03-21 20:56 ` [PATCH V2 05/23] perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them kan.liang
                   ` (18 subsequent siblings)
  22 siblings, 3 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Adaptive PEBS is a new way to report PEBS sampling information. Instead
of a fixed size record for all PEBS events, it allows configuring the
PEBS record to include only the information needed. Events can then opt
in to use such an extended record, or stay with a basic record which
only contains the IP.

The major new feature is support for LBRs in the PEBS record.
Besides normal LBR, this allows (much faster) large PEBS, while still
supporting callstacks through callstack LBR. So essentially a lot of
profiling can now be done without frequent interrupts, dropping the
overhead significantly.

The main requirement is still to use a period, and not frequency mode,
because frequency mode requires reevaluating the frequency on each
overflow.

The floating point state (XMM) is also supported, which allows
efficient profiling of FP function arguments.

Adapt the drain function to handle variable length records.

Use a new callback to parse the new record format, and also handle the
STATUS field now being at a different offset.

Add code to set up the configuration register. Since there is only a
single register, all events get either the full superset of the fields
requested by all events, or only the basic record.
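
For a sense of scale (illustrative only, not part of this patch), the
size of one fully populated adaptive record, using the struct sizes
added below and assuming a hypothetical 32-entry LBR stack:

	/* Hypothetical worked example; the struct layouts are added below. */
	size_t sz = sizeof(struct pebs_basic)		/*  32 bytes, always present */
		  + sizeof(struct pebs_meminfo)		/*  32 bytes, PEBS_DATACFG_MEMINFO */
		  + sizeof(struct pebs_gprs)		/* 144 bytes, PEBS_DATACFG_GPRS */
		  + sizeof(struct pebs_xmm)		/* 256 bytes, PEBS_DATACFG_XMMS */
		  + 32 * sizeof(struct pebs_lbr_entry);	/* 768 bytes, PEBS_DATACFG_LBRS */
	/* sz == 1232, versus 32 bytes for a basic-only record. */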

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V1:
- Code rebase on top of c634dc6bdede
- Rename @d to pebs_data_cfg
- Make pebs_update_adaptive_cfg readable
- Clear pebs_data_cfg and pebs_record_size for first PEBS in add
- Don't clear ICL_EVENTSEL_ADAPTIVE. Rely on MSR_PEBS_CFG settings
- Change PEBS record parsing order (bug fix)
- Support struct x86_perf_regs
- make get_pebs_status generic
- specific intel_pmu_drain_pebs_icl()
- Use cpuc->pebs_record_size to replace format_size

 arch/x86/events/intel/core.c      |   2 +
 arch/x86/events/intel/ds.c        | 372 ++++++++++++++++++++++++++++--
 arch/x86/events/intel/lbr.c       |  22 ++
 arch/x86/events/perf_event.h      |  14 ++
 arch/x86/include/asm/msr-index.h  |   1 +
 arch/x86/include/asm/perf_event.h |  42 ++++
 6 files changed, 436 insertions(+), 17 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 8baa441d8000..620beae035a0 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3507,6 +3507,8 @@ static struct intel_excl_cntrs *allocate_excl_cntrs(int cpu)
 
 int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int cpu)
 {
+	cpuc->pebs_record_size = x86_pmu.pebs_record_size;
+
 	if (x86_pmu.extra_regs || x86_pmu.lbr_sel_map) {
 		cpuc->shared_regs = allocate_shared_regs(cpu);
 		if (!cpuc->shared_regs)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index efc054aee3c1..c50de696fb44 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -906,17 +906,85 @@ static inline void pebs_update_threshold(struct cpu_hw_events *cpuc)
 
 	if (cpuc->n_pebs == cpuc->n_large_pebs) {
 		threshold = ds->pebs_absolute_maximum -
-			reserved * x86_pmu.pebs_record_size;
+			reserved * cpuc->pebs_record_size;
 	} else {
-		threshold = ds->pebs_buffer_base + x86_pmu.pebs_record_size;
+		threshold = ds->pebs_buffer_base + cpuc->pebs_record_size;
 	}
 
 	ds->pebs_interrupt_threshold = threshold;
 }
 
+static void adaptive_pebs_record_size_update(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 pebs_data_cfg = cpuc->pebs_data_cfg;
+	int sz = sizeof(struct pebs_basic);
+
+	if (pebs_data_cfg & PEBS_DATACFG_MEMINFO)
+		sz += sizeof(struct pebs_meminfo);
+	if (pebs_data_cfg & PEBS_DATACFG_GPRS)
+		sz += sizeof(struct pebs_gprs);
+	if (pebs_data_cfg & PEBS_DATACFG_XMMS)
+		sz += sizeof(struct pebs_xmm);
+	if (pebs_data_cfg & PEBS_DATACFG_LBRS)
+		sz += x86_pmu.lbr_nr * sizeof(struct pebs_lbr_entry);
+
+	cpuc->pebs_record_size = sz;
+}
+
+#define PERF_PEBS_MEMINFO_TYPE	(PERF_SAMPLE_ADDR | PERF_SAMPLE_DATA_SRC |   \
+				PERF_SAMPLE_PHYS_ADDR | PERF_SAMPLE_WEIGHT | \
+				PERF_SAMPLE_TRANSACTION)
+
+static u64 pebs_update_adaptive_cfg(struct perf_event *event)
+{
+	struct perf_event_attr *attr = &event->attr;
+	u64 sample_type = attr->sample_type;
+	u64 pebs_data_cfg = 0;
+	bool gprs, tsx_weight;
+
+	if ((sample_type & ~(PERF_SAMPLE_IP|PERF_SAMPLE_TIME)) ||
+	    attr->precise_ip < 2) {
+
+		if (sample_type & PERF_PEBS_MEMINFO_TYPE)
+			pebs_data_cfg |= PEBS_DATACFG_MEMINFO;
+
+		/*
+		 * Cases where we need the registers:
+		 * + the user requested registers
+		 * + precise_ip < 2 for the non-event IP
+		 * + for the RTM TSX weight we also need GPRs for the abort
+		 *   code, but we don't want to force GPRs for all other
+		 *   weights, so collect them only for the RTM abort event.
+		 */
+		gprs = (sample_type & PERF_SAMPLE_REGS_INTR) &&
+			      (attr->sample_regs_intr & 0xffffffff);
+		tsx_weight = (sample_type & PERF_SAMPLE_WEIGHT) &&
+			     ((attr->config & 0xffff) == x86_pmu.force_gpr_event);
+		if (gprs || (attr->precise_ip < 2) || tsx_weight)
+			pebs_data_cfg |= PEBS_DATACFG_GPRS;
+
+		if ((sample_type & PERF_SAMPLE_REGS_INTR) &&
+		    (attr->sample_regs_intr >> 32))
+			pebs_data_cfg |= PEBS_DATACFG_XMMS;
+
+		if (sample_type & PERF_SAMPLE_BRANCH_STACK) {
+			/*
+			 * For now always log all LBRs. Could configure this
+			 * later.
+			 */
+			pebs_data_cfg |= PEBS_DATACFG_LBRS |
+				((x86_pmu.lbr_nr-1) << PEBS_DATACFG_LBR_SHIFT);
+		}
+	}
+	return pebs_data_cfg;
+}
+
 static void
-pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc, struct pmu *pmu)
+pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc,
+		  struct perf_event *event, bool add)
 {
+	struct pmu *pmu = event->ctx->pmu;
 	/*
 	 * Make sure we get updated with the first PEBS
 	 * event. It will trigger also during removal, but
@@ -933,6 +1001,34 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc, struct pmu *pmu)
 		update = true;
 	}
 
+	/*
+	 * The PEBS record doesn't shrink on pmu::del(). Shrinking it would
+	 * require walking all the existing PEBS events to recompute an
+	 * accurate config, which isn't necessary.
+	 * A bigger PEBS record does no harm, apart from a small
+	 * performance impact.
+	 * Also, in most cases the same PEBS config applies to all
+	 * PEBS events.
+	 */
+	if (x86_pmu.intel_cap.pebs_baseline && add) {
+		u64 pebs_data_cfg;
+
+		/* Clear pebs_data_cfg and pebs_record_size for first PEBS. */
+		if (cpuc->n_pebs == 1) {
+			cpuc->pebs_data_cfg = 0;
+			cpuc->pebs_record_size = sizeof(struct pebs_basic);
+		}
+
+		pebs_data_cfg = pebs_update_adaptive_cfg(event);
+
+		/* Update pebs_record_size if new event requires more data. */
+		if (pebs_data_cfg & ~cpuc->pebs_data_cfg) {
+			cpuc->pebs_data_cfg |= pebs_data_cfg;
+			adaptive_pebs_record_size_update();
+			update = true;
+		}
+	}
+
 	if (update)
 		pebs_update_threshold(cpuc);
 }
@@ -947,7 +1043,7 @@ void intel_pmu_pebs_add(struct perf_event *event)
 	if (hwc->flags & PERF_X86_EVENT_LARGE_PEBS)
 		cpuc->n_large_pebs++;
 
-	pebs_update_state(needed_cb, cpuc, event->ctx->pmu);
+	pebs_update_state(needed_cb, cpuc, event, true);
 }
 
 void intel_pmu_pebs_enable(struct perf_event *event)
@@ -965,6 +1061,14 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 	else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
 		cpuc->pebs_enabled |= 1ULL << 63;
 
+	if (x86_pmu.intel_cap.pebs_baseline) {
+		hwc->config |= ICL_EVENTSEL_ADAPTIVE;
+		if (cpuc->pebs_data_cfg != cpuc->active_pebs_data_cfg) {
+			wrmsrl(MSR_PEBS_DATA_CFG, cpuc->pebs_data_cfg);
+			cpuc->active_pebs_data_cfg = cpuc->pebs_data_cfg;
+		}
+	}
+
 	/*
 	 * Use auto-reload if possible to save a MSR write in the PMI.
 	 * This must be done in pmu::start(), because PERF_EVENT_IOC_PERIOD.
@@ -991,7 +1095,7 @@ void intel_pmu_pebs_del(struct perf_event *event)
 	if (hwc->flags & PERF_X86_EVENT_LARGE_PEBS)
 		cpuc->n_large_pebs--;
 
-	pebs_update_state(needed_cb, cpuc, event->ctx->pmu);
+	pebs_update_state(needed_cb, cpuc, event, false);
 }
 
 void intel_pmu_pebs_disable(struct perf_event *event)
@@ -1144,6 +1248,25 @@ static inline u64 intel_get_tsx_transaction(u64 tsx_tuning, u64 ax)
 	return txn;
 }
 
+static inline void *next_pebs_record(void *p)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	unsigned int size;
+
+	if (x86_pmu.intel_cap.pebs_format < 4)
+		size = x86_pmu.pebs_record_size;
+	else
+		size = cpuc->pebs_record_size;
+	return p + size;
+}
+
+static inline u64 get_pebs_status(void *n)
+{
+	if (x86_pmu.intel_cap.pebs_format < 4)
+		return ((struct pebs_record_nhm *)n)->status;
+	return ((struct pebs_basic *)n)->applicable_counters;
+}
+
 #define PERF_X86_EVENT_PEBS_HSW_PREC \
 		(PERF_X86_EVENT_PEBS_ST_HSW | \
 		 PERF_X86_EVENT_PEBS_LD_HSW | \
@@ -1164,7 +1287,7 @@ static u64 get_data_src(struct perf_event *event, u64 aux)
 	return val;
 }
 
-static void setup_pebs_sample_data(struct perf_event *event,
+static void setup_pebs_fixed_sample_data(struct perf_event *event,
 				   struct pt_regs *iregs, void *__pebs,
 				   struct perf_sample_data *data,
 				   struct pt_regs *regs)
@@ -1306,6 +1429,140 @@ static void setup_pebs_sample_data(struct perf_event *event,
 		data->br_stack = &cpuc->lbr_stack;
 }
 
+static void adaptive_pebs_save_regs(struct pt_regs *regs,
+				    struct pebs_gprs *gprs)
+{
+	regs->ax = gprs->ax;
+	regs->bx = gprs->bx;
+	regs->cx = gprs->cx;
+	regs->dx = gprs->dx;
+	regs->si = gprs->si;
+	regs->di = gprs->di;
+	regs->bp = gprs->bp;
+	regs->sp = gprs->sp;
+#ifndef CONFIG_X86_32
+	regs->r8 = gprs->r8;
+	regs->r9 = gprs->r9;
+	regs->r10 = gprs->r10;
+	regs->r11 = gprs->r11;
+	regs->r12 = gprs->r12;
+	regs->r13 = gprs->r13;
+	regs->r14 = gprs->r14;
+	regs->r15 = gprs->r15;
+#endif
+}
+
+/*
+ * With adaptive PEBS the layout depends on what fields are configured.
+ */
+
+static void setup_pebs_adaptive_sample_data(struct perf_event *event,
+					    struct pt_regs *iregs, void *__pebs,
+					    struct perf_sample_data *data,
+					    struct pt_regs *regs)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct pebs_basic *basic = __pebs;
+	void *next_record = basic + 1;
+	u64 sample_type;
+	u64 format_size;
+	struct pebs_meminfo *meminfo = NULL;
+	struct pebs_gprs *gprs = NULL;
+	struct x86_perf_regs *perf_regs;
+
+	if (basic == NULL)
+		return;
+
+	perf_regs = container_of(regs, struct x86_perf_regs, regs);
+	perf_regs->xmm_regs = NULL;
+
+	sample_type = event->attr.sample_type;
+	format_size = basic->format_size;
+	perf_sample_data_init(data, 0, event->hw.last_period);
+	data->period = event->hw.last_period;
+
+	if (event->attr.use_clockid == 0)
+		data->time = native_sched_clock_from_tsc(basic->tsc);
+
+	/*
+	 * We must however always use iregs for the unwinder to stay sane; the
+	 * record BP,SP,IP can point into thin air when the record is from a
+	 * previous PMI context or an (I)RET happened between the record and
+	 * PMI.
+	 */
+	if (sample_type & PERF_SAMPLE_CALLCHAIN)
+		data->callchain = perf_callchain(event, iregs);
+
+	*regs = *iregs;
+	/* The ip in basic is EventingIP */
+	set_linear_ip(regs, basic->ip);
+	regs->flags = PERF_EFLAGS_EXACT;
+
+	/*
+	 * The MEMINFO record comes before the GPRS record, but
+	 * PERF_SAMPLE_TRANSACTION needs gprs->ax.
+	 * Save the pointer here and process it later.
+	 */
+	if (format_size & PEBS_DATACFG_MEMINFO) {
+		meminfo = next_record;
+		next_record = meminfo + 1;
+	}
+
+	if (format_size & PEBS_DATACFG_GPRS) {
+		gprs = next_record;
+		next_record = gprs + 1;
+
+		if (event->attr.precise_ip < 2) {
+			set_linear_ip(regs, gprs->ip);
+			regs->flags &= ~PERF_EFLAGS_EXACT;
+		}
+
+		if (sample_type & PERF_SAMPLE_REGS_INTR)
+			adaptive_pebs_save_regs(regs, gprs);
+	}
+
+	if (format_size & PEBS_DATACFG_MEMINFO) {
+		if (sample_type & PERF_SAMPLE_WEIGHT)
+			data->weight = meminfo->latency ?:
+				intel_get_tsx_weight(meminfo->tsx_tuning);
+
+		if (sample_type & PERF_SAMPLE_DATA_SRC)
+			data->data_src.val = get_data_src(event, meminfo->aux);
+
+		if (sample_type & (PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR))
+			data->addr = meminfo->address;
+
+		if (sample_type & PERF_SAMPLE_TRANSACTION)
+			data->txn = intel_get_tsx_transaction(meminfo->tsx_tuning,
+							  gprs ? gprs->ax : 0);
+	}
+
+	if (format_size & PEBS_DATACFG_XMMS) {
+		struct pebs_xmm *xmm = next_record;
+
+		next_record = xmm + 1;
+		perf_regs->xmm_regs = xmm->xmm;
+	}
+
+	if (format_size & PEBS_DATACFG_LBRS) {
+		struct pebs_lbr *lbr = next_record;
+		int num_lbr = ((format_size >> PEBS_DATACFG_LBR_SHIFT)
+					& 0xff) + 1;
+		next_record = next_record + num_lbr*sizeof(struct pebs_lbr_entry);
+
+		if (has_branch_stack(event)) {
+			intel_pmu_store_pebs_lbrs(lbr);
+			data->br_stack = &cpuc->lbr_stack;
+		}
+	}
+
+	WARN_ONCE(next_record != __pebs + (format_size >> 48),
+			"PEBS record size %llu, expected %llu, config %llx\n",
+			format_size >> 48,
+			(u64)(next_record - __pebs),
+			basic->format_size);
+}
+
 static inline void *
 get_next_pebs_record_by_bit(void *base, void *top, int bit)
 {
@@ -1323,19 +1580,19 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
 	if (base == NULL)
 		return NULL;
 
-	for (at = base; at < top; at += x86_pmu.pebs_record_size) {
-		struct pebs_record_nhm *p = at;
+	for (at = base; at < top; at = next_pebs_record(at)) {
+		unsigned long status = get_pebs_status(at);
 
-		if (test_bit(bit, (unsigned long *)&p->status)) {
+		if (test_bit(bit, (unsigned long *)&status)) {
 			/* PEBS v3 has accurate status bits */
 			if (x86_pmu.intel_cap.pebs_format >= 3)
 				return at;
 
-			if (p->status == (1 << bit))
+			if (status == (1 << bit))
 				return at;
 
 			/* clear non-PEBS bit and re-check */
-			pebs_status = p->status & cpuc->pebs_enabled;
+			pebs_status = status & cpuc->pebs_enabled;
 			pebs_status &= PEBS_COUNTER_MASK;
 			if (pebs_status == (1 << bit))
 				return at;
@@ -1419,7 +1676,8 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 {
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_sample_data data;
-	struct pt_regs regs;
+	struct x86_perf_regs perf_regs;
+	struct pt_regs *regs = &perf_regs.regs;
 	void *at = get_next_pebs_record_by_bit(base, top, bit);
 
 	if (hwc->flags & PERF_X86_EVENT_AUTO_RELOAD) {
@@ -1434,20 +1692,20 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 		return;
 
 	while (count > 1) {
-		setup_pebs_sample_data(event, iregs, at, &data, &regs);
-		perf_event_output(event, &data, &regs);
-		at += x86_pmu.pebs_record_size;
+		x86_pmu.setup_pebs_sample_data(event, iregs, at, &data, regs);
+		perf_event_output(event, &data, regs);
+		at = next_pebs_record(at);
 		at = get_next_pebs_record_by_bit(at, top, bit);
 		count--;
 	}
 
-	setup_pebs_sample_data(event, iregs, at, &data, &regs);
+	x86_pmu.setup_pebs_sample_data(event, iregs, at, &data, regs);
 
 	/*
 	 * All but the last records are processed.
 	 * The last one is left to be able to call the overflow handler.
 	 */
-	if (perf_event_overflow(event, &data, &regs)) {
+	if (perf_event_overflow(event, &data, regs)) {
 		x86_pmu_stop(event, 0);
 		return;
 	}
@@ -1626,6 +1884,59 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
 	}
 }
 
+static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs)
+{
+	short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] = {};
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct debug_store *ds = cpuc->ds;
+	struct perf_event *event;
+	void *base, *at, *top;
+	int bit, size;
+	u64 mask;
+
+	if (!x86_pmu.pebs_active)
+		return;
+
+	base = (struct pebs_basic *)(unsigned long)ds->pebs_buffer_base;
+	top = (struct pebs_basic *)(unsigned long)ds->pebs_index;
+
+	ds->pebs_index = ds->pebs_buffer_base;
+
+	mask = ((1ULL << x86_pmu.max_pebs_events) - 1) |
+	       (((1ULL << x86_pmu.num_counters_fixed) - 1) << INTEL_PMC_IDX_FIXED);
+	size = INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed;
+
+	if (unlikely(base >= top)) {
+		intel_pmu_pebs_event_update_no_drain(cpuc, size);
+		return;
+	}
+
+	for (at = base; at < top; at = next_pebs_record(at)) {
+		u64 pebs_status;
+
+		pebs_status = get_pebs_status(at) & cpuc->pebs_enabled;
+		pebs_status &= mask;
+
+		for_each_set_bit(bit, (unsigned long *)&pebs_status, size)
+			counts[bit]++;
+	}
+
+	for (bit = 0; bit < size; bit++) {
+		if (counts[bit] == 0)
+			continue;
+
+		event = cpuc->events[bit];
+		if (WARN_ON_ONCE(!event))
+			continue;
+
+		if (WARN_ON_ONCE(!event->attr.precise_ip))
+			continue;
+
+		__intel_pmu_pebs_event(event, iregs, base,
+				       top, bit, counts[bit]);
+	}
+}
+
 /*
  * BTS, PEBS probe and setup
  */
@@ -1645,8 +1956,10 @@ void __init intel_ds_init(void)
 		x86_pmu.pebs_no_isolation = 1;
 	if (x86_pmu.pebs) {
 		char pebs_type = x86_pmu.intel_cap.pebs_trap ?  '+' : '-';
+		char *pebs_qual = "";
 		int format = x86_pmu.intel_cap.pebs_format;
 
+		x86_pmu.setup_pebs_sample_data = setup_pebs_fixed_sample_data;
 		switch (format) {
 		case 0:
 			pr_cont("PEBS fmt0%c, ", pebs_type);
@@ -1682,10 +1995,35 @@ void __init intel_ds_init(void)
 			x86_pmu.large_pebs_flags |= PERF_SAMPLE_TIME;
 			break;
 
+		case 4:
+			x86_pmu.drain_pebs = intel_pmu_drain_pebs_icl;
+			x86_pmu.setup_pebs_sample_data = setup_pebs_adaptive_sample_data;
+			x86_pmu.pebs_record_size = sizeof(struct pebs_basic);
+			if (x86_pmu.intel_cap.pebs_baseline) {
+				x86_pmu.large_pebs_flags |=
+					PERF_SAMPLE_BRANCH_STACK |
+					PERF_SAMPLE_TIME;
+				x86_pmu.flags |= PMU_FL_PEBS_ALL;
+				pebs_qual = "-baseline";
+			} else {
+				/* Only basic record supported */
+				x86_pmu.large_pebs_flags &=
+					~(PERF_SAMPLE_ADDR |
+					  PERF_SAMPLE_TIME |
+					  PERF_SAMPLE_DATA_SRC |
+					  PERF_SAMPLE_TRANSACTION |
+					  PERF_SAMPLE_REGS_USER |
+					  PERF_SAMPLE_REGS_INTR);
+			}
+			pr_cont("PEBS fmt4%c%s, ", pebs_type, pebs_qual);
+			break;
+
 		default:
 			pr_cont("no PEBS fmt%d%c, ", format, pebs_type);
 			x86_pmu.pebs = 0;
 		}
+		if (format != 4)
+			x86_pmu.intel_cap.pebs_baseline = 0;
 	}
 }
 
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 580c1b91c454..07b7175fc378 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1080,6 +1080,28 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 	}
 }
 
+void intel_pmu_store_pebs_lbrs(struct pebs_lbr *lbr)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int i;
+
+	cpuc->lbr_stack.nr = x86_pmu.lbr_nr;
+	for (i = 0; i < x86_pmu.lbr_nr; i++) {
+		u64 info = lbr->lbr[i].info;
+		struct perf_branch_entry *e = &cpuc->lbr_entries[i];
+
+		e->from		= lbr->lbr[i].from;
+		e->to		= lbr->lbr[i].to;
+		e->mispred	= !!(info & LBR_INFO_MISPRED);
+		e->predicted	= !(info & LBR_INFO_MISPRED);
+		e->in_tx	= !!(info & LBR_INFO_IN_TX);
+		e->abort	= !!(info & LBR_INFO_ABORT);
+		e->cycles	= info & LBR_INFO_CYCLES;
+		e->reserved	= 0;
+	}
+	intel_pmu_lbr_filter(cpuc);
+}
+
 /*
  * Map interface branch filters onto LBR filters
  */
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 6428941a5073..16237d94728d 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -207,6 +207,11 @@ struct cpu_hw_events {
 	int			n_pebs;
 	int			n_large_pebs;
 
+	/* Current super set of events hardware configuration */
+	u64			pebs_data_cfg;
+	u64			active_pebs_data_cfg;
+	int			pebs_record_size;
+
 	/*
 	 * Intel LBR bits
 	 */
@@ -473,6 +478,7 @@ union perf_capabilities {
 		 * values > 32bit.
 		 */
 		u64	full_width_write:1;
+		u64     pebs_baseline:1;
 	};
 	u64	capabilities;
 };
@@ -617,10 +623,16 @@ struct x86_pmu {
 	int		pebs_record_size;
 	int		pebs_buffer_size;
 	void		(*drain_pebs)(struct pt_regs *regs);
+	void		(*setup_pebs_sample_data)(struct perf_event *event,
+						  struct pt_regs *iregs,
+						  void *__pebs,
+						  struct perf_sample_data *data,
+						  struct pt_regs *regs);
 	struct event_constraint *pebs_constraints;
 	void		(*pebs_aliases)(struct perf_event *event);
 	int 		max_pebs_events;
 	unsigned long	large_pebs_flags;
+	u64		force_gpr_event;
 
 	/*
 	 * Intel LBR
@@ -961,6 +973,8 @@ void intel_pmu_pebs_sched_task(struct perf_event_context *ctx, bool sched_in);
 
 void intel_pmu_auto_reload_read(struct perf_event *event);
 
+void intel_pmu_store_pebs_lbrs(struct pebs_lbr *lbr);
+
 void intel_ds_init(void);
 
 void intel_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ca5bc0eacb95..1378518cf63f 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -116,6 +116,7 @@
 #define LBR_INFO_CYCLES			0xffff
 
 #define MSR_IA32_PEBS_ENABLE		0x000003f1
+#define MSR_PEBS_DATA_CFG		0x000003f2
 #define MSR_IA32_DS_AREA		0x00000600
 #define MSR_IA32_PERF_CAPABILITIES	0x00000345
 #define MSR_PEBS_LD_LAT_THRESHOLD	0x000003f6
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index d9f5bbe44b3c..fd0ec2699213 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -32,6 +32,7 @@
 
 #define HSW_IN_TX					(1ULL << 32)
 #define HSW_IN_TX_CHECKPOINTED				(1ULL << 33)
+#define ICL_EVENTSEL_ADAPTIVE				(1ULL << 34)
 
 #define AMD64_EVENTSEL_INT_CORE_ENABLE			(1ULL << 36)
 #define AMD64_EVENTSEL_GUESTONLY			(1ULL << 40)
@@ -87,6 +88,12 @@
 #define ARCH_PERFMON_BRANCH_MISSES_RETIRED		6
 #define ARCH_PERFMON_EVENTS_COUNT			7
 
+#define PEBS_DATACFG_MEMINFO	BIT_ULL(0)
+#define PEBS_DATACFG_GPRS	BIT_ULL(1)
+#define PEBS_DATACFG_XMMS	BIT_ULL(2)
+#define PEBS_DATACFG_LBRS	BIT_ULL(3)
+#define PEBS_DATACFG_LBR_SHIFT	24
+
 /*
  * Intel "Architectural Performance Monitoring" CPUID
  * detection/enumeration details:
@@ -176,6 +183,41 @@ struct x86_pmu_capability {
 #define GLOBAL_STATUS_LBRS_FROZEN			BIT_ULL(58)
 #define GLOBAL_STATUS_TRACE_TOPAPMI			BIT_ULL(55)
 
+/*
+ * Adaptive PEBS v4
+ */
+
+struct pebs_basic {
+	u64 format_size;
+	u64 ip;
+	u64 applicable_counters;
+	u64 tsc;
+};
+
+struct pebs_meminfo {
+	u64 address;
+	u64 aux;
+	u64 latency;
+	u64 tsx_tuning;
+};
+
+struct pebs_gprs {
+	u64 flags, ip, ax, cx, dx, bx, sp, bp, si, di;
+	u64 r8, r9, r10, r11, r12, r13, r14, r15;
+};
+
+struct pebs_xmm {
+	u64 xmm[16*2];	/* two entries for each register */
+};
+
+struct pebs_lbr_entry {
+	u64 from, to, info;
+};
+
+struct pebs_lbr {
+	struct pebs_lbr_entry lbr[0]; /* Variable length */
+};
+
 /*
  * IBS cpuid feature detection
  */
-- 
2.17.1



* [PATCH V2 05/23] perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (3 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4 kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 06/23] perf/x86: Support constraint ranges kan.liang
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

With adaptive PEBS the CPU can directly supply the LBR information,
so we don't need to read it again. But the LBRs still need to be
enabled. Add a special count to the cpuc that distinguishes these
two cases, and avoid reading the LBRs unnecessarily when PEBS is
active.
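
A worked example of the new check in intel_pmu_lbr_read() (the counts
are hypothetical):

	/*
	 * lbr_users == 3, lbr_pebs_users == 3: every LBR user is an
	 *     adaptive-PEBS event, the PEBS record already carries the
	 *     LBRs, so skip the LBR MSR reads.
	 * lbr_users == 3, lbr_pebs_users == 2: at least one user still
	 *     needs the LBR MSRs, so read them as before.
	 */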

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/intel/lbr.c  | 13 ++++++++++++-
 arch/x86/events/perf_event.h |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 07b7175fc378..6f814a27416b 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -488,6 +488,8 @@ void intel_pmu_lbr_add(struct perf_event *event)
 	 * be 'new'. Conversely, a new event can get installed through the
 	 * context switch path for the first time.
 	 */
+	if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip > 0)
+		cpuc->lbr_pebs_users++;
 	perf_sched_cb_inc(event->ctx->pmu);
 	if (!cpuc->lbr_users++ && !event->total_time_running)
 		intel_pmu_lbr_reset();
@@ -507,8 +509,11 @@ void intel_pmu_lbr_del(struct perf_event *event)
 		task_ctx->lbr_callstack_users--;
 	}
 
+	if (x86_pmu.intel_cap.pebs_baseline && event->attr.precise_ip > 0)
+		cpuc->lbr_pebs_users--;
 	cpuc->lbr_users--;
 	WARN_ON_ONCE(cpuc->lbr_users < 0);
+	WARN_ON_ONCE(cpuc->lbr_pebs_users < 0);
 	perf_sched_cb_dec(event->ctx->pmu);
 }
 
@@ -658,7 +663,13 @@ void intel_pmu_lbr_read(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
-	if (!cpuc->lbr_users)
+	/*
+	 * Don't read when all LBR users are using adaptive PEBS.
+	 *
+	 * This could be smarter and actually check the event,
+	 * but this simple approach seems to work for now.
+	 */
+	if (!cpuc->lbr_users || cpuc->lbr_users == cpuc->lbr_pebs_users)
 		return;
 
 	if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 16237d94728d..132130f00c36 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -216,6 +216,7 @@ struct cpu_hw_events {
 	 * Intel LBR bits
 	 */
 	int				lbr_users;
+	int				lbr_pebs_users;
 	struct perf_branch_stack	lbr_stack;
 	struct perf_branch_entry	lbr_entries[MAX_LBR_ENTRIES];
 	struct er_account		*lbr_sel;
-- 
2.17.1



* [PATCH V2 06/23] perf/x86: Support constraint ranges
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (4 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 05/23] perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 21:09   ` Peter Zijlstra
  2019-03-21 20:56 ` [PATCH V2 07/23] perf/x86/intel: Add Icelake support kan.liang
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Icelake extended the general counters to 8, even when SMT is enabled.
However, only a (large) subset of the events can be used on all 8
counters.

The events that can or cannot be used on all counters are organized
in ranges.

We would need a lot of scheduler constraints to handle all this.

To avoid blowing up the tables, add event code ranges to the constraint
tables, and a new inline function to match them.
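
As a worked example (the 0x48-0x54 range is one of the Icelake ranges
added in a later patch), constraint_match() relies on an unsigned
compare, so event codes below the range start wrap around and fail the
check:

	/* Events 0x48-0x54 restricted to counters 0-3: .code = 0x48, .size = 0xc */
	INTEL_EVENT_CONSTRAINT_RANGE(0x48, 0x54, 0xf),

	/*
	 * constraint_match() computes ((ecode & cmask) - code) <= size:
	 *   event 0x50: 0x50 - 0x48 = 0x08 <= 0x0c -> match
	 *   event 0x55: 0x55 - 0x48 = 0x0d  > 0x0c -> no match
	 *   event 0x47: 0x47 - 0x48 wraps to ~0ULL -> no match
	 */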

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V1:
- Use 'size' to replace 'range_end'

 arch/x86/events/intel/core.c |  2 +-
 arch/x86/events/intel/ds.c   |  2 +-
 arch/x86/events/perf_event.h | 38 ++++++++++++++++++++++++++++++------
 3 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 620beae035a0..d5d796e114a1 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2688,7 +2688,7 @@ x86_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 
 	if (x86_pmu.event_constraints) {
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
-			if ((event->hw.config & c->cmask) == c->code) {
+			if (constraint_match(c, event->hw.config)) {
 				event->hw.flags |= c->flags;
 				return c;
 			}
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index c50de696fb44..04900f393f4e 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -858,7 +858,7 @@ struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 
 	if (x86_pmu.pebs_constraints) {
 		for_each_event_constraint(c, x86_pmu.pebs_constraints) {
-			if ((event->hw.config & c->cmask) == c->code) {
+			if (constraint_match(c, event->hw.config)) {
 				event->hw.flags |= c->flags;
 				return c;
 			}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 132130f00c36..c1762b50fc4c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -49,11 +49,12 @@ struct event_constraint {
 		unsigned long	idxmsk[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 		u64		idxmsk64;
 	};
-	u64	code;
-	u64	cmask;
-	int	weight;
-	int	overlap;
-	int	flags;
+	u64		code;
+	u64		cmask;
+	int		weight;
+	int		overlap;
+	int		flags;
+	unsigned int	size;
 };
 /*
  * struct hw_perf_event.flags flags
@@ -71,6 +72,10 @@ struct event_constraint {
 #define PERF_X86_EVENT_AUTO_RELOAD	0x0400 /* use PEBS auto-reload */
 #define PERF_X86_EVENT_LARGE_PEBS	0x0800 /* use large PEBS */
 
+static inline bool constraint_match(struct event_constraint *c, u64 ecode)
+{
+	return ((ecode & c->cmask) - c->code) <= (u64)c->size;
+}
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -263,18 +268,25 @@ struct cpu_hw_events {
 	void				*kfree_on_online[X86_PERF_KFREE_MAX];
 };
 
-#define __EVENT_CONSTRAINT(c, n, m, w, o, f) {\
+#define __EVENT_CONSTRAINT_RANGE(c, e, n, m, w, o, f) {	\
 	{ .idxmsk64 = (n) },		\
 	.code = (c),			\
+	.size = (e) - (c),		\
 	.cmask = (m),			\
 	.weight = (w),			\
 	.overlap = (o),			\
 	.flags = f,			\
 }
 
+#define __EVENT_CONSTRAINT(c, n, m, w, o, f) \
+	__EVENT_CONSTRAINT_RANGE(c, c, n, m, w, o, f)
+
 #define EVENT_CONSTRAINT(c, n, m)	\
 	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0, 0)
 
+#define EVENT_CONSTRAINT_RANGE(c, e, n, m) \
+	__EVENT_CONSTRAINT_RANGE(c, e, n, m, HWEIGHT(n), 0, 0)
+
 #define INTEL_EXCLEVT_CONSTRAINT(c, n)	\
 	__EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT, HWEIGHT(n),\
 			   0, PERF_X86_EVENT_EXCL)
@@ -309,6 +321,12 @@ struct cpu_hw_events {
 #define INTEL_EVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT)
 
+/*
+ * Constraint on a range of Event codes
+ */
+#define INTEL_EVENT_CONSTRAINT_RANGE(c, e, n)			\
+	EVENT_CONSTRAINT_RANGE(c, e, n, ARCH_PERFMON_EVENTSEL_EVENT)
+
 /*
  * Constraint on the Event code + UMask + fixed-mask
  *
@@ -356,6 +374,9 @@ struct cpu_hw_events {
 #define INTEL_FLAGS_EVENT_CONSTRAINT(c, n) \
 	EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS)
 
+#define INTEL_FLAGS_EVENT_CONSTRAINT_RANGE(c, e, n)			\
+	EVENT_CONSTRAINT_RANGE(c, e, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS)
+
 /* Check only flags, but allow all event/umask */
 #define INTEL_ALL_EVENT_CONSTRAINT(code, n)	\
 	EVENT_CONSTRAINT(code, n, X86_ALL_EVENT_FLAGS)
@@ -372,6 +393,11 @@ struct cpu_hw_events {
 			  ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS, \
 			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LD_HSW)
 
+#define INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD_RANGE(code, end, n) \
+	__EVENT_CONSTRAINT_RANGE(code, end, n,				\
+			  ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS, \
+			  HWEIGHT(n), 0, PERF_X86_EVENT_PEBS_LD_HSW)
+
 #define INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_XLD(code, n) \
 	__EVENT_CONSTRAINT(code, n,			\
 			  ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS, \
-- 
2.17.1



* [PATCH V2 07/23] perf/x86/intel: Add Icelake support
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (5 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 06/23] perf/x86: Support constraint ranges kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 08/23] perf/x86/intel/cstate: " kan.liang
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Add Icelake core PMU perf code, including constraint tables and the main
enable code.

Icelake expanded the generic counters to always 8 even with HT on, but a
range of events cannot be scheduled on the extra 4 counters.
Add new constraint ranges to describe this to the scheduler.
The number of constraints that need to be checked is larger now than
with earlier CPUs.
At some point we may need a new data structure to look them up more
efficiently than with a linear search, but so far the linear search
still seems acceptable.

Icelake added a new fixed counter SLOTS. Full support for it is added
later in the patch series.

The cache events table is identical to Skylake.

Compared to a PEBS instruction event on a generic counter, fixed
counter 0 has less skid. Force instruction:ppp to always be scheduled
on fixed counter 0.

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V1:
- Add x86_pmu.has_xmm_regs = true;

 arch/x86/events/intel/core.c      | 112 ++++++++++++++++++++++++++++++
 arch/x86/events/intel/ds.c        |  26 ++++++-
 arch/x86/events/perf_event.h      |   2 +
 arch/x86/include/asm/intel_ds.h   |   2 +-
 arch/x86/include/asm/perf_event.h |   2 +-
 5 files changed, 140 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d5d796e114a1..ef95d73ef4f0 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -239,6 +239,35 @@ static struct extra_reg intel_skl_extra_regs[] __read_mostly = {
 	EVENT_EXTRA_END
 };
 
+static struct event_constraint intel_icl_event_constraints[] = {
+	FIXED_EVENT_CONSTRAINT(0x00c0, 0),	/* INST_RETIRED.ANY */
+	INTEL_UEVENT_CONSTRAINT(0x1c0, 0),	/* INST_RETIRED.PREC_DIST */
+	FIXED_EVENT_CONSTRAINT(0x003c, 1),	/* CPU_CLK_UNHALTED.CORE */
+	FIXED_EVENT_CONSTRAINT(0x0300, 2),	/* CPU_CLK_UNHALTED.REF */
+	FIXED_EVENT_CONSTRAINT(0x0400, 3),	/* SLOTS */
+	INTEL_EVENT_CONSTRAINT_RANGE(0x03, 0x0a, 0xf),
+	INTEL_EVENT_CONSTRAINT_RANGE(0x1f, 0x28, 0xf),
+	INTEL_EVENT_CONSTRAINT(0x32, 0xf),	/* SW_PREFETCH_ACCESS.* */
+	INTEL_EVENT_CONSTRAINT_RANGE(0x48, 0x54, 0xf),
+	INTEL_EVENT_CONSTRAINT_RANGE(0x60, 0x8b, 0xf),
+	INTEL_UEVENT_CONSTRAINT(0x04a3, 0xff),  /* CYCLE_ACTIVITY.STALLS_TOTAL */
+	INTEL_UEVENT_CONSTRAINT(0x10a3, 0xff),  /* CYCLE_ACTIVITY.STALLS_MEM_ANY */
+	INTEL_EVENT_CONSTRAINT(0xa3, 0xf),      /* CYCLE_ACTIVITY.* */
+	INTEL_EVENT_CONSTRAINT_RANGE(0xa8, 0xb0, 0xf),
+	INTEL_EVENT_CONSTRAINT_RANGE(0xb7, 0xbd, 0xf),
+	INTEL_EVENT_CONSTRAINT_RANGE(0xd0, 0xe6, 0xf),
+	INTEL_EVENT_CONSTRAINT_RANGE(0xf0, 0xf4, 0xf),
+	EVENT_CONSTRAINT_END
+};
+
+static struct extra_reg intel_icl_extra_regs[] __read_mostly = {
+	INTEL_UEVENT_EXTRA_REG(0x01b7, MSR_OFFCORE_RSP_0, 0x3fffff9fffull, RSP_0),
+	INTEL_UEVENT_EXTRA_REG(0x01bb, MSR_OFFCORE_RSP_1, 0x3fffff9fffull, RSP_1),
+	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
+	INTEL_UEVENT_EXTRA_REG(0x01c6, MSR_PEBS_FRONTEND, 0x7fff17, FE),
+	EVENT_EXTRA_END
+};
+
 EVENT_ATTR_STR(mem-loads,	mem_ld_nhm,	"event=0x0b,umask=0x10,ldlat=3");
 EVENT_ATTR_STR(mem-loads,	mem_ld_snb,	"event=0xcd,umask=0x1,ldlat=3");
 EVENT_ATTR_STR(mem-stores,	mem_st_snb,	"event=0xcd,umask=0x2");
@@ -3366,6 +3395,9 @@ static struct event_constraint counter0_constraint =
 static struct event_constraint counter2_constraint =
 			EVENT_CONSTRAINT(0, 0x4, 0);
 
+static struct event_constraint fixed_counter0_constraint =
+			FIXED_EVENT_CONSTRAINT(0x00c0, 0);
+
 static struct event_constraint *
 hsw_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 			  struct perf_event *event)
@@ -3384,6 +3416,21 @@ hsw_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 	return c;
 }
 
+static struct event_constraint *
+icl_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+			  struct perf_event *event)
+{
+	/*
+	 * Fixed counter 0 has less skid.
+	 * Force instruction:ppp in Fixed counter 0
+	 */
+	if ((event->attr.precise_ip == 3) &&
+	    ((event->hw.config & X86_RAW_EVENT_MASK) == 0x00c0))
+		return &fixed_counter0_constraint;
+
+	return hsw_get_event_constraints(cpuc, idx, event);
+}
+
 static struct event_constraint *
 glp_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
 			  struct perf_event *event)
@@ -4110,6 +4157,42 @@ static struct attribute *hsw_tsx_events_attrs[] = {
 	NULL
 };
 
+EVENT_ATTR_STR(tx-capacity-read,  tx_capacity_read,  "event=0x54,umask=0x80");
+EVENT_ATTR_STR(tx-capacity-write, tx_capacity_write, "event=0x54,umask=0x2");
+EVENT_ATTR_STR(el-capacity-read,  el_capacity_read,  "event=0x54,umask=0x80");
+EVENT_ATTR_STR(el-capacity-write, el_capacity_write, "event=0x54,umask=0x2");
+
+static struct attribute *icl_events_attrs[] = {
+	EVENT_PTR(mem_ld_hsw),
+	EVENT_PTR(mem_st_hsw),
+	NULL,
+};
+
+static struct attribute *icl_tsx_events_attrs[] = {
+	EVENT_PTR(tx_start),
+	EVENT_PTR(tx_abort),
+	EVENT_PTR(tx_commit),
+	EVENT_PTR(tx_capacity_read),
+	EVENT_PTR(tx_capacity_write),
+	EVENT_PTR(tx_conflict),
+	EVENT_PTR(el_start),
+	EVENT_PTR(el_abort),
+	EVENT_PTR(el_commit),
+	EVENT_PTR(el_capacity_read),
+	EVENT_PTR(el_capacity_write),
+	EVENT_PTR(el_conflict),
+	EVENT_PTR(cycles_t),
+	EVENT_PTR(cycles_ct),
+	NULL,
+};
+
+static __init struct attribute **get_icl_events_attrs(void)
+{
+	return boot_cpu_has(X86_FEATURE_RTM) ?
+		merge_attr(icl_events_attrs, icl_tsx_events_attrs) :
+		icl_events_attrs;
+}
+
 static ssize_t freeze_on_smi_show(struct device *cdev,
 				  struct device_attribute *attr,
 				  char *buf)
@@ -4697,6 +4780,35 @@ __init int intel_pmu_init(void)
 		name = "skylake";
 		break;
 
+	case INTEL_FAM6_ICELAKE_MOBILE:
+		x86_pmu.late_ack = true;
+		memcpy(hw_cache_event_ids, skl_hw_cache_event_ids, sizeof(hw_cache_event_ids));
+		memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
+		hw_cache_event_ids[C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)] = -1;
+		intel_pmu_lbr_init_skl();
+
+		x86_pmu.event_constraints = intel_icl_event_constraints;
+		x86_pmu.pebs_constraints = intel_icl_pebs_event_constraints;
+		x86_pmu.extra_regs = intel_icl_extra_regs;
+		x86_pmu.pebs_aliases = NULL;
+		x86_pmu.pebs_prec_dist = true;
+		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
+		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
+
+		x86_pmu.hw_config = hsw_hw_config;
+		x86_pmu.get_event_constraints = icl_get_event_constraints;
+		extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
+			hsw_format_attr : nhm_format_attr;
+		extra_attr = merge_attr(extra_attr, skl_format_attr);
+		x86_pmu.cpu_events = get_icl_events_attrs();
+		x86_pmu.force_gpr_event = 0x2ca;
+		x86_pmu.lbr_pt_coexist = true;
+		x86_pmu.has_xmm_regs = true;
+		intel_pmu_pebs_data_source_skl(false);
+		pr_cont("Icelake events, ");
+		name = "icelake";
+		break;
+
 	default:
 		switch (x86_pmu.version) {
 		case 1:
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 04900f393f4e..22972279b034 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -849,6 +849,26 @@ struct event_constraint intel_skl_pebs_event_constraints[] = {
 	EVENT_CONSTRAINT_END
 };
 
+struct event_constraint intel_icl_pebs_event_constraints[] = {
+	INTEL_FLAGS_UEVENT_CONSTRAINT(0x1c0, 0x100000000ULL),	/* INST_RETIRED.PREC_DIST */
+	INTEL_FLAGS_UEVENT_CONSTRAINT(0x0400, 0x400000000ULL),	/* SLOTS */
+
+	INTEL_PLD_CONSTRAINT(0x1cd, 0xff),		/* MEM_TRANS_RETIRED.LOAD_LATENCY */
+	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_LD(0x1d0, 0xf),  /* MEM_INST_RETIRED.LOAD */
+	INTEL_FLAGS_UEVENT_CONSTRAINT_DATALA_ST(0x2d0, 0xf),  /* MEM_INST_RETIRED.STORE */
+
+	INTEL_FLAGS_EVENT_CONSTRAINT_DATALA_LD_RANGE(0xd1, 0xd4, 0xf), /* MEM_LOAD_*_RETIRED.* */
+
+	INTEL_FLAGS_EVENT_CONSTRAINT(0xd0, 0xf), 	     /* MEM_INST_RETIRED.* */
+
+	/*
+	 * Everything else is handled by PMU_FL_PEBS_ALL, because we
+	 * need the full constraints from the main table.
+	 */
+
+	EVENT_CONSTRAINT_END
+};
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event)
 {
 	struct event_constraint *c;
@@ -1056,7 +1076,8 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 
 	cpuc->pebs_enabled |= 1ULL << hwc->idx;
 
-	if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
+	if ((event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) &&
+	    (x86_pmu.version < 5))
 		cpuc->pebs_enabled |= 1ULL << (hwc->idx + 32);
 	else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
 		cpuc->pebs_enabled |= 1ULL << 63;
@@ -1108,7 +1129,8 @@ void intel_pmu_pebs_disable(struct perf_event *event)
 
 	cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
 
-	if (event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT)
+	if ((event->hw.flags & PERF_X86_EVENT_PEBS_LDLAT) &&
+	    (x86_pmu.version < 5))
 		cpuc->pebs_enabled &= ~(1ULL << (hwc->idx + 32));
 	else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
 		cpuc->pebs_enabled &= ~(1ULL << 63);
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index c1762b50fc4c..58774373f410 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -982,6 +982,8 @@ extern struct event_constraint intel_bdw_pebs_event_constraints[];
 
 extern struct event_constraint intel_skl_pebs_event_constraints[];
 
+extern struct event_constraint intel_icl_pebs_event_constraints[];
+
 struct event_constraint *intel_pebs_constraints(struct perf_event *event);
 
 void intel_pmu_pebs_add(struct perf_event *event);
diff --git a/arch/x86/include/asm/intel_ds.h b/arch/x86/include/asm/intel_ds.h
index ae26df1c2789..8380c3ddd4b2 100644
--- a/arch/x86/include/asm/intel_ds.h
+++ b/arch/x86/include/asm/intel_ds.h
@@ -8,7 +8,7 @@
 
 /* The maximal number of PEBS events: */
 #define MAX_PEBS_EVENTS		8
-#define MAX_FIXED_PEBS_EVENTS	3
+#define MAX_FIXED_PEBS_EVENTS	4
 
 /*
  * A debug store configuration.
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index fd0ec2699213..dcb8bac6fddb 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -7,7 +7,7 @@
  */
 
 #define INTEL_PMC_MAX_GENERIC				       32
-#define INTEL_PMC_MAX_FIXED					3
+#define INTEL_PMC_MAX_FIXED					4
 #define INTEL_PMC_IDX_FIXED				       32
 
 #define X86_PMC_IDX_MAX					       64
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 08/23] perf/x86/intel/cstate: Add Icelake support
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (6 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 07/23] perf/x86/intel: Add Icelake support kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 09/23] perf/x86/intel/rapl: " kan.liang
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Icelake uses the same C-state residency events as Sandy Bridge.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/intel/cstate.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 94a4b7fc75d0..dd5658ec31d5 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -578,6 +578,8 @@ static const struct x86_cpu_id intel_cstates_match[] __initconst = {
 	X86_CSTATES_MODEL(INTEL_FAM6_ATOM_GOLDMONT_X, glm_cstates),
 
 	X86_CSTATES_MODEL(INTEL_FAM6_ATOM_GOLDMONT_PLUS, glm_cstates),
+
+	X86_CSTATES_MODEL(INTEL_FAM6_ICELAKE_MOBILE, snb_cstates),
 	{ },
 };
 MODULE_DEVICE_TABLE(x86cpu, intel_cstates_match);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 09/23] perf/x86/intel/rapl: Add Icelake support
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (7 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 08/23] perf/x86/intel/cstate: " kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 10/23] perf/x86/msr: " kan.liang
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Icelake supports the same RAPL counters as Skylake.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/intel/rapl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 94dc564146ca..37ebf6fc5415 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -775,6 +775,8 @@ static const struct x86_cpu_id rapl_cpu_match[] __initconst = {
 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_ATOM_GOLDMONT_X, hsw_rapl_init),
 
 	X86_RAPL_MODEL_MATCH(INTEL_FAM6_ATOM_GOLDMONT_PLUS, hsw_rapl_init),
+
+	X86_RAPL_MODEL_MATCH(INTEL_FAM6_ICELAKE_MOBILE,  skl_rapl_init),
 	{},
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 10/23] perf/x86/msr: Add Icelake support
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (8 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 09/23] perf/x86/intel/rapl: " kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 11/23] perf/x86/intel/uncore: Add Intel Icelake uncore support kan.liang
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

For MSR events, Icelake behaves the same as the existing Skylake parts.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/msr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c
index a878e6286e4a..f3f4c2263501 100644
--- a/arch/x86/events/msr.c
+++ b/arch/x86/events/msr.c
@@ -89,6 +89,7 @@ static bool test_intel(int idx)
 	case INTEL_FAM6_SKYLAKE_X:
 	case INTEL_FAM6_KABYLAKE_MOBILE:
 	case INTEL_FAM6_KABYLAKE_DESKTOP:
+	case INTEL_FAM6_ICELAKE_MOBILE:
 		if (idx == PERF_MSR_SMI || idx == PERF_MSR_PPERF)
 			return true;
 		break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 11/23] perf/x86/intel/uncore: Add Intel Icelake uncore support
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (9 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 10/23] perf/x86/msr: " kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 12/23] perf/core: Support a REMOVE transaction kan.liang
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Add Intel Icelake uncore support:
 - The init code is based on Skylake
 - Add a new PCI ID for the IMC
 - Add a new MSR address for the CBOX
 - Get the number of CBOXes from the CNL_UNC_CBO_CONFIG MSR directly
 - Create a new PMU for the fixed clocktick counter

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/intel/uncore.c     |  6 ++
 arch/x86/events/intel/uncore.h     |  1 +
 arch/x86/events/intel/uncore_snb.c | 91 ++++++++++++++++++++++++++++++
 3 files changed, 98 insertions(+)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 9fe64c01a2e5..fc40a1473058 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1367,6 +1367,11 @@ static const struct intel_uncore_init_fun skx_uncore_init __initconst = {
 	.pci_init = skx_uncore_pci_init,
 };
 
+static const struct intel_uncore_init_fun icl_uncore_init __initconst = {
+	.cpu_init = icl_uncore_cpu_init,
+	.pci_init = skl_uncore_pci_init,
+};
+
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
 	X86_UNCORE_MODEL_MATCH(INTEL_FAM6_NEHALEM_EP,	  nhm_uncore_init),
 	X86_UNCORE_MODEL_MATCH(INTEL_FAM6_NEHALEM,	  nhm_uncore_init),
@@ -1393,6 +1398,7 @@ static const struct x86_cpu_id intel_uncore_match[] __initconst = {
 	X86_UNCORE_MODEL_MATCH(INTEL_FAM6_SKYLAKE_X,      skx_uncore_init),
 	X86_UNCORE_MODEL_MATCH(INTEL_FAM6_KABYLAKE_MOBILE, skl_uncore_init),
 	X86_UNCORE_MODEL_MATCH(INTEL_FAM6_KABYLAKE_DESKTOP, skl_uncore_init),
+	X86_UNCORE_MODEL_MATCH(INTEL_FAM6_ICELAKE_MOBILE, icl_uncore_init),
 	{},
 };
 
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 853a49a8ccf6..79eb2e21e4f0 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -512,6 +512,7 @@ int skl_uncore_pci_init(void);
 void snb_uncore_cpu_init(void);
 void nhm_uncore_cpu_init(void);
 void skl_uncore_cpu_init(void);
+void icl_uncore_cpu_init(void);
 int snb_pci2phy_map_init(int devid);
 
 /* uncore_snbep.c */
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index 13493f43b247..f8431819b3e1 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -34,6 +34,8 @@
 #define PCI_DEVICE_ID_INTEL_CFL_4S_S_IMC	0x3e33
 #define PCI_DEVICE_ID_INTEL_CFL_6S_S_IMC	0x3eca
 #define PCI_DEVICE_ID_INTEL_CFL_8S_S_IMC	0x3e32
+#define PCI_DEVICE_ID_INTEL_ICL_U_IMC		0x8a02
+#define PCI_DEVICE_ID_INTEL_ICL_U2_IMC		0x8a12
 
 /* SNB event control */
 #define SNB_UNC_CTL_EV_SEL_MASK			0x000000ff
@@ -93,6 +95,12 @@
 #define SKL_UNC_PERF_GLOBAL_CTL			0xe01
 #define SKL_UNC_GLOBAL_CTL_CORE_ALL		((1 << 5) - 1)
 
+/* ICL Cbo register */
+#define ICL_UNC_CBO_CONFIG			0x396
+#define ICL_UNC_NUM_CBO_MASK			0xf
+#define ICL_UNC_CBO_0_PER_CTR0			0x702
+#define ICL_UNC_CBO_MSR_OFFSET			0x8
+
 DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
 DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
 DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
@@ -280,6 +288,70 @@ void skl_uncore_cpu_init(void)
 	snb_uncore_arb.ops = &skl_uncore_msr_ops;
 }
 
+static struct intel_uncore_type icl_uncore_cbox = {
+	.name		= "cbox",
+	.num_counters   = 4,
+	.perf_ctr_bits	= 44,
+	.perf_ctr	= ICL_UNC_CBO_0_PER_CTR0,
+	.event_ctl	= SNB_UNC_CBO_0_PERFEVTSEL0,
+	.event_mask	= SNB_UNC_RAW_EVENT_MASK,
+	.msr_offset	= ICL_UNC_CBO_MSR_OFFSET,
+	.ops		= &skl_uncore_msr_ops,
+	.format_group	= &snb_uncore_format_group,
+};
+
+static struct uncore_event_desc icl_uncore_events[] = {
+	INTEL_UNCORE_EVENT_DESC(clockticks, "event=0xff"),
+	{ /* end: all zeroes */ },
+};
+
+static struct attribute *icl_uncore_clock_formats_attr[] = {
+	&format_attr_event.attr,
+	NULL,
+};
+
+static struct attribute_group icl_uncore_clock_format_group = {
+	.name = "format",
+	.attrs = icl_uncore_clock_formats_attr,
+};
+
+static struct intel_uncore_type icl_uncore_clockbox = {
+	.name		= "clock",
+	.num_counters	= 1,
+	.num_boxes	= 1,
+	.fixed_ctr_bits	= 48,
+	.fixed_ctr	= SNB_UNC_FIXED_CTR,
+	.fixed_ctl	= SNB_UNC_FIXED_CTR_CTRL,
+	.single_fixed	= 1,
+	.event_mask	= SNB_UNC_CTL_EV_SEL_MASK,
+	.format_group	= &icl_uncore_clock_format_group,
+	.ops		= &skl_uncore_msr_ops,
+	.event_descs	= icl_uncore_events,
+};
+
+static struct intel_uncore_type *icl_msr_uncores[] = {
+	&icl_uncore_cbox,
+	&snb_uncore_arb,
+	&icl_uncore_clockbox,
+	NULL,
+};
+
+static int icl_get_cbox_num(void)
+{
+	u64 num_boxes;
+
+	rdmsrl(ICL_UNC_CBO_CONFIG, num_boxes);
+
+	return num_boxes & ICL_UNC_NUM_CBO_MASK;
+}
+
+void icl_uncore_cpu_init(void)
+{
+	uncore_msr_uncores = icl_msr_uncores;
+	icl_uncore_cbox.num_boxes = icl_get_cbox_num();
+	snb_uncore_arb.ops = &skl_uncore_msr_ops;
+}
+
 enum {
 	SNB_PCI_UNCORE_IMC,
 };
@@ -668,6 +740,18 @@ static const struct pci_device_id skl_uncore_pci_ids[] = {
 	{ /* end: all zeroes */ },
 };
 
+static const struct pci_device_id icl_uncore_pci_ids[] = {
+	{ /* IMC */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICL_U_IMC),
+		.driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
+	},
+	{ /* IMC */
+		PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICL_U2_IMC),
+		.driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
+	},
+	{ /* end: all zeroes */ },
+};
+
 static struct pci_driver snb_uncore_pci_driver = {
 	.name		= "snb_uncore",
 	.id_table	= snb_uncore_pci_ids,
@@ -693,6 +777,11 @@ static struct pci_driver skl_uncore_pci_driver = {
 	.id_table	= skl_uncore_pci_ids,
 };
 
+static struct pci_driver icl_uncore_pci_driver = {
+	.name		= "icl_uncore",
+	.id_table	= icl_uncore_pci_ids,
+};
+
 struct imc_uncore_pci_dev {
 	__u32 pci_id;
 	struct pci_driver *driver;
@@ -732,6 +821,8 @@ static const struct imc_uncore_pci_dev desktop_imc_pci_ids[] = {
 	IMC_DEV(CFL_4S_S_IMC, &skl_uncore_pci_driver),  /* 8th Gen Core S 4 Cores Server */
 	IMC_DEV(CFL_6S_S_IMC, &skl_uncore_pci_driver),  /* 8th Gen Core S 6 Cores Server */
 	IMC_DEV(CFL_8S_S_IMC, &skl_uncore_pci_driver),  /* 8th Gen Core S 8 Cores Server */
+	IMC_DEV(ICL_U_IMC, &icl_uncore_pci_driver),	/* 10th Gen Core Mobile */
+	IMC_DEV(ICL_U2_IMC, &icl_uncore_pci_driver),	/* 10th Gen Core Mobile */
 	{  /* end marker */ }
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 12/23] perf/core: Support a REMOVE transaction
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (10 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 11/23] perf/x86/intel/uncore: Add Intel Icelake uncore support kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 13/23] perf/x86/intel: Basic support for metrics counters kan.liang
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

The TopDown events can be collected per thread/process on Icelake. To
use TopDown through RDPMC in applications, the metrics and slots MSR
values have to be saved/restored during context switching.
It is useful to have a remove transaction when the counter is
unscheduled, so that the values can be saved correctly.
Add a remove transaction to the perf core.
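
As a minimal sketch of how an arch PMU might consume the new flag (the
foo_* names and per-CPU layout are hypothetical, not part of this
series):

    struct foo_cpu {                        /* hypothetical per-CPU PMU state */
            unsigned int txn_flags;
            u64 snapshot[8];
    };
    static DEFINE_PER_CPU(struct foo_cpu, foo_cpu_events);

    static void foo_snapshot_counters(struct foo_cpu *cpuc)
    {
            /* ... read and stash the hardware counter values ... */
    }

    static void foo_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags)
    {
            struct foo_cpu *cpuc = this_cpu_ptr(&foo_cpu_events);

            cpuc->txn_flags = txn_flags;
            if (txn_flags & PERF_PMU_TXN_REMOVE)
                    foo_snapshot_counters(cpuc);    /* save values before unscheduling */
    }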

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V1:
- Add more explanation in change log

 arch/x86/events/core.c     | 3 +--
 include/linux/perf_event.h | 1 +
 kernel/events/core.c       | 5 +++++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 9378c6b2128f..9c14b4b3e457 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1912,8 +1912,7 @@ static inline void x86_pmu_read(struct perf_event *event)
  * Set the flag to make pmu::enable() not perform the
  * schedulability test, it will be performed at commit time
  *
- * We only support PERF_PMU_TXN_ADD transactions. Save the
- * transaction flags but otherwise ignore non-PERF_PMU_TXN_ADD
+ * Save the transaction flags and ignore non-PERF_PMU_TXN_ADD
  * transactions.
  */
 static void x86_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e47ef764f613..fb258c171b2c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -233,6 +233,7 @@ struct perf_event;
  */
 #define PERF_PMU_TXN_ADD  0x1		/* txn to add/schedule event on PMU */
 #define PERF_PMU_TXN_READ 0x2		/* txn to read event group from PMU */
+#define PERF_PMU_TXN_REMOVE 0x4		/* txn to remove event on PMU */
 
 /**
  * pmu::capabilities flags
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1032a16bd186..dea8cfe2a891 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2032,6 +2032,7 @@ group_sched_out(struct perf_event *group_event,
 		struct perf_cpu_context *cpuctx,
 		struct perf_event_context *ctx)
 {
+	struct pmu *pmu = ctx->pmu;
 	struct perf_event *event;
 
 	if (group_event->state != PERF_EVENT_STATE_ACTIVE)
@@ -2039,6 +2040,8 @@ group_sched_out(struct perf_event *group_event,
 
 	perf_pmu_disable(ctx->pmu);
 
+	pmu->start_txn(pmu, PERF_PMU_TXN_REMOVE);
+
 	event_sched_out(group_event, cpuctx, ctx);
 
 	/*
@@ -2051,6 +2054,8 @@ group_sched_out(struct perf_event *group_event,
 
 	if (group_event->attr.exclusive)
 		cpuctx->exclusive = 0;
+
+	pmu->commit_txn(pmu);
 }
 
 #define DETACH_GROUP	0x01UL
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 13/23] perf/x86/intel: Basic support for metrics counters
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (11 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 12/23] perf/core: Support a REMOVE transaction kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 14/23] perf/x86/intel: Support overflows on SLOTS kan.liang
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Metrics counters (hardware counters containing multiple metrics)
are modelled as separate registers for each sub-event, with an
extra reg being used for coordinating access to the underlying
register in the scheduler.

This patch adds the basic infrastructure to separate the scheduler
register indexes from the actual hardware register indexes. In
most cases the MSR address is already used correctly, but for
code using indexes we need a separate reg_idx field in the event
to indicate the correct underlying register.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/core.c            | 18 ++++++++++++++++--
 arch/x86/events/intel/core.c      | 29 ++++++++++++++++++++---------
 arch/x86/events/perf_event.h      | 15 +++++++++++++++
 arch/x86/include/asm/msr-index.h  |  1 +
 arch/x86/include/asm/perf_event.h | 30 ++++++++++++++++++++++++++++++
 include/linux/perf_event.h        |  1 +
 6 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 9c14b4b3e457..d24f8d009529 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1006,16 +1006,30 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 	struct hw_perf_event *hwc = &event->hw;
 
 	hwc->idx = cpuc->assign[i];
+	hwc->reg_idx = hwc->idx;
 	hwc->last_cpu = smp_processor_id();
 	hwc->last_tag = ++cpuc->tags[i];
 
+	/*
+	 * Metrics counters use different indexes in the scheduler
+	 * versus the hardware.
+	 *
+	 * Map metrics to fixed counter 3 (which is the base count),
+	 * but the update event callback reads the extra metric register
+	 * and converts to the right metric.
+	 */
+	if (is_metric_idx(hwc->idx))
+		hwc->reg_idx = INTEL_PMC_IDX_FIXED_SLOTS;
+
 	if (hwc->idx == INTEL_PMC_IDX_FIXED_BTS) {
 		hwc->config_base = 0;
 		hwc->event_base	= 0;
 	} else if (hwc->idx >= INTEL_PMC_IDX_FIXED) {
 		hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
-		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 + (hwc->idx - INTEL_PMC_IDX_FIXED);
-		hwc->event_base_rdpmc = (hwc->idx - INTEL_PMC_IDX_FIXED) | 1<<30;
+		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 +
+			(hwc->reg_idx - INTEL_PMC_IDX_FIXED);
+		hwc->event_base_rdpmc = (hwc->reg_idx - INTEL_PMC_IDX_FIXED)
+			| 1<<30;
 	} else {
 		hwc->config_base = x86_pmu_config_addr(hwc->idx);
 		hwc->event_base  = x86_pmu_event_addr(hwc->idx);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ef95d73ef4f0..5c8f0df137bc 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2090,7 +2090,7 @@ static inline void intel_pmu_ack_status(u64 ack)
 
 static void intel_pmu_disable_fixed(struct hw_perf_event *hwc)
 {
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
+	int idx = hwc->reg_idx - INTEL_PMC_IDX_FIXED;
 	u64 ctrl_val, mask;
 
 	mask = 0xfULL << (idx * 4);
@@ -2116,9 +2116,19 @@ static void intel_pmu_disable_event(struct perf_event *event)
 		return;
 	}
 
-	cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx);
-	cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx);
-	cpuc->intel_cp_status &= ~(1ull << hwc->idx);
+	__clear_bit(hwc->idx, cpuc->enabled_events);
+
+	/*
+	 * If any other event sharing the SLOTS counter is still enabled,
+	 * skip disabling the shared counter.
+	 */
+	if (is_any_slots_idx(hwc->idx) &&
+	    (*(u64 *)&cpuc->enabled_events & INTEL_PMC_MSK_ANY_SLOTS))
+		return;
+
+	cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->reg_idx);
+	cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->reg_idx);
+	cpuc->intel_cp_status &= ~(1ull << hwc->reg_idx);
 
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_disable(event);
@@ -2150,7 +2160,7 @@ static void intel_pmu_read_event(struct perf_event *event)
 static void intel_pmu_enable_fixed(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
+	int idx = hwc->reg_idx - INTEL_PMC_IDX_FIXED;
 	u64 ctrl_val, mask, bits = 0;
 
 	/*
@@ -2194,18 +2204,19 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	}
 
 	if (event->attr.exclude_host)
-		cpuc->intel_ctrl_guest_mask |= (1ull << hwc->idx);
+		cpuc->intel_ctrl_guest_mask |= (1ull << hwc->reg_idx);
 	if (event->attr.exclude_guest)
-		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
+		cpuc->intel_ctrl_host_mask |= (1ull << hwc->reg_idx);
 
 	if (unlikely(event_is_checkpointed(event)))
-		cpuc->intel_cp_status |= (1ull << hwc->idx);
+		cpuc->intel_cp_status |= (1ull << hwc->reg_idx);
 
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_enable(event);
 
 	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
-		intel_pmu_enable_fixed(event);
+		if (!__test_and_set_bit(hwc->idx, cpuc->enabled_events))
+			intel_pmu_enable_fixed(event);
 		return;
 	}
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 58774373f410..75a75400aae7 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -185,6 +185,7 @@ struct cpu_hw_events {
 	unsigned long		active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	unsigned long		running[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	int			enabled;
+	unsigned long		enabled_events[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 
 	int			n_events; /* the # of events in the below arrays */
 	int			n_added;  /* the # last events in the below arrays;
@@ -344,6 +345,20 @@ struct cpu_hw_events {
 #define FIXED_EVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, (1ULL << (32+n)), FIXED_EVENT_FLAGS)
 
+/*
+ * Special metric counters do not actually exist, but get remapped
+ * to a combination of FxCtr3 + MSR_PERF_METRICS
+ *
+ * This allocates them to a dummy offset for the scheduler.
+ * This does not allow sharing of multiple users of the same
+ * metric without multiplexing, even though the hardware supports that
+ * in principle.
+ */
+
+#define METRIC_EVENT_CONSTRAINT(c, n)					\
+	EVENT_CONSTRAINT(c, (1ULL << (INTEL_PMC_IDX_FIXED_METRIC_BASE+n)), \
+			 FIXED_EVENT_FLAGS)
+
 /*
  * Constraint on the Event code + UMask
  */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1378518cf63f..4310477d6808 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -777,6 +777,7 @@
 #define MSR_CORE_PERF_FIXED_CTR0	0x00000309
 #define MSR_CORE_PERF_FIXED_CTR1	0x0000030a
 #define MSR_CORE_PERF_FIXED_CTR2	0x0000030b
+#define MSR_CORE_PERF_FIXED_CTR3	0x0000030c
 #define MSR_CORE_PERF_FIXED_CTR_CTRL	0x0000038d
 #define MSR_CORE_PERF_GLOBAL_STATUS	0x0000038e
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index dcb8bac6fddb..4a59d35a8e1d 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -166,6 +166,10 @@ struct x86_pmu_capability {
 #define INTEL_PMC_IDX_FIXED_REF_CYCLES	(INTEL_PMC_IDX_FIXED + 2)
 #define INTEL_PMC_MSK_FIXED_REF_CYCLES	(1ULL << INTEL_PMC_IDX_FIXED_REF_CYCLES)
 
+#define MSR_ARCH_PERFMON_FIXED_CTR3	0x30c
+#define INTEL_PMC_IDX_FIXED_SLOTS	(INTEL_PMC_IDX_FIXED + 3)
+#define INTEL_PMC_MSK_FIXED_SLOTS	(1ULL << INTEL_PMC_IDX_FIXED_SLOTS)
+
 /*
  * We model BTS tracing as another fixed-mode PMC.
  *
@@ -175,6 +179,32 @@ struct x86_pmu_capability {
  */
 #define INTEL_PMC_IDX_FIXED_BTS				(INTEL_PMC_IDX_FIXED + 16)
 
+/*
+ * We model PERF_METRICS as more magic fixed-mode PMCs, one for each metric
+ * and another for the whole slots counter
+ *
+ * Internally they all map to Fixed Ctr 3 (SLOTS), and allocate PERF_METRICS
+ * as an extra_reg. PERF_METRICS has no own configuration, but we fill in
+ * the configuration of FxCtr3 to enforce that all the shared users of SLOTS
+ * have the same configuration.
+ */
+#define INTEL_PMC_IDX_FIXED_METRIC_BASE		(INTEL_PMC_IDX_FIXED + 17)
+#define INTEL_PMC_IDX_TD_RETIRING		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 0)
+#define INTEL_PMC_IDX_TD_BAD_SPEC		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 1)
+#define INTEL_PMC_IDX_TD_FE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 2)
+#define INTEL_PMC_IDX_TD_BE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 3)
+#define INTEL_PMC_MSK_ANY_SLOTS			((0xfull << INTEL_PMC_IDX_FIXED_METRIC_BASE) | \
+						 INTEL_PMC_MSK_FIXED_SLOTS)
+static inline bool is_metric_idx(int idx)
+{
+	return idx >= INTEL_PMC_IDX_FIXED_METRIC_BASE && idx <= INTEL_PMC_IDX_TD_BE_BOUND;
+}
+
+static inline bool is_any_slots_idx(int idx)
+{
+	return is_metric_idx(idx) || idx == INTEL_PMC_IDX_FIXED_SLOTS;
+}
+
 #define GLOBAL_STATUS_COND_CHG				BIT_ULL(63)
 #define GLOBAL_STATUS_BUFFER_OVF			BIT_ULL(62)
 #define GLOBAL_STATUS_UNC_OVF				BIT_ULL(61)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index fb258c171b2c..c0cbd714f10e 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -127,6 +127,7 @@ struct hw_perf_event {
 			unsigned long	event_base;
 			int		event_base_rdpmc;
 			int		idx;
+			int		reg_idx;
 			int		last_cpu;
 			int		flags;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 14/23] perf/x86/intel: Support overflows on SLOTS
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (12 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 13/23] perf/x86/intel: Basic support for metrics counters kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 15/23] perf/x86/intel: Support hardware TopDown metrics kan.liang
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

The internal counters used for the metrics can overflow. If this happens,
an overflow is triggered on the SLOTS fixed counter. Add special code
that resets all the slave metric counters in this case.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/intel/core.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5c8f0df137bc..2da822414627 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2231,12 +2231,35 @@ static void intel_pmu_add_event(struct perf_event *event)
 		intel_pmu_lbr_add(event);
 }
 
+/* When SLOTS overflowed update all the active topdown-* events */
+static void intel_pmu_update_metrics(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int idx;
+	u64 slots_events;
+
+	slots_events = *(u64 *)cpuc->enabled_events & INTEL_PMC_MSK_ANY_SLOTS;
+
+	for_each_set_bit(idx, (unsigned long *)&slots_events, 64) {
+		struct perf_event *ev = cpuc->events[idx];
+
+		if (ev == event)
+			continue;
+		x86_perf_event_update(ev);
+	}
+}
+
 /*
  * Save and restart an expired event. Called by NMI contexts,
  * so it has to be careful about preempting normal event ops:
  */
 int intel_pmu_save_and_restart(struct perf_event *event)
 {
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (unlikely(hwc->reg_idx == INTEL_PMC_IDX_FIXED_SLOTS))
+		intel_pmu_update_metrics(event);
+
 	x86_perf_event_update(event);
 	/*
 	 * For a checkpointed counter always reset back to 0.  This
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 15/23] perf/x86/intel: Support hardware TopDown metrics
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (13 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 14/23] perf/x86/intel: Support overflows on SLOTS kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 16/23] perf/x86/intel: Set correct weight for topdown subevent counters kan.liang
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Intro
=====

Icelake has support for measuring the four top level TopDown metrics
directly in hardware. This is implemented by an additional "metrics"
register, and a new Fixed Counter 3 that measures pipeline "slots".

Events
======

We export four metric events as separate perf events, which map to the
internal "metrics" counter register. Those events do not exist in
hardware, but can be allocated by the scheduler.

For the event mapping we use a special 0xff event code, which is
reserved for software.

When setting up such events they point to the slots counter, and a
special callback, metric_update_event(), reads the additional metrics
MSR to generate the metrics. The metric is then reported by multiplying
the metric (a percentage) with the slots count.

This multiplication makes it easy to keep a running count, for example
when the slots counter overflows, and makes all the standard tools, such
as perf stat, work. They can take deltas of the values without needing
to know about percentages. This also simplifies accumulating the counts
of child events, which would otherwise need to know how to average
percentage values.
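
As a rough worked example (numbers are made up): if SLOTS counted
1000000 for a period and PERF_METRICS reports a retiring byte of 0x40
(roughly 25% of 0xff), the exported topdown-retiring delta is about
250000 slots. The scaling is essentially:

    /* sketch of the scaling only; the kernel code below uses 16-bit fixed point */
    static u64 metric_delta(u64 slots, u8 metric_byte)
    {
            return (slots * metric_byte) / 0xff;    /* fraction of slots -> event count */
    }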

Groups
======

To avoid reading the METRICS register multiple times, the metrics and
slots values are cached. This only works when multiple sub-events are in
the same group.

Resetting1
==========

The 8bit metrics ratio values lose precision when the measurement period
gets longer.

To avoid this we always reset the metric value when reading, as we
already accumulate the count in the perf count value.

For a long period read, low precision is acceptable.
For a short period read, the register will be reset often enough that it
is not a problem.

This implies that a group must always be used to read more than one
sub-metric, so that the caching above still gives the correct value.

We also need to support this in the NMI, so that it's possible to
collect all top down metrics as part of leader sampling. To avoid races
with the normal transactions use a special nmi_metric cache that is only
used during the NMI.

Resetting2
==========

PERF_METRICS may report a wrong value if its delta is less than 1/255
of SLOTS (fixed counter 3).

To avoid this, the PERF_METRICS and SLOTS registers have to be reset
simultaneously. The slots value has to be cached as well.

In counting mode, -max_period is normally the initial value of SLOTS.
Such a huge initial value would immediately trigger the issue mentioned
above, so force the initial value to 0 for topdown and slots event
counting.

RDPMC
=========
The TopDown events can be collected per thread/process. To use TopDown
through RDPMC in applications on Icelake, the metrics and slots values
have to be saved/restored during context switching.

Add a specific set_period() callback to handle the slots and metrics
events, because:
 - The initial value must be 0.
 - The value only needs to be restored on context switch. In all other
   cases the counters have already been cleared by the read.
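
For reference, a user-space sketch of reading the two registers with
RDPMC (the 1 << 29 metrics index matches what this patch exports via
x86_pmu_event_idx(); the fixed-counter encoding follows the existing
(1 << 30) | index convention; everything else here, including that rdpmc
is enabled for the process, is an assumption):

    #include <stdint.h>

    static inline uint64_t rdpmc(uint32_t idx)
    {
            uint32_t lo, hi;

            asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (idx));
            return ((uint64_t)hi << 32) | lo;
    }

    static void read_topdown(uint64_t *slots, uint64_t *metrics)
    {
            *slots   = rdpmc((1u << 30) | 3);       /* fixed counter 3: SLOTS */
            *metrics = rdpmc(1u << 29);             /* PERF_METRICS */
    }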

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/core.c           |  40 +++++--
 arch/x86/events/intel/core.c     | 192 +++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h     |  14 +++
 arch/x86/include/asm/msr-index.h |   2 +
 include/linux/perf_event.h       |   5 +
 5 files changed, 242 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index d24f8d009529..7d4d56f76436 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -91,16 +91,20 @@ u64 x86_perf_event_update(struct perf_event *event)
 					new_raw_count) != prev_raw_count)
 		goto again;
 
-	/*
-	 * Now we have the new raw value and have updated the prev
-	 * timestamp already. We can now calculate the elapsed delta
-	 * (event-)time and add that to the generic event.
-	 *
-	 * Careful, not all hw sign-extends above the physical width
-	 * of the count.
-	 */
-	delta = (new_raw_count << shift) - (prev_raw_count << shift);
-	delta >>= shift;
+	if (unlikely(hwc->flags & PERF_X86_EVENT_UPDATE))
+		delta = x86_pmu.metric_update_event(event, new_raw_count);
+	else {
+		/*
+		 * Now we have the new raw value and have updated the prev
+		 * timestamp already. We can now calculate the elapsed delta
+		 * (event-)time and add that to the generic event.
+		 *
+		 * Careful, not all hw sign-extends above the physical width
+		 * of the count.
+		 */
+		delta = (new_raw_count << shift) - (prev_raw_count << shift);
+		delta >>= shift;
+	}
 
 	local64_add(delta, &event->count);
 	local64_sub(delta, &hwc->period_left);
@@ -974,6 +978,10 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
 
 	max_count = x86_pmu.num_counters + x86_pmu.num_counters_fixed;
 
+	/* There are 4 TopDown metrics */
+	if (x86_pmu.has_metric)
+		max_count += 4;
+
 	/* current number of events already accepted */
 	n = cpuc->n_events;
 
@@ -1157,6 +1165,9 @@ int x86_perf_event_set_period(struct perf_event *event)
 	if (idx == INTEL_PMC_IDX_FIXED_BTS)
 		return 0;
 
+	if (x86_pmu.set_period && x86_pmu.set_period(event))
+		goto update_userpage;
+
 	/*
 	 * If we are way outside a reasonable range then just skip forward:
 	 */
@@ -1205,6 +1216,7 @@ int x86_perf_event_set_period(struct perf_event *event)
 			(u64)(-left) & x86_pmu.cntval_mask);
 	}
 
+update_userpage:
 	perf_event_update_userpage(event);
 
 	return ret;
@@ -1957,6 +1969,8 @@ static void x86_pmu_cancel_txn(struct pmu *pmu)
 
 	txn_flags = cpuc->txn_flags;
 	cpuc->txn_flags = 0;
+	cpuc->txn_metric = 0;
+	cpuc->txn_slots = 0;
 	if (txn_flags & ~PERF_PMU_TXN_ADD)
 		return;
 
@@ -1984,6 +1998,8 @@ static int x86_pmu_commit_txn(struct pmu *pmu)
 
 	WARN_ON_ONCE(!cpuc->txn_flags);	/* no txn in flight */
 
+	cpuc->txn_metric = 0;
+	cpuc->txn_slots = 0;
 	if (cpuc->txn_flags & ~PERF_PMU_TXN_ADD) {
 		cpuc->txn_flags = 0;
 		return 0;
@@ -2198,7 +2214,9 @@ static int x86_pmu_event_idx(struct perf_event *event)
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return 0;
 
-	if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
+	if (is_metric_idx(idx))
+		idx = 1 << 29;
+	else if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
 		idx -= INTEL_PMC_IDX_FIXED;
 		idx |= 1 << 30;
 	}
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2da822414627..6a8a221dc188 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -245,6 +245,11 @@ static struct event_constraint intel_icl_event_constraints[] = {
 	FIXED_EVENT_CONSTRAINT(0x003c, 1),	/* CPU_CLK_UNHALTED.CORE */
 	FIXED_EVENT_CONSTRAINT(0x0300, 2),	/* CPU_CLK_UNHALTED.REF */
 	FIXED_EVENT_CONSTRAINT(0x0400, 3),	/* SLOTS */
+	METRIC_EVENT_CONSTRAINT(0x01ff, 0),	/* Retiring metric */
+	METRIC_EVENT_CONSTRAINT(0x02ff, 1),	/* Bad speculation metric */
+	METRIC_EVENT_CONSTRAINT(0x03ff, 2),	/* FE bound metric */
+	METRIC_EVENT_CONSTRAINT(0x04ff, 3),	/* BE bound metric */
+
 	INTEL_EVENT_CONSTRAINT_RANGE(0x03, 0x0a, 0xf),
 	INTEL_EVENT_CONSTRAINT_RANGE(0x1f, 0x28, 0xf),
 	INTEL_EVENT_CONSTRAINT(0x32, 0xf),	/* SW_PREFETCH_ACCESS.* */
@@ -265,6 +270,14 @@ static struct extra_reg intel_icl_extra_regs[] __read_mostly = {
 	INTEL_UEVENT_EXTRA_REG(0x01bb, MSR_OFFCORE_RSP_1, 0x3fffff9fffull, RSP_1),
 	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
 	INTEL_UEVENT_EXTRA_REG(0x01c6, MSR_PEBS_FRONTEND, 0x7fff17, FE),
+	/*
+	 * PERF_METRICS does exist, but it is not configured. But we
+	 * share the original Fixed Ctr 3 from different metrics
+	 * events. So use the extra reg to enforce the same
+	 * configuration on the original register, but do not actually
+	 * write to it.
+	 */
+	INTEL_EVENT_EXTRA_REG(0xff, 0, -1L, PERF_METRICS),
 	EVENT_EXTRA_END
 };
 
@@ -2432,6 +2445,8 @@ static int intel_pmu_handle_irq_v4(struct pt_regs *regs)
 	int pmu_enabled = cpuc->enabled;
 	int loops = 0;
 
+	cpuc->nmi_metric = 0;
+	cpuc->nmi_slots = 0;
 	/* PMU has been disabled because of counter freezing */
 	cpuc->enabled = 0;
 	if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
@@ -2505,6 +2520,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	int pmu_enabled;
 
 	cpuc = this_cpu_ptr(&cpu_hw_events);
+	cpuc->nmi_metric = 0;
+	cpuc->nmi_slots = 0;
 
 	/*
 	 * Save the PMU state.
@@ -3236,6 +3253,13 @@ static int core_pmu_hw_config(struct perf_event *event)
 	return intel_pmu_bts_config(event);
 }
 
+#define EVENT_CODE(e)	(e->attr.config & INTEL_ARCH_EVENT_MASK)
+#define is_slots_event(e)	(EVENT_CODE(e) == 0x0400)
+#define is_perf_metrics_event(e)				\
+		(((EVENT_CODE(e) & 0xff) == 0xff) &&		\
+		 (EVENT_CODE(e) >= 0x01ff) &&			\
+		 (EVENT_CODE(e) <= 0x04ff))
+
 static int intel_pmu_hw_config(struct perf_event *event)
 {
 	int ret = x86_pmu_hw_config(event);
@@ -3281,6 +3305,31 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
+	/* Fixed Counter 3 with its metric sub events */
+	if (x86_pmu.has_metric &&
+	    (is_slots_event(event) || is_perf_metrics_event(event))) {
+		if (event->attr.config1 != 0)
+			return -EINVAL;
+		if (event->attr.config & ARCH_PERFMON_EVENTSEL_ANY)
+			return -EINVAL;
+		/*
+		 * Put configuration (minus event) into config1 so that
+		 * the scheduler enforces through an extra_reg that
+		 * all instances of the metrics events have the same
+		 * configuration.
+		 */
+		event->attr.config1 = event->hw.config & X86_ALL_EVENT_FLAGS;
+		if (is_perf_metrics_event(event)) {
+			if (!x86_pmu.intel_cap.perf_metrics)
+				return -EINVAL;
+			if (is_sampling_event(event))
+				return -EINVAL;
+		}
+		if (!is_sampling_event(event))
+			event->hw.flags |= PERF_X86_EVENT_UPDATE;
+		return 0;
+	}
+
 	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
 		return 0;
 
@@ -4141,6 +4190,141 @@ static __init void intel_ht_bug(void)
 	x86_pmu.stop_scheduling = intel_stop_scheduling;
 }
 
+/*
+ * Update metric event with the PERF_METRICS register and return the delta
+ *
+ * Metric events are defined as SLOTS * metric. The original
+ * metric can be reconstructed by taking SUM(all-metrics)/metric
+ * (or SLOTS/metric)
+ *
+ * There are some limits for reading and resetting the PERF_METRICS register.
+ * - The PERF_METRICS and SLOTS registers have to be reset simultaneously.
+ * - To get the high precision, the measurement period has to be short.
+ *   The PERF_METRICS and SLOTS will be reset for each reading.
+ *   The only exception is context switch. The PERF_METRICS and SLOTS have to
+ *   be saved/restored during context switch for RDPMC users.
+ *
+ * The PERF_METRICS doesn't have the absolute value. To calculate the delta
+ * of metric events,
+ *   delta = new_SLOTS * new_METRICS - last_SLOTS * last_METRICS;
+ * Only need to save the last SLOTS and METRICS for context switch. For other
+ * cases, the registers have been reset. The last_* values are 0.
+ *
+ * To avoid reading the METRICS register multiple times, the metrics and slots
+ * value are cached. This only works when multiple sub-events are in the same
+ * group.
+ */
+static u64 icl_metric_update_event(struct perf_event *event, u64 val)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	u64 newval, metric, slots_val = 0, new, last;
+	bool nmi = in_nmi();
+	int txn_flags = nmi ? 0 : cpuc->txn_flags;
+
+	/*
+	 * Use cached value for transaction.
+	 */
+	newval = 0;
+	if (txn_flags) {
+		newval = cpuc->txn_metric;
+		slots_val = cpuc->txn_slots;
+	} else if (nmi) {
+		newval = cpuc->nmi_metric;
+		slots_val = cpuc->nmi_slots;
+	}
+
+	if (!newval) {
+		slots_val = val;
+
+		rdpmcl((1<<29), newval);
+		if (txn_flags) {
+			cpuc->txn_metric = newval;
+			cpuc->txn_slots = slots_val;
+		} else if (nmi) {
+			cpuc->nmi_metric = newval;
+			cpuc->nmi_slots = slots_val;
+		}
+
+		if (!(txn_flags & PERF_PMU_TXN_REMOVE)) {
+			/* Reset the metric value when reading
+			 * The SLOTS register must be reset when PERF_METRICS is reset,
+			 * otherwise PERF_METRICS may have wrong output.
+			 */
+			wrmsrl(MSR_PERF_METRICS, 0);
+			wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+			hwc->saved_metric = 0;
+			hwc->saved_slots = 0;
+		} else {
+			/* saved metric and slots for context switch */
+			hwc->saved_metric = newval;
+			hwc->saved_slots = val;
+
+		}
+		/* cache the last metric and slots */
+		cpuc->last_metric = hwc->last_metric;
+		cpuc->last_slots = hwc->last_slots;
+		hwc->last_metric = 0;
+		hwc->last_slots = 0;
+	}
+
+	/* The METRICS and SLOTS have been reset when reading */
+	if (!(txn_flags & PERF_PMU_TXN_REMOVE))
+		local64_set(&hwc->prev_count, 0);
+
+	if (is_slots_event(event))
+		return (slots_val - cpuc->last_slots);
+
+	/*
+	 * The metric is reported as an 8bit integer percentage
+	 * summing up to 0xff. As the counter is less than 64 bits
+	 * we can use the unused bits to get the needed precision.
+	 * Use 16bit fixed point arithmetic for
+	 * slots-in-metric = (MetricPct / 0xff) * val
+	 * This works fine for up to 48bit counters, but will
+	 * lose precision above that.
+	 */
+
+	metric = (cpuc->last_metric >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
+	last = (((metric * 0xffff) >> 8) * cpuc->last_slots) >> 16;
+
+	metric = (newval >> ((hwc->idx - INTEL_PMC_IDX_FIXED_METRIC_BASE)*8)) & 0xff;
+	new = (((metric * 0xffff) >> 8) * slots_val) >> 16;
+
+	return (new - last);
+}
+
+/*
+ * Update counters for metrics and slots events
+ */
+static int icl_set_period(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	s64 left = local64_read(&hwc->period_left);
+
+	if (!(hwc->flags & PERF_X86_EVENT_UPDATE))
+		return 0;
+
+	/* The initial value must be 0 */
+	if ((left == x86_pmu.max_period) && !hwc->saved_slots) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+		wrmsrl(MSR_PERF_METRICS, 0);
+	}
+
+	/* restore metric and slots for context switch */
+	if (hwc->saved_slots) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3,  hwc->saved_slots);
+		wrmsrl(MSR_PERF_METRICS, hwc->saved_metric);
+		hwc->last_slots = hwc->saved_slots;
+		hwc->last_metric = hwc->saved_metric;
+		hwc->saved_slots = 0;
+		hwc->saved_metric = 0;
+
+	}
+
+	return 1;
+}
+
 EVENT_ATTR_STR(mem-loads,	mem_ld_hsw,	"event=0xcd,umask=0x1,ldlat=3");
 EVENT_ATTR_STR(mem-stores,	mem_st_hsw,	"event=0xd0,umask=0x82")
 
@@ -4839,6 +5023,9 @@ __init int intel_pmu_init(void)
 		x86_pmu.lbr_pt_coexist = true;
 		x86_pmu.has_xmm_regs = true;
 		intel_pmu_pebs_data_source_skl(false);
+		x86_pmu.has_metric = x86_pmu.intel_cap.perf_metrics;
+		x86_pmu.metric_update_event = icl_metric_update_event;
+		x86_pmu.set_period = icl_set_period;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
@@ -4953,6 +5140,11 @@ __init int intel_pmu_init(void)
 	if (x86_pmu.counter_freezing)
 		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
 
+	if (x86_pmu.has_metric) {
+		x86_pmu.intel_ctrl |= 1ULL << 48;
+		pr_cont("TopDown, ");
+	}
+
 	kfree(to_free);
 	return 0;
 }
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 75a75400aae7..06f1fe553ac1 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -40,6 +40,7 @@ enum extra_reg_type {
 	EXTRA_REG_LBR   = 2,	/* lbr_select */
 	EXTRA_REG_LDLAT = 3,	/* ld_lat_threshold */
 	EXTRA_REG_FE    = 4,    /* fe_* */
+	EXTRA_REG_PERF_METRICS = 5, /* perf metrics */
 
 	EXTRA_REG_MAX		/* number of entries needed */
 };
@@ -71,6 +72,7 @@ struct event_constraint {
 #define PERF_X86_EVENT_EXCL_ACCT	0x0200 /* accounted EXCL event */
 #define PERF_X86_EVENT_AUTO_RELOAD	0x0400 /* use PEBS auto-reload */
 #define PERF_X86_EVENT_LARGE_PEBS	0x0800 /* use large PEBS */
+#define PERF_X86_EVENT_UPDATE		0x1000 /* call update_event after read */
 
 static inline bool constraint_match(struct event_constraint *c, u64 ecode)
 {
@@ -237,6 +239,13 @@ struct cpu_hw_events {
 	u64				intel_ctrl_host_mask;
 	struct perf_guest_switch_msr	guest_switch_msrs[X86_PMC_IDX_MAX];
 
+	unsigned long			txn_metric;
+	unsigned long			txn_slots;
+	unsigned long			nmi_metric;
+	unsigned long			nmi_slots;
+	unsigned long			last_metric;
+	unsigned long			last_slots;
+
 	/*
 	 * Intel checkpoint mask
 	 */
@@ -521,6 +530,7 @@ union perf_capabilities {
 		 */
 		u64	full_width_write:1;
 		u64     pebs_baseline:1;
+		u64	perf_metrics:1;
 	};
 	u64	capabilities;
 };
@@ -584,6 +594,7 @@ struct x86_pmu {
 	int		(*addr_offset)(int index, bool eventsel);
 	int		(*rdpmc_index)(int index);
 	u64		(*event_map)(int);
+	u64		(*metric_update_event)(struct perf_event *event, u64 val);
 	int		max_events;
 	int		num_counters;
 	int		num_counters_fixed;
@@ -614,6 +625,7 @@ struct x86_pmu {
 	struct x86_pmu_quirk *quirks;
 	int		perfctr_second_write;
 	u64		(*limit_period)(struct perf_event *event, u64 l);
+	int		(*set_period)(struct perf_event *event);
 
 	/* PMI handler bits */
 	unsigned int	late_ack		:1,
@@ -702,6 +714,8 @@ struct x86_pmu {
 	struct extra_reg *extra_regs;
 	unsigned int flags;
 
+	bool has_metric;
+
 	/*
 	 * Intel host/guest support (KVM)
 	 */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 4310477d6808..bcdd6fadf225 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -102,6 +102,8 @@
 #define MSR_TURBO_RATIO_LIMIT1		0x000001ae
 #define MSR_TURBO_RATIO_LIMIT2		0x000001af
 
+#define MSR_PERF_METRICS		0x00000329
+
 #define MSR_LBR_SELECT			0x000001c8
 #define MSR_LBR_TOS			0x000001c9
 #define MSR_LBR_NHM_FROM		0x00000680
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c0cbd714f10e..2a1405e907ec 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -133,6 +133,11 @@ struct hw_perf_event {
 
 			struct hw_perf_event_extra extra_reg;
 			struct hw_perf_event_extra branch_reg;
+
+			u64		saved_metric;
+			u64		saved_slots;
+			u64		last_slots;
+			u64		last_metric;
 		};
 		struct { /* software */
 			struct hrtimer	hrtimer;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 16/23] perf/x86/intel: Set correct weight for topdown subevent counters
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (14 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 15/23] perf/x86/intel: Support hardware TopDown metrics kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 17/23] perf/x86/intel: Export new top down events for Icelake kan.liang
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

The topdown sub-event counters are mapped to a fixed counter, but should
have the normal weight for the scheduler, so special-case this.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/intel/core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 6a8a221dc188..31e4e283e7c5 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5081,6 +5081,15 @@ __init int intel_pmu_init(void)
 		 * counter, so do not extend mask to generic counters
 		 */
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
+			/*
+			 * Don't limit the event mask for topdown sub event
+			 * counters.
+			 */
+			if (x86_pmu.num_counters_fixed >= 3 &&
+			    c->idxmsk64 & INTEL_PMC_MSK_ANY_SLOTS) {
+				c->weight = hweight64(c->idxmsk64);
+				continue;
+			}
 			if (c->cmask == FIXED_EVENT_FLAGS
 			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) {
 				c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 17/23] perf/x86/intel: Export new top down events for Icelake
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (15 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 16/23] perf/x86/intel: Set correct weight for topdown subevent counters kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 18/23] perf/x86/intel: Disable sampling read slots and topdown kan.liang
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Export new topdown events for perf that map to the sub-metrics in the
metrics register, and another event for the new slots fixed counter.
This makes the new fixed counters on Icelake visible to the perf user
tools.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/intel/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 31e4e283e7c5..b08e361fc718 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -320,6 +320,12 @@ EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
 EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
 	"4", "2");
 
+EVENT_ATTR_STR(slots,			slots,		"event=0x00,umask=0x4");
+EVENT_ATTR_STR(topdown-retiring,	td_retiring,	"event=0xff,umask=0x1");
+EVENT_ATTR_STR(topdown-bad-spec,	td_bad_spec,	"event=0xff,umask=0x2");
+EVENT_ATTR_STR(topdown-fe-bound,	td_fe_bound,	"event=0xff,umask=0x3");
+EVENT_ATTR_STR(topdown-be-bound,	td_be_bound,	"event=0xff,umask=0x4");
+
 static struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(td_slots_issued),
 	EVENT_PTR(td_slots_retired),
@@ -4383,6 +4389,11 @@ EVENT_ATTR_STR(el-capacity-write, el_capacity_write, "event=0x54,umask=0x2");
 static struct attribute *icl_events_attrs[] = {
 	EVENT_PTR(mem_ld_hsw),
 	EVENT_PTR(mem_st_hsw),
+	EVENT_PTR(slots),
+	EVENT_PTR(td_retiring),
+	EVENT_PTR(td_bad_spec),
+	EVENT_PTR(td_fe_bound),
+	EVENT_PTR(td_be_bound),
 	NULL,
 };
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 18/23] perf/x86/intel: Disable sampling read slots and topdown
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (16 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 17/23] perf/x86/intel: Export new top down events for Icelake kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:56 ` [PATCH V2 19/23] perf/x86/intel: Support CPUID 10.ECX to disable fixed counters kan.liang
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

To get a correct PERF_METRICS value, fixed counter 3 (SLOTS) must start
counting from 0. This causes problems for sampling reads on grouped slots
and topdown events. For example,
        perf record -e '{slots, topdown-retiring}:S'
The slots event would not overflow as expected if it has to start from 0.

Add a validate_group() callback to reject this case and error out on
Icelake.
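
Note that, going by icl_validate_group() below, only groups combining a
sampling SLOTS event with the metrics events are rejected; a plain
counting group such as

	perf stat -e '{slots,topdown-retiring}'

is expected to still pass validation.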

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/core.c       |  2 ++
 arch/x86/events/intel/core.c | 20 ++++++++++++++++++++
 arch/x86/events/perf_event.h |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 7d4d56f76436..b9bee53e53d8 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2122,6 +2122,8 @@ static int validate_group(struct perf_event *event)
 
 	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
 
+	if (x86_pmu.validate_group)
+		ret = x86_pmu.validate_group(fake_cpuc, n);
 out:
 	free_fake_cpuc(fake_cpuc);
 	return ret;
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b08e361fc718..ef6045544628 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4331,6 +4331,25 @@ static int icl_set_period(struct perf_event *event)
 	return 1;
 }
 
+static int icl_validate_group(struct cpu_hw_events *cpuc, int n)
+{
+	bool has_sampling_slots = false, has_metrics = false;
+	struct perf_event *e;
+	int i;
+
+	for (i = 0; i < n; i++) {
+		e = cpuc->event_list[i];
+		if (is_slots_event(e) && is_sampling_event(e))
+			has_sampling_slots = true;
+
+		if (is_perf_metrics_event(e))
+			has_metrics = true;
+	}
+	if (unlikely(has_sampling_slots && has_metrics))
+		return -EINVAL;
+	return 0;
+}
+
 EVENT_ATTR_STR(mem-loads,	mem_ld_hsw,	"event=0xcd,umask=0x1,ldlat=3");
 EVENT_ATTR_STR(mem-stores,	mem_st_hsw,	"event=0xd0,umask=0x82")
 
@@ -5037,6 +5056,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.has_metric = x86_pmu.intel_cap.perf_metrics;
 		x86_pmu.metric_update_event = icl_metric_update_event;
 		x86_pmu.set_period = icl_set_period;
+		x86_pmu.validate_group = icl_validate_group;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 06f1fe553ac1..bd0dbbc2adeb 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -627,6 +627,8 @@ struct x86_pmu {
 	u64		(*limit_period)(struct perf_event *event, u64 l);
 	int		(*set_period)(struct perf_event *event);
 
+	int		(*validate_group)(struct cpu_hw_events *cpuc, int n);
+
 	/* PMI handler bits */
 	unsigned int	late_ack		:1,
 			counter_freezing	:1;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 19/23] perf/x86/intel: Support CPUID 10.ECX to disable fixed counters
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (17 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 18/23] perf/x86/intel: Disable sampling read slots and topdown kan.liang
@ 2019-03-21 20:56 ` kan.liang
  2019-03-21 20:57 ` [PATCH V2 20/23] perf, tools: Add support for recording and printing XMM registers kan.liang
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:56 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Icelake supports a new CPUID 10.ECX CPU leaf that indicates which fixed
counters are not supported.  This extends the previous count to a bitmap,
which also allows disabling lower-numbered counters.

It's a nop on Icelake itself (all fixed counters are supported), but
implement it here anyway and add the necessary checks. In theory it
could be used today by a hypervisor.

For disabled counters, disable any constrained events. Reuse the
existing intel_ctrl variable to remember which counters are disabled.
All code that iterates over the counters is fixed up to check this
extra bitmask.
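
A user-space sketch of how such a bitmap could be interpreted (an
illustration only; it assumes GCC's <cpuid.h> __get_cpuid() helper,
that CPUID.0AH:EDX[4:0] holds the fixed counter count, and that a set
bit i in ECX marks fixed counter i as available, with ECX == 0 meaning
all counters below the count are available):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;
	unsigned int i, nr_fixed;

	if (!__get_cpuid(10, &eax, &ebx, &ecx, &edx))
		return 1;

	nr_fixed = edx & 0x1f;		/* number of fixed counters */
	if (!ecx)			/* no bitmap: all of them usable */
		ecx = (1u << nr_fixed) - 1;

	for (i = 0; i < nr_fixed; i++)
		printf("fixed counter %u: %s\n", i,
		       (ecx & (1u << i)) ? "available" : "disabled");
	return 0;
}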

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1.

 arch/x86/events/core.c       |  8 +++++++-
 arch/x86/events/intel/core.c | 22 +++++++++++++++-------
 arch/x86/events/perf_event.h |  6 ++++++
 3 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index b9bee53e53d8..12d7d591843e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -225,6 +225,8 @@ static bool check_hw_exists(void)
 		if (ret)
 			goto msr_fail;
 		for (i = 0; i < x86_pmu.num_counters_fixed; i++) {
+			if (fixed_counter_disabled(i))
+				continue;
 			if (val & (0x03 << i*4)) {
 				bios_fail = 1;
 				val_fail = val;
@@ -1372,6 +1374,8 @@ void perf_event_print_debug(void)
 			cpu, idx, prev_left);
 	}
 	for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
+		if (fixed_counter_disabled(idx))
+			continue;
 		rdmsrl(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, pmc_count);
 
 		pr_info("CPU#%d: fixed-PMC%d count: %016llx\n",
@@ -1887,7 +1891,9 @@ static int __init init_hw_perf_events(void)
 	pr_info("... generic registers:      %d\n",     x86_pmu.num_counters);
 	pr_info("... value mask:             %016Lx\n", x86_pmu.cntval_mask);
 	pr_info("... max period:             %016Lx\n", x86_pmu.max_period);
-	pr_info("... fixed-purpose events:   %d\n",     x86_pmu.num_counters_fixed);
+	pr_info("... fixed-purpose events:   %lu\n",
+			hweight64((((1ULL << x86_pmu.num_counters_fixed) - 1)
+					<< INTEL_PMC_IDX_FIXED) & x86_pmu.intel_ctrl));
 	pr_info("... event mask:             %016Lx\n", x86_pmu.intel_ctrl);
 
 	/*
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ef6045544628..a4b7711ef0ee 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2311,8 +2311,11 @@ static void intel_pmu_reset(void)
 		wrmsrl_safe(x86_pmu_config_addr(idx), 0ull);
 		wrmsrl_safe(x86_pmu_event_addr(idx),  0ull);
 	}
-	for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++)
+	for (idx = 0; idx < x86_pmu.num_counters_fixed; idx++) {
+		if (fixed_counter_disabled(idx))
+			continue;
 		wrmsrl_safe(MSR_ARCH_PERFMON_FIXED_CTR0 + idx, 0ull);
+	}
 
 	if (ds)
 		ds->bts_index = ds->bts_buffer_base;
@@ -4551,7 +4554,7 @@ __init int intel_pmu_init(void)
 	union cpuid10_eax eax;
 	union cpuid10_ebx ebx;
 	struct event_constraint *c;
-	unsigned int unused;
+	unsigned int fixed_mask;
 	struct extra_reg *er;
 	int version, i;
 	char *name;
@@ -4572,9 +4575,11 @@ __init int intel_pmu_init(void)
 	 * Check whether the Architectural PerfMon supports
 	 * Branch Misses Retired hw_event or not.
 	 */
-	cpuid(10, &eax.full, &ebx.full, &unused, &edx.full);
+	cpuid(10, &eax.full, &ebx.full, &fixed_mask, &edx.full);
 	if (eax.split.mask_length < ARCH_PERFMON_EVENTS_COUNT)
 		return -ENODEV;
+	if (!fixed_mask)
+		fixed_mask = -1;
 
 	version = eax.split.version_id;
 	if (version < 2)
@@ -5104,7 +5109,8 @@ __init int intel_pmu_init(void)
 	}
 
 	x86_pmu.intel_ctrl |=
-		((1LL << x86_pmu.num_counters_fixed)-1) << INTEL_PMC_IDX_FIXED;
+		(((1LL << x86_pmu.num_counters_fixed)-1) & (u64)fixed_mask)
+			<< INTEL_PMC_IDX_FIXED;
 
 	if (x86_pmu.event_constraints) {
 		/*
@@ -5121,9 +5127,11 @@ __init int intel_pmu_init(void)
 				c->weight = hweight64(c->idxmsk64);
 				continue;
 			}
-			if (c->cmask == FIXED_EVENT_FLAGS
-			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) {
-				c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
+			if (c->cmask == FIXED_EVENT_FLAGS)  {
+				if (c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES)
+					c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
+				/* Disable fixed counters which are not in CPUID */
+				c->idxmsk64 &= x86_pmu.intel_ctrl;
 			}
 			c->idxmsk64 &=
 				~(~0ULL << (INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed));
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index bd0dbbc2adeb..c1e1b5407449 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -926,6 +926,12 @@ ssize_t events_sysfs_show(struct device *dev, struct device_attribute *attr,
 ssize_t events_ht_sysfs_show(struct device *dev, struct device_attribute *attr,
 			  char *page);
 
+static inline bool fixed_counter_disabled(int i)
+{
+	return x86_pmu.intel_ctrl &&
+		((1ULL << (i + INTEL_PMC_IDX_FIXED)) & x86_pmu.intel_ctrl);
+}
+
 #ifdef CONFIG_CPU_SUP_AMD
 
 int amd_pmu_init(void);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 20/23] perf, tools: Add support for recording and printing XMM registers
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (18 preceding siblings ...)
  2019-03-21 20:56 ` [PATCH V2 19/23] perf/x86/intel: Support CPUID 10.ECX to disable fixed counters kan.liang
@ 2019-03-21 20:57 ` kan.liang
  2019-03-21 20:57 ` [PATCH V2 21/23] perf, tools, stat: Support new per thread TopDown metrics kan.liang
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:57 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Newer kernel code can collect XMM registers in some cases.
Add support for perf script to dump them, and support
in the register parser for perf record -I ... to configure them.
For now they are just printed in hex; other formats could
potentially be added later.
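
A consumer-side sketch of what the two-bits-per-register encoding
implies (an assumption for illustration: each 128-bit XMMn value is
delivered as two consecutive u64 sample words, low half first, which
is why the PERF_REG_X86_XMMn indices step by two and SMPL_REG2 sets
two mask bits):

#include <inttypes.h>
#include <stdio.h>

/* regs points at the two u64 halves sampled for XMMn */
static void print_xmm(int n, const uint64_t *regs)
{
	printf("XMM%-2d 0x%016" PRIx64 "%016" PRIx64 "\n",
	       n, regs[1], regs[0]);
}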

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V1:
- Make perf_regs.h consistent between kernel and user space

 tools/arch/x86/include/uapi/asm/perf_regs.h | 26 ++++++++++++++++--
 tools/perf/arch/x86/include/perf_regs.h     | 29 ++++++++++++++++++---
 tools/perf/arch/x86/util/perf_regs.c        | 16 ++++++++++++
 tools/perf/util/perf_regs.h                 |  1 +
 4 files changed, 66 insertions(+), 6 deletions(-)

diff --git a/tools/arch/x86/include/uapi/asm/perf_regs.h b/tools/arch/x86/include/uapi/asm/perf_regs.h
index f3329cabce5c..b33995313d17 100644
--- a/tools/arch/x86/include/uapi/asm/perf_regs.h
+++ b/tools/arch/x86/include/uapi/asm/perf_regs.h
@@ -28,7 +28,29 @@ enum perf_event_x86_regs {
 	PERF_REG_X86_R14,
 	PERF_REG_X86_R15,
 
-	PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
-	PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
+	/* These all need two bits set because they are 128bit */
+	PERF_REG_X86_XMM0  = 32,
+	PERF_REG_X86_XMM1  = 34,
+	PERF_REG_X86_XMM2  = 36,
+	PERF_REG_X86_XMM3  = 38,
+	PERF_REG_X86_XMM4  = 40,
+	PERF_REG_X86_XMM5  = 42,
+	PERF_REG_X86_XMM6  = 44,
+	PERF_REG_X86_XMM7  = 46,
+	PERF_REG_X86_XMM8  = 48,
+	PERF_REG_X86_XMM9  = 50,
+	PERF_REG_X86_XMM10 = 52,
+	PERF_REG_X86_XMM11 = 54,
+	PERF_REG_X86_XMM12 = 56,
+	PERF_REG_X86_XMM13 = 58,
+	PERF_REG_X86_XMM14 = 60,
+	PERF_REG_X86_XMM15 = 62,
+
+	/* This does not include the XMMX registers */
+	PERF_REG_GPR_X86_32_MAX = PERF_REG_X86_GS + 1,
+	PERF_REG_GPR_X86_64_MAX = PERF_REG_X86_R15 + 1,
+
+	/* All registers include the XMMX registers */
+	PERF_REG_X86_MAX = PERF_REG_X86_XMM15 + 2,
 };
 #endif /* _ASM_X86_PERF_REGS_H */
diff --git a/tools/perf/arch/x86/include/perf_regs.h b/tools/perf/arch/x86/include/perf_regs.h
index 7f6d538f8a89..023484e9aebd 100644
--- a/tools/perf/arch/x86/include/perf_regs.h
+++ b/tools/perf/arch/x86/include/perf_regs.h
@@ -8,17 +8,16 @@
 
 void perf_regs_load(u64 *regs);
 
+#define PERF_REGS_MAX PERF_REG_X86_MAX
 #ifndef HAVE_ARCH_X86_64_SUPPORT
-#define PERF_REGS_MASK ((1ULL << PERF_REG_X86_32_MAX) - 1)
-#define PERF_REGS_MAX PERF_REG_X86_32_MAX
+#define PERF_REGS_MASK ((1ULL << PERF_REG_GPR_X86_32_MAX) - 1)
 #define PERF_SAMPLE_REGS_ABI PERF_SAMPLE_REGS_ABI_32
 #else
 #define REG_NOSUPPORT ((1ULL << PERF_REG_X86_DS) | \
 		       (1ULL << PERF_REG_X86_ES) | \
 		       (1ULL << PERF_REG_X86_FS) | \
 		       (1ULL << PERF_REG_X86_GS))
-#define PERF_REGS_MASK (((1ULL << PERF_REG_X86_64_MAX) - 1) & ~REG_NOSUPPORT)
-#define PERF_REGS_MAX PERF_REG_X86_64_MAX
+#define PERF_REGS_MASK (((1ULL << PERF_REG_GPR_X86_64_MAX) - 1) & ~REG_NOSUPPORT)
 #define PERF_SAMPLE_REGS_ABI PERF_SAMPLE_REGS_ABI_64
 #endif
 #define PERF_REG_IP PERF_REG_X86_IP
@@ -77,6 +76,28 @@ static inline const char *perf_reg_name(int id)
 	case PERF_REG_X86_R15:
 		return "R15";
 #endif /* HAVE_ARCH_X86_64_SUPPORT */
+
+#define XMM(x) \
+	case PERF_REG_X86_XMM ## x:	\
+	case PERF_REG_X86_XMM ## x + 1:	\
+		return "XMM" #x;
+	XMM(0)
+	XMM(1)
+	XMM(2)
+	XMM(3)
+	XMM(4)
+	XMM(5)
+	XMM(6)
+	XMM(7)
+	XMM(8)
+	XMM(9)
+	XMM(10)
+	XMM(11)
+	XMM(12)
+	XMM(13)
+	XMM(14)
+	XMM(15)
+#undef XMM
 	default:
 		return NULL;
 	}
diff --git a/tools/perf/arch/x86/util/perf_regs.c b/tools/perf/arch/x86/util/perf_regs.c
index fead6b3b4206..71d7604dbf0b 100644
--- a/tools/perf/arch/x86/util/perf_regs.c
+++ b/tools/perf/arch/x86/util/perf_regs.c
@@ -31,6 +31,22 @@ const struct sample_reg sample_reg_masks[] = {
 	SMPL_REG(R14, PERF_REG_X86_R14),
 	SMPL_REG(R15, PERF_REG_X86_R15),
 #endif
+	SMPL_REG2(XMM0, PERF_REG_X86_XMM0),
+	SMPL_REG2(XMM1, PERF_REG_X86_XMM1),
+	SMPL_REG2(XMM2, PERF_REG_X86_XMM2),
+	SMPL_REG2(XMM3, PERF_REG_X86_XMM3),
+	SMPL_REG2(XMM4, PERF_REG_X86_XMM4),
+	SMPL_REG2(XMM5, PERF_REG_X86_XMM5),
+	SMPL_REG2(XMM6, PERF_REG_X86_XMM6),
+	SMPL_REG2(XMM7, PERF_REG_X86_XMM7),
+	SMPL_REG2(XMM8, PERF_REG_X86_XMM8),
+	SMPL_REG2(XMM9, PERF_REG_X86_XMM9),
+	SMPL_REG2(XMM10, PERF_REG_X86_XMM10),
+	SMPL_REG2(XMM11, PERF_REG_X86_XMM11),
+	SMPL_REG2(XMM12, PERF_REG_X86_XMM12),
+	SMPL_REG2(XMM13, PERF_REG_X86_XMM13),
+	SMPL_REG2(XMM14, PERF_REG_X86_XMM14),
+	SMPL_REG2(XMM15, PERF_REG_X86_XMM15),
 	SMPL_REG_END
 };
 
diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
index c9319f8d17a6..1a15a4bfc28d 100644
--- a/tools/perf/util/perf_regs.h
+++ b/tools/perf/util/perf_regs.h
@@ -12,6 +12,7 @@ struct sample_reg {
 	uint64_t mask;
 };
 #define SMPL_REG(n, b) { .name = #n, .mask = 1ULL << (b) }
+#define SMPL_REG2(n, b) { .name = #n, .mask = 3ULL << (b) }
 #define SMPL_REG_END { .name = NULL }
 
 extern const struct sample_reg sample_reg_masks[];
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 21/23] perf, tools, stat: Support new per thread TopDown metrics
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (19 preceding siblings ...)
  2019-03-21 20:57 ` [PATCH V2 20/23] perf, tools: Add support for recording and printing XMM registers kan.liang
@ 2019-03-21 20:57 ` kan.liang
  2019-03-21 20:57 ` [PATCH V2 22/23] perf, tools: Add documentation for topdown metrics kan.liang
  2019-03-21 20:57 ` [PATCH V2 23/23] perf vendor events intel: Add JSON files for Icelake kan.liang
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:57 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Icelake has support for reporting per thread TopDown metrics.
These are reported differently from the previous TopDown support:
each metric is standalone, but scaled to pipeline "slots".
We don't need to do anything special for HyperThreading anymore.
Teach perf stat --topdown to handle these new metrics and
print them in the same way as the previous TopDown metrics.
The restriction of only being able to report information per core is
gone.
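
With this, a per process/thread measurement along the lines of

	perf stat --topdown -I 1000 -- ./workload

should work without -a or --per-core on Icelake; the -I interval is
still recommended, see the documentation patch later in this series.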

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1

 tools/perf/Documentation/perf-stat.txt |  9 ++-
 tools/perf/builtin-stat.c              | 24 +++++++
 tools/perf/util/stat-shadow.c          | 89 ++++++++++++++++++++++++++
 tools/perf/util/stat.c                 |  4 ++
 tools/perf/util/stat.h                 |  8 +++
 5 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 4bc2085e5197..f4469751ef0a 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -266,8 +266,13 @@ if the workload is actually bound by the CPU and not by something else.
 For best results it is usually a good idea to use it with interval
 mode like -I 1000, as the bottleneck of workloads can change often.
 
-The top down metrics are collected per core instead of per
-CPU thread. Per core mode is automatically enabled
+This enables --metric-only, unless overridden with --no-metric-only.
+
+The following restrictions only apply to older Intel CPUs and Atom;
+on newer CPUs (IceLake and later) TopDown can be collected for any thread:
+
+The top down metrics are collected per core instead of per CPU thread.
+Per core mode is automatically enabled
 and -a (global monitoring) is needed, requiring root rights or
 perf.perf_event_paranoid=-1.
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7b8f09b0b8bf..5068396c241b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -123,6 +123,14 @@ static const char * topdown_attrs[] = {
 	NULL,
 };
 
+static const char *topdown_metric_attrs[] = {
+	"topdown-retiring",
+	"topdown-bad-spec",
+	"topdown-fe-bound",
+	"topdown-be-bound",
+	NULL,
+};
+
 static const char *smi_cost_attrs = {
 	"{"
 	"msr/aperf/,"
@@ -1215,6 +1223,21 @@ static int add_default_attributes(void)
 		char *str = NULL;
 		bool warn = false;
 
+		if (topdown_filter_events(topdown_metric_attrs, &str, 1) < 0) {
+			pr_err("Out of memory\n");
+			return -1;
+		}
+		if (topdown_metric_attrs[0] && str) {
+			if (!stat_config.interval) {
+				fprintf(stat_config.output,
+					"Topdown accuracy may decrease when measuring over a long period.\n"
+					"Please print the result regularly, e.g. -I1000\n");
+			}
+			goto setup_metrics;
+		}
+
+		str = NULL;
+
 		if (stat_config.aggr_mode != AGGR_GLOBAL &&
 		    stat_config.aggr_mode != AGGR_CORE) {
 			pr_err("top down event configuration requires --per-core mode\n");
@@ -1236,6 +1259,7 @@ static int add_default_attributes(void)
 		if (topdown_attrs[0] && str) {
 			if (warn)
 				arch_topdown_group_warn();
+setup_metrics:
 			err = parse_events(evsel_list, str, &errinfo);
 			if (err) {
 				fprintf(stderr,
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 83d8094be4fe..6b6ffd64a4c4 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -238,6 +238,18 @@ void perf_stat__update_shadow_stats(struct perf_evsel *counter, u64 count,
 	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
 		update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES,
 				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_RETIRING))
+		update_runtime_stat(st, STAT_TOPDOWN_RETIRING,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BAD_SPEC))
+		update_runtime_stat(st, STAT_TOPDOWN_BAD_SPEC,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_FE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_FE_BOUND,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND,
+				    ctx, cpu, count);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
 		update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT,
 				    ctx, cpu, count);
@@ -682,6 +694,47 @@ static double td_be_bound(int ctx, int cpu, struct runtime_stat *st)
 	return sanitize_val(1.0 - sum);
 }
 
+/*
+ * Kernel reports metrics multiplied with slots. To get back
+ * the ratios we need to recreate the sum.
+ */
+
+static double td_metric_ratio(int ctx, int cpu,
+			      enum stat_type type,
+			      struct runtime_stat *stat)
+{
+	double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu);
+	double d = runtime_stat_avg(stat, type, ctx, cpu);
+
+	if (sum)
+		return d / sum;
+	return 0;
+}
+
+/*
+ * ... but only if most of the values are actually available.
+ * We allow two missing.
+ */
+
+static bool full_td(int ctx, int cpu,
+		    struct runtime_stat *stat)
+{
+	int c = 0;
+
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu) > 0)
+		c++;
+	return c >= 2;
+}
+
 static void print_smi_cost(struct perf_stat_config *config,
 			   int cpu, struct perf_evsel *evsel,
 			   struct perf_stat_output_ctx *out,
@@ -970,6 +1023,42 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 					be_bound * 100.);
 		else
 			print_metric(config, ctxp, NULL, NULL, name, 0);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RETIRING) &&
+			full_td(ctx, cpu, st)) {
+		double retiring = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_RETIRING, st);
+
+		if (retiring > 0.7)
+			color = PERF_COLOR_GREEN;
+		print_metric(config, ctxp, color, "%8.1f%%", "retiring",
+				retiring * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double fe_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_FE_BOUND, st);
+
+		if (fe_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "frontend bound",
+				fe_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double be_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BE_BOUND, st);
+
+		if (be_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "backend bound",
+				be_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BAD_SPEC) &&
+			full_td(ctx, cpu, st)) {
+		double bad_spec = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BAD_SPEC, st);
+
+		if (bad_spec > 0.1)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "bad speculation",
+				bad_spec * 100.);
 	} else if (evsel->metric_expr) {
 		generic_metric(config, evsel->metric_expr, evsel->metric_events, evsel->name,
 				evsel->metric_name, avg, cpu, out, st);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 4d40515307b8..56f86752a14e 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -87,6 +87,10 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
 	ID(TOPDOWN_SLOTS_RETIRED, topdown-slots-retired),
 	ID(TOPDOWN_FETCH_BUBBLES, topdown-fetch-bubbles),
 	ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
+	ID(TOPDOWN_RETIRING, topdown-retiring),
+	ID(TOPDOWN_BAD_SPEC, topdown-bad-spec),
+	ID(TOPDOWN_FE_BOUND, topdown-fe-bound),
+	ID(TOPDOWN_BE_BOUND, topdown-be-bound),
 	ID(SMI_NUM, msr/smi/),
 	ID(APERF, msr/aperf/),
 };
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 2f9c9159a364..827c560ad19f 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -29,6 +29,10 @@ enum perf_stat_evsel_id {
 	PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_RETIRED,
 	PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_BUBBLES,
 	PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
+	PERF_STAT_EVSEL_ID__TOPDOWN_RETIRING,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BAD_SPEC,
+	PERF_STAT_EVSEL_ID__TOPDOWN_FE_BOUND,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BE_BOUND,
 	PERF_STAT_EVSEL_ID__SMI_NUM,
 	PERF_STAT_EVSEL_ID__APERF,
 	PERF_STAT_EVSEL_ID__MAX,
@@ -81,6 +85,10 @@ enum stat_type {
 	STAT_TOPDOWN_SLOTS_RETIRED,
 	STAT_TOPDOWN_FETCH_BUBBLES,
 	STAT_TOPDOWN_RECOVERY_BUBBLES,
+	STAT_TOPDOWN_RETIRING,
+	STAT_TOPDOWN_BAD_SPEC,
+	STAT_TOPDOWN_FE_BOUND,
+	STAT_TOPDOWN_BE_BOUND,
 	STAT_SMI_NUM,
 	STAT_APERF,
 	STAT_MAX
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 22/23] perf, tools: Add documentation for topdown metrics
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (20 preceding siblings ...)
  2019-03-21 20:57 ` [PATCH V2 21/23] perf, tools, stat: Support new per thread TopDown metrics kan.liang
@ 2019-03-21 20:57 ` kan.liang
  2019-03-21 20:57 ` [PATCH V2 23/23] perf vendor events intel: Add JSON files for Icelake kan.liang
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:57 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Add some documentation on how to use the topdown metrics in ring 3.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1

 tools/perf/Documentation/topdown.txt | 223 +++++++++++++++++++++++++++
 1 file changed, 223 insertions(+)
 create mode 100644 tools/perf/Documentation/topdown.txt

diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
new file mode 100644
index 000000000000..167393225641
--- /dev/null
+++ b/tools/perf/Documentation/topdown.txt
@@ -0,0 +1,223 @@
+Using TopDown metrics in user space
+-----------------------------------
+
+Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
+methodology to break down CPU pipeline execution into 4 bottlenecks:
+frontend bound, backend bound, bad speculation, retiring.
+
+For more details on Topdown see [1][5]
+
+Traditionally this was implemented by events in generic counters
+and specific formulas to compute the bottlenecks.
+
+perf stat --topdown implements this.
+
+% perf stat -a --topdown -I1000
+#           time             counts unit events
+     1.000373951      8,460,978,609      topdown-retiring          #     22.9% retiring
+     1.000373951      3,445,383,303      topdown-bad-spec          #      9.3% bad speculation
+     1.000373951     15,886,483,355      topdown-fe-bound          #     43.0% frontend bound
+     1.000373951      9,163,488,720      topdown-be-bound          #     24.8% backend bound
+     2.000782154      8,477,925,431      topdown-retiring          #     22.9% retiring
+     2.000782154      3,459,151,256      topdown-bad-spec          #      9.3% bad speculation
+     2.000782154     15,947,224,725      topdown-fe-bound          #     43.0% frontend bound
+     2.000782154      9,145,551,695      topdown-be-bound          #     24.7% backend bound
+     3.001155967      8,487,323,125      topdown-retiring          #     22.9% retiring
+     3.001155967      3,451,808,066      topdown-bad-spec          #      9.3% bad speculation
+     3.001155967     15,959,068,902      topdown-fe-bound          #     43.0% frontend bound
+     3.001155967      9,172,479,784      topdown-be-bound          #     24.7% backend bound
+...
+
+Full Top Down includes more levels that can break down the
+bottlenecks further. This is not directly implemented in perf,
+but available in other tools that can run on top of perf,
+such as toplev[2] or vtune[3]
+
+New Topdown features in Icelake
+===============================
+
+With Icelake (2018 Core) CPUs the TopDown metrics are directly available as
+fixed counters and do not require generic counters. This allows
+TopDown to always be collected in addition to other events.
+
+This also enables measuring TopDown per thread/process instead
+of only per core.
+
+Using TopDown through RDPMC in applications on Icelake
+======================================================
+
+For more fine grained measurements it can be useful to
+access the new counters directly from user space. This is more complicated,
+but drastically lowers overhead.
+
+On Icelake, there is a new fixed counter 3: SLOTS, which reports
+"pipeline SLOTS" (cycles multiplied by core issue width) and a
+metric register that reports slots ratios for the different bottleneck
+categories.
+
+The metrics counter is CPU model specific and is not available
+on older CPUs.
+
+Example code
+============
+
+Library functions implementing the functionality described below
+are also available in libjevents [4].
+
+The application opens a perf_event file descriptor
+and sets up fixed counter 3 (SLOTS) to start and
+allow user programs to read the performance counters.
+
+Fixed counter 3 is mapped to a pseudo event with event=0x00, umask=0x4,
+so the perf_event_attr structure should be initialized with
+{ .config = 0x0400, .type = PERF_TYPE_RAW }
+
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+/* Provide own perf_event_open stub because glibc doesn't */
+__attribute__((weak))
+int perf_event_open(struct perf_event_attr *attr, pid_t pid,
+		    int cpu, int group_fd, unsigned long flags)
+{
+	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+}
+
+/* open slots counter file descriptor for current task */
+struct perf_event_attr slots = {
+	.type = PERF_TYPE_RAW,
+	.size = sizeof(struct perf_event_attr),
+	.config = 0x400,
+	.exclude_kernel = 1,
+};
+
+int fd = perf_event_open(&slots, 0, -1, -1, 0);
+if (fd < 0)
+	... error ...
+
+The RDPMC instruction (or _rdpmc compiler intrinsic) can now be used
+to read slots and the topdown metrics at different points of the program:
+
+#include <stdint.h>
+#include <x86intrin.h>
+
+#define RDPMC_FIXED	(1 << 30)	/* return fixed counters */
+#define RDPMC_METRIC	(1 << 29)	/* return metric counters */
+
+#define FIXED_COUNTER_SLOTS		3
+#define METRIC_COUNTER_TOPDOWN_L1	0
+
+static inline uint64_t read_slots(void)
+{
+	return _rdpmc(RDPMC_FIXED | FIXED_COUNTER_SLOTS);
+}
+
+static inline uint64_t read_metrics(void)
+{
+	return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1);
+}
+
+Then the program can be instrumented to read these metrics at different
+points.
+
+It's not a good idea to do this with too short code regions,
+as the parallelism and overlap in the CPU program execution will
+cause too much measurement inaccuracy. For example instrumenting
+individual basic blocks is definitely too fine grained.
+
+Decoding metrics values
+=======================
+
+The value reported by read_metrics() contains four 8-bit fields,
+each representing a scaled ratio for one Level 1 bottleneck.
+All four fields add up to 0xff (= 100%).
+
+The binary ratios in the metric value can be converted to float ratios:
+
+#define GET_METRIC(m, i) (((m) >> (i*8)) & 0xff)
+
+#define TOPDOWN_RETIRING(val)	((float)GET_METRIC(val, 0) / 0xff)
+#define TOPDOWN_BAD_SPEC(val)	((float)GET_METRIC(val, 1) / 0xff)
+#define TOPDOWN_FE_BOUND(val)	((float)GET_METRIC(val, 2) / 0xff)
+#define TOPDOWN_BE_BOUND(val)	((float)GET_METRIC(val, 3) / 0xff)
+
+and then converted to percent for printing.
+
+The ratios in the metric accumulate for the time when the counter
+is enabled. For measuring programs it is often useful to measure
+specific sections. For this it is necessary to compute deltas of the metrics.
+
+This can be done by scaling the metrics with the slots counter
+read at the same time.
+
+Then it's possible to take deltas of these slots counts
+measured at different points, and determine the metrics
+for that time period.
+
+	slots_a = read_slots();
+	metric_a = read_metrics();
+
+	... larger code region ...
+
+	slots_b = read_slots()
+	metric_b = read_metrics()
+
+	# compute scaled metrics for measurement a
+	retiring_slots_a = GET_METRIC(metric_a, 0) * slots_a
+	bad_spec_slots_a = GET_METRIC(metric_a, 1) * slots_a
+	fe_bound_slots_a = GET_METRIC(metric_a, 2) * slots_a
+	be_bound_slots_a = GET_METRIC(metric_a, 3) * slots_a
+
+	# compute delta scaled metrics between b and a
+	retiring_slots = GET_METRIC(metric_b, 0) * slots_b - retiring_slots_a
+	bad_spec_slots = GET_METRIC(metric_b, 1) * slots_b - bad_spec_slots_a
+	fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a
+	be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a
+
+Later the individual ratios for the measurement period can be recreated
+from these counts.
+
+	slots_delta = slots_b - slots_a
+	retiring_ratio = (float)retiring_slots / slots_delta
+	bad_spec_ratio = (float)bad_spec_slots / slots_delta
+	fe_bound_ratio = (float)fe_bound_slots / slots_delta
+	be_bound_ratio = (float)be_bound_slots / slots_delta
+
+	printf("Retiring %.2f%% Bad Speculation %.2f%% FE Bound %.2f%% BE Bound %.2f%%\n",
+		retiring_ratio * 100.,
+		bad_spec_ratio * 100.,
+		fe_bound_ratio * 100.,
+		be_bound_ratio * 100.);
+
+Resetting metrics counters
+==========================
+
+Since the individual metrics are only 8bit they lose precision for
+short regions over time because the number of cycles covered by each
+fraction bit shrinks. So the counters need to be reset regularly.
+
+When using the kernel perf API the kernel resets on every read.
+So as long as the reading is at reasonable intervals (every few
+seconds) the precision is good.
+
+When using perf stat it is recommended to always use the -I option,
+with an interval no longer than a few seconds:
+
+	perf stat -I 1000 --topdown ...
+
+For user programs using RDPMC directly the counter can
+be reset explicitly using ioctl:
+
+	ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);
+
+This "opens" a new measurement period.
+
+A program using RDPMC for TopDown should schedule such a reset
+regularly, as in every few seconds.
+
+[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
+[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
+[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[4] https://github.com/andikleen/pmu-tools/tree/master/jevents
+[5] https://sites.google.com/site/analysismethods/yasin-pubs
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH V2 23/23] perf vendor events intel: Add JSON files for Icelake
  2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
                   ` (21 preceding siblings ...)
  2019-03-21 20:57 ` [PATCH V2 22/23] perf, tools: Add documentation for topdown metrics kan.liang
@ 2019-03-21 20:57 ` kan.liang
  22 siblings, 0 replies; 30+ messages in thread
From: kan.liang @ 2019-03-21 20:57 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Add V1 event list for Icelake.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V1

 .../pmu-events/arch/x86/icelake/cache.json    | 552 +++++++++++
 .../arch/x86/icelake/floating-point.json      |  90 ++
 .../pmu-events/arch/x86/icelake/frontend.json | 424 +++++++++
 .../pmu-events/arch/x86/icelake/memory.json   | 410 ++++++++
 .../pmu-events/arch/x86/icelake/other.json    | 133 +++
 .../pmu-events/arch/x86/icelake/pipeline.json | 892 ++++++++++++++++++
 .../arch/x86/icelake/virtual-memory.json      | 236 +++++
 tools/perf/pmu-events/arch/x86/mapfile.csv    |   1 +
 8 files changed, 2738 insertions(+)
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/cache.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/floating-point.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/frontend.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/memory.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/other.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/pipeline.json
 create mode 100644 tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json

diff --git a/tools/perf/pmu-events/arch/x86/icelake/cache.json b/tools/perf/pmu-events/arch/x86/icelake/cache.json
new file mode 100644
index 000000000000..3529fc338c17
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/icelake/cache.json
@@ -0,0 +1,552 @@
+[
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of demand Data Read requests that miss L2 cache. Only not rejected loads are counted.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0x21",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.DEMAND_DATA_RD_MISS",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Demand Data Read miss L2, no rejects"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the RFO (Read-for-Ownership) requests that miss L2 cache.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0x22",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.RFO_MISS",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "RFO requests that miss L2 cache"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts L2 cache misses when fetching instructions.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0x24",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.CODE_RD_MISS",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "L2 cache misses when fetching instructions"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts demand requests that miss L2 cache.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0x27",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.ALL_DEMAND_MISS",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Demand requests that miss L2 cache"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts Software prefetch requests that miss the L2 cache. This event accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0x28",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.SWPF_MISS",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "SW prefetch requests that miss L2 cache."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of demand Data Read requests initiated by load instructions that hit L2 cache.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0xc1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.DEMAND_DATA_RD_HIT",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Demand Data Read requests that hit L2 cache"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the RFO (Read-for-Ownership) requests that hit L2 cache.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0xc2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.RFO_HIT",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "RFO requests that hit L2 cache"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts L2 cache hits when fetching instructions, code reads.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0xc4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.CODE_RD_HIT",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "L2 cache hits when fetching instructions, code reads."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts Software prefetch requests that hit the L2 cache. This event accounts for PREFETCHNTA and PREFETCHT0/1/2 instructions.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0xc8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.SWPF_HIT",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "SW prefetch requests that hit L2 cache."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of demand Data Read requests (including requests from L1D hardware prefetchers). These loads may hit or miss L2 cache. Only non rejected loads are counted.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0xe1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.ALL_DEMAND_DATA_RD",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Demand Data Read requests"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the total number of RFO (read for ownership) requests to L2 cache. L2 RFO requests include both L1D demand RFO misses as well as L1D RFO prefetches.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0xe2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.ALL_RFO",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "RFO requests to L2 cache"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the total number of L2 code requests.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0xe4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.ALL_CODE_RD",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "L2 code requests"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts demand requests to L2 cache.",
+        "EventCode": "0x24",
+        "Counter": "0,1,2,3",
+        "UMask": "0xe7",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_RQSTS.ALL_DEMAND_REFERENCES",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Demand requests to L2 cache"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of L1D misses that are outstanding in each cycle, that is each cycle the number of Fill Buffers (FB) outstanding required by Demand Reads. FB either is held by demand loads, or it is held by non-demand loads and gets hit at least once by demand. The valid outstanding interval is defined until the FB deallocation by one of the following ways: from FB allocation, if FB is allocated by demand from the demand Hit FB, if it is allocated by hardware or software prefetch. Note: In the L1D, a Demand Read contains cacheable or noncacheable demand loads, including ones causing cache-line splits and reads due to page walks resulted from any request type.",
+        "EventCode": "0x48",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L1D_PEND_MISS.PENDING",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of L1D misses that are outstanding"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts duration of L1D miss outstanding in cycles.",
+        "EventCode": "0x48",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L1D_PEND_MISS.PENDING_CYCLES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles with L1D load Misses outstanding.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailablability. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.",
+        "EventCode": "0x48",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L1D_PEND_MISS.FB_FULL",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of cycles a demand request has waited due to L1D Fill Buffer (FB) unavailablability."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of phases a demand request has waited due to L1D Fill Buffer (FB) unavailablability. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.",
+        "EventCode": "0x48",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L1D_PEND_MISS.FB_FULL_PERIODS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of phases a demand request has waited due to L1D Fill Buffer (FB) unavailablability.",
+        "CounterMask": "1",
+        "EdgeDetect": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of cycles a demand request has waited due to L1D due to lack of L2 resources. Demand requests include cacheable/uncacheable demand load, store, lock or SW prefetch accesses.",
+        "EventCode": "0x48",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L1D_PEND_MISS.L2_STALL",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of cycles a demand request has waited due to L1D due to lack of L2 resources."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.",
+        "EventCode": "0x51",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L1D.REPLACEMENT",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Counts the number of cache lines replaced in L1 data cache."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of offcore outstanding demand rfo Reads transactions in the super queue every cycle. The 'Offcore outstanding' state of the transaction lasts from the L2 miss until the sending transaction completion to requestor (SQ deallocation). See the corresponding Umask under OFFCORE_REQUESTS.",
+        "EventCode": "0x60",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DEMAND_RFO",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles with offcore outstanding demand rfo reads transactions in SuperQueue (SQ), queue to uncore.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of offcore outstanding cacheable Core Data Read transactions in the super queue every cycle. A transaction is considered to be in the Offcore outstanding state between L2 miss and transaction completion sent to requestor (SQ de-allocation). See corresponding Umask under OFFCORE_REQUESTS.",
+        "EventCode": "0x60",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles when offcore outstanding cacheable Core Data Read transactions are present in the super queue. A transaction is considered to be in the Offcore outstanding state between L2 miss and transaction completion sent to requestor (SQ de-allocation). See corresponding Umask under OFFCORE_REQUESTS.",
+        "EventCode": "0x60",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the Demand Data Read requests sent to uncore. Use it in conjunction with OFFCORE_REQUESTS_OUTSTANDING to determine average latency in the uncore.",
+        "EventCode": "0xB0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "OFFCORE_REQUESTS.DEMAND_DATA_RD",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Demand Data Read requests sent to uncore"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the demand RFO (read for ownership) requests including regular RFOs, locks, ItoM.",
+        "EventCode": "0xB0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "OFFCORE_REQUESTS.DEMAND_RFO",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Demand RFO requests including regular RFOs, locks, ItoM"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the demand and prefetch data reads. All Core Data Reads include cacheable 'Demands' and L2 prefetchers (not L3 prefetchers). Counting also covers reads due to page walks resulted from any request type.",
+        "EventCode": "0xB0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "OFFCORE_REQUESTS.ALL_DATA_RD",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Demand and prefetch data reads"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts memory transactions reached the super queue including requests initiated by the core, all L3 prefetches, page walks, etc..",
+        "EventCode": "0xB0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x80",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "OFFCORE_REQUESTS.ALL_REQUESTS",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Any memory transaction that reached the SQ."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions that true miss the STLB.",
+        "EventCode": "0xD0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x11",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_INST_RETIRED.STLB_MISS_LOADS",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Retired load instructions that miss the STLB.",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired store instructions that true miss the STLB.",
+        "EventCode": "0xD0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x12",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_INST_RETIRED.STLB_MISS_STORES",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Retired store instructions that miss the STLB.",
+        "Data_LA": "1",
+        "L1_Hit_Indication": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions with locked access.",
+        "EventCode": "0xD0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x21",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_INST_RETIRED.LOCK_LOADS",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired load instructions with locked access.",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions that split across a cacheline boundary.",
+        "EventCode": "0xD0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x41",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_INST_RETIRED.SPLIT_LOADS",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Retired load instructions that split across a cacheline boundary.",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired store instructions that split across a cacheline boundary.",
+        "EventCode": "0xD0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x42",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_INST_RETIRED.SPLIT_STORES",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Retired store instructions that split across a cacheline boundary.",
+        "Data_LA": "1",
+        "L1_Hit_Indication": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts all retired load instructions. This event accounts for SW prefetch instructions for loads.",
+        "EventCode": "0xD0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x81",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_INST_RETIRED.ALL_LOADS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "All retired load instructions.",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts all retired store instructions. This event account for SW prefetch instructions and PREFETCHW instruction for stores.",
+        "EventCode": "0xD0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x82",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_INST_RETIRED.ALL_STORES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "All retired store instructions.",
+        "Data_LA": "1",
+        "L1_Hit_Indication": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions with at least one uop that hit in the L1 data cache. This event includes all SW prefetches and lock instructions regardless of the data source.",
+        "EventCode": "0xD1",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_RETIRED.L1_HIT",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Retired load instructions with L1 cache hits as data sources",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions with L2 cache hits as data sources.",
+        "EventCode": "0xD1",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_RETIRED.L2_HIT",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Retired load instructions with L2 cache hits as data sources",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions with at least one uop that hit in the L3 cache.",
+        "EventCode": "0xD1",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_RETIRED.L3_HIT",
+        "SampleAfterValue": "50021",
+        "BriefDescription": "Retired load instructions with L3 cache hits as data sources",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions with at least one uop that missed in the L1 cache.",
+        "EventCode": "0xD1",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_RETIRED.L1_MISS",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Retired load instructions missed L1 cache as data sources",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions missed L2 cache as data sources.",
+        "EventCode": "0xD1",
+        "Counter": "0,1,2,3",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_RETIRED.L2_MISS",
+        "SampleAfterValue": "50021",
+        "BriefDescription": "Retired load instructions missed L2 cache as data sources",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions with at least one uop that missed in the L3 cache.",
+        "EventCode": "0xD1",
+        "Counter": "0,1,2,3",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_RETIRED.L3_MISS",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired load instructions missed L3 cache as data sources",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions with at least one uop was load missed in L1 but hit FB (Fill Buffers) due to preceding miss to the same cache line with data not ready.",
+        "EventCode": "0xd1",
+        "Counter": "0,1,2,3",
+        "UMask": "0x40",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_RETIRED.FB_HIT",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Number of completed demand load requests that missed the L1, but hit the FB(fill buffer), because a preceding miss to the same cacheline initiated the line to be brought into L1, but data is not yet ready in L1.",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the retired load instructions whose data sources were L3 hit and cross-core snoop missed in on-pkg core cache.",
+        "EventCode": "0xd2",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS",
+        "SampleAfterValue": "20011",
+        "BriefDescription": "Retired load instructions whose data sources were L3 hit and cross-core snoop missed in on-pkg core cache.",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions whose data sources were L3 and cross-core snoop hits in on-pkg core cache.",
+        "EventCode": "0xd2",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT",
+        "SampleAfterValue": "20011",
+        "BriefDescription": "Retired load instructions whose data sources were L3 and cross-core snoop hits in on-pkg core cache",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions whose data sources were HitM responses from shared L3.",
+        "EventCode": "0xd2",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM",
+        "SampleAfterValue": "20011",
+        "BriefDescription": "Retired load instructions whose data sources were HitM responses from shared L3",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired load instructions whose data sources were hits in L3 without snoops required.",
+        "EventCode": "0xd2",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "MEM_LOAD_L3_HIT_RETIRED.XSNP_NONE",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Retired load instructions whose data sources were hits in L3 without snoops required",
+        "Data_LA": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of L2 cache lines filling the L2. Counting does not cover rejects.",
+        "EventCode": "0xF1",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1f",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "L2_LINES_IN.ALL",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "L2 cache lines filling L2"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the cycles for which the thread is active and the superQ cannot take any more entries.",
+        "EventCode": "0xF4",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "SQ_MISC.SQ_FULL",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Cycles the thread is active and superQ cannot take any more entries."
+    }
+]
\ No newline at end of file
diff --git a/tools/perf/pmu-events/arch/x86/icelake/floating-point.json b/tools/perf/pmu-events/arch/x86/icelake/floating-point.json
new file mode 100644
index 000000000000..8e59146c2432
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/icelake/floating-point.json
@@ -0,0 +1,90 @@
+[
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 1 computational operation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FP_ARITH_INST_RETIRED.SCALAR_DOUBLE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of SSE/AVX computational scalar double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 1 computation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 RANGE SQRT DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 1 computational operation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT RCP FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FP_ARITH_INST_RETIRED.SCALAR_SINGLE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of SSE/AVX computational scalar single precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 1 computation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 RANGE SQRT DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 2 computation operations, one for each element.  Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 2 computation operations, one for each element.  Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT14 RCP14 RANGE DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 4 computation operations, one for each element.  Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 4 computation operations, one for each element.  Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 SQRT DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 4 computation operations, one for each element.  Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT FM(N)ADD/SUB.  FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 4 computation operations, one for each element.  Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 RANGE SQRT DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 8 computation operations, one for each element.  Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB HADD HSUB SUBADD MUL DIV MIN MAX SQRT RSQRT RCP DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 8 computation operations, one for each element.  Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP14 RSQRT14 RANGE SQRT DPP FM(N)ADD/SUB.  DPP and FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 8 computation operations, one for each element.  Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 RANGE FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x40",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_DOUBLE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 16 computation operations, one for each element.  Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 RANGE FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 16 computation operations, one for each element.  Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 RANGE FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element.",
+        "EventCode": "0xc7",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x80",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FP_ARITH_INST_RETIRED.512B_PACKED_SINGLE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of SSE/AVX computational 512-bit packed double precision floating-point instructions retired; some instructions will count twice as noted below.  Each count represents 8 computation operations, one for each element.  Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT RSQRT14 RCP14 RANGE FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform 2 calculations per element."
+    }
+]
\ No newline at end of file
diff --git a/tools/perf/pmu-events/arch/x86/icelake/frontend.json b/tools/perf/pmu-events/arch/x86/icelake/frontend.json
new file mode 100644
index 000000000000..9c3cfbfcec0f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/icelake/frontend.json
@@ -0,0 +1,424 @@
+[
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of uops delivered to Instruction Decode Queue (IDQ) from the MITE path. This also means that uops are not being delivered from the Decode Stream Buffer (DSB).",
+        "EventCode": "0x79",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "IDQ.MITE_UOPS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from MITE path"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of cycles where optimal number of uops was delivered to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipeline) path. During these cycles uops are not being delivered from the Decode Stream Buffer (DSB).",
+        "EventCode": "0x79",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "IDQ.MITE_CYCLES_OK",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles MITE is delivering optimal number of Uops",
+        "CounterMask": "5"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of cycles uops were delivered to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipeline) path. During these cycles uops are not being delivered from the Decode Stream Buffer (DSB).",
+        "EventCode": "0x79",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "IDQ.MITE_CYCLES_ANY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles MITE is delivering any Uop",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.",
+        "EventCode": "0x79",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "IDQ.DSB_UOPS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of cycles where optimal number of uops was delivered to the Instruction Decode Queue (IDQ) from the MITE (legacy decode pipeline) path. During these cycles uops are not being delivered from the Decode Stream Buffer (DSB).",
+        "EventCode": "0x79",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "IDQ.DSB_CYCLES_OK",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles DSB is delivering optimal number of Uops",
+        "CounterMask": "5"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of cycles uops were delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path.",
+        "EventCode": "0x79",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "IDQ.DSB_CYCLES_ANY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles Decode Stream Buffer (DSB) is delivering any Uop",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer.",
+        "EventCode": "0x79",
+        "Counter": "0,1,2,3",
+        "UMask": "0x30",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "IDQ.MS_SWITCHES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of switches from DSB or MITE to the MS",
+        "CounterMask": "1",
+        "EdgeDetect": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the total number of uops delivered by the Microcode Sequencer (MS). Any instruction over 4 uops will be delivered by the MS. Some instructions such as transcendentals may additionally generate uops from the MS.",
+        "EventCode": "0x79",
+        "Counter": "0,1,2,3",
+        "UMask": "0x30",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "IDQ.MS_UOPS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Uops delivered to IDQ while MS is busy"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles during which uops are being delivered to Instruction Decode Queue (IDQ) while the Microcode Sequencer (MS) is busy. Uops maybe initiated by Decode Stream Buffer (DSB) or MITE.",
+        "EventCode": "0x79",
+        "Counter": "0,1,2,3",
+        "UMask": "0x30",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "IDQ.MS_CYCLES_ANY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles when uops are being delivered to IDQ while MS is busy",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles where a code line fetch is stalled due to an L1 instruction cache miss. The legacy decode pipeline works at a 16 Byte granularity.",
+        "EventCode": "0x80",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ICACHE_16B.IFDATA_STALL",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache miss."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity. Accounts for both cacheable and uncacheable accesses.",
+        "EventCode": "0x83",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ICACHE_64B.IFTAG_HIT",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity. Accounts for both cacheable and uncacheable accesses.",
+        "EventCode": "0x83",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ICACHE_64B.IFTAG_MISS",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles where a code fetch is stalled due to L1 instruction cache tag miss.",
+        "EventCode": "0x83",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ICACHE_64B.IFTAG_STALL",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Cycles where a code fetch is stalled due to L1 instruction cache tag miss."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of uops not delivered to by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. This event counts for one SMT thread in a given cycle.",
+        "EventCode": "0x9C",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "IDQ_UOPS_NOT_DELIVERED.CORE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Uops not delivered by IDQ when backend of the machine is not stalled"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of cycles when no uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. This event counts for one SMT thread in a given cycle.",
+        "EventCode": "0x9c",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles when no uops are not delivered by the IDQ when backend of the machine is not stalled",
+        "CounterMask": "5"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of cycles when the optimal number of uops were delivered by the Instruction Decode Queue (IDQ) to the back-end of the pipeline when there was no back-end stalls. This event counts for one SMT thread in a given cycle.",
+        "EventCode": "0x9C",
+        "Invert": "1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "IDQ_UOPS_NOT_DELIVERED.CYCLES_FE_WAS_OK",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles when optimal number of uops was delivered to the back-end when the back-end is not stalled",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Decode Stream Buffer (DSB) is a Uop-cache that holds translations of previously fetched instructions that were decoded by the legacy x86 decode pipeline (MITE). This event counts fetch penalty cycles when a transition occurs from DSB to MITE.",
+        "EventCode": "0xAB",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DSB2MITE_SWITCHES.PENALTY_CYCLES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "DSB-to-MITE switch true penalty cycles."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired Instructions that experienced DSB (Decode stream buffer i.e. the decoded instruction-cache) miss.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x11",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.DSB_MISS",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired Instructions who experienced DSB miss.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired Instructions who experienced Instruction L1 Cache true miss.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x12",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.L1I_MISS",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired Instructions who experienced Instruction L1 Cache true miss.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired Instructions who experienced Instruction L2 Cache true miss.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x13",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.L2_MISS",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired Instructions who experienced Instruction L2 Cache true miss.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired Instructions that experienced iTLB (Instruction TLB) true miss.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x14",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.ITLB_MISS",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired Instructions who experienced iTLB true miss.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired Instructions that experienced STLB (2nd level TLB) true miss.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x15",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.STLB_MISS",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired Instructions who experienced STLB (2nd level TLB) true miss.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 2 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x500206",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_2",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 2 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 4 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x500406",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_4",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 4 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are delivered to the back-end after a front-end stall of at least 8 cycles. During this period the front-end delivered no uops.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x500806",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_8",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 8 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are delivered to the back-end after a front-end stall of at least 16 cycles. During this period the front-end delivered no uops.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x501006",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_16",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 16 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are delivered to the back-end after a front-end stall of at least 32 cycles. During this period the front-end delivered no uops.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x502006",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_32",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 32 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 64 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x504006",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_64",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 64 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 128 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x508006",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_128",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 128 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 256 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x510006",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_256",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 256 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 512 cycles which was not interrupted by a back-end stall.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x520006",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_512",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end delivered no uops for a period of 512 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts retired instructions that are delivered to the back-end after the front-end had at least 1 bubble-slot for a period of 2 cycles. A bubble-slot is an empty issue-pipeline slot while there was no RAT stall.",
+        "EventCode": "0xC6",
+        "MSRValue": "0x100206",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "FRONTEND_RETIRED.LATENCY_GE_2_BUBBLES_GE_1",
+        "MSRIndex": "0x3F7",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Retired instructions that are fetched after an interval where the front-end had at least 1 bubble-slot for a period of 2 cycles which was not interrupted by a back-end stall.",
+        "TakenAlone": "1"
+    }
+]
\ No newline at end of file
diff --git a/tools/perf/pmu-events/arch/x86/icelake/memory.json b/tools/perf/pmu-events/arch/x86/icelake/memory.json
new file mode 100644
index 000000000000..f158366b9dd6
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/icelake/memory.json
@@ -0,0 +1,410 @@
+[
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times a TSX line had a cache conflict.",
+        "EventCode": "0x54",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "TX_MEM.ABORT_CONFLICT",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Speculatively counts the number Transactional Synchronization Extensions (TSX) Aborts due to a data capacity limitation for transactional writes.",
+        "EventCode": "0x54",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "TX_MEM.ABORT_CAPACITY_WRITE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Speculatively counts the number TSX Aborts due to a data capacity limitation for transactional writes."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times a TSX Abort was triggered due to a non-release/commit store to lock.",
+        "EventCode": "0x54",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "TX_MEM.ABORT_HLE_STORE_TO_ELIDED_LOCK",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Number of times a HLE transactional region aborted due to a non XRELEASE prefixed instruction writing to an elided lock in the elision buffer"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times a TSX Abort was triggered due to commit but Lock Buffer not empty.",
+        "EventCode": "0x54",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_NOT_EMPTY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times a TSX Abort was triggered due to release/commit but data and address mismatch.",
+        "EventCode": "0x54",
+        "Counter": "0,1,2,3",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_MISMATCH",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times a TSX Abort was triggered due to attempting an unsupported alignment from Lock Buffer.",
+        "EventCode": "0x54",
+        "Counter": "0,1,2,3",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "TX_MEM.ABORT_HLE_ELISION_BUFFER_UNSUPPORTED_ALIGNMENT",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times we could not allocate Lock Buffer.",
+        "EventCode": "0x54",
+        "Counter": "0,1,2,3",
+        "UMask": "0x40",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "TX_MEM.HLE_ELISION_BUFFER_FULL",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts Unfriendly TSX abort triggered by a vzeroupper instruction.",
+        "EventCode": "0x5d",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "TX_EXEC.MISC2",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Counts the number of times a class of instructions that may cause a transactional abort was executed inside a transactional region"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts Unfriendly TSX abort triggered by a nest count that is too deep.",
+        "EventCode": "0x5d",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "TX_EXEC.MISC3",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an instruction execution caused the transactional nest count supported to be exceeded"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA3",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "CYCLE_ACTIVITY.CYCLES_L3_MISS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles while L3 cache miss demand load is outstanding.",
+        "CounterMask": "2"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA3",
+        "Counter": "0,1,2,3",
+        "UMask": "0x6",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "CYCLE_ACTIVITY.STALLS_L3_MISS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Execution stalls while L3 cache miss demand load is outstanding.",
+        "CounterMask": "6"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Demand Data Read requests who miss L3 cache.",
+        "EventCode": "0xB0",
+        "Counter": "0,1,2,3",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "OFFCORE_REQUESTS.L3_MISS_DEMAND_DATA_RD",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Demand Data Read requests who miss L3 cache"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of Machine Clears detected dye to memory ordering. Memory Ordering Machine Clears may apply when a memory read may not conform to the memory ordering rules of the x86 architecture",
+        "EventCode": "0xc3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MACHINE_CLEARS.MEMORY_ORDERING",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Number of machine clears due to memory ordering conflicts."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times we entered an HLE region. Does not count nested transactions.",
+        "EventCode": "0xC8",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "HLE_RETIRED.START",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an HLE execution started."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times HLE commit succeeded.",
+        "EventCode": "0xC8",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "HLE_RETIRED.COMMIT",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an HLE execution successfully committed",
+        "Data_LA": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times HLE abort was triggered.",
+        "EventCode": "0xc8",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "HLE_RETIRED.ABORTED",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an HLE execution aborted due to any reasons (multiple categories may count as one)."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).",
+        "EventCode": "0xC8",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "HLE_RETIRED.ABORTED_MEM",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts)."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times an HLE execution aborted due to HLE-unfriendly instructions and certain unfriendly events (such as AD assists etc.).",
+        "EventCode": "0xC8",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "HLE_RETIRED.ABORTED_UNFRIENDLY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an HLE execution aborted due to HLE-unfriendly instructions and certain unfriendly events (such as AD assists etc.)."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times an HLE execution aborted due to unfriendly events (such as interrupts).",
+        "EventCode": "0xC8",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x80",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "HLE_RETIRED.ABORTED_EVENTS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an HLE execution aborted due to unfriendly events (such as interrupts)."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times we entered an RTM region. Does not count nested transactions.",
+        "EventCode": "0xC9",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RTM_RETIRED.START",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an RTM execution started."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times RTM commit succeeded.",
+        "EventCode": "0xC9",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RTM_RETIRED.COMMIT",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an RTM execution successfully committed"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times RTM abort was triggered.",
+        "EventCode": "0xc9",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RTM_RETIRED.ABORTED",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an RTM execution aborted.",
+        "Data_LA": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts).",
+        "EventCode": "0xC9",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RTM_RETIRED.ABORTED_MEM",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts)"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times an RTM execution aborted due to HLE-unfriendly instructions.",
+        "EventCode": "0xC9",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RTM_RETIRED.ABORTED_UNFRIENDLY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an RTM execution aborted due to HLE-unfriendly instructions"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times an RTM execution aborted due to incompatible memory type.",
+        "EventCode": "0xC9",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x40",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RTM_RETIRED.ABORTED_MEMTYPE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an RTM execution aborted due to incompatible memory type"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt).",
+        "EventCode": "0xC9",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x80",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RTM_RETIRED.ABORTED_EVENTS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt)"
+    },
+    {
+        "PEBS": "2",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 4 cycles.  Reported latency may be longer than just the memory latency.",
+        "EventCode": "0xcd",
+        "MSRValue": "0x4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4",
+        "MSRIndex": "0x3F6",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 4 cycles.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "2",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 8 cycles.  Reported latency may be longer than just the memory latency.",
+        "EventCode": "0xcd",
+        "MSRValue": "0x8",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8",
+        "MSRIndex": "0x3F6",
+        "SampleAfterValue": "50021",
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 8 cycles.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "2",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 16 cycles.  Reported latency may be longer than just the memory latency.",
+        "EventCode": "0xcd",
+        "MSRValue": "0x10",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_16",
+        "MSRIndex": "0x3F6",
+        "SampleAfterValue": "20011",
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 16 cycles.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "2",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 32 cycles.  Reported latency may be longer than just the memory latency.",
+        "EventCode": "0xcd",
+        "MSRValue": "0x20",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_32",
+        "MSRIndex": "0x3F6",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 32 cycles.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "2",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 64 cycles.  Reported latency may be longer than just the memory latency.",
+        "EventCode": "0xcd",
+        "MSRValue": "0x40",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_64",
+        "MSRIndex": "0x3F6",
+        "SampleAfterValue": "2003",
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 64 cycles.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "2",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 128 cycles.  Reported latency may be longer than just the memory latency.",
+        "EventCode": "0xcd",
+        "MSRValue": "0x80",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_128",
+        "MSRIndex": "0x3F6",
+        "SampleAfterValue": "1009",
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 128 cycles.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "2",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 256 cycles.  Reported latency may be longer than just the memory latency.",
+        "EventCode": "0xcd",
+        "MSRValue": "0x100",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256",
+        "MSRIndex": "0x3F6",
+        "SampleAfterValue": "503",
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 256 cycles.",
+        "TakenAlone": "1"
+    },
+    {
+        "PEBS": "2",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 512 cycles.  Reported latency may be longer than just the memory latency.",
+        "EventCode": "0xcd",
+        "MSRValue": "0x200",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512",
+        "MSRIndex": "0x3F6",
+        "SampleAfterValue": "101",
+        "BriefDescription": "Counts randomly selected loads when the latency from first dispatch to completion is greater than 512 cycles.",
+        "TakenAlone": "1"
+    }
+]
\ No newline at end of file
diff --git a/tools/perf/pmu-events/arch/x86/icelake/other.json b/tools/perf/pmu-events/arch/x86/icelake/other.json
new file mode 100644
index 000000000000..b9d0e17d4d23
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/icelake/other.json
@@ -0,0 +1,133 @@
+[
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method. The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core. Software can use this event as the denominator for the top-level metrics of the Top-down Microarchitecture Analysis method. This event is counted on a designated fixed counter (Fixed Counter 3) and is an architectural event.",
+        "Counter": "35",
+        "UMask": "0x4",
+        "PEBScounters": "35",
+        "EventName": "TOPDOWN.SLOTS",
+        "SampleAfterValue": "10000003",
+        "BriefDescription": "Counts the number of available slots for an unhalted logical processor."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts Core cycles where the core was running with power-delivery for baseline license level 0.  This includes non-AVX codes, SSE, AVX 128-bit, and low-current AVX 256-bit codes.",
+        "EventCode": "0x28",
+        "Counter": "0,1,2,3",
+        "UMask": "0x7",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "CORE_POWER.LVL0_TURBO_LICENSE",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the Non-AVX turbo schedule."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts Core cycles where the core was running with power-delivery for license level 1.  This includes high current AVX 256-bit instructions as well as low current AVX 512-bit instructions.",
+        "EventCode": "0x28",
+        "Counter": "0,1,2,3",
+        "UMask": "0x18",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "CORE_POWER.LVL1_TURBO_LICENSE",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the AVX2 turbo schedule."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Core cycles where the core was running with power-delivery for license level 2 (introduced in Skylake Server microarchtecture).  This includes high current AVX 512-bit instructions.",
+        "EventCode": "0x28",
+        "Counter": "0,1,2,3",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "CORE_POWER.LVL2_TURBO_LICENSE",
+        "SampleAfterValue": "200003",
+        "BriefDescription": "Core cycles where the core was running in a manner where Turbo may be clipped to the AVX512 turbo schedule."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of PREFETCHNTA instructions executed.",
+        "EventCode": "0x32",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "SW_PREFETCH_ACCESS.NTA",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of PREFETCHNTA instructions executed."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of PREFETCHT0 instructions executed.",
+        "EventCode": "0x32",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "SW_PREFETCH_ACCESS.T0",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of PREFETCHT0 instructions executed."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of PREFETCHT1 or PREFETCHT2 instructions executed.",
+        "EventCode": "0x32",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "SW_PREFETCH_ACCESS.T1_T2",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of PREFETCHT1 or PREFETCHT2 instructions executed."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of PREFETCHW instructions executed.",
+        "EventCode": "0x32",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "SW_PREFETCH_ACCESS.PREFETCHW",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of PREFETCHW instructions executed."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of available slots for an unhalted logical processor. The event increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method. The count is distributed among unhalted logical processors (hyper-threads) who share the same physical core.",
+        "EventCode": "0xa4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "TOPDOWN.SLOTS_P",
+        "SampleAfterValue": "10000003",
+        "BriefDescription": "Counts the number of available slots for an unhalted logical processor."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "TOPDOWN.BACKEND_BOUND_SLOTS",
+        "SampleAfterValue": "10000003",
+        "BriefDescription": "Issue slots where no uops were being issued due to lack of back end resources."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts all microcode Floating Point assists.",
+        "EventCode": "0xC1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "ASSISTS.FP",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Counts all microcode FP assists.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of occurrences where a microcode assist is invoked by hardware Examples include AD (page Access Dirty), FP and AVX related assists.",
+        "EventCode": "0xc1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x7",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "ASSISTS.ANY",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Number of occurrences where a microcode assist is invoked by hardware."
+    }
+]
\ No newline at end of file
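In the file above, TOPDOWN.SLOTS uses counter 35, which in these event lists denotes fixed counter 3 (counters 32-35 map to fixed counters 0-3), while TOPDOWN.SLOTS_P (event 0xa4, umask 0x1) is the general-purpose-counter variant. A rough usage sketch, assuming plain raw-event syntax and not the perf-stat topdown support added later in this series:

    # count issue slots and backend-bound slots on general-purpose counters
    perf stat -e cpu/event=0xa4,umask=0x1/,cpu/event=0xa4,umask=0x2/ -a -- sleep 1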
diff --git a/tools/perf/pmu-events/arch/x86/icelake/pipeline.json b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
new file mode 100644
index 000000000000..6d8311e634aa
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/icelake/pipeline.json
@@ -0,0 +1,892 @@
+[
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
+        "Counter": "32",
+        "UMask": "0x1",
+        "PEBScounters": "32",
+        "EventName": "INST_RETIRED.ANY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of instructions retired. Fixed Counter - architectural event"
+    },
+    {
+        "PEBS": "2",
+        "CollectPEBSRecord": "3",
+        "PublicDescription": "A version of INST_RETIRED that allows for a more unbiased distribution of samples across instructions retired. It utilizes the Precise Distribution of Instructions Retired (PDIR) feature to mitigate some bias in how retired instructions get sampled. Use on Fixed Counter 0.",
+        "Counter": "32",
+        "UMask": "0x1",
+        "PEBScounters": "32",
+        "EventName": "INST_RETIRED.PREC_DIST",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Precise instruction retired event with a reduced effect of PEBS shadow in IP distribution"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of core cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios. The core frequency may change from time to time due to transitions associated with Enhanced Intel SpeedStep Technology or TM2. For this reason this event may have a changing ratio with regards to time. When the core frequency is constant, this event can approximate elapsed time while the core was not in the halt state. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events.",
+        "Counter": "33",
+        "UMask": "0x2",
+        "PEBScounters": "33",
+        "EventName": "CPU_CLK_UNHALTED.THREAD",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Core cycles when the thread is not in halt state"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of reference cycles when the core is not in a halt state. The core enters the halt state when it is running the HLT instruction or the MWAIT instruction. This event is not affected by core frequency changes (for example, P states, TM2 transitions) but has the same incrementing frequency as the time stamp counter. This event can approximate elapsed time while the core was not in a halt state. This event has a constant ratio with the CPU_CLK_UNHALTED.REF_XCLK event. It is counted on a dedicated fixed counter, leaving the four (eight when Hyperthreading is disabled) programmable counters available for other events. Note: On all current platforms this event stops counting during 'throttling (TM)' states duty off periods the processor is 'halted'.  The counter update is done at a lower clock rate then the core clock the overflow status bit for this counter may appear 'sticky'.  After the counter has overflowed and software clears the overfl
 ow status bit and resets the counter to less than MAX. The reset value to the counter is not clocked immediately so the overflow status bit will flip 'high (1)' and generate another PMI (if enabled) after which the reset value gets clocked into the counter. Therefore, software will get the interrupt, read the overflow status bit '1 for bit 34 while the counter value is less than MAX. Software should ignore this case.",
+        "Counter": "34",
+        "UMask": "0x3",
+        "PEBScounters": "34",
+        "EventName": "CPU_CLK_UNHALTED.REF_TSC",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Reference cycles when the core is not in halt state."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times the load operation got the true Block-on-Store blocking code preventing store forwarding. This includes cases when: a. preceding store conflicts with the load (incomplete overlap),b. store forwarding is impossible due to u-arch limitations, c. preceding lock RMW operations are not forwarded, d. store has the no-forward bit set (uncacheable/page-split/masked stores), e. all-blocking stores are used (mostly, fences and port I/O), and others. The most common case is a load blocked due to its address range overlapping with a preceding smaller uncompleted store. Note: This event does not take into account cases of out-of-SW-control (for example, SbTailHit), unknown physical STA, and cases of blocking loads on store due to being non-WB memory type or a lock. These cases are covered by other events. See the table of not supported store forwards in the Optimization Guide.",
+        "EventCode": "0x03",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "LD_BLOCKS.STORE_FORWARD",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Loads blocked by overlapping with store buffer that cannot be forwarded."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use.",
+        "EventCode": "0x03",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "LD_BLOCKS.NO_SR",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times a load got blocked due to false dependencies in MOB due to partial compare on address.",
+        "EventCode": "0x07",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "LD_BLOCKS_PARTIAL.ADDRESS_ALIAS",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "False dependencies in MOB due to partial compare on address."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts core cycles when the Resource allocator was stalled due to recovery from an earlier branch misprediction or machine clear event.",
+        "EventCode": "0x0D",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "INT_MISC.RECOVERY_CYCLES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Core cycles the allocator was stalled due to recovery from earlier clear event for this thread"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles the Backend cluster is recovering after a miss-speculation or a Store Buffer or Load Buffer drain stall.",
+        "EventCode": "0x0D",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x3",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "INT_MISC.ALL_RECOVERY_CYCLES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles the Backend cluster is recovering after a miss-speculation or a Store Buffer or Load Buffer drain stall.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Cycles after recovery from a branch misprediction or machine clear till the first uop is issued from the resteered path.",
+        "EventCode": "0x0d",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x80",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "INT_MISC.CLEAR_RESTEER_CYCLES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Counts cycles after recovery from a branch misprediction or machine clear till the first uop is issued from the resteered path."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of uops that the Resource Allocation Table (RAT) issues to the Reservation Station (RS).",
+        "EventCode": "0x0E",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_ISSUED.ANY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Uops that RAT issues to RS"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles during which the Resource Allocation Table (RAT) does not issue any Uops to the reservation station (RS) for the current thread.",
+        "EventCode": "0x0E",
+        "Invert": "1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_ISSUED.STALL_CYCLES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles when RAT does not issue Uops to RS for the thread",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles when divide unit is busy executing divide or square root operations. Accounts for integer and floating-point operations.",
+        "EventCode": "0x14",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x9",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "ARITH.DIVIDER_ACTIVE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles when divide unit is busy executing divide or square root operations.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "This is an architectural event that counts the number of thread cycles while the thread is not in a halt state. The thread enters the halt state when it is running the HLT instruction. The core frequency may change from time to time due to power or thermal throttling. For this reason, this event may have a changing ratio with regards to wall clock time.",
+        "EventCode": "0x3C",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "CPU_CLK_UNHALTED.THREAD_P",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Thread cycles when thread is not in halt state"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts core crystal clock cycles when the thread is unhalted.",
+        "EventCode": "0x3C",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "CPU_CLK_UNHALTED.REF_XCLK",
+        "SampleAfterValue": "25003",
+        "BriefDescription": "Core crystal clock cycles when the thread is unhalted."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts Core crystal clock cycles when current thread is unhalted and the other thread is halted.",
+        "EventCode": "0x3C",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE",
+        "SampleAfterValue": "25003",
+        "BriefDescription": "Core crystal clock cycles when this thread is unhalted and the other thread is halted."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts all not software-prefetch load dispatches that hit the fill buffer (FB) allocated for the software prefetch. It can also be incremented by some lock instructions. So it should only be used with profiling so that the locks can be excluded by ASM (Assembly File) inspection of the nearby instructions.",
+        "EventCode": "0x4c",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "LOAD_HIT_PREFETCH.SWPF",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Counts the number of demand load dispatches that hit L1D fill buffer (FB) allocated for software prefetch."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles during which the reservation station (RS) is empty for this logical processor. This is usually caused when the front-end pipeline runs into stravation periods (e.g. branch mispredictions or i-cache misses)",
+        "EventCode": "0x5E",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RS_EVENTS.EMPTY_CYCLES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles when Reservation Station (RS) is empty for the thread"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts end of periods where the Reservation Station (RS) was empty. Could be useful to closely sample on front-end latency issues (see the FRONTEND_RETIRED event of designated precise events)",
+        "EventCode": "0x5E",
+        "Invert": "1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RS_EVENTS.EMPTY_END",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Counts end of periods where the Reservation Station (RS) was empty.",
+        "CounterMask": "1",
+        "EdgeDetect": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles that the Instruction Length decoder (ILD) stalls occurred due to dynamically changing prefix length of the decoded instruction (by operand size prefix instruction 0x66, address size prefix instruction 0x67 or REX.W for Intel64). Count is proportional to the number of prefixes in a 16B-line. This may result in a three-cycle penalty for each LCP (Length changing prefix) in a 16-byte chunk.",
+        "EventCode": "0x87",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ILD_STALL.LCP",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Stalls caused by changing prefix length of the instruction."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 0.",
+        "EventCode": "0xa1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_DISPATCHED.PORT_0",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of uops executed on port 0"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 1.",
+        "EventCode": "0xa1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_DISPATCHED.PORT_1",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of uops executed on port 1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to ports 2 and 3.",
+        "EventCode": "0xa1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_DISPATCHED.PORT_2_3",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of uops executed on port 2 and 3"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to ports 5 and 9.",
+        "EventCode": "0xa1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_DISPATCHED.PORT_4_9",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of uops executed on port 4 and 9"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 5.",
+        "EventCode": "0xa1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_DISPATCHED.PORT_5",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of uops executed on port 5"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to port 6.",
+        "EventCode": "0xa1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x40",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_DISPATCHED.PORT_6",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of uops executed on port 6"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts, on the per-thread basis, cycles during which at least one uop is dispatched from the Reservation Station (RS) to ports 7 and 8.",
+        "EventCode": "0xa1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x80",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_DISPATCHED.PORT_7_8",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of uops executed on port 7 and 8"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xa2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RESOURCE_STALLS.SCOREBOARD",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Counts cycles where the pipeline is stalled due to serializing operations."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts allocation stall cycles caused by the store buffer (SB) being full. This counts cycles that the pipeline back-end blocked uop delivery from the front-end.",
+        "EventCode": "0xA2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "RESOURCE_STALLS.SB",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles stalled due to no store buffers available. (not including draining form sync)."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA3",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "CYCLE_ACTIVITY.CYCLES_L2_MISS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles while L2 cache miss demand load is outstanding.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "CYCLE_ACTIVITY.STALLS_TOTAL",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Total execution stalls.",
+        "CounterMask": "4"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA3",
+        "Counter": "0,1,2,3",
+        "UMask": "0x5",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "CYCLE_ACTIVITY.STALLS_L2_MISS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Execution stalls while L2 cache miss demand load is outstanding.",
+        "CounterMask": "5"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA3",
+        "Counter": "0,1,2,3",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "CYCLE_ACTIVITY.CYCLES_L1D_MISS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles while L1 cache miss demand load is outstanding.",
+        "CounterMask": "8"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA3",
+        "Counter": "0,1,2,3",
+        "UMask": "0xc",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "CYCLE_ACTIVITY.STALLS_L1D_MISS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Execution stalls while L1 cache miss demand load is outstanding.",
+        "CounterMask": "12"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "CYCLE_ACTIVITY.CYCLES_MEM_ANY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles while memory subsystem has an outstanding load.",
+        "CounterMask": "16"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xA3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x14",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "CYCLE_ACTIVITY.STALLS_MEM_ANY",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Execution stalls while memory subsystem has an outstanding load.",
+        "CounterMask": "20"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles during which a total of 1 uop was executed on all ports and Reservation Station (RS) was not empty.",
+        "EventCode": "0xa6",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "EXE_ACTIVITY.1_PORTS_UTIL",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles total of 1 uop is executed on all ports and Reservation Station was not empty."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles during which a total of 2 uops were executed on all ports and Reservation Station (RS) was not empty.",
+        "EventCode": "0xa6",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "EXE_ACTIVITY.2_PORTS_UTIL",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles total of 2 uops are executed on all ports and Reservation Station was not empty."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles where the Store Buffer was full and no loads caused an execution stall.",
+        "EventCode": "0xA6",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x40",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "EXE_ACTIVITY.BOUND_ON_STORES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles where the Store Buffer was full and no loads caused an execution stall.",
+        "CounterMask": "2"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles during which no uops were executed on all ports and Reservation Station (RS) was not empty.",
+        "EventCode": "0xa6",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x80",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "EXE_ACTIVITY.EXE_BOUND_0_PORTS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles where no uops were executed, the Reservation Station was not empty, the Store Buffer was full and there was no outstanding load."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of uops delivered to the back-end by the LSD(Loop Stream Detector).",
+        "EventCode": "0xA8",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "LSD.UOPS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of Uops delivered by the LSD."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the cycles when at least one uop is delivered by the LSD (Loop-stream detector).",
+        "EventCode": "0xA8",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "LSD.CYCLES_ACTIVE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles Uops delivered by the LSD, but didn't come from the decoder.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the cycles when optimal number of uops is delivered by the LSD (Loop-stream detector).",
+        "EventCode": "0xa8",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "LSD.CYCLES_OK",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles optimal number of Uops delivered by the LSD, but did not come from the decoder.",
+        "CounterMask": "5"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "EventCode": "0xB1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.THREAD",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Counts the number of uops to be executed per-thread each cycle."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles during which no uops were dispatched from the Reservation Station (RS) per thread.",
+        "EventCode": "0xB1",
+        "Invert": "1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.STALL_CYCLES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Counts number of cycles no uops were dispatched to be executed on this thread.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Cycles where at least 1 uop was executed per-thread.",
+        "EventCode": "0xb1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.CYCLES_GE_1",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles where at least 1 uop was executed per-thread",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Cycles where at least 2 uops were executed per-thread.",
+        "EventCode": "0xb1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.CYCLES_GE_2",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles where at least 2 uops were executed per-thread",
+        "CounterMask": "2"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Cycles where at least 3 uops were executed per-thread.",
+        "EventCode": "0xb1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.CYCLES_GE_3",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles where at least 3 uops were executed per-thread",
+        "CounterMask": "3"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Cycles where at least 4 uops were executed per-thread.",
+        "EventCode": "0xb1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.CYCLES_GE_4",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles where at least 4 uops were executed per-thread",
+        "CounterMask": "4"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of uops executed from any thread.",
+        "EventCode": "0xB1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.CORE",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of uops executed on the core."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles when at least 1 micro-op is executed from any thread on physical core.",
+        "EventCode": "0xB1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_1",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles at least 1 micro-op is executed from any thread on physical core.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles when at least 2 micro-ops are executed from any thread on physical core.",
+        "EventCode": "0xB1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_2",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles at least 2 micro-op is executed from any thread on physical core.",
+        "CounterMask": "2"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles when at least 3 micro-ops are executed from any thread on physical core.",
+        "EventCode": "0xB1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_3",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles at least 3 micro-op is executed from any thread on physical core.",
+        "CounterMask": "3"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles when at least 4 micro-ops are executed from any thread on physical core.",
+        "EventCode": "0xB1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.CORE_CYCLES_GE_4",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles at least 4 micro-op is executed from any thread on physical core.",
+        "CounterMask": "4"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of x87 uops executed.",
+        "EventCode": "0xB1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_EXECUTED.X87",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Counts the number of x87 uops dispatched."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of X86 instructions retired - an Architectural PerfMon event. Counting continues during hardware interrupts, traps, and inside interrupt handlers. Notes: INST_RETIRED.ANY is counted by a designated fixed counter freeing up programmable counters to count other events. INST_RETIRED.ANY_P is counted by a programmable counter.",
+        "EventCode": "0xC0",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "INST_RETIRED.ANY_P",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of instructions retired. General Counter - architectural event"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of cycles using always true condition (uops_ret &amp;lt; 16) applied to non PEBS uops retired event.",
+        "EventCode": "0xC2",
+        "Invert": "1",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_RETIRED.TOTAL_CYCLES",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycles with less than 10 actually retired uops.",
+        "CounterMask": "10"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the retirement slots used each cycle.",
+        "EventCode": "0xc2",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "UOPS_RETIRED.SLOTS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Retirement slots used."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of machine clears (nukes) of any type.",
+        "EventCode": "0xC3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MACHINE_CLEARS.COUNT",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Number of machine clears (nukes) of any type.",
+        "CounterMask": "1",
+        "EdgeDetect": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts self-modifying code (SMC) detected, which causes a machine clear.",
+        "EventCode": "0xC3",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MACHINE_CLEARS.SMC",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Self-modifying code (SMC) detected."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts all branch instructions retired.",
+        "EventCode": "0xC4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_INST_RETIRED.ALL_BRANCHES",
+        "SampleAfterValue": "400009",
+        "BriefDescription": "All branch instructions retired."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts taken conditional branch instructions retired.",
+        "EventCode": "0xc4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_INST_RETIRED.COND_TAKEN",
+        "SampleAfterValue": "400009",
+        "BriefDescription": "Taken conditional branch instructions retired."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts both direct and indirect near call instructions retired.",
+        "EventCode": "0xC4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_INST_RETIRED.NEAR_CALL",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Direct and indirect near call instructions retired."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts return instructions retired.",
+        "EventCode": "0xC4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x8",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_INST_RETIRED.NEAR_RETURN",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Return instructions retired."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts not taken branch instructions retired.",
+        "EventCode": "0xC4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_INST_RETIRED.COND_NTAKEN",
+        "SampleAfterValue": "400009",
+        "BriefDescription": "Not taken branch instructions retired."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts conditional branch instructions retired.",
+        "EventCode": "0xc4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x11",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_INST_RETIRED.COND",
+        "SampleAfterValue": "400009",
+        "BriefDescription": "Conditional branch instructions retired."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts taken branch instructions retired.",
+        "EventCode": "0xC4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_INST_RETIRED.NEAR_TAKEN",
+        "SampleAfterValue": "400009",
+        "BriefDescription": "Taken branch instructions retired."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts far branch instructions retired.",
+        "EventCode": "0xC4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x40",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_INST_RETIRED.FAR_BRANCH",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Far branch instructions retired."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts all indirect branch instructions retired (excluding RETs. TSX aborts is considered indirect branch).",
+        "EventCode": "0xc4",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x80",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_INST_RETIRED.INDIRECT",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "All indirect branch instructions retired (excluding RETs. TSX aborts are considered indirect branch)."
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts all the retired branch instructions that were mispredicted by the processor. A branch misprediction occurs when the processor incorrectly predicts the destination of the branch.  When the misprediction is discovered at execution, all the instructions executed in the wrong (speculative) path must be discarded, and the processor must start fetching from the correct path.",
+        "EventCode": "0xC5",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_MISP_RETIRED.ALL_BRANCHES",
+        "SampleAfterValue": "400009",
+        "BriefDescription": "All mispredicted branch instructions retired.",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts taken conditional mispredicted branch instructions retired.",
+        "EventCode": "0xc5",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_MISP_RETIRED.COND_TAKEN",
+        "SampleAfterValue": "400009",
+        "BriefDescription": "number of branch instructions retired that were mispredicted and taken. Non PEBS",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts mispredicted conditional branch instructions retired.",
+        "EventCode": "0xc5",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x11",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_MISP_RETIRED.COND",
+        "SampleAfterValue": "400009",
+        "BriefDescription": "Mispredicted conditional branch instructions retired.",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts number of near branch instructions retired that were mispredicted and taken.",
+        "EventCode": "0xC5",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_MISP_RETIRED.NEAR_TAKEN",
+        "SampleAfterValue": "400009",
+        "BriefDescription": "Number of near branch instructions retired that were mispredicted and taken.",
+        "Data_LA": "1"
+    },
+    {
+        "PEBS": "1",
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts all miss-predicted indirect branch instructions retired (excluding RETs. TSX aborts is considered indirect branch).",
+        "EventCode": "0xC5",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x80",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "BR_MISP_RETIRED.INDIRECT",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "All miss-predicted indirect branch instructions retired (excluding RETs. TSX aborts is considered indirect branch).",
+        "Data_LA": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Increments when an entry is added to the Last Branch Record (LBR) array (or removed from the array in case of RETURNs in call stack mode). The event requires LBR enable via IA32_DEBUGCTL MSR and branch type selection via MSR_LBR_SELECT.",
+        "EventCode": "0xcc",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "MISC_RETIRED.LBR_INSERTS",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Increments whenever there is an update to the LBR array."
+    },
+    {
+        "PublicDescription": "Counts number of retired PAUSE instructions (that do not end up with a VMExit to the VMM; TSX aborted Instructions may be counted).",
+        "EventCode": "0xcc",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x40",
+        "EventName": "MISC_RETIRED.PAUSE_INST",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of retired PAUSE instructions."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of times the front-end is resteered when it finds a branch instruction in a fetch line. This occurs for the first time a branch instruction is fetched or when the branch is not tracked by the BPU (Branch Prediction Unit) anymore.",
+        "EventCode": "0xE6",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "BACLEARS.ANY",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "This event distributes cycle counts between active hyperthreads, i.e., those in C0.  A hyperthread becomes inactive when it executes the HLT or MWAIT instructions.  If all other hyperthreads are inactive (or disabled or do not exist), all counts are attributed to this hyperthread. To obtain the full count when the Core is active, sum the counts from each hyperthread.",
+        "EventCode": "0xec",
+        "Counter": "0,1,2,3,4,5,6,7",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3,4,5,6,7",
+        "EventName": "CPU_CLK_UNHALTED.DISTRIBUTED",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Cycle counts are evenly distributed between active threads in the Core."
+    }
+]
\ No newline at end of file
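Once the pipeline.json above is installed under pmu-events, its EventName strings become perf event aliases, so entries can be selected by name instead of raw event/umask pairs. An illustrative (assumed) invocation:

    # branch and branch-misprediction counts via the named Icelake events
    perf stat -e br_inst_retired.near_taken,br_misp_retired.near_taken -a -- sleep 1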
diff --git a/tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json b/tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json
new file mode 100644
index 000000000000..7180a900c175
--- /dev/null
+++ b/tools/perf/pmu-events/arch/x86/icelake/virtual-memory.json
@@ -0,0 +1,236 @@
+[
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts page walks completed due to demand data loads whose address translations missed in the TLB and were mapped to 4K pages.  The page walks can end with or without a page fault.",
+        "EventCode": "0x08",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_4K",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Page walks completed due to a demand data load to a 4K page."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts page walks completed due to demand data loads whose address translations missed in the TLB and were mapped to 2M/4M pages.  The page walks can end with or without a page fault.",
+        "EventCode": "0x08",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED_2M_4M",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Page walks completed due to a demand data load to a 2M/4M page."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts demand data loads that caused a completed page walk of any page size (4K/2M/4M/1G). This implies it missed in all TLB levels. The page walk can end with or without a fault.",
+        "EventCode": "0x08",
+        "Counter": "0,1,2,3",
+        "UMask": "0xe",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_LOAD_MISSES.WALK_COMPLETED",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Load miss in all TLB levels causes a page walk that completes. (All page sizes)"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of page walks outstanding for a demand load in the PMH (Page Miss Handler) each cycle.",
+        "EventCode": "0x08",
+        "Counter": "0,1,2,3",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_LOAD_MISSES.WALK_PENDING",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of page walks outstanding for a demand load in the PMH each cycle."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles when at least one PMH (Page Miss Handler) is busy with a page walk for a demand load.",
+        "EventCode": "0x08",
+        "Counter": "0,1,2,3",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_LOAD_MISSES.WALK_ACTIVE",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Cycles when at least one PMH is busy with a page walk for a demand load.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts loads that miss the DTLB (Data TLB) and hit the STLB (Second level TLB).",
+        "EventCode": "0x08",
+        "Counter": "0,1,2,3",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_LOAD_MISSES.STLB_HIT",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Loads that miss the DTLB and hit the STLB."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts page walks completed due to demand data stores whose address translations missed in the TLB and were mapped to 4K pages.  The page walks can end with or without a page fault.",
+        "EventCode": "0x49",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_4K",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Page walks completed due to a demand data store to a 4K page."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts page walks completed due to demand data stores whose address translations missed in the TLB and were mapped to 2M/4M pages.  The page walks can end with or without a page fault.",
+        "EventCode": "0x49",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED_2M_4M",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Page walks completed due to a demand data store to a 2M/4M page."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts demand data stores that caused a completed page walk of any page size (4K/2M/4M/1G). This implies it missed in all TLB levels. The page walk can end with or without a fault.",
+        "EventCode": "0x49",
+        "Counter": "0,1,2,3",
+        "UMask": "0xe",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_STORE_MISSES.WALK_COMPLETED",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Store misses in all TLB levels causes a page walk that completes. (All page sizes)"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of page walks outstanding for a store in the PMH (Page Miss Handler) each cycle.",
+        "EventCode": "0x49",
+        "Counter": "0,1,2,3",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_STORE_MISSES.WALK_PENDING",
+        "SampleAfterValue": "2000003",
+        "BriefDescription": "Number of page walks outstanding for a store in the PMH each cycle."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles when at least one PMH (Page Miss Handler) is busy with a page walk for a store.",
+        "EventCode": "0x49",
+        "Counter": "0,1,2,3",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_STORE_MISSES.WALK_ACTIVE",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Cycles when at least one PMH is busy with a page walk for a store.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts stores that miss the DTLB (Data TLB) and hit the STLB (2nd Level TLB).",
+        "EventCode": "0x49",
+        "Counter": "0,1,2,3",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "DTLB_STORE_MISSES.STLB_HIT",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Stores that miss the DTLB and hit the STLB."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts completed page walks (4K page size) caused by a code fetch. This implies it missed in the ITLB and further levels of TLB. The page walk can end with or without a fault.",
+        "EventCode": "0x85",
+        "Counter": "0,1,2,3",
+        "UMask": "0x2",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ITLB_MISSES.WALK_COMPLETED_4K",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (4K)"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts code misses in all ITLB (Instruction TLB) levels that caused a completed page walk (2M and 4M page sizes). The page walk can end with or without a fault.",
+        "EventCode": "0x85",
+        "Counter": "0,1,2,3",
+        "UMask": "0x4",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ITLB_MISSES.WALK_COMPLETED_2M_4M",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (2M/4M)"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts completed page walks (2M and 4M page sizes) caused by a code fetch. This implies it missed in the ITLB (Instruction TLB) and further levels of TLB. The page walk can end with or without a fault.",
+        "EventCode": "0x85",
+        "Counter": "0,1,2,3",
+        "UMask": "0xe",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ITLB_MISSES.WALK_COMPLETED",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Code miss in all TLB levels causes a page walk that completes. (All page sizes)"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of page walks outstanding for an outstanding code (instruction fetch) request in the PMH (Page Miss Handler) each cycle.",
+        "EventCode": "0x85",
+        "Counter": "0,1,2,3",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ITLB_MISSES.WALK_PENDING",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Number of page walks outstanding for an outstanding code request in the PMH each cycle."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts cycles when at least one PMH (Page Miss Handler) is busy with a page walk for a code (instruction fetch) request.",
+        "EventCode": "0x85",
+        "Counter": "0,1,2,3",
+        "UMask": "0x10",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ITLB_MISSES.WALK_ACTIVE",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Cycles when at least one PMH is busy with a page walk for code (instruction fetch) request.",
+        "CounterMask": "1"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts instruction fetch requests that miss the ITLB (Instruction TLB) and hit the STLB (Second-level TLB).",
+        "EventCode": "0x85",
+        "Counter": "0,1,2,3",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ITLB_MISSES.STLB_HIT",
+        "SampleAfterValue": "100003",
+        "BriefDescription": "Instruction fetch requests that miss the ITLB and hit the STLB."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of flushes of the big or small ITLB pages. Counting includes both TLB Flush (covering all sets) and TLB Set Clear (set-specific).",
+        "EventCode": "0xAE",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "ITLB.ITLB_FLUSH",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages."
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of DTLB flush attempts of the thread-specific entries.",
+        "EventCode": "0xBD",
+        "Counter": "0,1,2,3",
+        "UMask": "0x1",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "TLB_FLUSH.DTLB_THREAD",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "DTLB flush attempts of the thread-specific entries"
+    },
+    {
+        "CollectPEBSRecord": "2",
+        "PublicDescription": "Counts the number of any STLB flush attempts (such as entire, VPID, PCID, InvPage, CR3 write, etc.).",
+        "EventCode": "0xBD",
+        "Counter": "0,1,2,3",
+        "UMask": "0x20",
+        "PEBScounters": "0,1,2,3",
+        "EventName": "TLB_FLUSH.STLB_ANY",
+        "SampleAfterValue": "100007",
+        "BriefDescription": "STLB flush attempts"
+    }
+]
\ No newline at end of file
diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv
index e05c2c8458fc..7e5947aa42e1 100644
--- a/tools/perf/pmu-events/arch/x86/mapfile.csv
+++ b/tools/perf/pmu-events/arch/x86/mapfile.csv
@@ -33,3 +33,4 @@ GenuineIntel-6-25,v2,westmereep-sp,core
 GenuineIntel-6-2F,v2,westmereex,core
 GenuineIntel-6-55-[01234],v1,skylakex,core
 GenuineIntel-6-55-[56789ABCDEF],v1,cascadelakex,core
+GenuineIntel-6-7E,v1,icelake,core
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 06/23] perf/x86: Support constraint ranges
  2019-03-21 20:56 ` [PATCH V2 06/23] perf/x86: Support constraint ranges kan.liang
@ 2019-03-21 21:09   ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-03-21 21:09 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Thu, Mar 21, 2019 at 01:56:46PM -0700, kan.liang@linux.intel.com wrote:
> From: Andi Kleen <ak@linux.intel.com>

That might be a stretch at this point...


> @@ -263,18 +268,25 @@ struct cpu_hw_events {
>  	void				*kfree_on_online[X86_PERF_KFREE_MAX];
>  };
>  
> -#define __EVENT_CONSTRAINT(c, n, m, w, o, f) {\
> +#define __EVENT_CONSTRAINT_RANGE(c, e, n, m, w, o, f) {	\
>  	{ .idxmsk64 = (n) },		\
>  	.code = (c),			\
> +	.size = (e) - (c),		\
>  	.cmask = (m),			\
>  	.weight = (w),			\
>  	.overlap = (o),			\
>  	.flags = f,			\
>  }
>  
> +#define __EVENT_CONSTRAINT(c, n, m, w, o, f) \
> +	__EVENT_CONSTRAINT_RANGE(c, c, n, m, w, o, f)
> +
>  #define EVENT_CONSTRAINT(c, n, m)	\
>  	__EVENT_CONSTRAINT(c, n, m, HWEIGHT(n), 0, 0)
>  
> +#define EVENT_CONSTRAINT_RANGE(c, e, n, m) \
> +	__EVENT_CONSTRAINT_RANGE(c, e, n, m, HWEIGHT(n), 0, 0)
> +

^ that one needs a comment that it doesn't work for AMD events.
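
For illustration, one possible shape for such a comment plus a use of the range macro; the rationale given here is a guess (AMD event codes span more bits than the 8-bit Intel event field), and the example event codes are made up, not taken from the Icelake tables:

	/*
	 * Event-code ranges compare only the 8-bit Intel event field
	 * (ARCH_PERFMON_EVENTSEL_EVENT); do not use them for AMD, whose
	 * event codes span additional bits (AMD64_EVENTSEL_EVENT).
	 */
	#define EVENT_CONSTRAINT_RANGE(c, e, n, m) \
		__EVENT_CONSTRAINT_RANGE(c, e, n, m, HWEIGHT(n), 0, 0)

	/* Hypothetical use: constrain event codes 0xd0-0xd3 to counters 0-3. */
	EVENT_CONSTRAINT_RANGE(0xd0, 0xd3, 0xf, ARCH_PERFMON_EVENTSEL_EVENT),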

>  #define INTEL_EXCLEVT_CONSTRAINT(c, n)	\
>  	__EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT, HWEIGHT(n),\
>  			   0, PERF_X86_EVENT_EXCL)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4
  2019-03-21 20:56 ` [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4 kan.liang
@ 2019-03-21 21:13   ` Peter Zijlstra
  2019-03-21 21:17   ` Peter Zijlstra
  2019-03-21 21:20   ` Peter Zijlstra
  2 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-03-21 21:13 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Thu, Mar 21, 2019 at 01:56:44PM -0700, kan.liang@linux.intel.com wrote:
> @@ -1434,20 +1692,20 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
>  		return;
>  
>  	while (count > 1) {
> -		setup_pebs_sample_data(event, iregs, at, &data, &regs);
> -		perf_event_output(event, &data, &regs);
> -		at += x86_pmu.pebs_record_size;
> +		x86_pmu.setup_pebs_sample_data(event, iregs, at, &data, regs);
> +		perf_event_output(event, &data, regs);
> +		at = next_pebs_record(at);
>  		at = get_next_pebs_record_by_bit(at, top, bit);
>  		count--;
>  	}
>  
> -	setup_pebs_sample_data(event, iregs, at, &data, &regs);
> +	x86_pmu.setup_pebs_sample_data(event, iregs, at, &data, regs);

If you make the setup_pebs_sample_data() a function pointer argument of
this function.

>  
>  	/*
>  	 * All but the last records are processed.
>  	 * The last one is left to be able to call the overflow handler.
>  	 */
> -	if (perf_event_overflow(event, &data, &regs)) {
> +	if (perf_event_overflow(event, &data, regs)) {
>  		x86_pmu_stop(event, 0);
>  		return;
>  	}
> @@ -1626,6 +1884,59 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>  	}
>  }
>  
> +static void intel_pmu_drain_pebs_icl(struct pt_regs *iregs)
> +{
> +	short counts[INTEL_PMC_IDX_FIXED + MAX_FIXED_PEBS_EVENTS] = {};
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	struct debug_store *ds = cpuc->ds;
> +	struct perf_event *event;
> +	void *base, *at, *top;
> +	int bit, size;
> +	u64 mask;
> +
> +	if (!x86_pmu.pebs_active)
> +		return;
> +
> +	base = (struct pebs_basic *)(unsigned long)ds->pebs_buffer_base;
> +	top = (struct pebs_basic *)(unsigned long)ds->pebs_index;
> +
> +	ds->pebs_index = ds->pebs_buffer_base;
> +
> +	mask = ((1ULL << x86_pmu.max_pebs_events) - 1) |
> +	       (((1ULL << x86_pmu.num_counters_fixed) - 1) << INTEL_PMC_IDX_FIXED);
> +	size = INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed;
> +
> +	if (unlikely(base >= top)) {
> +		intel_pmu_pebs_event_update_no_drain(cpuc, size);
> +		return;
> +	}
> +
> +	for (at = base; at < top; at = next_pebs_record(at)) {
> +		u64 pebs_status;
> +
> +		pebs_status = get_pebs_status(at) & cpuc->pebs_enabled;
> +		pebs_status &= mask;
> +
> +		for_each_set_bit(bit, (unsigned long *)&pebs_status, size)
> +			counts[bit]++;
> +	}
> +
> +	for (bit = 0; bit < size; bit++) {
> +		if (counts[bit] == 0)
> +			continue;
> +
> +		event = cpuc->events[bit];
> +		if (WARN_ON_ONCE(!event))
> +			continue;
> +
> +		if (WARN_ON_ONCE(!event->attr.precise_ip))
> +			continue;
> +
> +		__intel_pmu_pebs_event(event, iregs, base,
> +				       top, bit, counts[bit]);

		__intel_pmu_pebs_event(event, iregs, base, top, bit,
				       counts[bit], setup_adaptive_pebs_sample_data);

> +	}
> +}

And we can do away with that x86_pmu method..
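
A rough sketch of that direction, with setup_pebs_sample_data() passed as a function pointer; the typedef name and the adaptive setup function name are placeholders, not taken from the patch:

	typedef void (*setup_fn)(struct perf_event *event, struct pt_regs *iregs,
				 void *at, struct perf_sample_data *data,
				 struct pt_regs *regs);

	static void __intel_pmu_pebs_event(struct perf_event *event,
					   struct pt_regs *iregs,
					   void *base, void *top, int bit,
					   int count, setup_fn setup_sample);

	/*
	 * The Icelake drain loop then passes the adaptive parser directly,
	 * the older drain paths keep passing setup_pebs_sample_data(), and
	 * the x86_pmu.setup_pebs_sample_data method can be dropped:
	 */
	__intel_pmu_pebs_event(event, iregs, base, top, bit, counts[bit],
			       setup_adaptive_pebs_sample_data);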

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4
  2019-03-21 20:56 ` [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4 kan.liang
  2019-03-21 21:13   ` Peter Zijlstra
@ 2019-03-21 21:17   ` Peter Zijlstra
  2019-03-21 21:20   ` Peter Zijlstra
  2 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-03-21 21:17 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Thu, Mar 21, 2019 at 01:56:44PM -0700, kan.liang@linux.intel.com wrote:
> +static inline void *next_pebs_record(void *p)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	unsigned int size;
> +
> +	if (x86_pmu.intel_cap.pebs_format < 4)
> +		size = x86_pmu.pebs_record_size;
> +	else
> +		size = cpuc->pebs_record_size;
> +	return p + size;
> +}

> @@ -1323,19 +1580,19 @@ get_next_pebs_record_by_bit(void *base, void *top, int bit)
>  	if (base == NULL)
>  		return NULL;
>  
> -	for (at = base; at < top; at += x86_pmu.pebs_record_size) {
> -		struct pebs_record_nhm *p = at;
> +	for (at = base; at < top; at = next_pebs_record(at)) {
> +		unsigned long status = get_pebs_status(at);

afaict we do not mix base and adaptive records, and thus the above
really could use cpuc->pebs_record_size unconditionally, right?
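
If so, a simplified walk could use the per-CPU size directly, along the lines of this sketch (assuming cpuc->pebs_record_size is also kept valid for the pre-adaptive formats):

	/* Sketch: step through PEBS records with the per-CPU record size. */
	for (at = base; at < top; at += cpuc->pebs_record_size) {
		unsigned long status = get_pebs_status(at);

		/* ... same status checks as today ... */
	}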



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4
  2019-03-21 20:56 ` [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4 kan.liang
  2019-03-21 21:13   ` Peter Zijlstra
  2019-03-21 21:17   ` Peter Zijlstra
@ 2019-03-21 21:20   ` Peter Zijlstra
  2019-03-22  0:40     ` Liang, Kan
  2 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-03-21 21:20 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Thu, Mar 21, 2019 at 01:56:44PM -0700, kan.liang@linux.intel.com wrote:
> @@ -933,6 +1001,34 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc, struct pmu *pmu)
>  		update = true;
>  	}
>  
> +	/*
> +	 * The PEBS record doesn't shrink on del, because computing an
> +	 * accurate config would require walking all existing PEBS events,
> +	 * which isn't necessary.  A bigger-than-needed PEBS record is
> +	 * harmless, apart from a small performance impact.
> +	 * Also, in most cases the same PEBS config applies to all
> +	 * PEBS events.
> +	 */
> +	if (x86_pmu.intel_cap.pebs_baseline && add) {
> +		u64 pebs_data_cfg;
> +
> +		/* Clear pebs_data_cfg and pebs_record_size for first PEBS. */
> +		if (cpuc->n_pebs == 1) {
> +			cpuc->pebs_data_cfg = 0;
> +			cpuc->pebs_record_size = sizeof(struct pebs_basic);
> +		}

Argh, no. This is daft. The previous site was fine, it was just the
pebs_record_size assignment I'm confused about.

Note how by setting ->pebs_data_cfg to 0, you force the below branch to
true and call adaptive_pebs_record_size_update()? So _why_ do you have
to set pebs_record_size()?

> +
> +		pebs_data_cfg = pebs_update_adaptive_cfg(event);
> +
> +		/* Update pebs_record_size if new event requires more data. */
> +		if (pebs_data_cfg & ~cpuc->pebs_data_cfg) {
> +			cpuc->pebs_data_cfg |= pebs_data_cfg;
> +			adaptive_pebs_record_size_update();
> +			update = true;
> +		}
> +	}
> +
>  	if (update)
>  		pebs_update_threshold(cpuc);
>  }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4
  2019-03-21 21:20   ` Peter Zijlstra
@ 2019-03-22  0:40     ` Liang, Kan
  2019-03-22 12:30       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Liang, Kan @ 2019-03-22  0:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 3/21/2019 5:20 PM, Peter Zijlstra wrote:
> On Thu, Mar 21, 2019 at 01:56:44PM -0700, kan.liang@linux.intel.com wrote:
>> @@ -933,6 +1001,34 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc, struct pmu *pmu)
>>   		update = true;
>>   	}
>>   
>> +	/*
>> +	 * The PEBS record doesn't shrink on del, because computing an
>> +	 * accurate config would require walking all existing PEBS events,
>> +	 * which isn't necessary.  A bigger-than-needed PEBS record is
>> +	 * harmless, apart from a small performance impact.
>> +	 * Also, in most cases the same PEBS config applies to all
>> +	 * PEBS events.
>> +	 */
>> +	if (x86_pmu.intel_cap.pebs_baseline && add) {
>> +		u64 pebs_data_cfg;
>> +
>> +		/* Clear pebs_data_cfg and pebs_record_size for first PEBS. */
>> +		if (cpuc->n_pebs == 1) {
>> +			cpuc->pebs_data_cfg = 0;
>> +			cpuc->pebs_record_size = sizeof(struct pebs_basic);
>> +		}
> 
> Argh, no. This is daft. The previous site was fine, it was just the
> pebs_record_size assignment I'm confused about.
> 
> Note how by setting ->pebs_data_cfg to 0, you force the below branch to
> true and call adaptive_pebs_record_size_update()? So _why_ do you have
> to set pebs_record_size()?
>

I think we have to reset both cpuc->pebs_data_cfg and
cpuc->pebs_record_size, because pebs_update_adaptive_cfg() can return 0.
In that case adaptive_pebs_record_size_update() will not be called, and
cpuc->pebs_record_size would still hold stale data, which may be wrong.

I see no difference between resetting them on the first add or on the
last del, so I will keep the code here unchanged.
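
To spell out the corner case, here is the same hunk again as an annotated sketch (the added comments are illustrative only, not part of the patch):

	if (x86_pmu.intel_cap.pebs_baseline && add) {
		u64 pebs_data_cfg;

		/*
		 * First PEBS event after the previous mix was deleted:
		 * without this reset, pebs_data_cfg/pebs_record_size keep
		 * whatever that old mix needed (e.g. LBR entries).
		 */
		if (cpuc->n_pebs == 1) {
			cpuc->pebs_data_cfg = 0;
			cpuc->pebs_record_size = sizeof(struct pebs_basic);
		}

		pebs_data_cfg = pebs_update_adaptive_cfg(event);

		/*
		 * If the new event only needs the basic record, this is 0,
		 * the branch below is not taken and the record-size update
		 * is skipped, so the reset above is what keeps the size
		 * correct.
		 */
		if (pebs_data_cfg & ~cpuc->pebs_data_cfg) {
			cpuc->pebs_data_cfg |= pebs_data_cfg;
			adaptive_pebs_record_size_update();
			update = true;
		}
	}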

I will prepare V3 to address other comments.

Thanks,
Kan

>> +
>> +		pebs_data_cfg = pebs_update_adaptive_cfg(event);
>> +
>> +		/* Update pebs_record_size if new event requires more data. */
>> +		if (pebs_data_cfg & ~cpuc->pebs_data_cfg) {
>> +			cpuc->pebs_data_cfg |= pebs_data_cfg;
>> +			adaptive_pebs_record_size_update();
>> +			update = true;
>> +		}
>> +	}
>> +
>>   	if (update)
>>   		pebs_update_threshold(cpuc);
>>   }

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4
  2019-03-22  0:40     ` Liang, Kan
@ 2019-03-22 12:30       ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-03-22 12:30 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Thu, Mar 21, 2019 at 08:40:03PM -0400, Liang, Kan wrote:
> 
> 
> On 3/21/2019 5:20 PM, Peter Zijlstra wrote:
> > On Thu, Mar 21, 2019 at 01:56:44PM -0700, kan.liang@linux.intel.com wrote:
> > > @@ -933,6 +1001,34 @@ pebs_update_state(bool needed_cb, struct cpu_hw_events *cpuc, struct pmu *pmu)
> > >   		update = true;
> > >   	}
> > > +	/*
> > > +	 * The PEBS record doesn't shrink on del, because computing an
> > > +	 * accurate config would require walking all existing PEBS events,
> > > +	 * which isn't necessary.  A bigger-than-needed PEBS record is
> > > +	 * harmless, apart from a small performance impact.
> > > +	 * Also, in most cases the same PEBS config applies to all
> > > +	 * PEBS events.
> > > +	 */
> > > +	if (x86_pmu.intel_cap.pebs_baseline && add) {
> > > +		u64 pebs_data_cfg;
> > > +
> > > +		/* Clear pebs_data_cfg and pebs_record_size for first PEBS. */
> > > +		if (cpuc->n_pebs == 1) {
> > > +			cpuc->pebs_data_cfg = 0;
> > > +			cpuc->pebs_record_size = sizeof(struct pebs_basic);
> > > +		}
> > 
> > Argh, no. This is daft. The previous site was fine, it was just the
> > pebs_record_size assignment I'm confused about.
> > 
> > Note how by setting ->pebs_data_cfg to 0, you force the below branch to
> > true and call adaptive_pebs_record_size_update()? So _why_ do you have
> > to set pebs_record_size()?
> > 
> 
> I think we have to reset both cpuc->pebs_data_cfg and
> cpuc->pebs_record_size, because pebs_update_adaptive_cfg() can return 0.
> In that case adaptive_pebs_record_size_update() will not be called, and
> cpuc->pebs_record_size would still hold stale data, which may be wrong.

Oh, bugger.. I see.

> > > +
> > > +		pebs_data_cfg = pebs_update_adaptive_cfg(event);
> > > +
> > > +		/* Update pebs_record_size if new event requires more data. */
> > > +		if (pebs_data_cfg & ~cpuc->pebs_data_cfg) {
> > > +			cpuc->pebs_data_cfg |= pebs_data_cfg;
> > > +			adaptive_pebs_record_size_update();
> > > +			update = true;
> > > +		}
> > > +	}
> > > +
> > >   	if (update)
> > >   		pebs_update_threshold(cpuc);
> > >   }

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-03-22 12:30 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-21 20:56 [PATCH V2 00/23] perf: Add Icelake support kan.liang
2019-03-21 20:56 ` [PATCH V2 01/23] perf/x86: Support outputting XMM registers kan.liang
2019-03-21 20:56 ` [PATCH V2 02/23] perf/x86/intel: Extract memory code PEBS parser for reuse kan.liang
2019-03-21 20:56 ` [PATCH V2 03/23] perf/x86/intel/ds: Extract code of event update in short period kan.liang
2019-03-21 20:56 ` [PATCH V2 04/23] perf/x86/intel: Support adaptive PEBSv4 kan.liang
2019-03-21 21:13   ` Peter Zijlstra
2019-03-21 21:17   ` Peter Zijlstra
2019-03-21 21:20   ` Peter Zijlstra
2019-03-22  0:40     ` Liang, Kan
2019-03-22 12:30       ` Peter Zijlstra
2019-03-21 20:56 ` [PATCH V2 05/23] perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them kan.liang
2019-03-21 20:56 ` [PATCH V2 06/23] perf/x86: Support constraint ranges kan.liang
2019-03-21 21:09   ` Peter Zijlstra
2019-03-21 20:56 ` [PATCH V2 07/23] perf/x86/intel: Add Icelake support kan.liang
2019-03-21 20:56 ` [PATCH V2 08/23] perf/x86/intel/cstate: " kan.liang
2019-03-21 20:56 ` [PATCH V2 09/23] perf/x86/intel/rapl: " kan.liang
2019-03-21 20:56 ` [PATCH V2 10/23] perf/x86/msr: " kan.liang
2019-03-21 20:56 ` [PATCH V2 11/23] perf/x86/intel/uncore: Add Intel Icelake uncore support kan.liang
2019-03-21 20:56 ` [PATCH V2 12/23] perf/core: Support a REMOVE transaction kan.liang
2019-03-21 20:56 ` [PATCH V2 13/23] perf/x86/intel: Basic support for metrics counters kan.liang
2019-03-21 20:56 ` [PATCH V2 14/23] perf/x86/intel: Support overflows on SLOTS kan.liang
2019-03-21 20:56 ` [PATCH V2 15/23] perf/x86/intel: Support hardware TopDown metrics kan.liang
2019-03-21 20:56 ` [PATCH V2 16/23] perf/x86/intel: Set correct weight for topdown subevent counters kan.liang
2019-03-21 20:56 ` [PATCH V2 17/23] perf/x86/intel: Export new top down events for Icelake kan.liang
2019-03-21 20:56 ` [PATCH V2 18/23] perf/x86/intel: Disable sampling read slots and topdown kan.liang
2019-03-21 20:56 ` [PATCH V2 19/23] perf/x86/intel: Support CPUID 10.ECX to disable fixed counters kan.liang
2019-03-21 20:57 ` [PATCH V2 20/23] perf, tools: Add support for recording and printing XMM registers kan.liang
2019-03-21 20:57 ` [PATCH V2 21/23] perf, tools, stat: Support new per thread TopDown metrics kan.liang
2019-03-21 20:57 ` [PATCH V2 22/23] perf, tools: Add documentation for topdown metrics kan.liang
2019-03-21 20:57 ` [PATCH V2 23/23] perf vendor events intel: Add JSON files for Icelake kan.liang
