* [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support
@ 2022-08-11 12:29 Sandipan Das
  2022-08-11 12:29 ` [PATCH 01/13] perf/x86/amd/brs: Move feature-specific functions Sandipan Das
                   ` (12 more replies)
  0 siblings, 13 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

Last Branch Record (LBR) is a feature available on modern processors for
recording branch information. It helps determine the flow of control by
logging branches to registers in real time and aids the detection of hot
code paths.

Add support for using AMD Last Branch Record Extension Version 2 (LbrExtV2)
features on Zen 4 processors. New CPU features are introduced for LbrExtV2
detection. New MSR definitions are added for configuring hardware branch
filtering and for enabling the LBR Freeze on PMI feature.

The LBR Freeze on PMI feature is essential for keeping branch records
consistent with the point of PMU overflow, providing a precise correlation
between the two.

Hardware branch filtering allows users to record only specific types of
branches and can be mapped to most of the existing filters supported by the
perf tool. Additional software filtering ensures that special branches for
which no direct hardware filter exists (e.g. syscall entry and exit) are
also recorded. This expands the scope of filters like "any_call".
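
For example, restricting recording to calls made at the user level can be
requested via the perf tool's branch filter option (standard perf syntax,
shown purely as an illustration):

  $ perf record -j any_call,u -e cycles:u <workload>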

Additionally, the perf UAPI is now extended to provide branch speculation
information, if available. LbrExtV2 provides this information through the
"valid" and "spec" bits in the Branch To registers. The tools-side changes
for this will be submitted as a separate series.
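
For reference, the "spec" and "valid" bits live in the upper bits of each
Branch To register (see the branch_entry layout added later in this
series). The helpers below are illustrative only and not part of the
series:

  /* Sketch: extract the speculation bits from a raw LBR To value */
  static inline bool lbr_to_valid(u64 to_reg)
  {
          return to_reg & BIT_ULL(63);    /* "valid": a branch was logged */
  }

  static inline bool lbr_to_spec(u64 to_reg)
  {
          return to_reg & BIT_ULL(62);    /* "spec": speculatively executed */
  }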

Users of the perf tool can now record branches as shown below. The 'div'
workload used here is from https://lwn.net/Articles/680985/.

E.g.

  $ perf record -b -e cycles:u ./div

Before:

  Error:
  cycles:u: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'

After:

  [ perf record: Woken up 49 times to write data ]
  [ perf record: Captured and wrote 12.197 MB perf.data (29601 samples) ]

  $ perf report --stdio

  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 473K of event 'cycles:u'
  # Event count (approx.): 473521
  #
  # Overhead  Command  Source Shared Object  Source Symbol           Target Symbol           Basic Block Cycles
  # ........  .......  ....................  ......................  ......................  ..................
  #
      29.69%  div      div                   [.] main                [.] main                -
      23.84%  div      div                   [.] compute_flag        [.] main                -
      23.41%  div      div                   [.] compute_flag        [.] compute_flag        -
      23.04%  div      div                   [.] main                [.] compute_flag        -
  [...]

No additional failures are seen upon running the following:
  * perf built-in test suite
  * perf_event_tests suite

Sandipan Das (13):
  perf/x86/amd/brs: Move feature-specific functions
  perf/x86/amd/core: Refactor branch attributes
  perf/x86/amd/core: Add generic branch record interfaces
  x86/cpufeatures: Add LbrExtV2 feature bit
  perf/x86/amd/lbr: Detect LbrExtV2 support
  perf/x86/amd/lbr: Add LbrExtV2 branch record support
  perf/x86/amd/lbr: Add LbrExtV2 hardware branch filter support
  perf/x86: Move branch classifier
  perf/x86/amd/lbr: Add LbrExtV2 software branch filter support
  perf/x86: Make branch classifier fusion-aware
  perf/x86/amd/lbr: Use fusion-aware branch classifier
  perf/core: Add speculation info to branch entries
  perf/x86/amd/lbr: Add LbrExtV2 branch speculation info support

 arch/x86/events/Makefile           |   2 +-
 arch/x86/events/amd/Makefile       |   2 +-
 arch/x86/events/amd/brs.c          |  69 ++++-
 arch/x86/events/amd/core.c         | 200 +++++++------
 arch/x86/events/amd/lbr.c          | 435 +++++++++++++++++++++++++++++
 arch/x86/events/intel/lbr.c        | 273 ------------------
 arch/x86/events/perf_event.h       |  81 +++++-
 arch/x86/events/utils.c            | 247 ++++++++++++++++
 arch/x86/include/asm/cpufeatures.h |   2 +-
 arch/x86/include/asm/msr-index.h   |   5 +
 arch/x86/include/asm/perf_event.h  |   3 +-
 arch/x86/kernel/cpu/scattered.c    |   1 +
 include/linux/perf_event.h         |   1 +
 include/uapi/linux/perf_event.h    |  15 +-
 14 files changed, 952 insertions(+), 384 deletions(-)
 create mode 100644 arch/x86/events/amd/lbr.c
 create mode 100644 arch/x86/events/utils.c

-- 
2.34.1



* [PATCH 01/13] perf/x86/amd/brs: Move feature-specific functions
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:29 ` [PATCH 02/13] perf/x86/amd/core: Refactor branch attributes Sandipan Das
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

Move some of the Branch Sampling (BRS) specific functions out of the Core
events sources and into the BRS sources in preparation for adding other
mechanisms to record branches.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/brs.c    | 69 ++++++++++++++++++++++++++++++++++-
 arch/x86/events/amd/core.c   | 70 ++----------------------------------
 arch/x86/events/perf_event.h |  7 ++--
 3 files changed, 76 insertions(+), 70 deletions(-)

diff --git a/arch/x86/events/amd/brs.c b/arch/x86/events/amd/brs.c
index bee8765a1e9b..f1bff153d945 100644
--- a/arch/x86/events/amd/brs.c
+++ b/arch/x86/events/amd/brs.c
@@ -81,7 +81,7 @@ static bool __init amd_brs_detect(void)
  * a br_sel_map. Software filtering is not supported because it would not correlate well
  * with a sampling period.
  */
-int amd_brs_setup_filter(struct perf_event *event)
+static int amd_brs_setup_filter(struct perf_event *event)
 {
 	u64 type = event->attr.branch_sample_type;
 
@@ -96,6 +96,73 @@ int amd_brs_setup_filter(struct perf_event *event)
 	return 0;
 }
 
+static inline int amd_is_brs_event(struct perf_event *e)
+{
+	return (e->hw.config & AMD64_RAW_EVENT_MASK) == AMD_FAM19H_BRS_EVENT;
+}
+
+int amd_brs_hw_config(struct perf_event *event)
+{
+	int ret = 0;
+
+	/*
+	 * Due to interrupt holding, BRS is not recommended in
+	 * counting mode.
+	 */
+	if (!is_sampling_event(event))
+		return -EINVAL;
+
+	/*
+	 * Due to the way BRS operates by holding the interrupt until
+	 * lbr_nr entries have been captured, it does not make sense
+	 * to allow sampling on BRS with an event that does not match
+	 * what BRS is capturing, i.e., retired taken branches.
+	 * Otherwise the correlation with the event's period is even
+	 * more loose:
+	 *
+	 * With retired taken branch:
+	 *   Effective P = P + 16 + X
+	 * With any other event:
+	 *   Effective P = P + Y + X
+	 *
+	 * Where X is the number of taken branches due to interrupt
+	 * skid. Skid is large.
+	 *
+	 * Where Y is the occurrences of the event while BRS is
+	 * capturing the lbr_nr entries.
+	 *
+	 * By using retired taken branches, we limit the impact on the
+	 * Y variable. We know it cannot be more than the depth of
+	 * BRS.
+	 */
+	if (!amd_is_brs_event(event))
+		return -EINVAL;
+
+	/*
+	 * BRS implementation does not work with frequency mode
+	 * reprogramming of the period.
+	 */
+	if (event->attr.freq)
+		return -EINVAL;
+	/*
+	 * The kernel subtracts BRS depth from period, so it must
+	 * be big enough.
+	 */
+	if (event->attr.sample_period <= x86_pmu.lbr_nr)
+		return -EINVAL;
+
+	/*
+	 * Check if we can allow PERF_SAMPLE_BRANCH_STACK
+	 */
+	ret = amd_brs_setup_filter(event);
+
+	/* only set in case of success */
+	if (!ret)
+		event->hw.flags |= PERF_X86_EVENT_AMD_BRS;
+
+	return ret;
+}
+
 /* tos = top of stack, i.e., last valid entry written */
 static inline int amd_brs_get_tos(union amd_debug_extn_cfg *cfg)
 {
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 9ac3718410ce..e32a27899e11 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -330,16 +330,8 @@ static inline bool amd_is_pair_event_code(struct hw_perf_event *hwc)
 	}
 }
 
-#define AMD_FAM19H_BRS_EVENT 0xc4 /* RETIRED_TAKEN_BRANCH_INSTRUCTIONS */
-static inline int amd_is_brs_event(struct perf_event *e)
-{
-	return (e->hw.config & AMD64_RAW_EVENT_MASK) == AMD_FAM19H_BRS_EVENT;
-}
-
 static int amd_core_hw_config(struct perf_event *event)
 {
-	int ret = 0;
-
 	if (event->attr.exclude_host && event->attr.exclude_guest)
 		/*
 		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
@@ -356,66 +348,10 @@ static int amd_core_hw_config(struct perf_event *event)
 	if ((x86_pmu.flags & PMU_FL_PAIR) && amd_is_pair_event_code(&event->hw))
 		event->hw.flags |= PERF_X86_EVENT_PAIR;
 
-	/*
-	 * if branch stack is requested
-	 */
-	if (has_branch_stack(event)) {
-		/*
-		 * Due to interrupt holding, BRS is not recommended in
-		 * counting mode.
-		 */
-		if (!is_sampling_event(event))
-			return -EINVAL;
-
-		/*
-		 * Due to the way BRS operates by holding the interrupt until
-		 * lbr_nr entries have been captured, it does not make sense
-		 * to allow sampling on BRS with an event that does not match
-		 * what BRS is capturing, i.e., retired taken branches.
-		 * Otherwise the correlation with the event's period is even
-		 * more loose:
-		 *
-		 * With retired taken branch:
-		 *   Effective P = P + 16 + X
-		 * With any other event:
-		 *   Effective P = P + Y + X
-		 *
-		 * Where X is the number of taken branches due to interrupt
-		 * skid. Skid is large.
-		 *
-		 * Where Y is the occurences of the event while BRS is
-		 * capturing the lbr_nr entries.
-		 *
-		 * By using retired taken branches, we limit the impact on the
-		 * Y variable. We know it cannot be more than the depth of
-		 * BRS.
-		 */
-		if (!amd_is_brs_event(event))
-			return -EINVAL;
+	if (has_branch_stack(event))
+		return amd_brs_hw_config(event);
 
-		/*
-		 * BRS implementation does not work with frequency mode
-		 * reprogramming of the period.
-		 */
-		if (event->attr.freq)
-			return -EINVAL;
-		/*
-		 * The kernel subtracts BRS depth from period, so it must
-		 * be big enough.
-		 */
-		if (event->attr.sample_period <= x86_pmu.lbr_nr)
-			return -EINVAL;
-
-		/*
-		 * Check if we can allow PERF_SAMPLE_BRANCH_STACK
-		 */
-		ret = amd_brs_setup_filter(event);
-
-		/* only set in case of success */
-		if (!ret)
-			event->hw.flags |= PERF_X86_EVENT_AMD_BRS;
-	}
-	return ret;
+	return 0;
 }
 
 static inline int amd_is_nb_event(struct hw_perf_event *hwc)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ca2f8bfe6ff1..6d23e88d232c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1231,6 +1231,9 @@ static inline bool fixed_counter_disabled(int i, struct pmu *pmu)
 int amd_pmu_init(void);
 
 #ifdef CONFIG_PERF_EVENTS_AMD_BRS
+
+#define AMD_FAM19H_BRS_EVENT 0xc4 /* RETIRED_TAKEN_BRANCH_INSTRUCTIONS */
+
 int amd_brs_init(void);
 void amd_brs_disable(void);
 void amd_brs_enable(void);
@@ -1239,7 +1242,7 @@ void amd_brs_disable_all(void);
 void amd_brs_drain(void);
 void amd_brs_lopwr_init(void);
 void amd_brs_disable_all(void);
-int amd_brs_setup_filter(struct perf_event *event);
+int amd_brs_hw_config(struct perf_event *event);
 void amd_brs_reset(void);
 
 static inline void amd_pmu_brs_add(struct perf_event *event)
@@ -1275,7 +1278,7 @@ static inline void amd_brs_enable(void) {}
 static inline void amd_brs_drain(void) {}
 static inline void amd_brs_lopwr_init(void) {}
 static inline void amd_brs_disable_all(void) {}
-static inline int amd_brs_setup_filter(struct perf_event *event)
+static inline int amd_brs_hw_config(struct perf_event *event)
 {
 	return 0;
 }
-- 
2.34.1



* [PATCH 02/13] perf/x86/amd/core: Refactor branch attributes
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
  2022-08-11 12:29 ` [PATCH 01/13] perf/x86/amd/brs: Move feature-specific functions Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:29 ` [PATCH 03/13] perf/x86/amd/core: Add generic branch record interfaces Sandipan Das
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

AMD processors that are capable of recording branches support either Branch
Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last
Branch Record Extension Version 2 (LbrExtV2) support, reuse the "branches"
capability to advertise information about both BRS and LBR but make the
"branch-brs" event exclusive to Family 19h processors that support BRS.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/core.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index e32a27899e11..2f524cf84528 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -1247,23 +1247,25 @@ static ssize_t branches_show(struct device *cdev,
 
 static DEVICE_ATTR_RO(branches);
 
-static struct attribute *amd_pmu_brs_attrs[] = {
+static struct attribute *amd_pmu_branches_attrs[] = {
 	&dev_attr_branches.attr,
 	NULL,
 };
 
 static umode_t
-amd_brs_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+amd_branches_is_visible(struct kobject *kobj, struct attribute *attr, int i)
 {
 	return x86_pmu.lbr_nr ? attr->mode : 0;
 }
 
-static struct attribute_group group_caps_amd_brs = {
+static struct attribute_group group_caps_amd_branches = {
 	.name  = "caps",
-	.attrs = amd_pmu_brs_attrs,
-	.is_visible = amd_brs_is_visible,
+	.attrs = amd_pmu_branches_attrs,
+	.is_visible = amd_branches_is_visible,
 };
 
+#ifdef CONFIG_PERF_EVENTS_AMD_BRS
+
 EVENT_ATTR_STR(branch-brs, amd_branch_brs,
 	       "event=" __stringify(AMD_FAM19H_BRS_EVENT)"\n");
 
@@ -1272,15 +1274,26 @@ static struct attribute *amd_brs_events_attrs[] = {
 	NULL,
 };
 
+static umode_t
+amd_brs_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	return static_cpu_has(X86_FEATURE_BRS) && x86_pmu.lbr_nr ?
+	       attr->mode : 0;
+}
+
 static struct attribute_group group_events_amd_brs = {
 	.name       = "events",
 	.attrs      = amd_brs_events_attrs,
 	.is_visible = amd_brs_is_visible,
 };
 
+#endif	/* CONFIG_PERF_EVENTS_AMD_BRS */
+
 static const struct attribute_group *amd_attr_update[] = {
-	&group_caps_amd_brs,
+	&group_caps_amd_branches,
+#ifdef CONFIG_PERF_EVENTS_AMD_BRS
 	&group_events_amd_brs,
+#endif
 	NULL,
 };
 
-- 
2.34.1



* [PATCH 03/13] perf/x86/amd/core: Add generic branch record interfaces
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
  2022-08-11 12:29 ` [PATCH 01/13] perf/x86/amd/brs: Move feature-specific functions Sandipan Das
  2022-08-11 12:29 ` [PATCH 02/13] perf/x86/amd/core: Refactor branch attributes Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:29 ` [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit Sandipan Das
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

AMD processors that are capable of recording branches support either Branch
Sampling (BRS) or Last Branch Record (LBR). In preparation for adding Last
Branch Record Extension Version 2 (LbrExtV2) support, introduce new static
calls which act as gateways to call into the feature-dependent functions
based on what is available on the processor.
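
The static call pattern, reduced to a minimal sketch (illustrative names;
the actual gateways appear in the diff below):

  #include <linux/static_call.h>

  typedef void (branch_op_t)(void);

  /* Calls are patched to a NOP until an implementation is registered */
  DEFINE_STATIC_CALL_NULL(branch_op, branch_op_t);

  static void lbr_specific_op(void) { /* feature-dependent work */ }

  static int __init detect_feature(void)
  {
          /* route the gateway to the implementation found at init time */
          static_call_update(branch_op, lbr_specific_op);
          return 0;
  }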

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/core.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index 2f524cf84528..ef3520731a20 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -330,6 +330,8 @@ static inline bool amd_is_pair_event_code(struct hw_perf_event *hwc)
 	}
 }
 
+DEFINE_STATIC_CALL_RET0(amd_pmu_branch_hw_config, *x86_pmu.hw_config);
+
 static int amd_core_hw_config(struct perf_event *event)
 {
 	if (event->attr.exclude_host && event->attr.exclude_guest)
@@ -349,7 +351,7 @@ static int amd_core_hw_config(struct perf_event *event)
 		event->hw.flags |= PERF_X86_EVENT_PAIR;
 
 	if (has_branch_stack(event))
-		return amd_brs_hw_config(event);
+		return static_call(amd_pmu_branch_hw_config)(event);
 
 	return 0;
 }
@@ -518,8 +520,14 @@ static struct amd_nb *amd_alloc_nb(int cpu)
 	return nb;
 }
 
+typedef void (amd_pmu_branch_reset_t)(void);
+DEFINE_STATIC_CALL_NULL(amd_pmu_branch_reset, amd_pmu_branch_reset_t);
+
 static void amd_pmu_cpu_reset(int cpu)
 {
+	if (x86_pmu.lbr_nr)
+		static_call(amd_pmu_branch_reset)();
+
 	if (x86_pmu.version < 2)
 		return;
 
@@ -576,7 +584,6 @@ static void amd_pmu_cpu_starting(int cpu)
 	cpuc->amd_nb->nb_id = nb_id;
 	cpuc->amd_nb->refcnt++;
 
-	amd_brs_reset();
 	amd_pmu_cpu_reset(cpu);
 }
 
@@ -771,16 +778,20 @@ static void amd_pmu_v2_disable_all(void)
 	amd_pmu_check_overflow();
 }
 
+DEFINE_STATIC_CALL_NULL(amd_pmu_branch_add, *x86_pmu.add);
+
 static void amd_pmu_add_event(struct perf_event *event)
 {
 	if (needs_branch_stack(event))
-		amd_pmu_brs_add(event);
+		static_call(amd_pmu_branch_add)(event);
 }
 
+DEFINE_STATIC_CALL_NULL(amd_pmu_branch_del, *x86_pmu.del);
+
 static void amd_pmu_del_event(struct perf_event *event)
 {
 	if (needs_branch_stack(event))
-		amd_pmu_brs_del(event);
+		static_call(amd_pmu_branch_del)(event);
 }
 
 /*
@@ -1184,13 +1195,6 @@ static ssize_t amd_event_sysfs_show(char *page, u64 config)
 	return x86_event_sysfs_show(page, config, event);
 }
 
-static void amd_pmu_sched_task(struct perf_event_context *ctx,
-				 bool sched_in)
-{
-	if (sched_in && x86_pmu.lbr_nr)
-		amd_pmu_brs_sched_task(ctx, sched_in);
-}
-
 static u64 amd_pmu_limit_period(struct perf_event *event, u64 left)
 {
 	/*
@@ -1375,8 +1379,14 @@ static int __init amd_core_pmu_init(void)
 	 */
 	if (boot_cpu_data.x86 >= 0x19 && !amd_brs_init()) {
 		x86_pmu.get_event_constraints = amd_get_event_constraints_f19h;
-		x86_pmu.sched_task = amd_pmu_sched_task;
+		x86_pmu.sched_task = amd_pmu_brs_sched_task;
 		x86_pmu.limit_period = amd_pmu_limit_period;
+
+		static_call_update(amd_pmu_branch_hw_config, amd_brs_hw_config);
+		static_call_update(amd_pmu_branch_reset, amd_brs_reset);
+		static_call_update(amd_pmu_branch_add, amd_pmu_brs_add);
+		static_call_update(amd_pmu_branch_del, amd_pmu_brs_del);
+
 		/*
 		 * put_event_constraints callback same as Fam17h, set above
 		 */
-- 
2.34.1



* [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (2 preceding siblings ...)
  2022-08-11 12:29 ` [PATCH 03/13] perf/x86/amd/core: Add generic branch record interfaces Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 13:13   ` Borislav Petkov
  2022-08-15 11:27   ` Peter Zijlstra
  2022-08-11 12:29 ` [PATCH 05/13] perf/x86/amd/lbr: Detect LbrExtV2 support Sandipan Das
                   ` (8 subsequent siblings)
  12 siblings, 2 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

CPUID leaf 0x80000022, i.e. ExtPerfMonAndDbg, advertises new performance
monitoring features for AMD processors.

Bit 1 of EAX indicates support for Last Branch Record Extension Version 2
(LbrExtV2) features. If found to be set during PMU initialization, the EBX
bits of the same leaf can be used to determine the number of available LBR
entries.

For better utilization of feature words, LbrExtV2 is added as a scattered
feature bit.
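
Outside the kernel, the same bit can be checked with a small standalone
program (illustrative only, using the compiler-provided cpuid.h helper):

  #include <stdio.h>
  #include <cpuid.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* CPUID leaf 0x80000022 (ExtPerfMonAndDbg), subleaf 0 */
          if (!__get_cpuid_count(0x80000022, 0, &eax, &ebx, &ecx, &edx))
                  return 1;

          printf("LbrExtV2: %s\n", (eax & (1 << 1)) ? "yes" : "no");
          return 0;
  }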

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 2 +-
 arch/x86/kernel/cpu/scattered.c    | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 393f2bbb5e3a..e3fa476a24b0 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -96,7 +96,7 @@
 #define X86_FEATURE_SYSCALL32		( 3*32+14) /* "" syscall in IA32 userspace */
 #define X86_FEATURE_SYSENTER32		( 3*32+15) /* "" sysenter in IA32 userspace */
 #define X86_FEATURE_REP_GOOD		( 3*32+16) /* REP microcode works well */
-/* FREE!                                ( 3*32+17) */
+#define X86_FEATURE_LBREXT_V2		( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
 #define X86_FEATURE_LFENCE_RDTSC	( 3*32+18) /* "" LFENCE synchronizes RDTSC */
 #define X86_FEATURE_ACC_POWER		( 3*32+19) /* AMD Accumulated Power Mechanism */
 #define X86_FEATURE_NOPL		( 3*32+20) /* The NOPL (0F 1F) instructions */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index dbaa8326d6f2..6be46dffddbf 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
 	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
 	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
 	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
+	{ X86_FEATURE_LBREXT_V2,	CPUID_EAX,  1, 0x80000022, 0 },
 	{ 0, 0, 0, 0, 0 }
 };
 
-- 
2.34.1



* [PATCH 05/13] perf/x86/amd/lbr: Detect LbrExtV2 support
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (3 preceding siblings ...)
  2022-08-11 12:29 ` [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:29 ` [PATCH 06/13] perf/x86/amd/lbr: Add LbrExtV2 branch record support Sandipan Das
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

AMD Last Branch Record Extension Version 2 (LbrExtV2) is driven by Core PMC
overflows. It records recently taken branches up to the moment when the PMC
overflow occurs.

Detect the feature during PMU initialization and set the branch stack depth
using CPUID leaf 0x80000022 EBX.
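
Equivalently, without going through the bitfield union, the depth sits in
bits 9:4 of EBX (a sketch; the diff below uses the union form):

  /* CPUID Fn8000_0022 EBX[9:4]: number of LBR stack entries */
  x86_pmu.lbr_nr = (cpuid_ebx(0x80000022) >> 4) & 0x3f;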

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/Makefile      |  2 +-
 arch/x86/events/amd/core.c        |  9 +++++----
 arch/x86/events/amd/lbr.c         | 21 +++++++++++++++++++++
 arch/x86/events/perf_event.h      |  2 ++
 arch/x86/include/asm/perf_event.h |  3 ++-
 5 files changed, 31 insertions(+), 6 deletions(-)
 create mode 100644 arch/x86/events/amd/lbr.c

diff --git a/arch/x86/events/amd/Makefile b/arch/x86/events/amd/Makefile
index b9f5d4610256..527d947eb76b 100644
--- a/arch/x86/events/amd/Makefile
+++ b/arch/x86/events/amd/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
-obj-$(CONFIG_CPU_SUP_AMD)		+= core.o
+obj-$(CONFIG_CPU_SUP_AMD)		+= core.o lbr.o
 obj-$(CONFIG_PERF_EVENTS_AMD_BRS)	+= brs.o
 obj-$(CONFIG_PERF_EVENTS_AMD_POWER)	+= power.o
 obj-$(CONFIG_X86_LOCAL_APIC)		+= ibs.o
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index ef3520731a20..a3aa67b4fd50 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -1374,10 +1374,11 @@ static int __init amd_core_pmu_init(void)
 		x86_pmu.flags |= PMU_FL_PAIR;
 	}
 
-	/*
-	 * BRS requires special event constraints and flushing on ctxsw.
-	 */
-	if (boot_cpu_data.x86 >= 0x19 && !amd_brs_init()) {
+	/* LBR and BRS are mutually exclusive features */
+	if (amd_pmu_lbr_init() && !amd_brs_init()) {
+		/*
+		 * BRS requires special event constraints and flushing on ctxsw.
+		 */
 		x86_pmu.get_event_constraints = amd_get_event_constraints_f19h;
 		x86_pmu.sched_task = amd_pmu_brs_sched_task;
 		x86_pmu.limit_period = amd_pmu_limit_period;
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
new file mode 100644
index 000000000000..d240ff722b78
--- /dev/null
+++ b/arch/x86/events/amd/lbr.c
@@ -0,0 +1,21 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/perf_event.h>
+#include <asm/perf_event.h>
+
+#include "../perf_event.h"
+
+__init int amd_pmu_lbr_init(void)
+{
+	union cpuid_0x80000022_ebx ebx;
+
+	if (x86_pmu.version < 2 || !boot_cpu_has(X86_FEATURE_LBREXT_V2))
+		return -EOPNOTSUPP;
+
+	/* Set number of entries */
+	ebx.full = cpuid_ebx(EXT_PERFMON_DEBUG_FEATURES);
+	x86_pmu.lbr_nr = ebx.split.lbr_v2_stack_sz;
+
+	pr_cont("%d-deep LBR, ", x86_pmu.lbr_nr);
+
+	return 0;
+}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 6d23e88d232c..a6cd83c57e8b 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1230,6 +1230,8 @@ static inline bool fixed_counter_disabled(int i, struct pmu *pmu)
 
 int amd_pmu_init(void);
 
+int amd_pmu_lbr_init(void);
+
 #ifdef CONFIG_PERF_EVENTS_AMD_BRS
 
 #define AMD_FAM19H_BRS_EVENT 0xc4 /* RETIRED_TAKEN_BRANCH_INSTRUCTIONS */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 34348ae41cdb..07844424aaea 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -207,7 +207,8 @@ union cpuid_0x80000022_ebx {
 	struct {
 		/* Number of Core Performance Counters */
 		unsigned int	num_core_pmc:4;
-		unsigned int	reserved:6;
+		/* Number of available LBR Stack Entries */
+		unsigned int	lbr_v2_stack_sz:6;
 		/* Number of Data Fabric Counters */
 		unsigned int	num_df_pmc:6;
 	} split;
-- 
2.34.1



* [PATCH 06/13] perf/x86/amd/lbr: Add LbrExtV2 branch record support
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (4 preceding siblings ...)
  2022-08-11 12:29 ` [PATCH 05/13] perf/x86/amd/lbr: Detect LbrExtV2 support Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:29 ` [PATCH 07/13] perf/x86/amd/lbr: Add LbrExtV2 hardware branch filter support Sandipan Das
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected,
enable it alongside LBR Freeze on PMI when an event requests a branch
stack, i.e. PERF_SAMPLE_BRANCH_STACK.

Each branch record is represented by a pair of registers, LBR From and LBR
To. The freeze feature prevents any updates to these registers once a PMC
overflows. The contents remain unchanged until the freeze bit is cleared by
the PMI handler.

The branch records are read and copied to sample data before unfreezing.
However, only valid entries are copied. There is no additional register to
denote which of the register pairs represents the top of the stack (TOS),
since internal register renaming always ensures that the first pair (i.e.
index 0) is the one representing the most recent branch and so on.

The LBR registers are per-thread resources and are cleared explicitly
whenever a new task is scheduled in. Transitioning to deep C-states has no
special implications for the contents of these registers.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/core.c       |  47 +++++--
 arch/x86/events/amd/lbr.c        | 203 +++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h     |   8 ++
 arch/x86/include/asm/msr-index.h |   5 +
 4 files changed, 252 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index a3aa67b4fd50..d799628016c8 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -620,7 +620,7 @@ static inline u64 amd_pmu_get_global_status(void)
 	/* PerfCntrGlobalStatus is read-only */
 	rdmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, status);
 
-	return status & amd_pmu_global_cntr_mask;
+	return status;
 }
 
 static inline void amd_pmu_ack_global_status(u64 status)
@@ -631,8 +631,6 @@ static inline void amd_pmu_ack_global_status(u64 status)
 	 * clears the same bit in PerfCntrGlobalStatus
 	 */
 
-	/* Only allow modifications to PerfCntrGlobalStatus.PerfCntrOvfl */
-	status &= amd_pmu_global_cntr_mask;
 	wrmsrl(MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, status);
 }
 
@@ -742,11 +740,17 @@ static void amd_pmu_v2_enable_event(struct perf_event *event)
 	__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
 }
 
-static void amd_pmu_v2_enable_all(int added)
+static __always_inline void amd_pmu_core_enable_all(void)
 {
 	amd_pmu_set_global_ctl(amd_pmu_global_cntr_mask);
 }
 
+static void amd_pmu_v2_enable_all(int added)
+{
+	amd_pmu_lbr_enable_all();
+	amd_pmu_core_enable_all();
+}
+
 static void amd_pmu_disable_event(struct perf_event *event)
 {
 	x86_pmu_disable_event(event);
@@ -771,10 +775,15 @@ static void amd_pmu_disable_all(void)
 	amd_pmu_check_overflow();
 }
 
-static void amd_pmu_v2_disable_all(void)
+static __always_inline void amd_pmu_core_disable_all(void)
 {
-	/* Disable all PMCs */
 	amd_pmu_set_global_ctl(0);
+}
+
+static void amd_pmu_v2_disable_all(void)
+{
+	amd_pmu_core_disable_all();
+	amd_pmu_lbr_disable_all();
 	amd_pmu_check_overflow();
 }
 
@@ -877,8 +886,8 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 	pmu_enabled = cpuc->enabled;
 	cpuc->enabled = 0;
 
-	/* Stop counting */
-	amd_pmu_v2_disable_all();
+	/* Stop counting but do not disable LBR */
+	amd_pmu_core_disable_all();
 
 	status = amd_pmu_get_global_status();
 
@@ -886,6 +895,12 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 	if (!status)
 		goto done;
 
+	/* Read branch records before unfreezing */
+	if (status & GLOBAL_STATUS_LBRS_FROZEN) {
+		amd_pmu_lbr_read();
+		status &= ~GLOBAL_STATUS_LBRS_FROZEN;
+	}
+
 	for (idx = 0; idx < x86_pmu.num_counters; idx++) {
 		if (!test_bit(idx, cpuc->active_mask))
 			continue;
@@ -905,6 +920,9 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 		if (!x86_perf_event_set_period(event))
 			continue;
 
+		if (has_branch_stack(event))
+			data.br_stack = &cpuc->lbr_stack;
+
 		if (perf_event_overflow(event, &data, regs))
 			x86_pmu_stop(event, 0);
 
@@ -918,7 +936,7 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 	 */
 	WARN_ON(status > 0);
 
-	/* Clear overflow bits */
+	/* Clear overflow and freeze bits */
 	amd_pmu_ack_global_status(~status);
 
 	/*
@@ -932,7 +950,7 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
 
 	/* Resume counting only if PMU is active */
 	if (pmu_enabled)
-		amd_pmu_v2_enable_all(0);
+		amd_pmu_core_enable_all();
 
 	return amd_pmu_adjust_nmi_window(handled);
 }
@@ -1375,7 +1393,14 @@ static int __init amd_core_pmu_init(void)
 	}
 
 	/* LBR and BRS are mutually exclusive features */
-	if (amd_pmu_lbr_init() && !amd_brs_init()) {
+	if (!amd_pmu_lbr_init()) {
+		/* LBR requires flushing on context switch */
+		x86_pmu.sched_task = amd_pmu_lbr_sched_task;
+		static_call_update(amd_pmu_branch_hw_config, amd_pmu_lbr_hw_config);
+		static_call_update(amd_pmu_branch_reset, amd_pmu_lbr_reset);
+		static_call_update(amd_pmu_branch_add, amd_pmu_lbr_add);
+		static_call_update(amd_pmu_branch_del, amd_pmu_lbr_del);
+	} else if (!amd_brs_init()) {
 		/*
 		 * BRS requires special event constraints and flushing on ctxsw.
 		 */
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index d240ff722b78..39247e7be53c 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -4,6 +4,209 @@
 
 #include "../perf_event.h"
 
+struct branch_entry {
+	union {
+		struct {
+			u64	ip:58;
+			u64	ip_sign_ext:5;
+			u64	mispredict:1;
+		} split;
+		u64		full;
+	} from;
+
+	union {
+		struct {
+			u64	ip:58;
+			u64	ip_sign_ext:3;
+			u64	reserved:1;
+			u64	spec:1;
+			u64	valid:1;
+		} split;
+		u64		full;
+	} to;
+};
+
+static __always_inline void amd_pmu_lbr_set_from(unsigned int idx, u64 val)
+{
+	wrmsrl(MSR_AMD_SAMP_BR_FROM + idx * 2, val);
+}
+
+static __always_inline void amd_pmu_lbr_set_to(unsigned int idx, u64 val)
+{
+	wrmsrl(MSR_AMD_SAMP_BR_FROM + idx * 2 + 1, val);
+}
+
+static __always_inline u64 amd_pmu_lbr_get_from(unsigned int idx)
+{
+	u64 val;
+
+	rdmsrl(MSR_AMD_SAMP_BR_FROM + idx * 2, val);
+
+	return val;
+}
+
+static __always_inline u64 amd_pmu_lbr_get_to(unsigned int idx)
+{
+	u64 val;
+
+	rdmsrl(MSR_AMD_SAMP_BR_FROM + idx * 2 + 1, val);
+
+	return val;
+}
+
+static __always_inline u64 sign_ext_branch_ip(u64 ip)
+{
+	u32 shift = 64 - boot_cpu_data.x86_virt_bits;
+
+	return (u64)(((s64)ip << shift) >> shift);
+}
+
+void amd_pmu_lbr_read(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_branch_entry *br = cpuc->lbr_entries;
+	struct branch_entry entry;
+	int out = 0, i;
+
+	if (!cpuc->lbr_users)
+		return;
+
+	for (i = 0; i < x86_pmu.lbr_nr; i++) {
+		entry.from.full	= amd_pmu_lbr_get_from(i);
+		entry.to.full	= amd_pmu_lbr_get_to(i);
+
+		/* Check if a branch has been logged */
+		if (!entry.to.split.valid)
+			continue;
+
+		perf_clear_branch_entry_bitfields(br + out);
+
+		br[out].from	= sign_ext_branch_ip(entry.from.split.ip);
+		br[out].to	= sign_ext_branch_ip(entry.to.split.ip);
+		br[out].mispred	= entry.from.split.mispredict;
+		br[out].predicted = !br[out].mispred;
+		out++;
+	}
+
+	cpuc->lbr_stack.nr = out;
+
+	/*
+	 * Internal register renaming always ensures that LBR From[0] and
+	 * LBR To[0] always represent the TOS
+	 */
+	cpuc->lbr_stack.hw_idx = 0;
+}
+
+static int amd_pmu_lbr_setup_filter(struct perf_event *event)
+{
+	/* No LBR support */
+	if (!x86_pmu.lbr_nr)
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
+int amd_pmu_lbr_hw_config(struct perf_event *event)
+{
+	int ret = 0;
+
+	/* LBR is not recommended in counting mode */
+	if (!is_sampling_event(event))
+		return -EINVAL;
+
+	ret = amd_pmu_lbr_setup_filter(event);
+	if (!ret)
+		event->attach_state |= PERF_ATTACH_SCHED_CB;
+
+	return ret;
+}
+
+void amd_pmu_lbr_reset(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int i;
+
+	if (!x86_pmu.lbr_nr)
+		return;
+
+	/* Reset all branch records individually */
+	for (i = 0; i < x86_pmu.lbr_nr; i++) {
+		amd_pmu_lbr_set_from(i, 0);
+		amd_pmu_lbr_set_to(i, 0);
+	}
+
+	cpuc->last_task_ctx = NULL;
+	cpuc->last_log_id = 0;
+}
+
+void amd_pmu_lbr_add(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (!x86_pmu.lbr_nr)
+		return;
+
+	perf_sched_cb_inc(event->ctx->pmu);
+
+	if (!cpuc->lbr_users++ && !event->total_time_running)
+		amd_pmu_lbr_reset();
+}
+
+void amd_pmu_lbr_del(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (!x86_pmu.lbr_nr)
+		return;
+
+	cpuc->lbr_users--;
+	WARN_ON_ONCE(cpuc->lbr_users < 0);
+	perf_sched_cb_dec(event->ctx->pmu);
+}
+
+void amd_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/*
+	 * A context switch can flip the address space and LBR entries are
+	 * not tagged with an identifier. Hence, branches cannot be resolved
+	 * from the old address space and the LBR records should be wiped.
+	 */
+	if (cpuc->lbr_users && sched_in)
+		amd_pmu_lbr_reset();
+}
+
+void amd_pmu_lbr_enable_all(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 dbg_ctl, dbg_extn_cfg;
+
+	if (!cpuc->lbr_users || !x86_pmu.lbr_nr)
+		return;
+
+	rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl);
+	rdmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg);
+
+	wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
+	wrmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg | DBG_EXTN_CFG_LBRV2EN);
+}
+
+void amd_pmu_lbr_disable_all(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 dbg_ctl, dbg_extn_cfg;
+
+	if (!cpuc->lbr_users || !x86_pmu.lbr_nr)
+		return;
+
+	rdmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg);
+	rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl);
+
+	wrmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg & ~DBG_EXTN_CFG_LBRV2EN);
+	wrmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl & ~DEBUGCTLMSR_FREEZE_LBRS_ON_PMI);
+}
+
 __init int amd_pmu_lbr_init(void)
 {
 	union cpuid_0x80000022_ebx ebx;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index a6cd83c57e8b..c8397290f388 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1231,6 +1231,14 @@ static inline bool fixed_counter_disabled(int i, struct pmu *pmu)
 int amd_pmu_init(void);
 
 int amd_pmu_lbr_init(void);
+void amd_pmu_lbr_reset(void);
+void amd_pmu_lbr_read(void);
+void amd_pmu_lbr_add(struct perf_event *event);
+void amd_pmu_lbr_del(struct perf_event *event);
+void amd_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in);
+void amd_pmu_lbr_enable_all(void);
+void amd_pmu_lbr_disable_all(void);
+int amd_pmu_lbr_hw_config(struct perf_event *event);
 
 #ifdef CONFIG_PERF_EVENTS_AMD_BRS
 
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 403e83b4adc8..440d0490f71e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -539,6 +539,9 @@
 #define MSR_AMD64_PERF_CNTR_GLOBAL_CTL		0xc0000301
 #define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR	0xc0000302
 
+/* AMD Last Branch Record MSRs */
+#define MSR_AMD64_LBR_SELECT			0xc000010e
+
 /* Fam 17h MSRs */
 #define MSR_F17H_IRPERF			0xc00000e9
 
@@ -707,6 +710,8 @@
 #define MSR_AMD_DBG_EXTN_CFG		0xc000010f
 #define MSR_AMD_SAMP_BR_FROM		0xc0010300
 
+#define DBG_EXTN_CFG_LBRV2EN		BIT_ULL(6)
+
 #define MSR_IA32_MPERF			0x000000e7
 #define MSR_IA32_APERF			0x000000e8
 
-- 
2.34.1



* [PATCH 07/13] perf/x86/amd/lbr: Add LbrExtV2 hardware branch filter support
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (5 preceding siblings ...)
  2022-08-11 12:29 ` [PATCH 06/13] perf/x86/amd/lbr: Add LbrExtV2 branch record support Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:29 ` [PATCH 08/13] perf/x86: Move branch classifier Sandipan Das
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

If AMD Last Branch Record Extension Version 2 (LbrExtV2) is detected,
convert the requested branch filter (PERF_SAMPLE_BRANCH_* flags) to the
corresponding hardware filter value and stash it in the event data when
a branch stack is requested. The hardware filter value is also saved in
per-CPU areas for use during event scheduling.

Hardware filtering is provided by the LBR Branch Select register. It has
bits which, when set, suppress recording of the following types of branches
(a sketch of composing the filter value follows the list):

  * CPL = 0 (Kernel only)
  * CPL > 0 (Userspace only)
  * Conditional Branches
  * Near Relative Calls
  * Near Indirect Calls
  * Near Returns
  * Near Indirect Jumps (excluding Near Indirect Calls and Near Returns)
  * Near Relative Jumps (excluding Near Relative Calls)
  * Far Branches
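
Since the bits operate in suppress mode, a filter value is built by
inverting the set of allowed branch types within the valid bit mask. A
minimal sketch using the definitions from the diff below:

  /* Illustrative: user-level near calls only, as for "-j any_call,u" */
  u64 allowed = LBR_USER | LBR_REL_CALL | LBR_IND_CALL;

  /* filter bits suppress: set every valid bit not explicitly allowed */
  u64 lbr_select = allowed ^ LBR_SELECT_MASK;

  wrmsrl(MSR_AMD64_LBR_SELECT, lbr_select);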

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/core.c | 21 ++++++---
 arch/x86/events/amd/lbr.c  | 94 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index d799628016c8..36bede1d7b1e 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -542,16 +542,24 @@ static int amd_pmu_cpu_prepare(int cpu)
 {
 	struct cpu_hw_events *cpuc = &per_cpu(cpu_hw_events, cpu);
 
+	cpuc->lbr_sel = kzalloc_node(sizeof(struct er_account), GFP_KERNEL,
+				     cpu_to_node(cpu));
+	if (!cpuc->lbr_sel)
+		return -ENOMEM;
+
 	WARN_ON_ONCE(cpuc->amd_nb);
 
 	if (!x86_pmu.amd_nb_constraints)
 		return 0;
 
 	cpuc->amd_nb = amd_alloc_nb(cpu);
-	if (!cpuc->amd_nb)
-		return -ENOMEM;
+	if (cpuc->amd_nb)
+		return 0;
 
-	return 0;
+	kfree(cpuc->lbr_sel);
+	cpuc->lbr_sel = NULL;
+
+	return -ENOMEM;
 }
 
 static void amd_pmu_cpu_starting(int cpu)
@@ -589,13 +597,14 @@ static void amd_pmu_cpu_starting(int cpu)
 
 static void amd_pmu_cpu_dead(int cpu)
 {
-	struct cpu_hw_events *cpuhw;
+	struct cpu_hw_events *cpuhw = &per_cpu(cpu_hw_events, cpu);
+
+	kfree(cpuhw->lbr_sel);
+	cpuhw->lbr_sel = NULL;
 
 	if (!x86_pmu.amd_nb_constraints)
 		return;
 
-	cpuhw = &per_cpu(cpu_hw_events, cpu);
-
 	if (cpuhw->amd_nb) {
 		struct amd_nb *nb = cpuhw->amd_nb;
 
diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index 39247e7be53c..064a6c8e1597 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -4,6 +4,39 @@
 
 #include "../perf_event.h"
 
+/* LBR Branch Select valid bits */
+#define LBR_SELECT_MASK		0x1ff
+
+/*
+ * LBR Branch Select filter bits which when set, ensures that the
+ * corresponding type of branches are not recorded
+ */
+#define LBR_SELECT_KERNEL		0	/* Branches ending in CPL = 0 */
+#define LBR_SELECT_USER			1	/* Branches ending in CPL > 0 */
+#define LBR_SELECT_JCC			2	/* Conditional branches */
+#define LBR_SELECT_CALL_NEAR_REL	3	/* Near relative calls */
+#define LBR_SELECT_CALL_NEAR_IND	4	/* Indirect relative calls */
+#define LBR_SELECT_RET_NEAR		5	/* Near returns */
+#define LBR_SELECT_JMP_NEAR_IND		6	/* Near indirect jumps (excl. calls and returns) */
+#define LBR_SELECT_JMP_NEAR_REL		7	/* Near relative jumps (excl. calls) */
+#define LBR_SELECT_FAR_BRANCH		8	/* Far branches */
+
+#define LBR_KERNEL	BIT(LBR_SELECT_KERNEL)
+#define LBR_USER	BIT(LBR_SELECT_USER)
+#define LBR_JCC		BIT(LBR_SELECT_JCC)
+#define LBR_REL_CALL	BIT(LBR_SELECT_CALL_NEAR_REL)
+#define LBR_IND_CALL	BIT(LBR_SELECT_CALL_NEAR_IND)
+#define LBR_RETURN	BIT(LBR_SELECT_RET_NEAR)
+#define LBR_REL_JMP	BIT(LBR_SELECT_JMP_NEAR_REL)
+#define LBR_IND_JMP	BIT(LBR_SELECT_JMP_NEAR_IND)
+#define LBR_FAR		BIT(LBR_SELECT_FAR_BRANCH)
+#define LBR_NOT_SUPP	-1	/* unsupported filter */
+#define LBR_IGNORE	0
+
+#define LBR_ANY		\
+	(LBR_JCC | LBR_REL_CALL | LBR_IND_CALL | LBR_RETURN |	\
+	 LBR_REL_JMP | LBR_IND_JMP | LBR_FAR)
+
 struct branch_entry {
 	union {
 		struct {
@@ -97,12 +130,56 @@ void amd_pmu_lbr_read(void)
 	cpuc->lbr_stack.hw_idx = 0;
 }
 
+static const int lbr_select_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
+	[PERF_SAMPLE_BRANCH_USER_SHIFT]		= LBR_USER,
+	[PERF_SAMPLE_BRANCH_KERNEL_SHIFT]	= LBR_KERNEL,
+	[PERF_SAMPLE_BRANCH_HV_SHIFT]		= LBR_IGNORE,
+
+	[PERF_SAMPLE_BRANCH_ANY_SHIFT]		= LBR_ANY,
+	[PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT]	= LBR_REL_CALL | LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT]	= LBR_RETURN,
+	[PERF_SAMPLE_BRANCH_IND_CALL_SHIFT]	= LBR_IND_CALL,
+	[PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT]	= LBR_NOT_SUPP,
+	[PERF_SAMPLE_BRANCH_IN_TX_SHIFT]	= LBR_NOT_SUPP,
+	[PERF_SAMPLE_BRANCH_NO_TX_SHIFT]	= LBR_NOT_SUPP,
+	[PERF_SAMPLE_BRANCH_COND_SHIFT]		= LBR_JCC,
+
+	[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT]	= LBR_NOT_SUPP,
+	[PERF_SAMPLE_BRANCH_IND_JUMP_SHIFT]	= LBR_IND_JMP,
+	[PERF_SAMPLE_BRANCH_CALL_SHIFT]		= LBR_REL_CALL,
+
+	[PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT]	= LBR_NOT_SUPP,
+	[PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT]	= LBR_NOT_SUPP,
+
+	[PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT]	= LBR_NOT_SUPP,
+};
+
 static int amd_pmu_lbr_setup_filter(struct perf_event *event)
 {
+	struct hw_perf_event_extra *reg = &event->hw.branch_reg;
+	u64 br_type = event->attr.branch_sample_type;
+	u64 mask = 0, v;
+	int i;
+
 	/* No LBR support */
 	if (!x86_pmu.lbr_nr)
 		return -EOPNOTSUPP;
 
+	for (i = 0; i < PERF_SAMPLE_BRANCH_MAX_SHIFT; i++) {
+		if (!(br_type & BIT_ULL(i)))
+			continue;
+
+		v = lbr_select_map[i];
+		if (v == LBR_NOT_SUPP)
+			return -EOPNOTSUPP;
+
+		if (v != LBR_IGNORE)
+			mask |= v;
+	}
+
+	/* Filter bits operate in suppress mode */
+	reg->config = mask ^ LBR_SELECT_MASK;
+
 	return 0;
 }
 
@@ -137,6 +214,7 @@ void amd_pmu_lbr_reset(void)
 
 	cpuc->last_task_ctx = NULL;
 	cpuc->last_log_id = 0;
+	wrmsrl(MSR_AMD64_LBR_SELECT, 0);
 }
 
 void amd_pmu_lbr_add(struct perf_event *event)
@@ -146,6 +224,11 @@ void amd_pmu_lbr_add(struct perf_event *event)
 	if (!x86_pmu.lbr_nr)
 		return;
 
+	if (has_branch_stack(event)) {
+		cpuc->lbr_select = 1;
+		cpuc->lbr_sel->config = event->hw.branch_reg.config;
+	}
+
 	perf_sched_cb_inc(event->ctx->pmu);
 
 	if (!cpuc->lbr_users++ && !event->total_time_running)
@@ -159,6 +242,9 @@ void amd_pmu_lbr_del(struct perf_event *event)
 	if (!x86_pmu.lbr_nr)
 		return;
 
+	if (has_branch_stack(event))
+		cpuc->lbr_select = 0;
+
 	cpuc->lbr_users--;
 	WARN_ON_ONCE(cpuc->lbr_users < 0);
 	perf_sched_cb_dec(event->ctx->pmu);
@@ -180,11 +266,17 @@ void amd_pmu_lbr_sched_task(struct perf_event_context *ctx, bool sched_in)
 void amd_pmu_lbr_enable_all(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-	u64 dbg_ctl, dbg_extn_cfg;
+	u64 lbr_select, dbg_ctl, dbg_extn_cfg;
 
 	if (!cpuc->lbr_users || !x86_pmu.lbr_nr)
 		return;
 
+	/* Set hardware branch filter */
+	if (cpuc->lbr_select) {
+		lbr_select = cpuc->lbr_sel->config & LBR_SELECT_MASK;
+		wrmsrl(MSR_AMD64_LBR_SELECT, lbr_select);
+	}
+
 	rdmsrl(MSR_IA32_DEBUGCTLMSR, dbg_ctl);
 	rdmsrl(MSR_AMD_DBG_EXTN_CFG, dbg_extn_cfg);
 
-- 
2.34.1



* [PATCH 08/13] perf/x86: Move branch classifier
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (6 preceding siblings ...)
  2022-08-11 12:29 ` [PATCH 07/13] perf/x86/amd/lbr: Add LbrExtV2 hardware branch filter support Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:29 ` [PATCH 09/13] perf/x86/amd/lbr: Add LbrExtV2 software branch filter support Sandipan Das
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

Commit 3e702ff6d1ea ("perf/x86: Add LBR software filter support for Intel
CPUs") introduces a software branch filter which complements the hardware
branch filter and adds an x86 branch classifier.

Move the branch classifier to arch/x86/events/ so that it can be utilized
by other vendors for branch record filtering.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/Makefile     |   2 +-
 arch/x86/events/intel/lbr.c  | 273 -----------------------------------
 arch/x86/events/perf_event.h |  62 ++++++++
 arch/x86/events/utils.c      | 216 +++++++++++++++++++++++++++
 4 files changed, 279 insertions(+), 274 deletions(-)
 create mode 100644 arch/x86/events/utils.c

diff --git a/arch/x86/events/Makefile b/arch/x86/events/Makefile
index 9933c0e8e97a..86a76efa8bb6 100644
--- a/arch/x86/events/Makefile
+++ b/arch/x86/events/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y					+= core.o probe.o
+obj-y					+= core.o probe.o utils.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_RAPL)	+= rapl.o
 obj-y					+= amd/
 obj-$(CONFIG_X86_LOCAL_APIC)            += msr.o
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 13179f31fe10..d0dc1c358ec0 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -4,7 +4,6 @@
 
 #include <asm/perf_event.h>
 #include <asm/msr.h>
-#include <asm/insn.h>
 
 #include "../perf_event.h"
 
@@ -65,65 +64,6 @@
 
 #define LBR_FROM_SIGNEXT_2MSB	(BIT_ULL(60) | BIT_ULL(59))
 
-/*
- * x86control flow change classification
- * x86control flow changes include branches, interrupts, traps, faults
- */
-enum {
-	X86_BR_NONE		= 0,      /* unknown */
-
-	X86_BR_USER		= 1 << 0, /* branch target is user */
-	X86_BR_KERNEL		= 1 << 1, /* branch target is kernel */
-
-	X86_BR_CALL		= 1 << 2, /* call */
-	X86_BR_RET		= 1 << 3, /* return */
-	X86_BR_SYSCALL		= 1 << 4, /* syscall */
-	X86_BR_SYSRET		= 1 << 5, /* syscall return */
-	X86_BR_INT		= 1 << 6, /* sw interrupt */
-	X86_BR_IRET		= 1 << 7, /* return from interrupt */
-	X86_BR_JCC		= 1 << 8, /* conditional */
-	X86_BR_JMP		= 1 << 9, /* jump */
-	X86_BR_IRQ		= 1 << 10,/* hw interrupt or trap or fault */
-	X86_BR_IND_CALL		= 1 << 11,/* indirect calls */
-	X86_BR_ABORT		= 1 << 12,/* transaction abort */
-	X86_BR_IN_TX		= 1 << 13,/* in transaction */
-	X86_BR_NO_TX		= 1 << 14,/* not in transaction */
-	X86_BR_ZERO_CALL	= 1 << 15,/* zero length call */
-	X86_BR_CALL_STACK	= 1 << 16,/* call stack */
-	X86_BR_IND_JMP		= 1 << 17,/* indirect jump */
-
-	X86_BR_TYPE_SAVE	= 1 << 18,/* indicate to save branch type */
-
-};
-
-#define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
-#define X86_BR_ANYTX (X86_BR_NO_TX | X86_BR_IN_TX)
-
-#define X86_BR_ANY       \
-	(X86_BR_CALL    |\
-	 X86_BR_RET     |\
-	 X86_BR_SYSCALL |\
-	 X86_BR_SYSRET  |\
-	 X86_BR_INT     |\
-	 X86_BR_IRET    |\
-	 X86_BR_JCC     |\
-	 X86_BR_JMP	 |\
-	 X86_BR_IRQ	 |\
-	 X86_BR_ABORT	 |\
-	 X86_BR_IND_CALL |\
-	 X86_BR_IND_JMP  |\
-	 X86_BR_ZERO_CALL)
-
-#define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
-
-#define X86_BR_ANY_CALL		 \
-	(X86_BR_CALL		|\
-	 X86_BR_IND_CALL	|\
-	 X86_BR_ZERO_CALL	|\
-	 X86_BR_SYSCALL		|\
-	 X86_BR_IRQ		|\
-	 X86_BR_INT)
-
 /*
  * Intel LBR_CTL bits
  *
@@ -1143,219 +1083,6 @@ int intel_pmu_setup_lbr_filter(struct perf_event *event)
 	return ret;
 }
 
-/*
- * return the type of control flow change at address "from"
- * instruction is not necessarily a branch (in case of interrupt).
- *
- * The branch type returned also includes the priv level of the
- * target of the control flow change (X86_BR_USER, X86_BR_KERNEL).
- *
- * If a branch type is unknown OR the instruction cannot be
- * decoded (e.g., text page not present), then X86_BR_NONE is
- * returned.
- */
-static int branch_type(unsigned long from, unsigned long to, int abort)
-{
-	struct insn insn;
-	void *addr;
-	int bytes_read, bytes_left;
-	int ret = X86_BR_NONE;
-	int ext, to_plm, from_plm;
-	u8 buf[MAX_INSN_SIZE];
-	int is64 = 0;
-
-	to_plm = kernel_ip(to) ? X86_BR_KERNEL : X86_BR_USER;
-	from_plm = kernel_ip(from) ? X86_BR_KERNEL : X86_BR_USER;
-
-	/*
-	 * maybe zero if lbr did not fill up after a reset by the time
-	 * we get a PMU interrupt
-	 */
-	if (from == 0 || to == 0)
-		return X86_BR_NONE;
-
-	if (abort)
-		return X86_BR_ABORT | to_plm;
-
-	if (from_plm == X86_BR_USER) {
-		/*
-		 * can happen if measuring at the user level only
-		 * and we interrupt in a kernel thread, e.g., idle.
-		 */
-		if (!current->mm)
-			return X86_BR_NONE;
-
-		/* may fail if text not present */
-		bytes_left = copy_from_user_nmi(buf, (void __user *)from,
-						MAX_INSN_SIZE);
-		bytes_read = MAX_INSN_SIZE - bytes_left;
-		if (!bytes_read)
-			return X86_BR_NONE;
-
-		addr = buf;
-	} else {
-		/*
-		 * The LBR logs any address in the IP, even if the IP just
-		 * faulted. This means userspace can control the from address.
-		 * Ensure we don't blindly read any address by validating it is
-		 * a known text address.
-		 */
-		if (kernel_text_address(from)) {
-			addr = (void *)from;
-			/*
-			 * Assume we can get the maximum possible size
-			 * when grabbing kernel data.  This is not
-			 * _strictly_ true since we could possibly be
-			 * executing up next to a memory hole, but
-			 * it is very unlikely to be a problem.
-			 */
-			bytes_read = MAX_INSN_SIZE;
-		} else {
-			return X86_BR_NONE;
-		}
-	}
-
-	/*
-	 * decoder needs to know the ABI especially
-	 * on 64-bit systems running 32-bit apps
-	 */
-#ifdef CONFIG_X86_64
-	is64 = kernel_ip((unsigned long)addr) || any_64bit_mode(current_pt_regs());
-#endif
-	insn_init(&insn, addr, bytes_read, is64);
-	if (insn_get_opcode(&insn))
-		return X86_BR_ABORT;
-
-	switch (insn.opcode.bytes[0]) {
-	case 0xf:
-		switch (insn.opcode.bytes[1]) {
-		case 0x05: /* syscall */
-		case 0x34: /* sysenter */
-			ret = X86_BR_SYSCALL;
-			break;
-		case 0x07: /* sysret */
-		case 0x35: /* sysexit */
-			ret = X86_BR_SYSRET;
-			break;
-		case 0x80 ... 0x8f: /* conditional */
-			ret = X86_BR_JCC;
-			break;
-		default:
-			ret = X86_BR_NONE;
-		}
-		break;
-	case 0x70 ... 0x7f: /* conditional */
-		ret = X86_BR_JCC;
-		break;
-	case 0xc2: /* near ret */
-	case 0xc3: /* near ret */
-	case 0xca: /* far ret */
-	case 0xcb: /* far ret */
-		ret = X86_BR_RET;
-		break;
-	case 0xcf: /* iret */
-		ret = X86_BR_IRET;
-		break;
-	case 0xcc ... 0xce: /* int */
-		ret = X86_BR_INT;
-		break;
-	case 0xe8: /* call near rel */
-		if (insn_get_immediate(&insn) || insn.immediate1.value == 0) {
-			/* zero length call */
-			ret = X86_BR_ZERO_CALL;
-			break;
-		}
-		fallthrough;
-	case 0x9a: /* call far absolute */
-		ret = X86_BR_CALL;
-		break;
-	case 0xe0 ... 0xe3: /* loop jmp */
-		ret = X86_BR_JCC;
-		break;
-	case 0xe9 ... 0xeb: /* jmp */
-		ret = X86_BR_JMP;
-		break;
-	case 0xff: /* call near absolute, call far absolute ind */
-		if (insn_get_modrm(&insn))
-			return X86_BR_ABORT;
-
-		ext = (insn.modrm.bytes[0] >> 3) & 0x7;
-		switch (ext) {
-		case 2: /* near ind call */
-		case 3: /* far ind call */
-			ret = X86_BR_IND_CALL;
-			break;
-		case 4:
-		case 5:
-			ret = X86_BR_IND_JMP;
-			break;
-		}
-		break;
-	default:
-		ret = X86_BR_NONE;
-	}
-	/*
-	 * interrupts, traps, faults (and thus ring transition) may
-	 * occur on any instructions. Thus, to classify them correctly,
-	 * we need to first look at the from and to priv levels. If they
-	 * are different and to is in the kernel, then it indicates
-	 * a ring transition. If the from instruction is not a ring
-	 * transition instr (syscall, systenter, int), then it means
-	 * it was a irq, trap or fault.
-	 *
-	 * we have no way of detecting kernel to kernel faults.
-	 */
-	if (from_plm == X86_BR_USER && to_plm == X86_BR_KERNEL
-	    && ret != X86_BR_SYSCALL && ret != X86_BR_INT)
-		ret = X86_BR_IRQ;
-
-	/*
-	 * branch priv level determined by target as
-	 * is done by HW when LBR_SELECT is implemented
-	 */
-	if (ret != X86_BR_NONE)
-		ret |= to_plm;
-
-	return ret;
-}
-
-#define X86_BR_TYPE_MAP_MAX	16
-
-static int branch_map[X86_BR_TYPE_MAP_MAX] = {
-	PERF_BR_CALL,		/* X86_BR_CALL */
-	PERF_BR_RET,		/* X86_BR_RET */
-	PERF_BR_SYSCALL,	/* X86_BR_SYSCALL */
-	PERF_BR_SYSRET,		/* X86_BR_SYSRET */
-	PERF_BR_UNKNOWN,	/* X86_BR_INT */
-	PERF_BR_ERET,		/* X86_BR_IRET */
-	PERF_BR_COND,		/* X86_BR_JCC */
-	PERF_BR_UNCOND,		/* X86_BR_JMP */
-	PERF_BR_IRQ,		/* X86_BR_IRQ */
-	PERF_BR_IND_CALL,	/* X86_BR_IND_CALL */
-	PERF_BR_UNKNOWN,	/* X86_BR_ABORT */
-	PERF_BR_UNKNOWN,	/* X86_BR_IN_TX */
-	PERF_BR_UNKNOWN,	/* X86_BR_NO_TX */
-	PERF_BR_CALL,		/* X86_BR_ZERO_CALL */
-	PERF_BR_UNKNOWN,	/* X86_BR_CALL_STACK */
-	PERF_BR_IND,		/* X86_BR_IND_JMP */
-};
-
-static int
-common_branch_type(int type)
-{
-	int i;
-
-	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
-
-	if (type) {
-		i = __ffs(type);
-		if (i < X86_BR_TYPE_MAP_MAX)
-			return branch_map[i];
-	}
-
-	return PERF_BR_UNKNOWN;
-}
-
 enum {
 	ARCH_LBR_BR_TYPE_JCC			= 0,
 	ARCH_LBR_BR_TYPE_NEAR_IND_JMP		= 1,
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index c8397290f388..8ac8f9e6affb 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1208,6 +1208,68 @@ static inline void set_linear_ip(struct pt_regs *regs, unsigned long ip)
 	regs->ip = ip;
 }
 
+/*
+ * x86 control flow change classification
+ * x86 control flow changes include branches, interrupts, traps, faults
+ */
+enum {
+	X86_BR_NONE		= 0,      /* unknown */
+
+	X86_BR_USER		= 1 << 0, /* branch target is user */
+	X86_BR_KERNEL		= 1 << 1, /* branch target is kernel */
+
+	X86_BR_CALL		= 1 << 2, /* call */
+	X86_BR_RET		= 1 << 3, /* return */
+	X86_BR_SYSCALL		= 1 << 4, /* syscall */
+	X86_BR_SYSRET		= 1 << 5, /* syscall return */
+	X86_BR_INT		= 1 << 6, /* sw interrupt */
+	X86_BR_IRET		= 1 << 7, /* return from interrupt */
+	X86_BR_JCC		= 1 << 8, /* conditional */
+	X86_BR_JMP		= 1 << 9, /* jump */
+	X86_BR_IRQ		= 1 << 10,/* hw interrupt or trap or fault */
+	X86_BR_IND_CALL		= 1 << 11,/* indirect calls */
+	X86_BR_ABORT		= 1 << 12,/* transaction abort */
+	X86_BR_IN_TX		= 1 << 13,/* in transaction */
+	X86_BR_NO_TX		= 1 << 14,/* not in transaction */
+	X86_BR_ZERO_CALL	= 1 << 15,/* zero length call */
+	X86_BR_CALL_STACK	= 1 << 16,/* call stack */
+	X86_BR_IND_JMP		= 1 << 17,/* indirect jump */
+
+	X86_BR_TYPE_SAVE	= 1 << 18,/* indicate to save branch type */
+
+};
+
+#define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
+#define X86_BR_ANYTX (X86_BR_NO_TX | X86_BR_IN_TX)
+
+#define X86_BR_ANY       \
+	(X86_BR_CALL    |\
+	 X86_BR_RET     |\
+	 X86_BR_SYSCALL |\
+	 X86_BR_SYSRET  |\
+	 X86_BR_INT     |\
+	 X86_BR_IRET    |\
+	 X86_BR_JCC     |\
+	 X86_BR_JMP	 |\
+	 X86_BR_IRQ	 |\
+	 X86_BR_ABORT	 |\
+	 X86_BR_IND_CALL |\
+	 X86_BR_IND_JMP  |\
+	 X86_BR_ZERO_CALL)
+
+#define X86_BR_ALL (X86_BR_PLM | X86_BR_ANY)
+
+#define X86_BR_ANY_CALL		 \
+	(X86_BR_CALL		|\
+	 X86_BR_IND_CALL	|\
+	 X86_BR_ZERO_CALL	|\
+	 X86_BR_SYSCALL		|\
+	 X86_BR_IRQ		|\
+	 X86_BR_INT)
+
+int common_branch_type(int type);
+int branch_type(unsigned long from, unsigned long to, int abort);
+
 ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event);
 ssize_t intel_event_sysfs_show(char *page, u64 config);
 
diff --git a/arch/x86/events/utils.c b/arch/x86/events/utils.c
new file mode 100644
index 000000000000..a32368945462
--- /dev/null
+++ b/arch/x86/events/utils.c
@@ -0,0 +1,216 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <asm/insn.h>
+
+#include "perf_event.h"
+
+/*
+ * Return the type of control flow change at address "from".
+ * The instruction is not necessarily a branch (in case of interrupt).
+ *
+ * The branch type returned also includes the priv level of the
+ * target of the control flow change (X86_BR_USER, X86_BR_KERNEL).
+ *
+ * If a branch type is unknown OR the instruction cannot be
+ * decoded (e.g., text page not present), then X86_BR_NONE is
+ * returned.
+ */
+int branch_type(unsigned long from, unsigned long to, int abort)
+{
+	struct insn insn;
+	void *addr;
+	int bytes_read, bytes_left;
+	int ret = X86_BR_NONE;
+	int ext, to_plm, from_plm;
+	u8 buf[MAX_INSN_SIZE];
+	int is64 = 0;
+
+	to_plm = kernel_ip(to) ? X86_BR_KERNEL : X86_BR_USER;
+	from_plm = kernel_ip(from) ? X86_BR_KERNEL : X86_BR_USER;
+
+	/*
+	 * may be zero if lbr did not fill up after a reset by the time
+	 * we get a PMU interrupt
+	 */
+	if (from == 0 || to == 0)
+		return X86_BR_NONE;
+
+	if (abort)
+		return X86_BR_ABORT | to_plm;
+
+	if (from_plm == X86_BR_USER) {
+		/*
+		 * can happen if measuring at the user level only
+		 * and we interrupt in a kernel thread, e.g., idle.
+		 */
+		if (!current->mm)
+			return X86_BR_NONE;
+
+		/* may fail if text not present */
+		bytes_left = copy_from_user_nmi(buf, (void __user *)from,
+						MAX_INSN_SIZE);
+		bytes_read = MAX_INSN_SIZE - bytes_left;
+		if (!bytes_read)
+			return X86_BR_NONE;
+
+		addr = buf;
+	} else {
+		/*
+		 * The LBR logs any address in the IP, even if the IP just
+		 * faulted. This means userspace can control the from address.
+		 * Ensure we don't blindly read any address by validating it is
+		 * a known text address.
+		 */
+		if (kernel_text_address(from)) {
+			addr = (void *)from;
+			/*
+			 * Assume we can get the maximum possible size
+			 * when grabbing kernel data.  This is not
+			 * _strictly_ true since we could possibly be
+			 * executing up next to a memory hole, but
+			 * it is very unlikely to be a problem.
+			 */
+			bytes_read = MAX_INSN_SIZE;
+		} else {
+			return X86_BR_NONE;
+		}
+	}
+
+	/*
+	 * decoder needs to know the ABI especially
+	 * on 64-bit systems running 32-bit apps
+	 */
+#ifdef CONFIG_X86_64
+	is64 = kernel_ip((unsigned long)addr) || any_64bit_mode(current_pt_regs());
+#endif
+	insn_init(&insn, addr, bytes_read, is64);
+	if (insn_get_opcode(&insn))
+		return X86_BR_ABORT;
+
+	switch (insn.opcode.bytes[0]) {
+	case 0xf:
+		switch (insn.opcode.bytes[1]) {
+		case 0x05: /* syscall */
+		case 0x34: /* sysenter */
+			ret = X86_BR_SYSCALL;
+			break;
+		case 0x07: /* sysret */
+		case 0x35: /* sysexit */
+			ret = X86_BR_SYSRET;
+			break;
+		case 0x80 ... 0x8f: /* conditional */
+			ret = X86_BR_JCC;
+			break;
+		default:
+			ret = X86_BR_NONE;
+		}
+		break;
+	case 0x70 ... 0x7f: /* conditional */
+		ret = X86_BR_JCC;
+		break;
+	case 0xc2: /* near ret */
+	case 0xc3: /* near ret */
+	case 0xca: /* far ret */
+	case 0xcb: /* far ret */
+		ret = X86_BR_RET;
+		break;
+	case 0xcf: /* iret */
+		ret = X86_BR_IRET;
+		break;
+	case 0xcc ... 0xce: /* int */
+		ret = X86_BR_INT;
+		break;
+	case 0xe8: /* call near rel */
+		if (insn_get_immediate(&insn) || insn.immediate1.value == 0) {
+			/* zero length call */
+			ret = X86_BR_ZERO_CALL;
+			break;
+		}
+		fallthrough;
+	case 0x9a: /* call far absolute */
+		ret = X86_BR_CALL;
+		break;
+	case 0xe0 ... 0xe3: /* loop jmp */
+		ret = X86_BR_JCC;
+		break;
+	case 0xe9 ... 0xeb: /* jmp */
+		ret = X86_BR_JMP;
+		break;
+	case 0xff: /* call near absolute, call far absolute ind */
+		if (insn_get_modrm(&insn))
+			return X86_BR_ABORT;
+
+		ext = (insn.modrm.bytes[0] >> 3) & 0x7;
+		switch (ext) {
+		case 2: /* near ind call */
+		case 3: /* far ind call */
+			ret = X86_BR_IND_CALL;
+			break;
+		case 4:
+		case 5:
+			ret = X86_BR_IND_JMP;
+			break;
+		}
+		break;
+	default:
+		ret = X86_BR_NONE;
+	}
+	/*
+	 * interrupts, traps, faults (and thus ring transitions) may
+	 * occur on any instruction. Thus, to classify them correctly,
+	 * we need to first look at the from and to priv levels. If they
+	 * are different and to is in the kernel, then it indicates
+	 * a ring transition. If the from instruction is not a ring
+	 * transition instr (syscall, sysenter, int), then it means
+	 * it was an irq, trap or fault.
+	 *
+	 * we have no way of detecting kernel to kernel faults.
+	 */
+	if (from_plm == X86_BR_USER && to_plm == X86_BR_KERNEL
+	    && ret != X86_BR_SYSCALL && ret != X86_BR_INT)
+		ret = X86_BR_IRQ;
+
+	/*
+	 * branch priv level determined by target as
+	 * is done by HW when LBR_SELECT is implemented
+	 */
+	if (ret != X86_BR_NONE)
+		ret |= to_plm;
+
+	return ret;
+}
+
+#define X86_BR_TYPE_MAP_MAX	16
+
+static int branch_map[X86_BR_TYPE_MAP_MAX] = {
+	PERF_BR_CALL,		/* X86_BR_CALL */
+	PERF_BR_RET,		/* X86_BR_RET */
+	PERF_BR_SYSCALL,	/* X86_BR_SYSCALL */
+	PERF_BR_SYSRET,		/* X86_BR_SYSRET */
+	PERF_BR_UNKNOWN,	/* X86_BR_INT */
+	PERF_BR_ERET,		/* X86_BR_IRET */
+	PERF_BR_COND,		/* X86_BR_JCC */
+	PERF_BR_UNCOND,		/* X86_BR_JMP */
+	PERF_BR_IRQ,		/* X86_BR_IRQ */
+	PERF_BR_IND_CALL,	/* X86_BR_IND_CALL */
+	PERF_BR_UNKNOWN,	/* X86_BR_ABORT */
+	PERF_BR_UNKNOWN,	/* X86_BR_IN_TX */
+	PERF_BR_UNKNOWN,	/* X86_BR_NO_TX */
+	PERF_BR_CALL,		/* X86_BR_ZERO_CALL */
+	PERF_BR_UNKNOWN,	/* X86_BR_CALL_STACK */
+	PERF_BR_IND,		/* X86_BR_IND_JMP */
+};
+
+int common_branch_type(int type)
+{
+	int i;
+
+	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
+
+	if (type) {
+		i = __ffs(type);
+		if (i < X86_BR_TYPE_MAP_MAX)
+			return branch_map[i];
+	}
+
+	return PERF_BR_UNKNOWN;
+}
-- 
2.34.1



* [PATCH 09/13] perf/x86/amd/lbr: Add LbrExtV2 software branch filter support
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (7 preceding siblings ...)
  2022-08-11 12:29 ` [PATCH 08/13] perf/x86: Move branch classifier Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:29 ` [PATCH 10/13] perf/x86: Make branch classifier fusion-aware Sandipan Das
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

With AMD Last Branch Record Extension Version 2 (LbrExtV2), the branch
records need further processing in software because hardware filtering is
not granular enough to identify certain types of branches. E.g. to record
system calls, one must record far branches. The hardware filter captures
both far calls and far returns, so the irrelevant records are filtered
out based on the branch type as seen by the branch classifier.
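
As a simplified sketch of the per-record check (illustrative only; the
actual flow is in amd_pmu_lbr_filter() in the diff below): a request for
"any_call" puts X86_BR_ANY_CALL into br_sel, which includes X86_BR_SYSCALL
but not X86_BR_SYSRET. A far return captured by the LBR_FAR hardware
filter is classified as X86_BR_SYSRET, fails the mask test and is
discarded:

	type = branch_type(from, to, 0);
	if (type == X86_BR_NONE || (br_sel & type) != type)
		cpuc->lbr_entries[i].from = 0;	/* mark invalid */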

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/lbr.c | 92 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 87 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index 064a6c8e1597..3699ba53a326 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -94,6 +94,50 @@ static __always_inline u64 sign_ext_branch_ip(u64 ip)
 	return (u64)(((s64)ip << shift) >> shift);
 }
 
+static void amd_pmu_lbr_filter(void)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int br_sel = cpuc->br_sel, type, i, j;
+	bool compress = false;
+	u64 from, to;
+
+	/* If sampling all branches, there is nothing to filter */
+	if (((br_sel & X86_BR_ALL) == X86_BR_ALL) &&
+	    ((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE))
+		return;
+
+	for (i = 0; i < cpuc->lbr_stack.nr; i++) {
+		from = cpuc->lbr_entries[i].from;
+		to = cpuc->lbr_entries[i].to;
+		type = branch_type(from, to, 0);
+
+		/* If type does not correspond, then discard */
+		if (type == X86_BR_NONE || (br_sel & type) != type) {
+			cpuc->lbr_entries[i].from = 0;	/* mark invalid */
+			compress = true;
+		}
+
+		if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE)
+			cpuc->lbr_entries[i].type = common_branch_type(type);
+	}
+
+	if (!compress)
+		return;
+
+	/* Remove all invalid entries */
+	for (i = 0; i < cpuc->lbr_stack.nr; ) {
+		if (!cpuc->lbr_entries[i].from) {
+			j = i;
+			while (++j < cpuc->lbr_stack.nr)
+				cpuc->lbr_entries[j - 1] = cpuc->lbr_entries[j];
+			cpuc->lbr_stack.nr--;
+			if (!cpuc->lbr_entries[i].from)
+				continue;
+		}
+		i++;
+	}
+}
+
 void amd_pmu_lbr_read(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -128,6 +172,9 @@ void amd_pmu_lbr_read(void)
 	 * LBR To[0] always represent the TOS
 	 */
 	cpuc->lbr_stack.hw_idx = 0;
+
+	/* Perform further software filtering */
+	amd_pmu_lbr_filter();
 }
 
 static const int lbr_select_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
@@ -136,8 +183,8 @@ static const int lbr_select_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
 	[PERF_SAMPLE_BRANCH_HV_SHIFT]		= LBR_IGNORE,
 
 	[PERF_SAMPLE_BRANCH_ANY_SHIFT]		= LBR_ANY,
-	[PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT]	= LBR_REL_CALL | LBR_IND_CALL,
-	[PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT]	= LBR_RETURN,
+	[PERF_SAMPLE_BRANCH_ANY_CALL_SHIFT]	= LBR_REL_CALL | LBR_IND_CALL | LBR_FAR,
+	[PERF_SAMPLE_BRANCH_ANY_RETURN_SHIFT]	= LBR_RETURN | LBR_FAR,
 	[PERF_SAMPLE_BRANCH_IND_CALL_SHIFT]	= LBR_IND_CALL,
 	[PERF_SAMPLE_BRANCH_ABORT_TX_SHIFT]	= LBR_NOT_SUPP,
 	[PERF_SAMPLE_BRANCH_IN_TX_SHIFT]	= LBR_NOT_SUPP,
@@ -150,8 +197,6 @@ static const int lbr_select_map[PERF_SAMPLE_BRANCH_MAX_SHIFT] = {
 
 	[PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT]	= LBR_NOT_SUPP,
 	[PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT]	= LBR_NOT_SUPP,
-
-	[PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT]	= LBR_NOT_SUPP,
 };
 
 static int amd_pmu_lbr_setup_filter(struct perf_event *event)
@@ -165,6 +210,41 @@ static int amd_pmu_lbr_setup_filter(struct perf_event *event)
 	if (!x86_pmu.lbr_nr)
 		return -EOPNOTSUPP;
 
+	if (br_type & PERF_SAMPLE_BRANCH_USER)
+		mask |= X86_BR_USER;
+
+	if (br_type & PERF_SAMPLE_BRANCH_KERNEL)
+		mask |= X86_BR_KERNEL;
+
+	/* Ignore BRANCH_HV here */
+
+	if (br_type & PERF_SAMPLE_BRANCH_ANY)
+		mask |= X86_BR_ANY;
+
+	if (br_type & PERF_SAMPLE_BRANCH_ANY_CALL)
+		mask |= X86_BR_ANY_CALL;
+
+	if (br_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
+		mask |= X86_BR_RET | X86_BR_IRET | X86_BR_SYSRET;
+
+	if (br_type & PERF_SAMPLE_BRANCH_IND_CALL)
+		mask |= X86_BR_IND_CALL;
+
+	if (br_type & PERF_SAMPLE_BRANCH_COND)
+		mask |= X86_BR_JCC;
+
+	if (br_type & PERF_SAMPLE_BRANCH_IND_JUMP)
+		mask |= X86_BR_IND_JMP;
+
+	if (br_type & PERF_SAMPLE_BRANCH_CALL)
+		mask |= X86_BR_CALL | X86_BR_ZERO_CALL;
+
+	if (br_type & PERF_SAMPLE_BRANCH_TYPE_SAVE)
+		mask |= X86_BR_TYPE_SAVE;
+
+	reg->reg = mask;
+	mask = 0;
+
 	for (i = 0; i < PERF_SAMPLE_BRANCH_MAX_SHIFT; i++) {
 		if (!(br_type & BIT_ULL(i)))
 			continue;
@@ -220,13 +300,15 @@ void amd_pmu_lbr_reset(void)
 void amd_pmu_lbr_add(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event_extra *reg = &event->hw.branch_reg;
 
 	if (!x86_pmu.lbr_nr)
 		return;
 
 	if (has_branch_stack(event)) {
 		cpuc->lbr_select = 1;
-		cpuc->lbr_sel->config = event->hw.branch_reg.config;
+		cpuc->lbr_sel->config = reg->config;
+		cpuc->br_sel = reg->reg;
 	}
 
 	perf_sched_cb_inc(event->ctx->pmu);
-- 
2.34.1



* [PATCH 10/13] perf/x86: Make branch classifier fusion-aware
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (8 preceding siblings ...)
  2022-08-11 12:29 ` [PATCH 09/13] perf/x86/amd/lbr: Add LbrExtV2 software branch filter support Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:29 ` [PATCH 11/13] perf/x86/amd/lbr: Use fusion-aware branch classifier Sandipan Das
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

With branch fusion and other optimizations, branch sampling hardware in
some processors can report a branch from address that points to an
instruction preceding the actual branch by several bytes.

In such cases, the classifier cannot determine the branch type, which leads
to failures such as with the recently added test from commit b55878c90ab9
("perf test: Add test for branch stack sampling"). Branch information is
also easier to consume and annotate if branch from addresses always point
to branch instructions.

Add a new variant of the branch classifier that can account for instruction
fusion. If fusion is expected and the current branch from address does not
point to a branch instruction, it attempts to find the first branch within
the next (MAX_INSN_SIZE - 1) bytes and if found, additionally provides the
offset between the reported branch from address and the address of the
expected branch instruction.
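
As a worked example with hypothetical addresses: if the LBR reports
from = 0x1000 but the bytes at 0x1000 are a 3-byte non-branch instruction
followed by a Jcc at 0x1003, the first decode returns X86_BR_NONE, the
scan advances by insn.length = 3 and decodes again at 0x1003, which
yields X86_BR_JCC and an offset of 3. The scan stops early on a decode
error or once the available bytes are exhausted.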

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/perf_event.h |   2 +
 arch/x86/events/utils.c      | 169 +++++++++++++++++++++--------------
 2 files changed, 102 insertions(+), 69 deletions(-)

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 8ac8f9e6affb..eb6227630244 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1269,6 +1269,8 @@ enum {
 
 int common_branch_type(int type);
 int branch_type(unsigned long from, unsigned long to, int abort);
+int branch_type_fused(unsigned long from, unsigned long to, int abort,
+		      int *offset);
 
 ssize_t x86_event_sysfs_show(char *page, u64 config, u64 event);
 ssize_t intel_event_sysfs_show(char *page, u64 config);
diff --git a/arch/x86/events/utils.c b/arch/x86/events/utils.c
index a32368945462..e013243f360c 100644
--- a/arch/x86/events/utils.c
+++ b/arch/x86/events/utils.c
@@ -3,6 +3,68 @@
 
 #include "perf_event.h"
 
+static int decode_branch_type(struct insn *insn)
+{
+	int ext;
+
+	if (insn_get_opcode(insn))
+		return X86_BR_ABORT;
+
+	switch (insn->opcode.bytes[0]) {
+	case 0xf:
+		switch (insn->opcode.bytes[1]) {
+		case 0x05: /* syscall */
+		case 0x34: /* sysenter */
+			return X86_BR_SYSCALL;
+		case 0x07: /* sysret */
+		case 0x35: /* sysexit */
+			return X86_BR_SYSRET;
+		case 0x80 ... 0x8f: /* conditional */
+			return X86_BR_JCC;
+		}
+		return X86_BR_NONE;
+	case 0x70 ... 0x7f: /* conditional */
+		return X86_BR_JCC;
+	case 0xc2: /* near ret */
+	case 0xc3: /* near ret */
+	case 0xca: /* far ret */
+	case 0xcb: /* far ret */
+		return X86_BR_RET;
+	case 0xcf: /* iret */
+		return X86_BR_IRET;
+	case 0xcc ... 0xce: /* int */
+		return X86_BR_INT;
+	case 0xe8: /* call near rel */
+		if (insn_get_immediate(insn) || insn->immediate1.value == 0) {
+			/* zero length call */
+			return X86_BR_ZERO_CALL;
+		}
+		fallthrough;
+	case 0x9a: /* call far absolute */
+		return X86_BR_CALL;
+	case 0xe0 ... 0xe3: /* loop jmp */
+		return X86_BR_JCC;
+	case 0xe9 ... 0xeb: /* jmp */
+		return X86_BR_JMP;
+	case 0xff: /* call near absolute, call far absolute ind */
+		if (insn_get_modrm(insn))
+			return X86_BR_ABORT;
+
+		ext = (insn->modrm.bytes[0] >> 3) & 0x7;
+		switch (ext) {
+		case 2: /* near ind call */
+		case 3: /* far ind call */
+			return X86_BR_IND_CALL;
+		case 4:
+		case 5:
+			return X86_BR_IND_JMP;
+		}
+		return X86_BR_NONE;
+	}
+
+	return X86_BR_NONE;
+}
+
 /*
  * Return the type of control flow change at address "from".
  * The instruction is not necessarily a branch (in case of interrupt).
@@ -13,14 +75,22 @@
  * If a branch type is unknown OR the instruction cannot be
  * decoded (e.g., text page not present), then X86_BR_NONE is
  * returned.
+ *
+ * While recording branches, some processors can report the "from"
+ * address to be that of an instruction preceding the actual branch
+ * when instruction fusion occurs. If fusion is expected, attempt to
+ * find the type of the first branch instruction within the next
+ * MAX_INSN_SIZE bytes and if found, provide the offset between the
+ * reported "from" address and the actual branch instruction address.
  */
-int branch_type(unsigned long from, unsigned long to, int abort)
+static int get_branch_type(unsigned long from, unsigned long to, int abort,
+			   bool fused, int *offset)
 {
 	struct insn insn;
 	void *addr;
-	int bytes_read, bytes_left;
+	int bytes_read, bytes_left, insn_offset;
 	int ret = X86_BR_NONE;
-	int ext, to_plm, from_plm;
+	int to_plm, from_plm;
 	u8 buf[MAX_INSN_SIZE];
 	int is64 = 0;
 
@@ -83,77 +153,27 @@ int branch_type(unsigned long from, unsigned long to, int abort)
 	is64 = kernel_ip((unsigned long)addr) || any_64bit_mode(current_pt_regs());
 #endif
 	insn_init(&insn, addr, bytes_read, is64);
-	if (insn_get_opcode(&insn))
-		return X86_BR_ABORT;
+	ret = decode_branch_type(&insn);
+	insn_offset = 0;
 
-	switch (insn.opcode.bytes[0]) {
-	case 0xf:
-		switch (insn.opcode.bytes[1]) {
-		case 0x05: /* syscall */
-		case 0x34: /* sysenter */
-			ret = X86_BR_SYSCALL;
-			break;
-		case 0x07: /* sysret */
-		case 0x35: /* sysexit */
-			ret = X86_BR_SYSRET;
-			break;
-		case 0x80 ... 0x8f: /* conditional */
-			ret = X86_BR_JCC;
-			break;
-		default:
-			ret = X86_BR_NONE;
-		}
-		break;
-	case 0x70 ... 0x7f: /* conditional */
-		ret = X86_BR_JCC;
-		break;
-	case 0xc2: /* near ret */
-	case 0xc3: /* near ret */
-	case 0xca: /* far ret */
-	case 0xcb: /* far ret */
-		ret = X86_BR_RET;
-		break;
-	case 0xcf: /* iret */
-		ret = X86_BR_IRET;
-		break;
-	case 0xcc ... 0xce: /* int */
-		ret = X86_BR_INT;
-		break;
-	case 0xe8: /* call near rel */
-		if (insn_get_immediate(&insn) || insn.immediate1.value == 0) {
-			/* zero length call */
-			ret = X86_BR_ZERO_CALL;
+	/* Check for the possibility of branch fusion */
+	while (fused && ret == X86_BR_NONE) {
+		/* Check for decoding errors */
+		if (insn_get_length(&insn) || !insn.length)
 			break;
-		}
-		fallthrough;
-	case 0x9a: /* call far absolute */
-		ret = X86_BR_CALL;
-		break;
-	case 0xe0 ... 0xe3: /* loop jmp */
-		ret = X86_BR_JCC;
-		break;
-	case 0xe9 ... 0xeb: /* jmp */
-		ret = X86_BR_JMP;
-		break;
-	case 0xff: /* call near absolute, call far absolute ind */
-		if (insn_get_modrm(&insn))
-			return X86_BR_ABORT;
 
-		ext = (insn.modrm.bytes[0] >> 3) & 0x7;
-		switch (ext) {
-		case 2: /* near ind call */
-		case 3: /* far ind call */
-			ret = X86_BR_IND_CALL;
+		insn_offset += insn.length;
+		bytes_read -= insn.length;
+		if (bytes_read < 0)
 			break;
-		case 4:
-		case 5:
-			ret = X86_BR_IND_JMP;
-			break;
-		}
-		break;
-	default:
-		ret = X86_BR_NONE;
+
+		insn_init(&insn, addr + insn_offset, bytes_read, is64);
+		ret = decode_branch_type(&insn);
 	}
+
+	if (offset)
+		*offset = insn_offset;
+
 	/*
 	 * interrupts, traps, faults (and thus ring transitions) may
 	 * occur on any instruction. Thus, to classify them correctly,
@@ -179,6 +199,17 @@ int branch_type(unsigned long from, unsigned long to, int abort)
 	return ret;
 }
 
+int branch_type(unsigned long from, unsigned long to, int abort)
+{
+	return get_branch_type(from, to, abort, false, NULL);
+}
+
+int branch_type_fused(unsigned long from, unsigned long to, int abort,
+		      int *offset)
+{
+	return get_branch_type(from, to, abort, true, offset);
+}
+
 #define X86_BR_TYPE_MAP_MAX	16
 
 static int branch_map[X86_BR_TYPE_MAP_MAX] = {
-- 
2.34.1



* [PATCH 11/13] perf/x86/amd/lbr: Use fusion-aware branch classifier
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (9 preceding siblings ...)
  2022-08-11 12:29 ` [PATCH 10/13] perf/x86: Make branch classifier fusion-aware Sandipan Das
@ 2022-08-11 12:29 ` Sandipan Das
  2022-08-11 12:30 ` [PATCH 12/13] perf/core: Add speculation info to branch entries Sandipan Das
  2022-08-11 12:30 ` [PATCH 13/13] perf/x86/amd/lbr: Add LbrExtV2 branch speculation info support Sandipan Das
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:29 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

AMD Last Branch Record Extension Version 2 (LbrExtV2) can report a branch
from address that points to an instruction preceding the actual branch by
several bytes due to branch fusion and further optimizations in Zen4
processors.

In such cases, software should move forward sequentially in the instruction
stream from the reported address and use the address of the first branch
encountered instead. Hence, use the fusion-aware branch
classifier to determine the correct branch type and get the offset for
adjusting the branch from address.
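
Continuing the hypothetical example from the previous patch: with a
reported from address of 0x1000 and the actual branch at 0x1003,
branch_type_fused() returns offset = 3 and the record is adjusted to
from = 0x1003 before filtering.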

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/lbr.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index 3699ba53a326..5fa985cd44cb 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -97,7 +97,7 @@ static __always_inline u64 sign_ext_branch_ip(u64 ip)
 static void amd_pmu_lbr_filter(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-	int br_sel = cpuc->br_sel, type, i, j;
+	int br_sel = cpuc->br_sel, offset, type, i, j;
 	bool compress = false;
 	u64 from, to;
 
@@ -109,7 +109,15 @@ static void amd_pmu_lbr_filter(void)
 	for (i = 0; i < cpuc->lbr_stack.nr; i++) {
 		from = cpuc->lbr_entries[i].from;
 		to = cpuc->lbr_entries[i].to;
-		type = branch_type(from, to, 0);
+		type = branch_type_fused(from, to, 0, &offset);
+
+		/*
+		 * Adjust the branch from address in case of instruction
+		 * fusion where it points to an instruction preceding the
+		 * actual branch
+		 */
+		if (offset)
+			cpuc->lbr_entries[i].from += offset;
 
 		/* If type does not correspond, then discard */
 		if (type == X86_BR_NONE || (br_sel & type) != type) {
-- 
2.34.1



* [PATCH 12/13] perf/core: Add speculation info to branch entries
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (10 preceding siblings ...)
  2022-08-11 12:29 ` [PATCH 11/13] perf/x86/amd/lbr: Use fusion-aware branch classifier Sandipan Das
@ 2022-08-11 12:30 ` Sandipan Das
  2022-08-11 12:30 ` [PATCH 13/13] perf/x86/amd/lbr: Add LbrExtV2 branch speculation info support Sandipan Das
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:30 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

Add a new "spec" bitfield to branch entries for providing speculation
information. This will be populated using hints provided by branch sampling
features on supported hardware. The following cases are covered:

  * No branch speculation information is available
  * Branch is speculative but taken on the wrong path
  * Branch is non-speculative but taken on the correct path
  * Branch is speculative and taken on the correct path
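
A hedged, tool-side sketch of how a consumer of branch samples might
interpret the new field (hypothetical code, not part of this patch; only
the enum values and bitfield in the diff below are introduced here):

	/* hypothetical consumer-side helper; needs the uapi
	 * perf_event.h definitions added by this patch
	 */
	static const char *spec_str(const struct perf_branch_entry *br)
	{
		switch (br->spec) {
		case PERF_BR_SPEC_WRONG_PATH:		return "spec-wrong-path";
		case PERF_BR_NON_SPEC_CORRECT_PATH:	return "non-spec";
		case PERF_BR_SPEC_CORRECT_PATH:		return "spec-correct-path";
		case PERF_BR_SPEC_NA:
		default:				return "n/a";
		}
	}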

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 include/linux/perf_event.h      |  1 +
 include/uapi/linux/perf_event.h | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index ee8b9ecdc03b..ae30c61957d2 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1078,6 +1078,7 @@ static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *b
 	br->abort = 0;
 	br->cycles = 0;
 	br->type = 0;
+	br->spec = PERF_BR_SPEC_NA;
 	br->reserved = 0;
 }
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 0474ee362151..9656733a95a5 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -256,6 +256,17 @@ enum {
 	PERF_BR_MAX,
 };
 
+/*
+ * Common branch speculation outcome classification
+ */
+enum {
+	PERF_BR_SPEC_NA			= 0,	/* Not available */
+	PERF_BR_SPEC_WRONG_PATH		= 1,	/* Speculative but on wrong path */
+	PERF_BR_NON_SPEC_CORRECT_PATH	= 2,	/* Non-speculative but on correct path */
+	PERF_BR_SPEC_CORRECT_PATH	= 3,	/* Speculative and on correct path */
+	PERF_BR_SPEC_MAX,
+};
+
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
 	(PERF_SAMPLE_BRANCH_USER|\
 	 PERF_SAMPLE_BRANCH_KERNEL|\
@@ -1363,6 +1374,7 @@ union perf_mem_data_src {
  *     abort: aborting a hardware transaction
  *    cycles: cycles from last branch (or 0 if not supported)
  *      type: branch type
+ *      spec: branch speculation info (or 0 if not supported)
  */
 struct perf_branch_entry {
 	__u64	from;
@@ -1373,7 +1385,8 @@ struct perf_branch_entry {
 		abort:1,    /* transaction abort */
 		cycles:16,  /* cycle count to last branch */
 		type:4,     /* branch type */
-		reserved:40;
+		spec:2,     /* branch speculation info */
+		reserved:38;
 };
 
 union perf_sample_weight {
-- 
2.34.1



* [PATCH 13/13] perf/x86/amd/lbr: Add LbrExtV2 branch speculation info support
  2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
                   ` (11 preceding siblings ...)
  2022-08-11 12:30 ` [PATCH 12/13] perf/core: Add speculation info to branch entries Sandipan Das
@ 2022-08-11 12:30 ` Sandipan Das
  12 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-11 12:30 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, x86
  Cc: peterz, bp, acme, namhyung, jolsa, tglx, mingo, mark.rutland,
	alexander.shishkin, dave.hansen, like.xu.linux, eranian,
	ananth.narayan, ravi.bangoria, santosh.shukla, sandipan.das

Provide branch speculation information captured via AMD Last Branch Record
Extension Version 2 (LbrExtV2) by setting the speculation info in branch
records. The info is based on the "valid" and "spec" bits in the Branch To
registers.
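
The mapping of the two bits, summarized from the comments in the patch
below:

	valid spec	interpretation			reported as
	----- ----	--------------			-----------
	  0    0	no branch recorded		entry skipped
	  0    1	speculative, wrong path		PERF_BR_SPEC_WRONG_PATH
	  1    0	non-speculative, correct path	PERF_BR_NON_SPEC_CORRECT_PATH
	  1    1	speculative, correct path	PERF_BR_SPEC_CORRECT_PATH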

Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
---
 arch/x86/events/amd/lbr.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/amd/lbr.c b/arch/x86/events/amd/lbr.c
index 5fa985cd44cb..3fd316b6adda 100644
--- a/arch/x86/events/amd/lbr.c
+++ b/arch/x86/events/amd/lbr.c
@@ -146,12 +146,19 @@ static void amd_pmu_lbr_filter(void)
 	}
 }
 
+static const int lbr_spec_map[PERF_BR_SPEC_MAX] = {
+	PERF_BR_SPEC_NA,
+	PERF_BR_SPEC_WRONG_PATH,
+	PERF_BR_NON_SPEC_CORRECT_PATH,
+	PERF_BR_SPEC_CORRECT_PATH,
+};
+
 void amd_pmu_lbr_read(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_branch_entry *br = cpuc->lbr_entries;
 	struct branch_entry entry;
-	int out = 0, i;
+	int out = 0, idx, i;
 
 	if (!cpuc->lbr_users)
 		return;
@@ -160,8 +167,11 @@ void amd_pmu_lbr_read(void)
 		entry.from.full	= amd_pmu_lbr_get_from(i);
 		entry.to.full	= amd_pmu_lbr_get_to(i);
 
-		/* Check if a branch has been logged */
-		if (!entry.to.split.valid)
+		/*
+		 * Check if a branch has been logged; if valid = 0 and
+		 * spec = 0, then no branch was recorded
+		 */
+		if (!entry.to.split.valid && !entry.to.split.spec)
 			continue;
 
 		perf_clear_branch_entry_bitfields(br + out);
@@ -170,6 +180,25 @@ void amd_pmu_lbr_read(void)
 		br[out].to	= sign_ext_branch_ip(entry.to.split.ip);
 		br[out].mispred	= entry.from.split.mispredict;
 		br[out].predicted = !br[out].mispred;
+
+		/*
+		 * Set branch speculation information using the status of
+		 * the valid and spec bits.
+		 *
+		 * When valid = 0, spec = 0, no branch was recorded and the
+		 * entry is discarded as seen above.
+		 *
+		 * When valid = 0, spec = 1, the recorded branch was
+		 * speculative but took the wrong path.
+		 *
+		 * When valid = 1, spec = 0, the recorded branch was
+		 * non-speculative but took the correct path.
+		 *
+		 * When valid = 1, spec = 1, the recorded branch was
+		 * speculative and took the correct path.
+		 */
+		idx = (entry.to.split.valid << 1) | entry.to.split.spec;
+		br[out].spec = lbr_spec_map[idx];
 		out++;
 	}
 
-- 
2.34.1



* Re: [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-11 12:29 ` [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit Sandipan Das
@ 2022-08-11 13:13   ` Borislav Petkov
  2022-08-15 11:27   ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Borislav Petkov @ 2022-08-11 13:13 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, peterz, acme, namhyung,
	jolsa, tglx, mingo, mark.rutland, alexander.shishkin,
	dave.hansen, like.xu.linux, eranian, ananth.narayan,
	ravi.bangoria, santosh.shukla

On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
> CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance
> monitoring features for AMD processors.
> 
> Bit 1 of EAX indicates support for Last Branch Record Extension Version 2
> (LbrExtV2) features. If found to be set during PMU initialization, the EBX
> bits of the same leaf can be used to determine the number of available LBR
> entries.
> 
> For better utilization of feature words, LbrExtV2 is added as a scattered

s/LbrExtV2 is added/add LbrExtV2/

> feature bit.
> 
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h | 2 +-
>  arch/x86/kernel/cpu/scattered.c    | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 393f2bbb5e3a..e3fa476a24b0 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -96,7 +96,7 @@
>  #define X86_FEATURE_SYSCALL32		( 3*32+14) /* "" syscall in IA32 userspace */
>  #define X86_FEATURE_SYSENTER32		( 3*32+15) /* "" sysenter in IA32 userspace */
>  #define X86_FEATURE_REP_GOOD		( 3*32+16) /* REP microcode works well */
> -/* FREE!                                ( 3*32+17) */
> +#define X86_FEATURE_LBREXT_V2		( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
>  #define X86_FEATURE_LFENCE_RDTSC	( 3*32+18) /* "" LFENCE synchronizes RDTSC */
>  #define X86_FEATURE_ACC_POWER		( 3*32+19) /* AMD Accumulated Power Mechanism */
>  #define X86_FEATURE_NOPL		( 3*32+20) /* The NOPL (0F 1F) instructions */
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index dbaa8326d6f2..6be46dffddbf 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
>  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
> +	{ X86_FEATURE_LBREXT_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>  	{ 0, 0, 0, 0, 0 }
>  };

Other than that:

Acked-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-11 12:29 ` [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit Sandipan Das
  2022-08-11 13:13   ` Borislav Petkov
@ 2022-08-15 11:27   ` Peter Zijlstra
  2022-08-15 19:42     ` Stephane Eranian
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2022-08-15 11:27 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, acme, namhyung, jolsa,
	tglx, mingo, mark.rutland, alexander.shishkin, dave.hansen,
	like.xu.linux, eranian, ananth.narayan, ravi.bangoria,
	santosh.shukla

On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
> CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance
> monitoring features for AMD processors.
> 
> Bit 1 of EAX indicates support for Last Branch Record Extension Version 2
> (LbrExtV2) features. If found to be set during PMU initialization, the EBX
> bits of the same leaf can be used to determine the number of available LBR
> entries.
> 
> For better utilization of feature words, LbrExtV2 is added as a scattered
> feature bit.
> 
> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> ---
>  arch/x86/include/asm/cpufeatures.h | 2 +-
>  arch/x86/kernel/cpu/scattered.c    | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 393f2bbb5e3a..e3fa476a24b0 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -96,7 +96,7 @@
>  #define X86_FEATURE_SYSCALL32		( 3*32+14) /* "" syscall in IA32 userspace */
>  #define X86_FEATURE_SYSENTER32		( 3*32+15) /* "" sysenter in IA32 userspace */
>  #define X86_FEATURE_REP_GOOD		( 3*32+16) /* REP microcode works well */
> -/* FREE!                                ( 3*32+17) */
> +#define X86_FEATURE_LBREXT_V2		( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
>  #define X86_FEATURE_LFENCE_RDTSC	( 3*32+18) /* "" LFENCE synchronizes RDTSC */
>  #define X86_FEATURE_ACC_POWER		( 3*32+19) /* AMD Accumulated Power Mechanism */
>  #define X86_FEATURE_NOPL		( 3*32+20) /* The NOPL (0F 1F) instructions */
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index dbaa8326d6f2..6be46dffddbf 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>  	{ X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
>  	{ X86_FEATURE_MBA,		CPUID_EBX,  6, 0x80000008, 0 },
>  	{ X86_FEATURE_PERFMON_V2,	CPUID_EAX,  0, 0x80000022, 0 },
> +	{ X86_FEATURE_LBREXT_V2,	CPUID_EAX,  1, 0x80000022, 0 },
>  	{ 0, 0, 0, 0, 0 }
>  };

Would LBR_V2 work at all? It being a new version already seems to imply
extension, no? Then again, I suppose there's an argument to be had for
avoiding confusion vs the Intel LBR thing.. Couldn't you have called
this BRS_V2 :-)

Oh well...


* Re: [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-15 11:27   ` Peter Zijlstra
@ 2022-08-15 19:42     ` Stephane Eranian
  2022-08-22  9:05       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Stephane Eranian @ 2022-08-15 19:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sandipan Das, linux-kernel, linux-perf-users, x86, bp, acme,
	namhyung, jolsa, tglx, mingo, mark.rutland, alexander.shishkin,
	dave.hansen, like.xu.linux, ananth.narayan, ravi.bangoria,
	santosh.shukla

Hi,

On Mon, Aug 15, 2022 at 4:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
> > CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance
> > monitoring features for AMD processors.
> >
> > Bit 1 of EAX indicates support for Last Branch Record Extension Version 2
> > (LbrExtV2) features. If found to be set during PMU initialization, the EBX
> > bits of the same leaf can be used to determine the number of available LBR
> > entries.
> >
> > For better utilization of feature words, LbrExtV2 is added as a scattered
> > feature bit.
> >
> > Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> > ---
> >  arch/x86/include/asm/cpufeatures.h | 2 +-
> >  arch/x86/kernel/cpu/scattered.c    | 1 +
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > index 393f2bbb5e3a..e3fa476a24b0 100644
> > --- a/arch/x86/include/asm/cpufeatures.h
> > +++ b/arch/x86/include/asm/cpufeatures.h
> > @@ -96,7 +96,7 @@
> >  #define X86_FEATURE_SYSCALL32                ( 3*32+14) /* "" syscall in IA32 userspace */
> >  #define X86_FEATURE_SYSENTER32               ( 3*32+15) /* "" sysenter in IA32 userspace */
> >  #define X86_FEATURE_REP_GOOD         ( 3*32+16) /* REP microcode works well */
> > -/* FREE!                                ( 3*32+17) */
> > +#define X86_FEATURE_LBREXT_V2                ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
> >  #define X86_FEATURE_LFENCE_RDTSC     ( 3*32+18) /* "" LFENCE synchronizes RDTSC */
> >  #define X86_FEATURE_ACC_POWER                ( 3*32+19) /* AMD Accumulated Power Mechanism */
> >  #define X86_FEATURE_NOPL             ( 3*32+20) /* The NOPL (0F 1F) instructions */
> > diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> > index dbaa8326d6f2..6be46dffddbf 100644
> > --- a/arch/x86/kernel/cpu/scattered.c
> > +++ b/arch/x86/kernel/cpu/scattered.c
> > @@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> >       { X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
> >       { X86_FEATURE_MBA,              CPUID_EBX,  6, 0x80000008, 0 },
> >       { X86_FEATURE_PERFMON_V2,       CPUID_EAX,  0, 0x80000022, 0 },
> > +     { X86_FEATURE_LBREXT_V2,        CPUID_EAX,  1, 0x80000022, 0 },
> >       { 0, 0, 0, 0, 0 }
> >  };
>
> Would LBR_V2 work at all? It being a new version already seems to imply
> extention, no? Then again, I suppose there's an argument to be had for
> avoiding confusion vs the Intel LBR thing.. Couldn't you have called
> this BRS_V2 :-)
>
I believe it is called v2 because there was already an LBR in previous
generations; however, it was 1-deep and it was not connected to the PMU
like this one. The public PPR mentions it (MSR 0x1DB/0x1DC, Last Branch
From IP, Last Branch To IP). See for instance the PPR for Fam17h model
71h:
https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip

BRS is a model specific feature for Zen3.

LBRv2 is a great improvement including over Zen3 BRS.


* Re: [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-15 19:42     ` Stephane Eranian
@ 2022-08-22  9:05       ` Peter Zijlstra
  2022-08-22 12:52         ` Sandipan Das
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2022-08-22  9:05 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: Sandipan Das, linux-kernel, linux-perf-users, x86, bp, acme,
	namhyung, jolsa, tglx, mingo, mark.rutland, alexander.shishkin,
	dave.hansen, like.xu.linux, ananth.narayan, ravi.bangoria,
	santosh.shukla

On Mon, Aug 15, 2022 at 12:42:23PM -0700, Stephane Eranian wrote:
> Hi,
> 
> On Mon, Aug 15, 2022 at 4:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
> > > CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance
> > > monitoring features for AMD processors.
> > >
> > > Bit 1 of EAX indicates support for Last Branch Record Extension Version 2
> > > (LbrExtV2) features. If found to be set during PMU initialization, the EBX
> > > bits of the same leaf can be used to determine the number of available LBR
> > > entries.
> > >
> > > For better utilization of feature words, LbrExtV2 is added as a scattered
> > > feature bit.
> > >
> > > Signed-off-by: Sandipan Das <sandipan.das@amd.com>
> > > ---
> > >  arch/x86/include/asm/cpufeatures.h | 2 +-
> > >  arch/x86/kernel/cpu/scattered.c    | 1 +
> > >  2 files changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> > > index 393f2bbb5e3a..e3fa476a24b0 100644
> > > --- a/arch/x86/include/asm/cpufeatures.h
> > > +++ b/arch/x86/include/asm/cpufeatures.h
> > > @@ -96,7 +96,7 @@
> > >  #define X86_FEATURE_SYSCALL32                ( 3*32+14) /* "" syscall in IA32 userspace */
> > >  #define X86_FEATURE_SYSENTER32               ( 3*32+15) /* "" sysenter in IA32 userspace */
> > >  #define X86_FEATURE_REP_GOOD         ( 3*32+16) /* REP microcode works well */
> > > -/* FREE!                                ( 3*32+17) */
> > > +#define X86_FEATURE_LBREXT_V2                ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
> > >  #define X86_FEATURE_LFENCE_RDTSC     ( 3*32+18) /* "" LFENCE synchronizes RDTSC */
> > >  #define X86_FEATURE_ACC_POWER                ( 3*32+19) /* AMD Accumulated Power Mechanism */
> > >  #define X86_FEATURE_NOPL             ( 3*32+20) /* The NOPL (0F 1F) instructions */
> > > diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> > > index dbaa8326d6f2..6be46dffddbf 100644
> > > --- a/arch/x86/kernel/cpu/scattered.c
> > > +++ b/arch/x86/kernel/cpu/scattered.c
> > > @@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
> > >       { X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
> > >       { X86_FEATURE_MBA,              CPUID_EBX,  6, 0x80000008, 0 },
> > >       { X86_FEATURE_PERFMON_V2,       CPUID_EAX,  0, 0x80000022, 0 },
> > > +     { X86_FEATURE_LBREXT_V2,        CPUID_EAX,  1, 0x80000022, 0 },
> > >       { 0, 0, 0, 0, 0 }
> > >  };
> >
> > Would LBR_V2 work at all? It being a new version already seems to imply
> > extention, no? Then again, I suppose there's an argument to be had for
> > avoiding confusion vs the Intel LBR thing.. Couldn't you have called
> > this BRS_V2 :-)
> >
> I believe it is called v2 because there was already a LBR in previous
> generations, however it

That's not the question; It's currently called LBREXT_V2, which is a bit
of a shit name. Then again LBR_V2 is too because AMD and Intel LBR are
quite different. So in that respect BRS_V2 would be an ever so much
better name.


* Re: [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-22  9:05       ` Peter Zijlstra
@ 2022-08-22 12:52         ` Sandipan Das
  2022-08-22 13:26           ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Sandipan Das @ 2022-08-22 12:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, x86, bp, acme, namhyung, jolsa,
	tglx, mingo, mark.rutland, alexander.shishkin, dave.hansen,
	like.xu.linux, Stephane Eranian, ananth.narayan, ravi.bangoria,
	santosh.shukla

Hi Peter,

On 8/22/2022 2:35 PM, Peter Zijlstra wrote:
> On Mon, Aug 15, 2022 at 12:42:23PM -0700, Stephane Eranian wrote:
>> Hi,
>>
>> On Mon, Aug 15, 2022 at 4:27 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>>
>>> On Thu, Aug 11, 2022 at 05:59:52PM +0530, Sandipan Das wrote:
>>>> CPUID leaf 0x80000022 i.e. ExtPerfMonAndDbg advertises some new performance
>>>> monitoring features for AMD processors.
>>>>
>>>> Bit 1 of EAX indicates support for Last Branch Record Extension Version 2
>>>> (LbrExtV2) features. If found to be set during PMU initialization, the EBX
>>>> bits of the same leaf can be used to determine the number of available LBR
>>>> entries.
>>>>
>>>> For better utilization of feature words, LbrExtV2 is added as a scattered
>>>> feature bit.
>>>>
>>>> Signed-off-by: Sandipan Das <sandipan.das@amd.com>
>>>> ---
>>>>  arch/x86/include/asm/cpufeatures.h | 2 +-
>>>>  arch/x86/kernel/cpu/scattered.c    | 1 +
>>>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
>>>> index 393f2bbb5e3a..e3fa476a24b0 100644
>>>> --- a/arch/x86/include/asm/cpufeatures.h
>>>> +++ b/arch/x86/include/asm/cpufeatures.h
>>>> @@ -96,7 +96,7 @@
>>>>  #define X86_FEATURE_SYSCALL32                ( 3*32+14) /* "" syscall in IA32 userspace */
>>>>  #define X86_FEATURE_SYSENTER32               ( 3*32+15) /* "" sysenter in IA32 userspace */
>>>>  #define X86_FEATURE_REP_GOOD         ( 3*32+16) /* REP microcode works well */
>>>> -/* FREE!                                ( 3*32+17) */
>>>> +#define X86_FEATURE_LBREXT_V2                ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
>>>>  #define X86_FEATURE_LFENCE_RDTSC     ( 3*32+18) /* "" LFENCE synchronizes RDTSC */
>>>>  #define X86_FEATURE_ACC_POWER                ( 3*32+19) /* AMD Accumulated Power Mechanism */
>>>>  #define X86_FEATURE_NOPL             ( 3*32+20) /* The NOPL (0F 1F) instructions */
>>>> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
>>>> index dbaa8326d6f2..6be46dffddbf 100644
>>>> --- a/arch/x86/kernel/cpu/scattered.c
>>>> +++ b/arch/x86/kernel/cpu/scattered.c
>>>> @@ -44,6 +44,7 @@ static const struct cpuid_bit cpuid_bits[] = {
>>>>       { X86_FEATURE_PROC_FEEDBACK,    CPUID_EDX, 11, 0x80000007, 0 },
>>>>       { X86_FEATURE_MBA,              CPUID_EBX,  6, 0x80000008, 0 },
>>>>       { X86_FEATURE_PERFMON_V2,       CPUID_EAX,  0, 0x80000022, 0 },
>>>> +     { X86_FEATURE_LBREXT_V2,        CPUID_EAX,  1, 0x80000022, 0 },
>>>>       { 0, 0, 0, 0, 0 }
>>>>  };
>>>
>>> Would LBR_V2 work at all? It being a new version already seems to imply
>>> extention, no? Then again, I suppose there's an argument to be had for
>>> avoiding confusion vs the Intel LBR thing.. Couldn't you have called
>>> this BRS_V2 :-)
>>>
>> I believe it is called v2 because there was already a LBR in previous
>> generations, however it
> 
> That's not the question; It's currently called LBREXT_V2, which is a bit
> of a shit name. Then again LBR_V2 is too because AMD and Intel LBR are
> quite different. So in that respect BRS_V2 would be an ever so much
> better name.

AMD LbrExtV2 is similar to Intel LBR. Unlike BRS, LbrExtV2 does not rely on
interrupt holding. The branch records are "frozen" at the time of counter
overflow.

- Sandipan


* Re: [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-22 12:52         ` Sandipan Das
@ 2022-08-22 13:26           ` Peter Zijlstra
  2022-08-23  8:51             ` Sandipan Das
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2022-08-22 13:26 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, acme, namhyung, jolsa,
	tglx, mingo, mark.rutland, alexander.shishkin, dave.hansen,
	like.xu.linux, Stephane Eranian, ananth.narayan, ravi.bangoria,
	santosh.shukla

On Mon, Aug 22, 2022 at 06:22:25PM +0530, Sandipan Das wrote:
> AMD LbrExtV2 is similar to Intel LBR. Unlike BRS, LbrExtV2 does not rely on

LbrExtV2 must be the most terrible name ever, please stop using it. Heck
your own code calls it lbr_v2 wherever it can.

So can we please just kill that name entirely?

$ quilt diff --combine - | grep -i lbrext_v2
+       if (x86_pmu.version < 2 || !boot_cpu_has(X86_FEATURE_LBREXT_V2))
+#define X86_FEATURE_LBREXT_V2          ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
+       { X86_FEATURE_LBREXT_V2,        CPUID_EAX,  1, 0x80000022, 0 },

Is the complete usage of this silly name.

> interrupt holding. The branch records are "frozen" at the time of counter
> overflow.

Yes, I get all that. It is also significantly different from Intel LBR
in all details and shares not a single line of code, so also calling it
LBR is confusing at best.

The MSRs are called AMD_SAMP_BR, so why not call the thing BRS_V2 ?


* Re: [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-22 13:26           ` Peter Zijlstra
@ 2022-08-23  8:51             ` Sandipan Das
  2022-08-25 10:24               ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Sandipan Das @ 2022-08-23  8:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, x86, bp, acme, namhyung, jolsa,
	tglx, mingo, mark.rutland, alexander.shishkin, dave.hansen,
	like.xu.linux, Stephane Eranian, ananth.narayan, ravi.bangoria,
	santosh.shukla

Hi Peter,

On 8/22/2022 6:56 PM, Peter Zijlstra wrote:
> On Mon, Aug 22, 2022 at 06:22:25PM +0530, Sandipan Das wrote:
>> AMD LbrExtV2 is similar to Intel LBR. Unlike BRS, LbrExtV2 does not rely on
> 
> LbrExtV2 must be the most terrible name ever, please stop using it. Heck
> your own code calls it lbr_v2 wherever it can.
> 
> So can we please just kill that name entirely?
> 
> $ quilt diff --combine - | grep -i lbrext_v2
> +       if (x86_pmu.version < 2 || !boot_cpu_has(X86_FEATURE_LBREXT_V2))
> +#define X86_FEATURE_LBREXT_V2          ( 3*32+17) /* AMD Last Branch Record Extension Version 2 */
> +       { X86_FEATURE_LBREXT_V2,        CPUID_EAX,  1, 0x80000022, 0 },
> 
> Is the complete usage of this silly name.
> 

I don't have any reservations about the name :) It's the name of the feature
bit from CPUID 0x80000022 EAX as mentioned in the processor manuals.

Unlike BRS, LBRv2 (if I may call it so) is an architected feature starting
with Zen4. BRS is model-specific and available only on Zen3. LBRv2 supersedes
it and, from an architectural perspective, future platforms are slated to have
LBRv2 and not BRS. As Stephane alluded to earlier on this thread, AMD's legacy
LBR is 1-entry deep. We want to be able to differentiate from that as well as
BRS. For these reasons, and also to keep it disjoint from Intel's LBR, if you
so prefer, I can rename this feature to AMD_LBR_V2.

>> interrupt holding. The branch records are "frozen" at the time of counter
>> overflow.
> 
> Yes, I get all that. It is also significantly different from Intel LBR
> in all details and shares not a single line of code, so also calling it
> LBR is confusing at best.
> 
> The MSRs are called AMD_SAMPL_BR, so why not call the thing BRS_V2 ?

Sorry for the confusion with the register names. Since the SAMP_BR_FROM
and SAMP_BR_TO registers used by BRS have the same addresses as those of
the LBR_TO_V2 and LBR_FROM_V2 registers, I chose to reuse the definitions.

If it helps, and considering LBRv2 is an architected feature, I can rename
AMD_SAMP_BR_* to AMD_LBR_V2_* or have these MSR definitions duplicated with
different names.

- Sandipan


* Re: [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-23  8:51             ` Sandipan Das
@ 2022-08-25 10:24               ` Peter Zijlstra
  2022-08-25 12:26                 ` Sandipan Das
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2022-08-25 10:24 UTC (permalink / raw)
  To: Sandipan Das
  Cc: linux-kernel, linux-perf-users, x86, bp, acme, namhyung, jolsa,
	tglx, mingo, mark.rutland, alexander.shishkin, dave.hansen,
	like.xu.linux, Stephane Eranian, ananth.narayan, ravi.bangoria,
	santosh.shukla

On Tue, Aug 23, 2022 at 02:21:06PM +0530, Sandipan Das wrote:
> I can rename this feature to AMD_LBR_V2.

I've done that and stuck them in queue/perf/core. If the robots don't
hate on it I'll push them into tip.


* Re: [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit
  2022-08-25 10:24               ` Peter Zijlstra
@ 2022-08-25 12:26                 ` Sandipan Das
  0 siblings, 0 replies; 23+ messages in thread
From: Sandipan Das @ 2022-08-25 12:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-perf-users, x86, bp, acme, namhyung, jolsa,
	tglx, mingo, mark.rutland, alexander.shishkin, dave.hansen,
	like.xu.linux, Stephane Eranian, ananth.narayan, ravi.bangoria,
	santosh.shukla

On 8/25/2022 3:54 PM, Peter Zijlstra wrote:
> On Tue, Aug 23, 2022 at 02:21:06PM +0530, Sandipan Das wrote:
>> I can rename this feature to AMD_LBR_V2.
> 
> I've done that and stuck them in queue/perf/core. If the robots don't
> hate on it I'll push them into tip.

Thanks!




Thread overview: 23+ messages
2022-08-11 12:29 [PATCH 00/13] perf/x86/amd: Add AMD LbrExtV2 support Sandipan Das
2022-08-11 12:29 ` [PATCH 01/13] perf/x86/amd/brs: Move feature-specific functions Sandipan Das
2022-08-11 12:29 ` [PATCH 02/13] perf/x86/amd/core: Refactor branch attributes Sandipan Das
2022-08-11 12:29 ` [PATCH 03/13] perf/x86/amd/core: Add generic branch record interfaces Sandipan Das
2022-08-11 12:29 ` [PATCH 04/13] x86/cpufeatures: Add LbrExtV2 feature bit Sandipan Das
2022-08-11 13:13   ` Borislav Petkov
2022-08-15 11:27   ` Peter Zijlstra
2022-08-15 19:42     ` Stephane Eranian
2022-08-22  9:05       ` Peter Zijlstra
2022-08-22 12:52         ` Sandipan Das
2022-08-22 13:26           ` Peter Zijlstra
2022-08-23  8:51             ` Sandipan Das
2022-08-25 10:24               ` Peter Zijlstra
2022-08-25 12:26                 ` Sandipan Das
2022-08-11 12:29 ` [PATCH 05/13] perf/x86/amd/lbr: Detect LbrExtV2 support Sandipan Das
2022-08-11 12:29 ` [PATCH 06/13] perf/x86/amd/lbr: Add LbrExtV2 branch record support Sandipan Das
2022-08-11 12:29 ` [PATCH 07/13] perf/x86/amd/lbr: Add LbrExtV2 hardware branch filter support Sandipan Das
2022-08-11 12:29 ` [PATCH 08/13] perf/x86: Move branch classifier Sandipan Das
2022-08-11 12:29 ` [PATCH 09/13] perf/x86/amd/lbr: Add LbrExtV2 software branch filter support Sandipan Das
2022-08-11 12:29 ` [PATCH 10/13] perf/x86: Make branch classifier fusion-aware Sandipan Das
2022-08-11 12:29 ` [PATCH 11/13] perf/x86/amd/lbr: Use fusion-aware branch classifier Sandipan Das
2022-08-11 12:30 ` [PATCH 12/13] perf/core: Add speculation info to branch entries Sandipan Das
2022-08-11 12:30 ` [PATCH 13/13] perf/x86/amd/lbr: Add LbrExtV2 branch speculation info support Sandipan Das
