linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/7] perf/x86/amd: Miscellaneous updates
@ 2020-09-08 21:47 Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 1/7] perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour Kim Phillips
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Kim Phillips @ 2020-09-08 21:47 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Ingo Molnar, Thomas Gleixner, kim.phillips
  Cc: Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Jiri Olsa, Mark Rutland, Michael Petlan,
	Namhyung Kim, LKML, x86

Hi,

Please merge:

Patch 1/7 restores 'perf stat -a' behaviour to program the uncore PMU
to count all CPU threads.

Patch 2/7 fixes setting the proper count when sampling Large Increment
per Cycle events / 'paired' events.

The next four patches fix IBS Fetch sampling on F17h and apply some
other IBS fine-tuning, with the last of them greatly reducing the
number of interrupts when large sample periods are specified.

Finally, patch 7/7 extends Family 17h RAPL support to also work on
compatible F19h machines.

Thanks,

Kim

v2: Added this cover letter.  The only patch that changed in this
new submission is the fix for IBS Fetch sampling, which got a new
family check after testing on machines from other families.

Link to 1st patch in original submission of this series:

https://lore.kernel.org/patchwork/patch/1289806/

Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>

Kim Phillips (7):
  perf/amd/uncore: Set all slices and threads to restore perf stat -a
    behaviour
  perf/x86/amd: Fix sampling Large Increment per Cycle events
  arch/x86/amd/ibs: Fix re-arming IBS Fetch
  perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()
  perf/x86/amd/ibs: Fix raw sample data accumulation
  perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter
  perf/x86/rapl: Add AMD Fam19h RAPL support

 arch/x86/events/amd/ibs.c         | 87 ++++++++++++++++++++++---------
 arch/x86/events/amd/uncore.c      | 28 +++-------
 arch/x86/events/core.c            |  4 +-
 arch/x86/events/rapl.c            |  1 +
 arch/x86/include/asm/msr-index.h  |  1 +
 arch/x86/include/asm/perf_event.h |  1 +
 6 files changed, 75 insertions(+), 47 deletions(-)

-- 
2.27.0



* [PATCH v2 1/7] perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour
  2020-09-08 21:47 [PATCH v2 0/7] perf/x86/amd: Miscellaneous updates Kim Phillips
@ 2020-09-08 21:47 ` Kim Phillips
  2020-09-10 16:34   ` Sasha Levin
  2020-09-11  7:02   ` [tip: perf/core] " tip-bot2 for Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 2/7] perf/x86/amd: Fix sampling Large Increment per Cycle events Kim Phillips
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 16+ messages in thread
From: Kim Phillips @ 2020-09-08 21:47 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Ingo Molnar, Thomas Gleixner, kim.phillips
  Cc: Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Jiri Olsa, Mark Rutland, Michael Petlan,
	Namhyung Kim, LKML, x86, stable

Commit 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for
F17h L3 PMCs") inadvertently changed the uncore driver's behaviour
wrt perf tool invocations with or without a CPU list, specified with
-C / --cpu=.

Change the behaviour of the driver to assume the former all-cpu (-a)
case, which is the more commonly desired default.  This fixes
'-a -A' invocations without explicit cpu lists (-C) so that L3 events
are no longer counted only on behalf of the first thread of the first
core in the L3 domain.

BEFORE:

Activity performed by the first thread of the last core (CPU#43) in
CPU#40's L3 domain is not reported by CPU#40:

sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default
...
CPU36                 21,835      l3_request_g1.caching_l3_cache_accesses
CPU40                 87,066      l3_request_g1.caching_l3_cache_accesses
CPU44                 17,360      l3_request_g1.caching_l3_cache_accesses
...

AFTER:

The L3 domain activity is now reported by CPU#40:

sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default
...
CPU36                354,891      l3_request_g1.caching_l3_cache_accesses
CPU40              1,780,870      l3_request_g1.caching_l3_cache_accesses
CPU44                315,062      l3_request_g1.caching_l3_cache_accesses
...

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Fixes: 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs")
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>
Cc: stable@vger.kernel.org
---
v2: no changes.

Original submission:

https://lore.kernel.org/lkml/20200323233159.19601-1-kim.phillips@amd.com/

 arch/x86/events/amd/uncore.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 76400c052b0e..e7e61c8b56bd 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -181,28 +181,16 @@ static void amd_uncore_del(struct perf_event *event, int flags)
 }
 
 /*
- * Convert logical CPU number to L3 PMC Config ThreadMask format
+ * Return a full thread and slice mask until per-CPU is
+ * properly supported.
  */
-static u64 l3_thread_slice_mask(int cpu)
+static u64 l3_thread_slice_mask(void)
 {
-	u64 thread_mask, core = topology_core_id(cpu);
-	unsigned int shift, thread = 0;
+	if (boot_cpu_data.x86 <= 0x18)
+		return AMD64_L3_SLICE_MASK | AMD64_L3_THREAD_MASK;
 
-	if (topology_smt_supported() && !topology_is_primary_thread(cpu))
-		thread = 1;
-
-	if (boot_cpu_data.x86 <= 0x18) {
-		shift = AMD64_L3_THREAD_SHIFT + 2 * (core % 4) + thread;
-		thread_mask = BIT_ULL(shift);
-
-		return AMD64_L3_SLICE_MASK | thread_mask;
-	}
-
-	core = (core << AMD64_L3_COREID_SHIFT) & AMD64_L3_COREID_MASK;
-	shift = AMD64_L3_THREAD_SHIFT + thread;
-	thread_mask = BIT_ULL(shift);
-
-	return AMD64_L3_EN_ALL_SLICES | core | thread_mask;
+	return AMD64_L3_EN_ALL_SLICES | AMD64_L3_EN_ALL_CORES |
+	       AMD64_L3_F19H_THREAD_MASK;
 }
 
 static int amd_uncore_event_init(struct perf_event *event)
@@ -232,7 +220,7 @@ static int amd_uncore_event_init(struct perf_event *event)
 	 * For other events, the two fields do not affect the count.
 	 */
 	if (l3_mask && is_llc_event(event))
-		hwc->config |= l3_thread_slice_mask(event->cpu);
+		hwc->config |= l3_thread_slice_mask();
 
 	uncore = event_to_amd_uncore(event);
 	if (!uncore)
-- 
2.27.0



* [PATCH v2 2/7] perf/x86/amd: Fix sampling Large Increment per Cycle events
  2020-09-08 21:47 [PATCH v2 0/7] perf/x86/amd: Miscellaneous updates Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 1/7] perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour Kim Phillips
@ 2020-09-08 21:47 ` Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 3/7] arch/x86/amd/ibs: Fix re-arming IBS Fetch Kim Phillips
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Kim Phillips @ 2020-09-08 21:47 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Ingo Molnar, Thomas Gleixner, kim.phillips
  Cc: Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Jiri Olsa, Mark Rutland, Michael Petlan,
	Namhyung Kim, LKML, x86, stable

Commit 5738891229a2 ("perf/x86/amd: Add support for Large Increment
per Cycle Events") mistakenly zeroes the upper 16 bits of the count
in set_period().  That's fine for counting with perf stat, but not
for sampling with perf record when only Large Increment events are
being sampled.  To enable sampling, we sign extend the upper 16 bits
of the
merged counter pair as described in the Family 17h PPRs:

"Software wanting to preload a value to a merged counter pair writes the
high-order 16-bit value to the low-order 16 bits of the odd counter and
then writes the low-order 48-bit value to the even counter. Reading the
even counter of the merged counter pair returns the full 64-bit value."
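
For illustration only (not part of the patch), the preload arithmetic
can be sketched as below, assuming the driver's declared 48-bit counter
width; the helper and names are hypothetical:

#include <stdint.h>

#define CNTVAL_MASK	((1ULL << 48) - 1)	/* declared counter width */

/* Sketch: preload a merged counter pair with -left, per the PPR. */
static void preload_merged_pair(uint64_t *even, uint64_t *odd, int64_t left)
{
	uint64_t val = (uint64_t)(-left);

	/* high-order 16 bits (all ones when left > 0) go to the
	 * low-order 16 bits of the odd counter... */
	*odd = (val >> 48) & 0xffff;

	/* ...then the low-order 48 bits go to the even counter */
	*even = val & CNTVAL_MASK;
}

A subsequent read of the even counter is then defined to return the
full 64-bit value, i.e. (uint64_t)(-left) again.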

Fixes: 5738891229a2 ("perf/x86/amd: Add support for Large Increment per Cycle Events")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>
Cc: stable@vger.kernel.org
---
v2: no changes.

 arch/x86/events/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1cbf57dc2ac8..2fdc211e3e56 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1284,11 +1284,11 @@ int x86_perf_event_set_period(struct perf_event *event)
 	wrmsrl(hwc->event_base, (u64)(-left) & x86_pmu.cntval_mask);
 
 	/*
-	 * Clear the Merge event counter's upper 16 bits since
+	 * Sign extend the Merge event counter's upper 16 bits since
 	 * we currently declare a 48-bit counter width
 	 */
 	if (is_counter_pair(hwc))
-		wrmsrl(x86_pmu_event_addr(idx + 1), 0);
+		wrmsrl(x86_pmu_event_addr(idx + 1), 0xffff);
 
 	/*
 	 * Due to erratum on certan cpu we need
-- 
2.27.0



* [PATCH v2 3/7] arch/x86/amd/ibs: Fix re-arming IBS Fetch
  2020-09-08 21:47 [PATCH v2 0/7] perf/x86/amd: Miscellaneous updates Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 1/7] perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 2/7] perf/x86/amd: Fix sampling Large Increment per Cycle events Kim Phillips
@ 2020-09-08 21:47 ` Kim Phillips
  2020-09-10  8:32   ` peterz
  2020-09-08 21:47 ` [PATCH v2 4/7] perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count() Kim Phillips
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Kim Phillips @ 2020-09-08 21:47 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Ingo Molnar, Thomas Gleixner, kim.phillips
  Cc: Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Jiri Olsa, Mark Rutland, Michael Petlan,
	Namhyung Kim, LKML, x86, Stephane Eranian, stable

Stephane Eranian found a bug in which IBS' current Fetch counter was
not being reset when the driver wrote the new value to clear it along
with the enable bit set.  He also found that adding an MSR write that
first disables IBS Fetch makes IBS Fetch reset its current count.

Indeed, the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54 - Sep 12,
2019 states "The periodic fetch counter is set to IbsFetchCnt [...] when
IbsFetchEn is changed from 0 to 1."

Explicitly set IbsFetchEn to 0 and then to 1 when re-enabling IBS Fetch,
so the driver properly resets the internal counter to 0 and IBS
Fetch starts counting again.

A family 15h machine tested does not have this problem, and the extra
wrmsr is also not needed on Family 19h, so only do the extra wrmsr on
families 16h through 18h.

Reported-by: Stephane Eranian <stephane.eranian@google.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>
Cc: stable@vger.kernel.org
---
v2: constrained the extra wrmsr to Families 16h through 18h, inclusive.

 arch/x86/events/amd/ibs.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 26c36357c4c9..3eb9a55e998c 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -363,7 +363,14 @@ perf_ibs_event_update(struct perf_ibs *perf_ibs, struct perf_event *event,
 static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
 					 struct hw_perf_event *hwc, u64 config)
 {
-	wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
+	u64 _config = (hwc->config | config) & ~perf_ibs->enable_mask;
+
+	/* On Fam17h, the periodic fetch counter is set when IbsFetchEn is changed from 0 to 1 */
+	if (perf_ibs == &perf_ibs_fetch && boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
+		wrmsrl(hwc->config_base, _config);
+
+	_config |= perf_ibs->enable_mask;
+	wrmsrl(hwc->config_base, _config);
 }
 
 /*
-- 
2.27.0



* [PATCH v2 4/7] perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()
  2020-09-08 21:47 [PATCH v2 0/7] perf/x86/amd: Miscellaneous updates Kim Phillips
                   ` (2 preceding siblings ...)
  2020-09-08 21:47 ` [PATCH v2 3/7] arch/x86/amd/ibs: Fix re-arming IBS Fetch Kim Phillips
@ 2020-09-08 21:47 ` Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 5/7] perf/x86/amd/ibs: Fix raw sample data accumulation Kim Phillips
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Kim Phillips @ 2020-09-08 21:47 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Ingo Molnar, Thomas Gleixner, kim.phillips
  Cc: Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Jiri Olsa, Mark Rutland, Michael Petlan,
	Namhyung Kim, LKML, x86, stable

get_ibs_op_count() adds the hardware's current count (IbsOpCurCnt)
bits to its count regardless of the hardware's valid status.

According to the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54,
if the counter rolls over, valid status is set, and the lower 7 bits
of IbsOpCurCnt are randomized by hardware.

Don't include those bits in the driver's event count.
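
For example (illustrative numbers, derived only from the 7-bit figure
above): since the lower 7 bits of IbsOpCurCnt are randomized on
rollover, the old code could credit anywhere from 0 to 0x7f (127)
spurious counts per rolled-over sample on top of the MaxCnt-derived
count.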

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Fixes: 8b1e13638d46 ("perf/x86-ibs: Fix usage of IBS op current count")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>
Cc: stable@vger.kernel.org
---
v2: no changes.

 arch/x86/events/amd/ibs.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 3eb9a55e998c..68776cc291a6 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -334,11 +334,15 @@ static u64 get_ibs_op_count(u64 config)
 {
 	u64 count = 0;
 
+	/*
+	 * If the internal 27-bit counter rolled over, the count is MaxCnt
+	 * and the lower 7 bits of CurCnt are randomized.
+	 * Otherwise CurCnt has the full 27-bit current counter value.
+	 */
 	if (config & IBS_OP_VAL)
-		count += (config & IBS_OP_MAX_CNT) << 4; /* cnt rolled over */
-
-	if (ibs_caps & IBS_CAPS_RDWROPCNT)
-		count += (config & IBS_OP_CUR_CNT) >> 32;
+		count = (config & IBS_OP_MAX_CNT) << 4;
+	else if (ibs_caps & IBS_CAPS_RDWROPCNT)
+		count = (config & IBS_OP_CUR_CNT) >> 32;
 
 	return count;
 }
-- 
2.27.0



* [PATCH v2 5/7] perf/x86/amd/ibs: Fix raw sample data accumulation
  2020-09-08 21:47 [PATCH v2 0/7] perf/x86/amd: Miscellaneous updates Kim Phillips
                   ` (3 preceding siblings ...)
  2020-09-08 21:47 ` [PATCH v2 4/7] perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count() Kim Phillips
@ 2020-09-08 21:47 ` Kim Phillips
  2020-09-11  7:02   ` [tip: perf/core] " tip-bot2 for Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 6/7] perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 7/7] perf/x86/rapl: Add AMD Fam19h RAPL support Kim Phillips
  6 siblings, 1 reply; 16+ messages in thread
From: Kim Phillips @ 2020-09-08 21:47 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Ingo Molnar, Thomas Gleixner, kim.phillips
  Cc: Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Jiri Olsa, Mark Rutland, Michael Petlan,
	Namhyung Kim, LKML, x86, Stephane Eranian, stable

Neither IbsBrTarget nor OPDATA4 is populated in IBS Fetch mode.
Don't accumulate them into raw sample user data in that case.

Also, in Fetch mode, save the IBS Fetch Control Extended MSR.

Technically, this is an ABI change to the IBS raw sample data format,
but no perf driver version information is included in perf.data file
headers; existing users can instead detect whether the size of the
sample record has been reduced by 8 bytes to determine whether the
IBS driver has this fix.

Reported-by: Stephane Eranian <stephane.eranian@google.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Fixes: 904cb3677f3a ("perf/x86/amd/ibs: Update IBS MSRs and feature definitions")
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>
Cc: stable@vger.kernel.org
---
v2: no changes.

 arch/x86/events/amd/ibs.c        | 26 ++++++++++++++++----------
 arch/x86/include/asm/msr-index.h |  1 +
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 68776cc291a6..ace28be4cbda 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -637,18 +637,24 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 				       perf_ibs->offset_max,
 				       offset + 1);
 	} while (offset < offset_max);
+	/*
+	 * Read IbsBrTarget, IbsOpData4, and IbsExtdCtl separately
+	 * depending on their availability.
+	 * Can't add to offset_max as they are staggered
+	 */
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
-		/*
-		 * Read IbsBrTarget and IbsOpData4 separately
-		 * depending on their availability.
-		 * Can't add to offset_max as they are staggered
-		 */
-		if (ibs_caps & IBS_CAPS_BRNTRGT) {
-			rdmsrl(MSR_AMD64_IBSBRTARGET, *buf++);
-			size++;
+		if (perf_ibs == &perf_ibs_op) {
+			if (ibs_caps & IBS_CAPS_BRNTRGT) {
+				rdmsrl(MSR_AMD64_IBSBRTARGET, *buf++);
+				size++;
+			}
+			if (ibs_caps & IBS_CAPS_OPDATA4) {
+				rdmsrl(MSR_AMD64_IBSOPDATA4, *buf++);
+				size++;
+			}
 		}
-		if (ibs_caps & IBS_CAPS_OPDATA4) {
-			rdmsrl(MSR_AMD64_IBSOPDATA4, *buf++);
+		if (perf_ibs == &perf_ibs_fetch && (ibs_caps & IBS_CAPS_FETCHCTLEXTD)) {
+			rdmsrl(MSR_AMD64_ICIBSEXTDCTL, *buf++);
 			size++;
 		}
 	}
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 2859ee4f39a8..b08c8a2afc0e 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -464,6 +464,7 @@
 #define MSR_AMD64_IBSOP_REG_MASK	((1UL<<MSR_AMD64_IBSOP_REG_COUNT)-1)
 #define MSR_AMD64_IBSCTL		0xc001103a
 #define MSR_AMD64_IBSBRTARGET		0xc001103b
+#define MSR_AMD64_ICIBSEXTDCTL		0xc001103c
 #define MSR_AMD64_IBSOPDATA4		0xc001103d
 #define MSR_AMD64_IBS_REG_COUNT_MAX	8 /* includes MSR_AMD64_IBSBRTARGET */
 #define MSR_AMD64_SEV			0xc0010131
-- 
2.27.0



* [PATCH v2 6/7] perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter
  2020-09-08 21:47 [PATCH v2 0/7] perf/x86/amd: Miscellaneous updates Kim Phillips
                   ` (4 preceding siblings ...)
  2020-09-08 21:47 ` [PATCH v2 5/7] perf/x86/amd/ibs: Fix raw sample data accumulation Kim Phillips
@ 2020-09-08 21:47 ` Kim Phillips
  2020-09-11  7:02   ` [tip: perf/core] " tip-bot2 for Kim Phillips
  2020-09-08 21:47 ` [PATCH v2 7/7] perf/x86/rapl: Add AMD Fam19h RAPL support Kim Phillips
  6 siblings, 1 reply; 16+ messages in thread
From: Kim Phillips @ 2020-09-08 21:47 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Ingo Molnar, Thomas Gleixner, kim.phillips
  Cc: Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Jiri Olsa, Mark Rutland, Michael Petlan,
	Namhyung Kim, LKML, x86

IBS hardware with the OpCntExt feature gets a 7-bit wider internal
counter.  Both the maximum and current count bitfields in the
IBS_OP_CTL register are extended to support reading and writing it.

No changes are necessary to the driver for handling the extra
contiguous current count bits (IbsOpCurCnt), as the driver already
passes through 32 bits of that field.  However, the driver has to do
some extra bit manipulation when converting from a period to the
non-contiguous (although conveniently aligned) extra bits in the
IbsOpMaxCnt bitfield.

This decreases IBS Op interrupt overhead when the period is over
1,048,560 (0xffff0), which would previously activate the driver's
software counter.  That threshold is now 134,217,712 (0x7fffff0).
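
As an aside (not part of the patch), the period-to-register conversion
performed in the diff below can be sketched standalone; the masks
mirror those in the patch, while the helper name is hypothetical:

#include <stdint.h>

#define IBS_OP_MAX_CNT		0x0000FFFFULL		/* CTL bits 15:0 */
#define IBS_OP_MAX_CNT_EXT_MASK	(0x7FULL << 20)		/* CTL bits 26:20 */

/* Sketch: pack a (16-op granular) period into the IbsOpMaxCnt fields. */
static uint64_t pack_op_max_cnt(uint64_t period)
{
	uint64_t config;

	/* period bits 26:20 are already aligned with the extended field */
	config = period & IBS_OP_MAX_CNT_EXT_MASK;

	/* the remaining period bits 19:4 land in bits 15:0, in units of 16 */
	config |= (period & ~IBS_OP_MAX_CNT_EXT_MASK) >> 4;

	return config;
}

e.g. pack_op_max_cnt(0x7fffff0) yields IBS_OP_MAX_CNT_EXT_MASK |
IBS_OP_MAX_CNT, corresponding to the new maximum hardware period.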

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>
---
v2: no changes.

 arch/x86/events/amd/ibs.c         | 42 +++++++++++++++++++++++--------
 arch/x86/include/asm/perf_event.h |  1 +
 2 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index ace28be4cbda..2766a7763d38 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -339,10 +339,13 @@ static u64 get_ibs_op_count(u64 config)
 	 * and the lower 7 bits of CurCnt are randomized.
 	 * Otherwise CurCnt has the full 27-bit current counter value.
 	 */
-	if (config & IBS_OP_VAL)
+	if (config & IBS_OP_VAL) {
 		count = (config & IBS_OP_MAX_CNT) << 4;
-	else if (ibs_caps & IBS_CAPS_RDWROPCNT)
+		if (ibs_caps & IBS_CAPS_OPCNTEXT)
+			count += config & IBS_OP_MAX_CNT_EXT_MASK;
+	} else if (ibs_caps & IBS_CAPS_RDWROPCNT) {
 		count = (config & IBS_OP_CUR_CNT) >> 32;
+	}
 
 	return count;
 }
@@ -405,7 +408,7 @@ static void perf_ibs_start(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_ibs *perf_ibs = container_of(event->pmu, struct perf_ibs, pmu);
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
-	u64 period;
+	u64 period, config = 0;
 
 	if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
 		return;
@@ -414,13 +417,19 @@ static void perf_ibs_start(struct perf_event *event, int flags)
 	hwc->state = 0;
 
 	perf_ibs_set_period(perf_ibs, hwc, &period);
+	if (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_OPCNTEXT)) {
+		config |= period & IBS_OP_MAX_CNT_EXT_MASK;
+		period &= ~IBS_OP_MAX_CNT_EXT_MASK;
+	}
+	config |= period >> 4;
+
 	/*
 	 * Set STARTED before enabling the hardware, such that a subsequent NMI
 	 * must observe it.
 	 */
 	set_bit(IBS_STARTED,    pcpu->state);
 	clear_bit(IBS_STOPPING, pcpu->state);
-	perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
+	perf_ibs_enable_event(perf_ibs, hwc, config);
 
 	perf_event_update_userpage(event);
 }
@@ -588,7 +597,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	struct perf_ibs_data ibs_data;
 	int offset, size, check_rip, offset_max, throttle = 0;
 	unsigned int msr;
-	u64 *buf, *config, period;
+	u64 *buf, *config, period, new_config = 0;
 
 	if (!test_bit(IBS_STARTED, pcpu->state)) {
 fail:
@@ -683,13 +692,17 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	if (throttle) {
 		perf_ibs_stop(event, 0);
 	} else {
-		period >>= 4;
-
-		if ((ibs_caps & IBS_CAPS_RDWROPCNT) &&
-		    (*config & IBS_OP_CNT_CTL))
-			period |= *config & IBS_OP_CUR_CNT_RAND;
+		if (perf_ibs == &perf_ibs_op) {
+			if (ibs_caps & IBS_CAPS_OPCNTEXT) {
+				new_config = period & IBS_OP_MAX_CNT_EXT_MASK;
+				period &= ~IBS_OP_MAX_CNT_EXT_MASK;
+			}
+			if ((ibs_caps & IBS_CAPS_RDWROPCNT) && (*config & IBS_OP_CNT_CTL))
+				new_config |= *config & IBS_OP_CUR_CNT_RAND;
+		}
+		new_config |= period >> 4;
 
-		perf_ibs_enable_event(perf_ibs, hwc, period);
+		perf_ibs_enable_event(perf_ibs, hwc, new_config);
 	}
 
 	perf_event_update_userpage(event);
@@ -756,6 +769,13 @@ static __init void perf_event_ibs_init(void)
 		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
 		*attr++ = &format_attr_cnt_ctl.attr;
 	}
+
+	if (ibs_caps & IBS_CAPS_OPCNTEXT) {
+		perf_ibs_op.max_period  |= IBS_OP_MAX_CNT_EXT_MASK;
+		perf_ibs_op.config_mask	|= IBS_OP_MAX_CNT_EXT_MASK;
+		perf_ibs_op.cnt_mask    |= IBS_OP_MAX_CNT_EXT_MASK;
+	}
+
 	perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
 
 	register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 0c1b13720525..841b0006c74b 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -334,6 +334,7 @@ struct pebs_xmm {
 #define IBS_OP_ENABLE		(1ULL<<17)
 #define IBS_OP_MAX_CNT		0x0000FFFFULL
 #define IBS_OP_MAX_CNT_EXT	0x007FFFFFULL	/* not a register bit mask */
+#define IBS_OP_MAX_CNT_EXT_MASK	(0x7FULL<<20)	/* separate upper 7 bits */
 #define IBS_RIP_INVALID		(1ULL<<38)
 
 #ifdef CONFIG_X86_LOCAL_APIC
-- 
2.27.0



* [PATCH v2 7/7] perf/x86/rapl: Add AMD Fam19h RAPL support
  2020-09-08 21:47 [PATCH v2 0/7] perf/x86/amd: Miscellaneous updates Kim Phillips
                   ` (5 preceding siblings ...)
  2020-09-08 21:47 ` [PATCH v2 6/7] perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter Kim Phillips
@ 2020-09-08 21:47 ` Kim Phillips
  2020-09-11  7:02   ` [tip: perf/core] " tip-bot2 for Kim Phillips
  6 siblings, 1 reply; 16+ messages in thread
From: Kim Phillips @ 2020-09-08 21:47 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, Peter Zijlstra, Ingo Molnar,
	Ingo Molnar, Thomas Gleixner, kim.phillips
  Cc: Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	H. Peter Anvin, Jiri Olsa, Mark Rutland, Michael Petlan,
	Namhyung Kim, LKML, x86

Family 19h RAPL support did not change from Family 17h; extend
the existing Fam17h support to work on Family 19h too.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>
---
v2: no changes.

 arch/x86/events/rapl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 67b411f7e8c4..7c0120e2e957 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -815,6 +815,7 @@ static const struct x86_cpu_id rapl_model_match[] __initconst = {
 	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,	&model_spr),
 	X86_MATCH_VENDOR_FAM(AMD,	0x17,		&model_amd_fam17h),
 	X86_MATCH_VENDOR_FAM(HYGON,	0x18,		&model_amd_fam17h),
+	X86_MATCH_VENDOR_FAM(AMD,	0x19,		&model_amd_fam17h),
 	{},
 };
 MODULE_DEVICE_TABLE(x86cpu, rapl_model_match);
-- 
2.27.0



* Re: [PATCH v2 3/7] arch/x86/amd/ibs: Fix re-arming IBS Fetch
  2020-09-08 21:47 ` [PATCH v2 3/7] arch/x86/amd/ibs: Fix re-arming IBS Fetch Kim Phillips
@ 2020-09-10  8:32   ` peterz
  2020-09-10  8:50     ` peterz
  0 siblings, 1 reply; 16+ messages in thread
From: peterz @ 2020-09-10  8:32 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Borislav Petkov, Borislav Petkov, Ingo Molnar, Ingo Molnar,
	Thomas Gleixner, Stephane Eranian, Alexander Shishkin,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Jiri Olsa,
	Mark Rutland, Michael Petlan, Namhyung Kim, LKML, x86,
	Stephane Eranian, stable

On Tue, Sep 08, 2020 at 04:47:36PM -0500, Kim Phillips wrote:
> Stephane Eranian found a bug in which IBS' current Fetch counter was
> not being reset when the driver wrote the new value to clear it along
> with the enable bit set.  He also found that adding an MSR write that
> first disables IBS Fetch makes IBS Fetch reset its current count.
> 
> Indeed, the PPR for AMD Family 17h Model 31h B0 55803 Rev 0.54 - Sep 12,
> 2019 states "The periodic fetch counter is set to IbsFetchCnt [...] when
> IbsFetchEn is changed from 0 to 1."
> 
> Explicitly set IbsFetchEn to 0 and then to 1 when re-enabling IBS Fetch,
> so the driver properly resets the internal counter to 0 and IBS
> Fetch starts counting again.
> 
> A family 15h machine tested does not have this problem, and the extra
> wrmsr is also not needed on Family 19h, so only do the extra wrmsr on
> families 16h through 18h.

*groan*...

> ---
>  arch/x86/events/amd/ibs.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 26c36357c4c9..3eb9a55e998c 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -363,7 +363,14 @@ perf_ibs_event_update(struct perf_ibs *perf_ibs, struct perf_event *event,
>  static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
>  					 struct hw_perf_event *hwc, u64 config)
>  {
> -	wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
> +	u64 _config = (hwc->config | config) & ~perf_ibs->enable_mask;
> +
> +	/* On Fam17h, the periodic fetch counter is set when IbsFetchEn is changed from 0 to 1 */
> +	if (perf_ibs == &perf_ibs_fetch && boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
> +		wrmsrl(hwc->config_base, _config);

That's an anti-pattern (and for some reason this is the second time in
a week I've seen it). This is a fairly hot path and you're adding a
whole bunch of loads and branches.

Granted, in case you need the extra wrmsr that's probably noise, but
supposedly you're going to be fixing this in hardware eventually, and
you'll be getting rid of the second wrmsr again. But then you're stuck
with the loads and branches.

A better option would be to use hwc->flags, you're loading from that
line already, so it's guaranteed hot and then you only have a single
branch. Or stick it in perf_ibs near enable_mask, same difference.

> +	_config |= perf_ibs->enable_mask;
> +	wrmsrl(hwc->config_base, _config);
>  }
>  
>  /*
> -- 
> 2.27.0
> 


* Re: [PATCH v2 3/7] arch/x86/amd/ibs: Fix re-arming IBS Fetch
  2020-09-10  8:32   ` peterz
@ 2020-09-10  8:50     ` peterz
  2020-09-10 15:58       ` Kim Phillips
  0 siblings, 1 reply; 16+ messages in thread
From: peterz @ 2020-09-10  8:50 UTC (permalink / raw)
  To: Kim Phillips
  Cc: Borislav Petkov, Borislav Petkov, Ingo Molnar, Ingo Molnar,
	Thomas Gleixner, Stephane Eranian, Alexander Shishkin,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Jiri Olsa,
	Mark Rutland, Michael Petlan, Namhyung Kim, LKML, x86,
	Stephane Eranian, stable

On Thu, Sep 10, 2020 at 10:32:23AM +0200, peterz@infradead.org wrote:
> > @@ -363,7 +363,14 @@ perf_ibs_event_update(struct perf_ibs *perf_ibs, struct perf_event *event,
> >  static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
> >  					 struct hw_perf_event *hwc, u64 config)
> >  {
> > -	wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
> > +	u64 _config = (hwc->config | config) & ~perf_ibs->enable_mask;
> > +
> > +	/* On Fam17h, the periodic fetch counter is set when IbsFetchEn is changed from 0 to 1 */
> > +	if (perf_ibs == &perf_ibs_fetch && boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
> > +		wrmsrl(hwc->config_base, _config);

> A better option would be to use hwc->flags, you're loading from that
> line already, so it's guaranteed hot and then you only have a single
> branch. Or stick it in perf_ibs near enable_mask, same difference.

I fixed it for you.

---

struct perf_ibs {
	struct pmu                 pmu;                  /*     0   296 */
	/* --- cacheline 4 boundary (256 bytes) was 40 bytes ago --- */
	unsigned int               msr;                  /*   296     4 */

	/* XXX 4 bytes hole, try to pack */

	u64                        config_mask;          /*   304     8 */
	u64                        cnt_mask;             /*   312     8 */
	/* --- cacheline 5 boundary (320 bytes) --- */
	u64                        enable_mask;          /*   320     8 */
	u64                        valid_mask;           /*   328     8 */
	u64                        max_period;           /*   336     8 */
	long unsigned int          offset_mask[1];       /*   344     8 */
	int                        offset_max;           /*   352     4 */
	unsigned int               fetch_count_reset_broken:1; /*   356: 0  4 */

	/* XXX 31 bits hole, try to pack */

	struct cpu_perf_ibs *      pcpu;                 /*   360     8 */
	struct attribute * *       format_attrs;         /*   368     8 */
	struct attribute_group     format_group;         /*   376    40 */
	/* --- cacheline 6 boundary (384 bytes) was 32 bytes ago --- */
	const struct attribute_group  * attr_groups[2];  /*   416    16 */
	u64                        (*get_count)(u64);    /*   432     8 */

	/* size: 440, cachelines: 7, members: 15 */
	/* sum members: 432, holes: 1, sum holes: 4 */
	/* sum bitfield members: 1 bits, bit holes: 1, sum bit holes: 31 bits */
	/* last cacheline: 56 bytes */
};

--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -89,6 +89,7 @@ struct perf_ibs {
 	u64				max_period;
 	unsigned long			offset_mask[1];
 	int				offset_max;
+	unsigned int			fetch_count_reset_broken : 1;
 	struct cpu_perf_ibs __percpu	*pcpu;
 
 	struct attribute		**format_attrs;
@@ -370,7 +371,13 @@ perf_ibs_event_update(struct perf_ibs *p
 static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
 					 struct hw_perf_event *hwc, u64 config)
 {
-	wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
+	u64 _config = (hwc->config | config) & ~perf_ibs->enable_mask;
+
+	if (perf_ibs->fetch_count_reset_broken)
+		wrmsrl(hwc->config_base, _config);
+
+	_config |= perf_ibs->enable_mask;
+	wrmsrl(hwc->config_base, _config);
 }
 
 /*
@@ -756,6 +763,13 @@ static __init void perf_event_ibs_init(v
 {
 	struct attribute **attr = ibs_op_format_attrs;
 
+	/*
+	 * Some chips fail to reset the fetch count when it is written; instead
+	 * they need a 0-1 transition of IbsFetchEn.
+	 */
+	if (boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
+		perf_ibs_fetch.fetch_count_reset_broken = 1;
+
 	perf_ibs_pmu_init(&perf_ibs_fetch, "ibs_fetch");
 
 	if (ibs_caps & IBS_CAPS_OPCNT) {


* Re: [PATCH v2 3/7] arch/x86/amd/ibs: Fix re-arming IBS Fetch
  2020-09-10  8:50     ` peterz
@ 2020-09-10 15:58       ` Kim Phillips
  0 siblings, 0 replies; 16+ messages in thread
From: Kim Phillips @ 2020-09-10 15:58 UTC (permalink / raw)
  To: peterz
  Cc: Borislav Petkov, Borislav Petkov, Ingo Molnar, Ingo Molnar,
	Thomas Gleixner, Stephane Eranian, Alexander Shishkin,
	Arnaldo Carvalho de Melo, H. Peter Anvin, Jiri Olsa,
	Mark Rutland, Michael Petlan, Namhyung Kim, LKML, x86,
	Stephane Eranian, stable

On 9/10/20 3:50 AM, peterz@infradead.org wrote:
> On Thu, Sep 10, 2020 at 10:32:23AM +0200, peterz@infradead.org wrote:
>>> @@ -363,7 +363,14 @@ perf_ibs_event_update(struct perf_ibs *perf_ibs, struct perf_event *event,
>>>  static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
>>>  					 struct hw_perf_event *hwc, u64 config)
>>>  {
>>> -	wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
>>> +	u64 _config = (hwc->config | config) & ~perf_ibs->enable_mask;
>>> +
>>> +	/* On Fam17h, the periodic fetch counter is set when IbsFetchEn is changed from 0 to 1 */
>>> +	if (perf_ibs == &perf_ibs_fetch && boot_cpu_data.x86 >= 0x16 && boot_cpu_data.x86 <= 0x18)
>>> +		wrmsrl(hwc->config_base, _config);
> 
>> A better option would be to use hwc->flags, you're loading from that
>> line already, so it's guaranteed hot and then you only have a single
>> branch. Or stick it in perf_ibs near enable_mask, same difference.
> 
> I fixed it for you.

> @@ -370,7 +371,13 @@ perf_ibs_event_update(struct perf_ibs *p
>  static inline void perf_ibs_enable_event(struct perf_ibs *perf_ibs,
>  					 struct hw_perf_event *hwc, u64 config)
>  {
> -	wrmsrl(hwc->config_base, hwc->config | config | perf_ibs->enable_mask);
> +	u64 _config = (hwc->config | config) & ~perf_ibs->enable_mask;
> +
> +	if (perf_ibs->fetch_count_reset_broken)

Nice, we don't even need the perf_ibs == &perf_ibs_fetch check
here because fetch_count_reset_broken is guaranteed to be 0 in
perf_ibs_op.

Thanks!

Kim


* Re: [PATCH v2 1/7] perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour
  2020-09-08 21:47 ` [PATCH v2 1/7] perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour Kim Phillips
@ 2020-09-10 16:34   ` Sasha Levin
  2020-09-11  7:02   ` [tip: perf/core] " tip-bot2 for Kim Phillips
  1 sibling, 0 replies; 16+ messages in thread
From: Sasha Levin @ 2020-09-10 16:34 UTC (permalink / raw)
  To: Sasha Levin, Kim Phillips, Borislav Petkov, Borislav Petkov
  Cc: Stephane Eranian, Stephane Eranian, Peter Zijlstra, Ingo Molnar,
	Ingo Molnar, Thomas Gleixner, Borislav Petkov,
	Alexander Shishkin, Arnaldo Carvalho de Melo, H. Peter Anvin,
	Jiri Olsa, Mark Rutland, Michael Petlan, Namhyung Kim, LKML, x86,
	stable, stable

Hi

[This is an automated email]

This commit has been processed because it contains a "Fixes:" tag
fixing commit: 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs").

The bot has tested the following trees: v5.8.7, v5.4.63, v4.19.143, v4.14.196.

v5.8.7: Build OK!
v5.4.63: Failed to apply! Possible dependencies:
    4dcc3df82573 ("perf/amd/uncore: Prepare L3 thread mask code for Family 19h")
    9689dbbeaea8 ("perf/amd/uncore: Make L3 thread mask code more readable")
    e48667b86548 ("perf/amd/uncore: Add support for Family 19h L3 PMU")

v4.19.143: Failed to apply! Possible dependencies:
    4dcc3df82573 ("perf/amd/uncore: Prepare L3 thread mask code for Family 19h")
    6d0ef316b9f8 ("x86/events: Add Hygon Dhyana support to PMU infrastructure")
    9689dbbeaea8 ("perf/amd/uncore: Make L3 thread mask code more readable")
    e48667b86548 ("perf/amd/uncore: Add support for Family 19h L3 PMU")

v4.14.196: Failed to apply! Possible dependencies:
    4dcc3df82573 ("perf/amd/uncore: Prepare L3 thread mask code for Family 19h")
    6d0ef316b9f8 ("x86/events: Add Hygon Dhyana support to PMU infrastructure")
    9689dbbeaea8 ("perf/amd/uncore: Make L3 thread mask code more readable")
    e48667b86548 ("perf/amd/uncore: Add support for Family 19h L3 PMU")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha


* [tip: perf/core] perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter
  2020-09-08 21:47 ` [PATCH v2 6/7] perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter Kim Phillips
@ 2020-09-11  7:02   ` tip-bot2 for Kim Phillips
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Kim Phillips @ 2020-09-11  7:02 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kim Phillips, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     8b0bed7d410f48499d72af2e2bcd890daad94e0d
Gitweb:        https://git.kernel.org/tip/8b0bed7d410f48499d72af2e2bcd890daad94e0d
Author:        Kim Phillips <kim.phillips@amd.com>
AuthorDate:    Tue, 08 Sep 2020 16:47:39 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:36 +02:00

perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter

IBS hardware with the OpCntExt feature gets a 7-bit wider internal
counter.  Both the maximum and current count bitfields in the
IBS_OP_CTL register are extended to support reading and writing it.

No changes are necessary to the driver for handling the extra
contiguous current count bits (IbsOpCurCnt), as the driver already
passes through 32 bits of that field.  However, the driver has to do
some extra bit manipulation when converting from a period to the
non-contiguous (although conveniently aligned) extra bits in the
IbsOpMaxCnt bitfield.

This decreases IBS Op interrupt overhead when the period is over
1,048,560 (0xffff0), which would previously activate the driver's
software counter.  That threshold is now 134,217,712 (0x7fffff0).

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200908214740.18097-7-kim.phillips@amd.com
---
 arch/x86/events/amd/ibs.c         | 42 ++++++++++++++++++++++--------
 arch/x86/include/asm/perf_event.h |  1 +-
 2 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index cfbd020..ea323dc 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -339,10 +339,13 @@ static u64 get_ibs_op_count(u64 config)
 	 * and the lower 7 bits of CurCnt are randomized.
 	 * Otherwise CurCnt has the full 27-bit current counter value.
 	 */
-	if (config & IBS_OP_VAL)
+	if (config & IBS_OP_VAL) {
 		count = (config & IBS_OP_MAX_CNT) << 4;
-	else if (ibs_caps & IBS_CAPS_RDWROPCNT)
+		if (ibs_caps & IBS_CAPS_OPCNTEXT)
+			count += config & IBS_OP_MAX_CNT_EXT_MASK;
+	} else if (ibs_caps & IBS_CAPS_RDWROPCNT) {
 		count = (config & IBS_OP_CUR_CNT) >> 32;
+	}
 
 	return count;
 }
@@ -398,7 +401,7 @@ static void perf_ibs_start(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	struct perf_ibs *perf_ibs = container_of(event->pmu, struct perf_ibs, pmu);
 	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
-	u64 period;
+	u64 period, config = 0;
 
 	if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
 		return;
@@ -407,13 +410,19 @@ static void perf_ibs_start(struct perf_event *event, int flags)
 	hwc->state = 0;
 
 	perf_ibs_set_period(perf_ibs, hwc, &period);
+	if (perf_ibs == &perf_ibs_op && (ibs_caps & IBS_CAPS_OPCNTEXT)) {
+		config |= period & IBS_OP_MAX_CNT_EXT_MASK;
+		period &= ~IBS_OP_MAX_CNT_EXT_MASK;
+	}
+	config |= period >> 4;
+
 	/*
 	 * Set STARTED before enabling the hardware, such that a subsequent NMI
 	 * must observe it.
 	 */
 	set_bit(IBS_STARTED,    pcpu->state);
 	clear_bit(IBS_STOPPING, pcpu->state);
-	perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
+	perf_ibs_enable_event(perf_ibs, hwc, config);
 
 	perf_event_update_userpage(event);
 }
@@ -581,7 +590,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 	struct perf_ibs_data ibs_data;
 	int offset, size, check_rip, offset_max, throttle = 0;
 	unsigned int msr;
-	u64 *buf, *config, period;
+	u64 *buf, *config, period, new_config = 0;
 
 	if (!test_bit(IBS_STARTED, pcpu->state)) {
 fail:
@@ -676,13 +685,17 @@ out:
 	if (throttle) {
 		perf_ibs_stop(event, 0);
 	} else {
-		period >>= 4;
-
-		if ((ibs_caps & IBS_CAPS_RDWROPCNT) &&
-		    (*config & IBS_OP_CNT_CTL))
-			period |= *config & IBS_OP_CUR_CNT_RAND;
+		if (perf_ibs == &perf_ibs_op) {
+			if (ibs_caps & IBS_CAPS_OPCNTEXT) {
+				new_config = period & IBS_OP_MAX_CNT_EXT_MASK;
+				period &= ~IBS_OP_MAX_CNT_EXT_MASK;
+			}
+			if ((ibs_caps & IBS_CAPS_RDWROPCNT) && (*config & IBS_OP_CNT_CTL))
+				new_config |= *config & IBS_OP_CUR_CNT_RAND;
+		}
+		new_config |= period >> 4;
 
-		perf_ibs_enable_event(perf_ibs, hwc, period);
+		perf_ibs_enable_event(perf_ibs, hwc, new_config);
 	}
 
 	perf_event_update_userpage(event);
@@ -749,6 +762,13 @@ static __init void perf_event_ibs_init(void)
 		perf_ibs_op.config_mask |= IBS_OP_CNT_CTL;
 		*attr++ = &format_attr_cnt_ctl.attr;
 	}
+
+	if (ibs_caps & IBS_CAPS_OPCNTEXT) {
+		perf_ibs_op.max_period  |= IBS_OP_MAX_CNT_EXT_MASK;
+		perf_ibs_op.config_mask	|= IBS_OP_MAX_CNT_EXT_MASK;
+		perf_ibs_op.cnt_mask    |= IBS_OP_MAX_CNT_EXT_MASK;
+	}
+
 	perf_ibs_pmu_init(&perf_ibs_op, "ibs_op");
 
 	register_nmi_handler(NMI_LOCAL, perf_ibs_nmi_handler, 0, "perf_ibs");
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index e20aa58..6960cd6 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -405,6 +405,7 @@ struct pebs_xmm {
 #define IBS_OP_ENABLE		(1ULL<<17)
 #define IBS_OP_MAX_CNT		0x0000FFFFULL
 #define IBS_OP_MAX_CNT_EXT	0x007FFFFFULL	/* not a register bit mask */
+#define IBS_OP_MAX_CNT_EXT_MASK	(0x7FULL<<20)	/* separate upper 7 bits */
 #define IBS_RIP_INVALID		(1ULL<<38)
 
 #ifdef CONFIG_X86_LOCAL_APIC


* [tip: perf/core] perf/x86/rapl: Add AMD Fam19h RAPL support
  2020-09-08 21:47 ` [PATCH v2 7/7] perf/x86/rapl: Add AMD Fam19h RAPL support Kim Phillips
@ 2020-09-11  7:02   ` tip-bot2 for Kim Phillips
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Kim Phillips @ 2020-09-11  7:02 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kim Phillips, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     a77259bdcb62a2c345914df659a1fbc421269a8b
Gitweb:        https://git.kernel.org/tip/a77259bdcb62a2c345914df659a1fbc421269a8b
Author:        Kim Phillips <kim.phillips@amd.com>
AuthorDate:    Tue, 08 Sep 2020 16:47:40 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:36 +02:00

perf/x86/rapl: Add AMD Fam19h RAPL support

Family 19h RAPL support did not change from Family 17h; extend
the existing Fam17h support to work on Family 19h too.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200908214740.18097-8-kim.phillips@amd.com
---
 arch/x86/events/rapl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 67b411f..7c0120e 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -815,6 +815,7 @@ static const struct x86_cpu_id rapl_model_match[] __initconst = {
 	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,	&model_spr),
 	X86_MATCH_VENDOR_FAM(AMD,	0x17,		&model_amd_fam17h),
 	X86_MATCH_VENDOR_FAM(HYGON,	0x18,		&model_amd_fam17h),
+	X86_MATCH_VENDOR_FAM(AMD,	0x19,		&model_amd_fam17h),
 	{},
 };
 MODULE_DEVICE_TABLE(x86cpu, rapl_model_match);


* [tip: perf/core] perf/x86/amd/ibs: Fix raw sample data accumulation
  2020-09-08 21:47 ` [PATCH v2 5/7] perf/x86/amd/ibs: Fix raw sample data accumulation Kim Phillips
@ 2020-09-11  7:02   ` tip-bot2 for Kim Phillips
  0 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Kim Phillips @ 2020-09-11  7:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Stephane Eranian, Kim Phillips, Peter Zijlstra (Intel),
	stable, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     36e1be8ada994d509538b3b1d0af8b63c351e729
Gitweb:        https://git.kernel.org/tip/36e1be8ada994d509538b3b1d0af8b63c351e729
Author:        Kim Phillips <kim.phillips@amd.com>
AuthorDate:    Tue, 08 Sep 2020 16:47:38 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:35 +02:00

perf/x86/amd/ibs: Fix raw sample data accumulation

Neither IbsBrTarget nor OPDATA4 is populated in IBS Fetch mode.
Don't accumulate them into raw sample user data in that case.

Also, in Fetch mode, save the IBS Fetch Control Extended MSR.

Technically, this is an ABI change to the IBS raw sample data format,
but no perf driver version information is included in perf.data file
headers; existing users can instead detect whether the size of the
sample record has been reduced by 8 bytes to determine whether the
IBS driver has this fix.

Fixes: 904cb3677f3a ("perf/x86/amd/ibs: Update IBS MSRs and feature definitions")
Reported-by: Stephane Eranian <stephane.eranian@google.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200908214740.18097-6-kim.phillips@amd.com
---
 arch/x86/events/amd/ibs.c        | 26 ++++++++++++++++----------
 arch/x86/include/asm/msr-index.h |  1 +
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 863174a..cfbd020 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -630,18 +630,24 @@ fail:
 				       perf_ibs->offset_max,
 				       offset + 1);
 	} while (offset < offset_max);
+	/*
+	 * Read IbsBrTarget, IbsOpData4, and IbsExtdCtl separately
+	 * depending on their availability.
+	 * Can't add to offset_max as they are staggered
+	 */
 	if (event->attr.sample_type & PERF_SAMPLE_RAW) {
-		/*
-		 * Read IbsBrTarget and IbsOpData4 separately
-		 * depending on their availability.
-		 * Can't add to offset_max as they are staggered
-		 */
-		if (ibs_caps & IBS_CAPS_BRNTRGT) {
-			rdmsrl(MSR_AMD64_IBSBRTARGET, *buf++);
-			size++;
+		if (perf_ibs == &perf_ibs_op) {
+			if (ibs_caps & IBS_CAPS_BRNTRGT) {
+				rdmsrl(MSR_AMD64_IBSBRTARGET, *buf++);
+				size++;
+			}
+			if (ibs_caps & IBS_CAPS_OPDATA4) {
+				rdmsrl(MSR_AMD64_IBSOPDATA4, *buf++);
+				size++;
+			}
 		}
-		if (ibs_caps & IBS_CAPS_OPDATA4) {
-			rdmsrl(MSR_AMD64_IBSOPDATA4, *buf++);
+		if (perf_ibs == &perf_ibs_fetch && (ibs_caps & IBS_CAPS_FETCHCTLEXTD)) {
+			rdmsrl(MSR_AMD64_ICIBSEXTDCTL, *buf++);
 			size++;
 		}
 	}
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index dc131b8..2d39c31 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -464,6 +464,7 @@
 #define MSR_AMD64_IBSOP_REG_MASK	((1UL<<MSR_AMD64_IBSOP_REG_COUNT)-1)
 #define MSR_AMD64_IBSCTL		0xc001103a
 #define MSR_AMD64_IBSBRTARGET		0xc001103b
+#define MSR_AMD64_ICIBSEXTDCTL		0xc001103c
 #define MSR_AMD64_IBSOPDATA4		0xc001103d
 #define MSR_AMD64_IBS_REG_COUNT_MAX	8 /* includes MSR_AMD64_IBSBRTARGET */
 #define MSR_AMD64_SEV			0xc0010131


* [tip: perf/core] perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour
  2020-09-08 21:47 ` [PATCH v2 1/7] perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour Kim Phillips
  2020-09-10 16:34   ` Sasha Levin
@ 2020-09-11  7:02   ` tip-bot2 for Kim Phillips
  1 sibling, 0 replies; 16+ messages in thread
From: tip-bot2 for Kim Phillips @ 2020-09-11  7:02 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kim Phillips, Peter Zijlstra (Intel), stable, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     c8fe99d0701fec9fb849ec880a86bc5592530496
Gitweb:        https://git.kernel.org/tip/c8fe99d0701fec9fb849ec880a86bc5592530496
Author:        Kim Phillips <kim.phillips@amd.com>
AuthorDate:    Tue, 08 Sep 2020 16:47:34 -05:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 10 Sep 2020 11:19:34 +02:00

perf/amd/uncore: Set all slices and threads to restore perf stat -a behaviour

Commit 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for
F17h L3 PMCs") inadvertently changed the uncore driver's behaviour
wrt perf tool invocations with or without a CPU list, specified with
-C / --cpu=.

Change the behaviour of the driver to assume the former all-cpu (-a)
case, which is the more commonly desired default.  This fixes
'-a -A' invocations without explicit cpu lists (-C) so that L3 events
are no longer counted only on behalf of the first thread of the first
core in the L3 domain.

BEFORE:

Activity performed by the first thread of the last core (CPU#43) in
CPU#40's L3 domain is not reported by CPU#40:

sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default
...
CPU36                 21,835      l3_request_g1.caching_l3_cache_accesses
CPU40                 87,066      l3_request_g1.caching_l3_cache_accesses
CPU44                 17,360      l3_request_g1.caching_l3_cache_accesses
...

AFTER:

The L3 domain activity is now reported by CPU#40:

sudo perf stat -a -A -e l3_request_g1.caching_l3_cache_accesses taskset -c 43 perf bench mem memcpy -s 32mb -l 100 -f default
...
CPU36                354,891      l3_request_g1.caching_l3_cache_accesses
CPU40              1,780,870      l3_request_g1.caching_l3_cache_accesses
CPU44                315,062      l3_request_g1.caching_l3_cache_accesses
...

Fixes: 2f217d58a8a0 ("perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200908214740.18097-2-kim.phillips@amd.com
---
 arch/x86/events/amd/uncore.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 76400c0..e7e61c8 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -181,28 +181,16 @@ static void amd_uncore_del(struct perf_event *event, int flags)
 }
 
 /*
- * Convert logical CPU number to L3 PMC Config ThreadMask format
+ * Return a full thread and slice mask until per-CPU is
+ * properly supported.
  */
-static u64 l3_thread_slice_mask(int cpu)
+static u64 l3_thread_slice_mask(void)
 {
-	u64 thread_mask, core = topology_core_id(cpu);
-	unsigned int shift, thread = 0;
+	if (boot_cpu_data.x86 <= 0x18)
+		return AMD64_L3_SLICE_MASK | AMD64_L3_THREAD_MASK;
 
-	if (topology_smt_supported() && !topology_is_primary_thread(cpu))
-		thread = 1;
-
-	if (boot_cpu_data.x86 <= 0x18) {
-		shift = AMD64_L3_THREAD_SHIFT + 2 * (core % 4) + thread;
-		thread_mask = BIT_ULL(shift);
-
-		return AMD64_L3_SLICE_MASK | thread_mask;
-	}
-
-	core = (core << AMD64_L3_COREID_SHIFT) & AMD64_L3_COREID_MASK;
-	shift = AMD64_L3_THREAD_SHIFT + thread;
-	thread_mask = BIT_ULL(shift);
-
-	return AMD64_L3_EN_ALL_SLICES | core | thread_mask;
+	return AMD64_L3_EN_ALL_SLICES | AMD64_L3_EN_ALL_CORES |
+	       AMD64_L3_F19H_THREAD_MASK;
 }
 
 static int amd_uncore_event_init(struct perf_event *event)
@@ -232,7 +220,7 @@ static int amd_uncore_event_init(struct perf_event *event)
 	 * For other events, the two fields do not affect the count.
 	 */
 	if (l3_mask && is_llc_event(event))
-		hwc->config |= l3_thread_slice_mask(event->cpu);
+		hwc->config |= l3_thread_slice_mask();
 
 	uncore = event_to_amd_uncore(event);
 	if (!uncore)

