* [PATCH] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
@ 2016-08-11 15:21 Matt Fleming
  2016-08-11 16:41 ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Matt Fleming @ 2016-08-11 15:21 UTC
  To: Peter Zijlstra
  Cc: linux-kernel, Matt Fleming, Ingo Molnar, Borislav Petkov, stable

While the Intel PMU monitors the LLC when perf enables the
HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
L1 instruction cache fetches (0x0080) and instruction cache misses
(0x0081) on the AMD PMU.

This is extremely confusing when monitoring the same workload across
Intel and AMD machines, since parameters like,

  $ perf stat -e cache-references,cache-misses

measure completely different things.

Instead, make the AMD PMU measure instruction/data cache fill requests
to the L2 and instruction/data cache misses in the L2 when
HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled, respectively.
That way the events measure unified caches on both platforms.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>
---
 arch/x86/events/amd/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index e07a22bb9308..8fd8bf79f32b 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -119,8 +119,8 @@ static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] =
 {
   [PERF_COUNT_HW_CPU_CYCLES]			= 0x0076,
   [PERF_COUNT_HW_INSTRUCTIONS]			= 0x00c0,
-  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x0080,
-  [PERF_COUNT_HW_CACHE_MISSES]			= 0x0081,
+  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x037d,
+  [PERF_COUNT_HW_CACHE_MISSES]			= 0x037e,
   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]		= 0x00c2,
   [PERF_COUNT_HW_BRANCH_MISSES]			= 0x00c3,
   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND]	= 0x00d0, /* "Decoder empty" event */
-- 
2.7.3

* Re: [PATCH] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-11 15:21 [PATCH] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2 Matt Fleming
@ 2016-08-11 16:41 ` Borislav Petkov
  2016-08-15 15:13   ` Matt Fleming
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2016-08-11 16:41 UTC
  To: Matt Fleming; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar

Drop stable from CC.

On Thu, Aug 11, 2016 at 04:21:42PM +0100, Matt Fleming wrote:
> While the Intel PMU monitors the LLC when perf enables the
> HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
> L1 instruction cache fetches (0x0080) and instruction cache misses
> (0x0081) on the AMD PMU.
> 
> This is extremely confusing when monitoring the same workload across
> Intel and AMD machines, since parameters like,
> 
>   $ perf stat -e cache-references,cache-misses
> 
> measure completely different things.
> 
> Instead, make the AMD PMU measure instruction/data cache fill requests
> to the L2 and instruction/data cache misses in the L2 when
> HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled, respectively.
> That way the events measure unified caches on both platforms.

I guess that's closer.

Even though LLC is not always L2 on AMD (some have L3). Btw,
what are the exact events for PERF_COUNT_HW_CACHE_REFERENCES and
PERF_COUNT_HW_CACHE_MISSES called on Intel?

I could try to find better/more fitting event selectors on AMD...

> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: <stable@vger.kernel.org>
> ---
>  arch/x86/events/amd/core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index e07a22bb9308..8fd8bf79f32b 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -119,8 +119,8 @@ static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] =
>  {
>    [PERF_COUNT_HW_CPU_CYCLES]			= 0x0076,
>    [PERF_COUNT_HW_INSTRUCTIONS]			= 0x00c0,
> -  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x0080,
> -  [PERF_COUNT_HW_CACHE_MISSES]			= 0x0081,
> +  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x037d,
> +  [PERF_COUNT_HW_CACHE_MISSES]			= 0x037e,
>    [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]		= 0x00c2,
>    [PERF_COUNT_HW_BRANCH_MISSES]			= 0x00c3,
>    [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND]	= 0x00d0, /* "Decoder empty" event */

Btw, there's also amd_event_mapping in arch/x86/kvm/pmu_amd.c which has
duplicated amd_perfmon_event_map. Would need adjusting too.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [PATCH] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-11 16:41 ` Borislav Petkov
@ 2016-08-15 15:13   ` Matt Fleming
  2016-08-18 16:25     ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Matt Fleming @ 2016-08-15 15:13 UTC
  To: Borislav Petkov; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar

On Thu, 11 Aug, at 06:41:50PM, Borislav Petkov wrote:
> Drop stable from CC.
> 
> On Thu, Aug 11, 2016 at 04:21:42PM +0100, Matt Fleming wrote:
> > While the Intel PMU monitors the LLC when perf enables the
> > HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
> > L1 instruction cache fetches (0x0080) and instruction cache misses
> > (0x0081) on the AMD PMU.
> > 
> > This is extremely confusing when monitoring the same workload across
> > Intel and AMD machines, since parameters like,
> > 
> >   $ perf stat -e cache-references,cache-misses
> > 
> > measure completely different things.
> > 
> > Instead, make the AMD PMU measure instruction/data cache fill requests
> > to the L2 and instruction/data cache misses in the L2 when
> > HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled, respectively.
> > That way the events measure unified caches on both platforms.
> 
> I guess that's closer.
> 
> Even though LLC is not always L2 on AMD (some have L3). Btw,
> what are the exact events for PERF_COUNT_HW_CACHE_REFERENCES and
> PERF_COUNT_HW_CACHE_MISSES called on Intel?
 
They're referred to as "LLC Reference" and "LLC Misses" in the Intel
SDM Table 18-1 and "Longest latency cache references/misses" in Table
19-1.

> I could try to find better/more fitting event selectors on AMD...
 
If you've got any other suggestions, I'm all ears. Note that one thing
I wasn't sure about was whether we want to include TLB events hitting
the L2. I left them out of this patch, but it might make sense to add
them so that HW_CACHE_{REFERENCES,MISSES} is actually distinguishable
from LLC-{loads,misses}.

> > Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: <stable@vger.kernel.org>
> > ---
> >  arch/x86/events/amd/core.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> > index e07a22bb9308..8fd8bf79f32b 100644
> > --- a/arch/x86/events/amd/core.c
> > +++ b/arch/x86/events/amd/core.c
> > @@ -119,8 +119,8 @@ static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] =
> >  {
> >    [PERF_COUNT_HW_CPU_CYCLES]			= 0x0076,
> >    [PERF_COUNT_HW_INSTRUCTIONS]			= 0x00c0,
> > -  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x0080,
> > -  [PERF_COUNT_HW_CACHE_MISSES]			= 0x0081,
> > +  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x037d,
> > +  [PERF_COUNT_HW_CACHE_MISSES]			= 0x037e,
> >    [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]		= 0x00c2,
> >    [PERF_COUNT_HW_BRANCH_MISSES]			= 0x00c3,
> >    [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND]	= 0x00d0, /* "Decoder empty" event */
> 
> Btw, there's also amd_event_mapping in arch/x86/kvm/pmu_amd.c which has
> duplicated amd_perfmon_event_map. Would need adjusting too.

Urgh, right. I totally missed that. I'll update.

* Re: [PATCH] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-15 15:13   ` Matt Fleming
@ 2016-08-18 16:25     ` Borislav Petkov
  2016-08-19 13:34       ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2016-08-18 16:25 UTC
  To: Matt Fleming; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar

On Mon, Aug 15, 2016 at 04:13:16PM +0100, Matt Fleming wrote:
> They're referred to as "LLC Reference" and "LLC Misses" in the Intel
> SDM Table 18-1 and "Longest latency cache references/misses" in Table
> 19-1.

Btw, it warns us right:

"Because cache hierarchy, cache sizes and other implementation-specific
characteristics; value comparison to estimate performance differences is
not recommended."

> 
> > I could try to find better/more fitting event selectors on AMD...
>  
> If you've got any other suggestions, I'm all ears.

So there are no LLC events on AMD in the sense that there are no
event selectors which always mean last-level cache and select those
automagically, no matter whether the LLC is the L2, L3 and so on,
depending on the part.

If we have to be correct on AMD, we'd have to check whether the part has
an L3 and then choose the L3 events, say, something like

"EventSelect 4E1h L3 Cache Misses" and "EventSelect 4E2h L3 Fills caused
by L2 Evictions"

and if the LLC is the L2 (think client parts) then take the ones you've
selected.

I guess amd_pmu_event_map() could be taught to return the proper event
map depending on the part.

Now, the L3 detection could be carved out from some pieces in
arch/x86/kernel/cpu/intel_cacheinfo.c but I'd need to swap in all that
code again...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [PATCH] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-18 16:25     ` Borislav Petkov
@ 2016-08-19 13:34       ` Peter Zijlstra
  2016-08-19 14:44         ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-08-19 13:34 UTC
  To: Borislav Petkov; +Cc: Matt Fleming, linux-kernel, Ingo Molnar

On Thu, Aug 18, 2016 at 06:25:22PM +0200, Borislav Petkov wrote:

> > > I could try to find better/more fitting event selectors on AMD...
> >  
> > If you've got any other suggestions, I'm all ears.
> 
> So there are no LLC events on AMD in the sense that there are no
> event selectors which always mean last-level cache and select those
> automagically, no matter whether the LLC is the L2, L3 and so on,
> depending on the part.
> 
> If we have to be correct on AMD, we'd have to check whether the part has
> an L3 and then choose the L3 events, say, something like
> 
> "EventSelect 4E1h L3 Cache Misses" and "EventSelect 4E2h L3 Fills caused
> by L2 Evictions"

Can't; those events are NB events and cannot be used on per CPU counters.

The 7D,7E L2 events are the best that are available on AMD afaict.
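For readers mapping the patch's raw values to the "7D,7E" event numbers above, here is a minimal sketch of how such a value splits into its fields. It assumes the usual AMD PERF_CTL layout (EventSelect in bits 7:0 and 35:32, unit mask in bits 15:8); the helper names are hypothetical and not taken from the kernel.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed AMD PERF_CTL field layout for the raw values in the event map. */
#define AMD_EVSEL_LOW   0x00000000000000ffULL  /* EventSelect[7:0]  */
#define AMD_EVSEL_HIGH  0x0000000f00000000ULL  /* EventSelect[11:8] */
#define AMD_UMASK       0x000000000000ff00ULL  /* UnitMask          */

/* Hypothetical helper: recover the 12-bit EventSelect from a raw config. */
static inline unsigned int amd_event_select(uint64_t config)
{
	return (unsigned int)((config & AMD_EVSEL_LOW) |
			      ((config & AMD_EVSEL_HIGH) >> 24));
}

/* Hypothetical helper: recover the 8-bit unit mask from a raw config. */
static inline unsigned int amd_unit_mask(uint64_t config)
{
	return (unsigned int)((config & AMD_UMASK) >> 8);
}

/* Hypothetical helper: build a raw config from EventSelect and unit mask. */
static inline uint64_t amd_raw_config(unsigned int event_select,
				      unsigned int umask)
{
	return ((uint64_t)(event_select & 0x0ff)) |
	       ((uint64_t)(event_select & 0xf00) << 24) |
	       ((uint64_t)(umask & 0xff) << 8);
}

/* Print the decoded fields of one event map entry. */
static inline void describe(const char *name, uint64_t config)
{
	printf("%-32s EventSelect %03Xh, UnitMask %02Xh\n",
	       name, amd_event_select(config), amd_unit_mask(config));
}
```

Under that assumption, 0x037d decodes to EventSelect 07Dh with unit mask 03h, and 0x037e to EventSelect 07Eh with the same unit mask. The old values 0x0080 and 0x0081 carry no unit mask. An extended selector such as 4E1h would not fit in the low 16 bits: its top nibble goes in bits 35:32, and its unit mask is not given in this thread.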

* Re: [PATCH] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-19 13:34       ` Peter Zijlstra
@ 2016-08-19 14:44         ` Borislav Petkov
  2016-08-19 15:16           ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2016-08-19 14:44 UTC
  To: Peter Zijlstra; +Cc: Matt Fleming, linux-kernel, Ingo Molnar

On Fri, Aug 19, 2016 at 03:34:22PM +0200, Peter Zijlstra wrote:
> Can't; those events are NB events and cannot be used on per CPU counters.

It figures, considering L3 is part of the NB on AMD.

So the Intel ones are special in the sense that they can be used on per
CPU counters even though they're not really per-CPU?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

* Re: [PATCH] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-19 14:44         ` Borislav Petkov
@ 2016-08-19 15:16           ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2016-08-19 15:16 UTC
  To: Borislav Petkov; +Cc: Matt Fleming, linux-kernel, Ingo Molnar

On Fri, Aug 19, 2016 at 04:44:53PM +0200, Borislav Petkov wrote:
> On Fri, Aug 19, 2016 at 03:34:22PM +0200, Peter Zijlstra wrote:
> > Can't; those events are NB events and cannot be used on per CPU counters.
> 
> It figures, considering L3 is part of the NB on AMD.
> 
> So the Intel ones are special in the sense that they can be used on per
> CPU counters even though they're not really per-CPU?

Intel has L3 (and L2, L1) request and miss events per logical CPU. The CPU
still issues the load/store that causes the request and miss, and thus the
event can be accounted to the program under execution.

Intel also has a bunch of L3 events at the uncore of course.
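
For context, this is roughly how userspace requests the generic events discussed in this thread and has them accounted to one task. It is a sketch only: the two helper names are hypothetical, the exclude_* settings are an arbitrary choice, and error handling is left to the caller. The kernel translates the generic id into a model-specific raw event (on AMD, through amd_perfmon_event_map[]).

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill in an attr for a generic hardware event such
 * as PERF_COUNT_HW_CACHE_REFERENCES or PERF_COUNT_HW_CACHE_MISSES.
 */
static inline void init_generic_hw_attr(struct perf_event_attr *attr,
					uint64_t hw_id)
{
	memset(attr, 0, sizeof(*attr));
	attr->type = PERF_TYPE_HARDWARE;   /* generic, not PERF_TYPE_RAW */
	attr->size = sizeof(*attr);
	attr->config = hw_id;
	attr->disabled = 1;                /* start stopped, enable later */
	attr->exclude_kernel = 1;          /* count user space only */
	attr->exclude_hv = 1;
}

/*
 * Hypothetical helper: open the counter for the calling thread on
 * whichever CPU it runs on. Returns an fd, or -1 with errno set.
 */
static inline int open_self_counter(struct perf_event_attr *attr)
{
	return (int)syscall(SYS_perf_event_open, attr,
			    0 /* pid: calling thread */,
			    -1 /* cpu: any */,
			    -1 /* group_fd: none */,
			    0UL /* flags */);
}
```

A caller would then enable the counter with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0), run the code of interest, and read() a 64-bit count from the fd. Opening with pid 0 and cpu -1 is the per-task mode that relies on the event being countable on a per-CPU counter.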
