* [PATCH v2] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
@ 2016-08-24 13:12 Matt Fleming
  2016-08-24 14:55 ` Borislav Petkov
  2016-09-16 16:09 ` [tip:perf/urgent] " tip-bot for Matt Fleming
  0 siblings, 2 replies; 6+ messages in thread
From: Matt Fleming @ 2016-08-24 13:12 UTC
  To: Peter Zijlstra
  Cc: linux-kernel, Matt Fleming, Ingo Molnar, Borislav Petkov, stable

While the Intel PMU monitors the LLC when perf enables the
HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
L1 instruction cache fetches (0x0080) and instruction cache misses
(0x0081) on the AMD PMU.

This is extremely confusing when monitoring the same workload across
Intel and AMD machines, since parameters like,

  $ perf stat -e cache-references,cache-misses

measure completely different things.

Instead, make the AMD PMU measure instruction/data cache and TLB fill
requests to the L2 and instruction/data cache and TLB misses in the L2
when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
respectively. That way the events measure unified caches on both
platforms.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org>
---

Changes in v2:
  - Update the KVM AMD PMU code
  - Also measure TLB hits/misses in the L2

 arch/x86/events/amd/core.c | 4 ++--
 arch/x86/kvm/pmu_amd.c     | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index e07a22bb9308..f5f4b3fbbbc2 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -119,8 +119,8 @@ static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] =
 {
   [PERF_COUNT_HW_CPU_CYCLES]			= 0x0076,
   [PERF_COUNT_HW_INSTRUCTIONS]			= 0x00c0,
-  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x0080,
-  [PERF_COUNT_HW_CACHE_MISSES]			= 0x0081,
+  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x077d,
+  [PERF_COUNT_HW_CACHE_MISSES]			= 0x077e,
   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]		= 0x00c2,
   [PERF_COUNT_HW_BRANCH_MISSES]			= 0x00c3,
   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND]	= 0x00d0, /* "Decoder empty" event */
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 39b91127ef07..cd944435dfbd 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -23,8 +23,8 @@
 static struct kvm_event_hw_type_mapping amd_event_mapping[] = {
 	[0] = { 0x76, 0x00, PERF_COUNT_HW_CPU_CYCLES },
 	[1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
-	[2] = { 0x80, 0x00, PERF_COUNT_HW_CACHE_REFERENCES },
-	[3] = { 0x81, 0x00, PERF_COUNT_HW_CACHE_MISSES },
+	[2] = { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES },
+	[3] = { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES },
 	[4] = { 0xc2, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
 	[5] = { 0xc3, 0x00, PERF_COUNT_HW_BRANCH_MISSES },
 	[6] = { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
-- 
2.7.3
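
For readers comparing the two hunks above: the core.c table stores one combined raw value (0x077d), while the KVM table stores the same event as two separate bytes (0x7d, 0x07). A minimal sketch of that relationship follows, assuming the event select occupies bits 7:0 and the unit mask bits 15:8 of the raw value. The helper names are hypothetical and do not appear in the kernel sources.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not kernel code. Assumed layout of the raw value:
 *   bits  7:0  event select
 *   bits 15:8  unit mask
 */
static inline uint8_t raw_event_select(uint64_t config)
{
	return (uint8_t)(config & 0xff);
}

static inline uint8_t raw_unit_mask(uint64_t config)
{
	return (uint8_t)((config >> 8) & 0xff);
}

/* Combine the two bytes used in the KVM table into one raw value. */
static inline uint64_t raw_config(uint8_t event_select, uint8_t unit_mask)
{
	return ((uint64_t)unit_mask << 8) | event_select;
}
```

Under that assumption, raw_config(0x7d, 0x07) yields 0x077d and raw_config(0x7e, 0x07) yields 0x077e, which is why the two tables change in lockstep.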


* Re: [PATCH v2] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-24 13:12 [PATCH v2] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2 Matt Fleming
@ 2016-08-24 14:55 ` Borislav Petkov
  2016-08-24 18:27   ` Peter Zijlstra
  2016-09-16 16:09 ` [tip:perf/urgent] " tip-bot for Matt Fleming
  1 sibling, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2016-08-24 14:55 UTC
  To: Matt Fleming; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, stable

On Wed, Aug 24, 2016 at 02:12:08PM +0100, Matt Fleming wrote:
> While the Intel PMU monitors the LLC when perf enables the
> HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
> L1 instruction cache fetches (0x0080) and instruction cache misses
> (0x0081) on the AMD PMU.
> 
> This is extremely confusing when monitoring the same workload across
> Intel and AMD machines, since parameters like,
> 
>   $ perf stat -e cache-references,cache-misses
> 
> measure completely different things.
> 
> Instead, make the AMD PMU measure instruction/data cache and TLB fill
> requests to the L2 and instruction/data cache and TLB misses in the L2
> when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
> respectively. That way the events measure unified caches on both
> platforms.

I'm still not really sure about this: we can't really compare L3 to L2
access patterns - it is almost like comparing apples to oranges. Can we
use the Intel L2 events instead?

I mean, this makes much more sense to me because:

* you *actually* compare the same cache levels
* you have L2 *everywhere* vs L3 (and L4) which are sometimes not present on
  thin clients

People who want LLC can enable them with -e additionally...

Hmmm.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v2] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-24 14:55 ` Borislav Petkov
@ 2016-08-24 18:27   ` Peter Zijlstra
  2016-08-25  3:35     ` Borislav Petkov
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-08-24 18:27 UTC
  To: Borislav Petkov; +Cc: Matt Fleming, linux-kernel, Ingo Molnar, stable

On Wed, Aug 24, 2016 at 04:55:14PM +0200, Borislav Petkov wrote:
> On Wed, Aug 24, 2016 at 02:12:08PM +0100, Matt Fleming wrote:
> > While the Intel PMU monitors the LLC when perf enables the
> > HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
> > L1 instruction cache fetches (0x0080) and instruction cache misses
> > (0x0081) on the AMD PMU.
> > 
> > This is extremely confusing when monitoring the same workload across
> > Intel and AMD machines, since parameters like,
> > 
> >   $ perf stat -e cache-references,cache-misses
> > 
> > measure completely different things.
> > 
> > Instead, make the AMD PMU measure instruction/data cache and TLB fill
> > requests to the L2 and instruction/data cache and TLB misses in the L2
> > when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
> > respectively. That way the events measure unified caches on both
> > platforms.
> 
> I'm still not really sure about this: we can't really compare L3 to L2
> access patterns - it is almost like comparing apples to oranges. Can we
> use the Intel L2 events instead?

They're not meant to be comparable between machines. I wouldn't even
compare the LLC numbers between two different Intel parts.

These events are meant to profile a workload on the machine you run them
on. Big cache-miss/ref ratios indicate you lose performance because of
the memory subsystem and/or data structure layout.

And afaict AMD parts, even those that have L3, cannot provide L3 numbers
on a per task basis, so these L2 numbers are the best we have.
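
The cache-miss/ref ratio mentioned above can be sketched as a small calculation over the two counter values reported for a single run on a single machine. The function name is hypothetical, and the zero-reference guard is an assumption added for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: ratio of cache misses to cache references for one
 * workload on one machine. Returns 0.0 when no references were counted,
 * to avoid dividing by zero.
 */
static double cache_miss_ratio(uint64_t references, uint64_t misses)
{
	if (references == 0)
		return 0.0;
	return (double)misses / (double)references;
}
```

A result near 1.0 means most references missed; as noted above, the number is meant for profiling one workload on one machine, not for comparing different parts.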


* Re: [PATCH v2] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-24 18:27   ` Peter Zijlstra
@ 2016-08-25  3:35     ` Borislav Petkov
  2016-09-16 13:01       ` Matt Fleming
  0 siblings, 1 reply; 6+ messages in thread
From: Borislav Petkov @ 2016-08-25  3:35 UTC
  To: Peter Zijlstra; +Cc: Matt Fleming, linux-kernel, Ingo Molnar

(dropping stable@ from CC)

On Wed, Aug 24, 2016 at 08:27:06PM +0200, Peter Zijlstra wrote:
> They're not meant to be comparable between machines. I wouldn't even
> compare the LLC numbers between two different Intel parts.
> 
> These events are meant to profile a workload on the machine you run them
> on. Big cache-miss/ref ratios indicate you lose performance because of
> the memory subsystem and/or data structure layout.

Ah ok, then I've misunderstood Matt's justification in the commit message.

FWIW: Acked-by: Borislav Petkov <bp@suse.de>

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: [PATCH v2] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-25  3:35     ` Borislav Petkov
@ 2016-09-16 13:01       ` Matt Fleming
  0 siblings, 0 replies; 6+ messages in thread
From: Matt Fleming @ 2016-09-16 13:01 UTC
  To: Borislav Petkov; +Cc: Peter Zijlstra, linux-kernel, Ingo Molnar

On Thu, 25 Aug, at 05:35:14AM, Borislav Petkov wrote:
> (dropping stable@ from CC)
> 
> On Wed, Aug 24, 2016 at 08:27:06PM +0200, Peter Zijlstra wrote:
> > They're not meant to be comparable between machines. I wouldn't even
> > compare the LLC numbers between two different Intel parts.
> > 
> > These events are meant to profile a workload on the machine you run them
> > on. Big cache-miss/ref ratios indicate you lose performance because of
> > the memory subsystem and/or data structure layout.
> 
> Ah ok, then I've misunderstood Matt's justification in the commit message.
> 
> FWIW: Acked-by: Borislav Petkov <bp@suse.de>

Ping? Tip folks: are you OK to apply this?


* [tip:perf/urgent] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
  2016-08-24 13:12 [PATCH v2] perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2 Matt Fleming
  2016-08-24 14:55 ` Borislav Petkov
@ 2016-09-16 16:09 ` tip-bot for Matt Fleming
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Matt Fleming @ 2016-09-16 16:09 UTC
  To: linux-tip-commits
  Cc: torvalds, mingo, peterz, bp, stable, hpa, linux-kernel, tglx, matt

Commit-ID:  080fe0b790ad438fc1b61621dac37c1964ce7f35
Gitweb:     http://git.kernel.org/tip/080fe0b790ad438fc1b61621dac37c1964ce7f35
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Wed, 24 Aug 2016 14:12:08 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 16 Sep 2016 16:19:49 +0200

perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2

While the Intel PMU monitors the LLC when perf enables the
HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
L1 instruction cache fetches (0x0080) and instruction cache misses
(0x0081) on the AMD PMU.

This is extremely confusing when monitoring the same workload across
Intel and AMD machines, since parameters like,

  $ perf stat -e cache-references,cache-misses

measure completely different things.

Instead, make the AMD PMU measure instruction/data cache and TLB fill
requests to the L2 and instruction/data cache and TLB misses in the L2
when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
respectively. That way the events measure unified caches on both
platforms.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1472044328-21302-1-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/amd/core.c | 4 ++--
 arch/x86/kvm/pmu_amd.c     | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index e07a22b..f5f4b3f 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -119,8 +119,8 @@ static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] =
 {
   [PERF_COUNT_HW_CPU_CYCLES]			= 0x0076,
   [PERF_COUNT_HW_INSTRUCTIONS]			= 0x00c0,
-  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x0080,
-  [PERF_COUNT_HW_CACHE_MISSES]			= 0x0081,
+  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x077d,
+  [PERF_COUNT_HW_CACHE_MISSES]			= 0x077e,
   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]		= 0x00c2,
   [PERF_COUNT_HW_BRANCH_MISSES]			= 0x00c3,
   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND]	= 0x00d0, /* "Decoder empty" event */
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 39b9112..cd94443 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -23,8 +23,8 @@
 static struct kvm_event_hw_type_mapping amd_event_mapping[] = {
 	[0] = { 0x76, 0x00, PERF_COUNT_HW_CPU_CYCLES },
 	[1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
-	[2] = { 0x80, 0x00, PERF_COUNT_HW_CACHE_REFERENCES },
-	[3] = { 0x81, 0x00, PERF_COUNT_HW_CACHE_MISSES },
+	[2] = { 0x7d, 0x07, PERF_COUNT_HW_CACHE_REFERENCES },
+	[3] = { 0x7e, 0x07, PERF_COUNT_HW_CACHE_MISSES },
 	[4] = { 0xc2, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
 	[5] = { 0xc3, 0x00, PERF_COUNT_HW_BRANCH_MISSES },
 	[6] = { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },

