linux-kernel.vger.kernel.org archive mirror
* [patch] perf_events: more wrong events for AMD fam10h
@ 2011-06-07 19:39 Vince Weaver
  2011-06-07 21:07 ` [patch] perf_events: even " Vince Weaver
  2011-06-27 11:22 ` [patch] perf_events: " Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Vince Weaver @ 2011-06-07 19:39 UTC
  To: linux-kernel
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo

Hello

I'm in the process of auditing perf_event's awesome "generalized events".

On AMD fam10h, for some reason, we have the following definitions:
  cache-references = INSTRUCTION_CACHE_FETCHES	0x530080
  cache-misses 	   = INSTRUCTION_CACHE_MISSES	0x530081

on Intel at least I'm pretty sure these events match to Last Level Cache 
accesses/misses, not icache.  Is there a reason for this?

Attached is a patch that removes these until better events can be found.
(LLC is tricky on AMD as it's a shared resource).
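
[Editor's note] The 0x530080 value quoted above is the full raw counter
configuration, while the table in the patch stores only the low 16 bits
(0x0080). A minimal sketch of how such a value splits into fields, assuming
the usual AMD PERF_CTL layout (event select in bits 7:0, unit mask in bits
15:8, USR/OS/INT/EN in bits 16, 17, 20 and 22). The struct and function names
are hypothetical and are not taken from the kernel:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the AMD performance event-select register. */
#define EVTSEL_USR (1ULL << 16)	/* count in user mode */
#define EVTSEL_OS  (1ULL << 17)	/* count in kernel mode */
#define EVTSEL_INT (1ULL << 20)	/* interrupt on overflow */
#define EVTSEL_EN  (1ULL << 22)	/* counter enable */

/* Hypothetical container for the decoded fields. */
struct evtsel_fields {
	uint8_t event;	/* bits 7:0, low part of the event select */
	uint8_t umask;	/* bits 15:8, unit mask */
	bool usr, os, intr, en;
};

/* Split a raw configuration value into its individual fields. */
static inline struct evtsel_fields evtsel_decode(uint64_t raw)
{
	struct evtsel_fields f = {
		.event = (uint8_t)(raw & 0xFF),
		.umask = (uint8_t)((raw >> 8) & 0xFF),
		.usr   = (raw & EVTSEL_USR) != 0,
		.os    = (raw & EVTSEL_OS) != 0,
		.intr  = (raw & EVTSEL_INT) != 0,
		.en    = (raw & EVTSEL_EN) != 0,
	};
	return f;
}
```

Under these assumptions, evtsel_decode(0x530080) yields event 0x80 with an
empty unit mask and the USR, OS, INT and EN bits set, so the event itself is
just 0x80.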

Note, l1-dcache-stores is broken too, I'm looking into it.

Thanks,

Vince
vweaver1@eecs.utk.edu

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index fe29c1d..a46b987 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -98,8 +98,6 @@ static const u64 amd_perfmon_event_map[] =
 {
   [PERF_COUNT_HW_CPU_CYCLES]			= 0x0076,
   [PERF_COUNT_HW_INSTRUCTIONS]			= 0x00c0,
-  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x0080,
-  [PERF_COUNT_HW_CACHE_MISSES]			= 0x0081,
   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]		= 0x00c2,
   [PERF_COUNT_HW_BRANCH_MISSES]			= 0x00c3,
   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND]	= 0x00d0, /* "Decoder empty" event */

* [patch] perf_events: even more wrong events for AMD fam10h
  2011-06-07 19:39 [patch] perf_events: more wrong events for AMD fam10h Vince Weaver
@ 2011-06-07 21:07 ` Vince Weaver
  2011-06-27 11:22   ` Peter Zijlstra
  2011-06-27 11:22 ` [patch] perf_events: " Peter Zijlstra
  1 sibling, 1 reply; 8+ messages in thread
From: Vince Weaver @ 2011-06-07 21:07 UTC
  To: linux-kernel
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar, Arnaldo Carvalho de Melo


Here are two more problems I found with the superlative "generalized" 
events on AMD fam10h.

The "l1-dcache-loads" event measures loads *and* stores.
    This might be as close as you can get on AMD, but it's still wrong
      as it's not what Intel measures.  
    My patch removes it.  Better might be to add a proper
    "l1-dcache-access" event.

The "l1-dcache-load-miss" event is an invalid event. (0x141).
    From what I can tell that event (DATA_CACHE_MISSES) does not
    take a mask.  It should be 0x41.  And it's actually measuring
    all misses, not just load misses, see above.

The "l1-dcache-stores" event does not work.  See the
     ./validation/l1-dcache-stores test found in 
     http://web.eecs.utk.edu/~vweaver1/projects/perf-events/validation.html
   So remove it until we figure out why.
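
[Editor's note] The table entries discussed above (0x0040, 0x0141, 0x0142)
each pack an event select and a unit mask into 16 bits. A small sketch of the
split, assuming the event select sits in the low byte and the unit mask in the
high byte; the helper names are hypothetical:

```c
#include <stdint.h>

/* Low byte of a 16-bit table entry: the event select. */
static inline uint8_t entry_event(uint16_t entry)
{
	return (uint8_t)(entry & 0xFF);
}

/* High byte of a 16-bit table entry: the unit mask. */
static inline uint8_t entry_umask(uint16_t entry)
{
	return (uint8_t)(entry >> 8);
}
```

Read this way, 0x0141 is event 0x41 with unit mask 0x01, and the proposed
0x0041 is the same event with an empty unit mask.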


Also, is the value for "no such event" 0 or -1?  The perf_event_amd.c
file seems to use them interchangeably from what I can tell.

Thanks,

Vince

Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>

diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index fe29c1d..71987d5 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -7,11 +7,11 @@ static __initconst const u64 amd_hw_cache_event_ids
 {
  [ C(L1D) ] = {
 	[ C(OP_READ) ] = {
-		[ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses        */
-		[ C(RESULT_MISS)   ] = 0x0141, /* Data Cache Misses          */
+		[ C(RESULT_ACCESS) ] = 0,      /* Not available on AMD       */
+		[ C(RESULT_MISS)   ] = 0x0041, /* Data Cache Misses          */
 	},
 	[ C(OP_WRITE) ] = {
-		[ C(RESULT_ACCESS) ] = 0x0142, /* Data Cache Refills :system */
+		[ C(RESULT_ACCESS) ] = 0,      /* Data Cache Refills :system doesn't work */
 		[ C(RESULT_MISS)   ] = 0,
 	},
 	[ C(OP_PREFETCH) ] = {


* Re: [patch] perf_events: more wrong events for AMD fam10h
  2011-06-07 19:39 [patch] perf_events: more wrong events for AMD fam10h Vince Weaver
  2011-06-07 21:07 ` [patch] perf_events: even " Vince Weaver
@ 2011-06-27 11:22 ` Peter Zijlstra
  2011-06-27 13:38   ` Robert Richter
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2011-06-27 11:22 UTC
  To: Vince Weaver
  Cc: linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Robert Richter

On Tue, 2011-06-07 at 15:39 -0400, Vince Weaver wrote:
> Hello
> 
> I'm in the process of auditing perf_event's awesome "generalized events".
> 
> On AMD fam10h, for some reason, we have the following definitions:
>   cache-references = INSTRUCTION_CACHE_FETCHES	0x530080
>   cache-misses 	   = INSTRUCTION_CACHE_MISSES	0x530081
> 
> on Intel at least I'm pretty sure these events match to Last Level Cache 
> accesses/misses, not icache.  Is there a reason for this?
> 
> Attached is a patch that removes these until better events can be found.
> (LLC is tricky on AMD as it's a shared resource).
> 
> Note, l1-dcache-stores is broken too, I'm looking into it.
> 
> Thanks,
> 
> Vince
> vweaver1@eecs.utk.edu
> 
> diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> index fe29c1d..a46b987 100644
> --- a/arch/x86/kernel/cpu/perf_event_amd.c
> +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> @@ -98,8 +98,6 @@ static const u64 amd_perfmon_event_map[] =
>  {
>    [PERF_COUNT_HW_CPU_CYCLES]			= 0x0076,
>    [PERF_COUNT_HW_INSTRUCTIONS]			= 0x00c0,
> -  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x0080,
> -  [PERF_COUNT_HW_CACHE_MISSES]			= 0x0081,

Would 0x40000F7E0 and 0x40000F7E1 be better?
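
[Editor's note] The two values suggested here do not fit in 16 bits because
they use the extended event select. A minimal sketch of the decoding,
assuming the AMD layout in which bits 35:32 of the configuration hold event
select bits 11:8; the function names are hypothetical:

```c
#include <stdint.h>

/* Combine the low event-select byte (bits 7:0) with the extended
 * event-select nibble (bits 35:32) into a 12-bit event code. */
static inline uint16_t amd_event_code(uint64_t config)
{
	uint64_t low  = config & 0xFF;
	uint64_t high = (config >> 32) & 0xF;

	return (uint16_t)((high << 8) | low);
}

/* The unit mask stays in bits 15:8. */
static inline uint8_t amd_unit_mask(uint64_t config)
{
	return (uint8_t)((config >> 8) & 0xFF);
}
```

With this layout, 0x40000F7E0 decodes to event 0x4E0 with unit mask 0xF7, and
0x40000F7E1 to event 0x4E1 with the same unit mask.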

>    [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]		= 0x00c2,
>    [PERF_COUNT_HW_BRANCH_MISSES]			= 0x00c3,
>    [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND]	= 0x00d0, /* "Decoder empty" event */


* Re: [patch] perf_events: even more wrong events for AMD fam10h
  2011-06-07 21:07 ` [patch] perf_events: even " Vince Weaver
@ 2011-06-27 11:22   ` Peter Zijlstra
  2011-06-27 15:51     ` Robert Richter
  2011-06-28 16:20     ` Vince Weaver
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2011-06-27 11:22 UTC
  To: Vince Weaver
  Cc: linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Robert Richter, Stephane Eranian,
	Andre Przywara

On Tue, 2011-06-07 at 17:07 -0400, Vince Weaver wrote:
> Here are two more problems I found with the superlative "generalized" 
> events on AMD fam10h.
> 
> The "l1-dcache-loads" event measures loads *and* stores.
>     This might be as close as you can get on AMD, but it's still wrong
>       as it's not what Intel measures.  
>     My patch removes it.  Better might be to add a proper
>     "l1-dcache-access" event.

The question to ask is, does it still have a strong correlation?

> The "l1-dcache-load-miss" event is an invalid event. (0x141).
>     From what I can tell that event (DATA_CACHE_MISSES) does not
>     take a mask.  It should be 0x41.  And it's actually measuring
>     all misses, not just load misses, see above.

See commit 83112e688f5f05dea1e63787db9a6c16b2887a1d. Also same as above.

> The "l1-dcache-stores" event does not work.  See the
>      ./validation/l1-dcache-stores test found in 
>      http://web.eecs.utk.edu/~vweaver1/projects/perf-events/validation.html
>    So remove it until we figure out why.
> 

Robert?

> Also, is the value for "no such event" 0 or -1?  The perf_event_amd.c
> file seems to use them interchangeably from what I can tell.

	val = hw_cache_event_ids[cache_type][cache_op][cache_result];

	if (val == 0)
		return -ENOENT;

	if (val == -1)
		return -EINVAL;


But yeah, somewhat inconsistent. Robert, Andre, could you guys go over
the AMD events some time?
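
[Editor's note] The fragment above distinguishes two sentinels: 0 leads to
-ENOENT and -1 leads to -EINVAL. A self-contained sketch of that lookup
pattern follows. The table contents, the enum names and demo_lookup() are
made up for illustration and are much smaller than the kernel's real
three-dimensional table:

```c
#include <errno.h>
#include <stdint.h>

enum { OP_READ, OP_WRITE, OP_MAX };
enum { RESULT_ACCESS, RESULT_MISS, RESULT_MAX };

/* Illustrative table only: 0 marks "no event here", (uint64_t)-1 marks
 * "this combination is invalid", anything else is a raw event code. */
static const uint64_t demo_cache_event_ids[OP_MAX][RESULT_MAX] = {
	[OP_READ]  = { [RESULT_ACCESS] = 0x0040, [RESULT_MISS] = 0x0041 },
	[OP_WRITE] = { [RESULT_ACCESS] = 0,      [RESULT_MISS] = (uint64_t)-1 },
};

/* Return 0 and store the raw code, or a negative errno for a sentinel. */
static inline int demo_lookup(unsigned int op, unsigned int result,
			      uint64_t *config)
{
	uint64_t val;

	if (op >= OP_MAX || result >= RESULT_MAX)
		return -EINVAL;

	val = demo_cache_event_ids[op][result];

	if (val == 0)
		return -ENOENT;

	if (val == (uint64_t)-1)
		return -EINVAL;

	*config = val;
	return 0;
}
```

A caller therefore sees different errors for the two sentinels, which is why
mixing 0 and -1 in the table is not harmless.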

* Re: [patch] perf_events: more wrong events for AMD fam10h
  2011-06-27 11:22 ` [patch] perf_events: " Peter Zijlstra
@ 2011-06-27 13:38   ` Robert Richter
  0 siblings, 0 replies; 8+ messages in thread
From: Robert Richter @ 2011-06-27 13:38 UTC
  To: Peter Zijlstra
  Cc: Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian

On 27.06.11 07:22:20, Peter Zijlstra wrote:
> On Tue, 2011-06-07 at 15:39 -0400, Vince Weaver wrote:
> > Hello
> > 
> > I'm in the process of auditing perf_event's awesome "generalized events".
> > 
> > On AMD fam10h, for some reason, we have the following definitions:
> >   cache-references = INSTRUCTION_CACHE_FETCHES	0x530080
> >   cache-misses 	   = INSTRUCTION_CACHE_MISSES	0x530081
> > 
> > on Intel at least I'm pretty sure these events match to Last Level Cache 
> > accesses/misses, not icache.  Is there a reason for this?
> > 
> > Attached is a patch that removes these until better events can be found.
> > (LLC is tricky on AMD as it's a shared resource).
> > 
> > Note, l1-dcache-stores is broken too, I'm looking into it.
> > 
> > Thanks,
> > 
> > Vince
> > vweaver1@eecs.utk.edu
> > 
> > diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
> > index fe29c1d..a46b987 100644
> > --- a/arch/x86/kernel/cpu/perf_event_amd.c
> > +++ b/arch/x86/kernel/cpu/perf_event_amd.c
> > @@ -98,8 +98,6 @@ static const u64 amd_perfmon_event_map[] =
> >  {
> >    [PERF_COUNT_HW_CPU_CYCLES]			= 0x0076,
> >    [PERF_COUNT_HW_INSTRUCTIONS]			= 0x00c0,

I am not sure whether Intel's LLC definition includes uncore events, which
are equivalent to AMD's northbridge counter (L3) events (it might mean
only the LLC of the core?).

The following definitions are taken from the Intel spec (SDM 3B, 30.2.3 and
Appendix A):

> > -  [PERF_COUNT_HW_CACHE_REFERENCES]		= 0x0080,

"This event counts requests originating from the core that reference a
cache line in the last level cache." (Intel event 2EH/4FH)

> > -  [PERF_COUNT_HW_CACHE_MISSES]			= 0x0081,

"This event counts each cache miss condition for references to the
last level cache." (Intel event 2EH/41H)

> 
> Would 0x40000F7E0 and 0x40000F7E1 be better?

Using these is a bit tricky: we would then measure all misses of the
node. We would actually have to select the current core, but for that
we need to schedule one northbridge counter for each core. On a six-core
part (family 10h rev D) we don't have enough counters for that. There is
also an erratum: we cannot do per-core L3 measurements.

On family 15h northbridge counters are not yet implemented.

On k7/k8 we must select L2 events (0x0080/0x0081).

So even if LLC by definition includes the L3, it is not easy to
implement. But using 0x0080/0x0081 on all models would give similar
results for every cpu family.

-Robert

> 
> >    [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]		= 0x00c2,
> >    [PERF_COUNT_HW_BRANCH_MISSES]			= 0x00c3,
> >    [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND]	= 0x00d0, /* "Decoder empty" event */
> 
> 

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


* Re: [patch] perf_events: even more wrong events for AMD fam10h
  2011-06-27 11:22   ` Peter Zijlstra
@ 2011-06-27 15:51     ` Robert Richter
  2011-06-28 16:32       ` Vince Weaver
  2011-06-28 16:20     ` Vince Weaver
  1 sibling, 1 reply; 8+ messages in thread
From: Robert Richter @ 2011-06-27 15:51 UTC
  To: Peter Zijlstra
  Cc: Vince Weaver, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Przywara, Andre

On 27.06.11 07:22:21, Peter Zijlstra wrote:
> On Tue, 2011-06-07 at 17:07 -0400, Vince Weaver wrote:
> > Here are two more problems I found with the superlative "generalized" 
> > events on AMD fam10h.
> > 
> > The "l1-dcache-loads" event measures loads *and* stores.
> >     This might be as close as you can get on AMD, but it's still wrong
> >       as it's not what Intel measures.  
> >     My patch removes it.  Better might be to add a proper
> >     "l1-dcache-access" event.
> 
> The question to ask is, does it still have a strong correlation?

Vince,

do you think it is worth introducing l1-dcache-access?

> 
> > The "l1-dcache-load-miss" event is an invalid event. (0x141).
> >     From what I can tell that event (DATA_CACHE_MISSES) does not
> >     take a mask.  It should be 0x41.  And it's actually measuring
> >     all misses, not just load misses, see above.
> 
> See commit 83112e688f5f05dea1e63787db9a6c16b2887a1d. Also same as above.

It is still event 0x41, but bit 0 of the unit mask is set now for
family 15h.

> 
> > The "l1-dcache-stores" event does not work.  See the
> >      ./validation/l1-dcache-stores test found in 
> >      http://web.eecs.utk.edu/~vweaver1/projects/perf-events/validation.html
> >    So remove it until we figure out why.
> > 
> 
> Robert?

Will look at this.

> 
> > Also, is the value for "no such event" 0 or -1?  The perf_event_amd.c
> > file seems to use them interchangeably from what I can tell.
> 
> 	val = hw_cache_event_ids[cache_type][cache_op][cache_result];
> 
> 	if (val == 0)
> 		return -ENOENT;
> 
> 	if (val == -1)
> 		return -EINVAL;
> 
> 
> But yeah, somewhat inconsistent. Robert, Andre, could you guys go over
> the AMD events some time?
> 

We will review all predefined events.

Thanks,

-Robert

-- 
Advanced Micro Devices, Inc.
Operating System Research Center


* Re: [patch] perf_events: even more wrong events for AMD fam10h
  2011-06-27 11:22   ` Peter Zijlstra
  2011-06-27 15:51     ` Robert Richter
@ 2011-06-28 16:20     ` Vince Weaver
  1 sibling, 0 replies; 8+ messages in thread
From: Vince Weaver @ 2011-06-28 16:20 UTC
  To: Peter Zijlstra
  Cc: linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Robert Richter, Stephane Eranian,
	Andre Przywara

On Mon, 27 Jun 2011, Peter Zijlstra wrote:

> On Tue, 2011-06-07 at 17:07 -0400, Vince Weaver wrote:
> > Here are two more problems I found with the superlative "generalized" 
> > events on AMD fam10h.
> > 
> > The "l1-dcache-loads" event measures loads *and* stores.
> >     This might be as close as you can get on AMD, but it's still wrong
> >       as it's not what Intel measures.  
> >     My patch removes it.  Better might be to add a proper
> >     "l1-dcache-access" event.
> 
> The question to ask is, does it still have a strong correlation?

well then shouldn't you call it something like
  "l1-dcache-highly-correlated"
instead?

Having events measure something other than their name is just going to 
confuse users.
 
> > The "l1-dcache-load-miss" event is an invalid event. (0x141).
> >     From what I can tell that event (DATA_CACHE_MISSES) does not
> >     take a mask.  It should be 0x41.  And it's actually measuring
> >     all misses, not just load misses, see above.
> 
> See commit 83112e688f5f05dea1e63787db9a6c16b2887a1d. Also same as above.

that probably warrants a comment in the code.

> > Also, is the value for "no such event" 0 or -1?  The perf_event_amd.c
> > file seems to use them interchangeably from what I can tell.
> 
> 	val = hw_cache_event_ids[cache_type][cache_op][cache_result];
> 
> 	if (val == 0)
> 		return -ENOENT;
> 
> 	if (val == -1)
> 		return -EINVAL;

what about architectures where "0" is a valid event?  I know on MIPS 
R12000 "0" means "cycles".
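
[Editor's note] The concern is that a table using 0 as "no such event" cannot
also hold a raw event code of 0. One hypothetical way to avoid the clash,
which nobody in this thread proposes, is to carry an explicit validity flag
next to the code. All names below are invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical entry type: validity is explicit, so config may be 0. */
struct demo_event {
	bool     valid;		/* true if the hardware has this event */
	uint64_t config;	/* raw event code; 0 is a legal value */
};

enum { DEMO_CYCLES, DEMO_CACHE_MISSES, DEMO_MAX };

/* Illustrative map: cycles is raw code 0, cache misses are unavailable. */
static const struct demo_event demo_event_map[DEMO_MAX] = {
	[DEMO_CYCLES]       = { .valid = true,  .config = 0 },
	[DEMO_CACHE_MISSES] = { .valid = false, .config = 0 },
};

/* Store the raw code and return true only for events marked valid. */
static inline bool demo_event_config(unsigned int id, uint64_t *config)
{
	if (id >= DEMO_MAX || !demo_event_map[id].valid)
		return false;

	*config = demo_event_map[id].config;
	return true;
}
```

Here both entries store a config of 0, yet only DEMO_CYCLES is reported as
available, which a bare 0 sentinel could not express.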

Vince
vweaver1@eecs.utk.edu

* Re: [patch] perf_events: even more wrong events for AMD fam10h
  2011-06-27 15:51     ` Robert Richter
@ 2011-06-28 16:32       ` Vince Weaver
  0 siblings, 0 replies; 8+ messages in thread
From: Vince Weaver @ 2011-06-28 16:32 UTC
  To: Robert Richter
  Cc: Peter Zijlstra, linux-kernel, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Stephane Eranian, Przywara, Andre

On Mon, 27 Jun 2011, Robert Richter wrote:

> 
> do you think it is worth introducing l1-dcache-access?

It could be useful to have.  On PAPI we have the equivalent
PAPI_L1_DCA and all AMD processors support it, as do all
Intel except atom.  POWER does not though, at least not
directly.

Vince
vweaver1@eecs.utk.edu
