* Haswell mem-store question
@ 2014-05-14 20:50 Don Zickus
2014-05-14 22:07 ` Stephane Eranian
0 siblings, 1 reply; 3+ messages in thread
From: Don Zickus @ 2014-05-14 20:50 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, mingo, peterz, acme, jolsa, jmario, eranian
Hi Andi,
Joe was playing with our c2c tool today and noticed we were losing store
events from perf's mem-stores event. Upon investigation we stumbled into
some differences in data that Haswell reports vs. Ivy/Sandy Bridge.
This leaves our tool needing two different code paths depending on the
architecture, which seems odd.
I was hoping you or someone else could explain to me the correct way to
interpret the mem-stores data.
My current problem is mem_lvl. It is defined as:
/* memory hierarchy (memory level, hit or miss) */
#define PERF_MEM_LVL_NA 0x01 /* not available */
#define PERF_MEM_LVL_HIT 0x02 /* hit level */
#define PERF_MEM_LVL_MISS 0x04 /* miss level */
#define PERF_MEM_LVL_L1 0x08 /* L1 */
#define PERF_MEM_LVL_LFB 0x10 /* Line Fill Buffer */
#define PERF_MEM_LVL_L2 0x20 /* L2 */
#define PERF_MEM_LVL_L3 0x40 /* L3 */
#define PERF_MEM_LVL_LOC_RAM 0x80 /* Local DRAM */
#define PERF_MEM_LVL_REM_RAM1 0x100 /* Remote DRAM (1 hop) */
#define PERF_MEM_LVL_REM_RAM2 0x200 /* Remote DRAM (2 hops) */
#define PERF_MEM_LVL_REM_CCE1 0x400 /* Remote Cache (1 hop) */
#define PERF_MEM_LVL_REM_CCE2 0x800 /* Remote Cache (2 hops) */
#define PERF_MEM_LVL_IO 0x1000 /* I/O memory */
#define PERF_MEM_LVL_UNC 0x2000 /* Uncached memory */
#define PERF_MEM_LVL_SHIFT 5
Currently IVB and SNB use LVL_L1 together with either LVL_HIT or LVL_MISS,
as seen here in arch/x86/kernel/cpu/perf_event_intel_ds.c:
static u64 precise_store_data(u64 status)
{
	union intel_x86_pebs_dse dse;
	u64 val = P(OP, STORE) | P(SNOOP, NA) | P(LVL, L1) | P(TLB, L2);
	/* <-- P(LVL, L1) defined here */

	dse.val = status;

	<snip>
	/*
	 * bit 0: hit L1 data cache
	 * if not set, then all we know is that
	 * it missed L1D
	 */
	if (dse.st_l1d_hit)
		val |= P(LVL, HIT);
	else
		val |= P(LVL, MISS);
	/* <-- HIT or MISS updated here */

	<snip>
}
However Haswell does something different:
static u64 precise_store_data_hsw(u64 status)
{
	union perf_mem_data_src dse;

	dse.val = 0;
	dse.mem_op = PERF_MEM_OP_STORE;
	dse.mem_lvl = PERF_MEM_LVL_NA;		/* <-- defines NA here */

	if (status & 1)
		dse.mem_lvl = PERF_MEM_LVL_L1;	/* <-- switches to LVL_L1 here */
	<snip>
}
So our c2c tool keeps store statistics to help determine which types of
stores are causing conflicts:
<snip>
} else if (op & P(OP,STORE)) {
	/* store */
	stats->t.store++;

	if (!daddr) {
		stats->t.st_noadrs++;
		return -1;
	}

	if (lvl & P(LVL,HIT)) {
		if (lvl & P(LVL,UNC)) stats->t.st_uncache++;
		if (lvl & P(LVL,L1)) stats->t.st_l1hit++;
	} else if (lvl & P(LVL,MISS)) {
		if (lvl & P(LVL,L1)) stats->t.st_l1miss++;
	}
}
<snip>
This no longer works on Haswell because Haswell doesn't set LVL_HIT or
LVL_MISS any more. Instead it uses LVL_NA or LVL_L1.
So from a generic tool perspective, what is the recommended way to
properly capture these stats to cover both arches? The hack I have now
is:
} else if (op & P(OP,STORE)) {
	/* store */
	stats->t.store++;

	if (!daddr) {
		stats->t.st_noadrs++;
		return -1;
	}

	if ((lvl & P(LVL,HIT)) || (lvl & P(LVL,L1))) {
		if (lvl & P(LVL,UNC)) stats->t.st_uncache++;
		if (lvl & P(LVL,L1)) stats->t.st_l1hit++;
	} else if ((lvl & P(LVL,MISS)) || (lvl & P(LVL,NA))) {
		if (lvl & P(LVL,L1)) stats->t.st_l1miss++;
		if (lvl & P(LVL,NA)) stats->t.st_l1miss++;
	}
}
I am not sure that is really future-proof. Thoughts? Help?
Cheers,
Don
* Re: Haswell mem-store question
From: Stephane Eranian @ 2014-05-14 22:07 UTC (permalink / raw)
To: Don Zickus
Cc: Andi Kleen, LKML, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Jiri Olsa, Joe Mario
On Wed, May 14, 2014 at 10:50 PM, Don Zickus <dzickus@redhat.com> wrote:
[...]
> However Haswell does something different:
>
> static u64 precise_store_data_hsw(u64 status)
> {
> union perf_mem_data_src dse;
>
> dse.val = 0;
> dse.mem_op = PERF_MEM_OP_STORE;
> dse.mem_lvl = PERF_MEM_LVL_NA;
> ^^^^^^
> defines NA here
>
>
> if (status & 1)
> dse.mem_lvl = PERF_MEM_LVL_L1;
>
> ^^^^^^^
> switch to LVL_L1 here
I think this code has a problem here.
It needs to mark the hit or miss status.
I think it should do:

	if (status & 1)
		dse.mem_lvl = PERF_MEM_LVL_L1|PERF_MEM_LVL_HIT;
	else
		dse.mem_lvl = PERF_MEM_LVL_L1|PERF_MEM_LVL_MISS;

Otherwise you have L1 as the level with no hit/miss info.
> <snip>
> }
* Re: Haswell mem-store question
From: Andi Kleen @ 2014-05-15 2:34 UTC (permalink / raw)
To: Stephane Eranian
Cc: Don Zickus, Andi Kleen, LKML, Ingo Molnar, Peter Zijlstra,
Arnaldo Carvalho de Melo, Jiri Olsa, Joe Mario
> I think it should do:
>
> if (status & 1)
> dse.mem_lvl = PERF_MEM_LVL_L1|PERF_MEM_LVL_HIT;
> else
> dse.mem_lvl = PERF_MEM_LVL_L1|PERF_MEM_LVL_MISS;
>
> Otherwise you have L1 as the level with no hit/miss info.
Agreed.
BTW, the line before is also not always correct: any event that is not
explicitly a store can only fill in NA.
-Andi