* event group without multiplexing
From: Yuanfang Chen @ 2015-11-11 22:05 UTC
To: linux-perf-users
Hello,
I am using a Haswell box (E3-1231 v3) with HT enabled.
perf stat -e \{cycles,r148,r1000248,r8d1,r40d1\} -- sleep 1
With the Ubuntu version of perf:
Performance counter stats for 'sleep 1':
756066 cycles
5740 r1000248
5516 r8d1
9064 r40d1
0 r148
1.001770249 seconds time elapsed
With a relatively recent tip of perf/core:
Performance counter stats for 'sleep 1':
729403 cycles
7250 r1000248
5628 r8d1
9273 r40d1
<not supported> r148
1.001674174 seconds time elapsed
From https://download.01.org/perfmon/HSW/:
Event                          Raw code  Counters, SMT on  Counters, SMT off
cpu_clk_unhalted.thread        cycles    Fixed counter 2   Fixed counter 2
ld_blocks.no_sr                r803      0,1,2,3           0,1,2,3,4,5,6,7
mem_load_uops_retired.l1_miss  r8d1      0,1,2,3           0,1,2,3
mem_load_uops_retired.hit_lfb  r40d1     0,1,2,3           0,1,2,3
l1d_pend_miss.pending          r148      2                 2
It seems these five events cannot be counted at the same time,
although according to the counter assignments above the hardware should
allow it. Is this a bug or a limitation I should be aware of? Thank you so much.
Yuanfang
* Re: event group without multiplexing
From: Yuanfang Chen @ 2015-11-12 14:58 UTC
To: linux-perf-users
Sorry, the table entry for r1000248 on Haswell should be:
L1D_PEND_MISS.FB_FULL  r1000248  0,1,2,3 (SMT on)  0,1,2,3,4,5,6,7 (SMT off)
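For readers decoding the raw event codes in this thread: on Intel core PMUs, perf's rNNN syntax passes the hex value as the event-select config, with the event code in bits 0-7, the unit mask in bits 8-15, and the counter mask (cmask) in bits 24-31. Under that layout, r148 is event 0x48 with umask 0x01, and r1000248 is event 0x48 with umask 0x02 and cmask 1. A minimal sketch of the decoding follows; `struct raw_event` and `decode_raw` are names made up for this illustration, not perf or kernel identifiers.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not a perf or kernel structure. */
struct raw_event {
	uint8_t event;	/* config bits 0-7: event select code */
	uint8_t umask;	/* config bits 8-15: unit mask */
	uint8_t cmask;	/* config bits 24-31: counter mask */
};

/* Split a raw config value (the hex digits after 'r') into its fields. */
static struct raw_event decode_raw(uint64_t config)
{
	struct raw_event e;

	e.event = (uint8_t)(config & 0xff);
	e.umask = (uint8_t)((config >> 8) & 0xff);
	e.cmask = (uint8_t)((config >> 24) & 0xff);
	return e;
}
```

Both r148 and r1000248 decode to event code 0x48; only the umask and cmask differ, which matters for how the kernel matches counter constraints.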
* Re: event group without multiplexing
From: Yuanfang Chen @ 2015-11-12 15:02 UTC
To: linux-perf-users
And the case can be reduced to
perf stat -e \{r1000248,r148\} -- sleep 1
Performance counter stats for 'sleep 1':
7008 r1000248
<not supported> r148
1.001885804 seconds time elapsed
* Re: event group without multiplexing
From: Vince Weaver @ 2015-11-12 16:51 UTC
To: Yuanfang Chen; +Cc: linux-perf-users
On Thu, 12 Nov 2015, Yuanfang Chen wrote:
> And the case can be reduced to
>
> perf stat -e \{r1000248,r148\} -- sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 7008 r1000248
> <not supported> r148
The kernel thinks that all events with event code 0x48 (L1D_PEND_MISS) can only go
into a single counter, and thus you can't have more than one of them
scheduled at the same time.
If this is a bug you'll need to report it to the perf_event developers.
I'd double-check Intel's documents to see what they actually say about this
class of events on your processor.
Vince
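To illustrate why a single-counter restriction on both events makes the group unschedulable, here is a simplified sketch. It is not the kernel's scheduling algorithm; `group_fits` and its helper are hypothetical names, and each event is reduced to a bitmask of the counters it may use (bit n set means counter n is allowed). The check only asks whether every event in the group can be given a distinct counter.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative backtracking search (an assumption for this sketch, not
 * kernel code): try to place event i and all later events on counters
 * that are not yet marked in 'used'.
 */
static bool assign_from(const uint32_t *masks, int n, int i, uint32_t used)
{
	if (i == n)
		return true;	/* every event received its own counter */

	uint32_t avail = masks[i] & ~used;

	for (int c = 0; c < 32; c++) {
		uint32_t bit = 1u << c;

		if ((avail & bit) && assign_from(masks, n, i + 1, used | bit))
			return true;
	}
	return false;		/* no free counter is allowed for event i */
}

/* True if all n events can count simultaneously, one counter each. */
static bool group_fits(const uint32_t *masks, int n)
{
	return assign_from(masks, n, 0, 0);
}
```

With both events restricted to mask 0x4 (counter 2 only), `group_fits` finds no assignment. If one of them may use counters 0-3 (mask 0xf), the pair fits.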
* Re: event group without multiplexing
From: Michael Petlan @ 2015-11-12 17:40 UTC
To: Yuanfang Chen; +Cc: linux-perf-users
On Thu, 2015-11-12 at 10:02 -0500, Yuanfang Chen wrote:
> And the case can be reduced to
>
> perf stat -e \{r1000248,r148\} -- sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 7008 r1000248
> <not supported> r148
>
> 1.001885804 seconds time elapsed
>
perf stat -e \{r1000248,r148\} -- sleep 1
Performance counter stats for 'sleep 1':
9,605 r1000248
553,201 r148
Intel Ivy Bridge EP machine (family = 6, model = 62),
kernel/perf version 4.3.0
The Intel 64 and IA-32 Architectures Software Developer's
Manual, vol. 3B, part 2, agrees with the download.01.org
JSON file that r0148 is limited to counter 2 only,
on both IVB and HSW.
But I can't find any reference to the r1000248 event
in the Intel manual. Vince, am I missing something?
Anyway, shouldn't it behave the same on IVB and HSW?
Michael
* Re: event group without multiplexing
From: Vince Weaver @ 2015-11-12 17:51 UTC
To: Michael Petlan; +Cc: Yuanfang Chen, linux-perf-users
On Thu, 12 Nov 2015, Michael Petlan wrote:
> But I can't find any reference about the r1000248 event
> in the Intel Manual. Vince, am I missing something?
>
> Anyway, shouldn't it behave the same on IVB and HSW?
I am going off the code in
arch/x86/kernel/cpu/perf_event_intel.c
(current git tree)
SNB and HSW have as a constraint
INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */
IVB has as a constraint
INTEL_UEVENT_CONSTRAINT(0x0148, 0x4), /* L1D_PEND_MISS.PENDING */
Notice the difference. Not sure if this is a bug in the kernel or what,
but that's what's there and I think it's what's causing the issue.
Vince
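The difference between the two macros can be sketched as follows. This is a simplified model under the assumption that a constraint applies when the event's config, masked by the constraint's match mask, equals the constraint's code, with INTEL_EVENT_CONSTRAINT comparing only the 8-bit event code and INTEL_UEVENT_CONSTRAINT comparing event code plus umask. The `MODEL_*` macros, `struct constraint_model`, and `constraint_applies` are illustrative names, not the kernel's definitions. The second macro argument, 0x4, is a counter bitmask with only bit 2 set, i.e. counter 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a kernel event constraint (illustration only). */
struct constraint_model {
	uint64_t code;		/* value to compare against */
	uint64_t match_mask;	/* config bits that take part in the comparison */
	uint32_t counters;	/* bitmask of counters the event may use */
};

/* Match on the event code only (config bits 0-7). */
#define MODEL_EVENT_CONSTRAINT(c, n)	{ (c), 0x00ffULL, (n) }
/* Match on event code and umask (config bits 0-15). */
#define MODEL_UEVENT_CONSTRAINT(c, n)	{ (c), 0xffffULL, (n) }

/* True if the constraint restricts an event with the given raw config. */
static bool constraint_applies(const struct constraint_model *c, uint64_t config)
{
	return (config & c->match_mask) == c->code;
}
```

Under this model, a constraint built as `MODEL_EVENT_CONSTRAINT(0x48, 0x4)` applies to both r148 and r1000248, while `MODEL_UEVENT_CONSTRAINT(0x0148, 0x4)` applies to r148 only and leaves r1000248 free to use any general-purpose counter.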
* Re: event group without multiplexing
From: Vince Weaver @ 2015-11-12 18:00 UTC
To: linux-perf-users; +Cc: Michael Petlan, Yuanfang Chen, Andi Kleen
CCing Andi Kleen, as he's the one who introduced the Haswell constraint
code in 3a632cb229bfb18b6d09822cc842451ea46c013e, so maybe he knows why
it is constraining all L1D_PEND_MISS.* events rather than just
L1D_PEND_MISS.PENDING, as on IVB and BDW.
* Re: event group without multiplexing
From: Yuanfang Chen @ 2015-11-12 18:12 UTC
To: Vince Weaver; +Cc: linux-perf-users, Michael Petlan, Andi Kleen
It seems that 0x248 shows up in https://download.01.org/perfmon/HSW/ but
not in the manual. If 0x248 does not exist, then the code should be
correct.
* Re: event group without multiplexing
From: Andi Kleen @ 2015-11-17 0:39 UTC
To: Vince Weaver; +Cc: linux-perf-users, Michael Petlan, Yuanfang Chen
On Thu, Nov 12, 2015 at 01:00:09PM -0500, Vince Weaver wrote:
>
> ccing Andi Kleen as he's the one who introduced the Haswell constraint
> code in 3a632cb229bfb18b6d09822cc842451ea46c013e so maybe he knows why
> it is constraining all L1D_PEND_MISS.* events rather than just
> L1D_PEND_MISS.PENDING on IVB and BDW.
Yes, it looks like Haswell could use the same more limited constraint as
Broadwell or Ivy Bridge. I don't remember why it ended up this way.
Please submit a patch.
-Andi
* Re: event group without multiplexing
From: Yuanfang Chen @ 2015-11-17 3:03 UTC
To: Andi Kleen; +Cc: Vince Weaver, linux-perf-users, Michael Petlan
Is this ok?
From 47d52ccfae56a8eb702fee6ccf327780265df2cf Mon Sep 17 00:00:00 2001
From: Yuanfang Chen <cheny@udel.edu>
Date: Mon, 16 Nov 2015 21:53:53 -0500
Subject: [PATCH 1/1] perf/x86/intel: make L1D_PEND_MISS.FB_FULL not
constrained on haswell
Signed-off-by: Yuanfang Chen <cheny@udel.edu>
---
arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index f63360b..e2a4300 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -232,7 +232,7 @@ static struct event_constraint intel_hsw_event_constraints[] = {
 	FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
 	FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
 	FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
-	INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */
+	INTEL_UEVENT_CONSTRAINT(0x148, 0x4), /* L1D_PEND_MISS.PENDING */
 	INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
 	INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
 	/* CYCLE_ACTIVITY.CYCLES_L1D_PENDING */
--
1.9.1
* Re: event group without multiplexing
From: Andi Kleen @ 2015-11-21 0:58 UTC
To: Yuanfang Chen; +Cc: Vince Weaver, linux-perf-users, Michael Petlan
On Mon, Nov 16, 2015 at 10:03:28PM -0500, Yuanfang Chen wrote:
> Is this ok?
Sorry for the delay. The patch looks good to me. You may need
to resend it to peterz@infradead.org and linux-kernel@vger.kernel.org.
Reviewed-by: Andi Kleen <ak@linux.intel.com>