* event group without multiplexing
@ 2015-11-11 22:05 Yuanfang Chen
  2015-11-12 14:58 ` Yuanfang Chen
  0 siblings, 1 reply; 11+ messages in thread
From: Yuanfang Chen @ 2015-11-11 22:05 UTC (permalink / raw)
  To: linux-perf-users

Hello

I am using a Haswell box (E3-1231 v3) with HT enabled.

perf stat -e \{cycles,r148,r1000248,r8d1,r40d1\} -- sleep 1

With the Ubuntu version of perf:

Performance counter stats for 'sleep 1':

            756066      cycles
              5740      r1000248
              5516      r8d1
              9064      r40d1
                 0        r148

       1.001770249 seconds time elapsed

With a relatively recent tip of perf/core:

 Performance counter stats for 'sleep 1':

            729403      cycles
              7250      r1000248
              5628      r8d1
              9273      r40d1
   <not supported>      r148

       1.001674174 seconds time elapsed

from https://download.01.org/perfmon/HSW/
event                          raw      SMT on           SMT off
cpu_clk_unhalted.thread        cycles   Fixed counter 2  Fixed counter 2
ld_blocks.no_sr                r803     0,1,2,3          0,1,2,3,4,5,6,7
mem_load_uops_retired.l1_miss  r8d1     0,1,2,3          0,1,2,3
mem_load_uops_retired.hit_lfb  r40d1    0,1,2,3          0,1,2,3
l1d_pend_miss.pending          r148     2                2

It seems these five events cannot be counted at the same time,
although the hardware should allow it.  Is this a bug or
a limitation I should be aware of? Thank you so much.

Yuanfang


* Re: event group without multiplexing
  2015-11-11 22:05 event group without multiplexing Yuanfang Chen
@ 2015-11-12 14:58 ` Yuanfang Chen
  2015-11-12 15:02   ` Yuanfang Chen
  0 siblings, 1 reply; 11+ messages in thread
From: Yuanfang Chen @ 2015-11-12 14:58 UTC (permalink / raw)
  To: linux-perf-users

Sorry, the table entry for r1000248 on Haswell should be

L1D_PEND_MISS.FB_FULL     0,1,2,3       0,1,2,3,4,5,6,7
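
For readers following the raw codes in this thread, here is a minimal sketch of how an rNNN value splits into fields, assuming the usual Intel core PMU event-select layout (event select in bits 0-7, unit mask in bits 8-15, counter mask in bits 24-31). The names `raw_event` and `decode_raw` are hypothetical and are not taken from perf or the kernel.

```c
#include <stdint.h>

/* Hypothetical helper type: the three fields of a raw event that
 * matter in this thread. Other bits (edge, inv, any, ...) are ignored. */
struct raw_event {
    unsigned event;  /* bits 0-7:   event select */
    unsigned umask;  /* bits 8-15:  unit mask */
    unsigned cmask;  /* bits 24-31: counter mask (threshold) */
};

/* Split a raw config value such as 0x1000248 (perf's r1000248). */
struct raw_event decode_raw(uint64_t config)
{
    struct raw_event e;
    e.event = (unsigned)(config & 0xff);
    e.umask = (unsigned)((config >> 8) & 0xff);
    e.cmask = (unsigned)((config >> 24) & 0xff);
    return e;
}
```

Under that assumption, r1000248 decodes to event 0x48, umask 0x02, cmask 1, while r148 decodes to event 0x48, umask 0x01, cmask 0. Both share event select 0x48 (L1D_PEND_MISS) and differ in the unit mask.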



* Re: event group without multiplexing
  2015-11-12 14:58 ` Yuanfang Chen
@ 2015-11-12 15:02   ` Yuanfang Chen
  2015-11-12 16:51     ` Vince Weaver
  2015-11-12 17:40     ` Michael Petlan
  0 siblings, 2 replies; 11+ messages in thread
From: Yuanfang Chen @ 2015-11-12 15:02 UTC (permalink / raw)
  To: linux-perf-users

And the case can be reduced to

perf stat -e \{r1000248,r148\} -- sleep 1

 Performance counter stats for 'sleep 1':

              7008      r1000248
   <not supported>      r148

       1.001885804 seconds time elapsed



* Re: event group without multiplexing
  2015-11-12 15:02   ` Yuanfang Chen
@ 2015-11-12 16:51     ` Vince Weaver
  2015-11-12 17:40     ` Michael Petlan
  1 sibling, 0 replies; 11+ messages in thread
From: Vince Weaver @ 2015-11-12 16:51 UTC (permalink / raw)
  To: Yuanfang Chen; +Cc: linux-perf-users

On Thu, 12 Nov 2015, Yuanfang Chen wrote:

> And the case can be reduced to
> 
> perf stat -e \{r1000248,r148\} -- sleep 1
> 
>  Performance counter stats for 'sleep 1':
> 
>               7008      r1000248
>    <not supported>      r148

the kernel thinks that all events of type 0x48 (L1D_PEND_MISS) can only go 
into one of the counters, and thus you can't have multiple at the same 
time.
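
To illustrate why two events pinned to the same single counter cannot be in one group, here is a brute-force sketch. It is not the kernel's scheduler; `can_schedule` is a hypothetical name, and each event is described only by a bitmask of the general-purpose counters it may use (0x4 meaning counter 2 only, 0xf meaning counters 0-3).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: return true if every event in allowed[0..n-1]
 * can get its own counter. allowed[i] is the bitmask of counters that
 * event i may use; used is the bitmask of counters already taken. */
bool can_schedule(const uint64_t *allowed, size_t n, uint64_t used)
{
    if (n == 0)
        return true;

    uint64_t candidates = allowed[0] & ~used;
    for (unsigned ctr = 0; ctr < 64; ctr++) {
        uint64_t bit = UINT64_C(1) << ctr;
        /* Try this counter for the first event, then place the rest. */
        if ((candidates & bit) &&
            can_schedule(allowed + 1, n - 1, used | bit))
            return true;
    }
    return false;
}
```

With masks {0x4, 0x4} (both events restricted to counter 2) no assignment exists, which matches the `<not supported>` result for the group. With {0xf, 0x4} the group fits.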

If this is a bug you'll need to report it to the perf_event developers.
I'd double check Intel's documents to see what they actually say about this 
class of events on your processor.

Vince


* Re: event group without multiplexing
  2015-11-12 15:02   ` Yuanfang Chen
  2015-11-12 16:51     ` Vince Weaver
@ 2015-11-12 17:40     ` Michael Petlan
  2015-11-12 17:51       ` Vince Weaver
  1 sibling, 1 reply; 11+ messages in thread
From: Michael Petlan @ 2015-11-12 17:40 UTC (permalink / raw)
  To: Yuanfang Chen; +Cc: linux-perf-users

On Thu, 2015-11-12 at 10:02 -0500, Yuanfang Chen wrote:
> And the case can be reduced to
> 
> perf stat -e \{r1000248,r148\} -- sleep 1
> 
>  Performance counter stats for 'sleep 1':
> 
>               7008      r1000248
>    <not supported>      r148
> 
>        1.001885804 seconds time elapsed
> 

perf stat -e \{r1000248,r148\} -- sleep 1

 Performance counter stats for 'sleep 1':

             9,605      r1000248
           553,201      r148


Intel Ivy Bridge EP machine (family = 6, model = 62),
kernel/perf version 4.3.0

The Intel 64 and IA-32 Architectures Software Developer's
Manual, Volume 3B, Part 2, agrees with the download.01.org
JSON file that r0148 is limited to counter 2 on both IVB
and HSW.

But I can't find any reference to the r1000248 event
in the Intel manual. Vince, am I missing something?

Anyway, shouldn't it behave the same on IVB and HSW?

Michael


* Re: event group without multiplexing
  2015-11-12 17:40     ` Michael Petlan
@ 2015-11-12 17:51       ` Vince Weaver
  2015-11-12 18:00         ` Vince Weaver
  0 siblings, 1 reply; 11+ messages in thread
From: Vince Weaver @ 2015-11-12 17:51 UTC (permalink / raw)
  To: Michael Petlan; +Cc: Yuanfang Chen, linux-perf-users

On Thu, 12 Nov 2015, Michael Petlan wrote:

> But I can't find any reference about the r1000248 event
> in the Intel Manual. Vince, am I missing something?
> 
> Anyway, shouldn't it behave the same on IVB and HSW?

I am going off the code in
	arch/x86/kernel/cpu/perf_event_intel.c
(current git tree)

SNB and HSW have as a constraint
	INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */

IVB has as a constraint
	INTEL_UEVENT_CONSTRAINT(0x0148, 0x4), /* L1D_PEND_MISS.PENDING */
	
Notice the difference.  Not sure if this is a bug in the kernel or what, 
but that's what's there and I think it's what's causing the issue.
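
A simplified sketch of the difference, assuming an event-only constraint compares just the low 8 bits of the config and a "uevent" constraint compares the low 16 bits (event select plus unit mask). The struct, mask names, and functions below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVENT_MASK  0x00ffu  /* assumed: event select only */
#define UEVENT_MASK 0xffffu  /* assumed: event select + unit mask */

/* Hypothetical, reduced model of an event constraint. */
struct constraint {
    uint64_t code;       /* value the masked config must equal */
    uint64_t code_mask;  /* config bits that take part in the match */
    uint64_t counters;   /* bitmask of counters allowed on a match */
};

/* Model of the SNB/HSW entry: any event with event select 0x48. */
const struct constraint any_0x48 = { 0x48, EVENT_MASK, 0x4 };
/* Model of the IVB entry: only event 0x48 with umask 0x01. */
const struct constraint only_0x0148 = { 0x0148, UEVENT_MASK, 0x4 };

bool constraint_matches(const struct constraint *c, uint64_t config)
{
    return (config & c->code_mask) == c->code;
}

/* Counters usable by config: the constraint's set if it matches,
 * otherwise the full set all_counters. */
uint64_t allowed_counters(const struct constraint *c, uint64_t config,
                          uint64_t all_counters)
{
    return constraint_matches(c, config) ? c->counters : all_counters;
}
```

In this model, r1000248 matches the 0x48 entry and is limited to counter mask 0x4, the same single counter as r148. Against the 0x0148 entry it does not match and keeps the full mask (0xf with four counters), while r148 is still limited to 0x4.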

Vince


* Re: event group without multiplexing
  2015-11-12 17:51       ` Vince Weaver
@ 2015-11-12 18:00         ` Vince Weaver
  2015-11-12 18:12           ` Yuanfang Chen
  2015-11-17  0:39           ` Andi Kleen
  0 siblings, 2 replies; 11+ messages in thread
From: Vince Weaver @ 2015-11-12 18:00 UTC (permalink / raw)
  To: linux-perf-users; +Cc: Michael Petlan, Yuanfang Chen, Andi Kleen


ccing Andi Kleen as he's the one who introduced the Haswell constraint 
code in 3a632cb229bfb18b6d09822cc842451ea46c013e so maybe he knows why
it is constraining all L1D_PEND_MISS.* events rather than just 
L1D_PEND_MISS.PENDING, as on IVB and BDW.



* Re: event group without multiplexing
  2015-11-12 18:00         ` Vince Weaver
@ 2015-11-12 18:12           ` Yuanfang Chen
  2015-11-17  0:39           ` Andi Kleen
  1 sibling, 0 replies; 11+ messages in thread
From: Yuanfang Chen @ 2015-11-12 18:12 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-perf-users, Michael Petlan, Andi Kleen

It seems that 0x248 shows up in https://download.01.org/perfmon/HSW/ but
not in the manual. If 0x248 does not exist, then the code should be
correct.



* Re: event group without multiplexing
  2015-11-12 18:00         ` Vince Weaver
  2015-11-12 18:12           ` Yuanfang Chen
@ 2015-11-17  0:39           ` Andi Kleen
  2015-11-17  3:03             ` Yuanfang Chen
  1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2015-11-17  0:39 UTC (permalink / raw)
  To: Vince Weaver; +Cc: linux-perf-users, Michael Petlan, Yuanfang Chen

On Thu, Nov 12, 2015 at 01:00:09PM -0500, Vince Weaver wrote:
> 
> ccing Andi Kleen as he's the one who introduced the Haswell constraint 
> code in 3a632cb229bfb18b6d09822cc842451ea46c013e so maybe he knows why
> it is constraining all L1D_PEND_MISS.* events rather than just 
> L1D_PEND_MISS.PENDING, as on IVB and BDW.

Yes, it looks like Haswell could use the same more limited constraint as
Broadwell or Ivy Bridge. I don't remember why it ended up this way.
Please submit a patch.

-Andi


* Re: event group without multiplexing
  2015-11-17  0:39           ` Andi Kleen
@ 2015-11-17  3:03             ` Yuanfang Chen
  2015-11-21  0:58               ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Yuanfang Chen @ 2015-11-17  3:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Vince Weaver, linux-perf-users, Michael Petlan

Is this ok?


From 47d52ccfae56a8eb702fee6ccf327780265df2cf Mon Sep 17 00:00:00 2001
From: Yuanfang Chen <cheny@udel.edu>
Date: Mon, 16 Nov 2015 21:53:53 -0500
Subject: [PATCH 1/1] perf/x86/intel: make L1D_PEND_MISS.FB_FULL not
 constrained on haswell

Signed-off-by: Yuanfang Chen <cheny@udel.edu>
---
 arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index f63360b..e2a4300 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -232,7 +232,7 @@ static struct event_constraint intel_hsw_event_constraints[] = {
  FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
  FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
  FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
- INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */
+ INTEL_UEVENT_CONSTRAINT(0x148, 0x4),  /* L1D_PEND_MISS.PENDING */
  INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
  INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
  /* CYCLE_ACTIVITY.CYCLES_L1D_PENDING */
--
1.9.1



* Re: event group without multiplexing
  2015-11-17  3:03             ` Yuanfang Chen
@ 2015-11-21  0:58               ` Andi Kleen
  0 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2015-11-21  0:58 UTC (permalink / raw)
  To: Yuanfang Chen; +Cc: Vince Weaver, linux-perf-users, Michael Petlan

On Mon, Nov 16, 2015 at 10:03:28PM -0500, Yuanfang Chen wrote:
> Is this ok?

Sorry for the delay. The patch looks good to me. You may need
to resend to peterz@infradead.org linux-kernel@vger.kernel.org

Reviewed-by: Andi Kleen <ak@linux.intel.com>

