* event group without multiplexing
From: Yuanfang Chen @ 2015-11-11 22:05 UTC
To: linux-perf-users

Hello

I am using a Haswell box (E3-1231 v3), HT enabled.

perf stat -e \{cycles,r148,r1000248,r8d1,r40d1\} -- sleep 1

With the Ubuntu version of perf:

 Performance counter stats for 'sleep 1':

            756066      cycles
              5740      r1000248
              5516      r8d1
              9064      r40d1
                 0      r148

       1.001770249 seconds time elapsed

With a relatively new tip of perf/core:

 Performance counter stats for 'sleep 1':

            729403      cycles
              7250      r1000248
              5628      r8d1
              9273      r40d1
   <not supported>      r148

       1.001674174 seconds time elapsed

From https://download.01.org/perfmon/HSW/

event                           raw      counters (SMT on)   counters (SMT off)
cpu_clk_unhalted.thread         cycles   Fixed counter 2     Fixed counter 2
ld_blocks.no_sr                 r803     0,1,2,3             0,1,2,3,4,5,6,7
mem_load_uops_retired.l1_miss   r8d1     0,1,2,3             0,1,2,3
mem_load_uops_retired.hit_lfb   r40d1    0,1,2,3             0,1,2,3
l1d_pend_miss.pending           r148     2                   2

It seems these five events cannot be counted at the same time, although
in terms of hardware they should get along. Is this a bug or a
limitation I should be aware of? Thank you so much.

Yuanfang
* Re: event group without multiplexing
From: Yuanfang Chen @ 2015-11-12 14:58 UTC
To: linux-perf-users

Sorry, r1000248 on haswell should be

L1D_PEND_MISS.FB_FULL           r1000248 0,1,2,3             0,1,2,3,4,5,6,7

On Wed, Nov 11, 2015 at 5:05 PM, Yuanfang Chen <cyfmxc@gmail.com> wrote:
> Hello
>
> I am using a haswell box. (E3-1231 v3). HT enabled.
>
> perf stat -e \{cycles,r148,r1000248,r8d1,r40d1\} -- sleep 1
>
> with ubuntu version of perf
>
> Performance counter stats for 'sleep 1':
>
>            756066      cycles
>              5740      r1000248
>              5516      r8d1
>              9064      r40d1
>                 0      r148
>
>       1.001770249 seconds time elapsed
>
> with relatively new tip of perf/core:
>
> Performance counter stats for 'sleep 1':
>
>            729403      cycles
>              7250      r1000248
>              5628      r8d1
>              9273      r40d1
>   <not supported>      r148
>
>       1.001674174 seconds time elapsed
>
> from https://download.01.org/perfmon/HSW/
>
> event                           raw      SMT on              SMT off
> cpu_clk_unhalted.thread         cycles   Fixed counter 2     Fixed counter 2
> ld_blocks.no_sr                 r803     0,1,2,3             0,1,2,3,4,5,6,7
> mem_load_uops_retired.l1_miss   r8d1     0,1,2,3             0,1,2,3
> mem_load_uops_retired.hit_lfb   r40d1    0,1,2,3             0,1,2,3
> l1d_pend_miss.pending           r148     2                   2
>
> Seems these five events couldn't be counting at the same time,
> although in terms of hardware they should get along. Is this a bug or
> a limitation I should be aware of? Thank you so much.
>
> Yuanfang
* Re: event group without multiplexing
From: Yuanfang Chen @ 2015-11-12 15:02 UTC
To: linux-perf-users

And the case can be reduced to

perf stat -e \{r1000248,r148\} -- sleep 1

 Performance counter stats for 'sleep 1':

              7008      r1000248
   <not supported>      r148

       1.001885804 seconds time elapsed

On Thu, Nov 12, 2015 at 9:58 AM, Yuanfang Chen <cyfmxc@gmail.com> wrote:
> Sorry, r1000248 on haswell should be
>
> L1D_PEND_MISS.FB_FULL           r1000248 0,1,2,3             0,1,2,3,4,5,6,7
>
> On Wed, Nov 11, 2015 at 5:05 PM, Yuanfang Chen <cyfmxc@gmail.com> wrote:
>> Hello
>>
>> I am using a haswell box. (E3-1231 v3). HT enabled.
>>
>> perf stat -e \{cycles,r148,r1000248,r8d1,r40d1\} -- sleep 1
>>
>> with ubuntu version of perf
>>
>> Performance counter stats for 'sleep 1':
>>
>>            756066      cycles
>>              5740      r1000248
>>              5516      r8d1
>>              9064      r40d1
>>                 0      r148
>>
>>       1.001770249 seconds time elapsed
>>
>> with relatively new tip of perf/core:
>>
>> Performance counter stats for 'sleep 1':
>>
>>            729403      cycles
>>              7250      r1000248
>>              5628      r8d1
>>              9273      r40d1
>>   <not supported>      r148
>>
>>       1.001674174 seconds time elapsed
>>
>> from https://download.01.org/perfmon/HSW/
>>
>> event                           raw      SMT on              SMT off
>> cpu_clk_unhalted.thread         cycles   Fixed counter 2     Fixed counter 2
>> ld_blocks.no_sr                 r803     0,1,2,3             0,1,2,3,4,5,6,7
>> mem_load_uops_retired.l1_miss   r8d1     0,1,2,3             0,1,2,3
>> mem_load_uops_retired.hit_lfb   r40d1    0,1,2,3             0,1,2,3
>> l1d_pend_miss.pending           r148     2                   2
>>
>> Seems these five events couldn't be counting at the same time,
>> although in terms of hardware they should get along. Is this a bug or
>> a limitation I should be aware of? Thank you so much.
>>
>> Yuanfang
* Re: event group without multiplexing
From: Vince Weaver @ 2015-11-12 16:51 UTC
To: Yuanfang Chen; +Cc: linux-perf-users

On Thu, 12 Nov 2015, Yuanfang Chen wrote:

> And the case can be reduced to
>
> perf stat -e \{r1000248,r148\} -- sleep 1
>
> Performance counter stats for 'sleep 1':
>
>               7008      r1000248
>    <not supported>      r148

The kernel thinks that all events with event code 0x48 (L1D_PEND_MISS)
can only go into one specific counter, and thus you can't have more
than one of them active at the same time.

If this is a bug you'll need to report it to the perf_event developers.
I'd double-check Intel's documents to see what they actually say about
this class of events on your processor.

Vince
* Re: event group without multiplexing
From: Michael Petlan @ 2015-11-12 17:40 UTC
To: Yuanfang Chen; +Cc: linux-perf-users

On Thu, 2015-11-12 at 10:02 -0500, Yuanfang Chen wrote:
> And the case can be reduced to
>
> perf stat -e \{r1000248,r148\} -- sleep 1
>
> Performance counter stats for 'sleep 1':
>
>               7008      r1000248
>    <not supported>      r148
>
>        1.001885804 seconds time elapsed
>

perf stat -e \{r1000248,r148\} -- sleep 1

 Performance counter stats for 'sleep 1':

             9,605      r1000248
           553,201      r148

Intel Ivy Bridge EP machine (family = 6, model = 62),
kernel/perf version 4.3.0

The Intel 64 and IA-32 Architectures Software Developer's Manual,
vol-3B, part-2 agrees with the download.01.org's json file on the
fact that r0148 is limited to the counter 2 only on both IVB and
HSW. But I can't find any reference about the r1000248 event
in the Intel Manual. Vince, am I missing something?

Anyway, shouldn't it behave the same on IVB and HSW?

Michael
* Re: event group without multiplexing
From: Vince Weaver @ 2015-11-12 17:51 UTC
To: Michael Petlan; +Cc: Yuanfang Chen, linux-perf-users

On Thu, 12 Nov 2015, Michael Petlan wrote:

> But I can't find any reference about the r1000248 event
> in the Intel Manual. Vince, am I missing something?
>
> Anyway, shouldn't it behave the same on IVB and HSW?

I am going off the code in
	arch/x86/kernel/cpu/perf_event_intel.c
(current git tree)

SNB and HSW have as a constraint
	INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */

IVB has as a constraint
	INTEL_UEVENT_CONSTRAINT(0x0148, 0x4), /* L1D_PEND_MISS.PENDING */

Notice the difference. Not sure if this is a bug in the kernel or what,
but that's what's there and I think it's what's causing the issue.

Vince
* Re: event group without multiplexing
From: Vince Weaver @ 2015-11-12 18:00 UTC
To: linux-perf-users; +Cc: Michael Petlan, Yuanfang Chen, Andi Kleen

CCing Andi Kleen, as he's the one who introduced the Haswell constraint
code in 3a632cb229bfb18b6d09822cc842451ea46c013e, so maybe he knows why
it is constraining all L1D_PEND_MISS.* events rather than just
L1D_PEND_MISS.PENDING, as on IVB and BDW.

On Thu, 12 Nov 2015, Vince Weaver wrote:

> On Thu, 12 Nov 2015, Michael Petlan wrote:
>
> > But I can't find any reference about the r1000248 event
> > in the Intel Manual. Vince, am I missing something?
> >
> > Anyway, shouldn't it behave the same on IVB and HSW?
>
> I am going off the code in
> 	arch/x86/kernel/cpu/perf_event_intel.c
> (current git tree)
>
> SNB and HSW have as a constraint
> 	INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */
>
> IVB has as a constraint
> 	INTEL_UEVENT_CONSTRAINT(0x0148, 0x4), /* L1D_PEND_MISS.PENDING */
>
> Notice the difference. Not sure if this is a bug in the kernel or what,
> but that's what's there and I think it's what's causing the issue.
>
> Vince
* Re: event group without multiplexing
From: Yuanfang Chen @ 2015-11-12 18:12 UTC
To: Vince Weaver; +Cc: linux-perf-users, Michael Petlan, Andi Kleen

It seems that 0x248 shows up in https://download.01.org/perfmon/HSW/ but
not in the manual. If 0x248 does not exist, then the code should be
correct.

On Thu, Nov 12, 2015 at 1:00 PM, Vince Weaver <vincent.weaver@maine.edu> wrote:
>
> ccing Andi Kleen as he's the one who introduced the Haswell constraint
> code in 3a632cb229bfb18b6d09822cc842451ea46c013e so maybe he knows why
> it is constraining all L1D_PEND_MISS.* events rather than just
> L1D_PEND_MISS.PENDING on IVB and BDW.
>
> On Thu, 12 Nov 2015, Vince Weaver wrote:
>
>> On Thu, 12 Nov 2015, Michael Petlan wrote:
>>
>> > But I can't find any reference about the r1000248 event
>> > in the Intel Manual. Vince, am I missing something?
>> >
>> > Anyway, shouldn't it behave the same on IVB and HSW?
>>
>> I am going off the code in
>> 	arch/x86/kernel/cpu/perf_event_intel.c
>> (current git tree)
>>
>> SNB and HSW have as a constraint
>> 	INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */
>>
>> IVB has as a constraint
>> 	INTEL_UEVENT_CONSTRAINT(0x0148, 0x4), /* L1D_PEND_MISS.PENDING */
>>
>> Notice the difference. Not sure if this is a bug in the kernel or what,
>> but that's what's there and I think it's what's causing the issue.
>>
>> Vince
* Re: event group without multiplexing
From: Andi Kleen @ 2015-11-17 0:39 UTC
To: Vince Weaver; +Cc: linux-perf-users, Michael Petlan, Yuanfang Chen

On Thu, Nov 12, 2015 at 01:00:09PM -0500, Vince Weaver wrote:
>
> ccing Andi Kleen as he's the one who introduced the Haswell constraint
> code in 3a632cb229bfb18b6d09822cc842451ea46c013e so maybe he knows why
> it is constraining all L1D_PEND_MISS.* events rather than just
> L1D_PEND_MISS.PENDING on IVB and BDW.

Yes, it looks like Haswell could use the same more limited constraint as
Broadwell or IvyBridge. I don't remember why it ended up this way.
Please submit a patch.

-Andi
* Re: event group without multiplexing
From: Yuanfang Chen @ 2015-11-17 3:03 UTC
To: Andi Kleen; +Cc: Vince Weaver, linux-perf-users, Michael Petlan

Is this ok?

From 47d52ccfae56a8eb702fee6ccf327780265df2cf Mon Sep 17 00:00:00 2001
From: Yuanfang Chen <cheny@udel.edu>
Date: Mon, 16 Nov 2015 21:53:53 -0500
Subject: [PATCH 1/1] perf/x86/intel: make L1D_PEND_MISS.FB_FULL not constrained on haswell

Signed-off-by: Yuanfang Chen <cheny@udel.edu>
---
 arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index f63360b..e2a4300 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -232,7 +232,7 @@ static struct event_constraint intel_hsw_event_constraints[] = {
 	FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
 	FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
 	FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
-	INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */
+	INTEL_UEVENT_CONSTRAINT(0x148, 0x4), /* L1D_PEND_MISS.PENDING */
 	INTEL_UEVENT_CONSTRAINT(0x01c0, 0x2), /* INST_RETIRED.PREC_DIST */
 	INTEL_EVENT_CONSTRAINT(0xcd, 0x8), /* MEM_TRANS_RETIRED.LOAD_LATENCY */
 	/* CYCLE_ACTIVITY.CYCLES_L1D_PENDING */
-- 
1.9.1

On Mon, Nov 16, 2015 at 7:39 PM, Andi Kleen <ak@linux.intel.com> wrote:
> On Thu, Nov 12, 2015 at 01:00:09PM -0500, Vince Weaver wrote:
>>
>> ccing Andi Kleen as he's the one who introduced the Haswell constraint
>> code in 3a632cb229bfb18b6d09822cc842451ea46c013e so maybe he knows why
>> it is contraining all L1D_PEND_MISS.* events rather than just
>> L1D_PEND_MISS.PENDING on IVB and BDW.
>
> Yes it looks like Haswell could use the more limited constraint as Broadwell
> or IvyBridge. I don't remember why it ended up this way.
> Please submit a patch.
>
> -Andi
* Re: event group without multiplexing
From: Andi Kleen @ 2015-11-21 0:58 UTC
To: Yuanfang Chen; +Cc: Vince Weaver, linux-perf-users, Michael Petlan

On Mon, Nov 16, 2015 at 10:03:28PM -0500, Yuanfang Chen wrote:
> Is this ok?

Sorry for the delay. The patch looks good to me. You may need to resend to

peterz@infradead.org
linux-kernel@vger.kernel.org

Reviewed-by: Andi Kleen <ak@linux.intel.com>

>
>
> From 47d52ccfae56a8eb702fee6ccf327780265df2cf Mon Sep 17 00:00:00 2001
> From: Yuanfang Chen <cheny@udel.edu>
> Date: Mon, 16 Nov 2015 21:53:53 -0500
> Subject: [PATCH 1/1] perf/x86/intel: make L1D_PEND_MISS.FB_FULL not
>  constrained on haswell
>
> Signed-off-by: Yuanfang Chen <cheny@udel.edu>
> ---
>  arch/x86/kernel/cpu/perf_event_intel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel.c
> b/arch/x86/kernel/cpu/perf_event_intel.c
> index f63360b..e2a4300 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel.c
> @@ -232,7 +232,7 @@ static struct event_constraint
> intel_hsw_event_constraints[] = {
>  	FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */
>  	FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */
>  	FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */
> -	INTEL_EVENT_CONSTRAINT(0x48, 0x4), /* L1D_PEND_MISS.* */
> +	INTEL_UEVENT_CONSTRAINT(0x148, 0x4), /* L1D_PEND_MISS.PENDING */