* [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events
@ 2016-08-25 11:47 Alexey Brodkin
  2016-08-26 17:30 ` Vineet Gupta
  0 siblings, 1 reply; 11+ messages in thread
From: Alexey Brodkin @ 2016-08-25 11:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-snps-arc, Alexey Brodkin, Vineet Gupta, Thomas Gleixner,
	Arnaldo Carvalho de Melo, Peter Zijlstra, stable

We used to live with PERF_COUNT_HW_CACHE_REFERENCES and
PERF_COUNT_HW_CACHE_MISSES left unspecified on ARC.

Those events are actually aliases for 2 cache events that we do support,
so this change wires up the "cache-references" and "cache-misses" events
in the same way as "L1-dcache-loads" and "L1-dcache-load-misses".

While at it, add debug info for cache events and make a subtle fix to
the HW events debug info: the config value is better printed in hex so
we can see not only the event index but also any other control bits
that happen to be set.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
---
 arch/arc/include/asm/perf_event.h | 3 +++
 arch/arc/kernel/perf_event.c      | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/arc/include/asm/perf_event.h b/arch/arc/include/asm/perf_event.h
index 5f07176..9185541 100644
--- a/arch/arc/include/asm/perf_event.h
+++ b/arch/arc/include/asm/perf_event.h
@@ -118,6 +118,9 @@ static const char * const arc_pmu_ev_hw_map[] = {
 	[PERF_COUNT_ARC_ICM] = "icm",		/* I-cache Miss */
 	[PERF_COUNT_ARC_EDTLB] = "edtlb",	/* D-TLB Miss */
 	[PERF_COUNT_ARC_EITLB] = "eitlb",	/* I-TLB Miss */
+
+	[PERF_COUNT_HW_CACHE_REFERENCES] = "imemrdc",	/* Instr: mem read cached */
+	[PERF_COUNT_HW_CACHE_MISSES] = "dclm",		/* D-cache Load Miss */
 };
 
 #define C(_x)			PERF_COUNT_HW_CACHE_##_x
diff --git a/arch/arc/kernel/perf_event.c b/arch/arc/kernel/perf_event.c
index 08f03d9..2ce24e7 100644
--- a/arch/arc/kernel/perf_event.c
+++ b/arch/arc/kernel/perf_event.c
@@ -179,8 +179,8 @@ static int arc_pmu_event_init(struct perf_event *event)
 		if (arc_pmu->ev_hw_idx[event->attr.config] < 0)
 			return -ENOENT;
 		hwc->config |= arc_pmu->ev_hw_idx[event->attr.config];
-		pr_debug("init event %d with h/w %d \'%s\'\n",
-			 (int) event->attr.config, (int) hwc->config,
+		pr_debug("init event %d with h/w %08x \'%s\'\n",
+			 (int)event->attr.config, (int)hwc->config,
 			 arc_pmu_ev_hw_map[event->attr.config]);
 		return 0;
 
@@ -189,6 +189,8 @@ static int arc_pmu_event_init(struct perf_event *event)
 		if (ret < 0)
 			return ret;
 		hwc->config |= arc_pmu->ev_hw_idx[ret];
+		pr_debug("init cache event with h/w %08x \'%s\'\n",
+			 (int)hwc->config, arc_pmu_ev_hw_map[ret]);
 		return 0;
 	default:
 		return -ENOENT;
-- 
2.7.4

* Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events
  2016-08-25 11:47 [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events Alexey Brodkin
@ 2016-08-26 17:30 ` Vineet Gupta
  2016-08-31 19:05   ` Vineet Gupta
  2016-09-14 17:53   ` [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events Vineet Gupta
  0 siblings, 2 replies; 11+ messages in thread
From: Vineet Gupta @ 2016-08-26 17:30 UTC (permalink / raw)
  To: Alexey Brodkin, linux-kernel, Peter Zijlstra
  Cc: linux-snps-arc, Thomas Gleixner, Arnaldo Carvalho de Melo, stable

On 08/25/2016 04:49 AM, Alexey Brodkin wrote:
> ...
>  	[PERF_COUNT_ARC_EDTLB] = "edtlb",	/* D-TLB Miss */
>  	[PERF_COUNT_ARC_EITLB] = "eitlb",	/* I-TLB Miss */
> +
> +	[PERF_COUNT_HW_CACHE_REFERENCES] = "imemrdc",	/* Instr: mem read cached */
> +	[PERF_COUNT_HW_CACHE_MISSES] = "dclm",		/* D-cache Load Miss */

I think this is duplicating a mistake we already have. I vaguely remember that
when doing some hackbench profiling last year, with range-based profiling confined
to the memset routine, I saw that L1-dcache-misses was counting zero. This is
because it only counts LD misses while memset only does ST.

Performance counter stats for '/sbin/hackbench':

     0 L1-dcache-misses
     0 L1-dcache-load-misses
     1846082 L1-dcache-store-misses


@PeterZ do you concur that this is wrong and we ought to set up 2 counters to do
this correctly?

-Vineet

* Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events
  2016-08-26 17:30 ` Vineet Gupta
@ 2016-08-31 19:05   ` Vineet Gupta
  2016-09-01  8:33     ` Peter Zijlstra
  2016-09-14 17:53   ` [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events Vineet Gupta
  1 sibling, 1 reply; 11+ messages in thread
From: Vineet Gupta @ 2016-08-31 19:05 UTC (permalink / raw)
  To: Peter Zijlstra, Jiri Olsa
  Cc: Alexey Brodkin, linux-kernel, Arnaldo Carvalho de Melo,
	linux-snps-arc, linux-perf-users, Will Deacon

On 08/26/2016 10:31 AM, Vineet Gupta wrote:
> On 08/25/2016 04:49 AM, Alexey Brodkin wrote:
>> ...
>>  	[PERF_COUNT_ARC_EDTLB] = "edtlb",	/* D-TLB Miss */
>>  	[PERF_COUNT_ARC_EITLB] = "eitlb",	/* I-TLB Miss */
>> +
>> +	[PERF_COUNT_HW_CACHE_REFERENCES] = "imemrdc",	/* Instr: mem read cached */
>> +	[PERF_COUNT_HW_CACHE_MISSES] = "dclm",		/* D-cache Load Miss */
> I think this is duplicating a mistake we already have. I vaguely remember when
> doing some hackbench profiling last year with range based profiling confined to
> memset routine and saw that L1-dcache-misses was counting zero. This is because it
> only counts LD misses while memset only does ST.
>
> Performance counter stats for '/sbin/hackbench':
>
>      0 L1-dcache-misses
>      0 L1-dcache-load-misses
>      1846082 L1-dcache-store-misses
>
>
> @PeterZ do you concur that is wrong and we ought to setup 2 counters to do this
> correctly ?

Hi Peter / Will,

Can you provide some guidance here? So I looked at what others do:
ARMV7_PERFCTR_L1_DCACHE_REFILL counts both load and store misses, while ARC has 2
separate conditions for loads and stores. Is there an existing mechanism to "group"
/ "add" them to give a cumulative PERF_COUNT_HW_CACHE_MISSES - is that what perf
event grouping is?

Quoting from perf wiki @ https://perf.wiki.kernel.org/index.php/Tutorial

"It can be interesting to try and pack events in a way that guarantees that event
A and B are always measured together. Although the perf_events kernel interface
provides support for event grouping, the current perf tool does *not*."

Thx,
-Vineet

* Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events
  2016-08-31 19:05   ` Vineet Gupta
@ 2016-09-01  8:33     ` Peter Zijlstra
  2016-09-20 20:56       ` perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events) Vineet Gupta
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2016-09-01  8:33 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Jiri Olsa, Alexey Brodkin, linux-kernel,
	Arnaldo Carvalho de Melo, linux-snps-arc, linux-perf-users,
	Will Deacon

On Wed, Aug 31, 2016 at 12:05:14PM -0700, Vineet Gupta wrote:
> On 08/26/2016 10:31 AM, Vineet Gupta wrote:
> > On 08/25/2016 04:49 AM, Alexey Brodkin wrote:
> >> ...
> >>  	[PERF_COUNT_ARC_EDTLB] = "edtlb",	/* D-TLB Miss */
> >>  	[PERF_COUNT_ARC_EITLB] = "eitlb",	/* I-TLB Miss */
> >> +
> >> +	[PERF_COUNT_HW_CACHE_REFERENCES] = "imemrdc",	/* Instr: mem read cached */
> >> +	[PERF_COUNT_HW_CACHE_MISSES] = "dclm",		/* D-cache Load Miss */
> > I think this is duplicating a mistake we already have. I vaguely remember when
> > doing some hackbench profiling last year with range based profiling confined to
> > memset routine and saw that L1-dcache-misses was counting zero. This is because it
> > only counts LD misses while memset only does ST.
> >
> > Performance counter stats for '/sbin/hackbench':
> >
> >      0 L1-dcache-misses
> >      0 L1-dcache-load-misses
> >      1846082 L1-dcache-store-misses
> >
> >
> > @PeterZ do you concur that is wrong and we ought to setup 2 counters to do this
> > correctly ?
> 
> Hi Peter / Will,
> 
> Can you provide some guidance here. So I looked at what others do -
> ARMV7_PERFCTR_L1_DCACHE_REFILL counts both load and store misses, while ARC has 2
> separate conditions for load or stores. Is there an existing mechanism to "group"
> / "add" them to give a cumulative PERF_COUNT_HW_CACHE_MISSES 

Nope. So I would not try and use these generic events. In other news, it
seems like there's finally some progress on the JSON patches:

  https://lkml.kernel.org/r/20160831114254.GA9001@krava

Which would make using non-standard events easier.

> - is that what perf event grouping is ?

Again, nope. Perf event groups are single counter (so no implicit
addition) that are co-scheduled on the PMU.

> Quoting from perf wiki @ https://perf.wiki.kernel.org/index.php/Tutorial
> 
> "It can be interesting to try and pack events in a way that guarantees that event
> A and B are always measured together. Although the perf_events kernel interface
> provides support for event grouping, the current perf tool does *not*."

That seems outdated; Jiri added grouping support to perf-tool quite a
while back.

You can do it like:

	perf stat -e '{cycles,instructions}'

Which will place the cycles event and the instructions event in a group
and thereby guarantee they're co-scheduled.

* Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events
  2016-08-26 17:30 ` Vineet Gupta
  2016-08-31 19:05   ` Vineet Gupta
@ 2016-09-14 17:53   ` Vineet Gupta
  1 sibling, 0 replies; 11+ messages in thread
From: Vineet Gupta @ 2016-09-14 17:53 UTC (permalink / raw)
  To: Alexey Brodkin, linux-kernel, Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Thomas Gleixner, linux-snps-arc, stable

On 08/26/2016 10:30 AM, Vineet Gupta wrote:
> On 08/25/2016 04:49 AM, Alexey Brodkin wrote:
>> ...
>>  	[PERF_COUNT_ARC_EDTLB] = "edtlb",	/* D-TLB Miss */
>>  	[PERF_COUNT_ARC_EITLB] = "eitlb",	/* I-TLB Miss */
>> +
>> +	[PERF_COUNT_HW_CACHE_REFERENCES] = "imemrdc",	/* Instr: mem read cached */
>> +	[PERF_COUNT_HW_CACHE_MISSES] = "dclm",		/* D-cache Load Miss */
> 
> I think this is duplicating a mistake we already have. I vaguely remember when
> doing some hackbench profiling last year with range based profiling confined to
> memset routine and saw that L1-dcache-misses was counting zero. This is because it
> only counts LD misses while memset only does ST.

So given that this is the best we've got, I'm going to merge this anyway.

-Vineet

> 
> Performance counter stats for '/sbin/hackbench':
> 
>      0 L1-dcache-misses
>      0 L1-dcache-load-misses
>      1846082 L1-dcache-store-misses
> 
> 
> @PeterZ do you concur that is wrong and we ought to setup 2 counters to do this
> correctly ?
> 
> -Vineet
> 

* perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
  2016-09-01  8:33     ` Peter Zijlstra
@ 2016-09-20 20:56       ` Vineet Gupta
  2016-09-22  0:43         ` Paul Clarke
  0 siblings, 1 reply; 11+ messages in thread
From: Vineet Gupta @ 2016-09-20 20:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-perf-users, Alexey Brodkin, Will Deacon, linux-kernel,
	Arnaldo Carvalho de Melo, linux-snps-arc, Jiri Olsa

On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>> - is that what perf event grouping is ?
>
> Again, nope. Perf event groups are single counter (so no implicit
> addition) that are co-scheduled on the PMU.

I'm not sure I understand - does this require specific PMU/arch support - as in
multiple conditions feeding to same counter. How does a perf user make use of
this info? I tried googling around but can't seem to find anything that explains
the semantics.

I can see that grouped events do work on ARC (although in our case a counter can
count only one condition at a time) and the results seem to be similar whether we
group or not.

------------->8------------
[ARCLinux]# perf stat -e {cycles,instructions} hackbench
Running with 10*40 (== 400) tasks.
Time: 37.430

 Performance counter stats for 'hackbench':

        3487777173    cycles
        1351709784    instructions            #    0.39  insn per cycle

      38.957481536 seconds time elapsed

[ARCLinux]# perf stat -e cycles hackbench
Running with 10*40 (== 400) tasks.
Time: 36.735

 Performance counter stats for 'hackbench':

        3426151391    cycles

      38.247235981 seconds time elapsed

[ARCLinux]#
[ARCLinux]# perf stat -e instructions hackbench
Running with 10*40 (== 400) tasks.
Time: 37.537

 Performance counter stats for 'hackbench':

        1355421559    instructions

      39.061784281 seconds time elapsed
------------->8------------

...
> 
> You can do it like:
> 
> 	perf stat -e '{cycles,instructions}'
> 
> Which will place the cycles event and the instructions event in a group
> and thereby guarantee they're co-scheduled.

Again when you say co-scheduled what do you mean - why would anyone use the event
grouping - is it when they only have 1 counter and they want to count 2
conditions/events at the same time - isn't this same as event multiplexing ?

-Vineet

* Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
  2016-09-20 20:56       ` perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events) Vineet Gupta
@ 2016-09-22  0:43         ` Paul Clarke
  2016-09-22  7:56           ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Clarke @ 2016-09-22  0:43 UTC (permalink / raw)
  To: Vineet Gupta, Peter Zijlstra
  Cc: linux-perf-users, Alexey Brodkin, Will Deacon, linux-kernel,
	Arnaldo Carvalho de Melo, linux-snps-arc, Jiri Olsa

On 09/20/2016 03:56 PM, Vineet Gupta wrote:
> On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>>> - is that what perf event grouping is ?
>>
>> Again, nope. Perf event groups are single counter (so no implicit
>> addition) that are co-scheduled on the PMU.
>
> I'm not sure I understand - does this require specific PMU/arch support - as in
> multiple conditions feeding to same counter.

My read is that what Peter meant was that each event in the perf event group is a single counter, so all the events in the group are counted simultaneously.  (No multiplexing.)

>> You can do it like:
>>
>> 	perf stat -e '{cycles,instructions}'
>>
>> Which will place the cycles event and the instructions event in a group
>> and thereby guarantee they're co-scheduled.
>
> Again when you say co-scheduled what do you mean - why would anyone use the event
> grouping - is it when they only have 1 counter and they want to count 2
> conditions/events at the same time - isn't this same as event multiplexing ?

I'd say it's the converse of multiplexing.  Instead of mapping multiple events to a single counter, perf event groups map a set of events each to their own counter, and they are active simultaneously.  I suppose it's possible for the _groups_ to be multiplexed with other events or groups, but the group as a whole will be scheduled together, as a group.

If you have a single counter, I don't believe you can support perf event groups, by definition.

Regards,
Paul Clarke, IBM

* Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
  2016-09-22  0:43         ` Paul Clarke
@ 2016-09-22  7:56           ` Peter Zijlstra
  2016-09-22 17:50             ` Vineet Gupta
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2016-09-22  7:56 UTC (permalink / raw)
  To: Paul Clarke
  Cc: Vineet Gupta, linux-perf-users, Alexey Brodkin, Will Deacon,
	linux-kernel, Arnaldo Carvalho de Melo, linux-snps-arc,
	Jiri Olsa

On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
> On 09/20/2016 03:56 PM, Vineet Gupta wrote:
> >On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
> >>>- is that what perf event grouping is ?
> >>
> >>Again, nope. Perf event groups are single counter (so no implicit
> >>addition) that are co-scheduled on the PMU.
> >
> >I'm not sure I understand - does this require specific PMU/arch support - as in
> >multiple conditions feeding to same counter.
> 
> My read is that is that what Peter meant was that each event in the
> perf event group is a single counter, so all the events in the group
> are counted simultaneously.  (No multiplexing.)

Right, sorry for the poor wording.

> >Again when you say co-scheduled what do you mean - why would anyone use the event
> >grouping - is it when they only have 1 counter and they want to count 2
> >conditions/events at the same time - isn't this same as event multiplexing ?
> 
> I'd say it's the converse of multiplexing.  Instead of mapping
> multiple events to a single counter, perf event groups map a set of
> events each to their own counter, and they are active simultaneously.
> I suppose it's possible for the _groups_ to be multiplexed with other
> events or groups, but the group as a whole will be scheduled together,
> as a group.

Correct.

Each event gets its own hardware counter. Grouped events are
co-scheduled on the hardware.

You can multiplex groups. But if one event in a group is scheduled, they
all must be.

* Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
  2016-09-22  7:56           ` Peter Zijlstra
@ 2016-09-22 17:50             ` Vineet Gupta
  2016-09-22 18:23               ` Paul Clarke
  0 siblings, 1 reply; 11+ messages in thread
From: Vineet Gupta @ 2016-09-22 17:50 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Clarke
  Cc: Arnaldo Carvalho de Melo, Alexey Brodkin, Will Deacon,
	linux-kernel, linux-perf-users, linux-snps-arc, Jiri Olsa

On 09/22/2016 12:56 AM, Peter Zijlstra wrote:
> On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
>> On 09/20/2016 03:56 PM, Vineet Gupta wrote:
>>> On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>>>>> - is that what perf event grouping is ?
>>>>
>>>> Again, nope. Perf event groups are single counter (so no implicit
>>>> addition) that are co-scheduled on the PMU.
>>>
>>> I'm not sure I understand - does this require specific PMU/arch support - as in
>>> multiple conditions feeding to same counter.
>>
>> My read is that is that what Peter meant was that each event in the
>> perf event group is a single counter, so all the events in the group
>> are counted simultaneously.  (No multiplexing.)
> 
> Right, sorry for the poor wording.
> 
>>> Again when you say co-scheduled what do you mean - why would anyone use the event
>>> grouping - is it when they only have 1 counter and they want to count 2
>>> conditions/events at the same time - isn't this same as event multiplexing ?
>>
>> I'd say it's the converse of multiplexing.  Instead of mapping
>> multiple events to a single counter, perf event groups map a set of
>> events each to their own counter, and they are active simultaneously.
>> I suppose it's possible for the _groups_ to be multiplexed with other
>> events or groups, but the group as a whole will be scheduled together,
>> as a group.
> 
> Correct.
> 
> Each events get their own hardware counter. Grouped events are
> co-scheduled on the hardware.

And if we don't group them, then they _may_ not be co-scheduled (active/counting
at the same time)? But how can this be possible?
Say we have 2 counters; both the cmds below

     perf -e cycles,instructions hackbench
     perf -e {cycles,instructions} hackbench

would assign 2 counters to the 2 conditions, which keep counting until perf asks
them to stop (because the profiled application ended).

I don't understand the "scheduling" of counters - once we set them to count, there
is no real intervention/scheduling from software in terms of disabling/enabling
(assuming no multiplexing etc).

> You can multiplex groups. But if one event in a group is schedule, they
> all must be.

* Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
  2016-09-22 17:50             ` Vineet Gupta
@ 2016-09-22 18:23               ` Paul Clarke
  2016-09-22 19:42                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Clarke @ 2016-09-22 18:23 UTC (permalink / raw)
  To: Vineet Gupta, Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Alexey Brodkin, Will Deacon,
	linux-kernel, linux-perf-users, linux-snps-arc, Jiri Olsa

On 09/22/2016 12:50 PM, Vineet Gupta wrote:
> On 09/22/2016 12:56 AM, Peter Zijlstra wrote:
>> On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
>>> On 09/20/2016 03:56 PM, Vineet Gupta wrote:
>>>> On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
>>>>>> - is that what perf event grouping is ?
>>>>>
>>>>> Again, nope. Perf event groups are single counter (so no implicit
>>>>> addition) that are co-scheduled on the PMU.
>>>>
>>>> I'm not sure I understand - does this require specific PMU/arch support - as in
>>>> multiple conditions feeding to same counter.
>>>
>>> My read is that is that what Peter meant was that each event in the
>>> perf event group is a single counter, so all the events in the group
>>> are counted simultaneously.  (No multiplexing.)
>>
>> Right, sorry for the poor wording.
>>
>>>> Again when you say co-scheduled what do you mean - why would anyone use the event
>>>> grouping - is it when they only have 1 counter and they want to count 2
>>>> conditions/events at the same time - isn't this same as event multiplexing ?
>>>
>>> I'd say it's the converse of multiplexing.  Instead of mapping
>>> multiple events to a single counter, perf event groups map a set of
>>> events each to their own counter, and they are active simultaneously.
>>> I suppose it's possible for the _groups_ to be multiplexed with other
>>> events or groups, but the group as a whole will be scheduled together,
>>> as a group.
>>
>> Correct.
>>
>> Each events get their own hardware counter. Grouped events are
>> co-scheduled on the hardware.
>
> And if we don't group them, then they _may_ not be co-scheduled (active/counting
> at the same time) ? But how can this be possible.
> Say we have 2 counters, both the cmds below
>
>      perf -e cycles,instructions hackbench
>      perf -e {cycles,instructions} hackbench
>
> would assign 2 counters to the 2 conditions which keep counting until perf asks
> them to stop (because the profiled application ended)
>
> I don't understand the "scheduling" of counter - once we set them to count, there
> is no real intervention/scheduling form software in terms of disabling/enabling
> (assuming no multiplexing etc)

If you assume no multiplexing, then this discussion on grouping is moot.

It depends on how many events you specify, how many counters there are, and which counters can count which events.  If you specify a set of events for which every event can be counted simultaneously, they will be scheduled simultaneously and continuously.  If you specify more events than counters, there's multiplexing.  AND, if you specify a set of events, some of which cannot be counted simultaneously due to hardware limitations, they'll be multiplexed.
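
For what it's worth, when events do get multiplexed the kernel also reports how
long each event was enabled versus how long it actually owned a counter, so user
space can scale the raw value. Below is a minimal sketch of that read path; it is
only illustrative - the event choice, the lack of error handling and the direct
perf_event_open() syscall wrapper are assumptions for the example, not anything
mandated by perf.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	/* No glibc wrapper exists, so go through syscall(2) directly. */
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct {
		uint64_t value;		/* raw count */
		uint64_t time_enabled;	/* ns the event was enabled */
		uint64_t time_running;	/* ns it actually sat on a counter */
	} rd;
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_INSTRUCTIONS;
	attr.disabled = 1;
	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
			   PERF_FORMAT_TOTAL_TIME_RUNNING;

	fd = perf_event_open(&attr, 0 /* this task */, -1 /* any cpu */, -1, 0);
	if (fd < 0)
		return 1;

	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	/* ... run the workload under test here ... */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	if (read(fd, &rd, sizeof(rd)) == sizeof(rd) && rd.time_running) {
		/* Scale up for the time the event was multiplexed off the PMU. */
		double scaled = (double)rd.value * rd.time_enabled / rd.time_running;

		printf("instructions: %llu raw, ~%.0f scaled\n",
		       (unsigned long long)rd.value, scaled);
	}
	close(fd);
	return 0;
}

perf stat does essentially the same scaling when an event could not stay on the
PMU for the whole run.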

PC

* Re: perf event grouping for dummies (was Re: [PATCH] arc: perf: Enable generic "cache-references" and "cache-misses" events)
  2016-09-22 18:23               ` Paul Clarke
@ 2016-09-22 19:42                 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-09-22 19:42 UTC (permalink / raw)
  To: Paul Clarke
  Cc: Vineet Gupta, Peter Zijlstra, Alexey Brodkin, Will Deacon,
	linux-kernel, linux-perf-users, linux-snps-arc, Jiri Olsa

Em Thu, Sep 22, 2016 at 01:23:04PM -0500, Paul Clarke escreveu:
> On 09/22/2016 12:50 PM, Vineet Gupta wrote:
> >On 09/22/2016 12:56 AM, Peter Zijlstra wrote:
> >>On Wed, Sep 21, 2016 at 07:43:28PM -0500, Paul Clarke wrote:
> >>>On 09/20/2016 03:56 PM, Vineet Gupta wrote:
> >>>>On 09/01/2016 01:33 AM, Peter Zijlstra wrote:
> >>>>>>- is that what perf event grouping is ?
> >>>>>
> >>>>>Again, nope. Perf event groups are single counter (so no implicit
> >>>>>addition) that are co-scheduled on the PMU.
> >>>>
> >>>>I'm not sure I understand - does this require specific PMU/arch support - as in
> >>>>multiple conditions feeding to same counter.
> >>>
> >>>My read is that is that what Peter meant was that each event in the
> >>>perf event group is a single counter, so all the events in the group
> >>>are counted simultaneously.  (No multiplexing.)
> >>
> >>Right, sorry for the poor wording.
> >>
> >>>>Again when you say co-scheduled what do you mean - why would anyone use the event
> >>>>grouping - is it when they only have 1 counter and they want to count 2
> >>>>conditions/events at the same time - isn't this same as event multiplexing ?
> >>>
> >>>I'd say it's the converse of multiplexing.  Instead of mapping
> >>>multiple events to a single counter, perf event groups map a set of
> >>>events each to their own counter, and they are active simultaneously.
> >>>I suppose it's possible for the _groups_ to be multiplexed with other
> >>>events or groups, but the group as a whole will be scheduled together,
> >>>as a group.
> >>
> >>Correct.
> >>
> >>Each events get their own hardware counter. Grouped events are
> >>co-scheduled on the hardware.
> >
> >And if we don't group them, then they _may_ not be co-scheduled (active/counting
> >at the same time) ? But how can this be possible.
> >Say we have 2 counters, both the cmds below
> >
> >     perf -e cycles,instructions hackbench
> >     perf -e {cycles,instructions} hackbench
> >
> >would assign 2 counters to the 2 conditions which keep counting until perf asks
> >them to stop (because the profiled application ended)
> >
> >I don't understand the "scheduling" of counter - once we set them to count, there
> >is no real intervention/scheduling form software in terms of disabling/enabling
> >(assuming no multiplexing etc)

So, getting this machine as an example:

[    0.067739] smpboot: CPU0: Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz (family: 0x6, model: 0x3a, stepping: 0x9)
[    0.067744] Performance Events: PEBS fmt1+, 16-deep LBR, IvyBridge events, full-width counters, Intel PMU driver.
[    0.067774] ... version:                3
[    0.067776] ... bit width:              48
[    0.067777] ... generic registers:      4
[    0.067778] ... value mask:             0000ffffffffffff
[    0.067779] ... max period:             0000ffffffffffff
[    0.067780] ... fixed-purpose events:   3
[    0.067781] ... event mask:             000000070000000f
[    0.068694] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

[root@zoo ~]# perf stat -e '{branch-instructions,branch-misses,bus-cycles,cache-misses}' ls a
ls: cannot access 'a': No such file or directory

 Performance counter stats for 'ls a':

           356,090      branch-instructions                                         
            17,170      branch-misses             #    4.82% of all branches        
           232,365      bus-cycles                                                  
            12,107      cache-misses                                                

       0.003624967 seconds time elapsed

[root@zoo ~]# perf stat -e '{branch-instructions,branch-misses,bus-cycles,cache-misses,cpu-cycles}' ls a
ls: cannot access 'a': No such file or directory

 Performance counter stats for 'ls a':

     <not counted>      branch-instructions                                           (0.00%)
     <not counted>      branch-misses                                                 (0.00%)
     <not counted>      bus-cycles                                                    (0.00%)
     <not counted>      cache-misses                                                  (0.00%)
     <not counted>      cpu-cycles                                                    (0.00%)

       0.003659678 seconds time elapsed

[root@zoo ~]#

That was as a group, i.e. those {} enclosing it, if you run it with -vv, among
other things you'll see the "group_fd" parameter to the sys_perf_event_open
syscall:

[root@zoo ~]# perf stat -vv -e '{branch-instructions,branch-misses,bus-cycles,cache-misses,cpu-cycles}' ls a
sys_perf_event_open: pid 28581  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open: pid 28581  cpu -1  group_fd 3  flags 0x8
sys_perf_event_open: pid 28581  cpu -1  group_fd 3  flags 0x8
sys_perf_event_open: pid 28581  cpu -1  group_fd 3  flags 0x8
sys_perf_event_open: pid 28581  cpu -1  group_fd 3  flags 0x8
ls: cannot access 'a': No such file or directory

 Performance counter stats for 'ls a':

     <not counted>      branch-instructions                                           (0.00%)
     <not counted>      branch-misses                                                 (0.00%)
     <not counted>      bus-cycles                                                    (0.00%)
     <not counted>      cache-misses                                                  (0.00%)
     <not counted>      cpu-cycles                                                    (0.00%)

       0.002883209 seconds time elapsed

[root@zoo ~]#

So the first one passes -1 to create the group; the fd it returns is '3', which
is then used as group_fd for the other events in that group.

So the workload runs but nothing is counted: the kernel can't do what was
asked, i.e. schedule all those 5 hardware events _at the same time_, and no
multiplexing of counters that can count different hardware events is performed
_for that task_.

If we remove those {}, i.e. say there is no need to enable all those counters
_at the same time_ and they can be multiplexed _in the same task_ to measure
them all to some degree, it "works":

[root@zoo ~]# perf stat -vv -e 'branch-instructions,branch-misses,bus-cycles,cache-misses,cpu-cycles' ls a
perf_event_attr: (For the first event:)
  config                           0x4
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
sys_perf_event_open: pid 28594  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open: pid 28594  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open: pid 28594  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open: pid 28594  cpu -1  group_fd -1  flags 0x8
sys_perf_event_open: pid 28594  cpu -1  group_fd -1  flags 0x8

 Performance counter stats for 'ls a':

           317,892      branch-instructions                                           (53.01%)
            13,400      branch-misses             #    4.22% of all branches        
           201,578      bus-cycles                                                  
            11,326      cache-misses                                                
         2,203,482      cpu-cycles                                                    (78.44%)

       0.003026840 seconds time elapsed

[root@zoo ~]#

See the read_format? Those percentages? the group_fd = -1?

It all depends on these PMU resources:

[    0.067777] ... generic registers:      4
[    0.067780] ... fixed-purpose events:   3

It's this part of 'man perf_event_open':

  The  group_fd  argument  allows  event groups to be created.  An event group
has one event which is the group leader.  The leader is created first, with
group_fd = -1.  The rest of the group members are created with subsequent
perf_event_open() calls with group_fd being set to the file descriptor of
the group leader.  (A single event on its own is created with group_fd = -1 and
is considered to be a group with only 1 member.)  An event group is scheduled
onto the CPU as a unit: it will be put onto the CPU only if all of the events
in the group can be put onto the CPU.  This  means that  the  values  of  the
member  events  can be meaningfully compared—added, divided (to get ratios),
and so on—with each other, since they have counted events for the same set of
executed instructions.
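
To make that leader/member dance concrete, here is a minimal sketch. It is only
an illustration of the man page text above: the open_hw_event() helper, the
choice of cycles/instructions and the PERF_FORMAT_GROUP read are assumptions for
the example, and error handling is trimmed.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

static int open_hw_event(uint64_t config, int group_fd)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = config;
	attr.disabled = (group_fd == -1);	/* only the leader starts disabled */
	attr.read_format = PERF_FORMAT_GROUP;	/* leader read returns all members */

	return perf_event_open(&attr, 0 /* this task */, -1 /* any cpu */,
			       group_fd, 0);
}

int main(void)
{
	struct { uint64_t nr; uint64_t values[2]; } rd;
	int leader, member;

	/* The leader is created with group_fd = -1 ... */
	leader = open_hw_event(PERF_COUNT_HW_CPU_CYCLES, -1);
	/* ... members pass the leader's fd, so both are scheduled as one unit. */
	member = open_hw_event(PERF_COUNT_HW_INSTRUCTIONS, leader);
	if (leader < 0 || member < 0)
		return 1;

	ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
	/* ... run the workload under test here ... */
	ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

	if (read(leader, &rd, sizeof(rd)) > 0 && rd.nr == 2)
		printf("cycles=%llu instructions=%llu (counted together)\n",
		       (unsigned long long)rd.values[0],
		       (unsigned long long)rd.values[1]);

	close(member);
	close(leader);
	return 0;
}

That mirrors what 'perf stat -vv' showed above: group_fd -1 for the leader, the
leader's fd for every other event in the group.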

- Arnaldo
 
> If you assume no multiplexing, then this discussion on grouping is moot.
 
> It depends on how many events you specify, how many counters there
> are, and which counters can count which events.  If you specify a set
> of events for which every event can be counted simultaneously, they
> will be scheduled simultaneously and continuously.  If you specify
> more events than counters, there's multiplexing.  AND, if you specify

There is multiplexing if group_fd is set to -1 in all events.

> a set of events, some of which cannot be counted simultaneously due to
> hardware limitations, they'll be multiplexed.

Not if group_fd is set to a group leader.
 
> PC
