* perf :: intel hybrid events (fwd)
@ 2022-04-13 11:16 Michael Petlan
       [not found] ` <CA+JHD92zy1iOkok57goXrhjOmri+fZXRhOoNGzwBW+t_a84etw@mail.gmail.com>
  2022-10-09  3:09 ` Xing Zhengjun
  0 siblings, 2 replies; 4+ messages in thread
From: Michael Petlan @ 2022-04-13 11:16 UTC (permalink / raw)
  To: linux-perf-users

Forwarding the questions to perf-users...

Also, I have found out that the mem-stores:p event does not work on
Intel Alderlake:

# perf record -e mem-stores -- ./examples/dummy > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.024 MB perf.data (64 samples) ]

With the precise modifier (:p), however, it records nothing:

# perf record -e mem-stores:p -- ./examples/dummy > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.021 MB perf.data ]

This makes the perf-mem and perf-c2c commands less useful.
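
One way to spot such an empty recording in a script is to pull the sample
count out of the summary line. The helper below is only a minimal sketch and
not part of perf: perf_sample_count is a hypothetical name, and it assumes
that perf record prints its summary to stderr in the format shown above.

```sh
# Read perf record's messages on stdin and print the sample count.
# Prints 0 when the "(N samples)" suffix is missing, as in the :p run.
perf_sample_count() {
    sed -n 's/.*Captured and wrote.*(\([0-9][0-9]*\) samples).*/\1/p' |
        grep . || echo 0
}

# Possible usage (assumes the summary goes to stderr):
#   perf record -e mem-stores:p -- ./examples/dummy 2>&1 >/dev/null | perf_sample_count
```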

Again, is this how it is supposed to work, or am I missing some fixes?
Or is upstream missing some fixes as well?

Thanks.
Michael

---------- Forwarded message ----------
Date: Tue, 12 Apr 2022 22:59:11
From: Michael Petlan <mpetlan@redhat.com>
To: yao.jin@linux.intel.com
Subject: perf :: intel hybrid events

Hello Jin Yao,

I have a few questions/ideas about hybrid events on Alderlake...


1) L1-{d,i}cache-load{,-misse}s supported partially

Interestingly enough, perf offers the following events in the hwcache set:

L1-dcache-load-misses
L1-dcache-loads
L1-icache-load-misses
L1-icache-loads

Of course, each expands to its cpu_core and cpu_atom version, as follows:

# perf stat -e L1-icache-load-misses
^C
 Performance counter stats for 'system wide':
           146,566      cpu_core/L1-icache-load-misses/
           164,971      cpu_atom/L1-icache-load-misses/

On my Alderlake testing box with RHEL-9 I see the following support pattern:

                         |  cpu_core  |  cpu_atom  |
L1-dcache-load-misses    |     OK     |     N/A    |
L1-dcache-loads          |     OK     |     OK     |
L1-icache-load-misses    |     OK     |     OK     |
L1-icache-loads          |     N/A    |     OK     |

For dcache, loads are supported on both, while misses do not work on atom.
That is plausible: atom is simpler, so I would expect it to lack some events.

For icache, misses are supported on both, while loads do not work on core.
This looks odd. Is that really the intended behavior, or is there a bug in
the driver or the event specifications?
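
A support pattern like the one in the table can be derived from perf stat
output by checking each PMU-qualified event line for "<not supported>". The
following is a rough sketch under stated assumptions: classify_events is a
hypothetical helper name, and the exact column layout of perf stat output may
differ between perf versions.

```sh
# Read `perf stat` output on stdin and print "<event> OK" or "<event> N/A"
# for every line that mentions a cpu_core/ or cpu_atom/ event.
classify_events() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^cpu_(core|atom)\//) {
                # A line carrying "<not supported>" has no counter value.
                status = ($0 ~ /<not supported>/) ? "N/A" : "OK"
                print $i, status
                break
            }
        }
    }'
}

# Possible usage (perf stat writes its results to stderr by default):
#   perf stat -e L1-dcache-load-misses -a -- sleep 1 2>&1 | classify_events
```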


2) You added the --cputype switch to perf-stat in commit
e69dc84282fb474cb87097c6c94 so that one can restrict the expansion to a
single cpu type. Doesn't perf-record need the same?


3) When not every event is supported, perf-stat defaults to a "use whatever
we can" approach and puts "<not supported>" into the results, whereas
perf-record fails. This is bad for cases like the above, since it fails when
any one of the events isn't supported. That might make sense if the
unsupported event was specified explicitly by the user, e.g.
`perf record -e AA -e BB -- ./load`, and perf fails with "sorry, I don't
support event BB".

However, what if the user just wants L1-dcache-load-misses and encounters
perf-record failing just because the event is not supported on Atom?

Shouldn't this behavior be fixed by some --tolerant switch that would ignore
the problems and record what is going on on the Core at least?


What are your ideas?
Thanks...

Michael



* Re: Fwd: perf :: intel hybrid events (fwd)
       [not found] ` <CA+JHD92zy1iOkok57goXrhjOmri+fZXRhOoNGzwBW+t_a84etw@mail.gmail.com>
@ 2022-04-13 13:27   ` Liang, Kan
  2022-04-15  3:14     ` Xing Zhengjun
  0 siblings, 1 reply; 4+ messages in thread
From: Liang, Kan @ 2022-04-13 13:27 UTC (permalink / raw)
  To: Michael Petlan
  Cc: linux-perf-users, Arnaldo Carvalho de Melo, Andi Kleen, Zhengjun Xing


Hi Michael,

Thanks for reporting the issues.
> 
> 
> Forwarding the questions to perf-users...
> 
> Also, I have found out that the mem-stores:p event does not work on
> Intel Alderlake:
> 
> # perf record -e mem-stores -- ./examples/dummy > /dev/null
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.024 MB perf.data (64 samples) ]
> 
> With the precise modifier (:p), however, it records nothing:
> 
> # perf record -e mem-stores:p -- ./examples/dummy > /dev/null
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.021 MB perf.data ]
> 
> This makes the perf-mem and perf-c2c commands less useful.
> 
> Again, is this how it is supposed to work, or am I missing some fixes?
> Or is upstream missing some fixes as well?
> 

It looks like a perf tool bug.

Actually, we added support for perf mem record in commit
4a9086adc329 ("perf mem: Support record for hybrid platform").
It seems we need some extra work for mem-stores:p as well.


> Thanks.
> Michael
> 
> ---------- Forwarded message ----------
> Date: Tue, 12 Apr 2022 22:59:11
> From: Michael Petlan <mpetlan@redhat.com <mailto:mpetlan@redhat.com>>
> To: yao.jin@linux.intel.com <mailto:yao.jin@linux.intel.com>
> Subject: perf :: intel hybrid events
> 
> Hello Jin Yao,
> 
> I have a few questions/ideas about hybrid events on Alderlake...
>

Zhengjun is now focusing on enabling the userspace perf tool.

Zhengjun, could you please take a look at all the issues?

> 
> 1) L1-{d,i}cache-load{,-misse}s supported partially
> 
> Interestingly enough, perf offers the following events in the hwcache set:
> 
> L1-dcache-load-misses
> L1-dcache-loads
> L1-icache-load-misses
> L1-icache-loads
> 
> Of course, each expands to its cpu_core and cpu_atom version, as follows:
> 
> # perf stat -e L1-icache-load-misses
> ^C
>   Performance counter stats for 'system wide':
>             146,566      cpu_core/L1-icache-load-misses/
>             164,971      cpu_atom/L1-icache-load-misses/
> 
> On my Alderlake testing box with RHEL-9 I see the following support pattern:
> 
>                           |  cpu_core  |  cpu_atom  |
> L1-dcache-load-misses    |     OK     |     N/A    |
> L1-dcache-loads          |     OK     |     OK     |
> L1-icache-load-misses    |     OK     |     OK     |
> L1-icache-loads          |     N/A    |     OK     |
> 
> For dcache, loads are supported on both, while misses do not work on atom.
> That is plausible: atom is simpler, so I would expect it to lack some events.
> 
> For icache, misses are supported on both, while loads do not work on core.
> This looks odd. Is that really the intended behavior, or is there a bug in
> the driver or the event specifications?

That's expected. We don't have a proper event for L1-icache-loads on the
big core or for L1-dcache-load-misses on Atom.
You can see the same behavior on the previous core platform (SKL) and on the
Atom platforms (GLP and TNT).

> 
> 
> 2) You added the --cputype switch to perf-stat in commit
> e69dc84282fb474cb87097c6c94 so that one can restrict the expansion to a
> single cpu type. Doesn't perf-record need the same?

Yes, I agree.

> 
> 
> 3) When not every event is supported, perf-stat defaults to a "use whatever
> we can" approach and puts "<not supported>" into the results, whereas
> perf-record fails. This is bad for cases like the above, since it fails when
> any one of the events isn't supported. That might make sense if the
> unsupported event was specified explicitly by the user, e.g.
> `perf record -e AA -e BB -- ./load`, and perf fails with "sorry, I don't
> support event BB".
> 
> However, what if the user just wants L1-dcache-load-misses and encounters
> perf-record failing just because the event is not supported on Atom?
> 
> Shouldn't this behavior be fixed by some --tolerant switch that would ignore
> the problems and record what is going on on the Core at least?
> 
> 

Yes, I agree. I think we should collect everything we can and print a
warning for each unsupported event.
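
Until perf record behaves that way, the same policy could be approximated in
a wrapper script. The sketch below is illustrative only and is not an existing
perf feature: supported_event_list is a hypothetical helper, and its input
format ("<event> OK" or "<event> N/A", one per line) is an assumption made for
this example.

```sh
# Read "<event> <status>" pairs on stdin. Print a comma-separated list of the
# events marked OK (suitable for -e) and warn on stderr about the others.
supported_event_list() {
    awk '
        $2 == "OK" {
            sep = (list == "") ? "" : ","
            list = list sep $1
            next
        }
        { print "warning: skipping unsupported event " $1 > "/dev/stderr" }
        END { print list }
    '
}

# Possible usage, recording only what the machine can count:
#   perf record -e "$(supported_event_list < status.txt)" -- ./load
```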

BTW: Besides the cache events, the topdown events also have some issues
(perf stat --topdown and the perf stat defaults) on the hybrid platforms.
Zhengjun is working on them. Some Topdown-related patches for the hybrid
platforms will be posted soon.


Thanks,
Kan
> What are your ideas?
> Thanks...
> 
> Michael
> 


* Re: Fwd: perf :: intel hybrid events (fwd)
  2022-04-13 13:27   ` Fwd: " Liang, Kan
@ 2022-04-15  3:14     ` Xing Zhengjun
  0 siblings, 0 replies; 4+ messages in thread
From: Xing Zhengjun @ 2022-04-15  3:14 UTC (permalink / raw)
  To: Liang, Kan, Michael Petlan
  Cc: linux-perf-users, Arnaldo Carvalho de Melo, Andi Kleen



On 4/13/2022 9:27 PM, Liang, Kan wrote:
> 
> Now, Zhengjun focus on the userspace perf tool enabling.
> 
> Zhengjun, could you please take a look all the issues?
> 
Sure. I will fix the issues.

-- 
Zhengjun Xing


* Re: perf :: intel hybrid events (fwd)
  2022-04-13 11:16 perf :: intel hybrid events (fwd) Michael Petlan
       [not found] ` <CA+JHD92zy1iOkok57goXrhjOmri+fZXRhOoNGzwBW+t_a84etw@mail.gmail.com>
@ 2022-10-09  3:09 ` Xing Zhengjun
  1 sibling, 0 replies; 4+ messages in thread
From: Xing Zhengjun @ 2022-10-09  3:09 UTC (permalink / raw)
  To: Michael Petlan, linux-perf-users



I tried the will-it-scale write1 workload, and mem-stores:p works fine on
ADL. It looks like the issue is related to the test workload.

will-it-scale:  https://github.com/antonblanchard/will-it-scale.git

# perf record -e mem-stores:p -- ./runtest.py write1
tasks,processes,processes_idle,threads,threads_idle,linear
0,0,100,0,100,0
1,2346709,96.88,2257694,96.88,2346709
^C[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 1.491 MB perf.data (31612 samples) ]


-- 
Zhengjun Xing

