* perf report feature request: folded stack output
@ 2015-10-09 21:24 Brendan Gregg
  2015-10-09 22:13 ` Brendan Gregg
  2015-10-09 23:33 ` Milian Wolff
  0 siblings, 2 replies; 4+ messages in thread
From: Brendan Gregg @ 2015-10-09 21:24 UTC (permalink / raw)
  To: linux-perf-use.

G'Day,

Maybe someone would like to code this (if not I hope to find the
time); perf report already has the capability to print captured stacks
as a call tree. I'd like a new output mode: folded.

Flame graphs[1] consume folded stacks. Eg:

# git clone https://github.com/brendangregg/FlameGraph
# cd FlameGraph
# perf record -F 99 -a -g -- sleep 60
# perf script | ./stackcollapse-perf.pl | ./flamegraph.pl
out.perf-folded > flame.svg

The last line is inefficient, and should really be something like:

# perf report --folded | ./flamegraph.pl out.perf-folded > flame.svg

The folded format is function names separated by semicolons, a space,
then the count of occurrences. Eg:

iperf;__libc_recv;entry_SYSCALL_64_fastpath;sys_recvfrom;SYSC_recvfrom;sock_recvmsg;inet_recvmsg;tcp_recvmsg;tcp_release_cb 1
iperf;__libc_recv;entry_SYSCALL_64_fastpath;sys_recvfrom;SYSC_recvfrom;sock_recvmsg;inet_recvmsg;tcp_recvmsg;tcp_v4_do_rcv 1
iperf;__libc_recv;entry_SYSCALL_64_fastpath;sys_recvfrom;SYSC_recvfrom;sock_recvmsg;security_socket_recvmsg 2
iperf;__libc_recv;entry_SYSCALL_64_fastpath;sys_recvfrom;SYSC_recvfrom;sock_recvmsg;tcp_recvmsg 2
iperf;__libc_recv;entry_SYSCALL_64_fastpath;sys_recvfrom;SYSC_recvfrom;sockfd_lookup_light;__fdget;__fget_light;__fget 9
iperf;__libc_recv;entry_SYSCALL_64_fastpath;sys_recvfrom;SYSC_recvfrom;sockfd_lookup_light;__fget_light 4
iperf;__libc_recv;entry_SYSCALL_64_fastpath;sys_recvfrom;fput 2
iperf;__libc_recv;entry_SYSCALL_64_fastpath;sys_recvfrom;sockfd_lookup_light 1
iperf;__libc_recv;sys_recvfrom 3
iperf;__pthread_disable_asynccancel 11
iperf;check_events;xen_hypercall_xen_version 37

etc.
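
A minimal sketch of the folded format described above, in Python. It assumes each sample is already available as a list of frame names ordered root first; the function names `fold_stacks` and `parse_folded_line` are hypothetical and are not part of perf or the FlameGraph tools.

```python
from collections import Counter


def fold_stacks(stacks):
    """Collapse call stacks (root frame first) into sorted folded lines."""
    # Identical stacks are merged; the count is how often each one occurred.
    counts = Counter(";".join(frames) for frames in stacks)
    return ["%s %d" % (stack, n) for stack, n in sorted(counts.items())]


def parse_folded_line(line):
    """Split one folded line into (list of frames, count)."""
    # The count follows the last space, so split from the right.
    stack, _, count = line.rstrip("\n").rpartition(" ")
    return stack.split(";"), int(count)


# Illustrative samples only, reusing frame names from the example above.
samples = [
    ["iperf", "__libc_recv", "sys_recvfrom"],
    ["iperf", "__pthread_disable_asynccancel"],
    ["iperf", "__libc_recv", "sys_recvfrom"],
]
folded = fold_stacks(samples)
for line in folded:
    print(line)
```

Splitting from the right keeps the parser working even if a frame name happens to contain a space.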

It should just be a different way of printing "perf report -n
--stdio", so I hope most of the logic is already there. :)

Folded output would be great. An optional additional output could be
JSON. I'm helping build an open source GUI that consumes profiles, and
it's consuming similar aggregated stacks but as JSON. Eg:
https://github.com/spiermar/d3-flame-graph/blob/master/example/stacks.json
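
A rough sketch of how folded lines could be turned into nested JSON, in Python. The node shape (`name`, `value`, `children`) and the choice to store inclusive counts in `value` are assumptions made for illustration and may not match the linked stacks.json exactly; `folded_to_tree` is a hypothetical helper.

```python
import json


def folded_to_tree(lines):
    """Build a nested dict from folded lines; value is the inclusive count."""
    root = {"name": "root", "value": 0, "children": []}
    for line in lines:
        stack, _, count = line.strip().rpartition(" ")
        n = int(count)
        node = root
        node["value"] += n
        for frame in stack.split(";"):
            # Reuse the child for this frame if it exists, else create it.
            for child in node["children"]:
                if child["name"] == frame:
                    break
            else:
                child = {"name": frame, "value": 0, "children": []}
                node["children"].append(child)
            child["value"] += n
            node = child
    return root


tree = folded_to_tree([
    "iperf;__libc_recv;sys_recvfrom 3",
    "iperf;__pthread_disable_asynccancel 11",
    "iperf;check_events;xen_hypercall_xen_version 37",
])
print(json.dumps(tree, indent=1))
```

The linear scan over children keeps the sketch short; a dict keyed by frame name would scale better for wide call trees.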

Brendan

[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html


* Re: perf report feature request: folded stack output
  2015-10-09 21:24 perf report feature request: folded stack output Brendan Gregg
@ 2015-10-09 22:13 ` Brendan Gregg
  2015-10-09 23:33 ` Milian Wolff
  1 sibling, 0 replies; 4+ messages in thread
From: Brendan Gregg @ 2015-10-09 22:13 UTC (permalink / raw)
  To: linux-perf-use.

On Fri, Oct 9, 2015 at 2:24 PM, Brendan Gregg <brendan.d.gregg@gmail.com> wrote:
>
> G'Day,
>
> Maybe someone would like to code this (if not I hope to find the
> time); perf report already has the capability to print captured stacks
> as a call tree. I'd like a new output mode: folded.
>
> Flame graphs[1] consume folded stacks. Eg:
>
> # git clone https://github.com/brendangregg/FlameGraph
> # cd FlameGraph
> # perf record -F 99 -a -g -- sleep 60
> # perf script | ./stackcollapse-perf.pl | ./flamegraph.pl
> out.perf-folded > flame.svg

Typo; that last step should just be:

# perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg

Brendan


* Re: perf report feature request: folded stack output
  2015-10-09 21:24 perf report feature request: folded stack output Brendan Gregg
  2015-10-09 22:13 ` Brendan Gregg
@ 2015-10-09 23:33 ` Milian Wolff
  2015-10-21  0:51   ` Brendan Gregg
  1 sibling, 1 reply; 4+ messages in thread
From: Milian Wolff @ 2015-10-09 23:33 UTC (permalink / raw)
  To: Brendan Gregg; +Cc: linux-perf-use.

On Friday, 9 October 2015 14:24:17 CEST Brendan Gregg wrote:
> G'Day,
> 
> Maybe someone would like to code this (if not I hope to find the
> time); perf report already has the capability to print captured stacks
> as a call tree. I'd like a new output mode: folded.
> 
> Flame graphs[1] consume folded stacks. Eg:
> 
> # git clone https://github.com/brendangregg/FlameGraph
> # cd FlameGraph
> # perf record -F 99 -a -g -- sleep 60
> # perf script | ./stackcollapse-perf.pl | ./flamegraph.pl
> out.perf-folded > flame.svg
> 
> The last line is inefficient, and should really be something like:
> 
> # perf report --folded | ./flamegraph.pl out.perf-folded > flame.svg
> 
> The folded format is function names separated by semicolons, a space,
> then the count of occurrences. Eg:

<snip>

Hey Brendan,

Did you consider writing a Python script to do the folding? It's pretty simple
nowadays, once you get past the initial hurdle.

I wrote this for stack collapsing futex locks:
https://paste.kde.org/p61qxah7d

And this to convert samples to callgrind format to open it in KCacheGrind:
https://paste.kde.org/pjfwd1e8f

If you combine the stack collapsing in the former with the generic
process_event hook used in the latter, you should be all set. In my tests, the
conversion was pretty quick, certainly faster than building strings,
writing them to the console, and then parsing them again in Perl.

perf script fold | flamegraph.pl > flame.svg

gets pretty close. If you want to test locally, make sure you use `perf script 
-s fold.py` or similar.
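
A minimal sketch of what such a fold.py could look like. This is not the script from the paste links; it is an illustration under stated assumptions. It assumes perf was built with Python scripting support, that perf calls `process_event` once per sample and `trace_end` at the end, and that the sample dictionary carries `comm` and a leaf-first `callchain` whose entries may have a `sym` dictionary with a `name` key. Check those key names against your perf version.

```python
from collections import Counter

# Folded stack string -> number of samples seen with that stack.
folded = Counter()


def process_event(param_dict):
    """Assumed perf script hook: called once per recorded sample."""
    frames = []
    # Assumption: the callchain is ordered leaf first.
    for node in param_dict.get("callchain") or []:
        sym = node.get("sym")
        frames.append(sym["name"] if sym else "[unknown]")
    # The command name becomes the root frame of the folded stack.
    frames.append(param_dict["comm"])
    folded[";".join(reversed(frames))] += 1


def trace_end():
    """Assumed perf script hook: called after the last sample."""
    for stack, count in sorted(folded.items()):
        print("%s %d" % (stack, count))
```

Run as `perf script -s fold.py | ./flamegraph.pl > flame.svg`, this would aggregate inside the script instead of printing every sample as text and re-parsing it.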

Cheers, and HTH

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* Re: perf report feature request: folded stack output
  2015-10-09 23:33 ` Milian Wolff
@ 2015-10-21  0:51   ` Brendan Gregg
  0 siblings, 0 replies; 4+ messages in thread
From: Brendan Gregg @ 2015-10-21  0:51 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-use.

On Fri, Oct 9, 2015 at 4:33 PM, Milian Wolff <milian.wolff@kdab.com> wrote:
> On Friday, 9 October 2015 14:24:17 CEST Brendan Gregg wrote:
>> G'Day,
>>
>> Maybe someone would like to code this (if not I hope to find the
>> time); perf report already has the capability to print captured stacks
>> as a call tree. I'd like a new output mode: folded.
>>
>> Flame graphs[1] consume folded stacks. Eg:
>>
>> # git clone https://github.com/brendangregg/FlameGraph
>> # cd FlameGraph
>> # perf record -F 99 -a -g -- sleep 60
>> # perf script | ./stackcollapse-perf.pl | ./flamegraph.pl
>> out.perf-folded > flame.svg
>>
>> The last line is inefficient, and should really be something like:
>>
>> # perf report --folded | ./flamegraph.pl out.perf-folded > flame.svg
>>
>> The folded format is function names separated by semicolons, a space,
>> then the count of occurrences. Eg:
>
> <snip>
>
> Hey Brendan,
>
> did you consider writing a python script to do the folding? It's pretty simple
> nowadays, once you cross the initial bar.
>
> I wrote this for stack collapsing futex locks:
> https://paste.kde.org/p61qxah7d
>
> And this to convert samples to callgrind format to open it in KCacheGrind:
> https://paste.kde.org/pjfwd1e8f
>
> If you combine the stack collapsing in the former with the generic
> process_event hook used in the latter, you should be all set. In my tests, it
> was pretty quick to convert stuff, certainly better than creating strings,
> pushing them to the console, and then parsing that again in perl.
>
> perf script fold | flamegraph.pl > flame.svg
>
> gets pretty close. If you want to test locally, make sure you use `perf script
> -s fold.py` or similar.

Thanks, I did check them out (those links have since expired), and that is
an improvement! But I can't help but think that perf report already
has the logic for creating a call tree, and it would be more efficient
if perf could just dump folded output directly, without a Python
coprocess.

Brendan
