* Adding a recording delay interferes with symbols/call-stack
@ 2017-10-30 20:25 Bram Stolk
2017-10-31 14:25 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 10+ messages in thread
From: Bram Stolk @ 2017-10-30 20:25 UTC (permalink / raw)
To: linux-perf-users
If I record with callstacks like this:
$ perf record --freq 2000 --call-graph dwarf ./bench
then it works just fine: I see callstacks, and symbols in the report.
If I delay the recording by 1 second with the --delay flag, like this:
$ perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
Then I lose all callstack information, and even all symbol information
on my program in the report.
Is this a known issue, that --delay interferes with symbol resolution?
Thanks!
Bram
PS: I am on 64b Ubuntu, and perf version 4.4.87
--
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Adding a recording delay interferes with symbols/call-stack
2017-10-30 20:25 Adding a recording delay interferes with symbols/call-stack Bram Stolk
@ 2017-10-31 14:25 ` Arnaldo Carvalho de Melo
2017-10-31 17:10 ` Bram Stolk
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-31 14:25 UTC (permalink / raw)
To: Bram Stolk; +Cc: linux-perf-users
Em Mon, Oct 30, 2017 at 01:25:08PM -0700, Bram Stolk escreveu:
> If I record with callstacks like this:
>
> $ perf record --freq 2000 --call-graph dwarf ./bench
>
> then it works just fine: I see callstacks, and symbols in the report.
>
> If I delay the recording by 1 second with the --delay flag, like this:
>
> $ perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
>
> Then I lose all callstack information, and even all symbol information
> on my program in the report.
>
> Is this a known issue, that --delay interferes with symbol resolution?
>
> Thanks!
>
> Bram
>
> PS: I am on 64b Ubuntu, and perf version 4.4.87
Works for me: :-)
[root@jouet ~]# perf record --delay=1000 --freq 2000 --call-graph dwarf sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.050 MB perf.data (4 samples) ]
[root@jouet ~]# perf script
perf 30567 248464.097254: 1 cycles:ppp:
3b233a __perf_event_task_sched_in (/lib/modules/4.14.0-rc3+/build/vmlinux)
2be207 finish_task_switch (/lib/modules/4.14.0-rc3+/build/vmlinux)
a44d21 __sched_text_start (/lib/modules/4.14.0-rc3+/build/vmlinux)
a45368 schedule (/lib/modules/4.14.0-rc3+/build/vmlinux)
a49679 do_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
30c816 hrtimer_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
30c96e sys_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
203a41 do_syscall_64 (/lib/modules/4.14.0-rc3+/build/vmlinux)
a4ad67 return_from_SYSCALL_64 (/lib/modules/4.14.0-rc3+/build/vmlinux)
perf 30567 248464.097288: 1 cycles:ppp:
20d1d0 intel_bts_enable_local (/lib/modules/4.14.0-rc3+/build/vmlinux)
perf 30567 248464.097302: 3 cycles:ppp:
223111 native_sched_clock (/lib/modules/4.14.0-rc3+/build/vmlinux)
perf 30567 248464.097317: 16 cycles:ppp:
223111 native_sched_clock (/lib/modules/4.14.0-rc3+/build/vmlinux)
[root@jouet ~]# perf -v
perf version 4.14.rc6.gd688d0
[root@jouet ~]# uname -a
Linux jouet 4.14.0-rc3+ #1 SMP Fri Oct 13 12:21:12 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@jouet ~]#
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Adding a recording delay interferes with symbols/call-stack
2017-10-31 14:25 ` Arnaldo Carvalho de Melo
@ 2017-10-31 17:10 ` Bram Stolk
2017-10-31 19:14 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 10+ messages in thread
From: Bram Stolk @ 2017-10-31 17:10 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users
Thanks for trying to reproduce, Arnaldo.
But for me, it's not the kernel symbols that are missing: they are all
there, syscalls and stuff.
Just the app's symbols itself are missing.
What happens if you try it on a binary you built locally, that spends
cycles in userspace?
Here is my output on my clang-built binary:
$ perf script
perf 10198 144604.396458: 1 cycles:pp:
ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
ffffffff8100aebd __intel_pmu_enable_all ([kernel.kallsyms])
ffffffff8100af30 intel_pmu_enable_all ([kernel.kallsyms])
ffffffff81006e41 x86_pmu_enable ([kernel.kallsyms])
ffffffff8117c527 perf_pmu_enable.part.89 ([kernel.kallsyms])
ffffffff811813e1 perf_pmu_enable ([kernel.kallsyms])
ffffffff810052e0 x86_pmu_commit_txn ([kernel.kallsyms])
ffffffff8117e887 group_sched_in ([kernel.kallsyms])
ffffffff8117eb50 __perf_event_enable ([kernel.kallsyms])
ffffffff8117c0fa remote_function ([kernel.kallsyms])
ffffffff8110445f flush_smp_call_function_queue ([kernel.kallsyms])
ffffffff81104e53 generic_smp_call_function_single_interrupt
([kernel.kallsyms])
ffffffff81050df7 smp_call_function_single_interrupt ([kernel.kallsyms])
ffffffff81844c22 call_function_single_interrupt ([kernel.kallsyms])
ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
perf 10198 144604.396465: 1 cycles:pp:
ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
perf 10198 144604.396468: 10 cycles:pp:
ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
perf 10198 144604.396471: 184 cycles:pp:
ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
ffffffff81037ddd native_sched_clock ([kernel.kallsyms])
perf 10198 144604.396474: 4362 cycles:pp:
ffffffff81037ddd native_sched_clock ([kernel.kallsyms])
4125ba [unknown] ([unknown])
perf 10198 144604.396479: 104121 cycles:pp:
412110 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 10198 144604.396514: 1399685 cycles:pp:
411ba0 [unknown] ([unknown])
412634 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 10198 144604.396956: 3677146 cycles:pp:
412110 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 10198 144604.398111: 3738199 cycles:pp:
40fda0 [unknown] ([unknown])
412311 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 10198 144604.399285: 3473249 cycles:pp:
411ba0 [unknown] ([unknown])
412634 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 10198 144604.400375: 3224007 cycles:pp:
412110 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 10198 144604.401390: 24189320560325 cycles:pp:
40f210 [unknown] ([unknown])
41250b [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 10198 144604.815874: 28367691 cycles:pp:
412110 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
40b303 [unknown] ([unknown])
401917 [unknown] ([unknown])
On Tue, Oct 31, 2017 at 7:25 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Mon, Oct 30, 2017 at 01:25:08PM -0700, Bram Stolk escreveu:
>> If I record with callstacks like this:
>>
>> $ perf record --freq 2000 --call-graph dwarf ./bench
>>
>> then it works just fine: I see callstacks, and symbols in the report.
>>
>> If I delay the recording by 1 second with the --delay flag, like this:
>>
>> $ perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
>>
>> Then I lose all callstack information, and even all symbol information
>> on my program in the report.
>>
>> Is this a known issue, that --delay interferes with symbol resolution?
>>
>> Thanks!
>>
>> Bram
>>
>> PS: I am on 64b Ubuntu, and perf version 4.4.87
>
> Works for me: :-)
>
> [root@jouet ~]# perf record --delay=1000 --freq 2000 --call-graph dwarf sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.050 MB perf.data (4 samples) ]
> [root@jouet ~]# perf script
> perf 30567 248464.097254: 1 cycles:ppp:
> 3b233a __perf_event_task_sched_in (/lib/modules/4.14.0-rc3+/build/vmlinux)
> 2be207 finish_task_switch (/lib/modules/4.14.0-rc3+/build/vmlinux)
> a44d21 __sched_text_start (/lib/modules/4.14.0-rc3+/build/vmlinux)
> a45368 schedule (/lib/modules/4.14.0-rc3+/build/vmlinux)
> a49679 do_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
> 30c816 hrtimer_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
> 30c96e sys_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
> 203a41 do_syscall_64 (/lib/modules/4.14.0-rc3+/build/vmlinux)
> a4ad67 return_from_SYSCALL_64 (/lib/modules/4.14.0-rc3+/build/vmlinux)
>
> perf 30567 248464.097288: 1 cycles:ppp:
> 20d1d0 intel_bts_enable_local (/lib/modules/4.14.0-rc3+/build/vmlinux)
>
> perf 30567 248464.097302: 3 cycles:ppp:
> 223111 native_sched_clock (/lib/modules/4.14.0-rc3+/build/vmlinux)
>
> perf 30567 248464.097317: 16 cycles:ppp:
> 223111 native_sched_clock (/lib/modules/4.14.0-rc3+/build/vmlinux)
>
> [root@jouet ~]# perf -v
> perf version 4.14.rc6.gd688d0
> [root@jouet ~]# uname -a
> Linux jouet 4.14.0-rc3+ #1 SMP Fri Oct 13 12:21:12 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
> [root@jouet ~]#
--
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Adding a recording delay interferes with symbols/call-stack
2017-10-31 17:10 ` Bram Stolk
@ 2017-10-31 19:14 ` Arnaldo Carvalho de Melo
2017-11-01 4:36 ` Bram Stolk
0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-31 19:14 UTC (permalink / raw)
To: Bram Stolk; +Cc: linux-perf-users
Em Tue, Oct 31, 2017 at 10:10:28AM -0700, Bram Stolk escreveu:
> Thanks for trying to reproduce, Arnaldo.
>
> But for me, it's not the kernel symbols that are missing: they are all
> there, syscalls and stuff.
> Just the app's symbols itself are missing.
>
> What happens if you try it on a binary you built locally, that spends
> cycles in userspace?
So, trying with 'perf trace' to do a system wide strace like session for the
'open' syscall, that will use DWARF callchains and go from userspace all the
way to raw_syscalls:sys_enter/sys_exit, to check various setups to try to
narrow down what may be broken:
[root@jouet ~]# perf trace --max-stack 32 -e open
518.496 ( 0.049 ms): libvirtd/1171 open(filename: /proc/cpuinfo, mode: IRUGO|IWUGO ) = 23
__open_nocancel (inlined)
__GI__IO_file_open (inlined)
__GI__IO_file_fopen (inlined)
__fopen_internal (/usr/lib64/libc-2.25.so)
virHostCPUGetInfo (/usr/lib64/libvirt.so.0.3002.1)
virCapabilitiesGetNodeInfo (/usr/lib64/libvirt.so.0.3002.1)
virNodeGetInfo (/usr/lib64/libvirt.so.0.3002.1)
_init (/usr/sbin/libvirtd)
virNetServerProgramDispatch (/usr/lib64/libvirt.so.0.3002.1)
[0xffffaa7a2a4a8737] (/usr/sbin/libvirtd)
[0xffff80b476e99f30] (/usr/lib64/libvirt.so.0.3002.1)
[0xffff80b476e992a7] (/usr/lib64/libvirt.so.0.3002.1)
start_thread (/usr/lib64/libpthread-2.25.so)
__GI___clone (inlined)
518.817 ( 0.022 ms): libvirtd/1171 open(filename: /sys/devices/system/cpu/present, mode: IFSOCK|ISGID|IXUSR|IRGRP|0x720b0000) = 24
[0xffff80b47aad198d] (/usr/lib64/libpthread-2.25.so)
virFileReadAll (/usr/lib64/libvirt.so.0.3002.1)
virFileReadValueBitmap (/usr/lib64/libvirt.so.0.3002.1)
virSysfsGetValueBitmap (/usr/lib64/libvirt.so.0.3002.1)
virHostCPUGetPresentBitmap (/usr/lib64/libvirt.so.0.3002.1)
virHostCPUGetInfoPopulateLinux (/usr/lib64/libvirt.so.0.3002.1)
virHostCPUGetInfo (/usr/lib64/libvirt.so.0.3002.1)
virCapabilitiesGetNodeInfo (/usr/lib64/libvirt.so.0.3002.1)
virNodeGetInfo (/usr/lib64/libvirt.so.0.3002.1)
_init (/usr/sbin/libvirtd)
virNetServerProgramDispatch (/usr/lib64/libvirt.so.0.3002.1)
[0xffffaa7a2a4a8737] (/usr/sbin/libvirtd)
[0xffff80b476e99f30] (/usr/lib64/libvirt.so.0.3002.1)
[0xffff80b476e992a7] (/usr/lib64/libvirt.so.0.3002.1)
start_thread (/usr/lib64/libpthread-2.25.so)
__GI___clone (inlined)
--------------------------
Seems to work, now using a CPU hog and sampling the most precise CPU "cycles" hardware counter:
# time perf record --call-graph=dwarf stress --cpu 1 --timeout 10
stress: info: [4690] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
stress: info: [4690] successful run completed in 10s
[ perf record: Woken up 1288 times to write data ]
[ perf record: Captured and wrote 321.915 MB perf.data (40001 samples) ]
real 0m10.174s
user 0m10.065s
sys 0m0.244s
#
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|DATA_SRC, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192
#
# perf report --stdio --dsos stress
# To display the perf.data header info, please use --header/--header-only options.
#
# dso: stress
#
# Total Lost Samples: 0
#
# Samples: 40K of event 'cycles:ppp'
# Event count (approx.): 31368767617
#
# Children Self Command Symbol
# ........ ........ ....... ......................
#
88.38% 14.59% stress [.] hogcpu
|
|--73.78%--hogcpu
| |
| --73.73%--rand
| |
| --73.71%--__random
| |
| --40.58%--__random_r
|
|--13.00%--_start
| __libc_start_main
| main
| hogcpu
|
--1.59%--0xbb7
hogcpu
86.78% 0.00% stress [.] _start
|
---_start
__libc_start_main
main
hogcpu
|
--73.73%--rand
|
--73.71%--__random
|
--40.58%--__random_r
86.78% 0.00% stress [.] __libc_start_main
|
---__libc_start_main
main
hogcpu
|
--73.73%--rand
|
--73.71%--__random
|
--40.58%--__random_r
86.78% 0.00% stress [.] main
|
---main
hogcpu
|
--73.73%--rand
|
--73.71%--__random
|
--40.58%--__random_r
0.01% 0.01% stress [k] 0x000055ae24920bc9
0.00% 0.00% stress [k] 0x00007fba0ad15790
0.00% 0.00% stress [k] 0x00007fba0ad157e6
0.00% 0.00% stress [k] 0x00007fba0ad15770
0.00% 0.00% stress [k] 0x000055ae24920bcd
0.00% 0.00% stress [k] 0x00007fba0ad157a5
0.00% 0.00% stress [k] 0x00007fba0ad15c7d
0.00% 0.00% stress [k] 0x00007fba0ad15808
#
# (Tip: System-wide collection from all CPUs: perf record -a)
#
Also with 'perf script':
<SNIP>
stress 4691 1422.153895: 576462 cycles:ppp:
3b946 __random_r (/usr/lib64/libc-2.25.so)
3b7d1 __random (/usr/lib64/libc-2.25.so)
3bc78 rand (/usr/lib64/libc-2.25.so)
2bbc hogcpu (/usr/bin/stress)
160c main (/usr/bin/stress)
20509 __libc_start_main (/usr/lib64/libc-2.25.so)
2ac9 _start (/usr/bin/stress)
stress 4691 1422.154121: 582967 cycles:ppp:
3b946 __random_r (/usr/lib64/libc-2.25.so)
3b7d1 __random (/usr/lib64/libc-2.25.so)
3bc78 rand (/usr/lib64/libc-2.25.so)
2bbc hogcpu (/usr/bin/stress)
160c main (/usr/bin/stress)
20509 __libc_start_main (/usr/lib64/libc-2.25.so)
2ac9 _start (/usr/bin/stress)
stress 4691 1422.154349: 590924 cycles:ppp:
3b93c __random_r (/usr/lib64/libc-2.25.so)
3b7d1 __random (/usr/lib64/libc-2.25.so)
3bc78 rand (/usr/lib64/libc-2.25.so)
2bbc hogcpu (/usr/bin/stress)
160c main (/usr/bin/stress)
20509 __libc_start_main (/usr/lib64/libc-2.25.so)
2ac9 _start (/usr/bin/stress)
<SNIP>
> Here is my output on my clang-built binary:
I'll try as well with a clang built binary:
[acme@jouet linux]$ make CC=clang O=/tmp/build/perf -C tools/perf install-bin
I.e. will profile a perf binary built with clang:
$ readelf -wi ~/bin/perf | grep DW_AT_producer | head -1
<c> DW_AT_producer : (indirect string, offset: 0x0): clang version 4.0.0 (http://llvm.org/git/clang.git f5be8ba13adc4ba1011a7ccd60c844bd60427c1c) (http://llvm.org/git/llvm.git efca1a37676f4cd276d947658cf90b0fb625abfd)
$
# perf record -F 2000 --call-graph=dwarf perf top
[ perf record: Woken up 20 times to write data ]
[ perf record: Captured and wrote 5.771 MB perf.data (694 samples) ]
# perf report --dsos=perf --no-children --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# dso: perf
#
# Total Lost Samples: 0
#
# Samples: 694 of event 'cycles:ppp'
# Event count (approx.): 889028447
#
# Overhead Command Symbol
# ........ ....... ......................................
#
4.77% perf [.] rb_next
|
---rb_next
|
|--3.15%--map_groups__fixup_overlappings
| thread__insert_map
| machine__process_mmap2_event
| machine__process_event
| perf_event__process
| perf_event__synthesize_mmap_events
| 0xffffffffffffffff
|
--1.12%--map__next
map_groups__find_by_name
map_groups__set_modules_path_dir
map_groups__set_modules_path_dir
0xffffffffffffffff
2.69% perf [.] map_groups__fixup_overlappings
|
---map_groups__fixup_overlappings
thread__insert_map
machine__process_mmap2_event
machine__process_event
perf_event__process
perf_event__synthesize_mmap_events
0xffffffffffffffff
1.36% perf [.] rb_erase
|
---rb_erase
symbols__delete
dso__delete
dso__put
|
--1.18%--map__put
__map__zput (inlined)
hist_entry__delete
hists_evsel__exit
perf_evsel__exit
perf_evsel__delete
perf_evlist__purge (inlined)
perf_evlist__delete
cmd_top
0xffffffffffffffff
Also seems to work.
Can you try building the latest codebase, that I'm putting at:
[acme@jouet linux]$ rm -f perf-4.14.0-rc6.tar.xz
[acme@jouet linux]$ make help | grep perf
perf-tar-src-pkg - Build perf-4.14.0-rc6.tar source tarball
perf-targz-src-pkg - Build perf-4.14.0-rc6.tar.gz source tarball
perf-tarbz2-src-pkg - Build perf-4.14.0-rc6.tar.bz2 source tarball
perf-tarxz-src-pkg - Build perf-4.14.0-rc6.tar.xz source tarball
[acme@jouet linux]$ make perf-tarxz-src-pkg
TAR
PERF_VERSION = 4.14.rc6.gadb243
[acme@jouet linux]$ ls -lart perf-4*
-rw-rw-r--. 1 acme acme 1252444 Oct 31 16:06 perf-4.14.0-rc6.tar.xz
[acme@jouet linux]$
http://vger.kernel.org/~acme/perf/perf-4.14.0-rc6.tar.xz
Download it and try building it:
$ mkdir /tmp/perf
$ cd /tmp/perf
$ wget http://vger.kernel.org/~acme/perf/perf-4.14.0-rc6.tar.xz
$ tar xf perf-4.14.0-rc6.tar.xz
$ cd perf-4.14.0-rc6
$ make -C tools/perf install-bin
Then make sure your ~/bin/ directory is in the path and try your workloads to
see if there was some fix that addresses the problems you're experiencing.
- Arnaldo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Adding a recording delay interferes with symbols/call-stack
2017-10-31 19:14 ` Arnaldo Carvalho de Melo
@ 2017-11-01 4:36 ` Bram Stolk
2017-11-02 6:20 ` Namhyung Kim
0 siblings, 1 reply; 10+ messages in thread
From: Bram Stolk @ 2017-11-01 4:36 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users
Thanks again, Arnaldo!
So, ugh, there is something wrong with either my system or my binary,
because the newly built perf has the exact same problem.
bram@ubuntu64:~/apps/BloxTrace$ perf -v
perf version 4.14.rc6.gadb243
bram@ubuntu64:~/apps/BloxTrace$ perf record --delay=1000 --freq 2000
--call-graph lbr ./bench
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.015 MB perf.data (37 samples) ]
bram@ubuntu64:~/apps/BloxTrace$ perf script
perf 20537 185854.082586: 1 cycles:uppp:
ffffffff81844357 native_irq_return_iret ([kernel.kallsyms])
perf 20537 185854.082591: 1 cycles:uppp:
78400200000009 [unknown] ([unknown])
perf 20537 185854.082593: 12 cycles:uppp:
ffffffff81844357 native_irq_return_iret ([kernel.kallsyms])
perf 20537 185854.082595: 343 cycles:uppp:
ffffffff81844357 native_irq_return_iret ([kernel.kallsyms])
perf 20537 185854.082597: 10499 cycles:uppp:
412280 [unknown] ([unknown])
perf 20537 185854.082603: 326186 cycles:uppp:
412110 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 20537 185854.082708: 3664038 cycles:uppp:
411ba0 [unknown] ([unknown])
412634 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 20537 185854.083857: 5403078 cycles:uppp:
412110 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 20537 185854.085554: 17592235560281 cycles:uppp:
412110 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
perf 20537 185854.525567: 9346320 cycles:uppp:
412110 [unknown] ([unknown])
40ada3 [unknown] ([unknown])
40b303 [unknown] ([unknown])
401917 [unknown] ([unknown])
...
On Tue, Oct 31, 2017 at 12:14 PM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Tue, Oct 31, 2017 at 10:10:28AM -0700, Bram Stolk escreveu:
>> Thanks for trying to reproduce, Arnaldo.
>>
>> But for me, it's not the kernel symbols that are missing: they are all
>> there, syscalls and stuff.
>> Just the app's symbols itself are missing.
>>
>> What happens if you try it on a binary you built locally, that spends
>> cycles in userspace?
>
> So, trying with 'perf trace' to do a system wide strace like session for the
> 'open' syscall, that will use DWARF callchains and go from userspace all the
> way to raw_syscalls:sys_enter/sys_exit, to check various setups to try to
> narrow down what may be broken:
>
> [root@jouet ~]# perf trace --max-stack 32 -e open
> 518.496 ( 0.049 ms): libvirtd/1171 open(filename: /proc/cpuinfo, mode: IRUGO|IWUGO ) = 23
> __open_nocancel (inlined)
> __GI__IO_file_open (inlined)
> __GI__IO_file_fopen (inlined)
> __fopen_internal (/usr/lib64/libc-2.25.so)
> virHostCPUGetInfo (/usr/lib64/libvirt.so.0.3002.1)
> virCapabilitiesGetNodeInfo (/usr/lib64/libvirt.so.0.3002.1)
> virNodeGetInfo (/usr/lib64/libvirt.so.0.3002.1)
> _init (/usr/sbin/libvirtd)
> virNetServerProgramDispatch (/usr/lib64/libvirt.so.0.3002.1)
> [0xffffaa7a2a4a8737] (/usr/sbin/libvirtd)
> [0xffff80b476e99f30] (/usr/lib64/libvirt.so.0.3002.1)
> [0xffff80b476e992a7] (/usr/lib64/libvirt.so.0.3002.1)
> start_thread (/usr/lib64/libpthread-2.25.so)
> __GI___clone (inlined)
> 518.817 ( 0.022 ms): libvirtd/1171 open(filename: /sys/devices/system/cpu/present, mode: IFSOCK|ISGID|IXUSR|IRGRP|0x720b0000) = 24
> [0xffff80b47aad198d] (/usr/lib64/libpthread-2.25.so)
> virFileReadAll (/usr/lib64/libvirt.so.0.3002.1)
> virFileReadValueBitmap (/usr/lib64/libvirt.so.0.3002.1)
> virSysfsGetValueBitmap (/usr/lib64/libvirt.so.0.3002.1)
> virHostCPUGetPresentBitmap (/usr/lib64/libvirt.so.0.3002.1)
> virHostCPUGetInfoPopulateLinux (/usr/lib64/libvirt.so.0.3002.1)
> virHostCPUGetInfo (/usr/lib64/libvirt.so.0.3002.1)
> virCapabilitiesGetNodeInfo (/usr/lib64/libvirt.so.0.3002.1)
> virNodeGetInfo (/usr/lib64/libvirt.so.0.3002.1)
> _init (/usr/sbin/libvirtd)
> virNetServerProgramDispatch (/usr/lib64/libvirt.so.0.3002.1)
> [0xffffaa7a2a4a8737] (/usr/sbin/libvirtd)
> [0xffff80b476e99f30] (/usr/lib64/libvirt.so.0.3002.1)
> [0xffff80b476e992a7] (/usr/lib64/libvirt.so.0.3002.1)
> start_thread (/usr/lib64/libpthread-2.25.so)
> __GI___clone (inlined)
>
> --------------------------
>
> Seems to work, now using a CPU hog and sampling the most precise CPU "cycles" hardware counter:
>
> # time perf record --call-graph=dwarf stress --cpu 1 --timeout 10
> stress: info: [4690] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
> stress: info: [4690] successful run completed in 10s
> [ perf record: Woken up 1288 times to write data ]
> [ perf record: Captured and wrote 321.915 MB perf.data (40001 samples) ]
>
> real 0m10.174s
> user 0m10.065s
> sys 0m0.244s
> #
> # perf evlist -v
> cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|DATA_SRC, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192
> #
>
> # perf report --stdio --dsos stress
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> # dso: stress
> #
> # Total Lost Samples: 0
> #
> # Samples: 40K of event 'cycles:ppp'
> # Event count (approx.): 31368767617
> #
> # Children Self Command Symbol
> # ........ ........ ....... ......................
> #
> 88.38% 14.59% stress [.] hogcpu
> |
> |--73.78%--hogcpu
> | |
> | --73.73%--rand
> | |
> | --73.71%--__random
> | |
> | --40.58%--__random_r
> |
> |--13.00%--_start
> | __libc_start_main
> | main
> | hogcpu
> |
> --1.59%--0xbb7
> hogcpu
>
> 86.78% 0.00% stress [.] _start
> |
> ---_start
> __libc_start_main
> main
> hogcpu
> |
> --73.73%--rand
> |
> --73.71%--__random
> |
> --40.58%--__random_r
>
> 86.78% 0.00% stress [.] __libc_start_main
> |
> ---__libc_start_main
> main
> hogcpu
> |
> --73.73%--rand
> |
> --73.71%--__random
> |
> --40.58%--__random_r
>
> 86.78% 0.00% stress [.] main
> |
> ---main
> hogcpu
> |
> --73.73%--rand
> |
> --73.71%--__random
> |
> --40.58%--__random_r
>
> 0.01% 0.01% stress [k] 0x000055ae24920bc9
> 0.00% 0.00% stress [k] 0x00007fba0ad15790
> 0.00% 0.00% stress [k] 0x00007fba0ad157e6
> 0.00% 0.00% stress [k] 0x00007fba0ad15770
> 0.00% 0.00% stress [k] 0x000055ae24920bcd
> 0.00% 0.00% stress [k] 0x00007fba0ad157a5
> 0.00% 0.00% stress [k] 0x00007fba0ad15c7d
> 0.00% 0.00% stress [k] 0x00007fba0ad15808
>
>
> #
> # (Tip: System-wide collection from all CPUs: perf record -a)
> #
>
> Also with 'perf script':
>
> <SNIP>
> stress 4691 1422.153895: 576462 cycles:ppp:
> 3b946 __random_r (/usr/lib64/libc-2.25.so)
> 3b7d1 __random (/usr/lib64/libc-2.25.so)
> 3bc78 rand (/usr/lib64/libc-2.25.so)
> 2bbc hogcpu (/usr/bin/stress)
> 160c main (/usr/bin/stress)
> 20509 __libc_start_main (/usr/lib64/libc-2.25.so)
> 2ac9 _start (/usr/bin/stress)
>
> stress 4691 1422.154121: 582967 cycles:ppp:
> 3b946 __random_r (/usr/lib64/libc-2.25.so)
> 3b7d1 __random (/usr/lib64/libc-2.25.so)
> 3bc78 rand (/usr/lib64/libc-2.25.so)
> 2bbc hogcpu (/usr/bin/stress)
> 160c main (/usr/bin/stress)
> 20509 __libc_start_main (/usr/lib64/libc-2.25.so)
> 2ac9 _start (/usr/bin/stress)
>
> stress 4691 1422.154349: 590924 cycles:ppp:
> 3b93c __random_r (/usr/lib64/libc-2.25.so)
> 3b7d1 __random (/usr/lib64/libc-2.25.so)
> 3bc78 rand (/usr/lib64/libc-2.25.so)
> 2bbc hogcpu (/usr/bin/stress)
> 160c main (/usr/bin/stress)
> 20509 __libc_start_main (/usr/lib64/libc-2.25.so)
> 2ac9 _start (/usr/bin/stress)
> <SNIP>
>
>> Here is my output on my clang-built binary:
>
> I'll try as well with a clang built binary:
>
> [acme@jouet linux]$ make CC=clang O=/tmp/build/perf -C tools/perf install-bin
>
> I.e. will profile a perf binary built with clang:
>
> $ readelf -wi ~/bin/perf | grep DW_AT_producer | head -1
> <c> DW_AT_producer : (indirect string, offset: 0x0): clang version 4.0.0 (http://llvm.org/git/clang.git f5be8ba13adc4ba1011a7ccd60c844bd60427c1c) (http://llvm.org/git/llvm.git efca1a37676f4cd276d947658cf90b0fb625abfd)
> $
>
> # perf record -F 2000 --call-graph=dwarf perf top
> [ perf record: Woken up 20 times to write data ]
> [ perf record: Captured and wrote 5.771 MB perf.data (694 samples) ]
> # perf report --dsos=perf --no-children --stdio
> # To display the perf.data header info, please use --header/--header-only options.
> #
> # dso: perf
> #
> # Total Lost Samples: 0
> #
> # Samples: 694 of event 'cycles:ppp'
> # Event count (approx.): 889028447
> #
> # Overhead Command Symbol
> # ........ ....... ......................................
> #
> 4.77% perf [.] rb_next
> |
> ---rb_next
> |
> |--3.15%--map_groups__fixup_overlappings
> | thread__insert_map
> | machine__process_mmap2_event
> | machine__process_event
> | perf_event__process
> | perf_event__synthesize_mmap_events
> | 0xffffffffffffffff
> |
> --1.12%--map__next
> map_groups__find_by_name
> map_groups__set_modules_path_dir
> map_groups__set_modules_path_dir
> 0xffffffffffffffff
>
> 2.69% perf [.] map_groups__fixup_overlappings
> |
> ---map_groups__fixup_overlappings
> thread__insert_map
> machine__process_mmap2_event
> machine__process_event
> perf_event__process
> perf_event__synthesize_mmap_events
> 0xffffffffffffffff
>
> 1.36% perf [.] rb_erase
> |
> ---rb_erase
> symbols__delete
> dso__delete
> dso__put
> |
> --1.18%--map__put
> __map__zput (inlined)
> hist_entry__delete
> hists_evsel__exit
> perf_evsel__exit
> perf_evsel__delete
> perf_evlist__purge (inlined)
> perf_evlist__delete
> cmd_top
> 0xffffffffffffffff
>
> Also seems to work.
>
> Can you try building the latest codebase, that I'm putting at:
>
> [acme@jouet linux]$ rm -f perf-4.14.0-rc6.tar.xz
> [acme@jouet linux]$ make help | grep perf
> perf-tar-src-pkg - Build perf-4.14.0-rc6.tar source tarball
> perf-targz-src-pkg - Build perf-4.14.0-rc6.tar.gz source tarball
> perf-tarbz2-src-pkg - Build perf-4.14.0-rc6.tar.bz2 source tarball
> perf-tarxz-src-pkg - Build perf-4.14.0-rc6.tar.xz source tarball
> [acme@jouet linux]$ make perf-tarxz-src-pkg
> TAR
> PERF_VERSION = 4.14.rc6.gadb243
> [acme@jouet linux]$ ls -lart perf-4*
> -rw-rw-r--. 1 acme acme 1252444 Oct 31 16:06 perf-4.14.0-rc6.tar.xz
> [acme@jouet linux]$
>
> http://vger.kernel.org/~acme/perf/perf-4.14.0-rc6.tar.xz
>
> Download it and try building it:
>
> $ mkdir /tmp/perf
> $ cd /tmp/perf
> $ wget http://vger.kernel.org/~acme/perf/perf-4.14.0-rc6.tar.xz
> $ tar xf perf-4.14.0-rc6.tar.xz
> $ cd perf-4.14.0-rc6
> $ make -C tools/perf install-bin
>
> Then make sure your ~/bin/ directory is in the path and try your workloads to
> see if there was some fix that addresses the problems you're experiencing.
>
> - Arnaldo
--
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Adding a recording delay interferes with symbols/call-stack
2017-11-01 4:36 ` Bram Stolk
@ 2017-11-02 6:20 ` Namhyung Kim
2017-11-02 7:01 ` Bram Stolk
0 siblings, 1 reply; 10+ messages in thread
From: Namhyung Kim @ 2017-11-02 6:20 UTC (permalink / raw)
To: Bram Stolk; +Cc: Arnaldo Carvalho de Melo, linux-perf-users, kernel-team
Hello,
On Tue, Oct 31, 2017 at 09:36:20PM -0700, Bram Stolk wrote:
> On Tue, Oct 31, 2017 at 12:14 PM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > Em Tue, Oct 31, 2017 at 10:10:28AM -0700, Bram Stolk escreveu:
> >> Thanks for trying to reproduce, Arnaldo.
> >>
> >> But for me, it's not the kernel symbols that are missing: they are all
> >> there, syscalls and stuff.
> >> Just the app's symbols itself are missing.
> >>
> >> What happens if you try it on a binary you built locally, that spends
> >> cycles in userspace?
The problem is that FORK, COMM and MMAP2 events were not generated for
the workload given on the command line. Normally they are generated
by kernel during exec for this case (target__none). But initial_delay
makes the attr->enable_on_exec bit cleared so kernel won't generate
the evetns for us. System-wide and existing process cases are not
affected since they always synthesize the events.
Could you please check below patch solves your problem?
Thanks,
Namhyung
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f4d9fc54b382..89a192dc235c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -949,6 +949,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
if (opts->initial_delay) {
usleep(opts->initial_delay * USEC_PER_MSEC);
+
+ /*
+ * it did not set attr.enable_on_exec so misses
+ * task and mmap events.
+ */
+ if (target__none(&rec->opts.target))
+ record__synthesize_workload(rec, false);
+
perf_evlist__enable(rec->evlist);
}
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Adding a recording delay interferes with symbols/call-stack
2017-11-02 6:20 ` Namhyung Kim
@ 2017-11-02 7:01 ` Bram Stolk
2017-11-02 7:07 ` Namhyung Kim
2017-11-03 18:41 ` Arnaldo Carvalho de Melo
0 siblings, 2 replies; 10+ messages in thread
From: Bram Stolk @ 2017-11-02 7:01 UTC (permalink / raw)
To: Namhyung Kim; +Cc: Arnaldo Carvalho de Melo, linux-perf-users, kernel-team
Thanks Namhyung,
I tried the patch, but still no app symbols:
perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
...
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.347 MB perf.data (41 samples) ]
perf report -g 'graph,0.25,callee'
# To display the perf.data header info, please use
--header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 41 of event 'cycles:uppp'
# Event count (approx.): 94489518008269
#
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ..........................
#
50.11% 50.11% bench bench [.] 0x0000000000010372
7.13% 7.13% bench bench [.] 0x0000000000010358
3.56% 3.56% bench bench [.] 0x00000000000127fa
3.56% 3.56% bench libm-2.23.so [.] 0x00000000000328c3
3.56% 3.56% bench bench [.] 0x000000000001241d
3.56% 3.56% bench bench [.] 0x0000000000012121
3.56% 3.56% bench bench [.] 0x000000000001032d
3.56% 3.56% bench bench [.] 0x000000000001006e
3.56% 3.56% bench bench [.] 0x00000000000121b8
3.56% 3.56% bench bench [.] 0x0000000000012464
3.56% 3.56% bench bench [.] 0x0000000000012400
3.56% 3.56% bench bench [.] 0x0000000000010377
3.56% 3.56% bench bench [.] 0x000000000001008c
3.56% 3.56% bench bench [.] 0x0000000000010091
0.00% 0.00% bench bench [.] 0x00000000000124ee
0.00% 0.00% bench bench [.] 0x000000000000a530
0.00% 0.00% bench bench [.] 0x0000000000012091
0.00% 0.00% bench bench [.] 0x000000000001212f
0.00% 0.00% bench bench [.] 0x000000000000ffff
0.00% 0.00% bench bench [.] 0x000000000000ff9b
0.00% 0.00% bench bench [.] 0x000000000001009b
0.00% 0.00% bench bench [.] 0x0000000000010332
0.00% 0.00% bench bench [.] 0x00000000000102f0
0.00% 0.00% bench bench [.] 0x0000000000010368
0.00% 0.00% bench bench [.] 0x0000000000012113
0.00% 0.00% bench bench [.] 0x0000000000010068
0.00% 0.00% bench bench [.] 0x0000000000012419
0.00% 0.00% bench bench [.] 0x0000000000012431
0.00% 0.00% bench bench [.] 0x000000000001267f
0.00% 0.00% bench bench [.] 0x000000000000aea9
0.00% 0.00% bench bench [.] 0x0000000000012742
0.00% 0.00% bench [kernel.kallsyms] [.] native_irq_return_iret
On Wed, Nov 1, 2017 at 11:20 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hello,
>
> On Tue, Oct 31, 2017 at 09:36:20PM -0700, Bram Stolk wrote:
>> On Tue, Oct 31, 2017 at 12:14 PM, Arnaldo Carvalho de Melo
>> <arnaldo.melo@gmail.com> wrote:
>> > Em Tue, Oct 31, 2017 at 10:10:28AM -0700, Bram Stolk escreveu:
>> >> Thanks for trying to reproduce, Arnaldo.
>> >>
>> >> But for me, it's not the kernel symbols that are missing: they are all
>> >> there, syscalls and stuff.
>> >> Just the app's symbols itself are missing.
>> >>
>> >> What happens if you try it on a binary you built locally, that spends
>> >> cycles in userspace?
>
> The problem is that FORK, COMM and MMAP2 events were not generated for
> the workload given on the command line. Normally they are generated
> by kernel during exec for this case (target__none). But initial_delay
> makes the attr->enable_on_exec bit cleared so kernel won't generate
> the evetns for us. System-wide and existing process cases are not
> affected since they always synthesize the events.
>
> Could you please check below patch solves your problem?
>
> Thanks,
> Namhyung
>
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index f4d9fc54b382..89a192dc235c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -949,6 +949,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>
> if (opts->initial_delay) {
> usleep(opts->initial_delay * USEC_PER_MSEC);
> +
> + /*
> + * it did not set attr.enable_on_exec so misses
> + * task and mmap events.
> + */
> + if (target__none(&rec->opts.target))
> + record__synthesize_workload(rec, false);
> +
> perf_evlist__enable(rec->evlist);
> }
>
--
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Adding a recording delay interferes with symbols/call-stack
2017-11-02 7:01 ` Bram Stolk
@ 2017-11-02 7:07 ` Namhyung Kim
2017-11-02 7:10 ` Bram Stolk
2017-11-03 18:41 ` Arnaldo Carvalho de Melo
1 sibling, 1 reply; 10+ messages in thread
From: Namhyung Kim @ 2017-11-02 7:07 UTC (permalink / raw)
To: Bram Stolk; +Cc: Arnaldo Carvalho de Melo, linux-perf-users, kernel-team
On Thu, Nov 02, 2017 at 12:01:24AM -0700, Bram Stolk wrote:
> Thanks Namhyung,
>
> I tried the patch, but still no app symbols:
>
> perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
> ...
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.347 MB perf.data (41 samples) ]
>
> perf report -g 'graph,0.25,callee'
Hmmm... could you please show me whether MMAP2 events are recorded?
$ perf report -D | tail
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Adding a recording delay interferes with symbols/call-stack
2017-11-02 7:07 ` Namhyung Kim
@ 2017-11-02 7:10 ` Bram Stolk
0 siblings, 0 replies; 10+ messages in thread
From: Bram Stolk @ 2017-11-02 7:10 UTC (permalink / raw)
To: Namhyung Kim; +Cc: Arnaldo Carvalho de Melo, linux-perf-users, kernel-team
$ perf report -D | tail
COMM events: 2
EXIT events: 5
FORK events: 4
SAMPLE events: 62
MMAP2 events: 93
FINISHED_ROUND events: 2
TIME_CONV events: 1
cycles:uppp stats:
TOTAL events: 62
SAMPLE events: 62
This is with patched perf.
Bed time on west coast, I'll do more testing tomorrow. Thanks for the help!
On Thu, Nov 2, 2017 at 12:07 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> On Thu, Nov 02, 2017 at 12:01:24AM -0700, Bram Stolk wrote:
>> Thanks Namhyung,
>>
>> I tried the patch, but still no app symbols:
>>
>> perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
>> ...
>> [ perf record: Woken up 2 times to write data ]
>> [ perf record: Captured and wrote 0.347 MB perf.data (41 samples) ]
>>
>> perf report -g 'graph,0.25,callee'
>
> Hmmm... could you please show me whether MMAP2 events are recorded?
>
> $ perf report -D | tail
>
> Thanks,
> Namhyung
--
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Adding a recording delay interferes with symbols/call-stack
2017-11-02 7:01 ` Bram Stolk
2017-11-02 7:07 ` Namhyung Kim
@ 2017-11-03 18:41 ` Arnaldo Carvalho de Melo
1 sibling, 0 replies; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-11-03 18:41 UTC (permalink / raw)
To: Bram Stolk
Cc: Namhyung Kim, Arnaldo Carvalho de Melo, linux-perf-users, kernel-team
Em Thu, Nov 02, 2017 at 12:01:24AM -0700, Bram Stolk escreveu:
> Thanks Namhyung,
>
> I tried the patch, but still no app symbols:
>
> perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
Can you try this one instead:
commit 81b5911db9f587ebe30af81a667ff8de2c43eaf0
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date: Fri Nov 3 15:34:34 2017 -0300
perf record: Generate PERF_RECORD_{MMAP,COMM,EXEC} with --delay
When we use an initial delay, e.g.: 'perf record --delay 1000', we do not
enable the events until that delay has passed after we started the workload,
including the tracking event, i.e. the one for which we have attr.mmap, etc,
enabled to ask the kernel to generate the PERF_RECORD_{MMAP,COMM,EXEC} metadata
events that will then allow us to resolve addresses in samples to the map, dso
and symbol. There will be a shadow that even synthesizing samples won't cover,
i.e. the workload that we start and others that fork while we wait for that delay.
So use a dummy event to be the tracking one and make it be enabled on exec.
Before:
# perf record --delay 1000 stress --cpu 1 --timeout 5
stress: info: [9029] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
stress: info: [9029] successful run completed in 5s
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.624 MB perf.data (15908 samples) ]
# perf script | head
:9031 9031 32001.826888: 1 cycles:ppp: ffffffff831aa30d event_function (/lib/modules/4.14.0-rc6+/build/vmlinux)
:9031 9031 32001.826893: 1 cycles:ppp: ffffffff8300d1a0 intel_bts_enable_local (/lib/modules/4.14.0-rc6+/build/vmlinux)
:9031 9031 32001.826895: 7 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
:9031 9031 32001.826897: 103 cycles:ppp: ffffffff8300c331 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux)
:9031 9031 32001.826899: 1615 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
:9031 9031 32001.826902: 26724 cycles:ppp: ffffffff8384c6a7 native_irq_return_iret (/lib/modules/4.14.0-rc6+/build/vmlinux)
:9031 9031 32001.826913: 329739 cycles:ppp: 7fb2a5410932 [unknown] ([unknown])
:9031 9031 32001.827033: 1225451 cycles:ppp: 7fb2a5410930 [unknown] ([unknown])
:9031 9031 32001.827474: 1391725 cycles:ppp: 7fb2a5410930 [unknown] ([unknown])
:9031 9031 32001.827978: 1233697 cycles:ppp: 7fb2a5410928 [unknown] ([unknown])
#
After:
# perf record --delay 1000 stress --cpu 1 --timeout 5
stress: info: [9741] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
stress: info: [9741] successful run completed in 5s
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.751 MB perf.data (15976 samples) ]
# perf script | head
stress 9742 32110.959106: 1 cycles:ppp: ffffffff831b26f6 __perf_event_task_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux)
stress 9742 32110.959110: 1 cycles:ppp: ffffffff8300c2e9 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux)
stress 9742 32110.959112: 7 cycles:ppp: ffffffff830231e0 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
stress 9742 32110.959115: 101 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
stress 9742 32110.959117: 1533 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
stress 9742 32110.959119: 23992 cycles:ppp: ffffffff831b0900 ctx_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux)
stress 9742 32110.959129: 329406 cycles:ppp: 7f4b1b661930 __random_r (/usr/lib64/libc-2.25.so)
stress 9742 32110.959249: 1288322 cycles:ppp: 5566e1e7cbc9 hogcpu (/usr/bin/stress)
stress 9742 32110.959712: 1464046 cycles:ppp: 7f4b1b66179e __random (/usr/lib64/libc-2.25.so)
stress 9742 32110.960241: 1266918 cycles:ppp: 7f4b1b66195b __random_r (/usr/lib64/libc-2.25.so)
#
Reported-by: Bram Stolk <b.stolk@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f4d9fc54b382..3c74bfae1aeb 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -338,6 +338,22 @@ static int record__open(struct record *rec)
struct perf_evsel_config_term *err_term;
int rc = 0;
+ /*
+ * For initial_delay we need to add a dummy event so that we can track
+ * PERF_RECORD_MMAP while we wait for the initial delay to enable the
+ * real events, the ones asked by the user.
+ */
+ if (opts->initial_delay) {
+ if (perf_evlist__add_dummy(evlist))
+ return -ENOMEM;
+
+ pos = perf_evlist__first(evlist);
+ pos->tracking = 0;
+ pos = perf_evlist__last(evlist);
+ pos->tracking = 1;
+ pos->attr.enable_on_exec = 1;
+ }
+
perf_evlist__config(evlist, opts, &callchain_param);
evlist__for_each_entry(evlist, pos) {
^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2017-11-03 18:41 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-30 20:25 Adding a recording delay interferes with symbols/call-stack Bram Stolk
2017-10-31 14:25 ` Arnaldo Carvalho de Melo
2017-10-31 17:10 ` Bram Stolk
2017-10-31 19:14 ` Arnaldo Carvalho de Melo
2017-11-01 4:36 ` Bram Stolk
2017-11-02 6:20 ` Namhyung Kim
2017-11-02 7:01 ` Bram Stolk
2017-11-02 7:07 ` Namhyung Kim
2017-11-02 7:10 ` Bram Stolk
2017-11-03 18:41 ` Arnaldo Carvalho de Melo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.