All of lore.kernel.org
 help / color / mirror / Atom feed
* Adding a recording delay interferes with symbols/call-stack
@ 2017-10-30 20:25 Bram Stolk
  2017-10-31 14:25 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Bram Stolk @ 2017-10-30 20:25 UTC (permalink / raw)
  To: linux-perf-users

If I record with callstacks like this:

$ perf record --freq 2000 --call-graph dwarf ./bench

then it works just fine: I see callstacks, and symbols in the report.

If I delay the recording by 1 second with the --delay flag, like this:

$ perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench

Then I lose all callstack information, and even all symbol information
on my program in the report.

Is this a known issue, that --delay interferes with symbol resolution?

Thanks!

  Bram

PS: I am on 64b Ubuntu, and perf version 4.4.87

-- 
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Adding a recording delay interferes with symbols/call-stack
  2017-10-30 20:25 Adding a recording delay interferes with symbols/call-stack Bram Stolk
@ 2017-10-31 14:25 ` Arnaldo Carvalho de Melo
  2017-10-31 17:10   ` Bram Stolk
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-31 14:25 UTC (permalink / raw)
  To: Bram Stolk; +Cc: linux-perf-users

Em Mon, Oct 30, 2017 at 01:25:08PM -0700, Bram Stolk escreveu:
> If I record with callstacks like this:
> 
> $ perf record --freq 2000 --call-graph dwarf ./bench
> 
> then it works just fine: I see callstacks, and symbols in the report.
> 
> If I delay the recording by 1 second with the --delay flag, like this:
> 
> $ perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
> 
> Then I lose all callstack information, and even all symbol information
> on my program in the report.
> 
> Is this a known issue, that --delay interferes with symbol resolution?
> 
> Thanks!
> 
>   Bram
> 
> PS: I am on 64b Ubuntu, and perf version 4.4.87

Works for me: :-)

[root@jouet ~]# perf record --delay=1000 --freq 2000 --call-graph dwarf sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.050 MB perf.data (4 samples) ]
[root@jouet ~]# perf script
perf 30567 248464.097254:          1 cycles:ppp: 
                  3b233a __perf_event_task_sched_in (/lib/modules/4.14.0-rc3+/build/vmlinux)
                  2be207 finish_task_switch (/lib/modules/4.14.0-rc3+/build/vmlinux)
                  a44d21 __sched_text_start (/lib/modules/4.14.0-rc3+/build/vmlinux)
                  a45368 schedule (/lib/modules/4.14.0-rc3+/build/vmlinux)
                  a49679 do_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
                  30c816 hrtimer_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
                  30c96e sys_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
                  203a41 do_syscall_64 (/lib/modules/4.14.0-rc3+/build/vmlinux)
                  a4ad67 return_from_SYSCALL_64 (/lib/modules/4.14.0-rc3+/build/vmlinux)

perf 30567 248464.097288:          1 cycles:ppp: 
                  20d1d0 intel_bts_enable_local (/lib/modules/4.14.0-rc3+/build/vmlinux)

perf 30567 248464.097302:          3 cycles:ppp: 
                  223111 native_sched_clock (/lib/modules/4.14.0-rc3+/build/vmlinux)

perf 30567 248464.097317:         16 cycles:ppp: 
                  223111 native_sched_clock (/lib/modules/4.14.0-rc3+/build/vmlinux)

[root@jouet ~]# perf -v
perf version 4.14.rc6.gd688d0
[root@jouet ~]# uname -a
Linux jouet 4.14.0-rc3+ #1 SMP Fri Oct 13 12:21:12 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@jouet ~]#

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Adding a recording delay interferes with symbols/call-stack
  2017-10-31 14:25 ` Arnaldo Carvalho de Melo
@ 2017-10-31 17:10   ` Bram Stolk
  2017-10-31 19:14     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Bram Stolk @ 2017-10-31 17:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users

Thanks for trying to reproduce, Arnaldo.

But for me, it's not the kernel symbols that are missing: they are all
there, syscalls and stuff.
Just the app's symbols itself are missing.

What happens if you try it on a binary you built locally, that spends
cycles in userspace?

Here is my output on my clang-built binary:

$ perf script
perf 10198 144604.396458:          1 cycles:pp:
        ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
        ffffffff8100aebd __intel_pmu_enable_all ([kernel.kallsyms])
        ffffffff8100af30 intel_pmu_enable_all ([kernel.kallsyms])
        ffffffff81006e41 x86_pmu_enable ([kernel.kallsyms])
        ffffffff8117c527 perf_pmu_enable.part.89 ([kernel.kallsyms])
        ffffffff811813e1 perf_pmu_enable ([kernel.kallsyms])
        ffffffff810052e0 x86_pmu_commit_txn ([kernel.kallsyms])
        ffffffff8117e887 group_sched_in ([kernel.kallsyms])
        ffffffff8117eb50 __perf_event_enable ([kernel.kallsyms])
        ffffffff8117c0fa remote_function ([kernel.kallsyms])
        ffffffff8110445f flush_smp_call_function_queue ([kernel.kallsyms])
        ffffffff81104e53 generic_smp_call_function_single_interrupt
([kernel.kallsyms])
        ffffffff81050df7 smp_call_function_single_interrupt ([kernel.kallsyms])
        ffffffff81844c22 call_function_single_interrupt ([kernel.kallsyms])
        ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])

perf 10198 144604.396465:          1 cycles:pp:
        ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
        ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])

perf 10198 144604.396468:         10 cycles:pp:
        ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
        ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])

perf 10198 144604.396471:        184 cycles:pp:
        ffffffff81064508 native_write_msr_safe ([kernel.kallsyms])
        ffffffff81037ddd native_sched_clock ([kernel.kallsyms])

perf 10198 144604.396474:       4362 cycles:pp:
        ffffffff81037ddd native_sched_clock ([kernel.kallsyms])
                  4125ba [unknown] ([unknown])

perf 10198 144604.396479:     104121 cycles:pp:
                  412110 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 10198 144604.396514:    1399685 cycles:pp:
                  411ba0 [unknown] ([unknown])
                  412634 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 10198 144604.396956:    3677146 cycles:pp:
                  412110 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 10198 144604.398111:    3738199 cycles:pp:
                  40fda0 [unknown] ([unknown])
                  412311 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 10198 144604.399285:    3473249 cycles:pp:
                  411ba0 [unknown] ([unknown])
                  412634 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 10198 144604.400375:    3224007 cycles:pp:
                  412110 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 10198 144604.401390: 24189320560325 cycles:pp:
                  40f210 [unknown] ([unknown])
                  41250b [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 10198 144604.815874:   28367691 cycles:pp:
                  412110 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])
                  40b303 [unknown] ([unknown])
                  401917 [unknown] ([unknown])

On Tue, Oct 31, 2017 at 7:25 AM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Mon, Oct 30, 2017 at 01:25:08PM -0700, Bram Stolk escreveu:
>> If I record with callstacks like this:
>>
>> $ perf record --freq 2000 --call-graph dwarf ./bench
>>
>> then it works just fine: I see callstacks, and symbols in the report.
>>
>> If I delay the recording by 1 second with the --delay flag, like this:
>>
>> $ perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
>>
>> Then I lose all callstack information, and even all symbol information
>> on my program in the report.
>>
>> Is this a known issue, that --delay interferes with symbol resolution?
>>
>> Thanks!
>>
>>   Bram
>>
>> PS: I am on 64b Ubuntu, and perf version 4.4.87
>
> Works for me: :-)
>
> [root@jouet ~]# perf record --delay=1000 --freq 2000 --call-graph dwarf sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.050 MB perf.data (4 samples) ]
> [root@jouet ~]# perf script
> perf 30567 248464.097254:          1 cycles:ppp:
>                   3b233a __perf_event_task_sched_in (/lib/modules/4.14.0-rc3+/build/vmlinux)
>                   2be207 finish_task_switch (/lib/modules/4.14.0-rc3+/build/vmlinux)
>                   a44d21 __sched_text_start (/lib/modules/4.14.0-rc3+/build/vmlinux)
>                   a45368 schedule (/lib/modules/4.14.0-rc3+/build/vmlinux)
>                   a49679 do_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
>                   30c816 hrtimer_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
>                   30c96e sys_nanosleep (/lib/modules/4.14.0-rc3+/build/vmlinux)
>                   203a41 do_syscall_64 (/lib/modules/4.14.0-rc3+/build/vmlinux)
>                   a4ad67 return_from_SYSCALL_64 (/lib/modules/4.14.0-rc3+/build/vmlinux)
>
> perf 30567 248464.097288:          1 cycles:ppp:
>                   20d1d0 intel_bts_enable_local (/lib/modules/4.14.0-rc3+/build/vmlinux)
>
> perf 30567 248464.097302:          3 cycles:ppp:
>                   223111 native_sched_clock (/lib/modules/4.14.0-rc3+/build/vmlinux)
>
> perf 30567 248464.097317:         16 cycles:ppp:
>                   223111 native_sched_clock (/lib/modules/4.14.0-rc3+/build/vmlinux)
>
> [root@jouet ~]# perf -v
> perf version 4.14.rc6.gd688d0
> [root@jouet ~]# uname -a
> Linux jouet 4.14.0-rc3+ #1 SMP Fri Oct 13 12:21:12 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
> [root@jouet ~]#



-- 
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Adding a recording delay interferes with symbols/call-stack
  2017-10-31 17:10   ` Bram Stolk
@ 2017-10-31 19:14     ` Arnaldo Carvalho de Melo
  2017-11-01  4:36       ` Bram Stolk
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-10-31 19:14 UTC (permalink / raw)
  To: Bram Stolk; +Cc: linux-perf-users

Em Tue, Oct 31, 2017 at 10:10:28AM -0700, Bram Stolk escreveu:
> Thanks for trying to reproduce, Arnaldo.
> 
> But for me, it's not the kernel symbols that are missing: they are all
> there, syscalls and stuff.
> Just the app's symbols itself are missing.
> 
> What happens if you try it on a binary you built locally, that spends
> cycles in userspace?

So, trying with 'perf trace' to do a system wide strace like session for the
'open' syscall, that will use DWARF callchains and go from userspace all the
way to raw_syscalls:sys_enter/sys_exit, to check various setups to try to
narrow down what may be broken:

[root@jouet ~]# perf trace --max-stack 32 -e open
   518.496 ( 0.049 ms): libvirtd/1171 open(filename: /proc/cpuinfo, mode: IRUGO|IWUGO                       ) = 23
                                       __open_nocancel (inlined)
                                       __GI__IO_file_open (inlined)
                                       __GI__IO_file_fopen (inlined)
                                       __fopen_internal (/usr/lib64/libc-2.25.so)
                                       virHostCPUGetInfo (/usr/lib64/libvirt.so.0.3002.1)
                                       virCapabilitiesGetNodeInfo (/usr/lib64/libvirt.so.0.3002.1)
                                       virNodeGetInfo (/usr/lib64/libvirt.so.0.3002.1)
                                       _init (/usr/sbin/libvirtd)
                                       virNetServerProgramDispatch (/usr/lib64/libvirt.so.0.3002.1)
                                       [0xffffaa7a2a4a8737] (/usr/sbin/libvirtd)
                                       [0xffff80b476e99f30] (/usr/lib64/libvirt.so.0.3002.1)
                                       [0xffff80b476e992a7] (/usr/lib64/libvirt.so.0.3002.1)
                                       start_thread (/usr/lib64/libpthread-2.25.so)
                                       __GI___clone (inlined)
   518.817 ( 0.022 ms): libvirtd/1171 open(filename: /sys/devices/system/cpu/present, mode: IFSOCK|ISGID|IXUSR|IRGRP|0x720b0000) = 24
                                       [0xffff80b47aad198d] (/usr/lib64/libpthread-2.25.so)
                                       virFileReadAll (/usr/lib64/libvirt.so.0.3002.1)
                                       virFileReadValueBitmap (/usr/lib64/libvirt.so.0.3002.1)
                                       virSysfsGetValueBitmap (/usr/lib64/libvirt.so.0.3002.1)
                                       virHostCPUGetPresentBitmap (/usr/lib64/libvirt.so.0.3002.1)
                                       virHostCPUGetInfoPopulateLinux (/usr/lib64/libvirt.so.0.3002.1)
                                       virHostCPUGetInfo (/usr/lib64/libvirt.so.0.3002.1)
                                       virCapabilitiesGetNodeInfo (/usr/lib64/libvirt.so.0.3002.1)
                                       virNodeGetInfo (/usr/lib64/libvirt.so.0.3002.1)
                                       _init (/usr/sbin/libvirtd)
                                       virNetServerProgramDispatch (/usr/lib64/libvirt.so.0.3002.1)
                                       [0xffffaa7a2a4a8737] (/usr/sbin/libvirtd)
                                       [0xffff80b476e99f30] (/usr/lib64/libvirt.so.0.3002.1)
                                       [0xffff80b476e992a7] (/usr/lib64/libvirt.so.0.3002.1)
                                       start_thread (/usr/lib64/libpthread-2.25.so)
                                       __GI___clone (inlined)

--------------------------

Seems to work, now using a CPU hog and sampling the most precise CPU "cycles" hardware counter:

# time perf record --call-graph=dwarf stress --cpu 1 --timeout 10
stress: info: [4690] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
stress: info: [4690] successful run completed in 10s
[ perf record: Woken up 1288 times to write data ]
[ perf record: Captured and wrote 321.915 MB perf.data (40001 samples) ]

real	0m10.174s
user	0m10.065s
sys	0m0.244s
# 
# perf evlist -v
cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|DATA_SRC, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192
# 

# perf report --stdio --dsos stress

# To display the perf.data header info, please use --header/--header-only options.
#
# dso: stress
#
# Total Lost Samples: 0
#
# Samples: 40K of event 'cycles:ppp'
# Event count (approx.): 31368767617
#
# Children      Self  Command  Symbol                
# ........  ........  .......  ......................
#
    88.38%    14.59%  stress   [.] hogcpu
            |          
            |--73.78%--hogcpu
            |          |          
            |           --73.73%--rand
            |                     |          
            |                      --73.71%--__random
            |                                |          
            |                                 --40.58%--__random_r
            |          
            |--13.00%--_start
            |          __libc_start_main
            |          main
            |          hogcpu
            |          
             --1.59%--0xbb7
                       hogcpu

    86.78%     0.00%  stress   [.] _start
            |
            ---_start
               __libc_start_main
               main
               hogcpu
               |          
                --73.73%--rand
                          |          
                           --73.71%--__random
                                     |          
                                      --40.58%--__random_r

    86.78%     0.00%  stress   [.] __libc_start_main
            |
            ---__libc_start_main
               main
               hogcpu
               |          
                --73.73%--rand
                          |          
                           --73.71%--__random
                                     |          
                                      --40.58%--__random_r

    86.78%     0.00%  stress   [.] main
            |
            ---main
               hogcpu
               |          
                --73.73%--rand
                          |          
                           --73.71%--__random
                                     |          
                                      --40.58%--__random_r

     0.01%     0.01%  stress   [k] 0x000055ae24920bc9
     0.00%     0.00%  stress   [k] 0x00007fba0ad15790
     0.00%     0.00%  stress   [k] 0x00007fba0ad157e6
     0.00%     0.00%  stress   [k] 0x00007fba0ad15770
     0.00%     0.00%  stress   [k] 0x000055ae24920bcd
     0.00%     0.00%  stress   [k] 0x00007fba0ad157a5
     0.00%     0.00%  stress   [k] 0x00007fba0ad15c7d
     0.00%     0.00%  stress   [k] 0x00007fba0ad15808


#
# (Tip: System-wide collection from all CPUs: perf record -a)
#

Also with 'perf script':

<SNIP>
stress  4691  1422.153895:     576462 cycles:ppp: 
                   3b946 __random_r (/usr/lib64/libc-2.25.so)
                   3b7d1 __random (/usr/lib64/libc-2.25.so)
                   3bc78 rand (/usr/lib64/libc-2.25.so)
                    2bbc hogcpu (/usr/bin/stress)
                    160c main (/usr/bin/stress)
                   20509 __libc_start_main (/usr/lib64/libc-2.25.so)
                    2ac9 _start (/usr/bin/stress)

stress  4691  1422.154121:     582967 cycles:ppp: 
                   3b946 __random_r (/usr/lib64/libc-2.25.so)
                   3b7d1 __random (/usr/lib64/libc-2.25.so)
                   3bc78 rand (/usr/lib64/libc-2.25.so)
                    2bbc hogcpu (/usr/bin/stress)
                    160c main (/usr/bin/stress)
                   20509 __libc_start_main (/usr/lib64/libc-2.25.so)
                    2ac9 _start (/usr/bin/stress)

stress  4691  1422.154349:     590924 cycles:ppp: 
                   3b93c __random_r (/usr/lib64/libc-2.25.so)
                   3b7d1 __random (/usr/lib64/libc-2.25.so)
                   3bc78 rand (/usr/lib64/libc-2.25.so)
                    2bbc hogcpu (/usr/bin/stress)
                    160c main (/usr/bin/stress)
                   20509 __libc_start_main (/usr/lib64/libc-2.25.so)
                    2ac9 _start (/usr/bin/stress)
<SNIP>
 
> Here is my output on my clang-built binary:

I'll try as well with a clang built binary:

[acme@jouet linux]$ make CC=clang O=/tmp/build/perf -C tools/perf install-bin

I.e. will profile a perf binary built with clang:

$ readelf -wi ~/bin/perf | grep DW_AT_producer | head -1
    <c>   DW_AT_producer    : (indirect string, offset: 0x0): clang version 4.0.0 (http://llvm.org/git/clang.git f5be8ba13adc4ba1011a7ccd60c844bd60427c1c) (http://llvm.org/git/llvm.git efca1a37676f4cd276d947658cf90b0fb625abfd)
$

# perf record -F 2000 --call-graph=dwarf perf top
[ perf record: Woken up 20 times to write data ]
[ perf record: Captured and wrote 5.771 MB perf.data (694 samples) ]
# perf report --dsos=perf --no-children --stdio 
# To display the perf.data header info, please use --header/--header-only options.
#
# dso: perf
#
# Total Lost Samples: 0
#
# Samples: 694  of event 'cycles:ppp'
# Event count (approx.): 889028447
#
# Overhead  Command  Symbol                                
# ........  .......  ......................................
#
     4.77%  perf     [.] rb_next
            |
            ---rb_next
               |          
               |--3.15%--map_groups__fixup_overlappings
               |          thread__insert_map
               |          machine__process_mmap2_event
               |          machine__process_event
               |          perf_event__process
               |          perf_event__synthesize_mmap_events
               |          0xffffffffffffffff
               |          
                --1.12%--map__next
                          map_groups__find_by_name
                          map_groups__set_modules_path_dir
                          map_groups__set_modules_path_dir
                          0xffffffffffffffff

     2.69%  perf     [.] map_groups__fixup_overlappings
            |
            ---map_groups__fixup_overlappings
               thread__insert_map
               machine__process_mmap2_event
               machine__process_event
               perf_event__process
               perf_event__synthesize_mmap_events
               0xffffffffffffffff

     1.36%  perf     [.] rb_erase
            |
            ---rb_erase
               symbols__delete
               dso__delete
               dso__put
               |          
                --1.18%--map__put
                          __map__zput (inlined)
                          hist_entry__delete
                          hists_evsel__exit
                          perf_evsel__exit
                          perf_evsel__delete
                          perf_evlist__purge (inlined)
                          perf_evlist__delete
                          cmd_top
                          0xffffffffffffffff

Also seems to work.

Can you try building the latest codebase, that I'm putting at:

[acme@jouet linux]$ rm -f perf-4.14.0-rc6.tar.xz 
[acme@jouet linux]$ make help | grep perf
  perf-tar-src-pkg    - Build perf-4.14.0-rc6.tar source tarball
  perf-targz-src-pkg  - Build perf-4.14.0-rc6.tar.gz source tarball
  perf-tarbz2-src-pkg - Build perf-4.14.0-rc6.tar.bz2 source tarball
  perf-tarxz-src-pkg  - Build perf-4.14.0-rc6.tar.xz source tarball
[acme@jouet linux]$ make perf-tarxz-src-pkg
  TAR
  PERF_VERSION = 4.14.rc6.gadb243
[acme@jouet linux]$ ls -lart perf-4*
-rw-rw-r--. 1 acme acme 1252444 Oct 31 16:06 perf-4.14.0-rc6.tar.xz
[acme@jouet linux]$

http://vger.kernel.org/~acme/perf/perf-4.14.0-rc6.tar.xz

Download it and try building it:

$ mkdir /tmp/perf
$ cd /tmp/perf
$ wget http://vger.kernel.org/~acme/perf/perf-4.14.0-rc6.tar.xz
$ tar xf perf-4.14.0-rc6.tar.xz
$ cd perf-4.14.0-rc6
$ make -C tools/perf install-bin

Then make sure your ~/bin/ directory is in the path and try your workloads to
see if there was some fix that addresses the problems you're experiencing.

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Adding a recording delay interferes with symbols/call-stack
  2017-10-31 19:14     ` Arnaldo Carvalho de Melo
@ 2017-11-01  4:36       ` Bram Stolk
  2017-11-02  6:20         ` Namhyung Kim
  0 siblings, 1 reply; 10+ messages in thread
From: Bram Stolk @ 2017-11-01  4:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-perf-users

Thanks again, Arnaldo!


So, ugh, there is something wrong with either my system or my binary,
because the newly built perf has the exact same problem.

bram@ubuntu64:~/apps/BloxTrace$ perf -v
perf version 4.14.rc6.gadb243

bram@ubuntu64:~/apps/BloxTrace$ perf record --delay=1000 --freq 2000
--call-graph lbr ./bench
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.015 MB perf.data (37 samples) ]

bram@ubuntu64:~/apps/BloxTrace$ perf script
perf 20537 185854.082586:          1 cycles:uppp:
        ffffffff81844357 native_irq_return_iret ([kernel.kallsyms])

perf 20537 185854.082591:          1 cycles:uppp:
          78400200000009 [unknown] ([unknown])

perf 20537 185854.082593:         12 cycles:uppp:
        ffffffff81844357 native_irq_return_iret ([kernel.kallsyms])

perf 20537 185854.082595:        343 cycles:uppp:
        ffffffff81844357 native_irq_return_iret ([kernel.kallsyms])

perf 20537 185854.082597:      10499 cycles:uppp:
                  412280 [unknown] ([unknown])

perf 20537 185854.082603:     326186 cycles:uppp:
                  412110 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 20537 185854.082708:    3664038 cycles:uppp:
                  411ba0 [unknown] ([unknown])
                  412634 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 20537 185854.083857:    5403078 cycles:uppp:
                  412110 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 20537 185854.085554: 17592235560281 cycles:uppp:
                  412110 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])

perf 20537 185854.525567:    9346320 cycles:uppp:
                  412110 [unknown] ([unknown])
                  40ada3 [unknown] ([unknown])
                  40b303 [unknown] ([unknown])
                  401917 [unknown] ([unknown])
...


On Tue, Oct 31, 2017 at 12:14 PM, Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
> Em Tue, Oct 31, 2017 at 10:10:28AM -0700, Bram Stolk escreveu:
>> Thanks for trying to reproduce, Arnaldo.
>>
>> But for me, it's not the kernel symbols that are missing: they are all
>> there, syscalls and stuff.
>> Just the app's symbols itself are missing.
>>
>> What happens if you try it on a binary you built locally, that spends
>> cycles in userspace?
>
> So, trying with 'perf trace' to do a system wide strace like session for the
> 'open' syscall, that will use DWARF callchains and go from userspace all the
> way to raw_syscalls:sys_enter/sys_exit, to check various setups to try to
> narrow down what may be broken:
>
> [root@jouet ~]# perf trace --max-stack 32 -e open
>    518.496 ( 0.049 ms): libvirtd/1171 open(filename: /proc/cpuinfo, mode: IRUGO|IWUGO                       ) = 23
>                                        __open_nocancel (inlined)
>                                        __GI__IO_file_open (inlined)
>                                        __GI__IO_file_fopen (inlined)
>                                        __fopen_internal (/usr/lib64/libc-2.25.so)
>                                        virHostCPUGetInfo (/usr/lib64/libvirt.so.0.3002.1)
>                                        virCapabilitiesGetNodeInfo (/usr/lib64/libvirt.so.0.3002.1)
>                                        virNodeGetInfo (/usr/lib64/libvirt.so.0.3002.1)
>                                        _init (/usr/sbin/libvirtd)
>                                        virNetServerProgramDispatch (/usr/lib64/libvirt.so.0.3002.1)
>                                        [0xffffaa7a2a4a8737] (/usr/sbin/libvirtd)
>                                        [0xffff80b476e99f30] (/usr/lib64/libvirt.so.0.3002.1)
>                                        [0xffff80b476e992a7] (/usr/lib64/libvirt.so.0.3002.1)
>                                        start_thread (/usr/lib64/libpthread-2.25.so)
>                                        __GI___clone (inlined)
>    518.817 ( 0.022 ms): libvirtd/1171 open(filename: /sys/devices/system/cpu/present, mode: IFSOCK|ISGID|IXUSR|IRGRP|0x720b0000) = 24
>                                        [0xffff80b47aad198d] (/usr/lib64/libpthread-2.25.so)
>                                        virFileReadAll (/usr/lib64/libvirt.so.0.3002.1)
>                                        virFileReadValueBitmap (/usr/lib64/libvirt.so.0.3002.1)
>                                        virSysfsGetValueBitmap (/usr/lib64/libvirt.so.0.3002.1)
>                                        virHostCPUGetPresentBitmap (/usr/lib64/libvirt.so.0.3002.1)
>                                        virHostCPUGetInfoPopulateLinux (/usr/lib64/libvirt.so.0.3002.1)
>                                        virHostCPUGetInfo (/usr/lib64/libvirt.so.0.3002.1)
>                                        virCapabilitiesGetNodeInfo (/usr/lib64/libvirt.so.0.3002.1)
>                                        virNodeGetInfo (/usr/lib64/libvirt.so.0.3002.1)
>                                        _init (/usr/sbin/libvirtd)
>                                        virNetServerProgramDispatch (/usr/lib64/libvirt.so.0.3002.1)
>                                        [0xffffaa7a2a4a8737] (/usr/sbin/libvirtd)
>                                        [0xffff80b476e99f30] (/usr/lib64/libvirt.so.0.3002.1)
>                                        [0xffff80b476e992a7] (/usr/lib64/libvirt.so.0.3002.1)
>                                        start_thread (/usr/lib64/libpthread-2.25.so)
>                                        __GI___clone (inlined)
>
> --------------------------
>
> Seems to work, now using a CPU hog and sampling the most precise CPU "cycles" hardware counter:
>
> # time perf record --call-graph=dwarf stress --cpu 1 --timeout 10
> stress: info: [4690] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
> stress: info: [4690] successful run completed in 10s
> [ perf record: Woken up 1288 times to write data ]
> [ perf record: Captured and wrote 321.915 MB perf.data (40001 samples) ]
>
> real    0m10.174s
> user    0m10.065s
> sys     0m0.244s
> #
> # perf evlist -v
> cycles:ppp: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|CALLCHAIN|PERIOD|REGS_USER|STACK_USER|DATA_SRC, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, mmap_data: 1, sample_id_all: 1, exclude_guest: 1, exclude_callchain_user: 1, mmap2: 1, comm_exec: 1, sample_regs_user: 0xff0fff, sample_stack_user: 8192
> #
>
> # perf report --stdio --dsos stress
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> # dso: stress
> #
> # Total Lost Samples: 0
> #
> # Samples: 40K of event 'cycles:ppp'
> # Event count (approx.): 31368767617
> #
> # Children      Self  Command  Symbol
> # ........  ........  .......  ......................
> #
>     88.38%    14.59%  stress   [.] hogcpu
>             |
>             |--73.78%--hogcpu
>             |          |
>             |           --73.73%--rand
>             |                     |
>             |                      --73.71%--__random
>             |                                |
>             |                                 --40.58%--__random_r
>             |
>             |--13.00%--_start
>             |          __libc_start_main
>             |          main
>             |          hogcpu
>             |
>              --1.59%--0xbb7
>                        hogcpu
>
>     86.78%     0.00%  stress   [.] _start
>             |
>             ---_start
>                __libc_start_main
>                main
>                hogcpu
>                |
>                 --73.73%--rand
>                           |
>                            --73.71%--__random
>                                      |
>                                       --40.58%--__random_r
>
>     86.78%     0.00%  stress   [.] __libc_start_main
>             |
>             ---__libc_start_main
>                main
>                hogcpu
>                |
>                 --73.73%--rand
>                           |
>                            --73.71%--__random
>                                      |
>                                       --40.58%--__random_r
>
>     86.78%     0.00%  stress   [.] main
>             |
>             ---main
>                hogcpu
>                |
>                 --73.73%--rand
>                           |
>                            --73.71%--__random
>                                      |
>                                       --40.58%--__random_r
>
>      0.01%     0.01%  stress   [k] 0x000055ae24920bc9
>      0.00%     0.00%  stress   [k] 0x00007fba0ad15790
>      0.00%     0.00%  stress   [k] 0x00007fba0ad157e6
>      0.00%     0.00%  stress   [k] 0x00007fba0ad15770
>      0.00%     0.00%  stress   [k] 0x000055ae24920bcd
>      0.00%     0.00%  stress   [k] 0x00007fba0ad157a5
>      0.00%     0.00%  stress   [k] 0x00007fba0ad15c7d
>      0.00%     0.00%  stress   [k] 0x00007fba0ad15808
>
>
> #
> # (Tip: System-wide collection from all CPUs: perf record -a)
> #
>
> Also with 'perf script':
>
> <SNIP>
> stress  4691  1422.153895:     576462 cycles:ppp:
>                    3b946 __random_r (/usr/lib64/libc-2.25.so)
>                    3b7d1 __random (/usr/lib64/libc-2.25.so)
>                    3bc78 rand (/usr/lib64/libc-2.25.so)
>                     2bbc hogcpu (/usr/bin/stress)
>                     160c main (/usr/bin/stress)
>                    20509 __libc_start_main (/usr/lib64/libc-2.25.so)
>                     2ac9 _start (/usr/bin/stress)
>
> stress  4691  1422.154121:     582967 cycles:ppp:
>                    3b946 __random_r (/usr/lib64/libc-2.25.so)
>                    3b7d1 __random (/usr/lib64/libc-2.25.so)
>                    3bc78 rand (/usr/lib64/libc-2.25.so)
>                     2bbc hogcpu (/usr/bin/stress)
>                     160c main (/usr/bin/stress)
>                    20509 __libc_start_main (/usr/lib64/libc-2.25.so)
>                     2ac9 _start (/usr/bin/stress)
>
> stress  4691  1422.154349:     590924 cycles:ppp:
>                    3b93c __random_r (/usr/lib64/libc-2.25.so)
>                    3b7d1 __random (/usr/lib64/libc-2.25.so)
>                    3bc78 rand (/usr/lib64/libc-2.25.so)
>                     2bbc hogcpu (/usr/bin/stress)
>                     160c main (/usr/bin/stress)
>                    20509 __libc_start_main (/usr/lib64/libc-2.25.so)
>                     2ac9 _start (/usr/bin/stress)
> <SNIP>
>
>> Here is my output on my clang-built binary:
>
> I'll try as well with a clang built binary:
>
> [acme@jouet linux]$ make CC=clang O=/tmp/build/perf -C tools/perf install-bin
>
> I.e. will profile a perf binary built with clang:
>
> $ readelf -wi ~/bin/perf | grep DW_AT_producer | head -1
>     <c>   DW_AT_producer    : (indirect string, offset: 0x0): clang version 4.0.0 (http://llvm.org/git/clang.git f5be8ba13adc4ba1011a7ccd60c844bd60427c1c) (http://llvm.org/git/llvm.git efca1a37676f4cd276d947658cf90b0fb625abfd)
> $
>
> # perf record -F 2000 --call-graph=dwarf perf top
> [ perf record: Woken up 20 times to write data ]
> [ perf record: Captured and wrote 5.771 MB perf.data (694 samples) ]
> # perf report --dsos=perf --no-children --stdio
> # To display the perf.data header info, please use --header/--header-only options.
> #
> # dso: perf
> #
> # Total Lost Samples: 0
> #
> # Samples: 694  of event 'cycles:ppp'
> # Event count (approx.): 889028447
> #
> # Overhead  Command  Symbol
> # ........  .......  ......................................
> #
>      4.77%  perf     [.] rb_next
>             |
>             ---rb_next
>                |
>                |--3.15%--map_groups__fixup_overlappings
>                |          thread__insert_map
>                |          machine__process_mmap2_event
>                |          machine__process_event
>                |          perf_event__process
>                |          perf_event__synthesize_mmap_events
>                |          0xffffffffffffffff
>                |
>                 --1.12%--map__next
>                           map_groups__find_by_name
>                           map_groups__set_modules_path_dir
>                           map_groups__set_modules_path_dir
>                           0xffffffffffffffff
>
>      2.69%  perf     [.] map_groups__fixup_overlappings
>             |
>             ---map_groups__fixup_overlappings
>                thread__insert_map
>                machine__process_mmap2_event
>                machine__process_event
>                perf_event__process
>                perf_event__synthesize_mmap_events
>                0xffffffffffffffff
>
>      1.36%  perf     [.] rb_erase
>             |
>             ---rb_erase
>                symbols__delete
>                dso__delete
>                dso__put
>                |
>                 --1.18%--map__put
>                           __map__zput (inlined)
>                           hist_entry__delete
>                           hists_evsel__exit
>                           perf_evsel__exit
>                           perf_evsel__delete
>                           perf_evlist__purge (inlined)
>                           perf_evlist__delete
>                           cmd_top
>                           0xffffffffffffffff
>
> Also seems to work.
>
> Can you try building the latest codebase, that I'm putting at:
>
> [acme@jouet linux]$ rm -f perf-4.14.0-rc6.tar.xz
> [acme@jouet linux]$ make help | grep perf
>   perf-tar-src-pkg    - Build perf-4.14.0-rc6.tar source tarball
>   perf-targz-src-pkg  - Build perf-4.14.0-rc6.tar.gz source tarball
>   perf-tarbz2-src-pkg - Build perf-4.14.0-rc6.tar.bz2 source tarball
>   perf-tarxz-src-pkg  - Build perf-4.14.0-rc6.tar.xz source tarball
> [acme@jouet linux]$ make perf-tarxz-src-pkg
>   TAR
>   PERF_VERSION = 4.14.rc6.gadb243
> [acme@jouet linux]$ ls -lart perf-4*
> -rw-rw-r--. 1 acme acme 1252444 Oct 31 16:06 perf-4.14.0-rc6.tar.xz
> [acme@jouet linux]$
>
> http://vger.kernel.org/~acme/perf/perf-4.14.0-rc6.tar.xz
>
> Download it and try building it:
>
> $ mkdir /tmp/perf
> $ cd /tmp/perf
> $ wget http://vger.kernel.org/~acme/perf/perf-4.14.0-rc6.tar.xz
> $ tar xf perf-4.14.0-rc6.tar.xz
> $ cd perf-4.14.0-rc6
> $ make -C tools/perf install-bin
>
> Then make sure your ~/bin/ directory is in the path and try your workloads to
> see if there was some fix that addresses the problems you're experiencing.
>
> - Arnaldo



-- 
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Adding a recording delay interferes with symbols/call-stack
  2017-11-01  4:36       ` Bram Stolk
@ 2017-11-02  6:20         ` Namhyung Kim
  2017-11-02  7:01           ` Bram Stolk
  0 siblings, 1 reply; 10+ messages in thread
From: Namhyung Kim @ 2017-11-02  6:20 UTC (permalink / raw)
  To: Bram Stolk; +Cc: Arnaldo Carvalho de Melo, linux-perf-users, kernel-team

Hello,

On Tue, Oct 31, 2017 at 09:36:20PM -0700, Bram Stolk wrote:
> On Tue, Oct 31, 2017 at 12:14 PM, Arnaldo Carvalho de Melo
> <arnaldo.melo@gmail.com> wrote:
> > Em Tue, Oct 31, 2017 at 10:10:28AM -0700, Bram Stolk escreveu:
> >> Thanks for trying to reproduce, Arnaldo.
> >>
> >> But for me, it's not the kernel symbols that are missing: they are all
> >> there, syscalls and stuff.
> >> Just the app's symbols itself are missing.
> >>
> >> What happens if you try it on a binary you built locally, that spends
> >> cycles in userspace?

The problem is that FORK, COMM and MMAP2 events were not generated for
the workload given on the command line.  Normally they are generated
by kernel during exec for this case (target__none).  But initial_delay
makes the attr->enable_on_exec bit cleared so kernel won't generate
the evetns for us.  System-wide and existing process cases are not
affected since they always synthesize the events.

Could you please check below patch solves your problem?

Thanks,
Namhyung


diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f4d9fc54b382..89a192dc235c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -949,6 +949,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
        if (opts->initial_delay) {
                usleep(opts->initial_delay * USEC_PER_MSEC);
+
+               /*
+                * it did not set attr.enable_on_exec so misses
+                * task and mmap events.
+                */
+               if (target__none(&rec->opts.target))
+                       record__synthesize_workload(rec, false);
+
                perf_evlist__enable(rec->evlist);
        }
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: Adding a recording delay interferes with symbols/call-stack
  2017-11-02  6:20         ` Namhyung Kim
@ 2017-11-02  7:01           ` Bram Stolk
  2017-11-02  7:07             ` Namhyung Kim
  2017-11-03 18:41             ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 10+ messages in thread
From: Bram Stolk @ 2017-11-02  7:01 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Arnaldo Carvalho de Melo, linux-perf-users, kernel-team

Thanks Namhyung,

I tried the patch, but still no app symbols:

perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
...
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.347 MB perf.data (41 samples) ]

perf report -g 'graph,0.25,callee'
# To display the perf.data header info, please use
--header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 41  of event 'cycles:uppp'
# Event count (approx.): 94489518008269
#
# Children      Self  Command  Shared Object      Symbol
# ........  ........  .......  .................  ..........................
#
    50.11%    50.11%  bench    bench              [.] 0x0000000000010372
     7.13%     7.13%  bench    bench              [.] 0x0000000000010358
     3.56%     3.56%  bench    bench              [.] 0x00000000000127fa
     3.56%     3.56%  bench    libm-2.23.so       [.] 0x00000000000328c3
     3.56%     3.56%  bench    bench              [.] 0x000000000001241d
     3.56%     3.56%  bench    bench              [.] 0x0000000000012121
     3.56%     3.56%  bench    bench              [.] 0x000000000001032d
     3.56%     3.56%  bench    bench              [.] 0x000000000001006e
     3.56%     3.56%  bench    bench              [.] 0x00000000000121b8
     3.56%     3.56%  bench    bench              [.] 0x0000000000012464
     3.56%     3.56%  bench    bench              [.] 0x0000000000012400
     3.56%     3.56%  bench    bench              [.] 0x0000000000010377
     3.56%     3.56%  bench    bench              [.] 0x000000000001008c
     3.56%     3.56%  bench    bench              [.] 0x0000000000010091
     0.00%     0.00%  bench    bench              [.] 0x00000000000124ee
     0.00%     0.00%  bench    bench              [.] 0x000000000000a530
     0.00%     0.00%  bench    bench              [.] 0x0000000000012091
     0.00%     0.00%  bench    bench              [.] 0x000000000001212f
     0.00%     0.00%  bench    bench              [.] 0x000000000000ffff
     0.00%     0.00%  bench    bench              [.] 0x000000000000ff9b
     0.00%     0.00%  bench    bench              [.] 0x000000000001009b
     0.00%     0.00%  bench    bench              [.] 0x0000000000010332
     0.00%     0.00%  bench    bench              [.] 0x00000000000102f0
     0.00%     0.00%  bench    bench              [.] 0x0000000000010368
     0.00%     0.00%  bench    bench              [.] 0x0000000000012113
     0.00%     0.00%  bench    bench              [.] 0x0000000000010068
     0.00%     0.00%  bench    bench              [.] 0x0000000000012419
     0.00%     0.00%  bench    bench              [.] 0x0000000000012431
     0.00%     0.00%  bench    bench              [.] 0x000000000001267f
     0.00%     0.00%  bench    bench              [.] 0x000000000000aea9
     0.00%     0.00%  bench    bench              [.] 0x0000000000012742
     0.00%     0.00%  bench    [kernel.kallsyms]  [.] native_irq_return_iret





On Wed, Nov 1, 2017 at 11:20 PM, Namhyung Kim <namhyung@kernel.org> wrote:
> Hello,
>
> On Tue, Oct 31, 2017 at 09:36:20PM -0700, Bram Stolk wrote:
>> On Tue, Oct 31, 2017 at 12:14 PM, Arnaldo Carvalho de Melo
>> <arnaldo.melo@gmail.com> wrote:
>> > Em Tue, Oct 31, 2017 at 10:10:28AM -0700, Bram Stolk escreveu:
>> >> Thanks for trying to reproduce, Arnaldo.
>> >>
>> >> But for me, it's not the kernel symbols that are missing: they are all
>> >> there, syscalls and stuff.
>> >> Just the app's symbols itself are missing.
>> >>
>> >> What happens if you try it on a binary you built locally, that spends
>> >> cycles in userspace?
>
> The problem is that FORK, COMM and MMAP2 events were not generated for
> the workload given on the command line.  Normally they are generated
> by kernel during exec for this case (target__none).  But initial_delay
> makes the attr->enable_on_exec bit cleared so kernel won't generate
> the evetns for us.  System-wide and existing process cases are not
> affected since they always synthesize the events.
>
> Could you please check below patch solves your problem?
>
> Thanks,
> Namhyung
>
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index f4d9fc54b382..89a192dc235c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -949,6 +949,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>
>         if (opts->initial_delay) {
>                 usleep(opts->initial_delay * USEC_PER_MSEC);
> +
> +               /*
> +                * it did not set attr.enable_on_exec so misses
> +                * task and mmap events.
> +                */
> +               if (target__none(&rec->opts.target))
> +                       record__synthesize_workload(rec, false);
> +
>                 perf_evlist__enable(rec->evlist);
>         }
>



-- 
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Adding a recording delay interferes with symbols/call-stack
  2017-11-02  7:01           ` Bram Stolk
@ 2017-11-02  7:07             ` Namhyung Kim
  2017-11-02  7:10               ` Bram Stolk
  2017-11-03 18:41             ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 10+ messages in thread
From: Namhyung Kim @ 2017-11-02  7:07 UTC (permalink / raw)
  To: Bram Stolk; +Cc: Arnaldo Carvalho de Melo, linux-perf-users, kernel-team

On Thu, Nov 02, 2017 at 12:01:24AM -0700, Bram Stolk wrote:
> Thanks Namhyung,
> 
> I tried the patch, but still no app symbols:
> 
> perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
> ...
> [ perf record: Woken up 2 times to write data ]
> [ perf record: Captured and wrote 0.347 MB perf.data (41 samples) ]
> 
> perf report -g 'graph,0.25,callee'

Hmmm... could you please show me whether MMAP2 events are recorded?

  $ perf report -D | tail

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Adding a recording delay interferes with symbols/call-stack
  2017-11-02  7:07             ` Namhyung Kim
@ 2017-11-02  7:10               ` Bram Stolk
  0 siblings, 0 replies; 10+ messages in thread
From: Bram Stolk @ 2017-11-02  7:10 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Arnaldo Carvalho de Melo, linux-perf-users, kernel-team

$ perf report -D | tail
            COMM events:          2
            EXIT events:          5
            FORK events:          4
          SAMPLE events:         62
           MMAP2 events:         93
  FINISHED_ROUND events:          2
       TIME_CONV events:          1
cycles:uppp stats:
           TOTAL events:         62
          SAMPLE events:         62

This is with patched perf.

Bed time on west coast, I'll do more testing tomorrow. Thanks for the help!


On Thu, Nov 2, 2017 at 12:07 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> On Thu, Nov 02, 2017 at 12:01:24AM -0700, Bram Stolk wrote:
>> Thanks Namhyung,
>>
>> I tried the patch, but still no app symbols:
>>
>> perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench
>> ...
>> [ perf record: Woken up 2 times to write data ]
>> [ perf record: Captured and wrote 0.347 MB perf.data (41 samples) ]
>>
>> perf report -g 'graph,0.25,callee'
>
> Hmmm... could you please show me whether MMAP2 events are recorded?
>
>   $ perf report -D | tail
>
> Thanks,
> Namhyung



-- 
Owner/Director of Game Studio Abraham Stolk Inc.
Vancouver BC, Canada
b.stolk@gmail.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Adding a recording delay interferes with symbols/call-stack
  2017-11-02  7:01           ` Bram Stolk
  2017-11-02  7:07             ` Namhyung Kim
@ 2017-11-03 18:41             ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-11-03 18:41 UTC (permalink / raw)
  To: Bram Stolk
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, linux-perf-users, kernel-team

Em Thu, Nov 02, 2017 at 12:01:24AM -0700, Bram Stolk escreveu:
> Thanks Namhyung,
> 
> I tried the patch, but still no app symbols:
> 
> perf record --delay=1000 --freq 2000 --call-graph dwarf ./bench

Can you try this one instead:

commit 81b5911db9f587ebe30af81a667ff8de2c43eaf0
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Fri Nov 3 15:34:34 2017 -0300

    perf record: Generate PERF_RECORD_{MMAP,COMM,EXEC} with --delay
    
    When we use an initial delay, e.g.: 'perf record --delay 1000', we do not
    enable the events until that delay has passed after we started the workload,
    including the tracking event, i.e. the one for which we have attr.mmap, etc,
    enabled to ask the kernel to generate the PERF_RECORD_{MMAP,COMM,EXEC} metadata
    events that will then allow us to resolve addresses in samples to the map, dso
    and symbol. There will be a shadow that even synthesizing samples won't cover,
    i.e. the workload that we start and others that fork while we wait for that delay.
    
    So use a dummy event to be the tracking one and make it be enabled on exec.
    
    Before:
    
      # perf record --delay 1000 stress --cpu 1 --timeout 5
      stress: info: [9029] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
      stress: info: [9029] successful run completed in 5s
      [ perf record: Woken up 3 times to write data ]
      [ perf record: Captured and wrote 0.624 MB perf.data (15908 samples) ]
      # perf script | head
          :9031 9031 32001.826888:       1 cycles:ppp: ffffffff831aa30d event_function (/lib/modules/4.14.0-rc6+/build/vmlinux)
          :9031 9031 32001.826893:       1 cycles:ppp: ffffffff8300d1a0 intel_bts_enable_local (/lib/modules/4.14.0-rc6+/build/vmlinux)
          :9031 9031 32001.826895:       7 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
          :9031 9031 32001.826897:     103 cycles:ppp: ffffffff8300c331 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux)
          :9031 9031 32001.826899:    1615 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
          :9031 9031 32001.826902:   26724 cycles:ppp: ffffffff8384c6a7 native_irq_return_iret (/lib/modules/4.14.0-rc6+/build/vmlinux)
          :9031 9031 32001.826913:  329739 cycles:ppp:     7fb2a5410932 [unknown] ([unknown])
          :9031 9031 32001.827033: 1225451 cycles:ppp:     7fb2a5410930 [unknown] ([unknown])
          :9031 9031 32001.827474: 1391725 cycles:ppp:     7fb2a5410930 [unknown] ([unknown])
          :9031 9031 32001.827978: 1233697 cycles:ppp:     7fb2a5410928 [unknown] ([unknown])
      #
    
    After:
    
      # perf record --delay 1000 stress --cpu 1 --timeout 5
      stress: info: [9741] dispatching hogs: 1 cpu, 0 io, 0 vm, 0 hdd
      stress: info: [9741] successful run completed in 5s
      [ perf record: Woken up 3 times to write data ]
      [ perf record: Captured and wrote 0.751 MB perf.data (15976 samples) ]
      # perf script | head
         stress  9742 32110.959106:          1 cycles:ppp:  ffffffff831b26f6 __perf_event_task_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux)
         stress 9742 32110.959110:       1 cycles:ppp: ffffffff8300c2e9 intel_pmu_handle_irq (/lib/modules/4.14.0-rc6+/build/vmlinux)
         stress 9742 32110.959112:       7 cycles:ppp: ffffffff830231e0 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
         stress 9742 32110.959115:     101 cycles:ppp: ffffffff83023870 sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
         stress 9742 32110.959117:    1533 cycles:ppp: ffffffff830231f8 native_sched_clock (/lib/modules/4.14.0-rc6+/build/vmlinux)
         stress 9742 32110.959119:   23992 cycles:ppp: ffffffff831b0900 ctx_sched_in (/lib/modules/4.14.0-rc6+/build/vmlinux)
         stress 9742 32110.959129:  329406 cycles:ppp:     7f4b1b661930 __random_r (/usr/lib64/libc-2.25.so)
         stress 9742 32110.959249: 1288322 cycles:ppp:     5566e1e7cbc9 hogcpu (/usr/bin/stress)
         stress 9742 32110.959712: 1464046 cycles:ppp:     7f4b1b66179e __random (/usr/lib64/libc-2.25.so)
         stress 9742 32110.960241: 1266918 cycles:ppp:     7f4b1b66195b __random_r (/usr/lib64/libc-2.25.so)
      #
    
    Reported-by: Bram Stolk <b.stolk@gmail.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Wang Nan <wangnan0@huawei.com>
    Link: http://lkml.kernel.org/n/tip-nrdfchshqxf7diszhxcecqb9@git.kernel.org
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f4d9fc54b382..3c74bfae1aeb 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -338,6 +338,22 @@ static int record__open(struct record *rec)
 	struct perf_evsel_config_term *err_term;
 	int rc = 0;
 
+	/*
+	 * For initial_delay we need to add a dummy event so that we can track
+	 * PERF_RECORD_MMAP while we wait for the initial delay to enable the
+	 * real events, the ones asked by the user.
+	 */
+	if (opts->initial_delay) {
+		if (perf_evlist__add_dummy(evlist))
+			return -ENOMEM;
+
+		pos = perf_evlist__first(evlist);
+		pos->tracking = 0;
+		pos = perf_evlist__last(evlist);
+		pos->tracking = 1;
+		pos->attr.enable_on_exec = 1;
+	}
+
 	perf_evlist__config(evlist, opts, &callchain_param);
 
 	evlist__for_each_entry(evlist, pos) {

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-11-03 18:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-30 20:25 Adding a recording delay interferes with symbols/call-stack Bram Stolk
2017-10-31 14:25 ` Arnaldo Carvalho de Melo
2017-10-31 17:10   ` Bram Stolk
2017-10-31 19:14     ` Arnaldo Carvalho de Melo
2017-11-01  4:36       ` Bram Stolk
2017-11-02  6:20         ` Namhyung Kim
2017-11-02  7:01           ` Bram Stolk
2017-11-02  7:07             ` Namhyung Kim
2017-11-02  7:10               ` Bram Stolk
2017-11-03 18:41             ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.