All of lore.kernel.org
 help / color / mirror / Atom feed
* Using perf to generate a call stack from a kernel function to the user space caller
@ 2015-01-20 17:08 AKel
  2015-01-20 18:40 ` Rick Jones
  0 siblings, 1 reply; 5+ messages in thread
From: AKel @ 2015-01-20 17:08 UTC (permalink / raw)
  To: linux-perf-users

Hi,

I'm trying to profile an application which is spending quite a bit of time 
in the kernel (~40% in do_raw_spin_lock for example). To fully understand 
what is happening I would like to generate a call stack using perf (or 
otherwise) to see where the calls to this function are originating from. 
Is this possible? My hands are slightly tied in that I don't have sudo 
access to the system I'm working on, so a solution which accounts for this 
would be greatly appreciated. Even transferring files to/from the system 
is very difficult so I'm rather restricted in what i can do.

Thanks,
A

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using perf to generate a call stack from a kernel function to the user space caller
  2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel
@ 2015-01-20 18:40 ` Rick Jones
  2015-01-21  9:42   ` AKel
  0 siblings, 1 reply; 5+ messages in thread
From: Rick Jones @ 2015-01-20 18:40 UTC (permalink / raw)
  To: AKel, linux-perf-users

On 01/20/2015 09:08 AM, AKel wrote:
> Hi,
>
> I'm trying to profile an application which is spending quite a bit of time
> in the kernel (~40% in do_raw_spin_lock for example). To fully understand
> what is happening I would like to generate a call stack using perf (or
> otherwise) to see where the calls to this function are originating from.
> Is this possible? My hands are slightly tied in that I don't have sudo
> access to the system I'm working on, so a solution which accounts for this
> would be greatly appreciated. Even transferring files to/from the system
> is very difficult so I'm rather restricted in what i can do.

I suspect you will have to resolve the sudo issue by getting sudo. 
Ignoring that for the time-being, it sounds like you could use the 
call-graph functionality of perf record/report/top?

rick jones

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using perf to generate a call stack from a kernel function to the user space caller
  2015-01-20 18:40 ` Rick Jones
@ 2015-01-21  9:42   ` AKel
  2015-01-21 12:40     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 5+ messages in thread
From: AKel @ 2015-01-21  9:42 UTC (permalink / raw)
  To: linux-perf-users


Thanks Rick for your reply. Unfortunately I'm in the big bad corporate 
world where a simple minion such as myself is not granted sudo access. 
I've tried passing the callgraph (-g) flag to perf, however I can only 
recover the call stack within the kernel, it begins at what I suspect is 
the kernel entry point. Ideally I could generate a complete call stack.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using perf to generate a call stack from a kernel function to the user space caller
  2015-01-21  9:42   ` AKel
@ 2015-01-21 12:40     ` Arnaldo Carvalho de Melo
  2015-01-21 14:56       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-01-21 12:40 UTC (permalink / raw)
  To: AKel; +Cc: linux-perf-users

Em Wed, Jan 21, 2015 at 09:42:29AM +0000, AKel escreveu:
> Thanks Rick for your reply. Unfortunately I'm in the big bad corporate 
> world where a simple minion such as myself is not granted sudo access. 

That should not be a problem in most of the cases for profiling your own
apps.

> I've tried passing the callgraph (-g) flag to perf, however I can only 
> recover the call stack within the kernel, it begins at what I suspect is 

Because it will try, by default, to collect %bp (frame pointer)
callchains, which unfortunately is not usually enabled for user space
programs, that use the %bp register for other purposes.

> the kernel entry point. Ideally I could generate a complete call stack.

You can try using the DWARF CFI (Call Frame Information), by using
instead:

  $ perf record --call-graph dwarf firefox

Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
callgraphs:

  $ perf report --no-children # expand callgraphs, press P, press q
  $ head -45 perf.hist.0  
  -    3,47%  Image Scaler     libxul.so                      [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
       _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
       _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
       _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
       _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiPv
       _ZN7mozilla3gfx5ScaleEPhiiiS1_iiiNS0_13SurfaceFormatE
       _ZN7mozilla5image11ScaleRunner3RunEv
       _ZN8nsThread16ProcessNextEventEbPb
       _Z19NS_ProcessNextEventP9nsIThreadb
       _ZN7mozilla3ipc28MessagePumpForNonMainThreads3RunEPN4base11MessagePump8DelegateE
       _ZN11MessageLoop3RunEv
       _ZN8nsThread10ThreadFuncEPv
       _pt_root
       start_thread
       __clone
  -    3,00%  Cache2 I/O       libc-2.18.so                   [.] __memmove_ssse3_back
     - __memmove_ssse3_back
        - 69,64% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE15InsertElementAtIRKS3_EEPS3_mOT_
           - 51,91% _ZN7mozilla3net10CacheIndex29InsertRecordToExpirationArrayEPNS0_16CacheIndexRecordE
                _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
                _ZN7mozilla3net10CacheIndex12ParseRecordsEv
                _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
                _ZN7mozilla3net9ReadEvent3RunEv
                _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEv
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
                _pt_root
                start_thread
                __clone
           - 48,09% _ZN7mozilla3net10CacheIndex27InsertRecordToFrecencyArrayEPNS0_16CacheIndexRecordE
                _ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
                _ZN7mozilla3net10CacheIndex12ParseRecordsEv
                _ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
                _ZN7mozilla3net9ReadEvent3RunEv
                _ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEv
                _ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
                _pt_root
                start_thread
                __clone
        + 30,36% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
  +    1,79%  firefox          libz.so.1.2.8                  [.] inflate_fast
  +    1,76%  Cache2 I/O       libxul.so                      [.] _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
  +    1,39%  firefox          libpthread-2.18.so             [.] pthread_mutex_lock
  +    1,37%  firefox          firefox                        [.] arena_dalloc
  $ 


These C++ methods that needs unmangling, and I have the firefox-debuginfo
package matching the binary... Have to check this.

Anyway, either you have your binaries (your main binary and the libraries it
uses) built with -fno-omit-frame-pointer, like the kernel is, and then you can
use '-g', that, in its default form is equivalent to '--call-graph fp', or you
will have to ask that the kernel collects stack dumps to then use CFI to do the
callchains, i.e. use '--call-graph dwarf'.

You may want to state how much stack is collected when using --call-graph dwarf,
please look at the 'record' help:

  -g                    enables call-graph recording
      --call-graph <mode[,dump_size]>
                        setup and enables call-graph (stack chain/backtrace)
                        recording: fp dwarf

Ah, here is another example, this time crossing from kernel to userspace, all the way
back from a spin lock operation to the memory allocation routines used by
firefox, after I pressed -> and zoomed into the kernel DSO:

[acme@zoo linux]$ head -50 perf.hist.1 
+    0,32%  firefox          [k] clear_page_c_e
+    0,26%  firefox          [k] copy_user_enhanced_fast_string
+    0,26%  firefox          [k] page_fault
+    0,20%  firefox          [k] get_page_from_freelist
+    0,20%  firefox          [k] __list_del_entry
+    0,20%  firefox          [k] mem_cgroup_page_lruvec
+    0,16%  firefox          [k] lookup_page_cgroup_used
-    0,14%  firefox          [k] _raw_spin_lock
   - _raw_spin_lock
      - 37,73% try_to_wake_up
         - 64,20% wake_up_state
              wake_futex
              futex_wake
              do_futex
              sys_futex
              system_call
            - __lll_unlock_wake
                 37,39% 0x7f5e81b10400
                 32,07% 0x7f5e63e81bd0
               - 30,54% arena_malloc
                    malloc
                    _ZN14nsStringBuffer5AllocEm
                    _ZN18nsAString_internal10MutatePrepEjPPDsPj
                    _ZN18nsAString_internal19ReplacePrepInternalEjjjj
         + 35,80% default_wake_function
      - 17,55% handle_mm_fault
           __do_page_fault
           do_page_fault
         - page_fault
            + 52,88% _ZN2js2gc5Chunk25fetchNextDecommittedArenaEv
            - 47,12% arena_dalloc
                 _ZN13nsTArray_baseI25nsTArrayFallibleAllocator25nsTArray_CopyWithMemutilsE9ShiftDataEmmmmm
                 _ZN15SnowWhiteKillerD2Ev
                 _ZN16nsCycleCollector13FreeSnowWhiteEb
                 _ZN16nsCycleCollector8ShutdownEv
                 _Z25nsCycleCollector_shutdownv
                 _ZN7mozilla13ShutdownXPCOMEP17nsIServiceManager
                 _ZN18ScopedXPCOMStartupD2Ev
                 _ZN7XREMain8XRE_mainEiPPcPK12nsXREAppData
                 XRE_main
                 _ZL7do_mainiPPcP7nsIFile
                 main
                 __libc_start_main
                 _start
      + 9,54% do_futex
      + 9,36% do_cow_fault
      + 9,33% tick_do_update_jiffies64
      + 8,72% futex_wait_setup
      + 7,62% unix_stream_recvmsg
+    0,14%  firefox          [k] mem_cgroup_begin_page_stat
[acme@zoo linux]$

- Arnaldo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Using perf to generate a call stack from a kernel function to the user space caller
  2015-01-21 12:40     ` Arnaldo Carvalho de Melo
@ 2015-01-21 14:56       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-01-21 14:56 UTC (permalink / raw)
  To: AKel; +Cc: linux-perf-users

Em Wed, Jan 21, 2015 at 09:40:21AM -0300, Arnaldo Carvalho de Melo escreveu:
> Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
> callgraphs:
 
>   $ perf report --no-children # expand callgraphs, press P, press q
>   $ head -45 perf.hist.0  
>   -    3,47%  Image Scaler     libxul.so                      [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
>        _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
>        _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
>        _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv

<SNIP>

> These C++ methods that needs unmangling, and I have the firefox-debuginfo
> package matching the binary... Have to check this.

  [acme@zoo linux]$ make -C tools/perf O=/tmp/build/perf install-bin
  make: Entering directory `/home/git/linux/tools/perf'
    BUILD:   Doing 'make -j4' parallel build
  config/Makefile:597: No bfd.h/libbfd found, install binutils-dev[el]/zlib-static to gain symbol demangling

  Auto-detecting system features:
  ...                         dwarf: [ on  ]
  ...                         glibc: [ on  ]
  ...                          gtk2: [ on  ]
  ...                      libaudit: [ on  ]
  ...                        libbfd: [ OFF ]

after installing binutils-devel it gets symbol demangling:
 
-    3,47%  Image Scaler     libxul.so                      [.] skia::BGRAConvolve2D
     skia::BGRAConvolve2D
     skia::ImageOperations::ResizeBasic
     skia::ImageOperations::Resize
     skia::ImageOperations::Resize
     mozilla::gfx::Scale
     mozilla::image::ScaleRunner::Run
     nsThread::ProcessNextEvent
     NS_ProcessNextEvent
     mozilla::ipc::MessagePumpForNonMainThreads::Run
     MessageLoop::Run
     nsThread::ThreadFunc
     _pt_root
     start_thread
     __clone
+    3,00%  Cache2 I/O       libc-2.18.so                   [.] __memmove_ssse3_back
+    1,79%  firefox          libz.so.1.2.8                  [.] inflate_fast
+    1,76%  Cache2 I/O       libxul.so                      [.] nsTArray_Impl<mozilla::net::CacheIndexRecord*, nsTArrayInfallibleAllocator>::RemoveElement<mozilla::net::CacheIndexRecord*, nsDefaultComparator<mozilla::net::CacheIndexRecord*, mozilla::net::CacheIndexRecord*> >
+    1,39%  firefox          libpthread-2.18.so             [.] pthread_mutex_lock
+    1,37%  firefox          firefox                        [.] arena_dalloc
+    1,27%  firefox          libpthread-2.18.so             [.] pthread_mutex_unlock
+    1,02%  firefox          firefox                        [.] arena_malloc
+    0,68%  firefox          libxul.so                      [.] PLDHashTable::SearchTable
+    0,58%  firefox          libxul.so                      [.] js::AtomizeChars<unsigned char>
+    0,55%  firefox          libxul.so                      [.] Interpret
+    0,52%  firefox          libxul.so                      [.] js::ObjectImpl::nativeLookup
+    0,49%  firefox          libc-2.18.so                   [.] __memcpy_sse2_unaligned
+    0,47%  firefox          libxul.so                      [.] NewObject
+    0,45%  firefox          p11-kit-trust.so               [.] lookup_or_create_bucket
+    0,41%  mozStorage #5    libpthread-2.18.so             [.] pthread_mutex_lock
+    0,39%  firefox          libxul.so                      [.] js::EmptyShape::getInitialShape
+    0,38%  mozStorage #5    libpthread-2.18.so             [.] pthread_mutex_unlock
+    0,35%  ImageDecoder #2  libz.so.1.2.8                  [.] inflate_fast
+    0,35%  firefox          libxul.so                      [.] mozilla::gfx::AlphaBoxBlur::BoxBlur_SSE2
+    0,34%  firefox          ld-2.18.so                     [.] do_lookup_x
+    0,34%  firefox          libxul.so                      [.] JSObject::putProperty<(js::ExecutionMode)0>
+    0,34%  firefox          libxul.so                      [.] js::ShapeTable::search
+    0,33%  firefox          libxul.so                      [.] js::baseops::GetProperty
-    0,32%  firefox          [kernel.vmlinux]               [k] clear_page_c_e
     clear_page_c_e
   - __alloc_pages_nodemask
      - 97,50% alloc_pages_vma
         - 95,75% handle_mm_fault
              __do_page_fault
              do_page_fault
            - page_fault
               - 19,82% __memcpy_sse2_unaligned
                  - 35,58% _XRead
                       _XReadPad
                       XGetImage
                       _get_image_surface
                       _cairo_xlib_surface_acquire_source_image
                       _cairo_surface_acquire_source_image
                       _pixman_image_for_pattern
                       _clip_and_composite_boxes
                       _cairo_image_surface_paint
                       _cairo_surface_paint
                       _cairo_gstate_paint
                       INT__moz_cairo_paint
                       _moz_cairo_paint_with_alpha
                       mozilla::gfx::DrawTargetCairo::DrawSurface
                       mozilla::image::imgFrame::LockImageData
                       mozilla::image::RasterImage::RequestScale
                       mozilla::image::RasterImage::OptimalImageSizeForDest
                       DrawImageInternal
                       nsLayoutUtils::DrawBackgroundImage
                       nsImageRenderer::Draw
                       nsImageRenderer::DrawBackground
                       nsCSSRendering::PaintBackgroundWithSC
                       nsCSSRendering::PaintBackground
                       nsDisplayBackgroundImage::PaintInternal
                       mozilla::FrameLayerBuilder::PaintItems
                       mozilla::FrameLayerBuilder::DrawThebesLayer
                       mozilla::layers::BasicThebesLayer::PaintBuffer
                       mozilla::layers::BasicThebesLayer::Validate
                       mozilla::layers::BasicContainerLayer::Validate
                       mozilla::layers::BasicContainerLayer::Validate
                       mozilla::layers::BasicLayerManager::EndTransactionInternal
                       nsDisplayList::PaintForFrame
                  - 22,39% js::jit::Assembler::executableCopy
                       js::jit::JitCode::copyFrom
                       js::jit::Linker::newCode<(js::AllowGC)1>
                       js::jit::ICStubCompiler::getStubCode
                       js::jit::ICCallScriptedCompiler::getStub
                       js::jit::TryAttachCallStub
                       js::jit::DoCallFallback
                       0x7f5e847ffdb0
                  - 21,73% js::NewObjectWithClassProtoCommon
                       js::jit::NewInitObject
                       js::jit::DoNewObject
                       0x7f5e84800ec4
                  + 20,30% updatewindow
               + 15,70% arena_dalloc
               + 12,82% arena_malloc
               + 12,73% js::gc::Chunk::fetchNextDecommittedArena
               + 7,31% skb_copy_datagram_iovec
               + 7,04% sse2_blt
               + 4,45% NS_CopySegmentToBuffer
               + 4,18% js::InflateString
               + 4,18% mozilla::VectorBase<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy, js::Vector<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy> >::growStorageBy
               + 4,13% mozilla::image::imgFrame::Init
               + 3,86% _cairo_image_surface_span
               + 3,78% sse2_fill
         + 4,25% do_wp_page
      + 2,50% alloc_pages_current
+    0,32%  ImageDecoder #3  libz.so.1.2.8                  [.] inflate_fast


And if you press 'V' you get the DSO name right after the symbol name:


+    0,32%  firefox          [k] clear_page_c_e
+    0,26%  firefox          [k] copy_user_enhanced_fast_string
+    0,26%  firefox          [k] page_fault
+    0,20%  firefox          [k] get_page_from_freelist
+    0,20%  firefox          [k] __list_del_entry
+    0,20%  firefox          [k] mem_cgroup_page_lruvec
+    0,16%  firefox          [k] lookup_page_cgroup_used
-    0,14%  firefox          [k] _raw_spin_lock
   - _raw_spin_lock [kernel.vmlinux]
      - 37,73% try_to_wake_up [kernel.vmlinux]
         - 64,20% wake_up_state [kernel.vmlinux]
              wake_futex [kernel.vmlinux]
              futex_wake [kernel.vmlinux]
              do_futex [kernel.vmlinux]
              sys_futex [kernel.vmlinux]
              system_call [kernel.vmlinux]
            - __lll_unlock_wake libpthread-2.18.so
                 37,39% 0x7f5e81b10400 libxul.so
                 32,07% 0x7f5e63e81bd0 unknown
               - 30,54% arena_malloc firefox
                    malloc firefox
                    nsStringBuffer::Alloc libxul.so
                    nsAString_internal::MutatePrep libxul.so
                    nsAString_internal::ReplacePrepInternal libxul.so
         + 35,80% default_wake_function [kernel.vmlinux]
      - 17,55% handle_mm_fault [kernel.vmlinux]
           __do_page_fault [kernel.vmlinux]
           do_page_fault [kernel.vmlinux]
         - page_fault [kernel.vmlinux]
            - 52,88% js::gc::Chunk::fetchNextDecommittedArena libxul.so
                 js::gc::Chunk::allocateArena libxul.so
                 js::gc::ArenaLists::allocateFromArenaInline libxul.so
                 js::gc::ArenaLists::refillFreeList<(js::AllowGC)1> libxul.so
                 js::PropertyTree::getChild libxul.so
                 JSObject::getChildProperty libxul.so
                 JSObject::addPropertyInternal<(js::ExecutionMode)0> libxul.so
                 JSObject::putProperty<(js::ExecutionMode)0> libxul.so
                 DefinePropertyOrElement<(js::ExecutionMode)0> libxul.so
                 js::DefineNativeProperty libxul.so
                 JSObject::defineGeneric libxul.so
                 js::DefineFunction libxul.so
                 JS_DefineFunctions libxul.so
                 js::GlobalObject::resolveConstructor libxul.so
                 js::GetBuiltinPrototype libxul.so
                 js::NewDenseUnallocatedArray libxul.so
                 js::XDRObjectLiteral<(js::XDRMode)1> libxul.so
                 js::XDRScript<(js::XDRMode)1> libxul.so
                 js::XDRState<(js::XDRMode)1>::codeScript libxul.so
                 JS_DecodeScript libxul.so
                 ReadCachedScript libxul.so
                 mozJSComponentLoader::ObjectForLocation libxul.so
                 mozJSComponentLoader::ImportInto libxul.so
                 mozJSComponentLoader::Import libxul.so
                 nsXPCComponents_Utils::Import libxul.so
                 NS_InvokeByIndex libxul.so
                 XPCWrappedNative::CallMethod libxul.so
                 XPC_WN_CallMethod libxul.so
                 js::Invoke libxul.so
            - 47,12% arena_dalloc firefox
                 nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::ShiftData libxul.so
                 SnowWhiteKiller::~SnowWhiteKiller libxul.so
                 nsCycleCollector::FreeSnowWhite libxul.so
                 nsCycleCollector::Shutdown libxul.so
                 nsCycleCollector_shutdown libxul.so
                 mozilla::ShutdownXPCOM libxul.so
                 ScopedXPCOMStartup::~ScopedXPCOMStartup libxul.so
                 XREMain::XRE_main libxul.so
                 XRE_main libxul.so
                 do_main firefox
                 main firefox
                 __libc_start_main libc-2.18.so
                 _start firefox
      + 9,54% do_futex [kernel.vmlinux]

- Arnaldo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-01-21 14:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel
2015-01-20 18:40 ` Rick Jones
2015-01-21  9:42   ` AKel
2015-01-21 12:40     ` Arnaldo Carvalho de Melo
2015-01-21 14:56       ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.