* Using perf to generate a call stack from a kernel function to the user space caller
@ 2015-01-20 17:08 AKel
2015-01-20 18:40 ` Rick Jones
0 siblings, 1 reply; 5+ messages in thread
From: AKel @ 2015-01-20 17:08 UTC (permalink / raw)
To: linux-perf-users
Hi,
I'm trying to profile an application which is spending quite a bit of time
in the kernel (~40% in do_raw_spin_lock for example). To fully understand
what is happening I would like to generate a call stack using perf (or
otherwise) to see where the calls to this function are originating from.
Is this possible? My hands are slightly tied in that I don't have sudo
access to the system I'm working on, so a solution which accounts for this
would be greatly appreciated. Even transferring files to/from the system
is very difficult so I'm rather restricted in what i can do.
Thanks,
A
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller
2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel
@ 2015-01-20 18:40 ` Rick Jones
2015-01-21 9:42 ` AKel
0 siblings, 1 reply; 5+ messages in thread
From: Rick Jones @ 2015-01-20 18:40 UTC (permalink / raw)
To: AKel, linux-perf-users
On 01/20/2015 09:08 AM, AKel wrote:
> Hi,
>
> I'm trying to profile an application which is spending quite a bit of time
> in the kernel (~40% in do_raw_spin_lock for example). To fully understand
> what is happening I would like to generate a call stack using perf (or
> otherwise) to see where the calls to this function are originating from.
> Is this possible? My hands are slightly tied in that I don't have sudo
> access to the system I'm working on, so a solution which accounts for this
> would be greatly appreciated. Even transferring files to/from the system
> is very difficult so I'm rather restricted in what i can do.
I suspect you will have to resolve the sudo issue by getting sudo.
Ignoring that for the time-being, it sounds like you could use the
call-graph functionality of perf record/report/top?
rick jones
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller
2015-01-20 18:40 ` Rick Jones
@ 2015-01-21 9:42 ` AKel
2015-01-21 12:40 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 5+ messages in thread
From: AKel @ 2015-01-21 9:42 UTC (permalink / raw)
To: linux-perf-users
Thanks Rick for your reply. Unfortunately I'm in the big bad corporate
world where a simple minion such as myself is not granted sudo access.
I've tried passing the callgraph (-g) flag to perf, however I can only
recover the call stack within the kernel, it begins at what I suspect is
the kernel entry point. Ideally I could generate a complete call stack.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller
2015-01-21 9:42 ` AKel
@ 2015-01-21 12:40 ` Arnaldo Carvalho de Melo
2015-01-21 14:56 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-01-21 12:40 UTC (permalink / raw)
To: AKel; +Cc: linux-perf-users
Em Wed, Jan 21, 2015 at 09:42:29AM +0000, AKel escreveu:
> Thanks Rick for your reply. Unfortunately I'm in the big bad corporate
> world where a simple minion such as myself is not granted sudo access.
That should not be a problem in most of the cases for profiling your own
apps.
> I've tried passing the callgraph (-g) flag to perf, however I can only
> recover the call stack within the kernel, it begins at what I suspect is
Because it will try, by default, to collect %bp (frame pointer)
callchains, which unfortunately is not usually enabled for user space
programs, that use the %bp register for other purposes.
> the kernel entry point. Ideally I could generate a complete call stack.
You can try using the DWARF CFI (Call Frame Information), by using
instead:
$ perf record --call-graph dwarf firefox
Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
callgraphs:
$ perf report --no-children # expand callgraphs, press P, press q
$ head -45 perf.hist.0
- 3,47% Image Scaler libxul.so [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
_ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
_ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
_ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
_ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiPv
_ZN7mozilla3gfx5ScaleEPhiiiS1_iiiNS0_13SurfaceFormatE
_ZN7mozilla5image11ScaleRunner3RunEv
_ZN8nsThread16ProcessNextEventEbPb
_Z19NS_ProcessNextEventP9nsIThreadb
_ZN7mozilla3ipc28MessagePumpForNonMainThreads3RunEPN4base11MessagePump8DelegateE
_ZN11MessageLoop3RunEv
_ZN8nsThread10ThreadFuncEPv
_pt_root
start_thread
__clone
- 3,00% Cache2 I/O libc-2.18.so [.] __memmove_ssse3_back
- __memmove_ssse3_back
- 69,64% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE15InsertElementAtIRKS3_EEPS3_mOT_
- 51,91% _ZN7mozilla3net10CacheIndex29InsertRecordToExpirationArrayEPNS0_16CacheIndexRecordE
_ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
_ZN7mozilla3net10CacheIndex12ParseRecordsEv
_ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
_ZN7mozilla3net9ReadEvent3RunEv
_ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
_ZN7mozilla3net13CacheIOThread10ThreadFuncEv
_ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
_pt_root
start_thread
__clone
- 48,09% _ZN7mozilla3net10CacheIndex27InsertRecordToFrecencyArrayEPNS0_16CacheIndexRecordE
_ZN7mozilla3net25CacheIndexEntryAutoManageD2Ev
_ZN7mozilla3net10CacheIndex12ParseRecordsEv
_ZN7mozilla3net10CacheIndex10OnDataReadEPNS0_15CacheFileHandleEPc12tag_nsresult
_ZN7mozilla3net9ReadEvent3RunEv
_ZN7mozilla3net13CacheIOThread12LoopOneLevelEj
_ZN7mozilla3net13CacheIOThread10ThreadFuncEv
_ZN7mozilla3net13CacheIOThread10ThreadFuncEPv
_pt_root
start_thread
__clone
+ 30,36% _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
+ 1,79% firefox libz.so.1.2.8 [.] inflate_fast
+ 1,76% Cache2 I/O libxul.so [.] _ZN13nsTArray_ImplIPN7mozilla3net16CacheIndexRecordE27nsTArrayInfallibleAllocatorE13RemoveElementIS3_19nsDefaultComparatorIS3_S3_EEEbRKT_RKT0_.isra.60
+ 1,39% firefox libpthread-2.18.so [.] pthread_mutex_lock
+ 1,37% firefox firefox [.] arena_dalloc
$
These C++ methods that needs unmangling, and I have the firefox-debuginfo
package matching the binary... Have to check this.
Anyway, either you have your binaries (your main binary and the libraries it
uses) built with -fno-omit-frame-pointer, like the kernel is, and then you can
use '-g', that, in its default form is equivalent to '--call-graph fp', or you
will have to ask that the kernel collects stack dumps to then use CFI to do the
callchains, i.e. use '--call-graph dwarf'.
You may want to state how much stack is collected when using --call-graph dwarf,
please look at the 'record' help:
-g enables call-graph recording
--call-graph <mode[,dump_size]>
setup and enables call-graph (stack chain/backtrace)
recording: fp dwarf
Ah, here is another example, this time crossing from kernel to userspace, all the way
back from a spin lock operation to the memory allocation routines used by
firefox, after I pressed -> and zoomed into the kernel DSO:
[acme@zoo linux]$ head -50 perf.hist.1
+ 0,32% firefox [k] clear_page_c_e
+ 0,26% firefox [k] copy_user_enhanced_fast_string
+ 0,26% firefox [k] page_fault
+ 0,20% firefox [k] get_page_from_freelist
+ 0,20% firefox [k] __list_del_entry
+ 0,20% firefox [k] mem_cgroup_page_lruvec
+ 0,16% firefox [k] lookup_page_cgroup_used
- 0,14% firefox [k] _raw_spin_lock
- _raw_spin_lock
- 37,73% try_to_wake_up
- 64,20% wake_up_state
wake_futex
futex_wake
do_futex
sys_futex
system_call
- __lll_unlock_wake
37,39% 0x7f5e81b10400
32,07% 0x7f5e63e81bd0
- 30,54% arena_malloc
malloc
_ZN14nsStringBuffer5AllocEm
_ZN18nsAString_internal10MutatePrepEjPPDsPj
_ZN18nsAString_internal19ReplacePrepInternalEjjjj
+ 35,80% default_wake_function
- 17,55% handle_mm_fault
__do_page_fault
do_page_fault
- page_fault
+ 52,88% _ZN2js2gc5Chunk25fetchNextDecommittedArenaEv
- 47,12% arena_dalloc
_ZN13nsTArray_baseI25nsTArrayFallibleAllocator25nsTArray_CopyWithMemutilsE9ShiftDataEmmmmm
_ZN15SnowWhiteKillerD2Ev
_ZN16nsCycleCollector13FreeSnowWhiteEb
_ZN16nsCycleCollector8ShutdownEv
_Z25nsCycleCollector_shutdownv
_ZN7mozilla13ShutdownXPCOMEP17nsIServiceManager
_ZN18ScopedXPCOMStartupD2Ev
_ZN7XREMain8XRE_mainEiPPcPK12nsXREAppData
XRE_main
_ZL7do_mainiPPcP7nsIFile
main
__libc_start_main
_start
+ 9,54% do_futex
+ 9,36% do_cow_fault
+ 9,33% tick_do_update_jiffies64
+ 8,72% futex_wait_setup
+ 7,62% unix_stream_recvmsg
+ 0,14% firefox [k] mem_cgroup_begin_page_stat
[acme@zoo linux]$
- Arnaldo
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Using perf to generate a call stack from a kernel function to the user space caller
2015-01-21 12:40 ` Arnaldo Carvalho de Melo
@ 2015-01-21 14:56 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 5+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-01-21 14:56 UTC (permalink / raw)
To: AKel; +Cc: linux-perf-users
Em Wed, Jan 21, 2015 at 09:40:21AM -0300, Arnaldo Carvalho de Melo escreveu:
> Then, in the 'perf report' TUI, use the 'P' hotkey after expanding a few
> callgraphs:
> $ perf report --no-children # expand callgraphs, press P, press q
> $ head -45 perf.hist.0
> - 3,47% Image Scaler libxul.so [.] _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
> _ZN4skia14BGRAConvolve2DEPKhibRKNS_19ConvolutionFilter1DES4_iPhb
> _ZN4skia15ImageOperations11ResizeBasicERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
> _ZN4skia15ImageOperations6ResizeERK8SkBitmapNS0_12ResizeMethodEiiRK7SkIRectPv
<SNIP>
> These C++ methods that needs unmangling, and I have the firefox-debuginfo
> package matching the binary... Have to check this.
[acme@zoo linux]$ make -C tools/perf O=/tmp/build/perf install-bin
make: Entering directory `/home/git/linux/tools/perf'
BUILD: Doing 'make -j4' parallel build
config/Makefile:597: No bfd.h/libbfd found, install binutils-dev[el]/zlib-static to gain symbol demangling
Auto-detecting system features:
... dwarf: [ on ]
... glibc: [ on ]
... gtk2: [ on ]
... libaudit: [ on ]
... libbfd: [ OFF ]
after installing binutils-devel it gets symbol demangling:
- 3,47% Image Scaler libxul.so [.] skia::BGRAConvolve2D
skia::BGRAConvolve2D
skia::ImageOperations::ResizeBasic
skia::ImageOperations::Resize
skia::ImageOperations::Resize
mozilla::gfx::Scale
mozilla::image::ScaleRunner::Run
nsThread::ProcessNextEvent
NS_ProcessNextEvent
mozilla::ipc::MessagePumpForNonMainThreads::Run
MessageLoop::Run
nsThread::ThreadFunc
_pt_root
start_thread
__clone
+ 3,00% Cache2 I/O libc-2.18.so [.] __memmove_ssse3_back
+ 1,79% firefox libz.so.1.2.8 [.] inflate_fast
+ 1,76% Cache2 I/O libxul.so [.] nsTArray_Impl<mozilla::net::CacheIndexRecord*, nsTArrayInfallibleAllocator>::RemoveElement<mozilla::net::CacheIndexRecord*, nsDefaultComparator<mozilla::net::CacheIndexRecord*, mozilla::net::CacheIndexRecord*> >
+ 1,39% firefox libpthread-2.18.so [.] pthread_mutex_lock
+ 1,37% firefox firefox [.] arena_dalloc
+ 1,27% firefox libpthread-2.18.so [.] pthread_mutex_unlock
+ 1,02% firefox firefox [.] arena_malloc
+ 0,68% firefox libxul.so [.] PLDHashTable::SearchTable
+ 0,58% firefox libxul.so [.] js::AtomizeChars<unsigned char>
+ 0,55% firefox libxul.so [.] Interpret
+ 0,52% firefox libxul.so [.] js::ObjectImpl::nativeLookup
+ 0,49% firefox libc-2.18.so [.] __memcpy_sse2_unaligned
+ 0,47% firefox libxul.so [.] NewObject
+ 0,45% firefox p11-kit-trust.so [.] lookup_or_create_bucket
+ 0,41% mozStorage #5 libpthread-2.18.so [.] pthread_mutex_lock
+ 0,39% firefox libxul.so [.] js::EmptyShape::getInitialShape
+ 0,38% mozStorage #5 libpthread-2.18.so [.] pthread_mutex_unlock
+ 0,35% ImageDecoder #2 libz.so.1.2.8 [.] inflate_fast
+ 0,35% firefox libxul.so [.] mozilla::gfx::AlphaBoxBlur::BoxBlur_SSE2
+ 0,34% firefox ld-2.18.so [.] do_lookup_x
+ 0,34% firefox libxul.so [.] JSObject::putProperty<(js::ExecutionMode)0>
+ 0,34% firefox libxul.so [.] js::ShapeTable::search
+ 0,33% firefox libxul.so [.] js::baseops::GetProperty
- 0,32% firefox [kernel.vmlinux] [k] clear_page_c_e
clear_page_c_e
- __alloc_pages_nodemask
- 97,50% alloc_pages_vma
- 95,75% handle_mm_fault
__do_page_fault
do_page_fault
- page_fault
- 19,82% __memcpy_sse2_unaligned
- 35,58% _XRead
_XReadPad
XGetImage
_get_image_surface
_cairo_xlib_surface_acquire_source_image
_cairo_surface_acquire_source_image
_pixman_image_for_pattern
_clip_and_composite_boxes
_cairo_image_surface_paint
_cairo_surface_paint
_cairo_gstate_paint
INT__moz_cairo_paint
_moz_cairo_paint_with_alpha
mozilla::gfx::DrawTargetCairo::DrawSurface
mozilla::image::imgFrame::LockImageData
mozilla::image::RasterImage::RequestScale
mozilla::image::RasterImage::OptimalImageSizeForDest
DrawImageInternal
nsLayoutUtils::DrawBackgroundImage
nsImageRenderer::Draw
nsImageRenderer::DrawBackground
nsCSSRendering::PaintBackgroundWithSC
nsCSSRendering::PaintBackground
nsDisplayBackgroundImage::PaintInternal
mozilla::FrameLayerBuilder::PaintItems
mozilla::FrameLayerBuilder::DrawThebesLayer
mozilla::layers::BasicThebesLayer::PaintBuffer
mozilla::layers::BasicThebesLayer::Validate
mozilla::layers::BasicContainerLayer::Validate
mozilla::layers::BasicContainerLayer::Validate
mozilla::layers::BasicLayerManager::EndTransactionInternal
nsDisplayList::PaintForFrame
- 22,39% js::jit::Assembler::executableCopy
js::jit::JitCode::copyFrom
js::jit::Linker::newCode<(js::AllowGC)1>
js::jit::ICStubCompiler::getStubCode
js::jit::ICCallScriptedCompiler::getStub
js::jit::TryAttachCallStub
js::jit::DoCallFallback
0x7f5e847ffdb0
- 21,73% js::NewObjectWithClassProtoCommon
js::jit::NewInitObject
js::jit::DoNewObject
0x7f5e84800ec4
+ 20,30% updatewindow
+ 15,70% arena_dalloc
+ 12,82% arena_malloc
+ 12,73% js::gc::Chunk::fetchNextDecommittedArena
+ 7,31% skb_copy_datagram_iovec
+ 7,04% sse2_blt
+ 4,45% NS_CopySegmentToBuffer
+ 4,18% js::InflateString
+ 4,18% mozilla::VectorBase<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy, js::Vector<js::jit::BaselineCompilerShared::ICLoadLabel, 16ul, js::SystemAllocPolicy> >::growStorageBy
+ 4,13% mozilla::image::imgFrame::Init
+ 3,86% _cairo_image_surface_span
+ 3,78% sse2_fill
+ 4,25% do_wp_page
+ 2,50% alloc_pages_current
+ 0,32% ImageDecoder #3 libz.so.1.2.8 [.] inflate_fast
And if you press 'V' you get the DSO name right after the symbol name:
+ 0,32% firefox [k] clear_page_c_e
+ 0,26% firefox [k] copy_user_enhanced_fast_string
+ 0,26% firefox [k] page_fault
+ 0,20% firefox [k] get_page_from_freelist
+ 0,20% firefox [k] __list_del_entry
+ 0,20% firefox [k] mem_cgroup_page_lruvec
+ 0,16% firefox [k] lookup_page_cgroup_used
- 0,14% firefox [k] _raw_spin_lock
- _raw_spin_lock [kernel.vmlinux]
- 37,73% try_to_wake_up [kernel.vmlinux]
- 64,20% wake_up_state [kernel.vmlinux]
wake_futex [kernel.vmlinux]
futex_wake [kernel.vmlinux]
do_futex [kernel.vmlinux]
sys_futex [kernel.vmlinux]
system_call [kernel.vmlinux]
- __lll_unlock_wake libpthread-2.18.so
37,39% 0x7f5e81b10400 libxul.so
32,07% 0x7f5e63e81bd0 unknown
- 30,54% arena_malloc firefox
malloc firefox
nsStringBuffer::Alloc libxul.so
nsAString_internal::MutatePrep libxul.so
nsAString_internal::ReplacePrepInternal libxul.so
+ 35,80% default_wake_function [kernel.vmlinux]
- 17,55% handle_mm_fault [kernel.vmlinux]
__do_page_fault [kernel.vmlinux]
do_page_fault [kernel.vmlinux]
- page_fault [kernel.vmlinux]
- 52,88% js::gc::Chunk::fetchNextDecommittedArena libxul.so
js::gc::Chunk::allocateArena libxul.so
js::gc::ArenaLists::allocateFromArenaInline libxul.so
js::gc::ArenaLists::refillFreeList<(js::AllowGC)1> libxul.so
js::PropertyTree::getChild libxul.so
JSObject::getChildProperty libxul.so
JSObject::addPropertyInternal<(js::ExecutionMode)0> libxul.so
JSObject::putProperty<(js::ExecutionMode)0> libxul.so
DefinePropertyOrElement<(js::ExecutionMode)0> libxul.so
js::DefineNativeProperty libxul.so
JSObject::defineGeneric libxul.so
js::DefineFunction libxul.so
JS_DefineFunctions libxul.so
js::GlobalObject::resolveConstructor libxul.so
js::GetBuiltinPrototype libxul.so
js::NewDenseUnallocatedArray libxul.so
js::XDRObjectLiteral<(js::XDRMode)1> libxul.so
js::XDRScript<(js::XDRMode)1> libxul.so
js::XDRState<(js::XDRMode)1>::codeScript libxul.so
JS_DecodeScript libxul.so
ReadCachedScript libxul.so
mozJSComponentLoader::ObjectForLocation libxul.so
mozJSComponentLoader::ImportInto libxul.so
mozJSComponentLoader::Import libxul.so
nsXPCComponents_Utils::Import libxul.so
NS_InvokeByIndex libxul.so
XPCWrappedNative::CallMethod libxul.so
XPC_WN_CallMethod libxul.so
js::Invoke libxul.so
- 47,12% arena_dalloc firefox
nsTArray_base<nsTArrayFallibleAllocator, nsTArray_CopyWithMemutils>::ShiftData libxul.so
SnowWhiteKiller::~SnowWhiteKiller libxul.so
nsCycleCollector::FreeSnowWhite libxul.so
nsCycleCollector::Shutdown libxul.so
nsCycleCollector_shutdown libxul.so
mozilla::ShutdownXPCOM libxul.so
ScopedXPCOMStartup::~ScopedXPCOMStartup libxul.so
XREMain::XRE_main libxul.so
XRE_main libxul.so
do_main firefox
main firefox
__libc_start_main libc-2.18.so
_start firefox
+ 9,54% do_futex [kernel.vmlinux]
- Arnaldo
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-01-21 14:56 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-20 17:08 Using perf to generate a call stack from a kernel function to the user space caller AKel
2015-01-20 18:40 ` Rick Jones
2015-01-21 9:42 ` AKel
2015-01-21 12:40 ` Arnaldo Carvalho de Melo
2015-01-21 14:56 ` Arnaldo Carvalho de Melo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.