* Re: [RFC] perf sampling library for LTTng-UST
       [not found] <1483f0d3-da42-8269-a25b-a45d7db5e1ea@gmail.com>
@ 2016-11-17 16:14 ` Staffan Tjernstrom
  2016-11-18 16:36 ` Milian Wolff
  1 sibling, 0 replies; 3+ messages in thread
From: Staffan Tjernstrom @ 2016-11-17 16:14 UTC (permalink / raw)
  To: Francis Giraldeau; +Cc: lttng-dev

Whilst I'm not competent to comment on the implementation, I really
like the functionality - it's going to be a life-saver for me.

The need to sample perf counters from user space programs is huge in
my world.
> Hello!
>
> I did a small shared library for profiling code using perf sampling
> and LTTng-UST:
>
>   https://github.com/giraldeau/lttng-ust/tree/sampling/liblttng-ust-sampling
>
> It works by preloading the library when executing a program. Inside
> the library constructor, a perf counter is created and samples are
> saved inside the SIGIO handler using an LTTng-UST tracepoint. The call
> stack is obtained using libunwind, and thus it works even without
> frame pointers and with unmodified executables.
>
> Preliminary overhead measure with the default sampling period of 1E4
> for cpu cycles is about 8.5%, or 2.2us per event. On my machine, about
> 38k samples per second are generated. This figure is obtained when
> compiling libunwind without signal re-entrance support.
>
> Genevieve Bastien did a nice view in TraceCompass to load this trace
> and display the corresponding call graph view.
>
>   http://secretaire.dorsal.polymtl.ca/~gbastien/screenshots/lttng_sampling_callstack.png
>
> The counter is hard coded now, but it's just a prototype to
> demonstrate the concept. I would find it very cool to see such feature
> in LTTng. What do you think?
>
> Cheers,
>
> Francis
> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
>



* Re: [RFC] perf sampling library for LTTng-UST
       [not found] <1483f0d3-da42-8269-a25b-a45d7db5e1ea@gmail.com>
  2016-11-17 16:14 ` [RFC] perf sampling library for LTTng-UST Staffan Tjernstrom
@ 2016-11-18 16:36 ` Milian Wolff
  1 sibling, 0 replies; 3+ messages in thread
From: Milian Wolff @ 2016-11-18 16:36 UTC (permalink / raw)
  To: lttng-dev


On Friday, September 23, 2016 2:43:32 PM CET Francis Giraldeau wrote:
> Hello!
> 
> I did a small shared library for profiling code using perf sampling and
> LTTng-UST:
> 
>   https://github.com/giraldeau/lttng-ust/tree/sampling/liblttng-ust-sampling
> 
> It works by preloading the library when executing a program. Inside the
> library constructor, a perf counter is created and samples are saved inside
> the SIGIO handler using an LTTng-UST tracepoint. The call stack is obtained
> using libunwind, and thus it works even without frame pointers and with
> unmodified executables.
> 
> Preliminary overhead measure with the default sampling period of 1E4 for cpu
> cycles is about 8.5%, or 2.2us per event. On my machine, about 38k samples
> per second are generated. This figure is obtained when compiling libunwind
> without signal re-entrance support.
> 
> Genevieve Bastien did a nice view in TraceCompass to load this trace and
> display the corresponding call graph view.
> 
>   http://secretaire.dorsal.polymtl.ca/~gbastien/screenshots/lttng_sampling_callstack.png
> 
> The counter is hard coded now, but it's just a prototype to demonstrate the
> concept. I would find it very cool to see such feature in LTTng. What do
> you think?

The image times out for me, i.e. I cannot load it.

I very much like the idea, as it would make it easy to combine LTTng and perf.
The performance impact is pretty bad, though. Can't this be done differently,
such that you reuse whatever `perf record` uses internally? That one has far
less overhead, even when using libunwind/libdw for DWARF-based unwinding
(--call-graph dwarf).
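
For context, `perf record` does not take a signal per sample: the kernel writes sample records into a memory-mapped ring buffer that the tool drains in batches. The following is a minimal sketch of that style of setup, not perf's actual code. The function names, the batch size of 64, and the choice of PERF_SAMPLE_CALLCHAIN are assumptions made for illustration; `--call-graph dwarf` instead asks the kernel to copy user registers and a slice of the user stack (PERF_SAMPLE_REGS_USER and PERF_SAMPLE_STACK_USER) and unwinds later.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: describe a CPU-cycles counter whose samples are
 * written by the kernel into an mmap'ed ring buffer, so that no signal
 * handler has to run for each sample. */
void make_ring_buffer_attr(struct perf_event_attr *attr,
                           unsigned long long period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->sample_period = period;
    /* What the kernel records per sample. The call chain here is the
     * kernel-collected one; this is an assumption for brevity. */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                        PERF_SAMPLE_TIME | PERF_SAMPLE_CALLCHAIN;
    attr->disabled = 1;        /* enable later with PERF_EVENT_IOC_ENABLE */
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
    attr->wakeup_events = 64;  /* assumed batch size: wake the reader
                                  every 64 samples */
}

/* The perf mmap area is one metadata page followed by 2^n data pages.
 * Returns the size to pass to mmap(), or 0 if data_pages is not a
 * power of two. */
size_t ring_buffer_bytes(size_t page_size, size_t data_pages)
{
    if (data_pages == 0 || (data_pages & (data_pages - 1)) != 0)
        return 0;
    return (1 + data_pages) * page_size;
}
```

A reader would open the event with perf_event_open(), mmap() the returned descriptor with the size computed above, and poll() it, consuming records between the data_tail and data_head fields of the metadata page.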

In general, I see perf being really good for profiling, whereas LTTng seems to
be really good for tracing. Each somewhat supports the other side, but not as
nicely. I would welcome it if the two projects started to collaborate more
closely.

From my POV, I'd like to:

- trace most of the kernel stuff, most notably scheduler, page faults, 
syscalls, ...
- trace all UST points
- sample CPU

The latter two usually only for a single process, but sometimes multiple ones.
LTTng gives me the first two points, and perf gives me the last.

Bye

-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts


* [RFC] perf sampling library for LTTng-UST
@ 2016-09-23 18:43 Francis Giraldeau
  0 siblings, 0 replies; 3+ messages in thread
From: Francis Giraldeau @ 2016-09-23 18:43 UTC (permalink / raw)
  To: lttng-dev

Hello! 

I wrote a small shared library for profiling code using perf sampling and LTTng-UST:

  https://github.com/giraldeau/lttng-ust/tree/sampling/liblttng-ust-sampling

It works by preloading the library when executing a program. Inside the library constructor, a perf counter is created and samples are saved inside the SIGIO handler using an LTTng-UST tracepoint. The call stack is obtained using libunwind, and thus it works even without frame pointers and with unmodified executables. 

A preliminary overhead measurement with the default sampling period of 1E4 CPU cycles shows about 8.5%, or 2.2 us per event. On my machine, about 38k samples per second are generated. These figures were obtained with libunwind compiled without signal re-entrance support.
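
As a quick consistency check on these figures, the overhead fraction is the sample rate multiplied by the cost per sample. The helper below is a hypothetical illustration; with the rounded inputs from this message (38k samples per second, 2.2 us each) it gives roughly 0.084, in line with the reported figure of about 8.5%.

```c
/* Fraction of each second of run time spent handling samples, given the
 * number of samples per second and the cost of one sample in seconds. */
double sampling_overhead(double samples_per_sec, double cost_per_sample_sec)
{
    return samples_per_sec * cost_per_sample_sec;
}
```

For example, `sampling_overhead(38000.0, 2.2e-6)` evaluates to about 0.0836, that is, roughly 84 ms of handler time per second of execution.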

Genevieve Bastien created a nice view in TraceCompass that loads this trace and displays the corresponding call graph.

  http://secretaire.dorsal.polymtl.ca/~gbastien/screenshots/lttng_sampling_callstack.png

The counter is hard-coded for now, but this is just a prototype to demonstrate the concept. I would find it very cool to see such a feature in LTTng. What do you think?

Cheers, 

Francis
