* Re: Userspace Tracing and Backtraces
       [not found] <BY2SR01MB595A5A400E2211CC7567298C81F0@BY2SR01MB595.namsdf01.sdf.exchangelabs.com>
@ 2015-03-06 15:12 ` Francis Giraldeau
       [not found] ` <CAC6yHM6VLGQgjF4tS1dLg749rEZ8DKMMda=DsrihLCoQ9sHPdw@mail.gmail.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Francis Giraldeau @ 2015-03-06 15:12 UTC (permalink / raw)
  To: Brian Robbins, lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 1388 bytes --]

2015-03-05 17:39 GMT-05:00 Brian Robbins <brianrob@microsoft.com>:

>  Hello,
>
>
>
> I’m looking into using the userspace tracing capabilities of LTTng, and I
> wanted to find out if it is possible to capture a stack backtrace when a
> userspace tracepoint is hit.
>


The simple way is to use a tracepoint that takes an array of addresses.
Symbol resolution can then be done offline, using the baddr events (which
indicate the load addresses of shared libraries) and libdwarf to parse the
ELF/DWARF data.
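For illustration, the offline step could be sketched like this (the `mapping` struct and `resolve_offline()` are hypothetical helpers built from the recorded baddr events, not LTTng or libdwarf APIs); the resulting file-relative offset is what a tool like addr2line or libdwarf expects:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record built from the recorded baddr events:
 * one entry per shared object, with its load (base) address. */
struct mapping {
    uintptr_t baddr;    /* load address from the baddr event */
    uintptr_t size;     /* size of the mapped object */
    const char *path;   /* path to the ELF file on disk */
};

/* Find the mapping containing addr and compute the file-relative
 * offset suitable for addr2line/libdwarf. Returns NULL if addr
 * falls outside every known mapping. */
static const struct mapping *
resolve_offline(const struct mapping *maps, size_t n,
                uintptr_t addr, uintptr_t *offset)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= maps[i].baddr && addr < maps[i].baddr + maps[i].size) {
            *offset = addr - maps[i].baddr;
            return &maps[i];
        }
    }
    return NULL;
}
```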

To get the actual backtrace, you can walk frame pointers, but most Linux
distributions compile with -fomit-frame-pointer, and then this method does
not work (even if your own program is compiled with
-fno-omit-frame-pointer, a shared library such as libc built without frame
pointers in the middle of the stack breaks the frame chain). In that case, a
stack unwind (using libunwind, for instance) is necessary.
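To make the frame-chain walk concrete, here is a sketch (`walk_frames()` is hypothetical); it builds an artificial chain rather than touching the live stack, since walking the real stack fails in exactly the way described above when any frame omits the pointer:

```c
#include <stddef.h>

/* Layout of a stack frame when compiled with frame pointers:
 * the saved caller frame pointer, then the return address. */
struct frame {
    struct frame *next;  /* saved frame pointer of the caller */
    void *ret;           /* return address into the caller */
};

/* Walk the chain, collecting return addresses, until the chain
 * ends (or max entries, to survive a broken chain). */
static size_t walk_frames(const struct frame *fp, void **out, size_t max)
{
    size_t depth = 0;
    while (fp && fp->ret && depth < max) {
        out[depth++] = fp->ret;
        fp = fp->next;
    }
    return depth;
}
```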

It would be nice to add such a feature as an event context to lttng-ust; it
would allow recording the callstack for any userspace event. The question
is whether to do the unwind online (which is quite costly in time), or
offline, like perf does, by recording two pages of the stack along with the
registers (which records much more data, may not work if there are large
variables on the stack, and requires additional support for JIT code).

Cheers,

Francis

[-- Attachment #2: Type: text/plain, Size: 155 bytes --]

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Userspace Tracing and Backtraces
       [not found] ` <CAC6yHM6VLGQgjF4tS1dLg749rEZ8DKMMda=DsrihLCoQ9sHPdw@mail.gmail.com>
@ 2015-03-06 18:51   ` Brian Robbins
       [not found]   ` <BY2SR01MB5952376CA9DE7B8211078C8C81C0@BY2SR01MB595.namsdf01.sdf.exchangelabs.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Brian Robbins @ 2015-03-06 18:51 UTC (permalink / raw)
  To: Francis Giraldeau, lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 1817 bytes --]

Thanks Francis.

Is it accurate to say, then, that the array of addresses would need to be captured by app code, either by writing a stack walker by hand or by using the perf capture mechanism that you describe below?

Thanks.
-Brian

From: Francis Giraldeau [mailto:francis.giraldeau@gmail.com]
Sent: Friday, March 6, 2015 7:13 AM
To: Brian Robbins; lttng-dev@lists.lttng.org
Subject: Re: [lttng-dev] Userspace Tracing and Backtraces

2015-03-05 17:39 GMT-05:00 Brian Robbins <brianrob@microsoft.com<mailto:brianrob@microsoft.com>>:
Hello,

I’m looking into using the userspace tracing capabilities of LTTng, and I wanted to find out if it is possible to capture a stack backtrace when a userspace tracepoint is hit.


The simple way is to use a tracepoint that takes an array of addresses. Symbol resolution can then be done offline, using the baddr events (which indicate the load addresses of shared libraries) and libdwarf to parse the ELF/DWARF data.

To get the actual backtrace, you can walk frame pointers, but most Linux distributions compile with -fomit-frame-pointer, and then this method does not work (even if your own program is compiled with -fno-omit-frame-pointer, a shared library such as libc built without frame pointers in the middle of the stack breaks the frame chain). In that case, a stack unwind (using libunwind, for instance) is necessary.

It would be nice to add such a feature as an event context to lttng-ust; it would allow recording the callstack for any userspace event. The question is whether to do the unwind online (which is quite costly in time), or offline, like perf does, by recording two pages of the stack along with the registers (which records much more data, may not work if there are large variables on the stack, and requires additional support for JIT code).

Cheers,

Francis


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Userspace Tracing and Backtraces
       [not found]   ` <BY2SR01MB5952376CA9DE7B8211078C8C81C0@BY2SR01MB595.namsdf01.sdf.exchangelabs.com>
@ 2015-03-09 19:38     ` Francis Giraldeau
       [not found]     ` <CAC6yHM4AXG5btHdaoHjuCwzCDvPn9AORyh8gvo-VOua7ndWF2w@mail.gmail.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Francis Giraldeau @ 2015-03-09 19:38 UTC (permalink / raw)
  To: Brian Robbins; +Cc: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 1739 bytes --]

2015-03-06 13:51 GMT-05:00 Brian Robbins <brianrob@microsoft.com>:

>  Thanks Francis.
>
>
>
> Is it accurate to say then that the array of addresses would need to be
> captured by app code by writing a stack walker by hand
>

Yes, the callstack can be recorded in userspace. You would need a
tracepoint with a variable-length field:

TRACEPOINT_EVENT(myprovider, callstack,
    TP_ARGS(unsigned long *, buf, size_t, depth),
    TP_FIELDS(
        ctf_sequence(unsigned long, addr, buf, size_t, depth)
    )
)

In the app code, use libunwind [1] to get the addresses, then call the
tracepoint:

  do_unwind(buf, &depth);  /* use libunwind here */
  tracepoint(myprovider, callstack, buf, depth);
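A minimal do_unwind() sketch: here glibc's backtrace() from <execinfo.h> stands in for the libunwind version (which would use unw_backtrace(), or the unw_getcontext()/unw_step()/unw_get_reg() loop), and the provider/event names are the ones from the hypothetical tracepoint above:

```c
#include <execinfo.h>
#include <stddef.h>

/* Fill buf with up to max return addresses for the current stack.
 * With libunwind, this body would be unw_backtrace(buf, max) instead. */
static size_t do_unwind(void **buf, size_t max)
{
    int n = backtrace(buf, (int)max);
    return n > 0 ? (size_t)n : 0;
}

/* Usage at a trace site (tracepoint() as generated by lttng-ust):
 *
 *   void *buf[64];
 *   size_t depth = do_unwind(buf, 64);
 *   tracepoint(myprovider, callstack, (unsigned long *)buf, depth);
 */
```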

However, the unwind will be done whether or not the tracepoint is active
(~10us-100us in steady state, so it may become expensive if called often).
I know there was discussion about a tp_code() for such a use case (some
code to call before the tracepoint, only if it is enabled). Or you can
cheat:

if (__builtin_expect(!!(__tracepoint_myprovider___callstack.state), 0)) {
    do_unwind(buf, &depth);
    tracepoint(myprovider, callstack, buf, depth);
}

That said, instead of having a dedicated callstack tracepoint, IMHO the
best solution would be to extend lttng-ust with a callstack event context
(itself linked against libunwind). Then recording the callstack would be as
simple as:

  $ lttng add-context -u -t callstack



> or using the perf capture mechanism that you describe below?
>


Perf peeks at userspace from kernel space, which is another story. I guess
libunwind was never ported to the kernel because it is a large chunk of
complicated code that performs a lot of I/O and computation, while copying
a portion of the stack keeps things simple (KISS) with low runtime
overhead.

Cheers,

Francis


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Userspace Tracing and Backtraces
       [not found]     ` <CAC6yHM4AXG5btHdaoHjuCwzCDvPn9AORyh8gvo-VOua7ndWF2w@mail.gmail.com>
@ 2015-03-09 23:17       ` Brian Robbins
       [not found]       ` <CH1SR01MB5996051E5E9E51406F6502DC81B0@CH1SR01MB599.namsdf01.sdf.exchangelabs.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Brian Robbins @ 2015-03-09 23:17 UTC (permalink / raw)
  To: Francis Giraldeau; +Cc: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 2482 bytes --]

Thanks Francis.

This is what I expected to have to do.  I do agree, though, that adding this to lttng-ust would be a good way to go.

Should we end up on this path, it certainly seems like it might be worth our time to investigate what it would take to add it to lttng-ust.  Do you know who is the right person to talk to about this?  I’d want to make sure that this would not be a non-starter.

Thanks.
-Brian



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Userspace Tracing and Backtraces
       [not found]       ` <CH1SR01MB5996051E5E9E51406F6502DC81B0@CH1SR01MB599.namsdf01.sdf.exchangelabs.com>
@ 2015-03-11  1:47         ` Mathieu Desnoyers
       [not found]         ` <353474327.273521.1426038464379.JavaMail.zimbra@efficios.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Mathieu Desnoyers @ 2015-03-11  1:47 UTC (permalink / raw)
  To: Brian Robbins; +Cc: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 4642 bytes --]

----- Original Message -----

> From: "Brian Robbins" <brianrob@microsoft.com>
> To: "Francis Giraldeau" <francis.giraldeau@gmail.com>
> Cc: lttng-dev@lists.lttng.org
> Sent: Monday, March 9, 2015 7:17:51 PM
> Subject: Re: [lttng-dev] Userspace Tracing and Backtraces

> Thanks Francis.

> This is what I expected to have to do. I do agree though that adding this to
> lttng-ust would be a good way to go.

> Should we end up on this path, it certainly seems like it might be worth our
> time to investigate what it would take to add it to lttng-ust. Do you know
> who is the right person to talk to about this? I’d want to make sure that
> this would not be a non-starter.

Hi! 

I'm the maintainer of LTTng-UST. I agree that adding a backtrace context 
to lttng-ust would be a very useful feature. 

Some comments inline below, 

> Thanks.

> -Brian

> From: Francis Giraldeau [mailto:francis.giraldeau@gmail.com]
> Sent: Monday, March 9, 2015 12:39 PM
> To: Brian Robbins
> Cc: lttng-dev@lists.lttng.org
> Subject: Re: [lttng-dev] Userspace Tracing and Backtraces

> 2015-03-06 13:51 GMT-05:00 Brian Robbins < brianrob@microsoft.com >:
> > Thanks Francis.
> 

> > Is it accurate to say then that the array of addresses would need to be
> > captured by app code by writing a stack walker by hand
> 

> Yes, the callstack can be recorded in userspace. You would need a tracepoint
> with a varying length field:

> TRACEPOINT_EVENT(myprovider, callstack,

> TP_ARGS(unsigned long *, buf, size_t, depth),

> TP_FIELDS(

> ctf_sequence(unsigned long, addr, buf, size_t, depth)

> )

> )

> In the app code, use libunwind [1] to get the addresses, then call the
> tracepoint:

> do_unwind() // use libunwind here

> tracepoint(myprovider, buf, depth);

> However, the unwind will be done whether or not the tracepoint is active
> (~10us-100us in steady state, so it may become expensive if called often).
> I know there was discussion about a tp_code() for such a use case (some code
> to call before the tracepoint, only if it is enabled). Or you can cheat:

Francis: Did you define UNW_LOCAL_ONLY before including
the libunwind header in your benchmarks? (see
http://www.nongnu.org/libunwind/man/libunwind%283%29.html )

This seems to change performance dramatically according to the documentation.

> if (__builtin_expect(!!(__tracepoint_myprovider___callstack.state), 0)) {

> do_unwind(...)

> tracepoint(myprovider, buf, depth);

> }

> That said, instead of having a callstack tracepoint, IMHO the best solution
> would be instead extending lttng-ust to add callstack event context (itself
> linked to libunwind). Then, recording the callstack would be simple like
> that:

> $ lttng add-context -u -t callstack

Agreed on having the backtrace as a context. The main question left is 
to figure out if we want to call libunwind from within the traced application 
execution context. 

Unfortunately, libunwind is not reentrant wrt signals. This is already 
a good argument for not calling it from within a tracepoint. I wonder 
if the authors of libunwind would be open to make it signal-reentrant 
in the future (not by disabling signals, but rather by keeping a TLS 
nesting counter, and returning an error if nested, for performance 
considerations). 
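Such a TLS nesting guard could be sketched as follows (hypothetical code, not part of libunwind; `guarded_unwind()` elides the actual unwind). A nested call from a signal handler fails fast instead of deadlocking or corrupting state:

```c
#include <stddef.h>

/* Per-thread nesting counter: thread-local, so it can be updated
 * without locks, and a signal handler interrupting an unwind on the
 * same thread will observe a non-zero count. */
static _Thread_local int unw_nesting;

/* Returns 0 on success, -1 if called re-entrantly (e.g. from a signal
 * handler that interrupted an unwind in progress on this thread). */
static int guarded_unwind(void **buf, size_t max, size_t *depth)
{
    if (unw_nesting++ > 0) {
        unw_nesting--;
        return -1;          /* nested: report an error, don't deadlock */
    }
    /* ... perform the actual unwind into buf here ... */
    (void)buf; (void)max;
    *depth = 0;
    unw_nesting--;
    return 0;
}
```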

> > or using the perf capture mechanism that you describe below?
> 

> Perf is peeking at the userspace from kernel space, it's another story. I
> guess that libunwind was not ported to the kernel because it is a large
> chunk of complicated code that performs a lot of I/O and computation, while
> copying a portion of the stack is really about KISS and low runtime
> overhead.

If using libunwind does not work out, another alternative I would consider 
would be to copy the stack like perf does from the kernel. However, 
in the spirit of compacting trace data, I would be tempted to do the following 
if we go down that route: check the content of each pointer-aligned address. 
If it looks like a pointer into an executable memory area (library, executable, or 
JIT'd code), we keep it. Else, we zero it (not needed). We can 
then do an RLE-like compression on the zeroes, so we keep the layout 
of the stack after decompression. 

Thoughts? 
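A rough sketch of that scan-and-zero pass (`is_text_fn` is a hypothetical predicate answered from the recorded memory map; the RLE pass itself is elided):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: does this value point into an executable
 * mapping (library, executable, or JIT'd code)? Offline, this is
 * answered from the recorded memory map. */
typedef int (*is_text_fn)(uintptr_t);

/* Zero every pointer-aligned word that does not look like a code
 * pointer. The zero runs then compress extremely well (RLE-like),
 * and the stack layout survives decompression. Returns the number
 * of words kept. */
static size_t compact_stack(uintptr_t *words, size_t n, is_text_fn is_text)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_text(words[i]))
            kept++;
        else
            words[i] = 0;
    }
    return kept;
}
```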

Thanks, 

Mathieu 

> Cheers,

> Francis


-- 
Mathieu Desnoyers 
EfficiOS Inc. 
http://www.efficios.com 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Userspace Tracing and Backtraces
       [not found]         ` <353474327.273521.1426038464379.JavaMail.zimbra@efficios.com>
@ 2015-03-12 18:34           ` Francis Giraldeau
       [not found]           ` <CAC6yHM4QSUhv4mw0HzhENDY32BrmckQwvX==gAvw0ej3mwBGAg@mail.gmail.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Francis Giraldeau @ 2015-03-12 18:34 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 4015 bytes --]

2015-03-10 21:47 GMT-04:00 Mathieu Desnoyers <mathieu.desnoyers@efficios.com
>:
>
> Francis: Did you define UNW_LOCAL_ONLY before including
> the libunwind header in your benchmarks ? (see
> http://www.nongnu.org/libunwind/man/libunwind%283%29.html)
>

> This seems to change performance dramatically according to the
> documentation.
>


Yes, this is the case. The time to unwind is higher at the beginning
(probably related to building the internal cache), and also varies with the
call-stack depth.


> Agreed on having the backtrace as a context. The main question left is
> to figure out if we want to call libunwind from within the traced
> application
> execution context.
>
> Unfortunately, libunwind is not reentrant wrt signals. This is already
> a good argument for not calling it from within a tracepoint. I wonder
> if the authors of libunwind would be open to make it signal-reentrant
> in the future (not by disabling signals, but rather by keeping a TLS
> nesting counter, and returning an error if nested, for performance
> considerations).
>

The functions unw_init_local() and unw_step() are signal safe [1]. The
critical sections are protected using lock_acquire(), which blocks all
signals before taking the mutex, preventing recursion.

#define lock_acquire(l,m)                               \
do {                                                    \
  SIGPROCMASK (SIG_SETMASK, &unwi_full_mask, &(m));     \
  mutex_lock (l);                                       \
} while (0)
#define lock_release(l,m)                       \
do {                                            \
  mutex_unlock (l);                             \
  SIGPROCMASK (SIG_SETMASK, &(m), NULL);        \
} while (0)

To understand the implications, I wrote a small program to study nested
signals [2], where a signal is sent from within a signal handler, or a
segmentation fault occurs in a signal handler. Blocking a signal defers it
until it is unblocked, while ignored signals are discarded. Blocked signals
that can't be ignored keep their default behaviour. This prevents a
possible deadlock, for example if lock_acquire() nested with a custom
SIGSEGV handler trying to take the same lock.

Now, suppose that instead of blocking signals we used a per-thread mutex
and simply returned when try_lock() failed. It would be faster, but from
the user's point of view, backtraces would be dropped at random. I would
prefer it a bit slower, but reliable.
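That try_lock() variant could look like this (a sketch with hypothetical names; the unwind itself is elided):

```c
#include <pthread.h>

/* One mutex per thread: never contended across threads, only taken
 * re-entrantly if a signal handler interrupts an unwind in progress
 * on the same thread. */
static _Thread_local pthread_mutex_t bt_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 if the backtrace was taken, -1 if it was dropped because
 * this thread was already unwinding (the random drops discussed above). */
static int try_backtrace(void)
{
    if (pthread_mutex_trylock(&bt_lock) != 0)
        return -1;              /* nested call: drop this backtrace */
    /* ... unwind and record the callstack here ... */
    pthread_mutex_unlock(&bt_lock);
    return 0;
}
```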

In addition, could it be possible that TLS is not signal safe [3]?

> or using the perf capture mechanism that you describe below?
>
> Perf is peeking at the userspace from kernel space, it's another story. I
> guess that libunwind was not ported to the kernel because it is a large
> chunk of complicated code that performs a lot of I/O and computation, while
> copying a portion of the stack is really about KISS and low runtime
> overhead.
>
> If using libunwind does not work out, another alternative I would consider
> would be to copy the stack like perf is doing from the kernel. However,
> in the spirit of compacting trace data, I would be tempted to do the
> following
> if we go down that route: check each pointer-aligned address for its
> content.
> If it looks like a pointer to an executable memory area (library,
> executable, or
> JIT'd code), we keep it. Else, we zero this information (not needed). We
> can
> then do a RLE-alike compression on the zeroes, so we can keep the layout
> of the stack after uncompression.
>
>
Interesting! For comparison, here is a perf event [4] that shows there is a
lot of room for reducing the event size. We should check whether discarding
the other saved register values on the stack impacts restoring the
instruction pointer. Doing the unwind offline also solves signal safety,
and should be fast and scalable.

Francis

[1] http://www.nongnu.org/libunwind/man/unw_init_local(3).html
[2] https://gist.github.com/giraldeau/98f08161e83a7ab800ea
[3] https://sourceware.org/glibc/wiki/TLSandSignals
[4] http://pastebin.com/sByfXXAQ


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Userspace Tracing and Backtraces
       [not found]           ` <CAC6yHM4QSUhv4mw0HzhENDY32BrmckQwvX==gAvw0ej3mwBGAg@mail.gmail.com>
@ 2015-03-17  0:35             ` Brian Robbins
       [not found]             ` <BY2SR01MB59586D89F6A546404D75DE3C8030@BY2SR01MB595.namsdf01.sdf.exchangelabs.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Brian Robbins @ 2015-03-17  0:35 UTC (permalink / raw)
  To: Francis Giraldeau, Mathieu Desnoyers; +Cc: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 4343 bytes --]

Hi All,

This is great.  Thank you very much for the information.

-Brian



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Userspace Tracing and Backtraces
       [not found]             ` <BY2SR01MB59586D89F6A546404D75DE3C8030@BY2SR01MB595.namsdf01.sdf.exchangelabs.com>
@ 2015-03-17 18:50               ` François Doray
  0 siblings, 0 replies; 9+ messages in thread
From: François Doray @ 2015-03-17 18:50 UTC (permalink / raw)
  To: Brian Robbins; +Cc: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 6033 bytes --]

Hi,

I worked on two optimizations to libunwind:
- I replaced the global cache with a thread-local cache. This removes the
need for some locking.
- Instead of restoring all registers for every stack frame, I restore only
EBP, EIP and ESP.
The code is here:
https://github.com/fdoray/libunwind/tree/minimal_regs
Warning: this version of the library is no longer signal-safe, and it has
undefined behavior if registers other than EBP/EIP/ESP are required to
unwind a stack frame (this never happened in the few tests I have made so
far). These two limitations could easily be overcome in the future.

Performance results, on x86_64:

unw_backtrace(), original libunwind
Mean time per backtrace: 6130 ns / 80% of samples between 1479 and 13837 ns.

unw_backtrace(), modified libunwind
Mean time per backtrace: 4255 ns / 80% of samples between 1526 and 5252 ns.

unw_step()/unw_get_reg() [1], original libunwind
Mean time per backtrace: 43520 ns / 80% of samples between 13705 and 58782 ns.

unw_step()/unw_get_reg(), modified libunwind
Mean time per backtrace: 5804 ns / 80% of samples between 2844 and 11325 ns.

Francois

[1] As in the example presented here:
http://www.nongnu.org/libunwind/man/libunwind(3).html




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Userspace Tracing and Backtraces
@ 2015-03-05 22:39 Brian Robbins
  0 siblings, 0 replies; 9+ messages in thread
From: Brian Robbins @ 2015-03-05 22:39 UTC (permalink / raw)
  To: lttng-dev


[-- Attachment #1.1: Type: text/plain, Size: 291 bytes --]

Hello,

I'm looking into using the userspace tracing capabilities of LTTng, and I wanted to find out if it is possible to capture a stack backtrace when a userspace tracepoint is hit.

I've done a bunch of searching, but have not yet been able to find the answer.

Thank you.
-Brian


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-03-17 18:50 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <BY2SR01MB595A5A400E2211CC7567298C81F0@BY2SR01MB595.namsdf01.sdf.exchangelabs.com>
2015-03-06 15:12 ` Userspace Tracing and Backtraces Francis Giraldeau
     [not found] ` <CAC6yHM6VLGQgjF4tS1dLg749rEZ8DKMMda=DsrihLCoQ9sHPdw@mail.gmail.com>
2015-03-06 18:51   ` Brian Robbins
     [not found]   ` <BY2SR01MB5952376CA9DE7B8211078C8C81C0@BY2SR01MB595.namsdf01.sdf.exchangelabs.com>
2015-03-09 19:38     ` Francis Giraldeau
     [not found]     ` <CAC6yHM4AXG5btHdaoHjuCwzCDvPn9AORyh8gvo-VOua7ndWF2w@mail.gmail.com>
2015-03-09 23:17       ` Brian Robbins
     [not found]       ` <CH1SR01MB5996051E5E9E51406F6502DC81B0@CH1SR01MB599.namsdf01.sdf.exchangelabs.com>
2015-03-11  1:47         ` Mathieu Desnoyers
     [not found]         ` <353474327.273521.1426038464379.JavaMail.zimbra@efficios.com>
2015-03-12 18:34           ` Francis Giraldeau
     [not found]           ` <CAC6yHM4QSUhv4mw0HzhENDY32BrmckQwvX==gAvw0ej3mwBGAg@mail.gmail.com>
2015-03-17  0:35             ` Brian Robbins
     [not found]             ` <BY2SR01MB59586D89F6A546404D75DE3C8030@BY2SR01MB595.namsdf01.sdf.exchangelabs.com>
2015-03-17 18:50               ` François Doray
2015-03-05 22:39 Brian Robbins
