bpf.vger.kernel.org archive mirror
* [LSF/MM/BPF TOPIC] Batteries-included symbolization with blazesym
From: Daniel Müller @ 2023-02-27 19:34 UTC
  To: lsf-pc; +Cc: bpf

Symbolization of addresses is a commonly encountered problem, maybe most so in
the context of BPF and tracing with the capturing of stack traces. Perhaps
superficially straightforward-looking, there are a variety of considerations and
intricacies, such as:
- different formats/standards (e.g., ELF symbol information, DWARF, GSYM) cater
  to different use cases and require vastly different steps to work with
  - on top of that, even if a library such as libelf or libdwarf is relied on,
    plenty of format-specific details need to be known to symbolize addresses
    properly
- discovery of symbolization sources (e.g., DWARF debug files)
- symbolization trade-offs (performance, memory usage)
- system-specific details and corner cases
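
As one concrete illustration of the discovery point above, here is a minimal
sketch (not blazesym code) of how the conventional location of a separate debug
file is derived from a GNU build ID. The function name and the default root
directory are assumptions made for illustration; the `.build-id/xx/yyyy.debug`
layout is the one commonly used by GDB and many distributions.

```python
from pathlib import PurePosixPath


def build_id_debug_path(build_id_hex, root="/usr/lib/debug"):
    """Return the conventional .build-id path for a separate debug file.

    `root` is an assumed default; distributions may use other directories.
    """
    build_id_hex = build_id_hex.lower()
    if len(build_id_hex) < 3 or any(c not in "0123456789abcdef" for c in build_id_hex):
        raise ValueError("build ID must be a hex string of at least 3 characters")
    # The first byte (two hex digits) names a directory, the rest names the file.
    return PurePosixPath(root) / ".build-id" / build_id_hex[:2] / (build_id_hex[2:] + ".debug")
```

A real implementation would also have to check whether the file exists and
fall back to other sources (e.g., a `.gnu_debuglink` section) if it does not.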

We are working on blazesym [0], a library that aims to provide users with a
batteries-included experience for symbolizing addresses (but also the reverse:
mapping symbols to addresses).

We would like to provide a brief overview of the library and its goals and then
open up for discussion. Some topics we are specifically interested in
understanding better:
- What are current issues with symbolization that would be great to support?
- Does the usage of Rust pose a problem in your context? (C bindings are
  available, but a Rust toolchain is required for building; are pre-built
  binaries and packages for common distributions sufficient for your use cases?)

In general, we'd be interested in hearing your use cases and in discussing
whether blazesym is a fit or could be made to work.

Thanks,
Daniel

[0] https://github.com/libbpf/blazesym

* Re: [LSF/MM/BPF TOPIC] Batteries-included symbolization with blazesym
From: Daniel Xu @ 2023-02-27 20:07 UTC
  To: Daniel Müller; +Cc: lsf-pc, bpf

Hi Daniel,

On Mon, Feb 27, 2023 at 07:34:56PM +0000, Daniel Müller wrote:
> Symbolization of addresses is a commonly encountered problem, maybe most so in
> the context of BPF and tracing with the capturing of stack traces. Perhaps
> superficially straightforward-looking, there are a variety of considerations and
> intricacies, such as:
> - different formats/standards (e.g., ELF symbol information, DWARF, GSYM) cater
>   to different use cases and require vastly different steps to work with
>   - on top of that, even if a library such as libelf or libdwarf is relied on,
>     plenty of format specific details need to be known to symbolize addresses
>     properly
> - discovery of symbolization sources (e.g., DWARF debug files)
> - symbolization trade-offs (performance, memory usage)
> - system-specific details and corner cases
> 
> We are working on blazesym [0], a library that aims to provide users with a
> batteries-included experience for symbolizing addresses (but also the reverse:
> mapping symbols to addresses).
> 
> We would like to provide a brief overview of the library and its goals and then
> open up for discussion. Some topics we are specifically interested in
> understanding better:
> - What are current issues with symbolization that would be great to support?
> - Does the usage of Rust pose a problem in your context? (C bindings are
>   available, but a Rust toolchain is required for building; are pre-built
>   binaries and packages for common distributions sufficient for your use cases?)
> 
> In general, we'd be interested in hearing your use cases and in discussing
> whether blazesym is a fit or could be made to work.

I didn't look super close at blazesym yet, but was wondering if it would
support a use case I have in mind.

Context: it's tricky to determine why a packet was dropped by the kernel.
kfree_skb_reason() with caller address in `location` is a good start but
we can do better I think.

The issue is the call stack alone is not enough detail. I want to see
all the branches taken in the case a single call frame has multiple ways
to drop.

Vague idea is to use the recent LBR work (also haven't looked hard yet,
so this may not be possible) to take LBR stack at
`tracepoint:skb:kfree_skb` tracepoint. Then map the branches to line
numbers.

So my question is this: can/will blazesym be able to map kernel
addresses to line numbers / file names?

Thanks,
Daniel

* Re: [LSF/MM/BPF TOPIC] Batteries-included symbolization with blazesym
From: Daniel Müller @ 2023-02-27 21:34 UTC
  To: Daniel Xu; +Cc: lsf-pc, bpf

Hi Daniel,

On Mon, Feb 27, 2023 at 01:07:48PM -0700, Daniel Xu wrote:
> I didn't look super close at blazesym yet, but was wondering if it would
> support a use case I have in mind.
> 
> Context: it's tricky to determine why a packet was dropped by the kernel.
> kfree_skb_reason() with caller address in `location` is a good start but
> we can do better I think.
> 
> The issue is the call stack alone is not enough detail. I want to see
> all the branches taken in the case a single call frame has multiple ways
> to drop.
> 
> Vague idea is to use the recent LBR work (also haven't looked hard yet,
> so this may not be possible) to take LBR stack at
> `tracepoint:skb:kfree_skb` tracepoint. Then map the branches to line
> numbers.
> 
> So my question is this: can/will blazesym be able to map kernel
> addresses to line numbers / file names?

Blazesym should be able to help with the symbolization aspect, yes. That is, it
can convert the addresses you captured into symbol name + source file + line
information as you asked for (you may need DWARF debug information for anything
beyond mere symbol names). In general, the library is able to handle both user
space and kernel addresses.
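
To make the "symbol name only" case concrete, here is a minimal sketch of the
basic lookup, assuming input in the `/proc/kallsyms` text format (address, type,
name). This is not blazesym's implementation or API; the function names are
hypothetical and the sample addresses are made up. Source file and line
information would require DWARF data and is not covered here.

```python
import bisect

# Made-up sample in /proc/kallsyms format: "<hex address> <type> <name>".
KALLSYMS_SAMPLE = """\
ffffffff81000000 T _text
ffffffff81001000 T do_one_initcall
ffffffff81002000 t kfree_skb_reason
"""


def parse_kallsyms(text):
    """Parse kallsyms-style text into a list of (address, name) sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def symbolize(symbols, addr):
    """Return (name, offset) of the closest symbol at or below addr, or None.

    kallsyms carries no symbol sizes, so an address past the end of the last
    function is still attributed to that function.
    """
    starts = [start for start, _ in symbols]
    idx = bisect.bisect_right(starts, addr) - 1
    if idx < 0:
        return None
    start, name = symbols[idx]
    return name, addr - start
```

For example, `symbolize(parse_kallsyms(KALLSYMS_SAMPLE), 0xffffffff81002010)`
resolves to `kfree_skb_reason` at offset 0x10 with the sample data above.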

It is not designed, however, to help you capture those addresses. So how you get
them (e.g., using LBR as you mentioned) is up to you.

Daniel

* Re: [LSF/MM/BPF TOPIC] Batteries-included symbolization with blazesym
From: Daniel Xu @ 2023-02-27 21:41 UTC
  To: Daniel Müller; +Cc: lsf-pc, bpf

Hi Daniel,

On Mon, Feb 27, 2023 at 09:34:30PM +0000, Daniel Müller wrote:
> Blazesym should be able to help with the symbolization aspect, yes. That is, it
> can convert the addresses you captured into symbol name + source file + line
> information as you asked for (you may need DWARF debug information for anything
> beyond mere symbol names). In general, the library is able to handle both user
> space and kernel addresses.

Awesome, sounds great. After looking slightly more carefully, how about
split debug info support and debuginfod support? Extremely unlikely
anybody ships production kernels with debug symbols. But debuginfod
service is more likely.

> It is not designed, however, to help you capture those addresses. So how you get
> them (e.g., using LBR as you mentioned) is up to you.

Makes sense.

Thanks,
Daniel

* Re: [LSF/MM/BPF TOPIC] Batteries-included symbolization with blazesym
From: Daniel Müller @ 2023-02-27 22:14 UTC
  To: Daniel Xu; +Cc: lsf-pc, bpf

Hi Daniel,

On Mon, Feb 27, 2023 at 02:41:15PM -0700, Daniel Xu wrote:
> On Mon, Feb 27, 2023 at 09:34:30PM +0000, Daniel Müller wrote:
> > On Mon, Feb 27, 2023 at 01:07:48PM -0700, Daniel Xu wrote:
> > > So my question is this: can/will blazesym be able to map kernel
> > > addresses to line numbers / file names?
> > 
> > Blazesym should be able to help with the symbolization aspect, yes. That is, it
> > can convert the addresses you captured into symbol name + source file + line
> > information as you asked for (you may need DWARF debug information for anything
> > beyond mere symbol names). In general, the library is able to handle both user
> > space and kernel addresses.
> 
> Awesome, sounds great. After looking slightly more carefully, how about
> split debug info support and debuginfod support? Extremely unlikely
> anybody ships production kernels with debug symbols. But debuginfod
> service is more likely.

Good questions. Split debug information is definitely something we want to
support out of the box, but we still lack such support at this point (it's still
somewhat early days).

Regarding debuginfod, we had some discussions about it in the past and it is
also something we are interested in supporting in some form. The way it will
most likely work is that the library will provide an interface that accepts a
callback that is invoked as part of the symbolization process and which allows
the user to fetch debug info based on data such as the build ID of a binary
(passed to the callback).
So it will likely be up to the user to make an HTTP request to a debuginfod
instance and fetch the data. Once that is done (and the callback returns)
blazesym would take over again and use that debug information to complete the
symbolization request. (We may provide a default implementation for such a
callback that does all the heavy lifting, in the batteries-included spirit.)
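
A rough sketch of what such a callback-based flow could look like follows. All
class and function names are hypothetical and do not reflect an existing
blazesym interface, and the server URL is a placeholder. The only part taken
from an existing convention is the debuginfod URL layout
(`/buildid/<BUILDID>/debuginfo`).

```python
import urllib.error
import urllib.request


def debuginfod_url(server, build_id_hex):
    """Build the debuginfod URL for the debug info of the given build ID."""
    return f"{server.rstrip('/')}/buildid/{build_id_hex}/debuginfo"


def fetch_over_http(build_id_hex, server="https://debuginfod.example.org"):
    """Possible default callback: fetch debug info bytes, or None on failure.

    The server is a placeholder; a real caller would pass its own instance.
    """
    try:
        with urllib.request.urlopen(debuginfod_url(server, build_id_hex)) as resp:
            return resp.read()
    except urllib.error.URLError:
        return None


class Symbolizer:
    """Hypothetical stand-in for a symbolizer that accepts a fetch callback."""

    def __init__(self, fetch_debug_info):
        self._fetch = fetch_debug_info
        self._cache = {}

    def debug_info_for(self, build_id_hex):
        """Invoke the user callback at most once per build ID and cache the result."""
        if build_id_hex not in self._cache:
            self._cache[build_id_hex] = self._fetch(build_id_hex)
        return self._cache[build_id_hex]
```

Keeping the network access inside the callback means the library itself needs
no HTTP dependency, and users can substitute a local cache or a different
transport.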

Daniel
