bpf.vger.kernel.org archive mirror
* Question: missing vmlinux BTF variable declarations
@ 2022-03-09 23:20 Stephen Brennan
  2022-03-14  7:09 ` Shung-Hsi Yu
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Brennan @ 2022-03-09 23:20 UTC (permalink / raw)
  To: bpf; +Cc: Omar Sandoval, Arnaldo Carvalho de Melo

Hello everyone,

I've been recently learning about BTF with a keen interest in using it
as a fallback source of debug information. On the face of it, Linux
kernels these days have a lot of introspection information. BTF provides
information about types. kallsyms provides information about symbol
locations. ORC allows us to reliably unwind stack traces. So together,
these could enable a debugger (either postmortem, or live) to do a lot
without needing to read the (very large) DWARF debuginfo files. For
example, we could format backtraces with function names, we could
pretty-print global variables and data structures, etc. This is nice
given that depending on your distro, it might be tough to get debuginfo,
and it is quite large to download or install.

As I've worked toward this goal, I discovered that while the
BTF_KIND_VAR exists [1], the BTF included in the core kernel only has
declarations for percpu variables. This makes BTF much less useful for
this (admittedly odd) use case. Without a way to bind a name found in
kallsyms to its type, we can't interpret global variables. It looks like
the restriction for percpu-only variables is baked into the pahole BTF
encoder [2].

[1]: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-var
[2]: https://github.com/acmel/dwarves/blob/master/btf_encoder.c
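To make the observation above easy to check, here is a minimal Python sketch that walks a raw BTF blob and lists its BTF_KIND_VAR records. It assumes a little-endian blob laid out as described in [1]; the function name btf_vars and the EXTRA table are illustrative names of my own, not part of any existing tool.

```python
import struct

BTF_MAGIC = 0xEB9F
BTF_KIND_VAR = 14

# Bytes of kind-specific data that follow the 12-byte struct btf_type,
# as a function of vlen. Kinds not listed here carry no extra data.
EXTRA = {
    1: lambda v: 4,        # INT
    3: lambda v: 12,       # ARRAY
    4: lambda v: 12 * v,   # STRUCT
    5: lambda v: 12 * v,   # UNION
    6: lambda v: 8 * v,    # ENUM
    13: lambda v: 8 * v,   # FUNC_PROTO
    14: lambda v: 4,       # VAR
    15: lambda v: 12 * v,  # DATASEC
    17: lambda v: 4,       # DECL_TAG
    19: lambda v: 12 * v,  # ENUM64
}


def btf_vars(blob):
    """Return [(name, type_id)] for every BTF_KIND_VAR record in blob."""
    (magic, _version, _flags, hdr_len,
     type_off, type_len, str_off, str_len) = struct.unpack_from(
        "<HBBIIIII", blob, 0)
    if magic != BTF_MAGIC:
        raise ValueError("not a little-endian BTF blob")
    # Section offsets are relative to the end of the header.
    strs = blob[hdr_len + str_off:hdr_len + str_off + str_len]
    pos = hdr_len + type_off
    end = pos + type_len
    found = []
    while pos < end:
        name_off, info, type_id = struct.unpack_from("<III", blob, pos)
        kind = (info >> 24) & 0x1F
        vlen = info & 0xFFFF
        if kind == BTF_KIND_VAR:
            name_end = strs.index(b"\0", name_off)
            found.append((strs[name_off:name_end].decode(), type_id))
        # Skip the common header plus any kind-specific trailer.
        pos += 12 + EXTRA.get(kind, lambda v: 0)(vlen)
    return found
```

On a kernel built with CONFIG_DEBUG_INFO_BTF, something like btf_vars(open("/sys/kernel/btf/vmlinux", "rb").read()) should return only the percpu variables, which is the limitation described above.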

I wonder what the BPF / BTF community's thoughts are on including more
of these global variable declarations? Perhaps behind a
CONFIG_DEBUG_INFO_BTF_ALL, like how kallsyms does it? I'm aware that
each declaration costs at least 16 bytes of BTF records, plus the
strings and any necessary type data. The string cost could be mitigated
by allowing "name_off" to refer to the kallsyms offset for variable or
function declaration. But the additional records could cost around 1MiB
for common distribution configurations.

I know this isn't the designed use case for BTF, but I think it's very
exciting.

Thanks for your attention!
Stephen

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Question: missing vmlinux BTF variable declarations
  2022-03-09 23:20 Question: missing vmlinux BTF variable declarations Stephen Brennan
@ 2022-03-14  7:09 ` Shung-Hsi Yu
  2022-03-15  5:53   ` Yonghong Song
  0 siblings, 1 reply; 13+ messages in thread
From: Shung-Hsi Yu @ 2022-03-14  7:09 UTC (permalink / raw)
  To: Stephen Brennan; +Cc: bpf, Omar Sandoval, Arnaldo Carvalho de Melo

On Wed, Mar 09, 2022 at 03:20:47PM -0800, Stephen Brennan wrote:
> Hello everyone,
> 
> I've been recently learning about BTF with a keen interest in using it
> as a fallback source of debug information. On the face of it, Linux
> kernels these days have a lot of introspection information. BTF provides
> information about types. kallsyms provides information about symbol
> locations. ORC allows us to reliably unwind stack traces. So together,
> these could enable a debugger (either postmortem, or live) to do a lot
> without needing to read the (very large) DWARF debuginfo files. For
> example, we could format backtraces with function names, we could
> pretty-print global variables and data structures, etc. This is nice
> given that depending on your distro, it might be tough to get debuginfo,
> and it is quite large to download or install.
> 
> As I've worked toward this goal, I discovered that while the
> BTF_KIND_VAR exists [1], the BTF included in the core kernel only has
> declarations for percpu variables. This makes BTF much less useful for
> this (admittedly odd) use case. Without a way to bind a name found in
> kallsyms to its type, we can't interpret global variables. It looks like
> the restriction for percpu-only variables is baked into the pahole BTF
> encoder [2].
> 
> [1]: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-var
> [2]: https://github.com/acmel/dwarves/blob/master/btf_encoder.c
> 
> I wonder what the BPF / BTF community's thoughts are on including more
> of these global variable declarations? Perhaps behind a
> CONFIG_DEBUG_INFO_BTF_ALL, like how kallsyms does it? I'm aware that
> each declaration costs at least 16 bytes of BTF records, plus the
> strings and any necessary type data. The string cost could be mitigated
> by allowing "name_off" to refer to the kallsyms offset for variable or
> function declaration. But the additional records could cost around 1MiB
> for common distribution configurations.
> 
> I know this isn't the designed use case for BTF, but I think it's very
> exciting.

I've been wondering about the same thing (the possibility of using BTF for
postmortem debugging without debuginfo), though not to the extent that
you've researched.

I find the idea exciting as well, and quite useful for distros where the
kernel package changes so often that the debuginfo package may be long
gone by the time a crash dump for such a kernel is captured.

Shung-Hsi

> Thanks for your attention!
> Stephen



* Re: Question: missing vmlinux BTF variable declarations
  2022-03-14  7:09 ` Shung-Hsi Yu
@ 2022-03-15  5:53   ` Yonghong Song
  2022-03-15 16:37     ` Stephen Brennan
  0 siblings, 1 reply; 13+ messages in thread
From: Yonghong Song @ 2022-03-15  5:53 UTC (permalink / raw)
  To: Shung-Hsi Yu, Stephen Brennan
  Cc: bpf, Omar Sandoval, Arnaldo Carvalho de Melo



On 3/14/22 12:09 AM, Shung-Hsi Yu wrote:
> On Wed, Mar 09, 2022 at 03:20:47PM -0800, Stephen Brennan wrote:
>> Hello everyone,
>>
>> I've been recently learning about BTF with a keen interest in using it
>> as a fallback source of debug information. On the face of it, Linux
>> kernels these days have a lot of introspection information. BTF provides
>> information about types. kallsyms provides information about symbol
>> locations. ORC allows us to reliably unwind stack traces. So together,
>> these could enable a debugger (either postmortem, or live) to do a lot
>> without needing to read the (very large) DWARF debuginfo files. For
>> example, we could format backtraces with function names, we could

For backtraces with function names, you probably still need ksyms since
BTF won't encode address => symbol translation.

>> pretty-print global variables and data structures, etc. This is nice

This is indeed a potential use case.
We discussed this when adding per-cpu
global variables. Ultimately we just added per-cpu global variables
since we didn't have a use case or request for other global variables.

But I would still like to know whether, beyond this, you have other needs
which BPF may or may not help with. It would be good to know since if
ultimately you still need DWARF, then it might be undesirable to
add general global variables to BTF.

>> given that depending on your distro, it might be tough to get debuginfo,
>> and it is quite large to download or install.
>>
>> As I've worked toward this goal, I discovered that while the
>> BTF_KIND_VAR exists [1], the BTF included in the core kernel only has
>> declarations for percpu variables. This makes BTF much less useful for
>> this (admittedly odd) use case. Without a way to bind a name found in
>> kallsyms to its type, we can't interpret global variables. It looks like
>> the restriction for percpu-only variables is baked into the pahole BTF
>> encoder [2].
>>
>> [1]: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-var
>> [2]: https://github.com/acmel/dwarves/blob/master/btf_encoder.c
>>
>> I wonder what the BPF / BTF community's thoughts are on including more
>> of these global variable declarations? Perhaps behind a
>> CONFIG_DEBUG_INFO_BTF_ALL, like how kallsyms does it? I'm aware that

Currently on my local machine, the vmlinux BTF's size is 4.2MB and
adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
idea. But we might be able to just add global variables without this
new config if we have a strong use case.


>> each declaration costs at least 16 bytes of BTF records, plus the
>> strings and any necessary type data. The string cost could be mitigated
>> by allowing "name_off" to refer to the kallsyms offset for variable or
>> function declaration. But the additional records could cost around 1MiB
>> for common distribution configurations.
>>
>> I know this isn't the designed use case for BTF, but I think it's very
>> exciting.
> 
> I've been wondering about the same (possibility of using BTF for postmortem
> debugging without debuginfo), though not to the extent that you've
> researched.
> 
> I find the idea exciting as well, and quite useful for distros where the
> kernel package changes quite often that the debuginfo package may be long
> gone by the time a crash dump for such kernel is captured.

I would love to use BTF (including global variables in BTF) for crash
dumps. But I suspect we may still have some gaps. Maybe you can
explore this a little bit more?

> 
> Shung-Hsi
> 
>> Thanks for your attention!
>> Stephen
> 


* Re: Question: missing vmlinux BTF variable declarations
  2022-03-15  5:53   ` Yonghong Song
@ 2022-03-15 16:37     ` Stephen Brennan
  2022-03-15 17:58       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Brennan @ 2022-03-15 16:37 UTC (permalink / raw)
  To: Yonghong Song, Shung-Hsi Yu
  Cc: bpf, Omar Sandoval, Arnaldo Carvalho de Melo, Stephen Brennan

Yonghong Song <yhs@fb.com> writes:
> On 3/14/22 12:09 AM, Shung-Hsi Yu wrote:
>> On Wed, Mar 09, 2022 at 03:20:47PM -0800, Stephen Brennan wrote:
>>> Hello everyone,
>>>
>>> I've been recently learning about BTF with a keen interest in using it
>>> as a fallback source of debug information. On the face of it, Linux
>>> kernels these days have a lot of introspection information. BTF provides
>>> information about types. kallsyms provides information about symbol
>>> locations. ORC allows us to reliably unwind stack traces. So together,
>>> these could enable a debugger (either postmortem, or live) to do a lot
>>> without needing to read the (very large) DWARF debuginfo files. For
>>> example, we could format backtraces with function names, we could
>
> For backtraces with function names, you probably still need ksyms since
> BTF won't encode address => symbol translation.

Yes, kallsyms is definitely required in this scheme. In practice, it
seems very common for distributions to be compiled not just with
CONFIG_KALLSYMS, but CONFIG_KALLSYMS_ALL.

Kallsyms is critical for mapping names to addresses (and vice versa).
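As a rough illustration of that mapping, here is a Python sketch that parses text in the /proc/kallsyms format ("address type name [module]") and resolves in both directions. The helper names are hypothetical. Since kallsyms carries no symbol sizes, the lookup simply picks the closest symbol at or below the address, so an address past the end of the last symbol is still attributed to it.

```python
import bisect


def parse_kallsyms(text):
    """Parse /proc/kallsyms-style text into a sorted list of (addr, name)."""
    syms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        syms.append((int(fields[0], 16), fields[2]))
    syms.sort()
    return syms


def addr_to_symbol(syms, addr):
    """Return 'name+0xoff' for the closest symbol at or below addr."""
    addrs = [a for a, _ in syms]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None  # addr is below every known symbol
    base, name = syms[i]
    return f"{name}+{addr - base:#x}"


def name_to_addr(syms, name):
    """Return the address of the first symbol called name, or None."""
    for addr, sym in syms:
        if sym == name:
            return addr
    return None
```

The first function is what a backtrace formatter needs; the second is what a debugger needs before it can read a global variable, at which point it still lacks the variable's type.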

>
>>> pretty-print global variables and data structures, etc. This is nice
>
> This indeed is a potential use case.
> We discussed this during adding per-cpu
> global variables. Ultimately we just added per-cpu global variables 
> since we didn't have a use case or request for other global variables.
>
> But I still would like to know beyond this whether you have other needs
> which BPF may or may not help. It would be good to know since if 
> ultimately you still need dwarf, then it might be undesirable to
> add general global variables to BTF.

I think that kallsyms, BTF, and ORC together will be enough to provide a
lite debugging experience. Some things will be missing:

- mapping backtrace addresses to source code lines
- intelligent stack frame information from DWARF CFI (e.g.
  register/variable values)
- probably other things, I'm not a DWARF expert.

However, I do have two interesting branches of drgn which demonstrate
the utility of just BTF+kallsyms:

1. https://github.com/osandov/drgn/pull/162
2. https://github.com/brenns10/drgn/tree/kallsyms_plus_btf

#1 adds preliminary BTF support, and #2 adds basic kallsyms support,
building on #1. Finally, I have some unpublished patches which add some
symbols into vmcoreinfo, which help us locate kallsyms info. From there,
drgn is able to take a core dump, and look up symbols and get their
corresponding type info!

The only real blocker I see here is that the BTF data is mainly limited
to functions, so most of what you're doing is looking up function names
and viewing their signatures :)

>
>>> given that depending on your distro, it might be tough to get debuginfo,
>>> and it is quite large to download or install.
>>>
>>> As I've worked toward this goal, I discovered that while the
>>> BTF_KIND_VAR exists [1], the BTF included in the core kernel only has
>>> declarations for percpu variables. This makes BTF much less useful for
>>> this (admittedly odd) use case. Without a way to bind a name found in
>>> kallsyms to its type, we can't interpret global variables. It looks like
>>> the restriction for percpu-only variables is baked into the pahole BTF
>>> encoder [2].
>>>
>>> [1]: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-var
>>> [2]: https://github.com/acmel/dwarves/blob/master/btf_encoder.c
>>>
>>> I wonder what the BPF / BTF community's thoughts are on including more
>>> of these global variable declarations? Perhaps behind a
>>> CONFIG_DEBUG_INFO_BTF_ALL, like how kallsyms does it? I'm aware that
>
> Currently on my local machine, the vmlinux BTF's size is 4.2MB and
> adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
> idea. But we might be able to just add global variables without this
> new config if we have strong use case.

And unfortunately 1MiB is really just a shot in the dark, guessing
around 70k variables with no string data.
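For reference, the arithmetic behind that guess can be written out as below. The 16 bytes are a 12-byte struct btf_type plus a 4-byte struct btf_var; the 70,000 figure is only the rough variable count guessed above, and strings and any newly required type data are deliberately left out.

```python
BTF_TYPE_SIZE = 12  # struct btf_type: name_off, info, type (3 x u32)
BTF_VAR_SIZE = 4    # struct btf_var: linkage (1 x u32)


def var_records_bytes(n_vars):
    """Bytes of BTF_KIND_VAR records for n_vars variables, strings excluded."""
    return n_vars * (BTF_TYPE_SIZE + BTF_VAR_SIZE)


size = var_records_bytes(70_000)
print(size, "bytes =", round(size / 2**20, 2), "MiB")
# prints: 1120000 bytes = 1.07 MiB
```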

I'd love to use kallsyms to avoid adding new strings into BTF. If the
"all variables BTF" config added a dependency on "CONFIG_KALLSYMS_ALL",
then we could use the BTF "kind_flag" to indicate that string values
should be looked up in the kallsyms table, not the BTF strings section.
This could even be used to reduce the string footprint for BTF
function names.

Of course it's a more complex change to dwarves :(
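To make the idea concrete, here is a purely hypothetical Python sketch of how a consumer might resolve names under such a scheme. Nothing like this exists in BTF today: kind_flag (bit 31 of "info") is currently unused for variable and function records, and modelling kallsyms as a plain list of names indexed by name_off is an assumption made only for illustration.

```python
def resolve_name(info, name_off, btf_strings, kallsyms_names):
    """Resolve a record's name under the proposed (hypothetical) scheme.

    info           -- the 32-bit 'info' word of struct btf_type
    name_off       -- the record's name_off field
    btf_strings    -- raw bytes of the BTF string section
    kallsyms_names -- list of kallsyms names (assumed representation)
    """
    kind_flag = info >> 31
    if kind_flag:
        # Proposed behaviour: name_off refers to kallsyms, not BTF strings.
        return kallsyms_names[name_off]
    # Existing behaviour: NUL-terminated string in the BTF string section.
    end = btf_strings.index(b"\0", name_off)
    return btf_strings[name_off:end].decode()
```

The string saving comes from the first branch: a record resolved through kallsyms adds no bytes to the BTF string section.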

>
>>> each declaration costs at least 16 bytes of BTF records, plus the
>>> strings and any necessary type data. The string cost could be mitigated
>>> by allowing "name_off" to refer to the kallsyms offset for variable or
>>> function declaration. But the additional records could cost around 1MiB
>>> for common distribution configurations.
>>>
>>> I know this isn't the designed use case for BTF, but I think it's very
>>> exciting.
>> 
>> I've been wondering about the same (possibility of using BTF for postmortem
>> debugging without debuginfo), though not to the extent that you've
>> researched.
>> 
>> I find the idea exciting as well, and quite useful for distros where the
>> kernel package changes quite often that the debuginfo package may be long
>> gone by the time a crash dump for such kernel is captured.
>
> I would love to use BTF (including global variables in BTF) for crash 
> dump. But I suspect we may still have some gaps. Maybe you can
> explore a little bit more on this?

Hopefully my above explanation gives more context here. There is code
(not production-ready) which can make use of these features together.
The next step for me has been trying to get the dwarves/pahole BTF
encoder to output *all* functions but I've hit some issues with it. If I
can get that to work, then I can present a full demo of these pieces
working together and we can be confident that there are no gaps.

Maybe this is a topic worth discussing at LSF/MM/BPF conference? Though
it's quite late for that...

Thanks,
Stephen

>
>> 
>> Shung-Hsi
>> 
>>> Thanks for your attention!
>>> Stephen
>> 


* Re: Question: missing vmlinux BTF variable declarations
  2022-03-15 16:37     ` Stephen Brennan
@ 2022-03-15 17:58       ` Arnaldo Carvalho de Melo
  2022-03-16 16:06         ` Stephen Brennan
  0 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-03-15 17:58 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Yonghong Song, Shung-Hsi Yu, bpf, Omar Sandoval,
	Arnaldo Carvalho de Melo, Stephen Brennan

On Tue, Mar 15, 2022 at 09:37:46AM -0700, Stephen Brennan wrote:
> Yonghong Song <yhs@fb.com> writes:
> > On 3/14/22 12:09 AM, Shung-Hsi Yu wrote:
> >> On Wed, Mar 09, 2022 at 03:20:47PM -0800, Stephen Brennan wrote:
> >>> I've been recently learning about BTF with a keen interest in using it
> >>> as a fallback source of debug information. On the face of it, Linux
> >>> kernels these days have a lot of introspection information. BTF provides
> >>> information about types. kallsyms provides information about symbol
> >>> locations. ORC allows us to reliably unwind stack traces. So together,
> >>> these could enable a debugger (either postmortem, or live) to do a lot
> >>> without needing to read the (very large) DWARF debuginfo files. For
> >>> example, we could format backtraces with function names, we could

> > For backtraces with function names, you probably still need ksyms since
> > BTF won't encode address => symbol translation.
 
> Yes, kallsyms is definitely required in this scheme. In practice, it
> seems very common for distributions to be compiled not just with
> CONFIG_KALLSYMS, but CONFIG_KALLSYMS_ALL.
 
> Kallsyms is critical for mapping names to addresses (and vice versa).
 
> >>> pretty-print global variables and data structures, etc. This is nice

> > This indeed is a potential use case.
> > We discussed this during adding per-cpu
> > global variables. Ultimately we just added per-cpu global variables 
> > since we didn't have a use case or request for other global variables.

> > But I still would like to know beyond this whether you have other needs
> > which BPF may or may not help. It would be good to know since if 
> > ultimately you still need dwarf, then it might be undesirable to
> > add general global variables to BTF.

> I think that kallsyms, BTF, and ORC together will be enough to provide a
> lite debugging experience. Some things will be missing:

> - mapping backtrace addresses to source code lines

So, BTF has provisions for that, and it's present in eBPF programs;
perf annotate uses it. See symbol__disassemble_bpf() in
tools/perf/util/annotate.c, which goes like:

        struct bpf_prog_linfo *prog_linfo = NULL;

        info_node = perf_env__find_bpf_prog_info(dso->bpf_prog.env,
                                                 dso->bpf_prog.id);
        if (!info_node) {
                ret = SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF;
                goto out;
        }
        info_linear = info_node->info_linear;
        sub_id = dso->bpf_prog.sub_id;

        info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
        info.buffer_length = info_linear->info.jited_prog_len;

        if (info_linear->info.nr_line_info)
                prog_linfo = bpf_prog_linfo__new(&info_linear->info);

                addr = pc + ((u64 *)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
                count = disassemble(pc, &info);

                if (prog_linfo)
                        linfo = bpf_prog_linfo__lfind_addr_func(prog_linfo,
                                                                addr, sub_id,
                                                                nr_skip);
                if (linfo && btf) {
                        srcline = btf__name_by_offset(btf, linfo->line_off);
                        nr_skip++;
                } else
                        srcline = NULL;

etc.

Having this for the kernel proper is thus doable, but then we keep
making the BTF info grow.

Perhaps this could be optional: distros or appliances wanting a kernel
with this extra info would add it, and tools would use it if
available?

> - intelligent stack frame information from DWARF CFI (e.g.
>   register/variable values)
> - probably other things, I'm not a DWARF expert.
 
> However, I do have two interesting branches of drgn which demonstrate
> the utility of just BTF+kallsyms:
 
> 1. https://github.com/osandov/drgn/pull/162
> 2. https://github.com/brenns10/drgn/tree/kallsyms_plus_btf
 
> #1 adds preliminary BTF support, and #2 adds basic kallsyms support,
> building on #1. Finally, I have some unpublished patches which add some
> symbols into vmcoreinfo, which help us locate kallsyms info. From there,
> drgn is able to take a core dump, and lookup symbols and get their
> corresponding type info!
 
> The only real blocker I see here is that the BTF data is mainly limited
> to functions, so most of what you're doing is looking up function names
> and viewing their signatures :)
 
> >>> given that depending on your distro, it might be tough to get debuginfo,
> >>> and it is quite large to download or install.
> >>>
> >>> As I've worked toward this goal, I discovered that while the
> >>> BTF_KIND_VAR exists [1], the BTF included in the core kernel only has
> >>> declarations for percpu variables. This makes BTF much less useful for
> >>> this (admittedly odd) use case. Without a way to bind a name found in
> >>> kallsyms to its type, we can't interpret global variables. It looks like
> >>> the restriction for percpu-only variables is baked into the pahole BTF
> >>> encoder [2].

> >>> [1]: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-var
> >>> [2]: https://github.com/acmel/dwarves/blob/master/btf_encoder.c

> >>> I wonder what the BPF / BTF community's thoughts are on including more
> >>> of these global variable declarations? Perhaps behind a
> >>> CONFIG_DEBUG_INFO_BTF_ALL, like how kallsyms does it? I'm aware that

> > Currently on my local machine, the vmlinux BTF's size is 4.2MB and
> > adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
> > idea. But we might be able to just add global variables without this
> > new config if we have strong use case.
 
> And unfortunately 1MiB is really just a shot in the dark, guessing
> around 70k variables with no string data.

Maybe we can have a separate BTF file with all this extra info that
could be fetched from somewhere, keyed by build-id, like is now possible
with debuginfod and DWARF?
 
> I'd love to use kallsyms to avoid adding new strings into BTF. If the
> "all variables BTF" config added a dependency on "CONFIG_KALLSYMS_ALL",
> then we could use the BTF "kind_flag" to indicate that string values
> should be looked up in the kallsyms table, not the BTF strings section.
> This could even be used to reduce the string footprint for BTF
> function names.
 
> Of course it's a more complex change to dwarves :(
 
> >>> each declaration costs at least 16 bytes of BTF records, plus the
> >>> strings and any necessary type data. The string cost could be mitigated
> >>> by allowing "name_off" to refer to the kallsyms offset for variable or
> >>> function declaration. But the additional records could cost around 1MiB
> >>> for common distribution configurations.
> >>>
> >>> I know this isn't the designed use case for BTF, but I think it's very
> >>> exciting.
> >> 
> >> I've been wondering about the same (possibility of using BTF for postmortem
> >> debugging without debuginfo), though not to the extent that you've
> >> researched.
> >> 
> >> I find the idea exciting as well, and quite useful for distros where the
> >> kernel package changes quite often that the debuginfo package may be long
> >> gone by the time a crash dump for such kernel is captured.
> >
> > I would love to use BTF (including global variables in BTF) for crash 
> > dump. But I suspect we may still have some gaps. Maybe you can
> > explore a little bit more on this?
> 
> Hopefully my above explanation gives more context here. There is code
> (not production-ready) which can make use of these features together.
> The next step for me has been trying to get the dwarves/pahole BTF
> encoder to output *all* functions but I've hit some issues with it. If I
> can get that to work, then I can present a full demo of these pieces
> working together and we can be confident that there are no gaps.
> 
> Maybe this is a topic worth discussing at LSF/MM/BPF conference? Though
> it's quite late for that...
> 
> Thanks,
> Stephen
> 
> >
> >> 
> >> Shung-Hsi
> >> 
> >>> Thanks for your attention!
> >>> Stephen
> >> 

-- 

- Arnaldo


* Re: Question: missing vmlinux BTF variable declarations
  2022-03-15 17:58       ` Arnaldo Carvalho de Melo
@ 2022-03-16 16:06         ` Stephen Brennan
  2022-03-25 17:07           ` Andrii Nakryiko
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Brennan @ 2022-03-16 16:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Yonghong Song, Shung-Hsi Yu, bpf, Omar Sandoval,
	Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo <acme@kernel.org> writes:
[...]
>> I think that kallsyms, BTF, and ORC together will be enough to provide a
>> lite debugging experience. Some things will be missing:
>
>> - mapping backtrace addresses to source code lines
>
> So, BTF has provisions for that, and its present in the eBPF programs,
> perf annotate uses it, see tools/perf/util/annotate.c,
> symbol__disassemble_bpf(), it goes like:
>
>         struct bpf_prog_linfo *prog_linfo = NULL;
>
>         info_node = perf_env__find_bpf_prog_info(dso->bpf_prog.env,
>                                                  dso->bpf_prog.id);
>         if (!info_node) {
>                 ret = SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF;
>                 goto out;
>         }
>         info_linear = info_node->info_linear;
>         sub_id = dso->bpf_prog.sub_id;
>
>         info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
>         info.buffer_length = info_linear->info.jited_prog_len;
>
>         if (info_linear->info.nr_line_info)
>                 prog_linfo = bpf_prog_linfo__new(&info_linear->info);
>
>                 addr = pc + ((u64 *)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
>                 count = disassemble(pc, &info);
>
>                 if (prog_linfo)
>                         linfo = bpf_prog_linfo__lfind_addr_func(prog_linfo,
>                                                                 addr, sub_id,
>                                                                 nr_skip);
>                 if (linfo && btf) {
>                         srcline = btf__name_by_offset(btf, linfo->line_off);
>                         nr_skip++;
>                 } else
>                         srcline = NULL;
>
> etc.
>
> Having this for the kernel proper is thus doable, but then we go on
> making BTF info grow.
>
> Perhaps having this as optional, distros or appliances wanting to have a
> kernel with this extra info would add it and then tools would use it if
> available?

I didn't know about the source code mapping support! And I certainly see
the utility of it for BPF programs. However, I'm not sure that a "lite"
kernel debugging experience *needs* source line mapping. I suppose I
should have made it more clear, but I don't think of that list of
"missing" features as a checklist of things we'd want feature parity
for.

The advantage of BTF for debugging would be that it is small, and that
it is part of the kernel image without referencing any other file,
build-id, or kernel version. Ideally, a debugger could load a crash dump
with no additional information, and support a reasonable level of
debugging. I think looking up typed data structure values via global
symbols is part of that level, as well as simple backtraces and other
memory access.

I wouldn't want to try to re-implement DWARF for debuginfo. If you have
the DWARF debuginfo, then your experience should be much better.

>> - intelligent stack frame information from DWARF CFI (e.g.
>>   register/variable values)
>> - probably other things, I'm not a DWARF expert.
[...]
>> > Currently on my local machine, the vmlinux BTF's size is 4.2MB and
>> > adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
>> > idea. But we might be able to just add global variables without this
>> > new config if we have strong use case.
>  
>> And unfortunately 1MiB is really just a shot in the dark, guessing
>> around 70k variables with no string data.
>
> Maybe we can have a separate BTF file with all this extra info that
> could be fetched from somewhere, keyed by build-id, like is now possible
> with debuginfod and DWARF?

For me, this ranges into the territory of duplicating DWARF. If you lose
the one key advantage of "debuginfoless debugging", then you might as
well use the build-id to look up DWARF debuginfo as we can today.

This is why I'm trying to propose the means of combining the kallsyms
string data with BTF. Anything that can make the overall size increase
manageable so that all the necessary data can stay in the kernel image.

Thanks,
Stephen


* Re: Question: missing vmlinux BTF variable declarations
  2022-03-16 16:06         ` Stephen Brennan
@ 2022-03-25 17:07           ` Andrii Nakryiko
  2022-04-27 18:24             ` Stephen Brennan
  0 siblings, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2022-03-25 17:07 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Arnaldo Carvalho de Melo, Yonghong Song, Shung-Hsi Yu, bpf,
	Omar Sandoval, Arnaldo Carvalho de Melo

On Wed, Mar 16, 2022 at 11:11 PM Stephen Brennan <stephen@brennan.io> wrote:
>
> Arnaldo Carvalho de Melo <acme@kernel.org> writes:
> [...]
> >> I think that kallsyms, BTF, and ORC together will be enough to provide a
> >> lite debugging experience. Some things will be missing:
> >
> >> - mapping backtrace addresses to source code lines
> >
> > So, BTF has provisions for that, and its present in the eBPF programs,
> > perf annotate uses it, see tools/perf/util/annotate.c,
> > symbol__disassemble_bpf(), it goes like:
> >
> >         struct bpf_prog_linfo *prog_linfo = NULL;
> >
> >         info_node = perf_env__find_bpf_prog_info(dso->bpf_prog.env,
> >                                                  dso->bpf_prog.id);
> >         if (!info_node) {
> >                 ret = SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF;
> >                 goto out;
> >         }
> >         info_linear = info_node->info_linear;
> >         sub_id = dso->bpf_prog.sub_id;
> >
> >         info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
> >         info.buffer_length = info_linear->info.jited_prog_len;
> >
> >         if (info_linear->info.nr_line_info)
> >                 prog_linfo = bpf_prog_linfo__new(&info_linear->info);
> >
> >                 addr = pc + ((u64 *)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
> >                 count = disassemble(pc, &info);
> >
> >                 if (prog_linfo)
> >                         linfo = bpf_prog_linfo__lfind_addr_func(prog_linfo,
> >                                                                 addr, sub_id,
> >                                                                 nr_skip);
> >                               if (linfo && btf) {
> >                         srcline = btf__name_by_offset(btf, linfo->line_off);
> >                         nr_skip++;
> >                 } else
> >                         srcline = NULL;
> >
> > etc.
> >
> > Having this for the kernel proper is thus doable, but then we go on
> > making BTF info grow.
> >
> > Perhaps having this as optional, distros or appliances wanting to have a
> > kernel with this extra info would add it and then tools would use it if
> > available?
>
> I didn't know about the source code mapping support! And I certainly see
> the utility of it for BPF programs. However, I'm not sure that a "lite"
> kernel debugging experience *needs* source line mapping. I suppose I
> should have made it more clear, but I don't think of that list of
> "missing" features as a checklist of things we'd want feature parity
> for.
>
> The advantage of BTF for debugging would be that it is small, and that
> it is part of the kernel image without referencing any other file,
> build-id, or kernel version. Ideally, a debugger could load a crash dump
> with no additional information, and support a reasonable level of
> debugging. I think looking up typed data structure values via global
> symbols is part of that level, as well as simple backtraces and other
> memory access.
>
> I wouldn't want to try to re-implement DWARF for debuginfo. If you have
> the DWARF debuginfo, then your experience should be much better.
>
> >> - intelligent stack frame information from DWARF CFI (e.g.
> >>   register/variable values)
> >> - probably other things, I'm not a DWARF expert.
> [...]
> >> > Currently on my local machine, the vmlinux BTF's size is 4.2MB and
> >> > adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
> >> > idea. But we might be able to just add global variables without this
> >> > new config if we have strong use case.
> >
> >> And unfortunately 1MiB is really just a shot in the dark, guessing
> >> around 70k variables with no string data.
> >
> > Maybe we can have a separate BTF file with all this extra info that
> > could be fetched from somewhere, keyed by build-id, like is now possible
> > with debuginfod and DWARF?
>
> For me, this ranges into the territory of duplicating DWARF. If you lose
> the one key advantage of "debuginfoless debugging", then you might as
> well use the build-id to lookup DWARF debuginfo as we can today.
>
> This is why I'm trying to propose the means of combining the kallsyms
> string data with BTF. Anything that can make the overall size increase
> manageable so that all the necessary data can stay in the kernel image.

I think this quirk of using kallsyms strings is a no-go. But we should
experiment and see how much bigger BTF becomes when including all the
variables. Can you try to prototype pahole's support for this? As you
said, we can guard this extra information with KConfig and pahole
flags, so distros can always opt-out of bigger BTF if that's too
prohibitive. As it is right now, without firm understanding how big
the final BTF is it's hard to make a good decision about go or no-go
for this.

As for including source code itself, it is going to be prohibitively
huge, so it's probably out of the question for now as well.

>
> Thanks,
> Stephen


* Re: Question: missing vmlinux BTF variable declarations
  2022-03-25 17:07           ` Andrii Nakryiko
@ 2022-04-27 18:24             ` Stephen Brennan
  2022-04-29 17:10               ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Brennan @ 2022-04-27 18:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Arnaldo Carvalho de Melo, Yonghong Song, Shung-Hsi Yu, bpf,
	Omar Sandoval, Arnaldo Carvalho de Melo

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Wed, Mar 16, 2022 at 11:11 PM Stephen Brennan <stephen@brennan.io> wrote:
>>
>> Arnaldo Carvalho de Melo <acme@kernel.org> writes:
>> [...]
>> >> I think that kallsyms, BTF, and ORC together will be enough to provide a
>> >> lite debugging experience. Some things will be missing:
>> >
>> >> - mapping backtrace addresses to source code lines
>> >
>> > So, BTF has provisions for that, and its present in the eBPF programs,
>> > perf annotate uses it, see tools/perf/util/annotate.c,
>> > symbol__disassemble_bpf(), it goes like:
>> >
>> >         struct bpf_prog_linfo *prog_linfo = NULL;
>> >
>> >         info_node = perf_env__find_bpf_prog_info(dso->bpf_prog.env,
>> >                                                  dso->bpf_prog.id);
>> >         if (!info_node) {
>> >                 ret = SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF;
>> >                 goto out;
>> >         }
>> >         info_linear = info_node->info_linear;
>> >         sub_id = dso->bpf_prog.sub_id;
>> >
>> >         info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
>> >         info.buffer_length = info_linear->info.jited_prog_len;
>> >
>> >         if (info_linear->info.nr_line_info)
>> >                 prog_linfo = bpf_prog_linfo__new(&info_linear->info);
>> >
>> >                 addr = pc + ((u64 *)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
>> >                 count = disassemble(pc, &info);
>> >
>> >                 if (prog_linfo)
>> >                         linfo = bpf_prog_linfo__lfind_addr_func(prog_linfo,
>> >                                                                 addr, sub_id,
>> >                                                                 nr_skip);
>> >                 if (linfo && btf) {
>> >                         srcline = btf__name_by_offset(btf, linfo->line_off);
>> >                         nr_skip++;
>> >                 } else
>> >                         srcline = NULL;
>> >
>> > etc.
>> >
>> > Having this for the kernel proper is thus doable, but then we go on
>> > making BTF info grow.
>> >
>> > Perhaps having this as optional, distros or appliances wanting to have a
>> > kernel with this extra info would add it and then tools would use it if
>> > available?
>>
>> I didn't know about the source code mapping support! And I certainly see
>> the utility of it for BPF programs. However, I'm not sure that a "lite"
>> kernel debugging experience *needs* source line mapping. I suppose I
>> should have made it more clear, but I don't think of that list of
>> "missing" features as a checklist of things we'd want feature parity
>> for.
>>
>> The advantage of BTF for debugging would be that it is small, and that
>> it is part of the kernel image without referencing any other file,
>> build-id, or kernel version. Ideally, a debugger could load a crash dump
>> with no additional information, and support a reasonable level of
>> debugging. I think looking up typed data structure values via global
>> symbols is part of that level, as well as simple backtraces and other
>> memory access.
>>
>> I wouldn't want to try to re-implement DWARF for debuginfo. If you have
>> the DWARF debuginfo, then your experience should be much better.
>>
>> >> - intelligent stack frame information from DWARF CFI (e.g.
>> >>   register/variable values)
>> >> - probably other things, I'm not a DWARF expert.
>> [...]
>> >> > Currently on my local machine, the vmlinux BTF's size is 4.2MB and
>> >> > adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
>> >> > idea. But we might be able to just add global variables without this
>> >> > new config if we have strong use case.
>> >
>> >> And unfortunately 1MiB is really just a shot in the dark, guessing
>> >> around 70k variables with no string data.
>> >
>> > Maybe we can have a separate BTF file with all this extra info that
>> > could be fetched from somewhere, keyed by build-id, like is now possible
>> > with debuginfod and DWARF?
>>
>> For me, this ranges into the territory of duplicating DWARF. If you lose
>> the one key advantage of "debuginfoless debugging", then you might as
>> well use the build-id to lookup DWARF debuginfo as we can today.
>>
>> This is why I'm trying to propose the means of combining the kallsyms
>> string data with BTF. Anything that can make the overall size increase
>> manageable so that all the necessary data can stay in the kernel image.
>
> I think this quirk of using kallsyms strings is a no-go. But we should
> experiment and see how much bigger BTF becomes when including all the
> variables. Can you try to prototype pahole's support for this?

Hi Andrii,

Sorry for such a delay here. I tried to prototype this last month but
encountered some issues I couldn't resolve. But recently I picked it up
and I've created a prototype [1] which outputs all variables. (It's
quite a bad prototype: it strips out some useful logic regarding the
BTF_KIND_DATASEC for percpu variables. But I think it's good enough.)

On my 5.4-based kernel I saw an increase in BTF section size from 3.8
MiB all the way to 6.1 MiB, or more precisely:

BTF section before: 3905938 bytes
BTF section after:  6391989 bytes (+2486051, +63.6%)

So almost a 2.5 MiB increase. My prototype doesn't output the
btf_var_secinfo structs for percpu variables anymore, which probably
breaks some BPF and reduces BTF slightly. But it also is outputting
a few thousand "dwarf variables" which were correctly filtered before,
so I think it's a wash and it's a pretty good comparison.
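
The arithmetic behind those figures can be re-checked with a short
script. This is only a sketch that re-derives the delta and percentage
from the two byte counts quoted above; it also prints the sizes in MiB
so they can be compared with the rounded numbers in the text.

```python
# Byte counts quoted in the message above.
before = 3905938
after = 6391989

MIB = 1024 * 1024

# Absolute and relative growth of the BTF section.
delta = after - before
percent = 100.0 * delta / before

print(f"before: {before / MIB:.2f} MiB")
print(f"after:  {after / MIB:.2f} MiB")
print(f"delta:  {delta} bytes ({delta / MIB:.2f} MiB, +{percent:.1f}%)")
```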

Clearly it can't be added without a configuration option, as 2.5 MiB is
pretty huge for a kernel memory addition. But I don't think it's so huge
that nobody would enable it. I know I would :)

[1]: https://github.com/brenns10/dwarves/tree/remove_percpu_restriction_1

> As you
> said, we can guard this extra information with KConfig and pahole
> flags, so distros can always opt-out of bigger BTF if that's too
> prohibitive. As it is right now, without firm understanding how big
> the final BTF is it's hard to make a good decision about go or no-go
> for this.

Hopefully this comparison sheds some light on that now!

>
> As for including source code itself, it is going to be prohibitively
> huge, so it's probably out of the question for now as well.

Yeah, I wouldn't advocate for that.

Now, to share some of the cool possibilities that this enables. I have:
- prototype pahole [1] used for the kernel build,
- a prototype drgn with BTF+kallsyms support [2],
- some small kernel patches which add symbols to vmcoreinfo, so that
  drgn can find the kallsyms section. I'm happy to share these, I just
  haven't sent them anywhere yet.

[2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf

Combining these three things, I've got a debugger which can open up a
vmcore _without DWARF debuginfo_ and allow you to print out typed
variable values. It just relies on BTF + kallsyms.

So the proof of concept is proven, and I'm quite excited about it!
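
To illustrate the idea of binding a kallsyms name to a BTF type, here is
a minimal sketch in Python. It is not how drgn implements this. The
symbol addresses are made up, and the `var_types` dictionary is a
hypothetical stand-in for what a real tool would read from the
BTF_KIND_VAR records.

```python
from typing import Dict, NamedTuple, Optional


class TypedSymbol(NamedTuple):
    name: str
    address: int
    type_name: str


def parse_kallsyms(text: str) -> Dict[str, int]:
    """Parse /proc/kallsyms-style lines: '<hex addr> <kind> <name> [module]'."""
    symbols: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        addr, _kind, name = fields[0], fields[1], fields[2]
        # Names can repeat (static symbols); keep the first one seen.
        symbols.setdefault(name, int(addr, 16))
    return symbols


def lookup(name: str, symbols: Dict[str, int],
           var_types: Dict[str, str]) -> Optional[TypedSymbol]:
    """Return address and type for a variable, or None if either is missing."""
    if name not in symbols or name not in var_types:
        return None
    return TypedSymbol(name, symbols[name], var_types[name])


# Made-up addresses, for illustration only.
SAMPLE_KALLSYMS = """\
ffffffff82613a40 D init_task
ffffffff82a11780 D jiffies_64
"""

# Hypothetical result of decoding BTF_KIND_VAR records.
SAMPLE_VAR_TYPES = {"init_task": "struct task_struct", "jiffies_64": "u64"}

symbols = parse_kallsyms(SAMPLE_KALLSYMS)
print(lookup("init_task", symbols, SAMPLE_VAR_TYPES))
```

Without the variable declarations in BTF, the second dictionary would be
empty for everything except percpu variables, which is the gap this
thread is about.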

Stephen


* Re: Question: missing vmlinux BTF variable declarations
  2022-04-27 18:24             ` Stephen Brennan
@ 2022-04-29 17:10               ` Alexei Starovoitov
  2022-05-03 14:39                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2022-04-29 17:10 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Andrii Nakryiko, Arnaldo Carvalho de Melo, Yonghong Song,
	Shung-Hsi Yu, bpf, Omar Sandoval, Arnaldo Carvalho de Melo

On Wed, Apr 27, 2022 at 11:43 AM Stephen Brennan
<stephen.s.brennan@oracle.com> wrote:
> >
> > I think this quirk of using kallsyms strings is a no-go. But we should
> > experiment and see how much bigger BTF becomes when including all the
> > variables. Can you try to prototype pahole's support for this?
>
> Hi Andrii,
>
> Sorry for such a delay here. I tried to prototype this last month but
> encountered some issues I couldn't resolve. But recently I picked it up
> and I've created a prototype [1] which outputs all variables. (It's a
> quite bad prototype, it strips out some useful logic regarding the
> BTF_VAR_DATASEC for percpu variables. But I think it's good enough).
>
> On my 5.4-based kernel I saw an increase in BTF section size from 3.8
> MiB all the way to 6.1 MiB, or more precisely:
>
> BTF section before: 3905938 bytes
> BTF section after:  6391989 bytes (+2486051, +63.6%)
>
> So almost a 2.5 MiB increase. My prototype doesn't output the
> btf_var_secinfo structs for percpu variables anymore, which probably
> breaks some BPF and reduces BTF slightly. But it also is outputting
> a few thousand "dwarf variables" which were correctly filtered before,
> so I think it's a wash and it's a pretty good comparison.
>
> Clearly it can't be added without a configuration option, as 2.5 MiB is
> pretty huge for a kernel memory addition. But I don't think it's so huge
> that nobody would enable it. I know I would :)
>
> [1]: https://github.com/brenns10/dwarves/tree/remove_percpu_restriction_1
>
> > As you
> > said, we can guard this extra information with KConfig and pahole
> > flags, so distros can always opt-out of bigger BTF if that's too
> > prohibitive. As it is right now, without firm understanding how big
> > the final BTF is it's hard to make a good decision about go or no-go
> > for this.
>
> Hopefully this comparison sheds some light on that now!
>
> >
> > As for including source code itself, it is going to be prohibitively
> > huge, so it's probably out of the question for now as well.
>
> Yeah, I wouldn't advocate for that.
>
> Now, to share some of the cool possibilities that this enables. I have:
> - prototype pahole [1] used for the kernel build,
> - a prototype drgn with BTF+kallsyms support [2],
> - some small kernel patches which add symbols to vmcoreinfo, so that
>   drgn can find the kallsyms section. I'm happy to share these, I just
>   haven't sent them anywhere yet.
>
> [2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf
>
> Combining these three things, I've got a debugger which can open up a
> vmcore _without DWARF debuginfo_ and allow you to print out typed
> variable values. It just relies on BTF + kallsyms.
>
> So the proof of concept is proven, and I'm quite excited about it!

Exciting indeed. This is pretty cool.

I'm afraid we cannot justify a 2.5 MiB kernel memory increase
for pure debugging. The existing vmlinux BTF is used
by the kernel itself to validate bpf prog access.
bpf progs cannot access normal global vars.
If/when they can, we can reconsider.

As an alternative path I think we could introduce hierarchical
split BTF.
Currently vmlinux BTF and the BTF of kernel modules form a tree
of depth 2.
We can keep this representation of BTFs and
introduce a fake kernel module that contains kernel global vars.
drgn can parse vmlinux BTF plus the BTFs of all ko-s, including the fake one,
and obtain the same amount of debug info as if global vars
were part of vmlinux BTF.
Consuming 2.5 MiB on demand via a ko would be acceptable
in some scenarios, whereas unconditionally burning
that much memory in vmlinux BTF (even optional via kconfig)
is probably not.

Ideally we structure BTFs as a multi-level tree,
where BTF with global vars and other non-essential BTF info
can be added to vmlinux BTF at run-time. BTF of kernel mods
can add on top and mods can have split BTF too.
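
A toy model may help picture the layering. In split BTF, the type IDs of
the split object continue where the base object's IDs end, so a lookup
walks down the chain of bases until it finds the object that owns the
ID. The sketch below models that in Python with plain lists of type
names; the class, the object names, and the three-level chain are
assumptions for illustration, not existing kernel or libbpf code.

```python
from typing import List, Optional


class BtfObject:
    """Toy stand-in for one BTF blob: its own types plus an optional base."""

    def __init__(self, name: str, types: List[str],
                 base: Optional["BtfObject"] = None) -> None:
        self.name = name
        self.types = types
        self.base = base
        # ID 0 is reserved for void; own IDs start after the base's last ID.
        self.start_id = base.start_id + len(base.types) if base else 1

    def resolve(self, type_id: int) -> str:
        """Map a type ID to a type name, searching this object and its bases."""
        if type_id == 0:
            return "void"
        if type_id >= self.start_id + len(self.types):
            raise KeyError(f"type ID {type_id} is not known to {self.name}")
        obj: Optional[BtfObject] = self
        while obj is not None:
            if type_id >= obj.start_id:
                return obj.types[type_id - obj.start_id]
            obj = obj.base
        raise KeyError(f"invalid type ID {type_id}")


# Hypothetical three-level chain: vmlinux <- extra variables <- a module.
vmlinux = BtfObject("vmlinux", ["int", "struct task_struct"])
extra = BtfObject("vmlinux-extra", ["VAR init_task"], base=vmlinux)
module = BtfObject("some_module", ["struct foo"], base=extra)

for type_id in range(5):
    print(type_id, module.resolve(type_id))
```

With today's depth-2 arrangement the chain would stop after one base;
the deeper chain is the multi-level tree proposed above.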


* Re: Question: missing vmlinux BTF variable declarations
  2022-04-29 17:10               ` Alexei Starovoitov
@ 2022-05-03 14:39                 ` Arnaldo Carvalho de Melo
  2022-05-03 17:29                   ` Stephen Brennan
  0 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-03 14:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stephen Brennan, Andrii Nakryiko, Yonghong Song, Shung-Hsi Yu,
	bpf, Omar Sandoval, Arnaldo Carvalho de Melo

Em Fri, Apr 29, 2022 at 10:10:01AM -0700, Alexei Starovoitov escreveu:
> On Wed, Apr 27, 2022 at 11:43 AM Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> > [2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf

> > Combining these three things, I've got a debugger which can open up a
> > vmcore _without DWARF debuginfo_ and allow you to print out typed
> > variable values. It just relies on BTF + kallsyms.

> > So the proof of concept is proven, and I'm quite excited about it!

> Exciting indeed. This is pretty cool.

Indeed!
 
> I'm afraid we cannot justify 2.5 Mb kernel memory increase for pure
> debugging. The existing vmlinux BTF is used by the kernel itself to
> validate bpf prog access.  bpf progs cannot access normal global vars.
> If/when they are we can reconsider.
 
> As an alternative path I think we could introduce hierarchical
> split BTF.

Which we already have in the form of BTF for modules that use vmlinux as
a base for common types.

> Currently vmlinux BTF and BTF of kernel modules is a tree
> of depth 2.

> We can keep such representation of BTFs and introduce a fake kernel
> module that contains kernel global vars.

pahole would generate a naked BTF just with variables and types not
present in the main vmlinux BTF and refer to it for all the other types.

> drgn can parse vmlinux BTF plus BTFs of all ko-s including fake one
> and obtain the same amount of debug info as if global vars
> were part of vmlinux BTF.

Right.

> Consuming 2.5Mb on demand via ko would be acceptable in some scenarios
> whereas unconditionally burning that much memory in vmlinux BTF (even
> optional via kconfig) is probably not.

And since it would be just an extra kernel module, the existing
packaging processes (in distros, embedded systems, etc) that care about
BTF would carry this without any modification to existing practices,
i.e. selecting CONFIG_DEBUG_INFO_BTF=y would by default enable
CONFIG_DEBUG_INFO_GLOBAL_VARIABLES_BTF=y, which could be optionally
disabled by people not wanting to carry this extra info.

I.e. it would be always available but not always loaded.

> Ideally we structure BTFs as a multi level tree.  Where BTF with
> global vars and other non essential BTF info can be added to vmlinux
> BTF at run-time. BTF of kernel mods can add on top and mods can have
> split BTF too.

Yeah, reuses existing mechanism, doesn't increase the kernel BTF
footprint by default, allows for debuggers, profilers, tracers, etc to
ask for extra info in the form of just loading btf_global_variables.ko.

- Arnaldo


* Re: Question: missing vmlinux BTF variable declarations
  2022-05-03 14:39                 ` Arnaldo Carvalho de Melo
@ 2022-05-03 17:29                   ` Stephen Brennan
  2022-05-03 22:31                     ` Alan Maguire
  0 siblings, 1 reply; 13+ messages in thread
From: Stephen Brennan @ 2022-05-03 17:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Alexei Starovoitov
  Cc: Andrii Nakryiko, Yonghong Song, Shung-Hsi Yu, bpf, Omar Sandoval,
	Arnaldo Carvalho de Melo, Stephen Brennan

Arnaldo Carvalho de Melo <acme@kernel.org> writes:
> Em Fri, Apr 29, 2022 at 10:10:01AM -0700, Alexei Starovoitov escreveu:
>> On Wed, Apr 27, 2022 at 11:43 AM Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
>> > [2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf
>
>> > Combining these three things, I've got a debugger which can open up a
>> > vmcore _without DWARF debuginfo_ and allow you to print out typed
>> > variable values. It just relies on BTF + kallsyms.
>
>> > So the proof of concept is proven, and I'm quite excited about it!
>
>> Exciting indeed. This is pretty cool.
>
> Indeed!
>
>> I'm afraid we cannot justify 2.5 Mb kernel memory increase for pure
>> debugging. The existing vmlinux BTF is used by the kernel itself to
>> validate bpf prog access.  bpf progs cannot access normal global vars.
>> If/when they are we can reconsider.
>
>> As an alternative path I think we could introduce hierarchical
>> split BTF.
>
> Which we already have in the form of BTF for modules that use vmlinux as
> a base for common types.
>
>> Currently vmlinux BTF and BTF of kernel modules is a tree
>> of depth 2.
>
>> We can keep such representation of BTFs and introduce a fake kernel
>> module that contains kernel global vars.

This is an awesome idea :)

>
> pahole would generate a naked BTF just with variables and types not
> present in the main vmlinux BTF and refer to it for all the other types.
>
>> drgn can parse vmlinux BTF plus BTFs of all ko-s including fake one
>> and obtain the same amount of debug info as if global vars
>> were part of vmlinux BTF.
>
> Right.
>
>> Consuming 2.5Mb on demand via ko would be acceptable in some scenarios
>> whereas unconditionally burning that much memory in vmlinux BTF (even
>> optional via kconfig) is probably not.
>
> And since it would be just an extra kernel module, the existing
> packaging processes (in distros, embedded systems, etc) that care about
> BTF would carry this without any modification to existing practices,
> i.e. selecting CONFIG_DEBUG_INFO_BTF=y would by default enable
> CONFIG_DEBUG_INFO_GLOBAL_VARIABLES_BTF=y, which could be optionally
> disabled by people not wanting to carry this extra info.
>
> I.e. it would be always available but not always loaded.
>
>> Ideally we structure BTFs as a multi level tree.  Where BTF with
>> global vars and other non essential BTF info can be added to vmlinux
>> BTF at run-time. BTF of kernel mods can add on top and mods can have
>> split BTF too.

I see what you mean. It does sound a bit frustrating to have an
additional BTF module to augment every external module, which would be
the third level of that tree.

We'll need to allocate more module structs and pages within the kernel
for that data. I wonder whether it would be cheaper for the
"non-essential" module BTF to just reside in the same BTF section of
that module.

I suppose I can run my modified pahole on some sample modules and see
the BTF size difference, rather than just speculating. I'll do that in a
follow-up here.

> Yeah, reuses existing mechanism, doesn't increase the kernel BTF
> footprint by default, allows for debuggers, profilers, tracers, etc to
> ask for extra info in the form of just loading btf_global_variables.ko.

I agree, this is a quite elegant solution. Though it'll require a fair
bit of work to achieve, I do think it's important to keep the footprint
down. One thing I'd like to see in this world is a way to instruct the
kernel that "I always want the non-essential BTF loaded", maybe via
cmdline or sysctl. This way, the module loader would know to search for
"$MODNAME-btf" for each module which doesn't end with "-btf".
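
As a rough sketch of that naming rule (purely hypothetical, since no
such loader behaviour exists today), the set of companion modules to
request could be computed like this. The "-btf" suffix and the module
names are taken from, or invented for, this example only.

```python
from typing import List

BTF_SUFFIX = "-btf"  # hypothetical naming convention from the text above


def companion_btf_modules(loaded: List[str]) -> List[str]:
    """Return the '<name>-btf' modules that are wanted but not yet loaded."""
    wanted: List[str] = []
    for name in loaded:
        if name.endswith(BTF_SUFFIX):
            continue  # a BTF companion does not get a companion of its own
        companion = name + BTF_SUFFIX
        if companion not in loaded and companion not in wanted:
            wanted.append(companion)
    return wanted


print(companion_btf_modules(["ext4", "nfs", "nfs-btf"]))
```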

The reason for this would be to increase the chances that a vmcore you
create would be truly self-contained: any loaded module has all
"non-essential" BTF alongside it. I suppose this would need to be
implemented across the kernel and the userspace tools for loading kernel
modules.

This is all because the only case where BTF+kallsyms would be useful to
you is when you don't have DWARF readily available. In the live case,
you can load the modules you need dynamically, so you don't necessarily
*need* to have the extra BTF loaded at all times. But if you want a
system configured to create vmcores, and you'd like to enable analysis
even in the absence of the DWARF or other data, then you should ensure
that all the non-essential BTF is in memory at all times. Otherwise,
you'd need to go hunting for some .ko files in some kernel package, and
at that point... just go find the DWARF!

It'll take some learning on my part to see how all of this would come
together on the pahole and kbuild side of things. If anybody has any
pointers for this I'd appreciate it :)

Thanks,
Stephen

> - Arnaldo


* Re: Question: missing vmlinux BTF variable declarations
  2022-05-03 17:29                   ` Stephen Brennan
@ 2022-05-03 22:31                     ` Alan Maguire
  2022-05-10  0:10                       ` Andrii Nakryiko
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Maguire @ 2022-05-03 22:31 UTC (permalink / raw)
  To: Stephen Brennan
  Cc: Arnaldo Carvalho de Melo, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song, Shung-Hsi Yu, bpf, Omar Sandoval,
	Arnaldo Carvalho de Melo, Stephen Brennan

On Tue, 3 May 2022, Stephen Brennan wrote:

> >> Ideally we structure BTFs as a multi level tree.  Where BTF with
> >> global vars and other non essential BTF info can be added to vmlinux
> >> BTF at run-time. BTF of kernel mods can add on top and mods can have
> >> split BTF too.
> 
> I see what you mean. It does sound a bit frustrating to have an
> additional BTF module to augment every external module, which would be
> the third level of that tree.
> 
> We'll need to allocate more module structs and pages within the kernel
> for that data, I wonder whether it would be cheaper for the
> "non-essential" module BTF to just reside in the same BTF section of
> that module.
> 
> I suppose I can run my modified pahole on some sample modules and see
> the BTF size difference, rather than just speculating, I'll do that in a
> follow-up here.
> 
> > Yeah, reuses existing mechanism, doesn't increase the kernel BTF
> > footprint by default, allows for debuggers, profilers, tracers, etc to
> > ask for extra info in the form of just loading btf_global_variables.ko.
> 
> I agree, this is a quite elegant solution. Though it'll require a fair
> bit of work to achieve, I do think it's important to keep the footprint
> down. One thing I'd like to see in this world is a way to instruct the
> kernel that "I always want the non-essential BTF loaded", maybe via
> cmdline or sysctl. This way, the module loader would know to search for
> "$MODNAME-btf" for each module which doesn't end with "-btf".
>

Hmm, could we just have a tristate CONFIG_DEBUG_INFO_BTF_EXTRA?
If set to y, the extra vars are builtin to vmlinux BTF and
modules, and if set to m, they reside in /sys/kernel/btf/vmlinux-btf-extra
courtesy of the vmlinux-btf-extra.ko module (or whatever naming
scheme makes sense). Looks like pahole already has an option to
store encoded BTF elsewhere:

--btf_encode_detached=FILENAME

...so maybe all we need is something like --btf_gen_var_only for
the case where we build the btf-extra module

pahole -J --btf_base vmlinux --btf_gen_var_only \
    --btf_encode_detached=vmlinux_btf_extra

?

That's still only 2-way split BTF (base vmlinux BTF
plus vmlinux variables); we'd only need the
three-way split for the case where modules use the
-extra approach too, and I'd wonder about the viability
of having an -extra BTF module for each module, especially
if space-saving is the goal.
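
For anyone wanting to compare the sizes of such detached BTF files, the
fixed BTF header already says how large the type and string sections
are. The sketch below parses that header in Python. It assumes a
little-endian blob, and it builds a synthetic header rather than reading
a real file, so the numbers in it are invented.

```python
import struct
from typing import NamedTuple

BTF_MAGIC = 0xEB9F
# struct btf_header: u16 magic, u8 version, u8 flags, then five u32 fields.
HEADER = struct.Struct("<HBBIIIII")


class BtfHeader(NamedTuple):
    magic: int
    version: int
    flags: int
    hdr_len: int
    type_off: int
    type_len: int
    str_off: int
    str_len: int


def parse_btf_header(blob: bytes) -> BtfHeader:
    """Decode the fixed-size BTF header at the start of a BTF blob."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is too short to hold a BTF header")
    hdr = BtfHeader(*HEADER.unpack_from(blob))
    if hdr.magic != BTF_MAGIC:
        raise ValueError("bad magic: not little-endian BTF data")
    return hdr


# Synthetic header with invented section sizes, for illustration only.
sample = HEADER.pack(BTF_MAGIC, 1, 0, HEADER.size, 0, 100, 100, 50)
hdr = parse_btf_header(sample)
print(f"types: {hdr.type_len} bytes, strings: {hdr.str_len} bytes")
```

On a live system the same function could be fed the contents of
/sys/kernel/btf/vmlinux, read in binary mode.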

Alan


* Re: Question: missing vmlinux BTF variable declarations
  2022-05-03 22:31                     ` Alan Maguire
@ 2022-05-10  0:10                       ` Andrii Nakryiko
  0 siblings, 0 replies; 13+ messages in thread
From: Andrii Nakryiko @ 2022-05-10  0:10 UTC (permalink / raw)
  To: Alan Maguire
  Cc: Stephen Brennan, Arnaldo Carvalho de Melo, Alexei Starovoitov,
	Yonghong Song, Shung-Hsi Yu, bpf, Omar Sandoval,
	Arnaldo Carvalho de Melo, Stephen Brennan

On Tue, May 3, 2022 at 3:32 PM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Tue, 3 May 2022, Stephen Brennan wrote:
>
> > >> Ideally we structure BTFs as a multi level tree.  Where BTF with
> > >> global vars and other non essential BTF info can be added to vmlinux
> > >> BTF at run-time. BTF of kernel mods can add on top and mods can have
> > >> split BTF too.
> >
> > I see what you mean. It does sound a bit frustrating to have an
> > additional BTF module to augment every external module, which would be
> > the third level of that tree.
> >
> > We'll need to allocate more module structs and pages within the kernel
> > for that data, I wonder whether it would be cheaper for the
> > "non-essential" module BTF to just reside in the same BTF section of
> > that module.
> >
> > I suppose I can run my modified pahole on some sample modules and see
> > the BTF size difference, rather than just speculating, I'll do that in a
> > follow-up here.
> >
> > > Yeah, reuses existing mechanism, doesn't increase the kernel BTF
> > > footprint by default, allows for debuggers, profilers, tracers, etc to
> > > ask for extra info in the form of just loading btf_global_variables.ko.
> >
> > I agree, this is a quite elegant solution. Though it'll require a fair
> > bit of work to achieve, I do think it's important to keep the footprint
> > down. One thing I'd like to see in this world is a way to instruct the
> > kernel that "I always want the non-essential BTF loaded", maybe via
> > cmdline or sysctl. This way, the module loader would know to search for
> > "$MODNAME-btf" for each module which doesn't end with "-btf".
> >
>
> Hmm, could we just have a tristate CONFIG_DEBUG_INFO_BTF_EXTRA?
> If set to y, the extra vars are builtin to vmlinux BTF and
> modules, and if set to m, they reside in /sys/kernel/btf/vmlinux-btf-extra
> courtesy of the vmlinux-btf-extra.ko module (or whatever naming
> scheme makes sense). Looks like pahole already has an option to
> store encoded BTF elsewhere:
>
> --btf_encode_detached=FILENAME
>
> ...so maybe all we need is something like --btf_gen_var_only for
> the case where we build the btf-extra module
>
> pahole -J --btf_base vmlinux  --btf_gen_var_only
> --btf_encode_detached=vmlinux_btf_extra
>
> ?
>

So BTF dedup would take care of keeping only the extra DATASEC in such a
module while reusing all the types from vmlinux BTF, as long as the
module itself has all the vmlinux BTF types plus those variables. It's
just a question of having the ability to enable/disable global variable
generation. Honestly, it's not a bad idea in general to have
more or less granular control over which subsets of BTF pahole
should emit.

> That's still only 2-way split BTF (base vmlinux BTF
> plus vmlinux variables); we'd only need the
> three-way split for the case where modules use the
> -extra approach too, and I'd wonder about the viability
> of having an -extra BTF module for each module, especially
> if space-saving is the goal.
>

Yeah, it feels like having that for modules is taking this to another
level of complexity, while adding this for vmlinux only seems pretty
doable with minimal changes (we don't need any extra BTF functionality
as we will just leave the current star-based topology with vmlinux BTF in
the center; it's only Kbuild/Makefile modifications). I also wonder if
it would save much for modules: they probably don't have
that many global variables anyway, and it's acceptable to have them in
the module's BTF. We can also separately control VMLINUX_BTF_EXTRAS as
y/n/m and MODULE_BTF_EXTRAS as y/n (that is, none or built into the
module itself), for starters at least.

> Alan


end of thread, other threads:[~2022-05-10  0:10 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-09 23:20 Question: missing vmlinux BTF variable declarations Stephen Brennan
2022-03-14  7:09 ` Shung-Hsi Yu
2022-03-15  5:53   ` Yonghong Song
2022-03-15 16:37     ` Stephen Brennan
2022-03-15 17:58       ` Arnaldo Carvalho de Melo
2022-03-16 16:06         ` Stephen Brennan
2022-03-25 17:07           ` Andrii Nakryiko
2022-04-27 18:24             ` Stephen Brennan
2022-04-29 17:10               ` Alexei Starovoitov
2022-05-03 14:39                 ` Arnaldo Carvalho de Melo
2022-05-03 17:29                   ` Stephen Brennan
2022-05-03 22:31                     ` Alan Maguire
2022-05-10  0:10                       ` Andrii Nakryiko
