Re: Question: missing vmlinux BTF variable declarations

From: Stephen Brennan <stephen@brennan.io>
To: Yonghong Song <yhs@fb.com>, Shung-Hsi Yu <shung-hsi.yu@suse.com>
Cc: bpf@vger.kernel.org, Omar Sandoval <osandov@osandov.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Stephen Brennan <stephen.s.brennan@oracle.com>
Subject: Re: Question: missing vmlinux BTF variable declarations
Date: Tue, 15 Mar 2022 09:37:46 -0700
Message-ID: <8735jjw4rp.fsf@brennan.io>
In-Reply-To: <f6f4a548-8e50-f676-8482-0ca541652cc6@fb.com>

Yonghong Song <yhs@fb.com> writes:
> On 3/14/22 12:09 AM, Shung-Hsi Yu wrote:
>> On Wed, Mar 09, 2022 at 03:20:47PM -0800, Stephen Brennan wrote:
>>> Hello everyone,
>>>
>>> I've been recently learning about BTF with a keen interest in using it
>>> as a fallback source of debug information. On the face of it, Linux
>>> kernels these days have a lot of introspection information. BTF provides
>>> information about types. kallsyms provides information about symbol
>>> locations. ORC allows us to reliably unwind stack traces. So together,
>>> these could enable a debugger (either postmortem, or live) to do a lot
>>> without needing to read the (very large) DWARF debuginfo files. For
>>> example, we could format backtraces with function names, we could
>
> For backtraces with function names, you probably still need ksyms since
> BTF won't encode address => symbol translation.

Yes, kallsyms is definitely required in this scheme. In practice, it
seems very common for distributions to be compiled not just with
CONFIG_KALLSYMS, but CONFIG_KALLSYMS_ALL.

Kallsyms is critical for mapping names to addresses (and vice versa).
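
To make the two directions of that mapping concrete, here is a minimal
sketch in Python. It assumes the text format of /proc/kallsyms
("<hex address> <type> <name> [module]"); the helper names and the sample
addresses are made up for illustration, and a real debugger would read the
in-kernel tables from the core dump instead.

```python
import bisect

def parse_kallsyms(text):
    """Parse /proc/kallsyms-style lines into a sorted list of (addr, name)."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols

def name_to_addr(symbols, name):
    """Return the address of the first symbol called `name`, or None."""
    for addr, sym in symbols:
        if sym == name:
            return addr
    return None

def addr_to_symbol(symbols, addr):
    """Return (name, offset) of the nearest symbol at or below `addr`."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, addr - base

# Hypothetical sample data, not taken from a real kernel.
SAMPLE = """\
ffffffff81000000 T _text
ffffffff81001000 T do_one_initcall
ffffffff82a00000 D jiffies_64
"""

symbols = parse_kallsyms(SAMPLE)
print(hex(name_to_addr(symbols, "jiffies_64")))
print(addr_to_symbol(symbols, 0xffffffff81001010))
```

The address-to-name direction is what backtrace formatting needs; the
name-to-address direction is what lets a debugger find a global variable
once its type is known.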

>
>>> pretty-print global variables and data structures, etc. This is nice
>
> This indeed is a potential use case.
> We discussed this during adding per-cpu
> global variables. Ultimately we just added per-cpu global variables 
> since we didn't have a use case or request for other global variables.
>
> But I still would like to know beyond this whether you have other needs
> which BPF may or may not help. It would be good to know since if 
> ultimately you still need dwarf, then it might be undesirable to
> add general global variables to BTF.

I think that kallsyms, BTF, and ORC together will be enough to provide a
lite debugging experience. Some things will be missing:

- mapping backtrace addresses to source code lines
- intelligent stack frame information from DWARF CFI (e.g.
  register/variable values)
- probably other things, I'm not a DWARF expert.

However, I do have two interesting branches of drgn which demonstrate
the utility of just BTF+kallsyms:

1. https://github.com/osandov/drgn/pull/162
2. https://github.com/brenns10/drgn/tree/kallsyms_plus_btf

#1 adds preliminary BTF support, and #2 adds basic kallsyms support,
building on #1. Finally, I have some unpublished patches which add some
symbols into vmcoreinfo, which help us locate kallsyms info. From there,
drgn is able to take a core dump, look up symbols, and get their
corresponding type info!

The only real blocker I see here is that the BTF data is mainly limited
to functions, so most of what you're doing is looking up function names
and viewing their signatures :)

>
>>> given that depending on your distro, it might be tough to get debuginfo,
>>> and it is quite large to download or install.
>>>
>>> As I've worked toward this goal, I discovered that while the
>>> BTF_KIND_VAR exists [1], the BTF included in the core kernel only has
>>> declarations for percpu variables. This makes BTF much less useful for
>>> this (admittedly odd) use case. Without a way to bind a name found in
>>> kallsyms to its type, we can't interpret global variables. It looks like
>>> the restriction for percpu-only variables is baked into the pahole BTF
>>> encoder [2].
>>>
>>> [1]: https://www.kernel.org/doc/html/latest/bpf/btf.html#btf-kind-var
>>> [2]: https://github.com/acmel/dwarves/blob/master/btf_encoder.c
>>>
>>> I wonder what the BPF / BTF community's thoughts are on including more
>>> of these global variable declarations? Perhaps behind a
>>> CONFIG_DEBUG_INFO_BTF_ALL, like how kallsyms does it? I'm aware that
>
> Currently on my local machine, the vmlinux BTF's size is 4.2MB and
> adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
> idea. But we might be able to just add global variables without this
> new config if we have strong use case.

Unfortunately, 1MiB is really just a shot in the dark: I'm guessing
around 70k variables at 16 bytes each, with no string data.
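
For reference, the arithmetic behind that guess can be sketched as below.
It assumes each BTF_KIND_VAR declaration is a 12-byte struct btf_type
followed by a 4-byte struct btf_var, and that the variable count is about
70k; the count is a guess, and strings and any newly required type records
are not included.

```python
BTF_TYPE_SIZE = 12  # struct btf_type: name_off, info, size/type (3 x u32)
BTF_VAR_SIZE = 4    # struct btf_var: linkage (1 x u32)

def var_records_bytes(n_vars):
    """Bytes of BTF_KIND_VAR records for n_vars variables, excluding strings."""
    return n_vars * (BTF_TYPE_SIZE + BTF_VAR_SIZE)

estimate = var_records_bytes(70_000)  # 70k is a rough guess, not a measurement
print(estimate, "bytes =", round(estimate / 2**20, 2), "MiB")
```

That comes to 1,120,000 bytes, or a little over 1MiB.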

I'd love to use kallsyms to avoid adding new strings into BTF. If the
"all variables BTF" config added a dependency on "CONFIG_KALLSYMS_ALL",
then we could use the BTF "kind_flag" to indicate that string values
should be looked up in the kallsyms table, not the BTF strings section.
This could even be used to reduce the string footprint for BTF
function names.
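
A rough sketch of how a consumer might handle this is below. The layout of
the "info" word (vlen in bits 0-15, kind in bits 24-28, kind_flag in bit
31) and BTF_KIND_VAR = 14 come from the BTF documentation. Everything about
the kallsyms lookup is hypothetical: BTF does not define this meaning for
kind_flag today, and the real kallsyms name table is compressed, so the
flat NUL-terminated "kallsyms_names" blob here is only a stand-in. The
sketch also assumes little-endian records.

```python
import struct

BTF_KIND_VAR = 14

def decode_info(info):
    """Split the 32-bit btf_type.info word into its documented fields."""
    return {
        "vlen": info & 0xFFFF,
        "kind": (info >> 24) & 0x1F,
        "kind_flag": info >> 31,
    }

def cstring(blob, off):
    """Read a NUL-terminated string starting at byte offset `off`."""
    end = blob.index(b"\0", off)
    return blob[off:end].decode()

def var_name(record, btf_strings, kallsyms_names):
    """Resolve the name of one 16-byte BTF_KIND_VAR record.

    Hypothetical rule: kind_flag == 1 means name_off indexes the kallsyms
    name data instead of the BTF string section.
    """
    name_off, info, _type_id, _linkage = struct.unpack("<IIII", record)
    fields = decode_info(info)
    if fields["kind"] != BTF_KIND_VAR:
        raise ValueError("not a BTF_KIND_VAR record")
    table = kallsyms_names if fields["kind_flag"] else btf_strings
    return cstring(table, name_off)

# Made-up example data: one name in each table, both at offset 1.
btf_strings = b"\0jiffies\0"
kallsyms_names = b"\0init_task\0"
in_btf = struct.pack("<IIII", 1, BTF_KIND_VAR << 24, 5, 1)
in_kallsyms = struct.pack("<IIII", 1, (1 << 31) | (BTF_KIND_VAR << 24), 5, 1)

print(var_name(in_btf, btf_strings, kallsyms_names))
print(var_name(in_kallsyms, btf_strings, kallsyms_names))
```

Existing BTF consumers would not understand the flag, which is part of why
it would need to sit behind a separate config option.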

Of course it's a more complex change to dwarves :(

>
>>> each declaration costs at least 16 bytes of BTF records, plus the
>>> strings and any necessary type data. The string cost could be mitigated
>>> by allowing "name_off" to refer to the kallsyms offset for variable or
>>> function declaration. But the additional records could cost around 1MiB
>>> for common distribution configurations.
>>>
>>> I know this isn't the designed use case for BTF, but I think it's very
>>> exciting.
>> 
>> I've been wondering about the same (possibility of using BTF for postmortem
>> debugging without debuginfo), though not to the extent that you've
>> researched.
>> 
>> I find the idea exciting as well, and quite useful for distros where the
>> kernel package changes so often that the debuginfo package may be long
>> gone by the time a crash dump for such kernel is captured.
>
> I would love to use BTF (including global variables in BTF) for crash 
> dump. But I suspect we may still have some gaps. Maybe you can
> explore a little bit more on this?

Hopefully my above explanation gives more context here. There is code
(not production-ready) which can make use of these features together.
The next step for me has been trying to get the dwarves/pahole BTF
encoder to output *all* functions but I've hit some issues with it. If I
can get that to work, then I can present a full demo of these pieces
working together and we can be confident that there are no gaps.

Maybe this is a topic worth discussing at LSF/MM/BPF conference? Though
it's quite late for that...

Thanks,
Stephen

>
>> 
>> Shung-Hsi
>> 
>>> Thanks for your attention!
>>> Stephen
>>