Re: Question: missing vmlinux BTF variable declarations

From: Stephen Brennan <stephen@brennan.io>
To: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Yonghong Song <yhs@fb.com>, Shung-Hsi Yu <shung-hsi.yu@suse.com>,
	bpf <bpf@vger.kernel.org>, Omar Sandoval <osandov@osandov.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Stephen Brennan <stephen.s.brennan@oracle.com>
Subject: Re: Question: missing vmlinux BTF variable declarations
Date: Tue, 03 May 2022 10:29:05 -0700
Message-ID: <87zgjy8qzi.fsf@brennan.io>
In-Reply-To: <YnE+k33iUtLH7Lks@kernel.org>

Arnaldo Carvalho de Melo <acme@kernel.org> writes:
> Em Fri, Apr 29, 2022 at 10:10:01AM -0700, Alexei Starovoitov escreveu:
>> On Wed, Apr 27, 2022 at 11:43 AM Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
>> > [2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf
>
>> > Combining these three things, I've got a debugger which can open up a
>> > vmcore _without DWARF debuginfo_ and allow you to print out typed
>> > variable values. It just relies on BTF + kallsyms.
>
>> > So the proof of concept is proven, and I'm quite excited about it!
>
>> Exciting indeed. This is pretty cool.
>
> Indeed!
>
>> I'm afraid we cannot justify 2.5 Mb kernel memory increase for pure
>> debugging. The existing vmlinux BTF is used by the kernel itself to
>> validate bpf prog access.  bpf progs cannot access normal global vars.
>> If/when they are we can reconsider.
>
>> As an alternative path I think we could introduce hierarchical
>> split BTF.
>
> Which we already have in the form of BTF for modules that use vmlinux as
> a base for common types.
>
>> Currently vmlinux BTF and BTF of kernel modules is a tree
>> of depth 2.
>
>> We can keep such representation of BTFs and introduce a fake kernel
>> module that contains kernel global vars.

This is an awesome idea :)

>
> pahole would generate a naked BTF just with variables and types not
> present in the main vmlinux BTF and refer to it for all the other types.
>
>> drgn can parse vmlinux BTF plus BTFs of all ko-s including fake one
>> and obtain the same amount of debug info as if global vars
>> were part of vmlinux BTF.
>
> Right.
>
>> Consuming 2.5Mb on demand via ko would be acceptable in some scenarios
>> whereas unconditionally burning that much memory in vmlinux BTF (even
>> optional via kconfig) is probably not.
>
> And since it would be just an extra kernel module, the existing
> packaging processes (in distros, embedded systems, etc) that care about
> BTF would carry this without any modification to existing practices,
> i.e. selecting CONFIG_DEBUG_INFO_BTF=y would by default enable
> CONFIG_DEBUG_INFO_GLOBAL_VARIABLES_BTF=y, which could be optionally
> disabled by people not wanting to carry this extra info.
>
> I.e. it would be always available but not always loaded.
>
>> Ideally we structure BTFs as a multi level tree.  Where BTF with
>> global vars and other non essential BTF info can be added to vmlinux
>> BTF at run-time. BTF of kernel mods can add on top and mods can have
>> split BTF too.

I see what you mean. It does sound a bit frustrating to have an
additional BTF module to augment every external module, which would be
the third level of that tree.
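
To make the three-level picture concrete, here is a rough sketch of how
type IDs could resolve through a chain of split BTFs. It is only a toy
model: the class, the "ext4-btf" level and the placeholder type strings
are made up for illustration, and it assumes each split BTF continues
the ID numbering where its base stops (with ID 0 reserved for void),
which is how split BTF numbering works today for vmlinux plus modules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BtfLevel:
    """One BTF object in a chain; `base` is None for the root (vmlinux)."""
    name: str
    types: List[str]                  # placeholder type descriptions
    base: Optional["BtfLevel"] = None

    @property
    def start_id(self) -> int:
        # ID 0 is reserved for void, so the root's first type has ID 1.
        # A split BTF continues numbering where its base stops.
        if self.base is None:
            return 1
        return self.base.start_id + len(self.base.types)

    def resolve(self, type_id: int) -> str:
        """Return 'owner:type' for type_id, walking up the chain as needed."""
        if type_id == 0:
            return "void"
        if type_id < self.start_id:
            if self.base is None:
                raise KeyError(type_id)
            return self.base.resolve(type_id)
        index = type_id - self.start_id
        if index >= len(self.types):
            raise KeyError(type_id)
        return f"{self.name}:{self.types[index]}"


# Hypothetical three-level chain: vmlinux -> ext4 -> ext4-btf
vmlinux = BtfLevel("vmlinux", ["int", "struct task_struct"])
ext4 = BtfLevel("ext4", ["struct ext4_inode_info"], base=vmlinux)
ext4_extra = BtfLevel("ext4-btf", ["var ext4_some_global"], base=ext4)
```

With this model, a debugger holding only the deepest level can still
resolve every ID: `ext4_extra.resolve(2)` falls through two levels to
vmlinux, while `ext4_extra.resolve(4)` is answered by the extra level
itself.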

We'll need to allocate more module structs and pages within the kernel
for that data. I wonder whether it would be cheaper for the
"non-essential" module BTF to just reside in the same BTF section of
that module.

I suppose I can run my modified pahole on some sample modules and see
the BTF size difference, rather than just speculating. I'll do that in a
follow-up here.
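
As a baseline for that comparison, something like the sketch below could
record the per-module BTF sizes. It assumes a directory with one raw BTF
file per module, laid out like /sys/kernel/btf on a running kernel. The
function names are made up, and note that this measures BTF blobs that
already exist; it does not run pahole.

```python
import os
from typing import Dict


def btf_sizes(btf_dir: str = "/sys/kernel/btf") -> Dict[str, int]:
    """Map each file name in btf_dir to the size in bytes of its contents."""
    sizes = {}
    for entry in os.scandir(btf_dir):
        if entry.is_file():
            # Read the data rather than trusting st_size, to stay
            # independent of how the filesystem reports sizes.
            with open(entry.path, "rb") as f:
                sizes[entry.name] = len(f.read())
    return sizes


def size_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-module growth in bytes for modules present in both snapshots."""
    return {name: after[name] - before[name]
            for name in before if name in after}
```

Running `btf_sizes()` once on BTF generated by stock pahole and once on
BTF from the modified pahole, then passing both results to
`size_delta()`, would give the per-module cost of adding the variables.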

> Yeah, reuses existing mechanism, doesn't increase the kernel BTF
> footprint by default, allows for debuggers, profilers, tracers, etc to
> ask for extra info in the form of just loading btf_global_variables.ko.

I agree, this is quite an elegant solution. Though it'll require a fair
bit of work to achieve, I do think it's important to keep the footprint
down. One thing I'd like to see in this world is a way to instruct the
kernel that "I always want the non-essential BTF loaded", maybe via
cmdline or sysctl. This way, the module loader would know to search for
"$MODNAME-btf" for each module which doesn't end with "-btf".
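
The lookup rule I have in mind is roughly the following. This is just a
sketch of the naming idea, not an existing kernel or kmod interface: the
"-btf" suffix and both helper names are hypothetical.

```python
from typing import List, Optional

BTF_SUFFIX = "-btf"  # hypothetical suffix from the naming idea above


def companion_btf_module(modname: str) -> Optional[str]:
    """Return the companion BTF module name, or None for a companion itself."""
    if modname.endswith(BTF_SUFFIX):
        # Never look for "foo-btf-btf".
        return None
    return modname + BTF_SUFFIX


def modules_to_load(requested: List[str]) -> List[str]:
    """Expand a load request so each module is followed by its companion."""
    result = []
    for name in requested:
        result.append(name)
        companion = companion_btf_module(name)
        if companion is not None:
            result.append(companion)
    return result
```

So with the "always load non-essential BTF" setting enabled, a request
for "ext4" would also try "ext4-btf", while a direct request for
"ext4-btf" would not recurse any further.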

The reason for this would be to increase the chances that a vmcore you
create would be truly self-contained: any loaded module has all
"non-essential" BTF alongside it. I suppose this would need to be
implemented across the kernel and the userspace tools for loading kernel
modules.

This is all because the only case where BTF+kallsyms would be useful to
you is when you don't have DWARF readily available. In the live case,
you can load the modules you need dynamically, so you don't necessarily
*need* to have the extra BTF loaded at all times. But if you want a
system configured to create vmcores, and you'd like to enable analysis
even in the absence of the DWARF or other data, then you should ensure
that all the non-essential BTF is in memory at all times. Otherwise,
you'd need to go hunting for some .ko files in some kernel package, and
at that point... just go find the DWARF!

It'll take some learning on my part to see how all of this would come
together on the pahole and kbuild side of things. If anybody has any
pointers for this I'd appreciate it :)

Thanks,
Stephen

> - Arnaldo