From: Stephen Brennan <stephen.s.brennan@oracle.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>,
	Yonghong Song <yhs@fb.com>, Shung-Hsi Yu <shung-hsi.yu@suse.com>,
	bpf <bpf@vger.kernel.org>, Omar Sandoval <osandov@osandov.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: Re: Question: missing vmlinux BTF variable declarations
Date: Wed, 27 Apr 2022 11:24:42 -0700
Message-ID: <87r15iv0yd.fsf@stepbren-lnx.us.oracle.com>
In-Reply-To: <CAEf4BzbiFNnsu9pji5ifzj4nVEyAYYdqP=QVZ3XFwzL48prP3A@mail.gmail.com>

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Wed, Mar 16, 2022 at 11:11 PM Stephen Brennan <stephen@brennan.io> wrote:
>>
>> Arnaldo Carvalho de Melo <acme@kernel.org> writes:
>> [...]
>> >> I think that kallsyms, BTF, and ORC together will be enough to provide a
>> >> lite debugging experience. Some things will be missing:
>> >
>> >> - mapping backtrace addresses to source code lines
>> >
> >> > So, BTF has provisions for that, and it's present in the eBPF programs,
>> > perf annotate uses it, see tools/perf/util/annotate.c,
>> > symbol__disassemble_bpf(), it goes like:
>> >
>> >         struct bpf_prog_linfo *prog_linfo = NULL;
>> >
>> >         info_node = perf_env__find_bpf_prog_info(dso->bpf_prog.env,
>> >                                                  dso->bpf_prog.id);
>> >         if (!info_node) {
>> >                 ret = SYMBOL_ANNOTATE_ERRNO__BPF_MISSING_BTF;
>> >                 goto out;
>> >         }
>> >         info_linear = info_node->info_linear;
>> >         sub_id = dso->bpf_prog.sub_id;
>> >
>> >         info.buffer = (void *)(uintptr_t)(info_linear->info.jited_prog_insns);
>> >         info.buffer_length = info_linear->info.jited_prog_len;
>> >
>> >         if (info_linear->info.nr_line_info)
>> >                 prog_linfo = bpf_prog_linfo__new(&info_linear->info);
>> >
>> >                 addr = pc + ((u64 *)(uintptr_t)(info_linear->info.jited_ksyms))[sub_id];
>> >                 count = disassemble(pc, &info);
>> >
>> >                 if (prog_linfo)
>> >                         linfo = bpf_prog_linfo__lfind_addr_func(prog_linfo,
>> >                                                                 addr, sub_id,
>> >                                                                 nr_skip);
>> >                               if (linfo && btf) {
>> >                         srcline = btf__name_by_offset(btf, linfo->line_off);
>> >                         nr_skip++;
>> >                 } else
>> >                         srcline = NULL;
>> >
>> > etc.
>> >
>> > Having this for the kernel proper is thus doable, but then we go on
>> > making BTF info grow.
>> >
>> > Perhaps having this as optional, distros or appliances wanting to have a
>> > kernel with this extra info would add it and then tools would use it if
>> > available?
>>
>> I didn't know about the source code mapping support! And I certainly see
>> the utility of it for BPF programs. However, I'm not sure that a "lite"
>> kernel debugging experience *needs* source line mapping. I suppose I
>> should have made it more clear, but I don't think of that list of
>> "missing" features as a checklist of things we'd want feature parity
>> for.
>>
>> The advantage of BTF for debugging would be that it is small, and that
>> it is part of the kernel image without referencing any other file,
>> build-id, or kernel version. Ideally, a debugger could load a crash dump
>> with no additional information, and support a reasonable level of
>> debugging. I think looking up typed data structure values via global
>> symbols is part of that level, as well as simple backtraces and other
>> memory access.
>>
>> I wouldn't want to try to re-implement DWARF for debuginfo. If you have
>> the DWARF debuginfo, then your experience should be much better.
>>
>> >> - intelligent stack frame information from DWARF CFI (e.g.
>> >>   register/variable values)
>> >> - probably other things, I'm not a DWARF expert.
>> [...]
>> >> > Currently on my local machine, the vmlinux BTF's size is 4.2MB and
>> >> > adding 1MB would be a big increase. CONFIG_DEBUG_INFO_BTF_ALL is a good
>> >> > idea. But we might be able to just add global variables without this
>> >> > new config if we have strong use case.
>> >
>> >> And unfortunately 1MiB is really just a shot in the dark, guessing
>> >> around 70k variables with no string data.
>> >
>> > Maybe we can have a separate BTF file with all this extra info that
>> > could be fetched from somewhere, keyed by build-id, like is now possible
>> > with debuginfod and DWARF?
>>
>> For me, this ranges into the territory of duplicating DWARF. If you lose
>> the one key advantage of "debuginfoless debugging", then you might as
>> well use the build-id to lookup DWARF debuginfo as we can today.
>>
>> This is why I'm trying to propose the means of combining the kallsyms
>> string data with BTF. Anything that can make the overall size increase
>> manageable so that all the necessary data can stay in the kernel image.
>
> I think this quirk of using kallsyms strings is a no-go. But we should
> experiment and see how much bigger BTF becomes when including all the
> variables. Can you try to prototype pahole's support for this?

Hi Andrii,

Sorry for such a delay here. I tried to prototype this last month but
encountered some issues I couldn't resolve. But recently I picked it up
again and I've created a prototype [1] which outputs all variables. (It's
quite a rough prototype: it strips out some useful logic regarding
BTF_KIND_DATASEC for percpu variables. But I think it's good enough.)

On my 5.4-based kernel I saw an increase in BTF section size from 3.7
MiB all the way to 6.1 MiB, or more precisely:

BTF section before: 3905938 bytes
BTF section after:  6391989 bytes (+2486051, +63.6%)

So that's roughly a 2.4 MiB increase. My prototype no longer outputs the
btf_var_secinfo structs for percpu variables, which probably breaks some
BPF use cases and shrinks the BTF slightly. But it also outputs a few
thousand "dwarf variables" which were correctly filtered out before, so I
think it's a wash and this is a pretty good comparison.
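
For anyone who wants to redo the arithmetic on their own build, here is a
minimal sketch in C. The function names are made up for illustration and
are not part of pahole or the kernel; the only inputs are the two section
sizes in bytes.

```c
#include <stdint.h>

/* Absolute growth of the BTF section, in bytes. */
int64_t btf_growth_bytes(int64_t before, int64_t after)
{
	return after - before;
}

/* Relative growth, as a percentage of the original size. */
double btf_growth_percent(int64_t before, int64_t after)
{
	return 100.0 * (double)(after - before) / (double)before;
}

/* Convert a byte count to MiB (1 MiB = 1048576 bytes). */
double bytes_to_mib(int64_t bytes)
{
	return (double)bytes / (1024.0 * 1024.0);
}
```

With the sizes above, btf_growth_bytes(3905938, 6391989) gives 2486051,
btf_growth_percent() comes out at about 63.6, and bytes_to_mib(2486051)
is about 2.37.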

Clearly it can't be added without a configuration option, as roughly 2.4
MiB is pretty huge for a kernel memory addition. But I don't think it's so
huge that nobody would enable it. I know I would :)

[1]: https://github.com/brenns10/dwarves/tree/remove_percpu_restriction_1

> As you
> said, we can guard this extra information with Kconfig and pahole
> flags, so distros can always opt out of a bigger BTF if that's too
> prohibitive. As it is right now, without a firm understanding of how
> big the final BTF is, it's hard to make a good go or no-go decision
> about this.

Hopefully this comparison sheds some light on that now!

>
> As for including source code itself, it's going to be prohibitively
> huge, so it's probably out of the question for now as well.

Yeah, I wouldn't advocate for that.

Now, to share some of the cool possibilities that this enables. I have:
- prototype pahole [1] used for the kernel build,
- a prototype drgn with BTF+kallsyms support [2],
- some small kernel patches which add symbols to vmcoreinfo, so that
  drgn can find the kallsyms section. I'm happy to share these, I just
  haven't sent them anywhere yet.

[2]: https://github.com/brenns10/drgn/tree/kallsyms_plus_btf

Combining these three things, I've got a debugger which can open up a
vmcore _without DWARF debuginfo_ and allow you to print out typed
variable values. It just relies on BTF + kallsyms.
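
To make the idea concrete, here is a minimal sketch of the lookup, assuming
the debugger joins the two sources by variable name: kallsyms supplies the
address and BTF supplies the type. This is not the drgn code. The structs
and the function below are hypothetical, simplified stand-ins; real
kallsyms and BTF data are compact binary formats, not arrays of structs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one kallsyms entry. */
struct ksym_entry {
	const char *name;	/* symbol name */
	uint64_t addr;		/* symbol address */
};

/* Hypothetical stand-in for one BTF variable with its type resolved. */
struct btf_var_entry {
	const char *name;	/* variable name */
	const char *type_name;	/* resolved type, e.g. "unsigned long" */
	uint32_t size;		/* size of the type in bytes */
};

/* Result of the join: where the variable lives and how to read it. */
struct typed_var {
	uint64_t addr;
	const char *type_name;
	uint32_t size;
};

/*
 * Join the two tables by name. Returns 0 on success, or -1 if the name
 * is missing from either table.
 */
int lookup_typed_var(const struct ksym_entry *syms, size_t nsyms,
		     const struct btf_var_entry *vars, size_t nvars,
		     const char *name, struct typed_var *out)
{
	const struct ksym_entry *sym = NULL;
	const struct btf_var_entry *var = NULL;
	size_t i;

	/* Linear scans stop at the first match in each table. */
	for (i = 0; i < nsyms && !sym; i++)
		if (strcmp(syms[i].name, name) == 0)
			sym = &syms[i];
	for (i = 0; i < nvars && !var; i++)
		if (strcmp(vars[i].name, name) == 0)
			var = &vars[i];
	if (!sym || !var)
		return -1;

	out->addr = sym->addr;
	out->type_name = var->type_name;
	out->size = var->size;
	return 0;
}
```

A symbol that kallsyms knows about but that has no BTF variable entry
fails the lookup, which is the situation for most global variables in
vmlinux BTF today.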

So the concept is proven, and I'm quite excited about it!

Stephen
