Re: [PATCH RFC bpf-next 00/10] bpf: CO-RE support in the kernel.

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: "David S . Miller" <davem@davemloft.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	mcroce@microsoft.com, bpf <bpf@vger.kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH RFC bpf-next 00/10] bpf: CO-RE support in the kernel.
Date: Tue, 28 Sep 2021 09:35:06 -0700
Message-ID: <20210928163506.uji2h54evv3g4tlb@ast-mbp.dhcp.thefacebook.com>
In-Reply-To: <CACAyw98=qk_zoAP_J4eG3p_OhHJgU3-6ae+Xzrd4h6tjdm_GCQ@mail.gmail.com>

On Tue, Sep 28, 2021 at 09:30:23AM +0100, Lorenz Bauer wrote:
> On Mon, 27 Sept 2021 at 17:50, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Sep 27, 2021 at 05:12:15PM +0100, Lorenz Bauer wrote:
> > > On Sat, 25 Sept 2021 at 00:13, Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Thu, Sep 23, 2021 at 12:33:58PM +0100, Lorenz Bauer wrote:
> > > > >
> > > > > Some questions:
> > > > > * How can this handle kernels that don't have built-in BTF? Not a
> > > > > problem for myself, but some people have to deal with BTF-less distro
> > > > > kernels by using pahole to generate external BTF from debug symbols.
> > > > > Can we accommodate that?
> > > >
> > > > I think so, but it probably should be done as a generic feature:
> > > > "populate kernel BTF".
> > > > When kernel wasn't compiled with BTF there could be a way to
> > > > populate it with such. Just like we do sys_bpf(BTF_LOAD)
> > > > for program's BTF we can allow populating vmlinux BTF this way.
> > > > Unlike builtin BTF it wouldn't be trusted for certain verifier assumptions,
> > > > but better than nothing and more convenient than specifying BTF file
> > > > on a side for every bpf prog load with traditional libbpf style.
> > >
> > > From my POV we already have an API for external BTF (and I think
> > > libbpf does too?) but would need a new API for "load kernel BTF".
> > > Global state like this also doesn't work well for several individual
> > > processes. Imagine multiple programs on the system trying to each
> > > replace the kernel BTF, how would that work? Which one wins?
> >
> > The kernel BTF can be only one, of course.
> > I don't expect progs to update the kernel BTF when they start.
> > It's more of the admin/chef job when kernel boots.
> > Only for the cases when kernel somehow was compiled without BTF.
> >
> > > Being
> > > able to give my own fd for kernel BTF circumvents all those problems
> > > and seems much cleaner to me.
> >
> > You mean to pass kernel BTF's fd to the kernel?
> > It's doable, but I don't quite see the operational side of it.
> > If progs have to carry both their BTF and kernel BTF why would
> > they need CO-RE at all? If they were compiled with given kernel BTF
> > there is no need to adjust offsets for the given host.
> > I suspect I simply don't understand your use case :)
> 
> This is the "distro ships without BTF" case that the aqua sec folks
> have been grappling with, and for which btfhub is a solution. If the
> distro disables BTF they are unlikely to perform this "admin" job in
> the first place. So whose responsibility is it to load that BTF?
> Currently it falls on the developers of the user space tooling to
> provide alternative BTF. Only allowing a single replacement BTF makes
> this difficult.

There is only one BTF that matches the kernel. If it were buggy
(due to a pahole/compiler issue) it would be replaced with the fixed one.
I can see the case where two vmlinux BTFs would be used for testing:
the kernel compiled with clang produces one BTF and the kernel
compiled with gcc->pahole produces another, but the vmlinux would
be different too. So admins/users should be using the BTF that
matches the kernel.

> Here is why:
> * Since external BTF is a thing, loaders today have to provide a way
> to relocate against external BTF in a non-standard location. This
> means loading the file from disk and then performing CO-RE using that
> info.
> * Users of the loader build a btfhub integration (or similar) and
> provide a path to the external BTF during load. They do this because
> they will have to support legacy kernels for years to come.
> * Under my proposal, a loader can detect whether in-kernel CO-RE is
> supported, load the BTF provided by the user into the kernel, and pass
> that fd to PROG_LOAD.
> * This is transparent to the user: they keep using their existing BTF
> but get the benefit of canonical CO-RE resolution.
> 
> We don't have to introduce a new loader-side API to deal with this
> situation. We also don't have to deal with a global resource that is
> subject to the whims of the distro.

I agree with all of the above. It's easy to add 'target_vmlinux_btf_fd'
to PROG_LOAD and let CO-RE in the kernel use that, but the kernel
also has dynamically loaded kernel modules, and CO-RE does search through
their BTFs. Modules will not be supported in such a case. I think it's an
OK limitation.
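
To make the loader-side flow concrete, here is a minimal sketch of the
choice a loader would face. It is an illustration only, not code from the
patch set: pick_btf_source() and enum btf_source are hypothetical names,
and 'target_vmlinux_btf_fd' does not exist in PROG_LOAD today. The
assumption is that a loader prefers the built-in kernel BTF when it is
readable and otherwise falls back to a user-supplied external BTF file.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical: where the CO-RE target BTF would come from. */
enum btf_source {
	BTF_SRC_KERNEL,   /* built-in BTF, e.g. /sys/kernel/btf/vmlinux */
	BTF_SRC_EXTERNAL, /* user-supplied file, e.g. from btfhub */
	BTF_SRC_NONE,     /* nothing usable, CO-RE cannot be performed */
};

/*
 * Hypothetical helper: prefer the built-in kernel BTF, fall back to
 * the external file. Either path may be NULL.
 */
static enum btf_source pick_btf_source(const char *builtin_path,
				       const char *external_path)
{
	if (builtin_path && access(builtin_path, R_OK) == 0)
		return BTF_SRC_KERNEL;
	if (external_path && access(external_path, R_OK) == 0)
		return BTF_SRC_EXTERNAL;
	return BTF_SRC_NONE;
}

/*
 * In the BTF_SRC_EXTERNAL case the loader would then (assumption):
 *  1. load the file contents with the existing BPF_BTF_LOAD command,
 *  2. pass the resulting fd in the proposed 'target_vmlinux_btf_fd'
 *     field of PROG_LOAD.
 * Module BTFs would not be searched in that case.
 */
```

A loader would call it as pick_btf_source("/sys/kernel/btf/vmlinux",
user_path), so a kernel with built-in BTF never touches the external file.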