Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Yonghong Song <yhs@fb.com>, Florent Revest <revest@chromium.org>,
	KP Singh <kpsingh@chromium.org>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Florent Revest <revest@google.com>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper
Date: Fri, 18 Dec 2020 12:47:07 -0800	[thread overview]
Message-ID: <CAEf4BzZ7Pk677ZjUprRpsKzF3epBvouogxMU7YKN9i5pH2zQRg@mail.gmail.com> (raw)
In-Reply-To: <20201218203655.clqyeeamwicvej5z@ast-mbp>

On Fri, Dec 18, 2020 at 12:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Dec 18, 2020 at 10:53:57AM -0800, Andrii Nakryiko wrote:
> > On Thu, Dec 17, 2020 at 7:20 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Dec 17, 2020 at 09:26:09AM -0800, Yonghong Song wrote:
> > > >
> > > >
> > > > On 12/17/20 7:31 AM, Florent Revest wrote:
> > > > > On Mon, Dec 14, 2020 at 7:47 AM Yonghong Song <yhs@fb.com> wrote:
> > > > > > On 12/11/20 6:40 AM, Florent Revest wrote:
> > > > > > > On Wed, Dec 2, 2020 at 10:18 PM Alexei Starovoitov
> > > > > > > <alexei.starovoitov@gmail.com> wrote:
> > > > > > > > I still think that adopting printk/vsnprintf for this instead of
> > > > > > > > reinventing the wheel
> > > > > > > > is more flexible and easier to maintain long term.
> > > > > > > > Almost the same layout can be done with vsnprintf
> > > > > > > > with exception of \0 char.
> > > > > > > > More meaningful names, etc.
> > > > > > > > See Documentation/core-api/printk-formats.rst
> > > > > > >
> > > > > > > I agree this would be nice. I finally got a bit of time to experiment
> > > > > > > with this and I noticed a few things:
> > > > > > >
> > > > > > > First of all, because helpers only have 5 arguments, if we use two for
> > > > > > > the output buffer and its size and two for the format string and its
> > > > > > > size, we are only left with one argument for a modifier. This is still
> > > > > > > enough for our usecase (where we'd only use "%ps" for example) but it
> > > > > > > does not strictly-speaking allow for the same layout that Andrii
> > > > > > > proposed.
> > > > > >
> > > > > > See helper bpf_seq_printf. It packs all arguments for format string and
> > > > > > puts them into an array. bpf_seq_printf will unpack them as it parsed
> > > > > > through the format string. So it should be doable to have more than
> > > > > > "%ps" in format string.
> > > > >
> > > > > This could be a nice trick, thank you for the suggestion Yonghong :)
> > > > >
> > > > > My understanding is that this would also require two extra args (one
> > > > > for the array of arguments and one for the size of this array) so it
> > > > > would still not fit the 5 arguments limit I described in my previous
> > > > > email.
> > > > > eg: this would not be possible:
> > > > > long bpf_snprintf(const char *out, u32 out_size,
> > > > >                    const char *fmt, u32 fmt_size,
> > > > >                   const void *data, u32 data_len)
> > > >
> > > > Right. bpf allows only up to 5 parameters.
> > > > >
> > > > > Would you then suggest that we also put the format string and its
> > > > > length in the first and second cells of this array and have something
> > > > > along the line of:
> > > > > long bpf_snprintf(const char *out, u32 out_size,
> > > > >                    const void *args, u32 args_len) ?
> > > > > This seems like a fairly opaque signature to me and harder to verify.
> > > >
> > > > One way is to define an explicit type for args, something like
> > > >    struct bpf_fmt_str_data {
> > > >       char *fmt;
> > > >       u64 fmt_len;
> > > >       u64 data[];
> > > >    };
> > >
> > > that feels a bit convoluted.
> > >
> > > The reason I feel unease with the helper as was originally proposed
> > > and with Andrii's proposal is all the extra strlen and strcpy that
> > > needs to be done. In the helper we have to call kallsyms_lookup()
> > > which is ok interface for what it was desinged to do,
> > > but it's awkward to use to construct new string ("%s [%s]", sym, modname)
> > > or to send two strings into a ring buffer.
> > > Andrii's zero separator idea will simplify bpf prog, but user space
> > > would need to do strlen anyway if it needs to pretty print.
> > > If we take pain on converting addr to sym+modname let's figure out
> > > how to make it easy for the bpf prog to do and easy for user space to consume.
> > > That's why I proposed snprintf.
> >
> > I have nothing against snprintf support for symbols. But
> > bpf_ksym_resolve() solves only a partially overlapping problem, so
> > deserves to be added in addition to snprintf support. With snprintf,
> > it will be hard to avoid two lookups of the same symbol to print "%s
> > [%s]" form, so there is a performance loss, which is probably bigger
> > than a simple search for a zero-byte.
>
> I suspect we're not on the same page in terms of what printf can do.
> See Documentation/core-api/printk-formats.rst and lib/vsprintf.c:symbol_string()
> It's exactly one lookup in sprintf implementation.
> bpf_snprintf(buf, "%ps", addr) would be equivalent to
> {
>   ksym_resolve(sym, modname, addr, SYM | MOD);
>   printf("%s [%s]", sym, modname);
> }

Ah, I missed that we'll have a single specifier for "%s [%s]" format.
My assumption was that we have one for symbol name only and another
for symbol module. Yeah, then it's fine from the performance
perspective.

>
> > But bpf_ksym_resolve() can be
> > used flexibly. You can either do two separate bpf_ksym_resolve() calls
> > to get symbol name (and its length) and symbol's module (and its
> > length), if you need to process it programmatically in BPF program. Or
> > you can bundle it together and let user-space process it. User-space
> > will need to copy data anyways because it can't stay in
> > perfbuf/ringbuf for long. So scanning for zero delimiters will be
> > negligible, it will just bring data into cache. All I'm saying is that
> > ksym_resolve() gives flexibility which snprintf can't provide.
>
> Well, with snprintf there will be no way to print mod symbol
> without modname, but imo it's a good thing.
> What is the use case for getting mod symbol without modname?

For easier post-processing on the user side. Instead of parsing
"vmlinux_symbol" or "module_symbol [module_name]" (two non-uniform
variants already), user-space would just get two separate strings. I
just like APIs that don't assume how I am going to use them :), so
"symbol [module]" format is a bit more inconvenient than decomposed
pieces.

>
> > Additionally, with ksym_resolve() being able to return base address,
> > it's now possible to do a bunch of new stuff, from in-BPF
> > symbolization to additional things like correlating memory accesses or
> > function calls, etc.
>
> Getting adjusted base address could be useful some day, but why now? What for?

I proposed that only if we do bpf_ksym_resolve(). No need to support
that in snprintf case, of course.

>
> > bits), my point is that ksym_resolve() is more powerful than
> > snprintf(): the latter can be used pretty much only for
> > pretty-printing.
>
> Potentially yes. I think the stated goal was pretty printing.

That's fine if we do only snprintf, yes. But if a separate helper,
then we should think more broadly.

>
> >
> > >
> > > As far as 6 arg issue:
> > > long bpf_snprintf(const char *out, u32 out_size,
> > >                   const char *fmt, u32 fmt_size,
> > >                   const void *data, u32 data_len);
> > > Yeah. It won't work as-is, but fmt_size is unnecessary nowadays.
> > > The verifier understands read-only data.
> > > Hence the helper can be:
> > > long bpf_snprintf(const char *out, u32 out_size,
> >
> > With the power of BTF, we can also put these two correlated values
> > into a single struct and pass a pointer to it. It will take only one
> > parameter for one memory region. Alternative is the "fat pointer"
> > approach that Go and Rust use, but it's less flexible overall.
>
> I think it will be less flexible when output size is fixed by the type info.
> With explicit size the bpf_snprintf() can print directly into ringbuffer.
> Multiple bpf_snprintf() will be able to fill it one by one reducing
> space available at every step.
> bpf_snprintf() would need to return the number of bytes, of course.
> Just like probe_read_str.

Ok, I should have probably demonstrated with an example. I don't
propose to specify the size through BTF itself. I was thinking about:

struct bpf_mem_ptr {
    void *data;
    size_t size;
};

struct bpf_mem_ptr p = { ptr, 123 };
bpf_whatever_helper(&p, ...);

bpf_whatever_helper() will specify that the first argument has to be
PTR_TO_BTF_ID where btf_id corresponds to struct bpf_mem_ptr. Hope
this helps.