From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Yonghong Song <yhs@fb.com>
Cc: Andrii Nakryiko <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <kernel-team@fb.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links
Date: Fri, 30 Jul 2021 10:48:35 -0700 [thread overview]
Message-ID: <CAEf4BzbR=m3Qusth-1JU_E5YMYaoxrNom9tS_pcArsHyiBD85w@mail.gmail.com> (raw)
In-Reply-To: <5ffd3338-fe76-2080-13a9-5102917a434a@fb.com>
On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/29/21 9:31 PM, Andrii Nakryiko wrote:
> > On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 7/26/21 9:12 AM, Andrii Nakryiko wrote:
> >>> Add ability for users to specify custom u64 value when creating BPF link for
> >>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).
> >>>
> >>> This is useful for cases when the same BPF program is used for attaching and
> >>> processing invocation of different tracepoints/kprobes/uprobes in a generic
> >>> fashion, but such that each invocation is distinguished from each other (e.g.,
> >>> BPF program can look up additional information associated with a specific
> >>> kernel function without having to rely on function IP lookups). This enables
> >>> new use cases to be implemented simply and efficiently that previously were
> >>> possible only through code generation (and thus multiple instances of almost
> >>> identical BPF program) or compilation at runtime (BCC-style) on target hosts
> >>> (even more expensive resource-wise). For uprobes it is not even possible in
> >>> some cases to know function IP before hand (e.g., when attaching to shared
> >>> library without PID filtering, in which case base load address is not known
> >>> for a library).
> >>>
> >>> This is done by storing u64 user_ctx in struct bpf_prog_array_item,
> >>> corresponding to each attached and run BPF program. Given cgroup BPF programs
> >>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't
> >>> have (yet?) support for user_ctx, reuse that space through union of
> >>> cgroup_storage and new user_ctx field.
> >>>
> >>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
> >>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
> >>> program execution code, which luckily is now also split from
> >>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
> >>> giving access to this user context value from inside a BPF program. Generic
> >>> perf_event BPF programs will access this value from perf_event itself through
> >>> passed in BPF program context.
> >>>
> >>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >>> ---
> >>> drivers/media/rc/bpf-lirc.c | 4 ++--
> >>> include/linux/bpf.h | 16 +++++++++++++++-
> >>> include/linux/perf_event.h | 1 +
> >>> include/linux/trace_events.h | 6 +++---
> >>> include/uapi/linux/bpf.h | 7 +++++++
> >>> kernel/bpf/core.c | 29 ++++++++++++++++++-----------
> >>> kernel/bpf/syscall.c | 2 +-
> >>> kernel/events/core.c | 21 ++++++++++++++-------
> >>> kernel/trace/bpf_trace.c | 8 +++++---
> >>> tools/include/uapi/linux/bpf.h | 7 +++++++
> >>> 10 files changed, 73 insertions(+), 28 deletions(-)
> >>>
> >>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
> >>> index afae0afe3f81..7490494273e4 100644
> >>> --- a/drivers/media/rc/bpf-lirc.c
> >>> +++ b/drivers/media/rc/bpf-lirc.c
> >>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
> >>> goto unlock;
> >>> }
> >>>
> >>> - ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
> >>> + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
> >>> if (ret < 0)
> >>> goto unlock;
> >>>
> >> [...]
> >>> void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>> index 00b1267ab4f0..bc1fd54a8f58 100644
> >>> --- a/include/uapi/linux/bpf.h
> >>> +++ b/include/uapi/linux/bpf.h
> >>> @@ -1448,6 +1448,13 @@ union bpf_attr {
> >>> __aligned_u64 iter_info; /* extra bpf_iter_link_info */
> >>> __u32 iter_info_len; /* iter_info length */
> >>> };
> >>> + struct {
> >>> + /* black box user-provided value passed through
> >>> + * to BPF program at the execution time and
> >>> + * accessible through bpf_get_user_ctx() BPF helper
> >>> + */
> >>> + __u64 user_ctx;
> >>> + } perf_event;
> >>
> >> Is it possible to fold this field into previous union?
> >>
> >> union {
> >> __u32 target_btf_id; /* btf_id of
> >> target to attach to */
> >> struct {
> >> __aligned_u64 iter_info; /*
> >> extra bpf_iter_link_info */
> >> __u32 iter_info_len; /*
> >> iter_info length */
> >> };
> >> };
> >>
> >>
> >
> > I didn't want to do it, because different types of BPF links will
> > accept this user_ctx (or now bpf_cookie). And then we'll have to have
> > different locations of that field for different types of links.
> >
> > For example, when/if we add this user_ctx to BPF iterator programs,
> > having __u64 user_ctx in the same anonymous union will make it overlap
> > with iter_info, which is a problem. So I want to have a link
> > type-specific sections in LINK_CREATE command section, to allow the
> > same field name at different locations.
> >
> > I actually think that we should put iter_info/iter_info_len into a
> > named field, like this (also added user_ctx for bpf_iter link as a
> > demonstration):
> >
> > struct {
> > __aligned_u64 info;
> > __u32 info_len;
> > __aligned_u64 user_ctx; /* see how it's at a different offset
> > than perf_event.user_ctx */
> > } iter;
> > struct {
> > __u64 user_ctx;
> > } perf_event;
> >
> > (of course keeping already existing fields in anonymous struct for
> > backwards compatibility)
>
> Okay, then since user_ctx may be used by many link types. How
> about just with the field "user_ctx" without struct perf_event.
I'd love to do it because it is indeed generic and common field, like
target_fd. But I'm not sure what you are proposing below. Where
exactly that user_ctx (now called bpf_cookie) goes in your example? I
see few possible options that allow preserving ABI backwards
compatibility. Let's see if you and everyone else likes any of those
better. I'll use the full LINK_CREATE sub-struct definition from
bpf_attr to make it clear. And to demonstrate how this can be extended
to bpf_iter in the future, please note this part as this is an
important aspect.
1. Full backwards compatibility and per-link type sections (my current
approach):
struct { /* struct used by BPF_LINK_CREATE command */
__u32 prog_fd;
union {
__u32 target_fd;
__u32 target_ifindex;
};
__u32 attach_type;
__u32 flags;
union {
__u32 target_btf_id;
struct {
__aligned_u64 iter_info;
__u32 iter_info_len;
};
struct {
__u64 bpf_cookie;
} perf_event;
struct {
__aligned_u64 info;
__u32 info_len;
__aligned_u64 bpf_cookie;
} iter;
};
} link_create;
The good property here is that we can keep easily extending link
type-specific sections with extra fields where needed. For common
stuff like bpf_cookie it's suboptimal because we'll need to duplicate
field definition in each struct inside that union, but I think that's
fine. From end-user point of view, they will know which type of link
they are creating, so the use will be straightforward. This is why I
went with this approach. But let's consider alternatives.
2. Non-backwards compatible layout but extra flag to specify that new
field layout is used.
struct { /* struct used by BPF_LINK_CREATE command */
__u32 prog_fd;
union {
__u32 target_fd;
__u32 target_ifindex;
};
__u32 attach_type;
__u32 flags; /* this will start supporting
some new flag like BPF_F_LINK_CREATE_NEW */
__u64 bpf_cookie; /* common field now */
union { /* this parts is effectively deprecated now */
__u32 target_btf_id;
struct {
__aligned_u64 iter_info;
__u32 iter_info_len;
};
struct { /* this is new layout, but needs
BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */
__u64 bpf_cookie;
union {
struct {
__u32 target_btf_id;
} ext;
struct {
__aligned_u64 info;
__u32 info_len;
} iter;
}
}
};
} link_create;
This makes bpf_cookie a common field, but at least for EXT (freplace/)
and ITER (bpf_iter/) links we need to specify extra flag to specify
that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter
then will use iter.info and iter.info_len, and can use plain
bpf_cookie.
IMO, this is way too confusing and a maintainability nightmare.
I'm trying to guess what you are proposing, I can read it two ways,
but let me know if I missed something.
3. Just add bpf_cookie field before link type-specific section.
struct { /* struct used by BPF_LINK_CREATE command */
__u32 prog_fd;
union {
__u32 target_fd;
__u32 target_ifindex;
};
__u32 attach_type;
__u32 flags;
__u64 bpf_cookie; // <<<<<<<<<< HERE
union {
__u32 target_btf_id;
struct {
__aligned_u64 iter_info;
__u32 iter_info_len;
};
};
} link_create;
This looks really nice and would be great, but that changes offsets
for target_btf_id/iter_info/iter_info_len, so a no go. The only way to
rectify this is what proposal #2 above does with an extra flag.
4. Add bpf_cookie after link-type specific part:
struct { /* struct used by BPF_LINK_CREATE command */
__u32 prog_fd;
union {
__u32 target_fd;
__u32 target_ifindex;
};
__u32 attach_type;
__u32 flags;
union {
__u32 target_btf_id;
struct {
__aligned_u64 iter_info;
__u32 iter_info_len;
};
struct {
};
__u64 bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE
} link_create;
This could work. But we are wasting 16 bytes currently used for
target_btf_id/iter_info/iter_info_len. If we later need to do
something link type-specific, we can add it to the existing union if
we need <= 16 bytes, otherwise we'll need to start another union after
bpf_cookie, splitting this into two link type-specific sections.
Overall, this might work, especially assuming we won't need to extend
iter-specific portions. But I really hate that we didn't do named
structs inside that union (i.e., ext.target_btf_id and
iter.info/iter.info_len) and I'd like to rectify that in the follow up
patches with named structs duplicating existing field layout, but with
proper naming. But splitting this LINK_CREATE bpf_attr part into two
unions would make it hard and awkward in the future.
So, thoughts? Did you have something else in mind that I missed?
> Sometime like
>
> __u64 user_ctx;
>
> instead of
>
> struct {
> __u64 user_ctx;
> } perf_event;
>
> >
> > I decided to not do that in this patch set, though, to not distract
> > from the main goal. But I think we should avoid this shared field
> > "namespace" across different link types going forward.
> >
> >
> >>> };
> >>> } link_create;
> >>>
> >> [...]
next prev parent reply other threads:[~2021-07-30 17:48 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-26 16:11 [PATCH v2 bpf-next 00/14] BPF perf link and user-provided context value Andrii Nakryiko
2021-07-26 16:11 ` [PATCH v2 bpf-next 01/14] bpf: refactor BPF_PROG_RUN into a function Andrii Nakryiko
2021-07-29 16:49 ` Yonghong Song
2021-07-30 4:05 ` Andrii Nakryiko
2021-07-26 16:11 ` [PATCH v2 bpf-next 02/14] bpf: refactor BPF_PROG_RUN_ARRAY family of macros into functions Andrii Nakryiko
2021-07-29 17:04 ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 03/14] bpf: refactor perf_event_set_bpf_prog() to use struct bpf_prog input Andrii Nakryiko
2021-07-27 8:48 ` Peter Zijlstra
2021-07-29 17:09 ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 04/14] bpf: implement minimal BPF perf link Andrii Nakryiko
2021-07-27 9:04 ` Peter Zijlstra
2021-07-30 4:23 ` Andrii Nakryiko
2021-07-27 9:12 ` Peter Zijlstra
2021-07-27 20:56 ` Andrii Nakryiko
2021-07-27 15:40 ` Jiri Olsa
2021-07-27 20:56 ` Andrii Nakryiko
2021-07-29 17:35 ` Yonghong Song
2021-07-30 4:16 ` Andrii Nakryiko
2021-07-30 5:42 ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 05/14] bpf: allow to specify user-provided context value for BPF perf links Andrii Nakryiko
2021-07-27 9:11 ` Peter Zijlstra
2021-07-27 21:09 ` Andrii Nakryiko
2021-07-28 8:58 ` Peter Zijlstra
2021-07-29 18:00 ` Yonghong Song
2021-07-30 4:31 ` Andrii Nakryiko
2021-07-30 5:49 ` Yonghong Song
2021-07-30 17:48 ` Andrii Nakryiko [this message]
2021-07-30 21:34 ` Yonghong Song
2021-07-30 22:06 ` Andrii Nakryiko
2021-07-30 22:28 ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 06/14] bpf: add bpf_get_user_ctx() BPF helper to access user_ctx value Andrii Nakryiko
2021-07-29 18:17 ` Yonghong Song
2021-07-30 4:49 ` Andrii Nakryiko
2021-07-30 5:53 ` Yonghong Song
2021-07-26 16:12 ` [PATCH v2 bpf-next 07/14] libbpf: re-build libbpf.so when libbpf.map changes Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 08/14] libbpf: remove unused bpf_link's destroy operation, but add dealloc Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 09/14] libbpf: use BPF perf link when supported by kernel Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 10/14] libbpf: add user_ctx support to bpf_link_create() API Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 11/14] libbpf: add user_ctx to perf_event, kprobe, uprobe, and tp attach APIs Andrii Nakryiko
2021-07-30 1:11 ` Rafael David Tinoco
2021-07-26 16:12 ` [PATCH v2 bpf-next 12/14] selftests/bpf: test low-level perf BPF link API Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 13/14] selftests/bpf: extract uprobe-related helpers into trace_helpers.{c,h} Andrii Nakryiko
2021-07-26 16:12 ` [PATCH v2 bpf-next 14/14] selftests/bpf: add user_ctx selftests for high-level APIs Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAEf4BzbR=m3Qusth-1JU_E5YMYaoxrNom9tS_pcArsHyiBD85w@mail.gmail.com' \
--to=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=peterz@infradead.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).