bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction
Date: Mon, 2 Mar 2020 10:05:22 -0800	[thread overview]
Message-ID: <CAEf4BzZGn9FcUdEOSR_ouqSNvzY2AdJA=8ffMV5mTmJQS-10VA@mail.gmail.com> (raw)
In-Reply-To: <87mu8zt6a8.fsf@toke.dk>

On Mon, Mar 2, 2020 at 2:12 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andriin@fb.com> writes:
>
> > This patch series adds bpf_link abstraction, analogous to libbpf's already
> > existing bpf_link abstraction. This formalizes and makes more uniform existing
> > bpf_link-like BPF program link (attachment) types (raw tracepoint and tracing
> > links), which are FD-based objects that are automatically detached when last
> > file reference is closed. These types of BPF program links are switched to
> > using bpf_link framework.
> >
> > FD-based bpf_link approach provides great safety guarantees, by ensuring there
> > is not going to be an abandoned BPF program attached, if user process suddenly
> > exits or forgets to clean up after itself. This is especially important in
> > production environment and is what all the recent new BPF link types followed.
> >
> > One of the previously existing  inconveniences of FD-based approach, though,
> > was the scenario in which user process wants to install BPF link and exit, but
> > let attached BPF program run. Now, with bpf_link abstraction in place, it's
> > easy to support pinning links in BPF FS, which is done as part of the same
> > patch #1. This allows FD-based BPF program links to survive exit of a user
> > process and original file descriptor being closed, by creating an file entry
> > in BPF FS. This provides great safety by default, with simple way to opt out
> > for cases where it's needed.
>
> While being able to pin the fds returned by bpf_raw_tracepoint_open()
> certainly helps, I still feel like this is the wrong abstraction for
> freplace(): When I'm building a program using freplace to put in new
> functions (say, an XDP multi-prog dispatcher :)), I really want the
> 'new' functions (i.e., the freplace'd bpf_progs) to share their lifetime
> with the calling BPF program. I.e., I want to be able to do something
> like:

freplace programs will take refcount on a BPF program they are
replacing, so in that sense they do share lifetime, except dependency
is opposite to what you describe: rootlet/dispatcher program can't go
away as long it has at least one freplace program attached. It
(dispatcher) might get detached, though, but freplace, technically,
will still be attached to now-detached dispatcher (so won't be
invoked, yet still attached). I hope that makes sense :)

>
> prog_fd = sys_bpf(BPF_PROG_LOAD, ...); // dispatcher
> func_fd = sys_bpf(BPF_PROG_LOAD, ...); // replacement func
> err = sys_bpf(BPF_PROG_REPLACE_FUNC, prog_fd, btf_id, func_fd); // does *not* return an fd
>
> That last call should make the ref-counting be in the prog_fd -> func_fd
> direction, so that when prog_fd is released, it will do
> bpf_prog_put(func_fd). There could be an additional call like
> sys_bpf(BPF_PROG_REPLACE_FUNC_DETACH, prog_fd, btf_id) for explicit
> detach as well, of course.

Taking this additional refcount will create a dependency loop (see
above), so that's why it wasn't done, I think.

With FD-based bpf_link, though, you'll be able to "transfer ownership"
from application that installed freplace program in the first place,
to the program that eventually will unload/replace dispatcher BPF
program. You do that by pinning freplace program in BPFFS location,
that's known to this libxdp library, and when you need to detach and
unload XDP dispatcher and overriden XDP programs, the "admin process"
which manages XDP dispatcher, will be able to just go and unpin and
detach everything, if necessary.

>
> With such an API, lifecycle management for an XDP program keeps being
> obvious: There's an fd for the root program attached to the interface,
> and that's it. When that is released the whole thing disappears. Whereas
> with the bpf_raw_tracepoint_open() API, the userspace program suddenly
> has to make sure all the component function FDs are pinned, which seems
> cumbersome and error-prone...

I thought that's what libxdp is supposed to do (among other things).
So for user applications it will be all hidden inside the library API,
no?

>
> I'll try to propose patches for what this could look like; I think it
> could co-exist with this bpf_link abstraction, though, so no need to
> hold up this series...

Yeah, either way, this is important and is desired behavior not just
for freplace cases.

>
> -Toke
>

  reply	other threads:[~2020-03-02 18:05 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-28 22:39 [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 1/3] bpf: introduce pinnable bpf_link abstraction Andrii Nakryiko
2020-03-02 10:13   ` Toke Høiland-Jørgensen
2020-03-02 18:06     ` Andrii Nakryiko
2020-03-02 21:40       ` Toke Høiland-Jørgensen
2020-03-02 23:37         ` Andrii Nakryiko
2020-03-03  2:50   ` Alexei Starovoitov
2020-03-03  4:18     ` Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 2/3] libbpf: add bpf_link pinning/unpinning Andrii Nakryiko
2020-03-02 10:16   ` Toke Høiland-Jørgensen
2020-03-02 18:09     ` Andrii Nakryiko
2020-03-02 21:45       ` Toke Høiland-Jørgensen
2020-02-28 22:39 ` [PATCH bpf-next 3/3] selftests/bpf: add link pinning selftests Andrii Nakryiko
2020-03-02 10:11 ` [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Toke Høiland-Jørgensen
2020-03-02 18:05   ` Andrii Nakryiko [this message]
2020-03-02 22:24     ` Toke Høiland-Jørgensen
2020-03-02 23:35       ` Andrii Nakryiko
2020-03-03  8:12         ` Toke Høiland-Jørgensen
2020-03-03  8:12       ` Daniel Borkmann
2020-03-03 15:46         ` Alexei Starovoitov
2020-03-03 19:23           ` Daniel Borkmann
2020-03-03 19:46             ` Andrii Nakryiko
2020-03-03 20:24               ` Toke Høiland-Jørgensen
2020-03-03 20:53                 ` Daniel Borkmann
2020-03-03 22:01                   ` Alexei Starovoitov
2020-03-03 22:27                     ` Toke Høiland-Jørgensen
2020-03-04  4:36                       ` Alexei Starovoitov
2020-03-04  7:47                         ` Toke Høiland-Jørgensen
2020-03-04 15:47                           ` Alexei Starovoitov
2020-03-05 10:37                             ` Toke Høiland-Jørgensen
2020-03-05 16:34                               ` Alexei Starovoitov
2020-03-05 22:34                                 ` Daniel Borkmann
2020-03-05 22:50                                   ` Alexei Starovoitov
2020-03-05 23:42                                     ` Daniel Borkmann
2020-03-06  8:31                                       ` Toke Høiland-Jørgensen
2020-03-06 10:25                                         ` Daniel Borkmann
2020-03-06 10:42                                           ` Toke Høiland-Jørgensen
2020-03-06 18:09                                           ` David Ahern
2020-03-04 19:41                         ` Jakub Kicinski
2020-03-04 20:45                           ` Alexei Starovoitov
2020-03-04 21:24                             ` Jakub Kicinski
2020-03-05  1:07                               ` Alexei Starovoitov
2020-03-05  8:16                                 ` Jakub Kicinski
2020-03-05 11:05                                   ` Toke Høiland-Jørgensen
2020-03-05 18:13                                     ` Jakub Kicinski
2020-03-09 11:41                                       ` Toke Høiland-Jørgensen
2020-03-09 18:50                                         ` Jakub Kicinski
2020-03-10 12:22                                           ` Toke Høiland-Jørgensen
2020-03-05 16:39                                   ` Alexei Starovoitov
2020-03-03 22:40                 ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzZGn9FcUdEOSR_ouqSNvzY2AdJA=8ffMV5mTmJQS-10VA@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).