bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Alexei Starovoitov" <ast@fb.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii.nakryiko@gmail.com>,
	"Andrii Nakryiko" <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	"Kernel Team" <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction
Date: Wed, 4 Mar 2020 17:07:08 -0800	[thread overview]
Message-ID: <20200305010706.dk7zedpyj5pb5jcv@ast-mbp> (raw)
In-Reply-To: <20200304132439.6abadbe3@kicinski-fedora-PC1C0HJN>

On Wed, Mar 04, 2020 at 01:24:39PM -0800, Jakub Kicinski wrote:
> On Wed, 4 Mar 2020 12:45:07 -0800 Alexei Starovoitov wrote:
> > On Wed, Mar 04, 2020 at 11:41:58AM -0800, Jakub Kicinski wrote:
> > > On Tue, 3 Mar 2020 20:36:45 -0800 Alexei Starovoitov wrote:  
> > > > > > libxdp can choose to pin it in some libxdp specific location, so other
> > > > > > libxdp-enabled applications can find it in the same location, detach,
> > > > > > replace, modify, but random app that wants to hack an xdp prog won't
> > > > > > be able to mess with it.    
> > > > > 
> > > > > What if that "random app" comes first, and keeps holding on to the link
> > > > > fd? Then the admin essentially has to start killing processes until they
> > > > > find the one that has the device locked, no?    
> > > > 
> > > > Of course not. We have to provide an api to make it easy to discover
> > > > what process holds that link and where it's pinned.  
> > > 
> > > That API to discover ownership would be useful but it's on the BPF side.  
> > 
> > it's on bpf side because it's bpf specific.
> > 
> > > We have netlink notifications in networking world. The application
> > > which doesn't want its program replaced should simply listen to the
> > > netlink notifications and act if something goes wrong.  
> > 
> > instead of locking the bike let's setup a camera and monitor the bike
> > when somebody steals it.
> > and then what? chase the thief and bring the bike back?
> 
> :) Is the bike the BPF program? It's more like thief is stealing our
> parking spot, we still have the program :)

yeah. parking spot is a better analogy.

> Maybe also the thief should not have CAP_ADMIN in the first place?
> And ask a daemon to perform its actions..

a daemon idea keeps coming back in circles.
With FD-based kprobe/uprobe/tracepoint/fexit/fentry that problem is gone,
but xdp, tc, cgroup still don't have the owner concept.
Some people argued that these three need three separate daemons.
Especially since cgroups are mainly managed by systemd plus container
manager it's quite different from networking (xdp, tc) where something
like 'networkd' might makes sense.
But if you take this line of thought all the ways systemd should be that
single daemon to coordinate attaching to xdp, tc, cgroup because
in many cases cgroup and tc progs have to coordinate the work.
At that's where it's getting gloomy... unless the kernel can provide
a facility so central daemon is not necessary.

> > current xdp, tc, cgroup apis don't have the concept of the link
> > and owner of that link.
> 
> Why do the attachment points have to have a concept of an owner and 
> not the program itself?

bpf program is an object. That object has an owner or multiple owners.
A user process that holds a pointer to that object is a shared owner.
FD is such pointer. FD == std::shared_ptr<bpf_prog>.
Holding that pointer guarantees that <bpf_prog> will not disappear,
but it says nothing that the program will keep running.
For [ku]probe,tp,fentry,fexit there was always <bpf_link> in the kernel.
It wasn't that formal in the past until most recent Andrii's patches,
but the concept existed for long time. FD == std::shared_ptr<bpf_link>
connects a kernel object with <bpf_prog>. When that kernel objects emits
an event the <bpf_link> guarantees that <bpf_prog> will be executed.

For cgroups we don't have such concept. We thought that three attach modes we
introduced (default, allow-override, allow-multi) will cover all use cases. But
in practice turned out that it only works when there is a central daemon for
_all_ cgroup-bpf progs in the system otherwise different processes step on each
other. More so there has to be a central diff-review human authority otherwise
teams step on each other. That's sort-of works within one org, but doesn't
scale.

To avoid making systemd a central place to coordinate attaching xdp, tc, cgroup
progs the kernel has to provide a mechanism for an application to connect a
kernel object with a prog and hold the ownership of that link so that no other
process in the system can break that connection. That kernel object is cgroup,
qdisc, netdev. Interesting question comes when that object disappears. What to
do with the link? Two ways to solve it:
1. make link hold the object, so it cannot be removed.
2. destroy the link when object goes away.
Both have pros and cons as I mentioned earlier.
And that's what's to be decided.
I think the truth is somewhat in the middle. The link has to hold the object,
so it doesn't disappear from under it, but get notified on deletion, so the
link can be self destroyed. From the user point of view the execution guarantee
is still preserved. The kernel object was removed and the link has one dangling
side. Note this behavior is vastly different from existing xdp, tc, cgroup
behavior where both object and bpf prog can be alive, but connection is gone
and execution guarantee is broken.

  reply	other threads:[~2020-03-05  1:07 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-28 22:39 [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 1/3] bpf: introduce pinnable bpf_link abstraction Andrii Nakryiko
2020-03-02 10:13   ` Toke Høiland-Jørgensen
2020-03-02 18:06     ` Andrii Nakryiko
2020-03-02 21:40       ` Toke Høiland-Jørgensen
2020-03-02 23:37         ` Andrii Nakryiko
2020-03-03  2:50   ` Alexei Starovoitov
2020-03-03  4:18     ` Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 2/3] libbpf: add bpf_link pinning/unpinning Andrii Nakryiko
2020-03-02 10:16   ` Toke Høiland-Jørgensen
2020-03-02 18:09     ` Andrii Nakryiko
2020-03-02 21:45       ` Toke Høiland-Jørgensen
2020-02-28 22:39 ` [PATCH bpf-next 3/3] selftests/bpf: add link pinning selftests Andrii Nakryiko
2020-03-02 10:11 ` [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Toke Høiland-Jørgensen
2020-03-02 18:05   ` Andrii Nakryiko
2020-03-02 22:24     ` Toke Høiland-Jørgensen
2020-03-02 23:35       ` Andrii Nakryiko
2020-03-03  8:12         ` Toke Høiland-Jørgensen
2020-03-03  8:12       ` Daniel Borkmann
2020-03-03 15:46         ` Alexei Starovoitov
2020-03-03 19:23           ` Daniel Borkmann
2020-03-03 19:46             ` Andrii Nakryiko
2020-03-03 20:24               ` Toke Høiland-Jørgensen
2020-03-03 20:53                 ` Daniel Borkmann
2020-03-03 22:01                   ` Alexei Starovoitov
2020-03-03 22:27                     ` Toke Høiland-Jørgensen
2020-03-04  4:36                       ` Alexei Starovoitov
2020-03-04  7:47                         ` Toke Høiland-Jørgensen
2020-03-04 15:47                           ` Alexei Starovoitov
2020-03-05 10:37                             ` Toke Høiland-Jørgensen
2020-03-05 16:34                               ` Alexei Starovoitov
2020-03-05 22:34                                 ` Daniel Borkmann
2020-03-05 22:50                                   ` Alexei Starovoitov
2020-03-05 23:42                                     ` Daniel Borkmann
2020-03-06  8:31                                       ` Toke Høiland-Jørgensen
2020-03-06 10:25                                         ` Daniel Borkmann
2020-03-06 10:42                                           ` Toke Høiland-Jørgensen
2020-03-06 18:09                                           ` David Ahern
2020-03-04 19:41                         ` Jakub Kicinski
2020-03-04 20:45                           ` Alexei Starovoitov
2020-03-04 21:24                             ` Jakub Kicinski
2020-03-05  1:07                               ` Alexei Starovoitov [this message]
2020-03-05  8:16                                 ` Jakub Kicinski
2020-03-05 11:05                                   ` Toke Høiland-Jørgensen
2020-03-05 18:13                                     ` Jakub Kicinski
2020-03-09 11:41                                       ` Toke Høiland-Jørgensen
2020-03-09 18:50                                         ` Jakub Kicinski
2020-03-10 12:22                                           ` Toke Høiland-Jørgensen
2020-03-05 16:39                                   ` Alexei Starovoitov
2020-03-03 22:40                 ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200305010706.dk7zedpyj5pb5jcv@ast-mbp \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).