bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>,
	"Andrey Ignatov" <rdna@fb.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
Date: Fri, 27 Mar 2020 12:42:20 -0700	[thread overview]
Message-ID: <CAEf4BzZ07Jfttmnnax3910vsSB9AWfUsswVibDhuwZNKZMOoRw@mail.gmail.com> (raw)
In-Reply-To: <CACAyw98yYE+eOx5OayyN2tNQeNqFXnHdRGSv6DYX7ehfMHt1+g@mail.gmail.com>

On Fri, Mar 27, 2020 at 4:07 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Thu, 26 Mar 2020 at 19:06, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Mar 26, 2020 at 5:35 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> > >
> > > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> > >
> > > > Now for XDP. It has same flawed model. And even if it seems to you
> > > > that it's not a big issue, and even if Jakub thinks we are trying to
> > > > solve non-existing problem, it is a real problem and a real concern
> > > > from people that have to support XDP in production with many
> > > > well-meaning developers developing BPF applications independently.
> > > > Copying what you wrote in another thread:
> > > >
> > > >> Setting aside the question of which is the best abstraction to represent
> > > >> an attachment, it seems to me that the actual behavioural problem (XDP
> > > >> programs being overridden by mistake) would be solvable by this patch,
> > > >> assuming well-behaved userspace applications.
> > > >
> > > > ... this is a horrible and unrealistic assumption that we just cannot
> > > > make and accept. However well-behaved userspace applications are, they
> > > > are written by people that make mistakes. And rather than blissfully
> > > > expect that everything will be fine, we want to have enforcements in
> > > > place that will prevent some buggy application to wreck havoc in
> > > > production.
> > >
> > > Look, I'm not trying to tell you how to managed your internal systems.
> > > I'm just objecting to your assertion that your deployment model is the
> > > only one that can possibly work, and the refusal to consider other
> > > alternatives that comes with it.
> >
> > Your assumption doesn't work for us. Because of that we need something
> > like bpf_link. Existing attachment API doesn't go away and is still
> > supported. Feel free to use existing API. As for EXPECTED_FD API you
> > are adding, it will be up to maintainers to decide, ultimately, I
> > can't block it, even if I wanted to.
> >
> > >
> > > >> You're saying that like we didn't already have the netlink API. We
> > > >> essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
> > > >> this is just adding LINK_UPDATE. It's a straight-forward fix of an
> > > >> existing API; essentially you're saying we should keep the old API in a
> > > >> crippled state in order to promote your (proposed) new API.
> > > >
> > > > This is the fundamental disagreement that we seem to have. XDP's BPF
> > > > program attachment is not in any way equivalent to bpf_link. So no,
> > > > netlink API currently doesn't have anything that's close to bpf_link.
> > > > Let me try to summarize what bpf_link is and what are its fundamental
> > > > properties regardless of type of BPF programs.
> > >
> > > First of all, thank you for this summary; that is very useful!
> >
> > Sure, you're welcome.
> >
> > >
> > > > 1. bpf_link represents a connection (pairing?) of BPF program and some
> > > > BPF hook it is attached to. BPF hook could be perf event, cgroup,
> > > > netdev, etc. It's a completely independent object in itself, along the
> > > > bpf_map and bpf_prog, which has its own lifetime and kernel
> > > > representation. To user-space application it is returned as an
> > > > installed FD, similar to loaded BPF program and BPF map. It is
> > > > important that it's not just a BPF program, because BPF program can be
> > > > attached to multiple BPF hooks (e.g., same XDP program can be attached
> > > > to multiple interface; same kprobe handler can be installed multiple
> > > > times), which means that having BPF program FD isn't enough to
> > > > uniquely represent that one specific BPF program attachment and detach
> > > > it or query it. Having kernel object for this allows to encapsulate
> > > > all these various details of what is attached were and present to
> > > > user-space a single handle (FD) to work with.
> > >
> > > For XDP there is already a unique handle, it's just implicit: Each
> > > netdev can have exactly one XDP program loaded. So I don't really see
> > > how bpf_link adds anything, other than another API for the same thing?
> >
> > I certainly failed to explain things clearly if you are still asking
> > this. See point #2, once you attach bpf_link you can't just replace
> > it. This is what XDP doesn't have right now.
>
> From your description I like bpf_link, because it'll make attachment easier
> to support, and the pinning behaviour also seems nice. I'm really not fussed
> by netlink vs syscall, whatever.

Great, thanks.

>
> However, this behaviour concerns me. It's like Windows not
> letting you delete a file while an application has it opened, which just leads
> to randomly killing programs until you find the right one. It's frustrating
> and counter productive.
>
> You're taking power away from the operator. In your deployment scenario
> this might make sense, but I think it's a really bad model in general. If I am
> privileged I need to be able to exercise that privilege. This means that if
> there is a netdevice in my network namespace, and I have CAP_NET_ADMIN
> or whatever, I can break the association.
>
> So, to be constructive: I'd prefer bpf_link to replace a netlink attachment and
> vice versa. If you need to restrict control, use network namespaces
> to hide the devices, instead of hiding the bpffs.

Alexei mentioned a "nuke" option few times already, that will solve
this. The idea is that human operation should be able to do this, but
not applications, even though they have CAP_NET_ADMIN. I don't know
how exactly interface will look like, but it shouldn't allow
applications just randomly replace bpf_link. There are legitimate use
cases where application has to have CAP_NET_ADMIN and we can't hide
netdevice from them, unfortunately.


>
> >
> > It's a game of picking features/properties in isolation and "we can do
> > this particular thing this different way with what we have". Please,
> > try consider all of it together, it's important. Every single aspect
> > of bpf_link is not that unique, but it's all of them together that
> > matter.
> >
> > >
> > > > 2. Due to having FD associated with bpf_link, it's not possible to
> > > > talk about "owning" bpf_link. If application created link and never
> > > > shared its FD with any other application, it is the sole owner of it.
> > > > But it also means that you can share it, if you need it. Now, once
> > > > application closes FD or app crashes and kernel automatically closes
> > > > that FD, bpf_link refcount is decremented. If it was the last or only
> > > > FD, it will trigger automatica detachment and clean up of that
> > > > particular BPF program attachment. Note, not a clean up of BPF
> > > > program, which can still be attached somewhere else: only that
> > > > particular attachment.
> > >
> > > This behaviour is actually one of my reservations against bpf_link for
> > > XDP: I think that automatically detaching XDP programs when the FD is
> > > closed is very much the wrong behaviour. An XDP program processes
> > > packets, and when loading one I very much expect it to keep doing that
> > > until I explicitly tell it to stop.
> >
> > As you mentioned earlier, "it's not the only one mode". Just like with
> > tracing APIs, you can imagine scripts that would adds their
> > packet-sniffing XDP program temporarily. If they crash, "temporarily"
> > turns into "permanently, but no one knows". This is bad. And again,
> > it's a choice, just with a default to auto-cleanup, because it's safe,
> > even if it requires extra step for applications willing to do
> > permanent XDP attachment.
> >
> > >
> > > > 3. This derives from the concept of ownership of bpf_link. Once
> > > > bpf_link is attached, no other application that doesn't own that
> > > > bpf_link can replace, detach or modify the link. For some cases it
> > > > doesn't matter. E.g., for tracing, all attachment to the same fentry
> > > > trampoline are completely independent. But for other cases this is
> > > > crucial property. E.g., when you attach BPF program in an exclusive
> > > > (single) mode, it means that particular cgroup and any of its children
> > > > cgroups can have any more BPF programs attached. This is important for
> > > > container management systems to enforce invariants and correct
> > > > functioning of the system. Right now it's very easy to violate that -
> > > > you just go and attach your own BPF program, and previous BPF program
> > > > gets automatically detached without original application that put it
> > > > there knowing about this. Chaos ensues after that and real people have
> > > > to deal with this. Which is why existing
> > > > BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
> > > > bpf_link support.
> > >
> > > I can totally see how having an option to enforce a policy such as
> > > locking out others from installing cgroup BPF programs is useful. But
> > > such an option is just that: policy. So building this policy in as a
> > > fundamental property of the API seems like a bad idea; that is
> > > effectively enforcing policy in the kernel, isn't it?
> >
> > I hope we won't go into a dictionary definition of what "policy" means
> > here :). For me it's about guarantee that kernel gives to user-space.
> > bpf_link doesn't care about dictating policies. If you don't want this
> > guarantee - don't use bpf_link, use direct program attachment. As
> > simple as that. Policy is implemented by user-space application by
> > using APIs with just the right guarantees.
> >
> > >
> > > > Those same folks have similar concern with XDP. In the world where
> > > > container management installs "root" XDP program which other user
> > > > applications can plug into (libxdp use case, right?), it's crucial to
> > > > ensure that this root XDP program is not accidentally overwritten by
> > > > some well-meaning, but not overly cautious developer experimenting in
> > > > his own container with XDP programs. This is where bpf_link ownership
> > > > plays a huge role. Tupperware agent (FB's container management agent)
> > > > would install root XDP program and will hold onto this bpf_link
> > > > without sharing it with other applications. That will guarantee that
> > > > the system will be stable and can't be compromised.
> > >
> > > See this is where we get into "deployment-model specific territory". I
> > > mean, sure, in the "central management daemon" model, it makes sense
> > > that no other applications can replace the XDP program. But, erm, we
> > > already have a mechanism to ensure that: Just don't grant those
> > > applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to
> > > add anything other than a different way to do the same thing?
> >
> > Because there are still applications that need CAP_NET_ADMIN in order
> > to function (for other reasons than attaching XDP), so it's impossible
> > to enforce with for everyone.
>
> I think I'm missing some context. CAP_NET_ADMIN is trusted by definition,
> so trust these applications to not fiddle with XDP? Are there many of these?
> Are they inside a user namespace or something?
>
> >
> > >
> > > Additionally, in the case where there is *not* a central management
> > > daemon (i.e., what I'm implementing with libxdp), this would be the flow
> > > implemented by the library without bpf_link:
> > >
> > > 1. Query kernel for current BPF prog loaded on $IFACE
> > > 2. Sanity-check that this program is a dispatcher program installed by
> > >    libxdp
> > > 3. Create a new dispatcher program with whatever changes we want to do
> > >    (such as adding another component program).
> > > 4. Atomically replace the old program with the new one using the netlink
> > >    API in this patch series.
> > >
> > > Whereas with bpf_link, it would be:
> > >
> > > 1. Find the pinned bpf_link for $IFACE (e.g., load from
> > >    /sys/fs/bpf/iface-links/$IFNAME).
> >
> > But now you can hide this mount point from containerized
> > root/CAP_NET_ADMIN application, can't you? See the difference? One
> > might think about bpf_link as a fine-grained capability in this sense.
> >
> >
> > > 2. Query kernel for current BPF prog linked to $LINK
> > > 3. Sanity-check that this program is a dispatcher program installed by
> > >    libxdp
> > > 4. Create a new dispatcher program with whatever changes we want to do
> > >    (such as adding another component program).
> > > 5. Atomically replace the old program with the new one using the
> > >    LINK_UPDATE bpf() API.
> > >
> > >
> > > So all this does is add an additional step, and another dependency on
> > > bpffs. And crucially, I really don't see how the "bpf_link is the only
> > > thing that is not fundamentally broken" argument holds up.
> > >
> > > -Toke
> > >
>
>
>
> --
> Lorenz Bauer  |  Systems Engineer
> 6th Floor, County Hall/The Riverside Building, SE1 7PB, UK
>
> www.cloudflare.com

  parent reply	other threads:[~2020-03-27 19:42 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-19 13:13 [PATCH bpf-next 0/4] XDP: Support atomic replacement of XDP interface attachments Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
2020-03-19 22:52   ` Jakub Kicinski
2020-03-20  8:48     ` Toke Høiland-Jørgensen
2020-03-20 17:35       ` Jakub Kicinski
2020-03-20 18:17         ` Toke Høiland-Jørgensen
2020-03-20 18:35           ` Jakub Kicinski
2020-03-20 18:30         ` John Fastabend
2020-03-20 20:24           ` Andrii Nakryiko
2020-03-23 11:24             ` Toke Høiland-Jørgensen
2020-03-23 16:54               ` Jakub Kicinski
2020-03-23 18:14               ` Andrii Nakryiko
2020-03-23 19:23                 ` Toke Høiland-Jørgensen
2020-03-24  1:01                   ` David Ahern
2020-03-24  4:53                     ` Andrii Nakryiko
2020-03-24 20:55                       ` David Ahern
2020-03-24 22:56                         ` Andrii Nakryiko
2020-03-24  5:00                   ` Andrii Nakryiko
2020-03-24 10:57                     ` Toke Høiland-Jørgensen
2020-03-24 18:53                       ` Jakub Kicinski
2020-03-24 22:30                         ` Andrii Nakryiko
2020-03-25  1:25                           ` Jakub Kicinski
2020-03-24 19:22                       ` John Fastabend
2020-03-25  1:36                         ` Alexei Starovoitov
2020-03-25  2:15                           ` Jakub Kicinski
2020-03-25 18:06                             ` Alexei Starovoitov
2020-03-25 18:20                               ` Jakub Kicinski
2020-03-25 19:14                                 ` Alexei Starovoitov
2020-03-25 10:42                           ` Toke Høiland-Jørgensen
2020-03-25 18:11                             ` Alexei Starovoitov
2020-03-25 10:30                         ` Toke Høiland-Jørgensen
2020-03-25 17:56                           ` Alexei Starovoitov
2020-03-24 22:25                       ` Andrii Nakryiko
2020-03-25  9:38                         ` Toke Høiland-Jørgensen
2020-03-25 17:55                           ` Alexei Starovoitov
2020-03-26  0:16                           ` Andrii Nakryiko
2020-03-26  5:13                             ` Jakub Kicinski
2020-03-26 18:09                               ` Andrii Nakryiko
2020-03-26 19:40                               ` Alexei Starovoitov
2020-03-26 20:05                                 ` Edward Cree
2020-03-27 11:09                                   ` Lorenz Bauer
2020-03-27 23:11                                   ` Alexei Starovoitov
2020-03-26 10:04                             ` Lorenz Bauer
2020-03-26 17:47                               ` Jakub Kicinski
2020-03-26 19:45                                 ` Alexei Starovoitov
2020-03-26 18:18                               ` Andrii Nakryiko
2020-03-26 19:53                               ` Alexei Starovoitov
2020-03-27 11:11                                 ` Toke Høiland-Jørgensen
2020-04-02 20:21                                   ` bpf: ability to attach freplace to multiple parents Alexei Starovoitov
2020-04-02 21:23                                     ` Toke Høiland-Jørgensen
2020-04-02 21:54                                       ` Alexei Starovoitov
2020-04-03  8:38                                         ` Toke Høiland-Jørgensen
2020-04-07  1:44                                           ` Alexei Starovoitov
2020-04-07  9:20                                             ` Toke Høiland-Jørgensen
2020-05-12  8:34                                         ` Toke Høiland-Jørgensen
2020-05-12  9:53                                           ` Alan Maguire
2020-05-12 13:02                                             ` Toke Høiland-Jørgensen
2020-05-12 23:18                                             ` Alexei Starovoitov
2020-05-12 23:06                                           ` Alexei Starovoitov
2020-05-13 10:25                                             ` Toke Høiland-Jørgensen
2020-04-02 21:24                                     ` Andrey Ignatov
2020-04-02 22:01                                       ` Alexei Starovoitov
2020-03-26 12:35                             ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
2020-03-26 19:06                               ` Andrii Nakryiko
2020-03-27 11:06                                 ` Lorenz Bauer
2020-03-27 16:12                                   ` David Ahern
2020-03-27 20:10                                     ` Andrii Nakryiko
2020-03-27 23:02                                     ` Alexei Starovoitov
2020-03-30 15:25                                       ` Edward Cree
2020-03-31  3:43                                         ` Alexei Starovoitov
2020-03-31 22:05                                           ` Edward Cree
2020-03-31 22:16                                             ` Alexei Starovoitov
2020-03-27 19:42                                   ` Andrii Nakryiko [this message]
2020-03-27 19:45                                   ` Andrii Nakryiko
2020-03-27 23:09                                   ` Alexei Starovoitov
2020-03-27 11:46                                 ` Toke Høiland-Jørgensen
2020-03-27 20:07                                   ` Andrii Nakryiko
2020-03-27 22:16                                     ` Toke Høiland-Jørgensen
2020-03-27 22:54                                       ` Andrii Nakryiko
2020-03-28  1:09                                         ` Toke Høiland-Jørgensen
2020-03-28  1:44                                           ` Andrii Nakryiko
2020-03-28 19:43                                             ` Toke Høiland-Jørgensen
2020-03-26 19:58                               ` Alexei Starovoitov
2020-03-27 12:06                                 ` Toke Høiland-Jørgensen
2020-03-27 23:00                                   ` Alexei Starovoitov
2020-03-28  1:43                                     ` Toke Høiland-Jørgensen
2020-03-28  2:26                                       ` Alexei Starovoitov
2020-03-28 19:34                                         ` Toke Høiland-Jørgensen
2020-03-28 23:35                                           ` Alexei Starovoitov
2020-03-29 10:39                                             ` Toke Høiland-Jørgensen
2020-03-29 19:26                                               ` Alexei Starovoitov
2020-03-30 10:19                                                 ` Toke Høiland-Jørgensen
2020-03-29 20:23                                           ` Andrii Nakryiko
2020-03-30 13:53                                             ` Toke Høiland-Jørgensen
2020-03-30 20:17                                               ` Andrii Nakryiko
2020-03-31 10:13                                                 ` Toke Høiland-Jørgensen
2020-03-31 13:48                                                   ` Daniel Borkmann
2020-03-31 15:00                                                     ` Toke Høiland-Jørgensen
2020-03-31 20:19                                                       ` Andrii Nakryiko
2020-03-31 20:15                                                     ` Andrii Nakryiko
2020-03-30 15:41                                             ` Edward Cree
2020-03-30 19:13                                               ` Jakub Kicinski
2020-03-31  4:01                                               ` Alexei Starovoitov
2020-03-31 11:34                                                 ` Toke Høiland-Jørgensen
2020-03-31 18:52                                                   ` Alexei Starovoitov
2020-03-20 20:30       ` Daniel Borkmann
2020-03-20 20:40         ` Daniel Borkmann
2020-03-20 21:30           ` Jakub Kicinski
2020-03-20 21:55             ` Daniel Borkmann
2020-03-20 23:35               ` Jakub Kicinski
2020-03-20 20:39       ` Andrii Nakryiko
2020-03-23 11:25         ` Toke Høiland-Jørgensen
2020-03-23 18:07           ` Andrii Nakryiko
2020-03-23 23:54           ` Andrey Ignatov
2020-03-24 10:16             ` Toke Høiland-Jørgensen
2020-03-20  2:13   ` Yonghong Song
2020-03-20  8:48     ` Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 2/4] tools: Add EXPECTED_FD-related definitions in if_link.h Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 3/4] libbpf: Add function to set link XDP fd while specifying old fd Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for attaching XDP programs Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4BzZ07Jfttmnnax3910vsSB9AWfUsswVibDhuwZNKZMOoRw@mail.gmail.com \
    --to=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --cc=rdna@fb.com \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).