From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>,
Jakub Kicinski <kuba@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>, Andrii Nakryiko <andriin@fb.com>,
"David S. Miller" <davem@davemloft.net>,
Jesper Dangaard Brouer <brouer@redhat.com>,
Lorenz Bauer <lmb@cloudflare.com>, Andrey Ignatov <rdna@fb.com>,
Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP
Date: Wed, 25 Mar 2020 10:38:32 +0100 [thread overview]
Message-ID: <87369wrcyv.fsf@toke.dk> (raw)
In-Reply-To: <CAEf4BzY1bs5WRsvr5UbfqV9UKnwxmCUa9NQ6FWirT2uREaj7_g@mail.gmail.com>
Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> On Tue, Mar 24, 2020 at 3:57 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Mon, Mar 23, 2020 at 12:23 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >>
>> >> > On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >> >>
>> >> >> > On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
>> >> >> > <john.fastabend@gmail.com> wrote:
>> >> >> >>
>> >> >> >> Jakub Kicinski wrote:
>> >> >> >> > On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
>> >> >> >> > > Jakub Kicinski <kuba@kernel.org> writes:
>> >> >> >> > > > On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
>> >> >> >> > > >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> >> > > >>
>> >> >> >> > > >> While it is currently possible for userspace to specify that an existing
>> >> >> >> > > >> XDP program should not be replaced when attaching to an interface, there is
>> >> >> >> > > >> no mechanism to safely replace a specific XDP program with another.
>> >> >> >> > > >>
>> >> >> >> > > >> This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
>> >> >> >> > > >> set along with IFLA_XDP_FD. If set, the kernel will check that the program
>> >> >> >> > > >> currently loaded on the interface matches the expected one, and fail the
>> >> >> >> > > >> operation if it does not. This corresponds to a 'cmpxchg' memory operation.
>> >> >> >> > > >>
>> >> >> >> > > >> A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
>> >> >> >> > > >> request checking of the EXPECTED_FD attribute. This is needed for userspace
>> >> >> >> > > >> to discover whether the kernel supports the new attribute.
>> >> >> >> > > >>
>> >> >> >> > > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> >> >> > > >
>> >> >> >> > > > I didn't know we wanted to go ahead with this...
>> >> >> >> > >
>> >> >> >> > > Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
>> >> >> >> > > happening with that, though. So since this is a straight-forward
>> >> >> >> > > extension of the existing API, that doesn't carry a high implementation
>> >> >> >> > > cost, I figured I'd just go ahead with this. Doesn't mean we can't have
>> >> >> >> > > something similar in bpf_link as well, of course.
>> >> >> >> >
>> >> >> >> > I'm not really in the loop, but from what I overheard - I think the
>> >> >> >> > bpf_link may be targeting something non-networking first.
>> >> >> >>
>> >> >> >> My preference is to avoid building two different APIs one for XDP and another
>> >> >> >> for everything else. If we have userlands that already understand links and
>> >> >> >> pinning support is on the way imo lets use these APIs for networking as well.
>> >> >> >
>> >> >> > I agree here. And yes, I've been working on extending bpf_link into
>> >> >> > cgroup and then to XDP. We are still discussing some cgroup-specific
>> >> >> > details, but the patch is ready. I'm going to post it as an RFC to get
>> >> >> > the discussion started, before we do this for XDP.
>> >> >>
>> >> >> Well, my reason for being skeptic about bpf_link and proposing the
>> >> >> netlink-based API is actually exactly this, but in reverse: With
>> >> >> bpf_link we will be in the situation that everything related to a netdev
>> >> >> is configured over netlink *except* XDP.
>> >> >
>> >> > One can argue that everything related to use of BPF is going to be
>> >> > uniform and done through BPF syscall? Given variety of possible BPF
>> >> > hooks/targets, using custom ways to attach for all those many cases is
>> >> > really bad as well, so having a unifying concept and single entry to
>> >> > do this is good, no?
>> >>
>> >> Well, it depends on how you view the BPF subsystem's relation to the
>> >> rest of the kernel, I suppose. I tend to view it as a subsystem that
>> >> provides a bunch of functionality, which you can setup (using "internal"
>> >> BPF APIs), and then attach that object to a different subsystem
>> >> (networking) using that subsystem's configuration APIs.
>> >>
>> >> Seeing as this really boils down to a matter of taste, though, I'm not
>> >> sure we'll find agreement on this :)
>> >
>> > Yeah, seems like so. But then again, your view and reality don't seem
>> > to correlate completely. cgroup, a lot of tracing,
>> > flow_dissector/lirc_mode2 attachments all are done through BPF
>> > syscall.
>>
>> Well, I wasn't talking about any of those subsystems, I was talking
>> about networking :)
>
> So it's not "BPF subsystem's relation to the rest of the kernel" from
> your previous email, it's now only "talking about networking"? Since
> when the rest of the kernel is networking?
Not really, I would likely argue the same for any other subsystem, I
just prefer to limit myself to talking about things I actually know
something about. Hence, networking :)
> But anyways, I think John addressed modern XDP networking issues in
> his email very well already.
Going to reply to that email next...
>> In particular, networking already has a consistent and fairly
>> well-designed configuration mechanism (i.e., netlink) that we are
>> generally trying to move more functionality *towards* not *away from*
>> (see, e.g., converting ethtool to use netlink).
>>
>> > LINK_CREATE provides an opportunity to finally unify all those
>> > different ways to achieve the same "attach my BPF program to some
>> > target object" semantics.
>>
>> Well I also happen to think that "attach a BPF program to an object" is
>> the wrong way to think about XDP. Rather, in my mind the model is
>> "instruct the netdevice to execute this piece of BPF code".
>
> That can't be reconciled, so no point of arguing :) But thinking about
> BPF in general, I think it's closer to attach BPF program thinking
> (especially all the fexit/fentry, kprobe, etc), where objects that BPF
> is attached to is not "active" in the sense of "calling BPF", it's
> more of BPF system setting things up (attaching?) in such a way that
> BPF program is executed when appropriate.
I'd tend to agree with you on most of the tracing stuff, but not on
this. But let's just agree to disagree here :)
>> >> >> Other than that, I don't see any reason why the bpf_link API won't work.
>> >> >> So I guess that if no one else has any problem with BPF insisting on
>> >> >> being a special snowflake, I guess I can live with it as well... *shrugs* :)
>> >> >
>> >> > Apart from derogatory remark,
>> >>
>> >> Yeah, should have left out the 'snowflake' bit, sorry about that...
>> >>
>> >> > BPF is a bit special here, because it requires every potential BPF
>> >> > hook (be it cgroups, xdp, perf_event, etc) to be aware of BPF
>> >> > program(s) and execute them with special macro. So like it or not, it
>> >> > is special and each driver supporting BPF needs to implement this BPF
>> >> > wiring.
>> >>
>> >> All that is about internal implementation, though. I'm bothered by the
>> >> API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
>> >> what you use to configure your netdev except if you want to attach an
>> >> XDP program to it").
>> >>
>> >
>> > See my reply to David. Depends on where you define user API. Is it
>> > libbpf API, which is what most users are using? Or kernel API?
>>
>> Well I'm talking about the kernel<->userspace API, obviously :)
>>
>> > If everyone is using libbpf, does kernel system (bpf syscall vs
>> > netlink) matter all that much?
>>
>> This argument works the other way as well, though: If libbpf can
>> abstract the subsystem differences and provide a consistent interface to
>> "the BPF world", why does BPF need to impose its own syscall API on the
>> networking subsystem?
>
> bpf_link in libbpf started as user-space abstraction only, but we
> realized that it's not enough and there is a need to have proper
> kernel support and corresponding kernel object, so it's not just
> user-space API concerns.
>
> As for having netlink interface for creating link only for XDP. Why
> duplicating and maintaining 2 interfaces?
Totally agree; why do we need two interfaces? Let's keep the one we
already have - the netlink interface! :)
> All the other subsystems will go through bpf syscall, only XDP wants
> to (also) have this through netlink. This means duplication of UAPI
> for no added benefit. It's a LINK_CREATE operations, as well as
> LINK_UPDATE operations. Do we need to duplicate LINK_QUERY (once its
> implemented)? What if we'd like to support some other generic bpf_link
> functionality, would it be ok to add it only to bpf syscall, or we
> need to duplicate this in netlink as well?
You're saying that like we didn't already have the netlink API. We
essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
this is just adding LINK_UPDATE. It's a straight-forward fix of an
existing API; essentially you're saying we should keep the old API in a
crippled state in order to promote your (proposed) new API.
-Toke
next prev parent reply other threads:[~2020-03-25 9:38 UTC|newest]
Thread overview: 120+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-19 13:13 [PATCH bpf-next 0/4] XDP: Support atomic replacement of XDP interface attachments Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
2020-03-19 22:52 ` Jakub Kicinski
2020-03-20 8:48 ` Toke Høiland-Jørgensen
2020-03-20 17:35 ` Jakub Kicinski
2020-03-20 18:17 ` Toke Høiland-Jørgensen
2020-03-20 18:35 ` Jakub Kicinski
2020-03-20 18:30 ` John Fastabend
2020-03-20 20:24 ` Andrii Nakryiko
2020-03-23 11:24 ` Toke Høiland-Jørgensen
2020-03-23 16:54 ` Jakub Kicinski
2020-03-23 18:14 ` Andrii Nakryiko
2020-03-23 19:23 ` Toke Høiland-Jørgensen
2020-03-24 1:01 ` David Ahern
2020-03-24 4:53 ` Andrii Nakryiko
2020-03-24 20:55 ` David Ahern
2020-03-24 22:56 ` Andrii Nakryiko
2020-03-24 5:00 ` Andrii Nakryiko
2020-03-24 10:57 ` Toke Høiland-Jørgensen
2020-03-24 18:53 ` Jakub Kicinski
2020-03-24 22:30 ` Andrii Nakryiko
2020-03-25 1:25 ` Jakub Kicinski
2020-03-24 19:22 ` John Fastabend
2020-03-25 1:36 ` Alexei Starovoitov
2020-03-25 2:15 ` Jakub Kicinski
2020-03-25 18:06 ` Alexei Starovoitov
2020-03-25 18:20 ` Jakub Kicinski
2020-03-25 19:14 ` Alexei Starovoitov
2020-03-25 10:42 ` Toke Høiland-Jørgensen
2020-03-25 18:11 ` Alexei Starovoitov
2020-03-25 10:30 ` Toke Høiland-Jørgensen
2020-03-25 17:56 ` Alexei Starovoitov
2020-03-24 22:25 ` Andrii Nakryiko
2020-03-25 9:38 ` Toke Høiland-Jørgensen [this message]
2020-03-25 17:55 ` Alexei Starovoitov
2020-03-26 0:16 ` Andrii Nakryiko
2020-03-26 5:13 ` Jakub Kicinski
2020-03-26 18:09 ` Andrii Nakryiko
2020-03-26 19:40 ` Alexei Starovoitov
2020-03-26 20:05 ` Edward Cree
2020-03-27 11:09 ` Lorenz Bauer
2020-03-27 23:11 ` Alexei Starovoitov
2020-03-26 10:04 ` Lorenz Bauer
2020-03-26 17:47 ` Jakub Kicinski
2020-03-26 19:45 ` Alexei Starovoitov
2020-03-26 18:18 ` Andrii Nakryiko
2020-03-26 19:53 ` Alexei Starovoitov
2020-03-27 11:11 ` Toke Høiland-Jørgensen
2020-04-02 20:21 ` bpf: ability to attach freplace to multiple parents Alexei Starovoitov
2020-04-02 21:23 ` Toke Høiland-Jørgensen
2020-04-02 21:54 ` Alexei Starovoitov
2020-04-03 8:38 ` Toke Høiland-Jørgensen
2020-04-07 1:44 ` Alexei Starovoitov
2020-04-07 9:20 ` Toke Høiland-Jørgensen
2020-05-12 8:34 ` Toke Høiland-Jørgensen
2020-05-12 9:53 ` Alan Maguire
2020-05-12 13:02 ` Toke Høiland-Jørgensen
2020-05-12 23:18 ` Alexei Starovoitov
2020-05-12 23:06 ` Alexei Starovoitov
2020-05-13 10:25 ` Toke Høiland-Jørgensen
2020-04-02 21:24 ` Andrey Ignatov
2020-04-02 22:01 ` Alexei Starovoitov
2020-03-26 12:35 ` [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP Toke Høiland-Jørgensen
2020-03-26 19:06 ` Andrii Nakryiko
2020-03-27 11:06 ` Lorenz Bauer
2020-03-27 16:12 ` David Ahern
2020-03-27 20:10 ` Andrii Nakryiko
2020-03-27 23:02 ` Alexei Starovoitov
2020-03-30 15:25 ` Edward Cree
2020-03-31 3:43 ` Alexei Starovoitov
2020-03-31 22:05 ` Edward Cree
2020-03-31 22:16 ` Alexei Starovoitov
2020-03-27 19:42 ` Andrii Nakryiko
2020-03-27 19:45 ` Andrii Nakryiko
2020-03-27 23:09 ` Alexei Starovoitov
2020-03-27 11:46 ` Toke Høiland-Jørgensen
2020-03-27 20:07 ` Andrii Nakryiko
2020-03-27 22:16 ` Toke Høiland-Jørgensen
2020-03-27 22:54 ` Andrii Nakryiko
2020-03-28 1:09 ` Toke Høiland-Jørgensen
2020-03-28 1:44 ` Andrii Nakryiko
2020-03-28 19:43 ` Toke Høiland-Jørgensen
2020-03-26 19:58 ` Alexei Starovoitov
2020-03-27 12:06 ` Toke Høiland-Jørgensen
2020-03-27 23:00 ` Alexei Starovoitov
2020-03-28 1:43 ` Toke Høiland-Jørgensen
2020-03-28 2:26 ` Alexei Starovoitov
2020-03-28 19:34 ` Toke Høiland-Jørgensen
2020-03-28 23:35 ` Alexei Starovoitov
2020-03-29 10:39 ` Toke Høiland-Jørgensen
2020-03-29 19:26 ` Alexei Starovoitov
2020-03-30 10:19 ` Toke Høiland-Jørgensen
2020-03-29 20:23 ` Andrii Nakryiko
2020-03-30 13:53 ` Toke Høiland-Jørgensen
2020-03-30 20:17 ` Andrii Nakryiko
2020-03-31 10:13 ` Toke Høiland-Jørgensen
2020-03-31 13:48 ` Daniel Borkmann
2020-03-31 15:00 ` Toke Høiland-Jørgensen
2020-03-31 20:19 ` Andrii Nakryiko
2020-03-31 20:15 ` Andrii Nakryiko
2020-03-30 15:41 ` Edward Cree
2020-03-30 19:13 ` Jakub Kicinski
2020-03-31 4:01 ` Alexei Starovoitov
2020-03-31 11:34 ` Toke Høiland-Jørgensen
2020-03-31 18:52 ` Alexei Starovoitov
2020-03-20 20:30 ` Daniel Borkmann
2020-03-20 20:40 ` Daniel Borkmann
2020-03-20 21:30 ` Jakub Kicinski
2020-03-20 21:55 ` Daniel Borkmann
2020-03-20 23:35 ` Jakub Kicinski
2020-03-20 20:39 ` Andrii Nakryiko
2020-03-23 11:25 ` Toke Høiland-Jørgensen
2020-03-23 18:07 ` Andrii Nakryiko
2020-03-23 23:54 ` Andrey Ignatov
2020-03-24 10:16 ` Toke Høiland-Jørgensen
2020-03-20 2:13 ` Yonghong Song
2020-03-20 8:48 ` Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 2/4] tools: Add EXPECTED_FD-related definitions in if_link.h Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 3/4] libbpf: Add function to set link XDP fd while specifying old fd Toke Høiland-Jørgensen
2020-03-19 13:13 ` [PATCH bpf-next 4/4] selftests/bpf: Add tests for attaching XDP programs Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87369wrcyv.fsf@toke.dk \
--to=toke@redhat.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andriin@fb.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brouer@redhat.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=kuba@kernel.org \
--cc=lmb@cloudflare.com \
--cc=netdev@vger.kernel.org \
--cc=rdna@fb.com \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).