From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@fb.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
Networking <netdev@vger.kernel.org>,
Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction
Date: Wed, 04 Mar 2020 08:47:44 +0100
Message-ID: <87h7z44l3z.fsf@toke.dk>
In-Reply-To: <20200304043643.nqd2kzvabkrzlolh@ast-mbp>

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Mar 03, 2020 at 11:27:13PM +0100, Toke Høiland-Jørgensen wrote:
>> Alexei Starovoitov <ast@fb.com> writes:
>> >
>> > Legacy api for tc, xdp, cgroup will not be able to override FD-based
>> > link. For TC it's easy. cls-bpf allows multi-prog, so netlink
>> > adding/removing progs will not be able to touch progs that are
>> > attached via FD-based link.
>> > Same thing for cgroups. FD-based link will be similar to 'multi' mode.
>> > The owner of the link has a guarantee that their program will
>> > stay attached to cgroup.
>> > XDP is also easy. Since it has only one prog. Attaching FD-based link
>> > will prevent netlink from overriding it.
>>
>> So what happens if the device goes away?
>
> I'm not sure yet whether it's cleaner to make netdev, qdisc, cgroup to be held
> by the link or use notifier approach. There are pros and cons to both.
>
>> > This way the rootlet prog installed by libxdp (let's find a better name
>> > for it) will stay attached.
>>
>> Dispatcher prog?
>
> would be great, but 'bpf_dispatcher' name is already used in the kernel.
> I guess we can still call the library libdispatcher and dispatcher prog?
> Alternatives:
> libchainer and chainer prog
> libaggregator and aggregator prog?
> libpolicer kinda fits too, but could be misleading.

Of those, I like 'dispatcher' best.

> libxdp is very confusing. It's not xdp specific.

Presumably the parts that are generally useful will just end up in
libbpf (eventually)?

>> > libxdp can choose to pin it in some libxdp specific location, so other
>> > libxdp-enabled applications can find it in the same location, detach,
>> > replace, modify, but random app that wants to hack an xdp prog won't
>> > be able to mess with it.
>>
>> What if that "random app" comes first, and keeps holding on to the link
>> fd? Then the admin essentially has to start killing processes until they
>> find the one that has the device locked, no?
>
> Of course not. We have to provide an api to make it easy to discover
> what process holds that link and where it's pinned.
> But if we go with notifier approach none of it is an issue.
> Whether target obj is held or notifier is used everything I said before still
> stands. "random app" that uses netlink after libdispatcher got its link FD will
> not be able to mess with carefully orchestrated setup done by
> libdispatcher.

Protecting things against random modification is fine. What I want to
avoid is XDP/tc programs locking the device so an admin needs to perform
extra steps if it is in use when (e.g.) shutting down a device. XDP
should be something any application can use as acceleration, and if it
becomes known as "that annoying thing that locks my netdev", then that
is not going to happen.

> Also either approach will guarantee that infamous message:
> "unregister_netdevice: waiting for %s to become free. Usage count"
> users will never see.
>
>> And what about the case where the link fd is pinned on a bpffs that is
>> no longer available? I.e., if a netdevice with an XDP program moves
>> namespaces and no longer has access to the original bpffs, that XDP
>> program would essentially become immutable?
>
> 'immutable' will not be possible.
> It's not clear to me how bpffs is going to disappear. What do you mean
> exactly?

# stat /sys/fs/bpf | grep Device
Device: 1fh/31d Inode: 1013963 Links: 2
# mkdir /sys/fs/bpf/test; ls /sys/fs/bpf
test
# ip netns add test
# ip netns exec test stat /sys/fs/bpf/test
stat: cannot stat '/sys/fs/bpf/test': No such file or directory
# ip netns exec test stat /sys/fs/bpf | grep Device
Device: 3fh/63d Inode: 12242 Links: 2

It's a different bpffs instance inside the netns, so it won't have
access to anything pinned in the outer one...
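
The same check can be scripted by comparing device numbers instead of
reading the stat output by eye. This is only a rough sketch, assuming
GNU stat(1); the helper names fs_dev and same_instance are made up for
illustration and are not part of any existing tool:

```shell
#!/bin/sh
# fs_dev PATH: print the decimal device number of the filesystem
# holding PATH (the "31d" part of stat's "Device: 1fh/31d").
# Assumes GNU stat(1); the helper name is hypothetical.
fs_dev() {
    stat -c %d -- "$1"
}

# same_instance DEV_A DEV_B: succeed if both device numbers match,
# i.e. both paths were on the same filesystem instance.
same_instance() {
    [ "$1" = "$2" ]
}

# Usage (as root, with the 'test' netns from the transcript above):
#   outer=$(fs_dev /sys/fs/bpf)
#   inner=$(ip netns exec test stat -c %d /sys/fs/bpf)
#   same_instance "$outer" "$inner" || echo "different bpffs instance"
```

With the device numbers from the transcript (31 outside, 63 inside the
netns) the comparison fails, which is the case where a pinned link
would be out of reach.
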
-Toke