netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction
Date: Wed, 04 Mar 2020 08:47:44 +0100	[thread overview]
Message-ID: <87h7z44l3z.fsf@toke.dk> (raw)
In-Reply-To: <20200304043643.nqd2kzvabkrzlolh@ast-mbp>

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Mar 03, 2020 at 11:27:13PM +0100, Toke Høiland-Jørgensen wrote:
>> Alexei Starovoitov <ast@fb.com> writes:
>> >
>> > Legacy api for tc, xdp, cgroup will not be able to override FD-based
>> > link. For TC it's easy. cls-bpf allows multi-prog, so netlink
>> > adding/removing progs will not be able to touch progs that are
>> > attached via FD-based link.
>> > Same thing for cgroups. FD-based link will be similar to 'multi' mode.
>> > The owner of the link has a guarantee that their program will
>> > stay attached to cgroup.
>> > XDP is also easy. Since it has only one prog. Attaching FD-based link
>> > will prevent netlink from overriding it.
>> 
>> So what happens if the device goes away?
>
> I'm not sure yet whether it's cleaner to make netdev, qdisc, cgroup to be held
> by the link or use notifier approach. There are pros and cons to both.
>
>> > This way the rootlet prog installed by libxdp (let's find a better name
>> > for it) will stay attached.
>> 
>> Dispatcher prog?
>
> would be great, but 'bpf_dispatcher' name is already used in the kernel.
> I guess we can still call the library libdispatcher and dispatcher prog?
> Alternatives:
> libchainer and chainer prog
> libaggregator and aggregator prog?
> libpolicer kinda fits too, but could be misleading.

Of those, I like 'dispatcher' best.

> libxdp is very confusing. It's not xdp specific.

Presumably the parts that are generally useful will just end up in
libbpf (eventually)?

>> > libxdp can choose to pin it in some libxdp specific location, so other
>> > libxdp-enabled applications can find it in the same location, detach,
>> > replace, modify, but random app that wants to hack an xdp prog won't
>> > be able to mess with it.
>> 
>> What if that "random app" comes first, and keeps holding on to the link
>> fd? Then the admin essentially has to start killing processes until they
>> find the one that has the device locked, no?
>
> Of course not. We have to provide an api to make it easy to discover
> what process holds that link and where it's pinned.
> But if we go with notifier approach none of it is an issue.
> Whether target obj is held or notifier is used everything I said before still
> stands. "random app" that uses netlink after libdispatcher got its link FD will
> not be able to mess with carefully orchestrated setup done by
> libdispatcher.

Protecting things against random modification is fine. What I want to
avoid is XDP/tc programs locking the device so an admin needs to perform
extra steps if it is in use when (e.g.) shutting down a device. XDP
should be something any application can use as acceleration, and if it
becomes known as "that annoying thing that locks my netdev", then that
is not going to happen.

> Also either approach will guarantee that infamous message:
> "unregister_netdevice: waiting for %s to become free. Usage count"
> users will never see.
>
>> And what about the case where the link fd is pinned on a bpffs that is
>> no longer available? I.e., if a netdevice with an XDP program moves
>> namespaces and no longer has access to the original bpffs, that XDP
>> program would essentially become immutable?
>
> 'immutable' will not be possible.
> I'm not clear to me how bpffs is going to disappear. What do you mean
> exactly?

# stat /sys/fs/bpf | grep Device
Device: 1fh/31d	Inode: 1013963     Links: 2
# mkdir /sys/fs/bpf/test; ls /sys/fs/bpf
test
# ip netns add test
# ip netns exec test stat /sys/fs/bpf/test
stat: cannot stat '/sys/fs/bpf/test': No such file or directory
# ip netns exec test stat /sys/fs/bpf | grep Device
Device: 3fh/63d	Inode: 12242       Links: 2

It's a different bpffs instance inside the netns, so it won't have
access to anything pinned in the outer one...

-Toke


  reply	other threads:[~2020-03-04  7:47 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-28 22:39 [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 1/3] bpf: introduce pinnable bpf_link abstraction Andrii Nakryiko
2020-03-02 10:13   ` Toke Høiland-Jørgensen
2020-03-02 18:06     ` Andrii Nakryiko
2020-03-02 21:40       ` Toke Høiland-Jørgensen
2020-03-02 23:37         ` Andrii Nakryiko
2020-03-03  2:50   ` Alexei Starovoitov
2020-03-03  4:18     ` Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 2/3] libbpf: add bpf_link pinning/unpinning Andrii Nakryiko
2020-03-02 10:16   ` Toke Høiland-Jørgensen
2020-03-02 18:09     ` Andrii Nakryiko
2020-03-02 21:45       ` Toke Høiland-Jørgensen
2020-02-28 22:39 ` [PATCH bpf-next 3/3] selftests/bpf: add link pinning selftests Andrii Nakryiko
2020-03-02 10:11 ` [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Toke Høiland-Jørgensen
2020-03-02 18:05   ` Andrii Nakryiko
2020-03-02 22:24     ` Toke Høiland-Jørgensen
2020-03-02 23:35       ` Andrii Nakryiko
2020-03-03  8:12         ` Toke Høiland-Jørgensen
2020-03-03  8:12       ` Daniel Borkmann
2020-03-03 15:46         ` Alexei Starovoitov
2020-03-03 19:23           ` Daniel Borkmann
2020-03-03 19:46             ` Andrii Nakryiko
2020-03-03 20:24               ` Toke Høiland-Jørgensen
2020-03-03 20:53                 ` Daniel Borkmann
2020-03-03 22:01                   ` Alexei Starovoitov
2020-03-03 22:27                     ` Toke Høiland-Jørgensen
2020-03-04  4:36                       ` Alexei Starovoitov
2020-03-04  7:47                         ` Toke Høiland-Jørgensen [this message]
2020-03-04 15:47                           ` Alexei Starovoitov
2020-03-05 10:37                             ` Toke Høiland-Jørgensen
2020-03-05 16:34                               ` Alexei Starovoitov
2020-03-05 22:34                                 ` Daniel Borkmann
2020-03-05 22:50                                   ` Alexei Starovoitov
2020-03-05 23:42                                     ` Daniel Borkmann
2020-03-06  8:31                                       ` Toke Høiland-Jørgensen
2020-03-06 10:25                                         ` Daniel Borkmann
2020-03-06 10:42                                           ` Toke Høiland-Jørgensen
2020-03-06 18:09                                           ` David Ahern
2020-03-04 19:41                         ` Jakub Kicinski
2020-03-04 20:45                           ` Alexei Starovoitov
2020-03-04 21:24                             ` Jakub Kicinski
2020-03-05  1:07                               ` Alexei Starovoitov
2020-03-05  8:16                                 ` Jakub Kicinski
2020-03-05 11:05                                   ` Toke Høiland-Jørgensen
2020-03-05 18:13                                     ` Jakub Kicinski
2020-03-09 11:41                                       ` Toke Høiland-Jørgensen
2020-03-09 18:50                                         ` Jakub Kicinski
2020-03-10 12:22                                           ` Toke Høiland-Jørgensen
2020-03-05 16:39                                   ` Alexei Starovoitov
2020-03-03 22:40                 ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h7z44l3z.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).