netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Borkmann <daniel@iogearbox.net>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Alexei Starovoitov" <ast@fb.com>,
	"Andrii Nakryiko" <andrii.nakryiko@gmail.com>,
	"Andrii Nakryiko" <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	"Kernel Team" <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction
Date: Fri, 6 Mar 2020 00:42:30 +0100	[thread overview]
Message-ID: <7b674384-1131-59d4-ee2f-5a17824c1eca@iogearbox.net> (raw)
In-Reply-To: <20200305225027.nazs3gtlcw3gjjvn@ast-mbp>

On 3/5/20 11:50 PM, Alexei Starovoitov wrote:
> On Thu, Mar 05, 2020 at 11:34:18PM +0100, Daniel Borkmann wrote:
>> On 3/5/20 5:34 PM, Alexei Starovoitov wrote:
>>> On Thu, Mar 05, 2020 at 11:37:11AM +0100, Toke Høiland-Jørgensen wrote:
>>>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>>>> On Wed, Mar 04, 2020 at 08:47:44AM +0100, Toke Høiland-Jørgensen wrote:
>> [...]
>>>> Anyway, what I was trying to express:
>>>>
>>>>> Still that doesn't mean that pinned link is 'immutable'.
>>>>
>>>> I don't mean 'immutable' in the sense that it cannot be removed ever.
>>>> Just that we may end up in a situation where an application can see a
>>>> netdev with an XDP program attached, has the right privileges to modify
>>>> it, but can't because it can't find the pinned bpf_link. Right? Or am I
>>>> misunderstanding your proposal?
>>>>
>>>> Amending my example from before, this could happen by:
>>>>
>>>> 1. Someone attaches a program to eth0, and pins the bpf_link to
>>>>      /sys/fs/bpf/myprog
>>>>
>>>> 2. eth0 is moved to a different namespace which mounts a new sysfs at
>>>>      /sys
>>>>
>>>> 3. Inside that namespace, /sys/fs/bpf/myprog is no longer accessible, so
>>>>      xdp-loader can't get access to the original bpf_link; but the XDP
>>>>      program is still attached to eth0.
>>>
>>> The key to decide is whether moving netdev across netns should be allowed
>>> when xdp attached. I think it should be denied. Even when legacy xdp
>>> program is attached, since it will confuse user space managing part.
>>
>> There are perfectly valid use cases where this is done already today (minus
>> bpf_link), for example, consider an orchestrator that is setting up the BPF
>> program on the device, moving to the newly created application pod during
>> the CNI call in k8s, such that the new pod does not have the /sys/fs/bpf/
>> mount instance and if unprivileged cannot remove the BPF prog from the dev
>> either. We do something like this in case of ipvlan, meaning, we attach a
>> rootlet prog that calls into single slot of a tail call map, move it to the
>> application pod, and only out of Cilium's own pod and it's pod-local bpf fs
>> instance we manage the pinned tail call map to update the main programs in
>> that single slot w/o having to switch any netns later on.
> 
> Right. You mentioned this use case before, but I managed to forget about it.
> Totally makes sense for prog to stay attached to netdev when it's moved.
> I think pod manager would also prefer that pod is not able to replace
> xdp prog from inside the container. It sounds to me that steps 1,2,3 above
> is exactly the desired behavior. Otherwise what stops some application
> that started in a pod to override it?

Generally, yes, and it shouldn't need to care nor see what is happening in
/sys/fs/bpf/ from the orchestrator at least (or could potentially have its
own private mount under /sys/fs/bpf/ or elsewhere). Ideally, the behavior
should be that orchestrator does all the setup out of its own namespace,
then moves everything over to the newly created target namespace and e.g.
only if the pod has the capable(cap_sys_admin) permissions, it could mess
around with anything attached there, or via similar model as done in [0]
when there is a master device. Last time I looked, there is a down/up cycle
on the dev upon netns migration and it flushes e.g. attached qdiscs afaik, so
there are limitations that still need to be addressed. Not sure atm if same
is happening to XDP right now. In this regards veth devices are a bit nicer
to work with since everything can be attached on hostns ingress w/o needing
to worry on the peer dev in the pod's netns.

   [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7cc9f7003a969d359f608ebb701d42cafe75b84a

  reply	other threads:[~2020-03-05 23:42 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-28 22:39 [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 1/3] bpf: introduce pinnable bpf_link abstraction Andrii Nakryiko
2020-03-02 10:13   ` Toke Høiland-Jørgensen
2020-03-02 18:06     ` Andrii Nakryiko
2020-03-02 21:40       ` Toke Høiland-Jørgensen
2020-03-02 23:37         ` Andrii Nakryiko
2020-03-03  2:50   ` Alexei Starovoitov
2020-03-03  4:18     ` Andrii Nakryiko
2020-02-28 22:39 ` [PATCH bpf-next 2/3] libbpf: add bpf_link pinning/unpinning Andrii Nakryiko
2020-03-02 10:16   ` Toke Høiland-Jørgensen
2020-03-02 18:09     ` Andrii Nakryiko
2020-03-02 21:45       ` Toke Høiland-Jørgensen
2020-02-28 22:39 ` [PATCH bpf-next 3/3] selftests/bpf: add link pinning selftests Andrii Nakryiko
2020-03-02 10:11 ` [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction Toke Høiland-Jørgensen
2020-03-02 18:05   ` Andrii Nakryiko
2020-03-02 22:24     ` Toke Høiland-Jørgensen
2020-03-02 23:35       ` Andrii Nakryiko
2020-03-03  8:12         ` Toke Høiland-Jørgensen
2020-03-03  8:12       ` Daniel Borkmann
2020-03-03 15:46         ` Alexei Starovoitov
2020-03-03 19:23           ` Daniel Borkmann
2020-03-03 19:46             ` Andrii Nakryiko
2020-03-03 20:24               ` Toke Høiland-Jørgensen
2020-03-03 20:53                 ` Daniel Borkmann
2020-03-03 22:01                   ` Alexei Starovoitov
2020-03-03 22:27                     ` Toke Høiland-Jørgensen
2020-03-04  4:36                       ` Alexei Starovoitov
2020-03-04  7:47                         ` Toke Høiland-Jørgensen
2020-03-04 15:47                           ` Alexei Starovoitov
2020-03-05 10:37                             ` Toke Høiland-Jørgensen
2020-03-05 16:34                               ` Alexei Starovoitov
2020-03-05 22:34                                 ` Daniel Borkmann
2020-03-05 22:50                                   ` Alexei Starovoitov
2020-03-05 23:42                                     ` Daniel Borkmann [this message]
2020-03-06  8:31                                       ` Toke Høiland-Jørgensen
2020-03-06 10:25                                         ` Daniel Borkmann
2020-03-06 10:42                                           ` Toke Høiland-Jørgensen
2020-03-06 18:09                                           ` David Ahern
2020-03-04 19:41                         ` Jakub Kicinski
2020-03-04 20:45                           ` Alexei Starovoitov
2020-03-04 21:24                             ` Jakub Kicinski
2020-03-05  1:07                               ` Alexei Starovoitov
2020-03-05  8:16                                 ` Jakub Kicinski
2020-03-05 11:05                                   ` Toke Høiland-Jørgensen
2020-03-05 18:13                                     ` Jakub Kicinski
2020-03-09 11:41                                       ` Toke Høiland-Jørgensen
2020-03-09 18:50                                         ` Jakub Kicinski
2020-03-10 12:22                                           ` Toke Høiland-Jørgensen
2020-03-05 16:39                                   ` Alexei Starovoitov
2020-03-03 22:40                 ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7b674384-1131-59d4-ee2f-5a17824c1eca@iogearbox.net \
    --to=daniel@iogearbox.net \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).