bpf.vger.kernel.org archive mirror
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: David Ahern <dsahern@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrey Ignatov <rdna@fb.com>, Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v3 bpf-next 0/4] Add support for cgroup bpf_link
Date: Tue, 31 Mar 2020 19:04:04 -0700	[thread overview]
Message-ID: <20200401020404.wbad24dxkv7qr2os@ast-mbp>
In-Reply-To: <95bfd8e0-86b3-cb87-9f06-68a7c1ba7d7a@gmail.com>

On Tue, Mar 31, 2020 at 07:42:46PM -0600, David Ahern wrote:
> On 3/30/20 7:17 PM, Alexei Starovoitov wrote:
> > On Mon, Mar 30, 2020 at 06:57:44PM -0600, David Ahern wrote:
> >> On 3/30/20 6:32 PM, Alexei Starovoitov wrote:
> >>>>
> >>>> This is not a large feature, and there is no reason for CREATE/UPDATE -
> >>>> a mere 4-patch set - to go in without something as essential as the
> >>>> QUERY for observability.
> >>>
> >>> As I said, 'bpftool cgroup' covers it. Observability is not reduced in any way.
> >>
> >> You want a feature where a process can prevent another from installing a
> >> program on a cgroup. How do I learn which process is holding the
> >> bpf_link reference and preventing me from installing a program? Unless
> >> I have missed some recent change, that is not currently covered by
> >> 'bpftool cgroup', and reading the kernel code will not tell me.
> > 
> > No. That's not the case at all. You misunderstood the concept.
> 
> I don't think so ...
> 
> > 
> >> That is my point. You are restricting what root can do, and people will
> >> not want to resort to killing random processes trying to find the one
> >> holding a reference.
> > 
> > Not true either.
> > bpf_link = old attach with allow_multi (but with extra safety for owner)
> 
> cgroup programs existed for roughly 1 year before BPF_F_ALLOW_MULTI.
> That's a year for tools like 'ip vrf exec' to exist and be relied on.
> 'ip vrf exec' does not use MULTI.
> 
> I have not done a deep dive on the systemd code, but on an Ubuntu 18.04 system:
> 
> $ sudo ~/bin/bpftool cgroup tree
> CgroupPath
> ID       AttachType      AttachFlags     Name
> /sys/fs/cgroup/unified/system.slice/systemd-udevd.service
>     5        ingress
>     4        egress
> /sys/fs/cgroup/unified/system.slice/systemd-journald.service
>     3        ingress
>     2        egress
> /sys/fs/cgroup/unified/system.slice/systemd-logind.service
>     7        ingress
>     6        egress
> 
> suggests that MULTI has not always been used by systemd either at some
> point in its history, so 'ip vrf exec' is not alone in not using the flag.
> There are most likely many other tools.

Please take a look at the systemd source code:
src/core/bpf-devices.c
src/core/bpf-firewall.c
It prefers to use BPF_F_ALLOW_MULTI when possible, since it's the most
sensible flag. Because 'ip vrf exec' does not use allow_multi, it breaks
several systemd features (regardless of what bpf_link can and cannot do).
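
To make the difference concrete, here is a minimal sketch (not from the
patch set; prog_fd and cgroup_fd are assumed to be valid FDs obtained
elsewhere) of the two legacy attach modes via libbpf:

#include <stdbool.h>
#include <bpf/bpf.h>     /* libbpf's bpf_prog_attach() wrapper */
#include <linux/bpf.h>   /* BPF_F_ALLOW_MULTI, BPF_CGROUP_INET_EGRESS */

int attach_cgroup_prog(int prog_fd, int cgroup_fd, bool multi)
{
	/* flags = 0 (legacy): at most one prog per cgroup/attach-type,
	 * and a later flag-less attach by any cap_net_admin process
	 * (this is what 'ip vrf exec' does) replaces whatever was there.
	 *
	 * BPF_F_ALLOW_MULTI: the prog coexists with other MULTI progs
	 * on the same cgroup/attach-type; this is what systemd prefers.
	 * The kernel refuses to mix the two modes on one cgroup.
	 */
	return bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS,
			       multi ? BPF_F_ALLOW_MULTI : 0);
}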

> > The only thing bpf_link protects is the owner of the link from other
> > processes nuking that link.
> > It does _not_ prevent other processes from attaching their own cgroup-bpf
> > progs either via the old interface or via bpf_link.
> > 
> 
> It does when that older code does not use the MULTI flag. There is a
> history that is going to create conflicts, and being able to identify
> which program holds the bpf_link is essential.
> 
> And this is really just one use case. There are many other reasons for
> wanting to know what process is holding a reference to something.

I'm not disagreeing that it's useful to query what is attached where. My point,
once again, is that bpf_link for cgroups didn't change a single bit in this
logic. There are processes (like systemd) that already use allow_multi. When
they switch to bpf_link a few years from now, nothing will change for the
other processes in the system. Only systemd will be assured that its bpf-device
prog will not be accidentally removed by 'ip vrf'. Currently nothing protects
systemd's bpf progs: any cap_net_admin process can _accidentally_ nuke them.
It's even stranger that the bpf-cgroup-device hook systemd uses is gated by
cap_net_admin; there is nothing networking about it. But that's a separate
discussion.
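
For comparison, the link-based flow looks roughly like this (a sketch
assuming the libbpf API added in patch 3/4 of this series; 'obj' is an
already-loaded bpf_object and the "cgroup/dev" section name is just an
example):

#include <errno.h>
#include <bpf/libbpf.h>

int attach_with_link(struct bpf_object *obj, int cgroup_fd)
{
	struct bpf_program *prog;
	struct bpf_link *link;

	prog = bpf_object__find_program_by_title(obj, "cgroup/dev");
	if (!prog)
		return -ENOENT;

	/* The attachment lives as long as the link FD (or a bpffs pin
	 * of it). Other processes can still attach their own MULTI
	 * progs to the same cgroup, but cannot detach or replace this
	 * one via the legacy BPF_PROG_DETACH/BPF_PROG_ATTACH commands.
	 */
	link = bpf_program__attach_cgroup(prog, cgroup_fd);
	if (libbpf_get_error(link))
		return (int)libbpf_get_error(link);

	/* bpf_link__destroy(link) detaches the prog when done */
	return 0;
}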

Maybe you should fix 'ip vrf' first, before systemd folks start yelling, and
then we can continue arguing about the merits of observability?
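
For reference, the per-attach-type query behind 'bpftool cgroup' boils
down to BPF_PROG_QUERY. A minimal sketch using libbpf's wrapper,
assuming cgroup_fd is an open cgroup directory FD:

#include <stdio.h>
#include <bpf/bpf.h>
#include <linux/bpf.h>

int list_egress_progs(int cgroup_fd)
{
	__u32 prog_ids[64], prog_cnt = 64, attach_flags = 0;
	__u32 i;
	int err;

	/* fills prog_ids with the IDs of progs attached at this cgroup
	 * for the given attach type; prog_cnt is in/out */
	err = bpf_prog_query(cgroup_fd, BPF_CGROUP_INET_EGRESS, 0,
			     &attach_flags, prog_ids, &prog_cnt);
	if (err)
		return err;

	for (i = 0; i < prog_cnt; i++)
		printf("egress prog id %u, attach flags 0x%x\n",
		       prog_ids[i], attach_flags);
	return 0;
}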


Thread overview: 29+ messages
2020-03-30  2:59 [PATCH v3 bpf-next 0/4] Add support for cgroup bpf_link Andrii Nakryiko
2020-03-30  2:59 ` [PATCH v3 bpf-next 1/4] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
2020-03-31  0:05   ` Andrey Ignatov
2020-03-31  0:38     ` Alexei Starovoitov
2020-03-31  1:06       ` Andrii Nakryiko
2020-03-30  2:59 ` [PATCH v3 bpf-next 2/4] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
2020-03-30  3:00 ` [PATCH v3 bpf-next 3/4] libbpf: add support for bpf_link-based cgroup attachment Andrii Nakryiko
2020-03-30  3:00 ` [PATCH v3 bpf-next 4/4] selftests/bpf: test FD-based " Andrii Nakryiko
2020-03-30 14:49 ` [PATCH v3 bpf-next 0/4] Add support for cgroup bpf_link David Ahern
2020-03-30 20:20   ` Andrii Nakryiko
2020-03-30 20:45     ` David Ahern
2020-03-30 20:50       ` Alexei Starovoitov
2020-03-30 22:50       ` Alexei Starovoitov
2020-03-30 23:43         ` David Ahern
2020-03-31  0:32           ` Alexei Starovoitov
2020-03-31  0:57             ` David Ahern
2020-03-31  1:17               ` Alexei Starovoitov
2020-04-01  1:42                 ` David Ahern
2020-04-01  2:04                   ` Alexei Starovoitov [this message]
2020-03-31  3:54               ` Andrii Nakryiko
2020-03-31 16:54                 ` David Ahern
2020-03-31 17:03                   ` Andrii Nakryiko
2020-03-31 17:11                   ` Alexei Starovoitov
2020-03-31 21:51                 ` Edward Cree
2020-03-31 22:44                   ` David Ahern
2020-04-01  0:45                     ` Andrii Nakryiko
2020-04-01  0:22                   ` Andrii Nakryiko
2020-04-01 14:26                     ` Edward Cree
2020-04-01 17:39                       ` Andrii Nakryiko
