From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
Networking <netdev@vger.kernel.org>,
Alexei Starovoitov <ast@fb.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrey Ignatov <rdna@fb.com>, Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link
Date: Thu, 26 Mar 2020 16:59:06 -0700 [thread overview]
Message-ID: <CAEf4BzYQnzUAFo-Jmikg3va2d8tZ+ZL1x2QSf6NdrY629hKc2g@mail.gmail.com> (raw)
In-Reply-To: <20200326233533.gbyogvi57xufe34d@ast-mbp>
On Thu, Mar 26, 2020 at 4:35 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Mar 24, 2020 at 11:57:44PM -0700, Andrii Nakryiko wrote:
> >
> > +/* Swap updated BPF program for given link in effective program arrays across
> > + * all descendant cgroups. This function is guaranteed to succeed.
> > + */
> > +static void replace_effective_prog(struct cgroup *cgrp,
> > + enum bpf_attach_type type,
> > + struct bpf_cgroup_link *link)
> > +{
> > + struct bpf_prog_array_item *item;
> > + struct cgroup_subsys_state *css;
> > + struct bpf_prog_array *progs;
> > + struct bpf_prog_list *pl;
> > + struct list_head *head;
> > + struct cgroup *cg;
> > + int pos;
> > +
> > + css_for_each_descendant_pre(css, &cgrp->self) {
> > + struct cgroup *desc = container_of(css, struct cgroup, self);
> > +
> > + if (percpu_ref_is_zero(&desc->bpf.refcnt))
> > + continue;
> > +
> > + /* found position of link in effective progs array */
> > + for (pos = 0, cg = desc; cg; cg = cgroup_parent(cg)) {
> > + if (pos && !(cg->bpf.flags[type] & BPF_F_ALLOW_MULTI))
> > + continue;
> > +
> > + head = &cg->bpf.progs[type];
> > + list_for_each_entry(pl, head, node) {
> > + if (!prog_list_prog(pl))
> > + continue;
> > + if (pl->link == link)
> > + goto found;
> > + pos++;
> > + }
> > + }
> > +found:
> > + BUG_ON(!cg);
> > + progs = rcu_dereference_protected(
> > + desc->bpf.effective[type],
> > + lockdep_is_held(&cgroup_mutex));
> > + item = &progs->items[pos];
> > + WRITE_ONCE(item->prog, link->link.prog);
> > + }
> > +}
> > +
> > +/**
> > + * __cgroup_bpf_replace() - Replace link's program and propagate the change
> > + * to descendants
> > + * @cgrp: The cgroup which descendants to traverse
> > + * @link: A link for which to replace BPF program
> > + * @type: Type of attach operation
> > + *
> > + * Must be called with cgroup_mutex held.
> > + */
> > +int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
> > + struct bpf_prog *new_prog)
> > +{
> > + struct list_head *progs = &cgrp->bpf.progs[link->type];
> > + struct bpf_prog *old_prog;
> > + struct bpf_prog_list *pl;
> > + bool found = false;
> > +
> > + if (link->link.prog->type != new_prog->type)
> > + return -EINVAL;
> > +
> > + list_for_each_entry(pl, progs, node) {
> > + if (pl->link == link) {
> > + found = true;
> > + break;
> > + }
> > + }
> > + if (!found)
> > + return -ENOENT;
> > +
> > + old_prog = xchg(&link->link.prog, new_prog);
> > + replace_effective_prog(cgrp, link->type, link);
>
> I think with 'found = true' in this function you're assuming that it will be
> found in replace_effective_prog() ? I don't think that's the case.
> Try to create bpf_link with BPF_F_ALLOW_OVERRIDE, override it in a child cgroup
> with another link and then try to LINK_UPDATE the former. The link is there,
> but the prog is not executing and it's not in effective array. What LINK_UPDATE
> suppose to do? I guess it should succeed?
Yes, this is a great catch! I should have used ALLOW_OVERRIDE at the
root cgroup level in my selftest, it would catch it immediately.
BUG_ON(!cg) in replace_effective_prog() is too aggressive, if I
replace it with `if (!cg) continue;` it will handle this as well.
> Even trickier that the prog will be in effective array in some of
> css_for_each_descendant_pre() and not in others. This cgroup attach semantics
> were convoluted from the day one. Apparently people use all three variants now,
> but I wouldn't bet that everyone understands it.
Agree about convoluted logic, spent enormous time understanding and
modifying it :)
But apart from BUG_ON(!cg) problem, everything works because each
level of hierarchy is treated independently in
replace_effective_prog(). Search for attached link on each level is
reset and performed anew. If found - we replace program, if not - must
be ALLOW_OVERRIDE case, i.e., actually overridden link.
> Hence my proposal to support F_ALLOW_MULTI for links only. At least initially.
> It's so much simpler to explain. And owning bpf_link will guarantee that the
> prog is executing (unless cgroup is removed and sockets are closed). I guess
> default (no-override) is acceptable to bpf_link as well and in that sense it
> will be very similar to XDP with single prog attached. So I think I can live
> with default and ALLOW_MULTI for now. But we should probably redesign
> overriding capabilities. Folks need to attach multiple progs to a given cgroup
> and disallow all progs in children. Currently it's not possible to do, since
> MULTI in the parent allows at least one (default, override or multi) in the
> children. bpf_link inheriting this logic won't help to solve this use case. It
> feels that link should stay as multi only and override or not in the children
> should be a separate property. Probably not related to link at all. It fits
> better as a cgroup permission.
Yeah, we had a brief discussion with Andrey on mailing list. Not sure
what the solution looks like, but it should be orthogonal to link/prog
attachment operation, probably.
If you insist and Andrey is ok with dropping ALLOW_OVERRIDE, it's
easy. But fixing the logic to handle it is also easy. So are we sure
about supporting 2 out of 3 existing modes? :)
>
> Anyhow I'm going to apply patches 1 and 2, since they are good cleanup
> regardless of what we decide here.
Thanks, will rebase on top of bpf-next master for v3.
next prev parent reply other threads:[~2020-03-26 23:59 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-25 6:57 [PATCH v2 bpf-next 0/6] Add support for cgroup bpf_link Andrii Nakryiko
2020-03-25 6:57 ` [PATCH v2 bpf-next 1/6] bpf: factor out cgroup storages operations Andrii Nakryiko
2020-03-25 6:57 ` [PATCH v2 bpf-next 2/6] bpf: factor out attach_type to prog_type mapping for attach/detach Andrii Nakryiko
2020-03-25 6:57 ` [PATCH v2 bpf-next 3/6] bpf: implement bpf_link-based cgroup BPF program attachment Andrii Nakryiko
2020-03-25 6:57 ` [PATCH v2 bpf-next 4/6] bpf: implement bpf_prog replacement for an active bpf_cgroup_link Andrii Nakryiko
2020-03-25 22:57 ` kbuild test robot
2020-03-26 1:28 ` Andrii Nakryiko
2020-03-26 0:45 ` kbuild test robot
2020-03-26 23:35 ` Alexei Starovoitov
2020-03-26 23:59 ` Andrii Nakryiko [this message]
2020-03-27 0:34 ` Alexei Starovoitov
2020-03-27 0:55 ` Andrii Nakryiko
2020-03-25 6:57 ` [PATCH v2 bpf-next 5/6] libbpf: add support for bpf_link-based cgroup attachment Andrii Nakryiko
2020-03-25 6:57 ` [PATCH v2 bpf-next 6/6] selftests/bpf: test FD-based " Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAEf4BzYQnzUAFo-Jmikg3va2d8tZ+ZL1x2QSf6NdrY629hKc2g@mail.gmail.com \
--to=andrii.nakryiko@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andriin@fb.com \
--cc=ast@fb.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=netdev@vger.kernel.org \
--cc=rdna@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).