bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
	kernel-team@cloudflare.com, Andrii Nakryiko <andriin@fb.com>
Subject: Re: [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array
Date: Wed, 24 Jun 2020 20:37:04 +0200	[thread overview]
Message-ID: <87k0zwmhtb.fsf@cloudflare.com> (raw)
In-Reply-To: <CAEf4BzZYjPL+TmqFfuEFgzm+qUh_T2zHsRZjq-BE+LMu+25=ZQ@mail.gmail.com>

On Wed, Jun 24, 2020 at 08:24 PM CEST, Andrii Nakryiko wrote:
> On Wed, Jun 24, 2020 at 11:16 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Wed, Jun 24, 2020 at 07:47 PM CEST, Andrii Nakryiko wrote:
>> > On Wed, Jun 24, 2020 at 10:19 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >>
>> >> On Tue, Jun 23, 2020 at 12:34 PM CEST, Jakub Sitnicki wrote:
>> >> > Prepare for having multi-prog attachments for new netns attach types by
>> >> > storing programs to run in a bpf_prog_array, which is well suited for
>> >> > iterating over programs and running them in sequence.
>> >> >
>> >> > Because bpf_prog_array is dynamically resized, after this change a
>> >> > potentially blocking memory allocation in bpf(PROG_QUERY) callback can
>> >> > happen, in order to collect program IDs before copying the values to
>> >> > user-space supplied buffer. This forces us to adapt how we protect access
>> >> > to the attached program in the callback. As bpf_prog_array_copy_to_user()
>> >> > helper can sleep, we switch from an RCU read lock to holding a mutex that
>> >> > serializes updaters.
>> >> >
>> >> > To handle bpf(PROG_ATTACH) scenario when we are replacing an already
>> >> > attached program, we introduce a new bpf_prog_array helper called
>> >> > bpf_prog_array_replace_item that will exchange the old program with a new
>> >> > one. bpf-cgroup does away with such helper by computing an index into the
>> >> > array from a program position in an external list of attached
>> >> > programs/links. Such approach fails when a dummy prog is left in the array
>> >> > after a memory allocation failure on link release, but is necessary in
>> >> > bpf-cgroup case because the same BPF program can be present in the array
>> >> > multiple times due to inheritance, and attachment cannot be reliably
>> >> > identified by bpf_prog pointer comparison.
>> >> >
>> >> > No functional changes intended.
>> >> >
>> >> > Acked-by: Andrii Nakryiko <andriin@fb.com>
>> >> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> >> > ---
>> >> >  include/linux/bpf.h        |   3 +
>> >> >  include/net/netns/bpf.h    |   5 +-
>> >> >  kernel/bpf/core.c          |  20 ++++--
>> >> >  kernel/bpf/net_namespace.c | 137 +++++++++++++++++++++++++++----------
>> >> >  net/core/flow_dissector.c  |  21 +++---
>> >> >  5 files changed, 132 insertions(+), 54 deletions(-)
>> >> >
>> >>
>> >> [...]
>> >>
>> >> > diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
>> >> > index b951dab2687f..593523a22168 100644
>> >> > --- a/kernel/bpf/net_namespace.c
>> >> > +++ b/kernel/bpf/net_namespace.c
>> >>
>> >> [...]
>> >>
>> >> > @@ -93,8 +108,16 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
>> >> >               goto out_unlock;
>> >> >       }
>> >> >
>> >> > +     run_array = rcu_dereference_protected(net->bpf.run_array[type],
>> >> > +                                           lockdep_is_held(&netns_bpf_mutex));
>> >> > +     if (run_array)
>> >> > +             ret = bpf_prog_array_replace_item(run_array, link->prog, new_prog);
>> >>
>> >> Thinking about this some more, link update should fail with -EINVAL if
>> >> new_prog already exists in run_array. Same as PROG_ATTACH fails with
>> >> -EINVAL when trying to attach the same prog for the second time.
>> >>
>> >> Otherwise, LINK_UPDATE can lead to having same BPF prog present multiple
>> >> times in the prog_array, once attaching more than one prog gets enabled.
>> >>
>> >> Then we would we end up with the same challenge as bpf-cgroup, that is
>> >> how to find the program index into the prog_array in presence of
>> >> dummy_prog's.
>> >
>> > If you attach 5 different links having the same bpf_prog, it should be
>> > allowed and all five bpf_progs should be attached and called 5 times.
>> > They are independent links, that's the main thing. What specific BPF
>> > program is attached by the link (or later updated to) shouldn't be of
>> > any concern here (relative to other attached links/programs).
>> >
>> > Attaching the same *link* twice shouldn't be allowed, though.
>>
>> Thanks for clarifying. I need to change the approach then:
>>
>>  1) find the prog index into prog_array by iterating the list of links,
>>  2) adjust the index for any dummy progs in prog_array below the index.
>>
>> That might work for bpf-cgroup too.
>>
>
> I thought that's what bpf-cgroup already does... Except right now
> there could be no dummy progs, but if we do non-failing detachment,
> there might be. Then hierarchies can get out of sync and we need to
> handle that nicely. It's not super straightforward, that's why I said
> that it's a nice challenge to consider :)

Now I get it... bpf-cgroup doesn't use bpf_prog_array_delete_safe(). I
was confusing it with what I saw in bpf_trace all along.

> But here we don't have hierarchy, it's a happy place to be in :)

It feels like there are some war stories to tell about bpf-cgroup.

>> The only other alternative I can think of it to copy the prog array to
>> filter out dummy_progs, before replacing the prog on link update.
>
> Probably over-complication. I'd filter dummy progs only on new link
> attachment or detachment. Update can be kept very simple.

OK, I know what I need to do.

[...]

  reply	other threads:[~2020-06-24 18:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-23 10:34 [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Jakub Sitnicki
2020-06-23 10:34 ` [PATCH bpf-next v2 1/3] flow_dissector: Pull BPF program assignment up to bpf-netns Jakub Sitnicki
2020-06-23 10:34 ` [PATCH bpf-next v2 2/3] bpf, netns: Keep attached programs in bpf_prog_array Jakub Sitnicki
2020-06-23 19:33   ` Martin KaFai Lau
2020-06-23 20:59     ` Jakub Sitnicki
2020-06-23 21:24       ` Martin KaFai Lau
2020-06-24 17:33         ` Jakub Sitnicki
2020-06-24 17:18   ` Jakub Sitnicki
2020-06-24 17:47     ` Andrii Nakryiko
2020-06-24 18:13       ` Jakub Sitnicki
2020-06-24 18:24         ` Andrii Nakryiko
2020-06-24 18:37           ` Jakub Sitnicki [this message]
2020-06-23 10:34 ` [PATCH bpf-next v2 3/3] bpf, netns: Keep a list of attached bpf_link's Jakub Sitnicki
2020-06-29 14:56 ` [PATCH bpf-next v2 0/3] bpf, netns: Prepare for multi-prog attachment Daniel Borkmann
2020-06-29 14:57   ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k0zwmhtb.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=kernel-team@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).