bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@fomichev.me>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Stanislav Fomichev <sdf@google.com>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: [PATCH bpf 0/4] bpf: remove __rcu annotations from bpf_prog_array
Date: Tue, 14 May 2019 10:30:02 -0700	[thread overview]
Message-ID: <20190514173002.GB10244@mini-arch> (raw)
In-Reply-To: <CAADnVQLX3EcbW=iVxjsjO38M3Lqw5TfCcZtmbnt1DJwDvp64dA@mail.gmail.com>

On 05/14, Alexei Starovoitov wrote:
> On Mon, May 13, 2019 at 11:57 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 05/08, Stanislav Fomichev wrote:
> > > On 05/08, Alexei Starovoitov wrote:
> > > > On Wed, May 08, 2019 at 10:18:41AM -0700, Stanislav Fomichev wrote:
> > > > > Right now we are not using rcu api correctly: we pass __rcu pointers
> > > > > to bpf_prog_array_xyz routines but don't use rcu_dereference on them
> > > > > (see bpf_prog_array_delete_safe and bpf_prog_array_copy in particular).
> > > > > Instead of sprinkling rcu_dereferences, let's just get rid of those
> > > > > __rcu annotations and move rcu handling to a higher level.
> > > > >
> > > > > It looks like all those routines are called from the rcu update
> > > > > side and we can use simple rcu_dereference_protected to get a
> > > > > reference that is valid as long as we hold a mutex (i.e. no other
> > > > > updater can change the pointer, no need for rcu read section and
> > > > > there should not be a use-after-free problem).
> > > > >
> > > > > To be fair, there is currently no issue with the existing approach
> > > > > since the calls are mutex-protected, pointer values don't change,
> > > > > __rcu annotations are ignored. But it's still nice to use proper api.
> > > > >
> > > > > The series fixes the following sparse warnings:
> > > >
> > > > Absolutely not.
> > > > please fix it properly.
> > > > Removing annotations is not a fix.
> > > I'm fixing it properly by removing the annotations and moving lifetime
> > > management to the upper layer. See commits 2-4 where I fix the users, the
> > > first patch is just the "preparation".
> > >
> > > The users are supposed to do:
> > >
> > > mutex_lock(&x);
> > > p = rcu_dereference_protected(prog_array, lockdep_is_held(&x))
> > > // ...
> > > // call bpf_prog_array helpers while mutex guarantees that
> > > // the object referenced by p is valid (i.e. no need for bpf_prog_array
> > > // helpers to care about rcu lifetime)
> > > // ...
> > > mutex_unlock(&x);
> > >
> > > What am I missing here?
> >
> > Just to give you my perspective on why current api with __rcu annotations
> > is working, but not optimal (even if used from the rcu read section).
> >
> > Sample code:
> >
> >         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> >         int n;
> >
> >         rcu_read_lock();
> >         n = bpf_prog_array_length(progs);
> >         if (n > 0) {
> >           // do something with __rcu progs
> >           do_something(progs);
> >         }
> >         rcu_read_unlock();
> >
> > Since progs is __rcu annotated, do_something() would need to do
> > rcu_dereference again and it might get a different value compared to
> > whatever bpf_prog_array_free got while doing its dereference.
> 
> correct and I believe the code deals with it fine.
> cnt could be different between two calls.
Yes, currently there is no problem, all users of these apis
are fine because they are holding a mutex (and are hence in the rcu update
path, i.e. the pointer can't change and they have a consistent view
between the calls).

For example, we currently have:

	int n1, n2;
	mutex_lock(&x);
	n1 = bpf_prog_array_length(progs);
	n2 = bpf_prog_array_length(progs);
	// n1 is guaranteed to be the same as n2 as long as we
	// hold a mutex; single updater only
	...
	mutex_unlock(&x);

Versus:

	rcu_read_lock();
	n1 = bpf_prog_array_length(progs);
	n2 = bpf_prog_array_length(progs);
	// n1 and n2 can be different; rcu_read_lock is all about
	// lifetime
	...
	rcu_read_unlock();

But, as I said, we don't use those __rcu annotated bpf_prog_array
routines in the rcu read section, so we are fine.

(I'm just showing that potentially there might be a problem if we don't move
rcu management away from bpf_prog_array routines and if someone
decides to call them under rcu_read_lock).

> > A better way is not to deal with rcu inside those helpers and let
> > higher layers do that:
> >
> >         struct bpf_prog_array __rcu *progs = <comes from somewhere>;
> >         struct bpf_prog_array *p;
> >         int n;
> >
> >         rcu_read_lock();
> >         p = rcu_dereference(p);
> >         n = bpf_prog_array_length(p);
> >         if (n > 0) {
> >           do_something(p); // do_something sees the same p as bpf_prog_array_length
> >         }
> >         rcu_read_unlock();
> >
> > What do you think?
> 
> I'm not sure that generically applicable.
> Which piece of code do you have in mind for such refactoring?
All existing callers of (take a look at patches 2-4):

* bpf_prog_array_free
* bpf_prog_array_length
* bpf_prog_array_copy_to_user
* bpf_prog_array_delete_safe
* bpf_prog_array_copy_info
* bpf_prog_array_copy

They are:

* perf event bpf attach/detach/query (under bpf_event_mutex)
* cgroup attach/detach/query (management of cgroup_bpf->effective, under
  cgroup_mutex)
* lirc bpf attach/detach/query (under ir_raw_handler_lock)

Nobody uses these apis in the rcu read section, so we can remove the
annotations and use rcu_dereference_protected on the higher layer.

Side bonus is that we can also remove __rcu from cgroup_bpf.inactive
(which is just a temp storage and not updated in the rcu fashion) and
from old_array in activate_effective_progs (which is an on-stack
array and, I guess, only has __rcu annotation to satisfy sparse).

So nothing is changing functionality-wise, but it becomes a bit easier
to reason about by being more explicit with rcu api.

  reply	other threads:[~2019-05-14 17:30 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-08 17:18 [PATCH bpf 0/4] bpf: remove __rcu annotations from bpf_prog_array Stanislav Fomichev
2019-05-08 17:18 ` [PATCH bpf 1/4] " Stanislav Fomichev
2019-05-08 17:18 ` [PATCH bpf 2/4] bpf: media: properly use bpf_prog_array api Stanislav Fomichev
2019-05-08 17:18 ` [PATCH bpf 3/4] bpf: cgroup: " Stanislav Fomichev
2019-05-08 17:18 ` [PATCH bpf 4/4] bpf: tracing: " Stanislav Fomichev
2019-05-08 17:56 ` [PATCH bpf 0/4] bpf: remove __rcu annotations from bpf_prog_array Alexei Starovoitov
2019-05-08 18:12   ` Stanislav Fomichev
2019-05-13 18:57     ` Stanislav Fomichev
2019-05-14 16:55       ` Alexei Starovoitov
2019-05-14 17:30         ` Stanislav Fomichev [this message]
2019-05-14 17:45           ` Alexei Starovoitov
2019-05-14 17:53             ` Stanislav Fomichev
2019-05-15  1:25               ` Alexei Starovoitov
2019-05-15  2:11                 ` Stanislav Fomichev
2019-05-15  2:27                   ` Alexei Starovoitov
2019-05-15  2:44                     ` Eric Dumazet
2019-05-15  2:56                       ` Stanislav Fomichev
2019-05-15  3:16                         ` Alexei Starovoitov
2019-05-15  3:38                           ` Stanislav Fomichev
2019-05-15  3:42                             ` Alexei Starovoitov
2019-05-15  3:49                               ` Stanislav Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190514173002.GB10244@mini-arch \
    --to=sdf@fomichev.me \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).