From: KP Singh <kpsingh@kernel.org>
To: Martin KaFai Lau <kafai@fb.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Jann Horn <jannh@google.com>,
Florent Revest <revest@chromium.org>,
Brendan Jackman <jackmanb@chromium.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: Allow bpf_local_storage to be used by sleepable programs
Date: Tue, 31 Aug 2021 11:50:48 +0200 [thread overview]
Message-ID: <CACYkzJ5nQ4O-XqX0VHCPs77hDcyjtbk2c9DjXLdZLJ-7sO6DgQ@mail.gmail.com> (raw)
In-Reply-To: <20210831021132.sehzvrudvcjbzmwt@kafai-mbp.dhcp.thefacebook.com>
On Tue, Aug 31, 2021 at 4:11 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Sun, Aug 29, 2021 at 11:52:03PM +0200, KP Singh wrote:
> [ ... ]
>
> > > > diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> > > > index b305270b7a4b..7760bc4e9565 100644
> > > > --- a/kernel/bpf/bpf_local_storage.c
> > > > +++ b/kernel/bpf/bpf_local_storage.c
> > > > @@ -11,6 +11,8 @@
> > > > #include <net/sock.h>
> > > > #include <uapi/linux/sock_diag.h>
> > > > #include <uapi/linux/btf.h>
> > > > +#include <linux/rcupdate_trace.h>
> > > > +#include <linux/rcupdate_wait.h>
> > > >
> > > > #define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
> > > >
> > > > @@ -81,6 +83,22 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
> > > > return NULL;
> > > > }
> > > >
> > > > +void bpf_local_storage_free_rcu(struct rcu_head *rcu)
> > > > +{
> > > > + struct bpf_local_storage *local_storage;
> > > > +
> > > > + local_storage = container_of(rcu, struct bpf_local_storage, rcu);
> > > > + kfree_rcu(local_storage, rcu);
> > > > +}
> > > > +
> > > > +static void bpf_selem_free_rcu(struct rcu_head *rcu)
> > > > +{
> > > > + struct bpf_local_storage_elem *selem;
> > > > +
> > > > + selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
> > > > + kfree_rcu(selem, rcu);
> > > > +}
> > > > +
> > > > /* local_storage->lock must be held and selem->local_storage == local_storage.
> > > > * The caller must ensure selem->smap is still valid to be
> > > > * dereferenced for its smap->elem_size and smap->cache_idx.
> > > > @@ -118,12 +136,12 @@ bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
> > > > *
> > > > * Although the unlock will be done under
> > > > * rcu_read_lock(), it is more intutivie to
> > > > - * read if kfree_rcu(local_storage, rcu) is done
> > > > + * read if the freeing of the storage is done
> > > > * after the raw_spin_unlock_bh(&local_storage->lock).
> > > > *
> > > > * Hence, a "bool free_local_storage" is returned
> > > > - * to the caller which then calls the kfree_rcu()
> > > > - * after unlock.
> > > > + * to the caller which then calls then frees the storage after
> > > > + * all the RCU grace periods have expired.
> > > > */
> > > > }
> > > > hlist_del_init_rcu(&selem->snode);
> > > > @@ -131,7 +149,7 @@ bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
> > > > SDATA(selem))
> > > > RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
> > > >
> > > > - kfree_rcu(selem, rcu);
> > > > + call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);
> > > Although the common use case is usually storage_get() much more often
> > > than storage_delete(), do you aware any performance impact for
> > > the bpf prog that does a lot of storage_delete()?
> >
> > I have not really measured the impact on deletes, My understanding is
> > that it should
> > not impact the BPF program, but yes, if there are some critical
> > sections that are prolonged
> > due to a sleepable program "sleeping" too long, then it would pile up
> > the callbacks.
> >
> > But this is not something new, as we have a similar thing in BPF
> > trampolines. If this really
> > becomes an issue, we could add a flag BPF_F_SLEEPABLE_STORAGE and only maps
> > with this flag would be allowed in sleepable progs.
> Agree that is similar to trampoline updates but not sure it is comparable
> in terms of the frequency of elems being deleted here. e.g. many
> short lived tcp connections created by external traffic.
>
> Adding a BPF_F_SLEEPABLE_STORAGE later won't work. It will break
> existing sleepable bpf prog.
>
> I don't know enough on call_rcu_tasks_trace() here, so the
> earlier question on perf/callback-pile-up implications in order to
> decide if extra logic or knob is needed here or not.
I will defer to the others, maybe Alexei and Paul, we could also just
add the flag to not affect existing performance characteristics?
>
> >
> > We could then wait for both critical sections only when this flag is
> > set on the map.
> >
> > >
> > > >
> > > > return free_local_storage;
> > > > }
> > > > @@ -154,7 +172,8 @@ static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
> > > > raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > > >
> > > > if (free_local_storage)
> > > > - kfree_rcu(local_storage, rcu);
> > > > + call_rcu_tasks_trace(&local_storage->rcu,
> > > > + bpf_local_storage_free_rcu);
> > > > }
> > > >
> > > > void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
> > > > @@ -213,7 +232,8 @@ bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
> > > > struct bpf_local_storage_elem *selem;
> > > >
> > > > /* Fast path (cache hit) */
> > > > - sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
> > > > + sdata = rcu_dereference_protected(local_storage->cache[smap->cache_idx],
> > > > + bpf_local_storage_rcu_lock_held());
> > > There are other places using rcu_dereference() also.
> > > e.g. in bpf_local_storage_update().
> > > Should they be changed also?
> >
> > From what I saw, the other usage of rcu_derference is in a nested
> > (w.r.t to the RCU section that in bpf_prog_enter/exit) RCU
> > read side critical section/rcu_read_{lock, unlock} so it should not be required.
> hmm... not sure what nested or not has to do here.
> It is likely we are talking different things.
>
Yeah, we were looking at different things.
e.g. bpf_selem_unlink does not need to be changed as it is in
a rcu_read_lock.
But you are right there is another in bpf_local_storage_update which I will fix.
> Did you turn on the lockdep rcu debugging in kconfig?
Yes I have PROVE_RCU and LOCKDEP
>
> afaik, lockdep uses the second "check" argument to WARN on incorrect usage.
> Even the right check is passed here, the later rcu_dereference() will still
> complain because it only checks for rcu_read_lock_held().
>
> Also, after another look, why rcu_dereference_protected() is used
> here instead of rcu_dereference_check()? The spinlock is not acquired
> here. The same comment goes for similar rcu_dereference_protected() usages.
Good catch, it should be rcu_dereference_check.
>
> >
> > If there are some that are not, then they need to be updated. Did I miss any?
> >
> > >
> > > [ ... ]
> > >
> > > > --- a/net/core/bpf_sk_storage.c
> > > > +++ b/net/core/bpf_sk_storage.c
> > > > @@ -13,6 +13,7 @@
> > > > #include <net/sock.h>
> > > > #include <uapi/linux/sock_diag.h>
> > > > #include <uapi/linux/btf.h>
> > > > +#include <linux/rcupdate_trace.h>
> > > >
> > > > DEFINE_BPF_STORAGE_CACHE(sk_cache);
> > > >
> > > > @@ -22,7 +23,8 @@ bpf_sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
> > > > struct bpf_local_storage *sk_storage;
> > > > struct bpf_local_storage_map *smap;
> > > >
> > > > - sk_storage = rcu_dereference(sk->sk_bpf_storage);
> > > > + sk_storage = rcu_dereference_protected(sk->sk_bpf_storage,
> > > > + bpf_local_storage_rcu_lock_held());
> > > > if (!sk_storage)
> > > > return NULL;
> > > >
> > > > @@ -258,6 +260,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
> > > > {
> > > > struct bpf_local_storage_data *sdata;
> > > >
> > > > + WARN_ON_ONCE(!bpf_local_storage_rcu_lock_held());
> > > > if (!sk || !sk_fullsock(sk) || flags > BPF_SK_STORAGE_GET_F_CREATE)
> > > sk is protected by rcu_read_lock here.
> > > Is it always safe to access it with the rcu_read_lock_trace alone ?
> >
> > We don't dereference sk with an rcu_dereference though, is it still the case for
> > tracing and LSM programs? Or is it somehow implicity protected even
> > though we don't use rcu_dereference since that's just a READ_ONCE + some checks?
> e.g. the bpf_prog (currently run under rcu_read_lock()) may read the sk from
> req_sk->sk which I don't think the verifier will optimize it out, so as good
> as READ_ONCE(), iiuc.
>
> The sk here is obtained from the bpf_lsm_socket_* hooks? Those sk should have
> a refcnt, right? If that is the case, it should be good enough for now.
The one passed in the arguments yes, but if you notice the discussion in
https://lore.kernel.org/bpf/20210826133913.627361-1-memxor@gmail.com/T/#me254212a125516a6c5d2fbf349b97c199e66dce0
one may also get an sk in LSM and tracing progs by pointer walking.
>
> >
> > Would grabbing a refcount earlier in the code be helpful?
> Grab the refcount earlier in the bpf prog itself?
> right, it may be what eventually needs to be done
> if somehow there is a usecase that the sleepable bpf prog can get a
> hold on a rcu protected sk.
next prev parent reply other threads:[~2021-08-31 9:51 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-26 23:51 [PATCH bpf-next 0/2] Sleepable local storage KP Singh
2021-08-26 23:51 ` [PATCH bpf-next 1/2] bpf: Allow bpf_local_storage to be used by sleepable programs KP Singh
2021-08-27 20:55 ` Martin KaFai Lau
2021-08-29 21:52 ` KP Singh
2021-08-31 2:11 ` Martin KaFai Lau
2021-08-31 9:50 ` KP Singh [this message]
2021-08-31 18:22 ` Martin KaFai Lau
2021-08-31 19:38 ` KP Singh
2021-09-01 6:32 ` Martin KaFai Lau
2021-09-01 20:26 ` Paul E. McKenney
2021-09-02 4:44 ` Martin KaFai Lau
2021-11-23 17:11 ` KP Singh
2021-11-23 18:22 ` Paul E. McKenney
2021-11-23 22:29 ` Martin KaFai Lau
2021-11-23 23:14 ` KP Singh
2021-11-24 0:18 ` Martin KaFai Lau
2021-11-24 22:20 ` KP Singh
2021-11-30 2:34 ` Martin KaFai Lau
2021-11-30 16:22 ` KP Singh
2021-11-30 22:51 ` Paul E. McKenney
2021-12-04 1:01 ` Paul E. McKenney
2021-12-05 2:27 ` KP Singh
2021-12-05 3:52 ` Paul E. McKenney
2021-11-23 23:11 ` KP Singh
2021-11-25 3:47 ` Paul E. McKenney
2021-09-30 18:46 ` Martin KaFai Lau
2021-11-02 16:00 ` KP Singh
2021-08-26 23:51 ` [PATCH bpf-next 2/2] bpf/selftests: Update local storage selftest for " KP Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CACYkzJ5nQ4O-XqX0VHCPs77hDcyjtbk2c9DjXLdZLJ-7sO6DgQ@mail.gmail.com \
--to=kpsingh@kernel.org \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jackmanb@chromium.org \
--cc=jannh@google.com \
--cc=kafai@fb.com \
--cc=paulmck@kernel.org \
--cc=revest@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).