From: Martin KaFai Lau <kafai@fb.com>
To: KP Singh <kpsingh@kernel.org>
Cc: <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Jann Horn <jannh@google.com>,
	Florent Revest <revest@chromium.org>,
	Brendan Jackman <jackmanb@chromium.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: Allow bpf_local_storage to be used by sleepable programs
Date: Fri, 27 Aug 2021 13:55:30 -0700
Message-ID: <20210827205530.zzqawd6wz52n65qh@kafai-mbp>
In-Reply-To: <20210826235127.303505-2-kpsingh@kernel.org>

On Thu, Aug 26, 2021 at 11:51:26PM +0000, KP Singh wrote:
> Other maps like hashmaps are already available to sleepable programs.
> Sleepable BPF programs run under trace RCU. Allow task, sk and inode
> storage to be used from sleepable programs.
> 
> The local storage code mostly runs under the program's RCU read-side
> critical section (entered in __bpf_prog_enter{_sleepable} and exited in
> __bpf_prog_exit{_sleepable}, via rcu_read_lock or rcu_read_lock_trace),
> with the exception of the logic where the map is freed.
> 
> After some discussions and help from Jann Horn, the following changes
> were made:
> 
> bpf_local_storage{_elem} are now freed via a call_rcu_tasks_trace
> callback instead of a direct kfree_rcu, which would not respect the
> trace RCU grace periods. The callback then frees the storage/selem
> with kfree_rcu, which handles the normal RCU grace period, similar to
> BPF trampolines.
> 
> Update synchronize_rcu to synchronize_rcu_mult in the map freeing
> code to wait for both trace RCU and normal RCU grace periods.
> While this is an expensive operation, bpf_local_storage_map_free
> is not called from within a BPF program; it is only called when the
> owning object is being freed.
> 
> Update the dereferencing of the pointers to use rcu_dereference_protected
> (with either the trace or normal RCU locks held) and add warnings at the
> beginning of the get and delete helpers.
> 
> Signed-off-by: KP Singh <kpsingh@kernel.org>
> ---
>  include/linux/bpf_local_storage.h |  5 ++++
>  kernel/bpf/bpf_inode_storage.c    |  9 +++++--
>  kernel/bpf/bpf_local_storage.c    | 43 ++++++++++++++++++++++++-------
>  kernel/bpf/bpf_task_storage.c     |  6 ++++-
>  kernel/bpf/verifier.c             |  3 +++
>  net/core/bpf_sk_storage.c         |  8 +++++-
>  6 files changed, 61 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
> index 24496bc28e7b..8453488b334d 100644
> --- a/include/linux/bpf_local_storage.h
> +++ b/include/linux/bpf_local_storage.h
> @@ -16,6 +16,9 @@
>  
>  #define BPF_LOCAL_STORAGE_CACHE_SIZE	16
>  
> +#define bpf_local_storage_rcu_lock_held()			\
> +	(rcu_read_lock_held() || rcu_read_lock_trace_held() ||	\
> +		rcu_read_lock_bh_held())
There is a similar test in hashtab.  Maybe rename it to a more
generic name so that it can be reused there?
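
Something like this, perhaps (the generic name and where it lives are
just a strawman on my side):

#define bpf_rcu_lock_held()					\
	(rcu_read_lock_held() || rcu_read_lock_trace_held() ||	\
	 rcu_read_lock_bh_held())

Then both the local storage code and the WARN_ON_ONCE() checks in
hashtab.c could share it.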

[ ... ]

> diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> index b305270b7a4b..7760bc4e9565 100644
> --- a/kernel/bpf/bpf_local_storage.c
> +++ b/kernel/bpf/bpf_local_storage.c
> @@ -11,6 +11,8 @@
>  #include <net/sock.h>
>  #include <uapi/linux/sock_diag.h>
>  #include <uapi/linux/btf.h>
> +#include <linux/rcupdate_trace.h>
> +#include <linux/rcupdate_wait.h>
>  
>  #define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
>  
> @@ -81,6 +83,22 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
>  	return NULL;
>  }
>  
> +void bpf_local_storage_free_rcu(struct rcu_head *rcu)
> +{
> +	struct bpf_local_storage *local_storage;
> +
> +	local_storage = container_of(rcu, struct bpf_local_storage, rcu);
> +	kfree_rcu(local_storage, rcu);
> +}
> +
> +static void bpf_selem_free_rcu(struct rcu_head *rcu)
> +{
> +	struct bpf_local_storage_elem *selem;
> +
> +	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
> +	kfree_rcu(selem, rcu);
> +}
> +
>  /* local_storage->lock must be held and selem->local_storage == local_storage.
>   * The caller must ensure selem->smap is still valid to be
>   * dereferenced for its smap->elem_size and smap->cache_idx.
> @@ -118,12 +136,12 @@ bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
>  		 *
>  		 * Although the unlock will be done under
>  		 * rcu_read_lock(), it is more intuitive to
> -		 * read if kfree_rcu(local_storage, rcu) is done
> +		 * read if the freeing of the storage is done
>  		 * after the raw_spin_unlock_bh(&local_storage->lock).
>  		 *
>  		 * Hence, a "bool free_local_storage" is returned
> -		 * to the caller which then calls the kfree_rcu()
> -		 * after unlock.
> +		 * to the caller, which then frees the storage after
> +		 * all the RCU grace periods have expired.
>  		 */
>  	}
>  	hlist_del_init_rcu(&selem->snode);
> @@ -131,7 +149,7 @@ bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
>  	    SDATA(selem))
>  		RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
>  
> -	kfree_rcu(selem, rcu);
> +	call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);
Although the common use case is usually storage_get() much more often
than storage_delete(), are you aware of any performance impact for
a bpf prog that does a lot of storage_delete()?
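
(If it helps to quantify it, something like the following sleepable
LSM prog could hammer the delete path before/after this series.
Untested sketch, the prog/map/struct names here are made up:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct bench_val {
	__u64 cnt;
};

struct {
	__uint(type, BPF_MAP_TYPE_INODE_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, struct bench_val);
} bench_map SEC(".maps");

/* "lsm.s/" makes the prog sleepable; every file_open does a
 * create + delete, so each call exercises the
 * call_rcu_tasks_trace() path in the selem unlink.
 */
SEC("lsm.s/file_open")
int BPF_PROG(bench_delete, struct file *file)
{
	struct bench_val *v;

	v = bpf_inode_storage_get(&bench_map, file->f_inode, 0,
				  BPF_LOCAL_STORAGE_GET_F_CREATE);
	if (v)
		bpf_inode_storage_delete(&bench_map, file->f_inode);
	return 0;
}

char _license[] SEC("license") = "GPL";
)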

>  
>  	return free_local_storage;
>  }
> @@ -154,7 +172,8 @@ static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
>  	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
>  
>  	if (free_local_storage)
> -		kfree_rcu(local_storage, rcu);
> +		call_rcu_tasks_trace(&local_storage->rcu,
> +				     bpf_local_storage_free_rcu);
>  }
>  
>  void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
> @@ -213,7 +232,8 @@ bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
>  	struct bpf_local_storage_elem *selem;
>  
>  	/* Fast path (cache hit) */
> -	sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
> +	sdata = rcu_dereference_protected(local_storage->cache[smap->cache_idx],
> +					  bpf_local_storage_rcu_lock_held());
There are other places that also use rcu_dereference(),
e.g. in bpf_local_storage_update().
Should they be changed as well?
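
e.g. something like this there (untested, from memory, just to
illustrate the idea):

-	local_storage = rcu_dereference(*owner_storage(smap, owner));
+	local_storage =
+		rcu_dereference_protected(*owner_storage(smap, owner),
+					  bpf_local_storage_rcu_lock_held());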

[ ... ]

> --- a/net/core/bpf_sk_storage.c
> +++ b/net/core/bpf_sk_storage.c
> @@ -13,6 +13,7 @@
>  #include <net/sock.h>
>  #include <uapi/linux/sock_diag.h>
>  #include <uapi/linux/btf.h>
> +#include <linux/rcupdate_trace.h>
>  
>  DEFINE_BPF_STORAGE_CACHE(sk_cache);
>  
> @@ -22,7 +23,8 @@ bpf_sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
>  	struct bpf_local_storage *sk_storage;
>  	struct bpf_local_storage_map *smap;
>  
> -	sk_storage = rcu_dereference(sk->sk_bpf_storage);
> +	sk_storage = rcu_dereference_protected(sk->sk_bpf_storage,
> +					       bpf_local_storage_rcu_lock_held());
>  	if (!sk_storage)
>  		return NULL;
>  
> @@ -258,6 +260,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
>  {
>  	struct bpf_local_storage_data *sdata;
>  
> +	WARN_ON_ONCE(!bpf_local_storage_rcu_lock_held());
>  	if (!sk || !sk_fullsock(sk) || flags > BPF_SK_STORAGE_GET_F_CREATE)
sk is protected by rcu_read_lock() here.
Is it always safe to access it with rcu_read_lock_trace() alone?

>  		return (unsigned long)NULL;
>  
