From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
Daniel Borkmann <daniel@iogearbox.net>,
"Paul E . McKenney" <paulmck@kernel.org>,
Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v4 bpf-next 2/5] bpf: Introduce sleepable BPF programs
Date: Mon, 29 Jun 2020 18:15:41 -0700 [thread overview]
Message-ID: <CAEf4BzZPmtsJu8L42rMrFc3mW+h=a9ZPH_kR+dv35szi04Dz5g@mail.gmail.com> (raw)
In-Reply-To: <20200630003441.42616-3-alexei.starovoitov@gmail.com>
On Mon, Jun 29, 2020 at 5:35 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Introduce sleepable BPF programs that can request such property for themselves
> via BPF_F_SLEEPABLE flag at program load time. In such case they will be able
> to use helpers like bpf_copy_from_user() that might sleep. At present only
> fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
> when they are attached to kernel functions that are known to allow sleeping.
>
> The non-sleepable programs are relying on implicit rcu_read_lock() and
> migrate_disable() to protect life time of programs, maps that they use and
> per-cpu kernel structures used to pass info between bpf programs and the
> kernel. The sleepable programs cannot be enclosed into rcu_read_lock().
> migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
> should not be enclosed in migrate_disable() as well. Therefore
> rcu_read_lock_trace is used to protect the life time of sleepable progs.
>
> There are many networking and tracing program types. In many cases the
> 'struct bpf_prog *' pointer itself is rcu protected within some other kernel
> data structure and the kernel code is using rcu_dereference() to load that
> program pointer and call BPF_PROG_RUN() on it. All these cases are not touched.
> Instead sleepable bpf programs are allowed with bpf trampoline only. The
> program pointers are hard-coded into generated assembly of bpf trampoline and
> synchronize_rcu_tasks_trace() is used to protect the life time of the program.
> The same trampoline can hold both sleepable and non-sleepable progs.
>
> When rcu_read_lock_trace is held it means that some sleepable bpf program is running
> from bpf trampoline. Those programs can use bpf arrays and preallocated hash/lru
> maps. These map types are waiting on programs to complete via
> synchronize_rcu_tasks_trace();
>
> Updates to trampoline now has to do synchronize_rcu_tasks_trace() and
> synchronize_rcu_tasks() to wait for sleepable progs to finish and for
> trampoline assembly to finish.
>
> This is the first step of introducing sleepable progs.
>
> After that dynamically allocated hash maps can be allowed. All map elements
> would have to be rcu_trace protected instead of normal rcu.
> per-cpu maps will be allowed. Either via the following pattern:
> void *elem = bpf_map_lookup_elem(map, key);
> if (elem) {
> // access elem
> bpf_map_release_elem(map, elem);
> }
> where modified lookup() helper will do migrate_disable() and
> new bpf_map_release_elem() will do corresponding migrate_enable().
> Or explicit bpf_migrate_disable/enable() helpers will be introduced.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
LGTM.
Acked-by: Andrii Nakryiko <andriin@fb.com>
> arch/x86/net/bpf_jit_comp.c | 32 ++++++++++++------
> include/linux/bpf.h | 3 ++
> include/uapi/linux/bpf.h | 8 +++++
> init/Kconfig | 1 +
> kernel/bpf/arraymap.c | 1 +
> kernel/bpf/hashtab.c | 12 +++----
> kernel/bpf/syscall.c | 13 +++++--
> kernel/bpf/trampoline.c | 33 ++++++++++++++++--
> kernel/bpf/verifier.c | 62 +++++++++++++++++++++++++++++++++-
> tools/include/uapi/linux/bpf.h | 8 +++++
> 10 files changed, 149 insertions(+), 24 deletions(-)
>
[...]
> @@ -394,6 +406,21 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start)
> rcu_read_unlock();
> }
>
> +/* when rcu_read_lock_trace is held it means that some sleepable bpf program is
> + * running. Those programs can use bpf arrays and preallocated hash maps. These
> + * map types are waiting on programs to complete via
> + * synchronize_rcu_tasks_trace();
Wanted to leave comment that "map types are waiting" is outdated after
patch #1 and then recalled map-in-map complexities. So depending if
I'm right or wrong regarding issue in patch #1, this would stay or has
to be removed.
> + */
> +void notrace __bpf_prog_enter_sleepable(void)
> +{
> + rcu_read_lock_trace();
> +}
> +
> +void notrace __bpf_prog_exit_sleepable(void)
> +{
> + rcu_read_unlock_trace();
> +}
> +
> int __weak
> arch_prepare_bpf_trampoline(void *image, void *image_end,
> const struct btf_func_model *m, u32 flags,
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 7de98906ddf4..05aa990ba9a4 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -9112,6 +9112,23 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
> return -EINVAL;
> }
>
> + if (prog->aux->sleepable)
> + switch (map->map_type) {
> + case BPF_MAP_TYPE_HASH:
> + case BPF_MAP_TYPE_LRU_HASH:
> + case BPF_MAP_TYPE_ARRAY:
> + if (!is_preallocated_map(map)) {
> + verbose(env,
> + "Sleepable programs can only use preallocated hash maps\n");
> + return -EINVAL;
> + }
> + break;
> + default:
> + verbose(env,
> + "Sleepable programs can only use array and hash maps\n");
nit: message is a bit misleading. per-cpu array is also an array, yet
is not supported.
> + return -EINVAL;
> + }
> +
> return 0;
> }
>
> @@ -10722,6 +10739,22 @@ static int check_attach_modify_return(struct bpf_prog *prog, unsigned long addr)
> return -EINVAL;
> }
>
[...]
next prev parent reply other threads:[~2020-06-30 1:15 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-30 0:34 [PATCH v4 bpf-next 0/5] bpf: Introduce minimal support for sleepable progs Alexei Starovoitov
2020-06-30 0:34 ` [PATCH v4 bpf-next 1/5] bpf: Remove redundant synchronize_rcu Alexei Starovoitov
2020-06-30 0:58 ` Andrii Nakryiko
2020-06-30 1:08 ` Andrii Nakryiko
2020-06-30 2:56 ` Alexei Starovoitov
2020-06-30 3:31 ` Andrii Nakryiko
2020-06-30 0:34 ` [PATCH v4 bpf-next 2/5] bpf: Introduce sleepable BPF programs Alexei Starovoitov
2020-06-30 1:15 ` Andrii Nakryiko [this message]
2020-06-30 3:03 ` Alexei Starovoitov
2020-06-30 0:34 ` [PATCH v4 bpf-next 3/5] bpf: Add bpf_copy_from_user() helper Alexei Starovoitov
2020-06-30 1:18 ` Andrii Nakryiko
2020-06-30 0:34 ` [PATCH v4 bpf-next 4/5] libbpf: support sleepable progs Alexei Starovoitov
2020-06-30 0:34 ` [PATCH v4 bpf-next 5/5] selftests/bpf: Add sleepable tests Alexei Starovoitov
2020-06-30 1:25 ` Andrii Nakryiko
2020-06-30 3:06 ` Alexei Starovoitov
2020-06-30 3:34 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAEf4BzZPmtsJu8L42rMrFc3mW+h=a9ZPH_kR+dv35szi04Dz5g@mail.gmail.com' \
--to=andrii.nakryiko@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=kernel-team@fb.com \
--cc=netdev@vger.kernel.org \
--cc=paulmck@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).