* [PATCH bpf-next v2 3/5] bpf: allow sleepable uprobe programs to attach
2022-05-02 23:09 [PATCH bpf-next v2 0/5] sleepable uprobe support Delyan Kratunov
@ 2022-05-02 23:09 ` Delyan Kratunov
2022-05-09 23:52 ` Andrii Nakryiko
2022-05-02 23:09 ` [PATCH bpf-next v2 5/5] selftests/bpf: add tests for sleepable kprobes and uprobes Delyan Kratunov
` (3 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Delyan Kratunov @ 2022-05-02 23:09 UTC (permalink / raw)
To: daniel, ast, andrii, bpf
uprobe and kprobe programs share the same program type, BPF_PROG_TYPE_KPROBE,
which currently cannot be loaded as sleepable.
To avoid adding a new UPROBE type, we instead allow sleepable KPROBE
programs to load and defer the is-it-actually-a-uprobe-program check
to attachment time, where we're already validating the corresponding
perf_event.
A corollary of this patch is that you can now load a sleepable kprobe
program but cannot attach it.
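The deferred check can be modeled in plain userspace C. This is only an illustrative sketch of the decision added to perf_event_set_bpf_prog(): the struct and function names below are hypothetical stand-ins, not kernel API.

```c
/* Toy model of the attach-time check: a sleepable program of the shared
 * KPROBE type may only attach when the target perf_event is a uprobe.
 * All names here are illustrative, not the kernel's. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct fake_prog {
	bool is_kprobe_type;	/* BPF_PROG_TYPE_KPROBE covers k- and uprobes */
	bool sleepable;
};

/* Returns 0 if attachment would be allowed, -EINVAL otherwise. */
static int check_sleepable_attach(const struct fake_prog *prog, bool is_uprobe)
{
	if (prog->is_kprobe_type && prog->sleepable && !is_uprobe)
		return -EINVAL;	/* only uprobe targets accept sleepable progs */
	return 0;
}
```

This mirrors the corollary above: a sleepable KPROBE-type program loads fine, but attachment to anything other than a uprobe is rejected.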
Signed-off-by: Delyan Kratunov <delyank@fb.com>
---
kernel/bpf/verifier.c | 4 ++--
kernel/events/core.c | 16 ++++++++++------
2 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 813f6ee80419..82b37a1a8a33 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -14571,8 +14571,8 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
}
if (prog->aux->sleepable && prog->type != BPF_PROG_TYPE_TRACING &&
- prog->type != BPF_PROG_TYPE_LSM) {
- verbose(env, "Only fentry/fexit/fmod_ret and lsm programs can be sleepable\n");
+ prog->type != BPF_PROG_TYPE_LSM && prog->type != BPF_PROG_TYPE_KPROBE) {
+ verbose(env, "Only fentry/fexit/fmod_ret, lsm, and kprobe/uprobe programs can be sleepable\n");
return -EINVAL;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 23bb19716ad3..bd09c4152a3b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10069,26 +10069,30 @@ static inline bool perf_event_is_tracing(struct perf_event *event)
int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
u64 bpf_cookie)
{
- bool is_kprobe, is_tracepoint, is_syscall_tp;
+ bool is_kprobe, is_uprobe, is_tracepoint, is_syscall_tp;
if (!perf_event_is_tracing(event))
return perf_event_set_bpf_handler(event, prog, bpf_cookie);
- is_kprobe = event->tp_event->flags & TRACE_EVENT_FL_UKPROBE;
+ is_kprobe = event->tp_event->flags & TRACE_EVENT_FL_KPROBE;
+ is_uprobe = event->tp_event->flags & TRACE_EVENT_FL_UPROBE;
is_tracepoint = event->tp_event->flags & TRACE_EVENT_FL_TRACEPOINT;
is_syscall_tp = is_syscall_trace_event(event->tp_event);
- if (!is_kprobe && !is_tracepoint && !is_syscall_tp)
+ if (!is_kprobe && !is_uprobe && !is_tracepoint && !is_syscall_tp)
/* bpf programs can only be attached to u/kprobe or tracepoint */
return -EINVAL;
- if ((is_kprobe && prog->type != BPF_PROG_TYPE_KPROBE) ||
+ if (((is_kprobe || is_uprobe) && prog->type != BPF_PROG_TYPE_KPROBE) ||
(is_tracepoint && prog->type != BPF_PROG_TYPE_TRACEPOINT) ||
(is_syscall_tp && prog->type != BPF_PROG_TYPE_TRACEPOINT))
return -EINVAL;
+ if (prog->type == BPF_PROG_TYPE_KPROBE && prog->aux->sleepable && !is_uprobe)
+ /* only uprobe programs are allowed to be sleepable */
+ return -EINVAL;
+
/* Kprobe override only works for kprobes, not uprobes. */
- if (prog->kprobe_override &&
- !(event->tp_event->flags & TRACE_EVENT_FL_KPROBE))
+ if (prog->kprobe_override && !is_kprobe)
return -EINVAL;
if (is_tracepoint || is_syscall_tp) {
--
2.35.1
* [PATCH bpf-next v2 5/5] selftests/bpf: add tests for sleepable kprobes and uprobes
2022-05-02 23:09 [PATCH bpf-next v2 0/5] sleepable uprobe support Delyan Kratunov
2022-05-02 23:09 ` [PATCH bpf-next v2 3/5] bpf: allow sleepable uprobe programs to attach Delyan Kratunov
@ 2022-05-02 23:09 ` Delyan Kratunov
2022-05-09 23:57 ` Andrii Nakryiko
2022-05-02 23:09 ` [PATCH bpf-next v2 4/5] libbpf: add support for sleepable kprobe and uprobe programs Delyan Kratunov
` (2 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Delyan Kratunov @ 2022-05-02 23:09 UTC (permalink / raw)
To: daniel, ast, andrii, bpf
Add tests that ensure sleepable kprobe programs cannot attach.
Also attach both sleepable and non-sleepable uprobe programs to the same
location (i.e. same bpf_prog_array).
Signed-off-by: Delyan Kratunov <delyank@fb.com>
---
.../selftests/bpf/prog_tests/attach_probe.c | 51 +++++++++++++++-
.../selftests/bpf/progs/test_attach_probe.c | 58 +++++++++++++++++++
2 files changed, 108 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
index 08c0601b3e84..cddb17ab0588 100644
--- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
+++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
@@ -17,6 +17,14 @@ static void trigger_func2(void)
asm volatile ("");
}
+/* attach point for byname sleepable uprobe */
+static void trigger_func3(void)
+{
+ asm volatile ("");
+}
+
+static char test_data[] = "test_data";
+
void test_attach_probe(void)
{
DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts);
@@ -27,6 +35,7 @@ void test_attach_probe(void)
struct bpf_link *uprobe_err_link;
bool legacy;
char *mem;
+ int kprobe_s_flags;
/* Check if new-style kprobe/uprobe API is supported.
* Kernels that support new FD-based kprobe and uprobe BPF attachment
@@ -49,9 +58,18 @@ void test_attach_probe(void)
if (!ASSERT_GE(ref_ctr_offset, 0, "ref_ctr_offset"))
return;
- skel = test_attach_probe__open_and_load();
+ skel = test_attach_probe__open();
if (!ASSERT_OK_PTR(skel, "skel_open"))
return;
+
+ /* sleepable kprobe test case needs flags set before loading */
+ kprobe_s_flags = bpf_program__flags(skel->progs.handle_kprobe_sleepable);
+ if (!ASSERT_OK(bpf_program__set_flags(skel->progs.handle_kprobe_sleepable,
+ kprobe_s_flags | BPF_F_SLEEPABLE), "kprobe_sleepable_flags"))
+ goto cleanup;
+
+ if (!ASSERT_OK(test_attach_probe__load(skel), "skel_load"))
+ goto cleanup;
if (!ASSERT_OK_PTR(skel->bss, "check_bss"))
goto cleanup;
@@ -151,6 +169,30 @@ void test_attach_probe(void)
if (!ASSERT_OK_PTR(skel->links.handle_uretprobe_byname2, "attach_uretprobe_byname2"))
goto cleanup;
+ /* sleepable kprobes should not attach successfully */
+ skel->links.handle_kprobe_sleepable = bpf_program__attach(skel->progs.handle_kprobe_sleepable);
+ if (!ASSERT_ERR_PTR(skel->links.handle_kprobe_sleepable, "attach_kprobe_sleepable"))
+ goto cleanup;
+
+ /* test sleepable uprobe and uretprobe variants */
+ skel->links.handle_uprobe_byname3_sleepable = bpf_program__attach(skel->progs.handle_uprobe_byname3_sleepable);
+ if (!ASSERT_OK_PTR(skel->links.handle_uprobe_byname3_sleepable, "attach_uprobe_byname3_sleepable"))
+ goto cleanup;
+
+ skel->links.handle_uprobe_byname3 = bpf_program__attach(skel->progs.handle_uprobe_byname3);
+ if (!ASSERT_OK_PTR(skel->links.handle_uprobe_byname3, "attach_uprobe_byname3"))
+ goto cleanup;
+
+ skel->links.handle_uretprobe_byname3_sleepable = bpf_program__attach(skel->progs.handle_uretprobe_byname3_sleepable);
+ if (!ASSERT_OK_PTR(skel->links.handle_uretprobe_byname3_sleepable, "attach_uretprobe_byname3_sleepable"))
+ goto cleanup;
+
+ skel->links.handle_uretprobe_byname3 = bpf_program__attach(skel->progs.handle_uretprobe_byname3);
+ if (!ASSERT_OK_PTR(skel->links.handle_uretprobe_byname3, "attach_uretprobe_byname3"))
+ goto cleanup;
+
+ skel->bss->user_ptr = test_data;
+
/* trigger & validate kprobe && kretprobe */
usleep(1);
@@ -164,6 +206,9 @@ void test_attach_probe(void)
/* trigger & validate uprobe attached by name */
trigger_func2();
+ /* trigger & validate sleepable uprobe attached by name */
+ trigger_func3();
+
ASSERT_EQ(skel->bss->kprobe_res, 1, "check_kprobe_res");
ASSERT_EQ(skel->bss->kprobe2_res, 11, "check_kprobe_auto_res");
ASSERT_EQ(skel->bss->kretprobe_res, 2, "check_kretprobe_res");
@@ -174,6 +219,10 @@ void test_attach_probe(void)
ASSERT_EQ(skel->bss->uretprobe_byname_res, 6, "check_uretprobe_byname_res");
ASSERT_EQ(skel->bss->uprobe_byname2_res, 7, "check_uprobe_byname2_res");
ASSERT_EQ(skel->bss->uretprobe_byname2_res, 8, "check_uretprobe_byname2_res");
+ ASSERT_EQ(skel->bss->uprobe_byname3_sleepable_res, 9, "check_uprobe_byname3_sleepable_res");
+ ASSERT_EQ(skel->bss->uprobe_byname3_res, 10, "check_uprobe_byname3_res");
+ ASSERT_EQ(skel->bss->uretprobe_byname3_sleepable_res, 11, "check_uretprobe_byname3_sleepable_res");
+ ASSERT_EQ(skel->bss->uretprobe_byname3_res, 12, "check_uretprobe_byname3_res");
cleanup:
test_attach_probe__destroy(skel);
diff --git a/tools/testing/selftests/bpf/progs/test_attach_probe.c b/tools/testing/selftests/bpf/progs/test_attach_probe.c
index ce9acf4db8d2..5bcaab2c0c54 100644
--- a/tools/testing/selftests/bpf/progs/test_attach_probe.c
+++ b/tools/testing/selftests/bpf/progs/test_attach_probe.c
@@ -5,6 +5,7 @@
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
#include "bpf_misc.h"
int kprobe_res = 0;
@@ -17,6 +18,11 @@ int uprobe_byname_res = 0;
int uretprobe_byname_res = 0;
int uprobe_byname2_res = 0;
int uretprobe_byname2_res = 0;
+int uprobe_byname3_sleepable_res = 0;
+int uprobe_byname3_res = 0;
+int uretprobe_byname3_sleepable_res = 0;
+int uretprobe_byname3_res = 0;
+void *user_ptr = 0;
SEC("kprobe")
int handle_kprobe(struct pt_regs *ctx)
@@ -32,6 +38,17 @@ int BPF_KPROBE(handle_kprobe_auto)
return 0;
}
+/**
+ * This program will be manually made sleepable on the userspace side
+ * and should thus be unattachable.
+ */
+SEC("kprobe/" SYS_PREFIX "sys_nanosleep")
+int handle_kprobe_sleepable(struct pt_regs *ctx)
+{
+ kprobe_res = 2;
+ return 0;
+}
+
SEC("kretprobe")
int handle_kretprobe(struct pt_regs *ctx)
{
@@ -93,4 +110,45 @@ int handle_uretprobe_byname2(struct pt_regs *ctx)
return 0;
}
+static inline bool verify_sleepable_user_copy(void) {
+ char data[9];
+ bpf_copy_from_user(data, sizeof(data), user_ptr);
+ return bpf_strncmp(data, sizeof(data), "test_data") == 0;
+}
+
+SEC("uprobe.s//proc/self/exe:trigger_func3")
+int handle_uprobe_byname3_sleepable(struct pt_regs *ctx)
+{
+ if (verify_sleepable_user_copy())
+ uprobe_byname3_sleepable_res = 9;
+ return 0;
+}
+
+/**
+ * same target as the uprobe.s above to force sleepable and non-sleepable
+ * programs in the same bpf_prog_array
+ */
+SEC("uprobe//proc/self/exe:trigger_func3")
+int handle_uprobe_byname3(struct pt_regs *ctx)
+{
+ uprobe_byname3_res = 10;
+ return 0;
+}
+
+SEC("uretprobe.s//proc/self/exe:trigger_func3")
+int handle_uretprobe_byname3_sleepable(struct pt_regs *ctx)
+{
+ if (verify_sleepable_user_copy())
+ uretprobe_byname3_sleepable_res = 11;
+ return 0;
+}
+
+SEC("uretprobe//proc/self/exe:trigger_func3")
+int handle_uretprobe_byname3(struct pt_regs *ctx)
+{
+ uretprobe_byname3_res = 12;
+ return 0;
+}
+
+
char _license[] SEC("license") = "GPL";
--
2.35.1
* [PATCH bpf-next v2 0/5] sleepable uprobe support
@ 2022-05-02 23:09 Delyan Kratunov
2022-05-02 23:09 ` [PATCH bpf-next v2 3/5] bpf: allow sleepable uprobe programs to attach Delyan Kratunov
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Delyan Kratunov @ 2022-05-02 23:09 UTC (permalink / raw)
To: daniel, ast, andrii, bpf
This series implements support for sleepable uprobe programs.
Key work is in patches 2 and 3, the rest is plumbing and tests.
The main observation is that the only obstacle in the way of sleepable uprobe
programs is not the uprobe infrastructure, which already runs in a user context,
but the rcu usage around bpf_prog_array.
Details are in patch 2 but the tl;dr is that we chain tasks_trace and normal rcu
grace periods when releasing the array, to accommodate users of either rcu type.
This introduces latency for non-sleepable users (kprobe, tp) but that's deemed
acceptable, given recent benchmarks by Andrii [1]. We're a couple of orders of
magnitude under the rate of bpf_prog_array churn that would raise flags (~1MM/s per Paul).
[1]: https://lore.kernel.org/bpf/CAEf4BzbpjN6ca7D9KOTiFPOoBYkciYvTz0UJNp5c-_3ptm=Mrg@mail.gmail.com/
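The chained release can be sketched in userspace C. This is a toy model of the ordering, not kernel code: call_rcu_tasks_trace() fires first and its callback hands the object to kfree_rcu(), so the memory outlives one grace period of *each* flavor. Every name below is an illustrative stand-in.

```c
/* Model: one pending callback per "rcu flavor"; a grace period elapsing
 * runs that flavor's pending callback. Freeing is simulated by a flag. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct obj { bool freed; };

typedef void (*cb_t)(struct obj *);

static struct { struct obj *o; cb_t cb; } tasks_trace_q;
static struct obj *normal_q;

static void call_rcu_tasks_trace_model(struct obj *o, cb_t cb)
{
	tasks_trace_q.o = o;
	tasks_trace_q.cb = cb;
}

static void kfree_rcu_model(struct obj *o)
{
	normal_q = o;
}

/* the tasks_trace callback chains into the normal-rcu free */
static void free_cb(struct obj *o)
{
	kfree_rcu_model(o);
}

static void tasks_trace_gp_elapses(void)
{
	if (tasks_trace_q.o) {
		struct obj *o = tasks_trace_q.o;

		tasks_trace_q.o = NULL;
		tasks_trace_q.cb(o);
	}
}

static void normal_gp_elapses(void)
{
	if (normal_q) {
		normal_q->freed = true;	/* stands in for the actual kfree() */
		normal_q = NULL;
	}
}
```

The point of the model: a normal grace period alone never frees the array; it is freed only after a tasks_trace grace period followed by a normal one, which is why readers of either flavor are safe.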
v1 -> v2:
* Fix lockdep annotations in bpf_prog_run_array_sleepable
* Chain rcu grace periods only for perf_event-attached programs. This limits
the additional latency on the free path to use cases where we know it won't
be a problem.
* Add tests calling helpers only available in sleepable programs.
* Remove kprobe.s support from libbpf.
Delyan Kratunov (5):
bpf: move bpf_prog to bpf.h
bpf: implement sleepable uprobes by chaining tasks_trace and normal
rcu
bpf: allow sleepable uprobe programs to attach
libbpf: add support for sleepable kprobe and uprobe programs
selftests/bpf: add tests for sleepable kprobes and uprobes
include/linux/bpf.h | 93 +++++++++++++++++++
include/linux/filter.h | 34 -------
include/linux/trace_events.h | 1 +
kernel/bpf/core.c | 15 +++
kernel/bpf/verifier.c | 4 +-
kernel/events/core.c | 16 ++--
kernel/trace/bpf_trace.c | 27 +++++-
kernel/trace/trace_uprobe.c | 4 +-
tools/lib/bpf/libbpf.c | 6 +-
.../selftests/bpf/prog_tests/attach_probe.c | 51 +++++++++-
.../selftests/bpf/progs/test_attach_probe.c | 58 ++++++++++++
11 files changed, 260 insertions(+), 49 deletions(-)
--
2.35.1
* [PATCH bpf-next v2 4/5] libbpf: add support for sleepable kprobe and uprobe programs
2022-05-02 23:09 [PATCH bpf-next v2 0/5] sleepable uprobe support Delyan Kratunov
2022-05-02 23:09 ` [PATCH bpf-next v2 3/5] bpf: allow sleepable uprobe programs to attach Delyan Kratunov
2022-05-02 23:09 ` [PATCH bpf-next v2 5/5] selftests/bpf: add tests for sleepable kprobes and uprobes Delyan Kratunov
@ 2022-05-02 23:09 ` Delyan Kratunov
2022-05-09 23:53 ` Andrii Nakryiko
2022-05-02 23:09 ` [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu Delyan Kratunov
2022-05-02 23:09 ` [PATCH bpf-next v2 1/5] bpf: move bpf_prog to bpf.h Delyan Kratunov
4 siblings, 1 reply; 17+ messages in thread
From: Delyan Kratunov @ 2022-05-02 23:09 UTC (permalink / raw)
To: daniel, ast, andrii, bpf
Add section mappings for uprobe.s and uretprobe.s programs. Sleepable
kprobe programs deliberately get no kprobe.s mapping: they cannot
currently attach, so the selftests instead set BPF_F_SLEEPABLE manually
before loading in order to validate that prohibition.
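The retprobe detection that attach_uprobe() performs on the section prefix, extended for the new ".s" suffix, can be pulled out as a standalone helper for illustration (libbpf does this inline; the helper name here is hypothetical):

```c
/* Sketch of the section-prefix check in attach_uprobe(): both the plain
 * and the sleepable (".s") uretprobe spellings select a return probe. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool is_uretprobe_sec(const char *probe_type)
{
	return strcmp(probe_type, "uretprobe") == 0 ||
	       strcmp(probe_type, "uretprobe.s") == 0;
}
```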
Signed-off-by: Delyan Kratunov <delyank@fb.com>
---
tools/lib/bpf/libbpf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 63c0f412266c..d89529c9b52d 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8945,8 +8945,10 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("sk_reuseport", SK_REUSEPORT, BPF_SK_REUSEPORT_SELECT, SEC_ATTACHABLE | SEC_SLOPPY_PFX),
SEC_DEF("kprobe+", KPROBE, 0, SEC_NONE, attach_kprobe),
SEC_DEF("uprobe+", KPROBE, 0, SEC_NONE, attach_uprobe),
+ SEC_DEF("uprobe.s+", KPROBE, 0, SEC_SLEEPABLE, attach_uprobe),
SEC_DEF("kretprobe+", KPROBE, 0, SEC_NONE, attach_kprobe),
SEC_DEF("uretprobe+", KPROBE, 0, SEC_NONE, attach_uprobe),
+ SEC_DEF("uretprobe.s+", KPROBE, 0, SEC_SLEEPABLE, attach_uprobe),
SEC_DEF("kprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
SEC_DEF("kretprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
SEC_DEF("usdt+", KPROBE, 0, SEC_NONE, attach_usdt),
@@ -10697,6 +10699,7 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf
else
func_name = prog->sec_name + sizeof("kprobe/") - 1;
+
n = sscanf(func_name, "%m[a-zA-Z0-9_.]+%li", &func, &offset);
if (n < 1) {
pr_warn("kprobe name is invalid: %s\n", func_name);
@@ -11222,7 +11225,8 @@ static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf
break;
case 3:
case 4:
- opts.retprobe = strcmp(probe_type, "uretprobe") == 0;
+ opts.retprobe = strcmp(probe_type, "uretprobe") == 0 ||
+ strcmp(probe_type, "uretprobe.s") == 0;
if (opts.retprobe && offset != 0) {
pr_warn("prog '%s': uretprobes do not support offset specification\n",
prog->name);
--
2.35.1
* [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu
2022-05-02 23:09 [PATCH bpf-next v2 0/5] sleepable uprobe support Delyan Kratunov
` (2 preceding siblings ...)
2022-05-02 23:09 ` [PATCH bpf-next v2 4/5] libbpf: add support for sleepable kprobe and uprobe programs Delyan Kratunov
@ 2022-05-02 23:09 ` Delyan Kratunov
2022-05-03 6:30 ` kernel test robot
` (2 more replies)
2022-05-02 23:09 ` [PATCH bpf-next v2 1/5] bpf: move bpf_prog to bpf.h Delyan Kratunov
4 siblings, 3 replies; 17+ messages in thread
From: Delyan Kratunov @ 2022-05-02 23:09 UTC (permalink / raw)
To: daniel, ast, andrii, bpf
uprobes work by raising a trap, setting a task flag from within the
interrupt handler, and processing the actual work for the uprobe on the
way back to userspace. As a result, uprobe handlers already execute in a
user context. The primary obstacle to sleepable bpf uprobe programs is
therefore on the bpf side.
Namely, the bpf_prog_array attached to the uprobe is protected by normal
rcu and runs with disabled preemption. In order for uprobe bpf programs
to become actually sleepable, we need it to be protected by the tasks_trace
rcu flavor instead (and kfree() called after a corresponding grace period).
Based on Alexei's proposal, we change the free path for bpf_prog_array to
chain tasks_trace and normal grace periods one after the other.
Users who iterate under a tasks_trace read section are safe, as are users
who iterate under normal read sections (from non-sleepable locations).
The downside is that we incur the tasks_trace latency for all
perf_event-attached bpf programs (not just uprobe ones), but this is
deemed acceptable given the possible attach rates for
kprobe/uprobe/tp programs.
Separately, non-sleepable programs need access to dynamically sized
rcu-protected maps, so we conditionally disable preemption and take an rcu
read section around them, in addition to the overarching tasks_trace section.
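The per-program bracketing described above can be modeled in userspace C. This is only a sketch of the control flow in the new bpf_prog_run_array_sleepable(): counters stand in for the real rcu/preemption primitives, and all names are illustrative.

```c
/* Model: the whole walk sits under a (modeled) tasks_trace read section;
 * only non-sleepable entries additionally take the normal rcu read lock
 * with preemption disabled, so they can use rcu-protected maps. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct prog { bool sleepable; };

static int rcu_depth, preempt_off, trace_depth;
static int ran_with_rcu;	/* programs that ran under the normal rcu lock */

static unsigned int run_prog(const struct prog *p)
{
	(void)p;
	if (rcu_depth > 0)
		ran_with_rcu++;
	return 1;
}

static unsigned int run_array_sleepable(const struct prog *progs, size_t n)
{
	unsigned int ret = 1;

	trace_depth++;				/* rcu_read_lock_trace() */
	for (size_t i = 0; i < n; i++) {
		bool nonsleepable = !progs[i].sleepable;

		if (nonsleepable) {		/* preempt_disable(); rcu_read_lock() */
			preempt_off++;
			rcu_depth++;
		}
		ret &= run_prog(&progs[i]);
		if (nonsleepable) {		/* rcu_read_unlock(); preempt_enable() */
			rcu_depth--;
			preempt_off--;
		}
	}
	trace_depth--;				/* rcu_read_unlock_trace() */
	return ret;
}
```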
Signed-off-by: Delyan Kratunov <delyank@fb.com>
---
include/linux/bpf.h | 57 ++++++++++++++++++++++++++++++++++++
include/linux/trace_events.h | 1 +
kernel/bpf/core.c | 15 ++++++++++
kernel/trace/bpf_trace.c | 27 +++++++++++++++--
kernel/trace/trace_uprobe.c | 4 +--
5 files changed, 99 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 57ec619cf729..592886115011 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -26,6 +26,7 @@
#include <linux/stddef.h>
#include <linux/bpfptr.h>
#include <linux/btf.h>
+#include <linux/rcupdate_trace.h>
struct bpf_verifier_env;
struct bpf_verifier_log;
@@ -1343,6 +1344,8 @@ extern struct bpf_empty_prog_array bpf_empty_prog_array;
struct bpf_prog_array *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags);
void bpf_prog_array_free(struct bpf_prog_array *progs);
+/* Use when traversal over the bpf_prog_array uses tasks_trace rcu */
+void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs);
int bpf_prog_array_length(struct bpf_prog_array *progs);
bool bpf_prog_array_is_empty(struct bpf_prog_array *array);
int bpf_prog_array_copy_to_user(struct bpf_prog_array *progs,
@@ -1428,6 +1431,60 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
return ret;
}
+/**
+ * Notes on RCU design for bpf_prog_arrays containing sleepable programs:
+ *
+ * We use the tasks_trace rcu flavor read section to protect the bpf_prog_array
+ * overall. As a result, we must use the bpf_prog_array_free_sleepable
+ * in order to use the tasks_trace rcu grace period.
+ *
+ * When a non-sleepable program is inside the array, we take the rcu read
+ * section and disable preemption for that program alone, so it can access
+ * rcu-protected dynamically sized maps.
+ */
+static __always_inline u32
+bpf_prog_run_array_sleepable(const struct bpf_prog_array __rcu *array_rcu,
+ const void *ctx, bpf_prog_run_fn run_prog)
+{
+ const struct bpf_prog_array_item *item;
+ const struct bpf_prog *prog;
+ const struct bpf_prog_array *array;
+ struct bpf_run_ctx *old_run_ctx;
+ struct bpf_trace_run_ctx run_ctx;
+ u32 ret = 1;
+
+ might_fault();
+
+ migrate_disable();
+ rcu_read_lock_trace();
+
+ array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
+ if (unlikely(!array))
+ goto out;
+ old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+ item = &array->items[0];
+ while ((prog = READ_ONCE(item->prog))) {
+ if (!prog->aux->sleepable) {
+ preempt_disable();
+ rcu_read_lock();
+ }
+
+ run_ctx.bpf_cookie = item->bpf_cookie;
+ ret &= run_prog(prog, ctx);
+ item++;
+
+ if (!prog->aux->sleepable) {
+ rcu_read_unlock();
+ preempt_enable();
+ }
+ }
+ bpf_reset_run_ctx(old_run_ctx);
+out:
+ rcu_read_unlock_trace();
+ migrate_enable();
+ return ret;
+}
+
#ifdef CONFIG_BPF_SYSCALL
DECLARE_PER_CPU(int, bpf_prog_active);
extern struct mutex bpf_stats_enabled_mutex;
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index e6e95a9f07a5..d45889f1210d 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -736,6 +736,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
#ifdef CONFIG_BPF_EVENTS
unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
+unsigned int uprobe_call_bpf(struct trace_event_call *call, void *ctx);
int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
void perf_event_detach_bpf_prog(struct perf_event *event);
int perf_event_query_prog_array(struct perf_event *event, void __user *info);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 13e9dbeeedf3..9271b708807a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2268,6 +2268,21 @@ void bpf_prog_array_free(struct bpf_prog_array *progs)
kfree_rcu(progs, rcu);
}
+static void __bpf_prog_array_free_sleepable_cb(struct rcu_head *rcu)
+{
+ struct bpf_prog_array *progs;
+
+ progs = container_of(rcu, struct bpf_prog_array, rcu);
+ kfree_rcu(progs, rcu);
+}
+
+void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs)
+{
+ if (!progs || progs == &bpf_empty_prog_array.hdr)
+ return;
+ call_rcu_tasks_trace(&progs->rcu, __bpf_prog_array_free_sleepable_cb);
+}
+
int bpf_prog_array_length(struct bpf_prog_array *array)
{
struct bpf_prog_array_item *item;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f15b826f9899..582a6171e096 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -140,6 +140,29 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
return ret;
}
+unsigned int uprobe_call_bpf(struct trace_event_call *call, void *ctx)
+{
+ unsigned int ret;
+
+ /*
+ * Instead of moving rcu_read_lock/rcu_dereference/rcu_read_unlock
+ * to all call sites, we did a bpf_prog_array_valid() there to check
+ * whether call->prog_array is empty or not, which is
+ * a heuristic to speed up execution.
+ *
+ * If bpf_prog_array_valid() fetched prog_array was
+ * non-NULL, we go into uprobe_call_bpf() and do the actual
+ * proper rcu_dereference() under RCU trace lock.
+ * If it turns out that prog_array is NULL then, we bail out.
+ * For the opposite, if the bpf_prog_array_valid() fetched pointer
+ * was NULL, you'll skip the prog_array with the risk of missing
+ * out on events when it was updated in between this and the
+ * rcu_dereference() which is accepted risk.
+ */
+ ret = bpf_prog_run_array_sleepable(call->prog_array, ctx, bpf_prog_run);
+ return ret;
+}
+
#ifdef CONFIG_BPF_KPROBE_OVERRIDE
BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
{
@@ -1915,7 +1938,7 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
event->prog = prog;
event->bpf_cookie = bpf_cookie;
rcu_assign_pointer(event->tp_event->prog_array, new_array);
- bpf_prog_array_free(old_array);
+ bpf_prog_array_free_sleepable(old_array);
unlock:
mutex_unlock(&bpf_event_mutex);
@@ -1941,7 +1964,7 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
bpf_prog_array_delete_safe(old_array, event->prog);
} else {
rcu_assign_pointer(event->tp_event->prog_array, new_array);
- bpf_prog_array_free(old_array);
+ bpf_prog_array_free_sleepable(old_array);
}
bpf_prog_put(event->prog);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 9711589273cd..3eb48897d15b 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1346,9 +1346,7 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
if (bpf_prog_array_valid(call)) {
u32 ret;
- preempt_disable();
- ret = trace_call_bpf(call, regs);
- preempt_enable();
+ ret = uprobe_call_bpf(call, regs);
if (!ret)
return;
}
--
2.35.1
* [PATCH bpf-next v2 1/5] bpf: move bpf_prog to bpf.h
2022-05-02 23:09 [PATCH bpf-next v2 0/5] sleepable uprobe support Delyan Kratunov
` (3 preceding siblings ...)
2022-05-02 23:09 ` [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu Delyan Kratunov
@ 2022-05-02 23:09 ` Delyan Kratunov
2022-05-10 3:04 ` Alexei Starovoitov
4 siblings, 1 reply; 17+ messages in thread
From: Delyan Kratunov @ 2022-05-02 23:09 UTC (permalink / raw)
To: daniel, ast, andrii, bpf
In order to add a version of bpf_prog_run_array which accesses the
bpf_prog->aux member, we need bpf_prog to be more than a forward
declaration inside bpf.h.
Given that filter.h already includes bpf.h, this merely reorders
the type declarations for filter.h users. bpf.h users now have access to
bpf_prog internals.
Signed-off-by: Delyan Kratunov <delyank@fb.com>
---
include/linux/bpf.h | 36 ++++++++++++++++++++++++++++++++++++
include/linux/filter.h | 34 ----------------------------------
2 files changed, 36 insertions(+), 34 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index be94833d390a..57ec619cf729 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -5,6 +5,7 @@
#define _LINUX_BPF_H 1
#include <uapi/linux/bpf.h>
+#include <uapi/linux/filter.h>
#include <linux/workqueue.h>
#include <linux/file.h>
@@ -22,6 +23,7 @@
#include <linux/sched/mm.h>
#include <linux/slab.h>
#include <linux/percpu-refcount.h>
+#include <linux/stddef.h>
#include <linux/bpfptr.h>
#include <linux/btf.h>
@@ -1068,6 +1070,40 @@ struct bpf_prog_aux {
};
};
+struct bpf_prog {
+ u16 pages; /* Number of allocated pages */
+ u16 jited:1, /* Is our filter JIT'ed? */
+ jit_requested:1,/* archs need to JIT the prog */
+ gpl_compatible:1, /* Is filter GPL compatible? */
+ cb_access:1, /* Is control block accessed? */
+ dst_needed:1, /* Do we need dst entry? */
+ blinding_requested:1, /* needs constant blinding */
+ blinded:1, /* Was blinded */
+ is_func:1, /* program is a bpf function */
+ kprobe_override:1, /* Do we override a kprobe? */
+ has_callchain_buf:1, /* callchain buffer allocated? */
+ enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
+ call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
+ call_get_func_ip:1, /* Do we call get_func_ip() */
+ tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
+ enum bpf_prog_type type; /* Type of BPF program */
+ enum bpf_attach_type expected_attach_type; /* For some prog types */
+ u32 len; /* Number of filter blocks */
+ u32 jited_len; /* Size of jited insns in bytes */
+ u8 tag[BPF_TAG_SIZE];
+ struct bpf_prog_stats __percpu *stats;
+ int __percpu *active;
+ unsigned int (*bpf_func)(const void *ctx,
+ const struct bpf_insn *insn);
+ struct bpf_prog_aux *aux; /* Auxiliary fields */
+ struct sock_fprog_kern *orig_prog; /* Original BPF program */
+ /* Instructions for interpreter */
+ union {
+ DECLARE_FLEX_ARRAY(struct sock_filter, insns);
+ DECLARE_FLEX_ARRAY(struct bpf_insn, insnsi);
+ };
+};
+
struct bpf_array_aux {
/* Programs with direct jumps into programs part of this array. */
struct list_head poke_progs;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index ed0c0ff42ad5..d0cbb31b1b4d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -559,40 +559,6 @@ struct bpf_prog_stats {
struct u64_stats_sync syncp;
} __aligned(2 * sizeof(u64));
-struct bpf_prog {
- u16 pages; /* Number of allocated pages */
- u16 jited:1, /* Is our filter JIT'ed? */
- jit_requested:1,/* archs need to JIT the prog */
- gpl_compatible:1, /* Is filter GPL compatible? */
- cb_access:1, /* Is control block accessed? */
- dst_needed:1, /* Do we need dst entry? */
- blinding_requested:1, /* needs constant blinding */
- blinded:1, /* Was blinded */
- is_func:1, /* program is a bpf function */
- kprobe_override:1, /* Do we override a kprobe? */
- has_callchain_buf:1, /* callchain buffer allocated? */
- enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
- call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
- call_get_func_ip:1, /* Do we call get_func_ip() */
- tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
- enum bpf_prog_type type; /* Type of BPF program */
- enum bpf_attach_type expected_attach_type; /* For some prog types */
- u32 len; /* Number of filter blocks */
- u32 jited_len; /* Size of jited insns in bytes */
- u8 tag[BPF_TAG_SIZE];
- struct bpf_prog_stats __percpu *stats;
- int __percpu *active;
- unsigned int (*bpf_func)(const void *ctx,
- const struct bpf_insn *insn);
- struct bpf_prog_aux *aux; /* Auxiliary fields */
- struct sock_fprog_kern *orig_prog; /* Original BPF program */
- /* Instructions for interpreter */
- union {
- DECLARE_FLEX_ARRAY(struct sock_filter, insns);
- DECLARE_FLEX_ARRAY(struct bpf_insn, insnsi);
- };
-};
-
struct sk_filter {
refcount_t refcnt;
struct rcu_head rcu;
--
2.35.1
* Re: [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu
2022-05-02 23:09 ` [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu Delyan Kratunov
@ 2022-05-03 6:30 ` kernel test robot
2022-05-03 17:20 ` Delyan Kratunov
2022-05-03 8:48 ` kernel test robot
2022-05-10 3:02 ` Alexei Starovoitov
2 siblings, 1 reply; 17+ messages in thread
From: kernel test robot @ 2022-05-03 6:30 UTC (permalink / raw)
To: Delyan Kratunov, daniel, ast, andrii, bpf; +Cc: kbuild-all
Hi Delyan,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: i386-defconfig (https://download.01.org/0day-ci/archive/20220503/202205031441.1fhDuUQK-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce (this is a W=1 build):
# https://github.com/intel-lab-lkp/linux/commit/cfa0f114829902b579da16d7520a39317905c502
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
git checkout cfa0f114829902b579da16d7520a39317905c502
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
kernel/trace/trace_uprobe.c: In function '__uprobe_perf_func':
>> kernel/trace/trace_uprobe.c:1349:23: error: implicit declaration of function 'uprobe_call_bpf'; did you mean 'trace_call_bpf'? [-Werror=implicit-function-declaration]
1349 | ret = uprobe_call_bpf(call, regs);
| ^~~~~~~~~~~~~~~
| trace_call_bpf
cc1: some warnings being treated as errors
vim +1349 kernel/trace/trace_uprobe.c
1334
1335 static void __uprobe_perf_func(struct trace_uprobe *tu,
1336 unsigned long func, struct pt_regs *regs,
1337 struct uprobe_cpu_buffer *ucb, int dsize)
1338 {
1339 struct trace_event_call *call = trace_probe_event_call(&tu->tp);
1340 struct uprobe_trace_entry_head *entry;
1341 struct hlist_head *head;
1342 void *data;
1343 int size, esize;
1344 int rctx;
1345
1346 if (bpf_prog_array_valid(call)) {
1347 u32 ret;
1348
> 1349 ret = uprobe_call_bpf(call, regs);
1350 if (!ret)
1351 return;
1352 }
1353
1354 esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
1355
1356 size = esize + tu->tp.size + dsize;
1357 size = ALIGN(size + sizeof(u32), sizeof(u64)) - sizeof(u32);
1358 if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
1359 return;
1360
1361 preempt_disable();
1362 head = this_cpu_ptr(call->perf_events);
1363 if (hlist_empty(head))
1364 goto out;
1365
1366 entry = perf_trace_buf_alloc(size, NULL, &rctx);
1367 if (!entry)
1368 goto out;
1369
1370 if (is_ret_probe(tu)) {
1371 entry->vaddr[0] = func;
1372 entry->vaddr[1] = instruction_pointer(regs);
1373 data = DATAOF_TRACE_ENTRY(entry, true);
1374 } else {
1375 entry->vaddr[0] = instruction_pointer(regs);
1376 data = DATAOF_TRACE_ENTRY(entry, false);
1377 }
1378
1379 memcpy(data, ucb->buf, tu->tp.size + dsize);
1380
1381 if (size - esize > tu->tp.size + dsize) {
1382 int len = tu->tp.size + dsize;
1383
1384 memset(data + len, 0, size - esize - len);
1385 }
1386
1387 perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
1388 head, NULL);
1389 out:
1390 preempt_enable();
1391 }
1392
--
0-DAY CI Kernel Test Service
https://01.org/lkp
* Re: [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu
2022-05-02 23:09 ` [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu Delyan Kratunov
2022-05-03 6:30 ` kernel test robot
@ 2022-05-03 8:48 ` kernel test robot
2022-05-10 3:02 ` Alexei Starovoitov
2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2022-05-03 8:48 UTC (permalink / raw)
To: Delyan Kratunov, daniel, ast, andrii, bpf; +Cc: llvm, kbuild-all
Hi Delyan,
Thank you for the patch! Here is something to improve:
[auto build test ERROR on bpf-next/master]
url: https://github.com/intel-lab-lkp/linux/commits/Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: s390-randconfig-r003-20220501 (https://download.01.org/0day-ci/archive/20220503/202205031643.UGYirnXb-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 363b3a645a1e30011cc8da624f13dac5fd915628)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install s390 cross compiling tool for clang build
# apt-get install binutils-s390x-linux-gnu
# https://github.com/intel-lab-lkp/linux/commit/cfa0f114829902b579da16d7520a39317905c502
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
git checkout cfa0f114829902b579da16d7520a39317905c502
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash kernel/trace/
If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
In file included from kernel/trace/trace_uprobe.c:10:
In file included from include/linux/bpf-cgroup.h:11:
In file included from include/net/sock.h:46:
In file included from include/linux/netdevice.h:38:
In file included from include/net/net_namespace.h:40:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:75:
include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __raw_readb(PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:37:59: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
^
include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
#define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
^
In file included from kernel/trace/trace_uprobe.c:10:
In file included from include/linux/bpf-cgroup.h:11:
In file included from include/net/sock.h:46:
In file included from include/linux/netdevice.h:38:
In file included from include/net/net_namespace.h:40:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:75:
include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
~~~~~~~~~~ ^
include/uapi/linux/byteorder/big_endian.h:35:59: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
^
include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
#define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
^
In file included from kernel/trace/trace_uprobe.c:10:
In file included from include/linux/bpf-cgroup.h:11:
In file included from include/net/sock.h:46:
In file included from include/linux/netdevice.h:38:
In file included from include/net/net_namespace.h:40:
In file included from include/linux/skbuff.h:31:
In file included from include/linux/dma-mapping.h:10:
In file included from include/linux/scatterlist.h:9:
In file included from arch/s390/include/asm/io.h:75:
include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writeb(value, PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
__raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
~~~~~~~~~~ ^
include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
readsl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesb(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesw(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
writesl(PCI_IOBASE + addr, buffer, count);
~~~~~~~~~~ ^
>> kernel/trace/trace_uprobe.c:1349:9: error: call to undeclared function 'uprobe_call_bpf'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
ret = uprobe_call_bpf(call, regs);
^
kernel/trace/trace_uprobe.c:1349:9: note: did you mean 'trace_call_bpf'?
include/linux/trace_events.h:752:28: note: 'trace_call_bpf' declared here
static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
^
12 warnings and 1 error generated.
vim +/uprobe_call_bpf +1349 kernel/trace/trace_uprobe.c
1334
1335 static void __uprobe_perf_func(struct trace_uprobe *tu,
1336 unsigned long func, struct pt_regs *regs,
1337 struct uprobe_cpu_buffer *ucb, int dsize)
1338 {
1339 struct trace_event_call *call = trace_probe_event_call(&tu->tp);
1340 struct uprobe_trace_entry_head *entry;
1341 struct hlist_head *head;
1342 void *data;
1343 int size, esize;
1344 int rctx;
1345
1346 if (bpf_prog_array_valid(call)) {
1347 u32 ret;
1348
> 1349 ret = uprobe_call_bpf(call, regs);
1350 if (!ret)
1351 return;
1352 }
1353
1354 esize = SIZEOF_TRACE_ENTRY(is_ret_probe(tu));
1355
1356 size = esize + tu->tp.size + dsize;
1357 size = ALIGN(size + sizeof(u32), sizeof(u64)) - sizeof(u32);
1358 if (WARN_ONCE(size > PERF_MAX_TRACE_SIZE, "profile buffer not large enough"))
1359 return;
1360
1361 preempt_disable();
1362 head = this_cpu_ptr(call->perf_events);
1363 if (hlist_empty(head))
1364 goto out;
1365
1366 entry = perf_trace_buf_alloc(size, NULL, &rctx);
1367 if (!entry)
1368 goto out;
1369
1370 if (is_ret_probe(tu)) {
1371 entry->vaddr[0] = func;
1372 entry->vaddr[1] = instruction_pointer(regs);
1373 data = DATAOF_TRACE_ENTRY(entry, true);
1374 } else {
1375 entry->vaddr[0] = instruction_pointer(regs);
1376 data = DATAOF_TRACE_ENTRY(entry, false);
1377 }
1378
1379 memcpy(data, ucb->buf, tu->tp.size + dsize);
1380
1381 if (size - esize > tu->tp.size + dsize) {
1382 int len = tu->tp.size + dsize;
1383
1384 memset(data + len, 0, size - esize - len);
1385 }
1386
1387 perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
1388 head, NULL);
1389 out:
1390 preempt_enable();
1391 }
1392
--
0-DAY CI Kernel Test Service
https://01.org/lkp
* Re: [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu
2022-05-03 6:30 ` kernel test robot
@ 2022-05-03 17:20 ` Delyan Kratunov
2022-05-09 23:58 ` Andrii Nakryiko
2022-05-10 2:54 ` Alexei Starovoitov
0 siblings, 2 replies; 17+ messages in thread
From: Delyan Kratunov @ 2022-05-03 17:20 UTC (permalink / raw)
To: daniel, ast, andrii, lkp, bpf; +Cc: kbuild-all
On Tue, 2022-05-03 at 14:30 +0800, kernel test robot wrote:
> Hi Delyan,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on bpf-next/master]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
> base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> config: i386-defconfig (https://download.01.org/0day-ci/archive/20220503/202205031441.1fhDuUQK-lkp@intel.com/config )
> compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
> reproduce (this is a W=1 build):
> # https://github.com/intel-lab-lkp/linux/commit/cfa0f114829902b579da16d7520a39317905c502
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
> git checkout cfa0f114829902b579da16d7520a39317905c502
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
> kernel/trace/trace_uprobe.c: In function '__uprobe_perf_func':
> > > kernel/trace/trace_uprobe.c:1349:23: error: implicit declaration of function 'uprobe_call_bpf'; did you mean 'trace_call_bpf'? [-Werror=implicit-function-declaration]
> 1349 | ret = uprobe_call_bpf(call, regs);
> | ^~~~~~~~~~~~~~~
> | trace_call_bpf
> cc1: some warnings being treated as errors
Hm, CONFIG_BPF_EVENTS doesn't seem to guard the callsite in trace_uprobe.c; it's
only gated by CONFIG_PERF_EVENTS there. A PERF_EVENTS=y && BPF_EVENTS=n config would
lead to this error.
This is a pre-existing issue and I'll send a separate patch for it.
* Re: [PATCH bpf-next v2 3/5] bpf: allow sleepable uprobe programs to attach
2022-05-02 23:09 ` [PATCH bpf-next v2 3/5] bpf: allow sleepable uprobe programs to attach Delyan Kratunov
@ 2022-05-09 23:52 ` Andrii Nakryiko
0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2022-05-09 23:52 UTC (permalink / raw)
To: Delyan Kratunov; +Cc: daniel, ast, andrii, bpf
On Mon, May 2, 2022 at 4:09 PM Delyan Kratunov <delyank@fb.com> wrote:
>
> uprobe and kprobe programs have the same program type, KPROBE, which is
> currently not allowed to load sleepable programs.
>
> To avoid adding a new UPROBE type, we instead allow sleepable KPROBE
> programs to load and defer the is-it-actually-a-uprobe-program check
> to attachment time, where we're already validating the corresponding
> perf_event.
>
> A corollary of this patch is that you can now load a sleepable kprobe
> program but cannot attach it.
>
> Signed-off-by: Delyan Kratunov <delyank@fb.com>
> ---
LGTM.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
> kernel/bpf/verifier.c | 4 ++--
> kernel/events/core.c | 16 ++++++++++------
> 2 files changed, 12 insertions(+), 8 deletions(-)
>
[...]
* Re: [PATCH bpf-next v2 4/5] libbpf: add support for sleepable kprobe and uprobe programs
2022-05-02 23:09 ` [PATCH bpf-next v2 4/5] libbpf: add support for sleepable kprobe and uprobe programs Delyan Kratunov
@ 2022-05-09 23:53 ` Andrii Nakryiko
0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2022-05-09 23:53 UTC (permalink / raw)
To: Delyan Kratunov; +Cc: daniel, ast, andrii, bpf
On Mon, May 2, 2022 at 4:09 PM Delyan Kratunov <delyank@fb.com> wrote:
>
> Add section mappings for uprobe.s and kprobe.s programs. The latter
> cannot currently attach but they're still useful to open and load in
> order to validate that prohibition.
>
> Signed-off-by: Delyan Kratunov <delyank@fb.com>
> ---
> tools/lib/bpf/libbpf.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
One nit below, otherwise LGTM
Acked-by: Andrii Nakryiko <andrii@kernel.org>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 63c0f412266c..d89529c9b52d 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -8945,8 +8945,10 @@ static const struct bpf_sec_def section_defs[] = {
> SEC_DEF("sk_reuseport", SK_REUSEPORT, BPF_SK_REUSEPORT_SELECT, SEC_ATTACHABLE | SEC_SLOPPY_PFX),
> SEC_DEF("kprobe+", KPROBE, 0, SEC_NONE, attach_kprobe),
> SEC_DEF("uprobe+", KPROBE, 0, SEC_NONE, attach_uprobe),
> + SEC_DEF("uprobe.s+", KPROBE, 0, SEC_SLEEPABLE, attach_uprobe),
> SEC_DEF("kretprobe+", KPROBE, 0, SEC_NONE, attach_kprobe),
> SEC_DEF("uretprobe+", KPROBE, 0, SEC_NONE, attach_uprobe),
> + SEC_DEF("uretprobe.s+", KPROBE, 0, SEC_SLEEPABLE, attach_uprobe),
> SEC_DEF("kprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
> SEC_DEF("kretprobe.multi+", KPROBE, BPF_TRACE_KPROBE_MULTI, SEC_NONE, attach_kprobe_multi),
> SEC_DEF("usdt+", KPROBE, 0, SEC_NONE, attach_usdt),
> @@ -10697,6 +10699,7 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf
> else
> func_name = prog->sec_name + sizeof("kprobe/") - 1;
>
> +
unnecessary empty line?
> n = sscanf(func_name, "%m[a-zA-Z0-9_.]+%li", &func, &offset);
> if (n < 1) {
> pr_warn("kprobe name is invalid: %s\n", func_name);
> @@ -11222,7 +11225,8 @@ static int attach_uprobe(const struct bpf_program *prog, long cookie, struct bpf
> break;
> case 3:
> case 4:
> - opts.retprobe = strcmp(probe_type, "uretprobe") == 0;
> + opts.retprobe = strcmp(probe_type, "uretprobe") == 0 ||
> + strcmp(probe_type, "uretprobe.s") == 0;
> if (opts.retprobe && offset != 0) {
> pr_warn("prog '%s': uretprobes do not support offset specification\n",
> prog->name);
> --
> 2.35.1
* Re: [PATCH bpf-next v2 5/5] selftests/bpf: add tests for sleepable kprobes and uprobes
2022-05-02 23:09 ` [PATCH bpf-next v2 5/5] selftests/bpf: add tests for sleepable kprobes and uprobes Delyan Kratunov
@ 2022-05-09 23:57 ` Andrii Nakryiko
0 siblings, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2022-05-09 23:57 UTC (permalink / raw)
To: Delyan Kratunov; +Cc: daniel, ast, andrii, bpf
On Mon, May 2, 2022 at 4:09 PM Delyan Kratunov <delyank@fb.com> wrote:
>
> Add tests that ensure sleepable kprobe programs cannot attach.
>
> Also attach both sleepable and non-sleepable uprobe programs to the same
> location (i.e. same bpf_prog_array).
>
> Signed-off-by: Delyan Kratunov <delyank@fb.com>
> ---
> .../selftests/bpf/prog_tests/attach_probe.c | 51 +++++++++++++++-
> .../selftests/bpf/progs/test_attach_probe.c | 58 +++++++++++++++++++
> 2 files changed, 108 insertions(+), 1 deletion(-)
>
LGTM, suggestion below is just about making flag setting less verbose,
but it's minor.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
> diff --git a/tools/testing/selftests/bpf/prog_tests/attach_probe.c b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> index 08c0601b3e84..cddb17ab0588 100644
> --- a/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> +++ b/tools/testing/selftests/bpf/prog_tests/attach_probe.c
> @@ -17,6 +17,14 @@ static void trigger_func2(void)
> asm volatile ("");
> }
>
> +/* attach point for byname sleepable uprobe */
> +static void trigger_func3(void)
> +{
> + asm volatile ("");
> +}
> +
> +static char test_data[] = "test_data";
> +
> void test_attach_probe(void)
> {
> DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts);
> @@ -27,6 +35,7 @@ void test_attach_probe(void)
> struct bpf_link *uprobe_err_link;
> bool legacy;
> char *mem;
> + int kprobe_s_flags;
>
> /* Check if new-style kprobe/uprobe API is supported.
> * Kernels that support new FD-based kprobe and uprobe BPF attachment
> @@ -49,9 +58,18 @@ void test_attach_probe(void)
> if (!ASSERT_GE(ref_ctr_offset, 0, "ref_ctr_offset"))
> return;
>
> - skel = test_attach_probe__open_and_load();
> + skel = test_attach_probe__open();
> if (!ASSERT_OK_PTR(skel, "skel_open"))
> return;
> +
> + /* sleepable kprobe test case needs flags set before loading */
> + kprobe_s_flags = bpf_program__flags(skel->progs.handle_kprobe_sleepable);
> + if (!ASSERT_OK(bpf_program__set_flags(skel->progs.handle_kprobe_sleepable,
> + kprobe_s_flags | BPF_F_SLEEPABLE), "kprobe_sleepable_flags"))
> + goto cleanup;
These checks and the extra fetch of the flags feel unnecessary. We
control all this, so just doing
bpf_program__set_flags(skel->progs.handle_kprobe_sleepable, BPF_F_SLEEPABLE);
seems totally justified and reliable.
> +
> + if (!ASSERT_OK(test_attach_probe__load(skel), "skel_load"))
> + goto cleanup;
> if (!ASSERT_OK_PTR(skel->bss, "check_bss"))
> goto cleanup;
>
[...]
* Re: [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu
2022-05-03 17:20 ` Delyan Kratunov
@ 2022-05-09 23:58 ` Andrii Nakryiko
2022-05-10 2:54 ` Alexei Starovoitov
1 sibling, 0 replies; 17+ messages in thread
From: Andrii Nakryiko @ 2022-05-09 23:58 UTC (permalink / raw)
To: Delyan Kratunov; +Cc: daniel, ast, andrii, lkp, bpf, kbuild-all
On Tue, May 3, 2022 at 10:20 AM Delyan Kratunov <delyank@fb.com> wrote:
>
> On Tue, 2022-05-03 at 14:30 +0800, kernel test robot wrote:
> > Hi Delyan,
> >
> > Thank you for the patch! Yet something to improve:
> >
> > [auto build test ERROR on bpf-next/master]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> > config: i386-defconfig (https://download.01.org/0day-ci/archive/20220503/202205031441.1fhDuUQK-lkp@intel.com/config )
> > compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
> > reproduce (this is a W=1 build):
> > # https://github.com/intel-lab-lkp/linux/commit/cfa0f114829902b579da16d7520a39317905c502
> > git remote add linux-review https://github.com/intel-lab-lkp/linux
> > git fetch --no-tags linux-review Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
> > git checkout cfa0f114829902b579da16d7520a39317905c502
> > # save the config file
> > mkdir build_dir && cp config build_dir/.config
> > make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp@intel.com>
> >
> > All errors (new ones prefixed by >>):
> >
> > kernel/trace/trace_uprobe.c: In function '__uprobe_perf_func':
> > > > kernel/trace/trace_uprobe.c:1349:23: error: implicit declaration of function 'uprobe_call_bpf'; did you mean 'trace_call_bpf'? [-Werror=implicit-function-declaration]
> > 1349 | ret = uprobe_call_bpf(call, regs);
> > | ^~~~~~~~~~~~~~~
> > | trace_call_bpf
> > cc1: some warnings being treated as errors
>
> Hm, CONFIG_BPF_EVENTS doesn't seem to guard the callsite from trace_uprobe.c, it's
> only gated by CONFIG_PERF_EVENTS there. A PERF_EVENTS=y && BPF_EVENTS=n config would
> lead to this error.
>
> This is a pre-existing issue and I'll send a separate patch for it.
It would probably make sense to include it as a pre-patch for your
series, so that bpf-next doesn't trigger these warnings?
* Re: [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu
2022-05-03 17:20 ` Delyan Kratunov
2022-05-09 23:58 ` Andrii Nakryiko
@ 2022-05-10 2:54 ` Alexei Starovoitov
1 sibling, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2022-05-10 2:54 UTC (permalink / raw)
To: Delyan Kratunov; +Cc: daniel, ast, andrii, lkp, bpf, kbuild-all
On Tue, May 03, 2022 at 05:20:26PM +0000, Delyan Kratunov wrote:
> On Tue, 2022-05-03 at 14:30 +0800, kernel test robot wrote:
> > Hi Delyan,
> >
> > Thank you for the patch! Yet something to improve:
> >
> > [auto build test ERROR on bpf-next/master]
> >
> > url: https://github.com/intel-lab-lkp/linux/commits/Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
> > base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
> > config: i386-defconfig (https://download.01.org/0day-ci/archive/20220503/202205031441.1fhDuUQK-lkp@intel.com/config )
> > compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
> > reproduce (this is a W=1 build):
> > # https://github.com/intel-lab-lkp/linux/commit/cfa0f114829902b579da16d7520a39317905c502
> > git remote add linux-review https://github.com/intel-lab-lkp/linux
> > git fetch --no-tags linux-review Delyan-Kratunov/sleepable-uprobe-support/20220503-071247
> > git checkout cfa0f114829902b579da16d7520a39317905c502
> > # save the config file
> > mkdir build_dir && cp config build_dir/.config
> > make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
> >
> > If you fix the issue, kindly add following tag as appropriate
> > Reported-by: kernel test robot <lkp@intel.com>
> >
> > All errors (new ones prefixed by >>):
> >
> > kernel/trace/trace_uprobe.c: In function '__uprobe_perf_func':
> > > > kernel/trace/trace_uprobe.c:1349:23: error: implicit declaration of function 'uprobe_call_bpf'; did you mean 'trace_call_bpf'? [-Werror=implicit-function-declaration]
> > 1349 | ret = uprobe_call_bpf(call, regs);
> > | ^~~~~~~~~~~~~~~
> > | trace_call_bpf
> > cc1: some warnings being treated as errors
>
> Hm, CONFIG_BPF_EVENTS doesn't seem to guard the callsite from trace_uprobe.c, it's
> only gated by CONFIG_PERF_EVENTS there. A PERF_EVENTS=y && BPF_EVENTS=n config would
> lead to this error.
>
> This is a pre-existing issue and I'll send a separate patch for it.
Maybe move uprobe_call_bpf into trace_uprobe.c so it gets a chance of being
fully inlined, which would also avoid this issue.
* Re: [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu
2022-05-02 23:09 ` [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu Delyan Kratunov
2022-05-03 6:30 ` kernel test robot
2022-05-03 8:48 ` kernel test robot
@ 2022-05-10 3:02 ` Alexei Starovoitov
2 siblings, 0 replies; 17+ messages in thread
From: Alexei Starovoitov @ 2022-05-10 3:02 UTC (permalink / raw)
To: Delyan Kratunov; +Cc: daniel, ast, andrii, bpf
On Mon, May 02, 2022 at 11:09:37PM +0000, Delyan Kratunov wrote:
> uprobes work by raising a trap, setting a task flag from within the
> interrupt handler, and processing the actual work for the uprobe on the
> way back to userspace. As a result, uprobe handlers already execute in a
> user context. The primary obstacle to sleepable bpf uprobe programs is
> therefore on the bpf side.
>
> Namely, the bpf_prog_array attached to the uprobe is protected by normal
> rcu and runs with disabled preemption.
Disabled preemption was an artifact of the past.
It became unnecessary when migrate_disable was introduced.
Looks like we forgot to update this spot.
> In order for uprobe bpf programs
> to become actually sleepable, we need it to be protected by the tasks_trace
> rcu flavor instead (and kfree() called after a corresponding grace period).
>
> Based on Alexei's proposal, we change the free path for bpf_prog_array to
> chain a tasks_trace and normal grace periods one after the other.
>
> Users who iterate under tasks_trace read section would
> be safe, as would users who iterate under normal read sections (from
> non-sleepable locations). The downside is that we take the tasks_trace latency
> for all perf_event-attached bpf programs (and not just uprobe ones)
> but this is deemed safe given the possible attach rates for
> kprobe/uprobe/tp programs.
>
> Separately, non-sleepable programs need access to dynamically sized
> rcu-protected maps, so we conditionally disable preemption and take an rcu
> read section around them, in addition to the overarching tasks_trace section.
>
> Signed-off-by: Delyan Kratunov <delyank@fb.com>
> ---
> include/linux/bpf.h | 57 ++++++++++++++++++++++++++++++++++++
> include/linux/trace_events.h | 1 +
> kernel/bpf/core.c | 15 ++++++++++
> kernel/trace/bpf_trace.c | 27 +++++++++++++++--
> kernel/trace/trace_uprobe.c | 4 +--
> 5 files changed, 99 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 57ec619cf729..592886115011 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -26,6 +26,7 @@
> #include <linux/stddef.h>
> #include <linux/bpfptr.h>
> #include <linux/btf.h>
> +#include <linux/rcupdate_trace.h>
>
> struct bpf_verifier_env;
> struct bpf_verifier_log;
> @@ -1343,6 +1344,8 @@ extern struct bpf_empty_prog_array bpf_empty_prog_array;
>
> struct bpf_prog_array *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags);
> void bpf_prog_array_free(struct bpf_prog_array *progs);
> +/* Use when traversal over the bpf_prog_array uses tasks_trace rcu */
> +void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs);
> int bpf_prog_array_length(struct bpf_prog_array *progs);
> bool bpf_prog_array_is_empty(struct bpf_prog_array *array);
> int bpf_prog_array_copy_to_user(struct bpf_prog_array *progs,
> @@ -1428,6 +1431,60 @@ bpf_prog_run_array(const struct bpf_prog_array *array,
> return ret;
> }
>
> +/**
> + * Notes on RCU design for bpf_prog_arrays containing sleepable programs:
> + *
> + * We use the tasks_trace rcu flavor read section to protect the bpf_prog_array
> + * overall. As a result, we must use the bpf_prog_array_free_sleepable
> + * in order to use the tasks_trace rcu grace period.
> + *
> + * When a non-sleepable program is inside the array, we take the rcu read
> + * section and disable preemption for that program alone, so it can access
> + * rcu-protected dynamically sized maps.
> + */
> +static __always_inline u32
> +bpf_prog_run_array_sleepable(const struct bpf_prog_array __rcu *array_rcu,
> + const void *ctx, bpf_prog_run_fn run_prog)
> +{
> + const struct bpf_prog_array_item *item;
> + const struct bpf_prog *prog;
> + const struct bpf_prog_array *array;
> + struct bpf_run_ctx *old_run_ctx;
> + struct bpf_trace_run_ctx run_ctx;
> + u32 ret = 1;
> +
> + might_fault();
> +
> + migrate_disable();
> + rcu_read_lock_trace();
Please swap this order to make it the same as the bpf trampoline.
> +
> + array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held());
> + if (unlikely(!array))
> + goto out;
> + old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> + item = &array->items[0];
> + while ((prog = READ_ONCE(item->prog))) {
> + if (!prog->aux->sleepable) {
> + preempt_disable();
just remove this line.
> + rcu_read_lock();
In configs where rcu_read_lock/unlock is a nop, this whole 'if' and
the one below will be optimized out by the compiler,
which is nice.
> + }
> +
> + run_ctx.bpf_cookie = item->bpf_cookie;
> + ret &= run_prog(prog, ctx);
> + item++;
> +
> + if (!prog->aux->sleepable) {
> + rcu_read_unlock();
> + preempt_enable();
> + }
> + }
> + bpf_reset_run_ctx(old_run_ctx);
> +out:
> + rcu_read_unlock_trace();
> + migrate_enable();
> + return ret;
> +}
> +
> #ifdef CONFIG_BPF_SYSCALL
> DECLARE_PER_CPU(int, bpf_prog_active);
> extern struct mutex bpf_stats_enabled_mutex;
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index e6e95a9f07a5..d45889f1210d 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -736,6 +736,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
>
> #ifdef CONFIG_BPF_EVENTS
> unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
> +unsigned int uprobe_call_bpf(struct trace_event_call *call, void *ctx);
> int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
> void perf_event_detach_bpf_prog(struct perf_event *event);
> int perf_event_query_prog_array(struct perf_event *event, void __user *info);
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 13e9dbeeedf3..9271b708807a 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2268,6 +2268,21 @@ void bpf_prog_array_free(struct bpf_prog_array *progs)
> kfree_rcu(progs, rcu);
> }
>
> +static void __bpf_prog_array_free_sleepable_cb(struct rcu_head *rcu)
> +{
> + struct bpf_prog_array *progs;
> +
> + progs = container_of(rcu, struct bpf_prog_array, rcu);
> + kfree_rcu(progs, rcu);
> +}
> +
> +void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs)
> +{
> + if (!progs || progs == &bpf_empty_prog_array.hdr)
> + return;
> + call_rcu_tasks_trace(&progs->rcu, __bpf_prog_array_free_sleepable_cb);
> +}
> +
> int bpf_prog_array_length(struct bpf_prog_array *array)
> {
> struct bpf_prog_array_item *item;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f15b826f9899..582a6171e096 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -140,6 +140,29 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
> return ret;
> }
>
> +unsigned int uprobe_call_bpf(struct trace_event_call *call, void *ctx)
> +{
> + unsigned int ret;
> +
> +	/*
> +	 * Instead of moving rcu_read_lock/rcu_dereference/rcu_read_unlock
> +	 * to all call sites, we do a bpf_prog_array_valid() check there to
> +	 * see whether call->prog_array is empty or not, which is
> +	 * a heuristic to speed up execution.
> +	 *
> +	 * If the prog_array fetched by bpf_prog_array_valid() was
> +	 * non-NULL, we go into uprobe_call_bpf() and do the actual
> +	 * proper rcu_dereference() under the RCU trace lock.
> +	 * If it turns out that prog_array is NULL there, we bail out.
> +	 * Conversely, if the pointer fetched by bpf_prog_array_valid()
> +	 * was NULL, the prog_array is skipped at the risk of missing
> +	 * events when it was updated between that check and the
> +	 * rcu_dereference(), which is an accepted risk.
> +	 */
> + ret = bpf_prog_run_array_sleepable(call->prog_array, ctx, bpf_prog_run);
> + return ret;
> +}
> +
> #ifdef CONFIG_BPF_KPROBE_OVERRIDE
> BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
> {
> @@ -1915,7 +1938,7 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
> event->prog = prog;
> event->bpf_cookie = bpf_cookie;
> rcu_assign_pointer(event->tp_event->prog_array, new_array);
> - bpf_prog_array_free(old_array);
> + bpf_prog_array_free_sleepable(old_array);
>
> unlock:
> mutex_unlock(&bpf_event_mutex);
> @@ -1941,7 +1964,7 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
> bpf_prog_array_delete_safe(old_array, event->prog);
> } else {
> rcu_assign_pointer(event->tp_event->prog_array, new_array);
> - bpf_prog_array_free(old_array);
> + bpf_prog_array_free_sleepable(old_array);
> }
>
> bpf_prog_put(event->prog);
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index 9711589273cd..3eb48897d15b 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -1346,9 +1346,7 @@ static void __uprobe_perf_func(struct trace_uprobe *tu,
> if (bpf_prog_array_valid(call)) {
> u32 ret;
>
> - preempt_disable();
This should have been replaced with migrate_disable() long ago.
> - ret = trace_call_bpf(call, regs);
> - preempt_enable();
> + ret = uprobe_call_bpf(call, regs);
> if (!ret)
> return;
> }
> --
> 2.35.1
* Re: [PATCH bpf-next v2 1/5] bpf: move bpf_prog to bpf.h
2022-05-02 23:09 ` [PATCH bpf-next v2 1/5] bpf: move bpf_prog to bpf.h Delyan Kratunov
@ 2022-05-10 3:04 ` Alexei Starovoitov
2022-05-13 1:22 ` Delyan Kratunov
0 siblings, 1 reply; 17+ messages in thread
From: Alexei Starovoitov @ 2022-05-10 3:04 UTC (permalink / raw)
To: Delyan Kratunov; +Cc: daniel, ast, andrii, bpf
On Mon, May 02, 2022 at 11:09:38PM +0000, Delyan Kratunov wrote:
> In order to add a version of bpf_prog_run_array which accesses the
> bpf_prog->aux member, we need bpf_prog to be more than a forward
> declaration inside bpf.h.
>
> Given that filter.h already includes bpf.h, this merely reorders
> the type declarations for filter.h users. bpf.h users now have access to
> bpf_prog internals.
>
> Signed-off-by: Delyan Kratunov <delyank@fb.com>
> ---
> include/linux/bpf.h | 36 ++++++++++++++++++++++++++++++++++++
> include/linux/filter.h | 34 ----------------------------------
> 2 files changed, 36 insertions(+), 34 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index be94833d390a..57ec619cf729 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -5,6 +5,7 @@
> #define _LINUX_BPF_H 1
>
> #include <uapi/linux/bpf.h>
> +#include <uapi/linux/filter.h>
because of struct sock_filter ?
Pls fwd declare it instead.
> #include <linux/workqueue.h>
> #include <linux/file.h>
> @@ -22,6 +23,7 @@
> #include <linux/sched/mm.h>
> #include <linux/slab.h>
> #include <linux/percpu-refcount.h>
> +#include <linux/stddef.h>
> #include <linux/bpfptr.h>
> #include <linux/btf.h>
>
> @@ -1068,6 +1070,40 @@ struct bpf_prog_aux {
> };
> };
>
> +struct bpf_prog {
> + u16 pages; /* Number of allocated pages */
> + u16 jited:1, /* Is our filter JIT'ed? */
> + jit_requested:1,/* archs need to JIT the prog */
> + gpl_compatible:1, /* Is filter GPL compatible? */
> + cb_access:1, /* Is control block accessed? */
> + dst_needed:1, /* Do we need dst entry? */
> + blinding_requested:1, /* needs constant blinding */
> + blinded:1, /* Was blinded */
> + is_func:1, /* program is a bpf function */
> + kprobe_override:1, /* Do we override a kprobe? */
> + has_callchain_buf:1, /* callchain buffer allocated? */
> + enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> + call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> + call_get_func_ip:1, /* Do we call get_func_ip() */
> + tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> + enum bpf_prog_type type; /* Type of BPF program */
> + enum bpf_attach_type expected_attach_type; /* For some prog types */
> + u32 len; /* Number of filter blocks */
> + u32 jited_len; /* Size of jited insns in bytes */
> + u8 tag[BPF_TAG_SIZE];
> + struct bpf_prog_stats __percpu *stats;
> + int __percpu *active;
> + unsigned int (*bpf_func)(const void *ctx,
> + const struct bpf_insn *insn);
> + struct bpf_prog_aux *aux; /* Auxiliary fields */
> + struct sock_fprog_kern *orig_prog; /* Original BPF program */
> + /* Instructions for interpreter */
> + union {
> + DECLARE_FLEX_ARRAY(struct sock_filter, insns);
> + DECLARE_FLEX_ARRAY(struct bpf_insn, insnsi);
> + };
> +};
> +
> struct bpf_array_aux {
> /* Programs with direct jumps into programs part of this array. */
> struct list_head poke_progs;
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index ed0c0ff42ad5..d0cbb31b1b4d 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -559,40 +559,6 @@ struct bpf_prog_stats {
> struct u64_stats_sync syncp;
> } __aligned(2 * sizeof(u64));
>
> -struct bpf_prog {
> - u16 pages; /* Number of allocated pages */
> - u16 jited:1, /* Is our filter JIT'ed? */
> - jit_requested:1,/* archs need to JIT the prog */
> - gpl_compatible:1, /* Is filter GPL compatible? */
> - cb_access:1, /* Is control block accessed? */
> - dst_needed:1, /* Do we need dst entry? */
> - blinding_requested:1, /* needs constant blinding */
> - blinded:1, /* Was blinded */
> - is_func:1, /* program is a bpf function */
> - kprobe_override:1, /* Do we override a kprobe? */
> - has_callchain_buf:1, /* callchain buffer allocated? */
> - enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> - call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> - call_get_func_ip:1, /* Do we call get_func_ip() */
> - tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> - enum bpf_prog_type type; /* Type of BPF program */
> - enum bpf_attach_type expected_attach_type; /* For some prog types */
> - u32 len; /* Number of filter blocks */
> - u32 jited_len; /* Size of jited insns in bytes */
> - u8 tag[BPF_TAG_SIZE];
> - struct bpf_prog_stats __percpu *stats;
> - int __percpu *active;
> - unsigned int (*bpf_func)(const void *ctx,
> - const struct bpf_insn *insn);
> - struct bpf_prog_aux *aux; /* Auxiliary fields */
> - struct sock_fprog_kern *orig_prog; /* Original BPF program */
> - /* Instructions for interpreter */
> - union {
> - DECLARE_FLEX_ARRAY(struct sock_filter, insns);
> - DECLARE_FLEX_ARRAY(struct bpf_insn, insnsi);
> - };
> -};
> -
> struct sk_filter {
> refcount_t refcnt;
> struct rcu_head rcu;
> --
> 2.35.1
* Re: [PATCH bpf-next v2 1/5] bpf: move bpf_prog to bpf.h
2022-05-10 3:04 ` Alexei Starovoitov
@ 2022-05-13 1:22 ` Delyan Kratunov
0 siblings, 0 replies; 17+ messages in thread
From: Delyan Kratunov @ 2022-05-13 1:22 UTC (permalink / raw)
To: alexei.starovoitov; +Cc: daniel, ast, andrii, bpf
On Mon, 2022-05-09 at 20:04 -0700, Alexei Starovoitov wrote:
> On Mon, May 02, 2022 at 11:09:38PM +0000, Delyan Kratunov wrote:
> > In order to add a version of bpf_prog_run_array which accesses the
> > bpf_prog->aux member, we need bpf_prog to be more than a forward
> > declaration inside bpf.h.
> >
> > Given that filter.h already includes bpf.h, this merely reorders
> > the type declarations for filter.h users. bpf.h users now have access to
> > bpf_prog internals.
> >
> > Signed-off-by: Delyan Kratunov <delyank@fb.com>
> > ---
> > include/linux/bpf.h | 36 ++++++++++++++++++++++++++++++++++++
> > include/linux/filter.h | 34 ----------------------------------
> > 2 files changed, 36 insertions(+), 34 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index be94833d390a..57ec619cf729 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -5,6 +5,7 @@
> > #define _LINUX_BPF_H 1
> >
> > #include <uapi/linux/bpf.h>
> > +#include <uapi/linux/filter.h>
>
> because of struct sock_filter ?
> Pls fwd declare it instead.
Yes, but you can't forward declare it in this context.
It's used within DECLARE_FLEX_ARRAY, so a forward declaration leads to:
./include/linux/bpf.h:1107:56: error: array type has incomplete element type ‘struct sock_filter’
 1107 |                 DECLARE_FLEX_ARRAY(struct sock_filter, insns);
>
> > #include <linux/workqueue.h>
> > #include <linux/file.h>
> > @@ -22,6 +23,7 @@
> > #include <linux/sched/mm.h>
> > #include <linux/slab.h>
> > #include <linux/percpu-refcount.h>
> > +#include <linux/stddef.h>
> > #include <linux/bpfptr.h>
> > #include <linux/btf.h>
> >
> > @@ -1068,6 +1070,40 @@ struct bpf_prog_aux {
> > };
> > };
> >
> > +struct bpf_prog {
> > + u16 pages; /* Number of allocated pages */
> > + u16 jited:1, /* Is our filter JIT'ed? */
> > + jit_requested:1,/* archs need to JIT the prog */
> > + gpl_compatible:1, /* Is filter GPL compatible? */
> > + cb_access:1, /* Is control block accessed? */
> > + dst_needed:1, /* Do we need dst entry? */
> > + blinding_requested:1, /* needs constant blinding */
> > + blinded:1, /* Was blinded */
> > + is_func:1, /* program is a bpf function */
> > + kprobe_override:1, /* Do we override a kprobe? */
> > + has_callchain_buf:1, /* callchain buffer allocated? */
> > + enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > + call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > + call_get_func_ip:1, /* Do we call get_func_ip() */
> > + tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > + enum bpf_prog_type type; /* Type of BPF program */
> > + enum bpf_attach_type expected_attach_type; /* For some prog types */
> > + u32 len; /* Number of filter blocks */
> > + u32 jited_len; /* Size of jited insns in bytes */
> > + u8 tag[BPF_TAG_SIZE];
> > + struct bpf_prog_stats __percpu *stats;
> > + int __percpu *active;
> > + unsigned int (*bpf_func)(const void *ctx,
> > + const struct bpf_insn *insn);
> > + struct bpf_prog_aux *aux; /* Auxiliary fields */
> > + struct sock_fprog_kern *orig_prog; /* Original BPF program */
> > + /* Instructions for interpreter */
> > + union {
> > + DECLARE_FLEX_ARRAY(struct sock_filter, insns);
> > + DECLARE_FLEX_ARRAY(struct bpf_insn, insnsi);
> > + };
> > +};
> > +
> > struct bpf_array_aux {
> > /* Programs with direct jumps into programs part of this array. */
> > struct list_head poke_progs;
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index ed0c0ff42ad5..d0cbb31b1b4d 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -559,40 +559,6 @@ struct bpf_prog_stats {
> > struct u64_stats_sync syncp;
> > } __aligned(2 * sizeof(u64));
> >
> > -struct bpf_prog {
> > - u16 pages; /* Number of allocated pages */
> > - u16 jited:1, /* Is our filter JIT'ed? */
> > - jit_requested:1,/* archs need to JIT the prog */
> > - gpl_compatible:1, /* Is filter GPL compatible? */
> > - cb_access:1, /* Is control block accessed? */
> > - dst_needed:1, /* Do we need dst entry? */
> > - blinding_requested:1, /* needs constant blinding */
> > - blinded:1, /* Was blinded */
> > - is_func:1, /* program is a bpf function */
> > - kprobe_override:1, /* Do we override a kprobe? */
> > - has_callchain_buf:1, /* callchain buffer allocated? */
> > - enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
> > - call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
> > - call_get_func_ip:1, /* Do we call get_func_ip() */
> > - tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
> > - enum bpf_prog_type type; /* Type of BPF program */
> > - enum bpf_attach_type expected_attach_type; /* For some prog types */
> > - u32 len; /* Number of filter blocks */
> > - u32 jited_len; /* Size of jited insns in bytes */
> > - u8 tag[BPF_TAG_SIZE];
> > - struct bpf_prog_stats __percpu *stats;
> > - int __percpu *active;
> > - unsigned int (*bpf_func)(const void *ctx,
> > - const struct bpf_insn *insn);
> > - struct bpf_prog_aux *aux; /* Auxiliary fields */
> > - struct sock_fprog_kern *orig_prog; /* Original BPF program */
> > - /* Instructions for interpreter */
> > - union {
> > - DECLARE_FLEX_ARRAY(struct sock_filter, insns);
> > - DECLARE_FLEX_ARRAY(struct bpf_insn, insnsi);
> > - };
> > -};
> > -
> > struct sk_filter {
> > refcount_t refcnt;
> > struct rcu_head rcu;
> > --
> > 2.35.1
2022-05-02 23:09 [PATCH bpf-next v2 0/5] sleepable uprobe support Delyan Kratunov
2022-05-02 23:09 ` [PATCH bpf-next v2 3/5] bpf: allow sleepable uprobe programs to attach Delyan Kratunov
2022-05-09 23:52 ` Andrii Nakryiko
2022-05-02 23:09 ` [PATCH bpf-next v2 5/5] selftests/bpf: add tests for sleepable kprobes and uprobes Delyan Kratunov
2022-05-09 23:57 ` Andrii Nakryiko
2022-05-02 23:09 ` [PATCH bpf-next v2 4/5] libbpf: add support for sleepable kprobe and uprobe programs Delyan Kratunov
2022-05-09 23:53 ` Andrii Nakryiko
2022-05-02 23:09 ` [PATCH bpf-next v2 2/5] bpf: implement sleepable uprobes by chaining tasks_trace and normal rcu Delyan Kratunov
2022-05-03 6:30 ` kernel test robot
2022-05-03 17:20 ` Delyan Kratunov
2022-05-09 23:58 ` Andrii Nakryiko
2022-05-10 2:54 ` Alexei Starovoitov
2022-05-03 8:48 ` kernel test robot
2022-05-10 3:02 ` Alexei Starovoitov
2022-05-02 23:09 ` [PATCH bpf-next v2 1/5] bpf: move bpf_prog to bpf.h Delyan Kratunov
2022-05-10 3:04 ` Alexei Starovoitov
2022-05-13 1:22 ` Delyan Kratunov