bpf.vger.kernel.org archive mirror
* [PATCH RFC v3 bpf-next 0/4] bpf: Introduce minimal support for sleepable progs
@ 2020-06-11 22:23 Alexei Starovoitov
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs Alexei Starovoitov
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Alexei Starovoitov @ 2020-06-11 22:23 UTC (permalink / raw)
  To: davem; +Cc: daniel, paulmck, netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

v2->v3:
- switched to rcu_trace
- added bpf_copy_from_user

Here are the 'perf report' differences:
sleepable with SRCU:
   3.86%  bench     [k] __srcu_read_unlock
   3.22%  bench     [k] __srcu_read_lock
   0.92%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
   0.50%  bench     [k] bpf_trampoline_10297
   0.26%  bench     [k] __bpf_prog_exit_sleepable
   0.21%  bench     [k] __bpf_prog_enter_sleepable

sleepable with RCU_TRACE:
   0.79%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry_sleep
   0.72%  bench     [k] bpf_trampoline_10381
   0.31%  bench     [k] __bpf_prog_exit_sleepable
   0.29%  bench     [k] __bpf_prog_enter_sleepable

non-sleepable with RCU:
   0.88%  bench     [k] bpf_prog_740d4210cdcd99a3_bench_trigger_fentry
   0.84%  bench     [k] bpf_trampoline_10297
   0.13%  bench     [k] __bpf_prog_enter
   0.12%  bench     [k] __bpf_prog_exit

Happy to confirm that rcu_trace overhead is negligible.

v1->v2:
- split fmod_ret fix into separate patch
- added blacklist

v1:
This patch set introduces the minimal viable support for sleepable bpf programs.
In this patch set only fentry/fexit/fmod_ret and lsm progs can be sleepable.
Only array maps and pre-allocated hash and lru maps are allowed.

Alexei Starovoitov (4):
  bpf: Introduce sleepable BPF programs
  bpf: Add bpf_copy_from_user() helper.
  libbpf: support sleepable progs
  selftests/bpf: basic sleepable tests

 arch/x86/net/bpf_jit_comp.c                   | 32 ++++++----
 include/linux/bpf.h                           |  4 ++
 include/uapi/linux/bpf.h                      | 19 +++++-
 init/Kconfig                                  |  1 +
 kernel/bpf/arraymap.c                         |  6 ++
 kernel/bpf/hashtab.c                          | 20 ++++--
 kernel/bpf/helpers.c                          | 22 +++++++
 kernel/bpf/syscall.c                          | 13 +++-
 kernel/bpf/trampoline.c                       | 37 ++++++++---
 kernel/bpf/verifier.c                         | 62 ++++++++++++++++++-
 kernel/trace/bpf_trace.c                      |  2 +
 tools/include/uapi/linux/bpf.h                | 19 +++++-
 tools/lib/bpf/libbpf.c                        | 25 +++++++-
 tools/testing/selftests/bpf/bench.c           |  2 +
 .../selftests/bpf/benchs/bench_trigger.c      | 17 +++++
 tools/testing/selftests/bpf/progs/lsm.c       | 14 ++++-
 .../selftests/bpf/progs/trigger_bench.c       |  7 +++
 17 files changed, 268 insertions(+), 34 deletions(-)

-- 
2.23.0



* [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs
  2020-06-11 22:23 [PATCH RFC v3 bpf-next 0/4] bpf: Introduce minimal support for sleepable progs Alexei Starovoitov
@ 2020-06-11 22:23 ` Alexei Starovoitov
  2020-06-11 22:29   ` Alexei Starovoitov
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 2/4] bpf: Add bpf_copy_from_user() helper Alexei Starovoitov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-06-11 22:23 UTC (permalink / raw)
  To: davem; +Cc: daniel, paulmck, netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Introduce sleepable BPF programs that can request this property for themselves
via the BPF_F_SLEEPABLE flag at program load time. Such programs will be able
to use helpers like bpf_copy_from_user() that might sleep. At present only
fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
when they are attached to kernel functions that are known to allow sleeping.

Non-sleepable programs rely on implicit rcu_read_lock() and
migrate_disable() to protect the lifetime of programs, the maps they use and
the per-cpu kernel structures used to pass info between bpf programs and the
kernel. Sleepable programs cannot be enclosed in rcu_read_lock().
migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
should not be enclosed in migrate_disable() either. Therefore
rcu_read_lock_trace is used to protect the lifetime of sleepable progs.

There are many networking and tracing program types. In many cases the
'struct bpf_prog *' pointer itself is RCU-protected within some other kernel
data structure and the kernel code uses rcu_dereference() to load that
program pointer and call BPF_PROG_RUN() on it. None of these cases is touched.
Instead sleepable bpf programs are allowed only with the bpf trampoline. The
program pointers are hard-coded into the generated assembly of the bpf
trampoline and synchronize_rcu_tasks_trace() is used to protect the lifetime
of the program. The same trampoline can hold both sleepable and non-sleepable
progs.
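
In other words, for each attached program the generated trampoline
effectively does the following (a C sketch only; the wrapper function is
illustrative, while the __bpf_prog_*() helpers are the real ones used by the
trampoline):

static void invoke_one_prog(struct bpf_prog *p, u64 *args)
{
	if (p->aux->sleepable) {
		__bpf_prog_enter_sleepable();   /* rcu_read_lock_trace() */
		p->bpf_func(args, p->insnsi);
		__bpf_prog_exit_sleepable();    /* rcu_read_unlock_trace() */
	} else {
		/* rcu_read_lock() + migrate_disable(); returns start time */
		u64 start = __bpf_prog_enter();

		p->bpf_func(args, p->insnsi);
		/* stats update, then migrate_enable() + rcu_read_unlock() */
		__bpf_prog_exit(p, start);
	}
}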

When rcu_read_lock_trace is held it means that some sleepable bpf program is
running from a bpf trampoline. Such programs can use bpf arrays and
preallocated hash/lru maps. These map types wait for programs to complete via
synchronize_rcu_tasks_trace().

Updates to the trampoline now have to do synchronize_rcu_tasks_trace() and
synchronize_rcu_tasks() to wait for sleepable progs to finish and for the
trampoline assembly to finish.

This is the first step of introducing sleepable progs.

After that, dynamically allocated hash maps can be allowed. All map elements
would have to be rcu_trace protected instead of normal rcu.
Per-cpu maps can be allowed as well, either via the following pattern:
void *elem = bpf_map_lookup_elem(map, key);
if (elem) {
   // access elem
   bpf_map_release_elem(map, elem);
}
where a modified lookup() helper will do migrate_disable() and a new
bpf_map_release_elem() will do the corresponding migrate_enable(),
or explicit bpf_migrate_disable/enable() helpers will be introduced.
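
A sketch of that second option (bpf_migrate_disable()/bpf_migrate_enable()
are hypothetical helpers that do not exist yet):

bpf_migrate_disable();
void *elem = bpf_map_lookup_elem(map, key);
if (elem) {
   // access elem
}
bpf_migrate_enable();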

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/x86/net/bpf_jit_comp.c    | 32 ++++++++++++------
 include/linux/bpf.h            |  3 ++
 include/uapi/linux/bpf.h       |  8 +++++
 init/Kconfig                   |  1 +
 kernel/bpf/arraymap.c          |  6 ++++
 kernel/bpf/hashtab.c           | 20 +++++++----
 kernel/bpf/syscall.c           | 13 +++++--
 kernel/bpf/trampoline.c        | 37 +++++++++++++++-----
 kernel/bpf/verifier.c          | 62 +++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h |  8 +++++
 10 files changed, 161 insertions(+), 29 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 42b6709e6dc7..7d9ea7b41c71 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1379,10 +1379,15 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 	u8 *prog = *pprog;
 	int cnt = 0;
 
-	if (emit_call(&prog, __bpf_prog_enter, prog))
-		return -EINVAL;
-	/* remember prog start time returned by __bpf_prog_enter */
-	emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
+	if (p->aux->sleepable) {
+		if (emit_call(&prog, __bpf_prog_enter_sleepable, prog))
+			return -EINVAL;
+	} else {
+		if (emit_call(&prog, __bpf_prog_enter, prog))
+			return -EINVAL;
+		/* remember prog start time returned by __bpf_prog_enter */
+		emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
+	}
 
 	/* arg1: lea rdi, [rbp - stack_size] */
 	EMIT4(0x48, 0x8D, 0x7D, -stack_size);
@@ -1402,13 +1407,18 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog,
 	if (mod_ret)
 		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
 
-	/* arg1: mov rdi, progs[i] */
-	emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32,
-		       (u32) (long) p);
-	/* arg2: mov rsi, rbx <- start time in nsec */
-	emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
-	if (emit_call(&prog, __bpf_prog_exit, prog))
-		return -EINVAL;
+	if (p->aux->sleepable) {
+		if (emit_call(&prog, __bpf_prog_exit_sleepable, prog))
+			return -EINVAL;
+	} else {
+		/* arg1: mov rdi, progs[i] */
+		emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32,
+			       (u32) (long) p);
+		/* arg2: mov rsi, rbx <- start time in nsec */
+		emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
+		if (emit_call(&prog, __bpf_prog_exit, prog))
+			return -EINVAL;
+	}
 
 	*pprog = prog;
 	return 0;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 07052d44bca1..6819000682a5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -484,6 +484,8 @@ int arch_prepare_bpf_trampoline(void *image, void *image_end,
 /* these two functions are called from generated trampoline */
 u64 notrace __bpf_prog_enter(void);
 void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start);
+void notrace __bpf_prog_enter_sleepable(void);
+void notrace __bpf_prog_exit_sleepable(void);
 
 struct bpf_ksym {
 	unsigned long		 start;
@@ -676,6 +678,7 @@ struct bpf_prog_aux {
 	bool offload_requested;
 	bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
 	bool func_proto_unreliable;
+	bool sleepable;
 	enum bpf_tramp_prog_type trampoline_prog_type;
 	struct bpf_trampoline *trampoline;
 	struct hlist_node tramp_hlist;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c65b374a5090..0bef454c9598 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -332,6 +332,14 @@ enum bpf_link_type {
 /* The verifier internal test flag. Behavior is undefined */
 #define BPF_F_TEST_STATE_FREQ	(1U << 3)
 
+/* If BPF_F_SLEEPABLE is used in BPF_PROG_LOAD command, the verifier will
+ * restrict map and helper usage for such programs. Sleepable BPF programs can
+ * only be attached to hooks where kernel execution context allows sleeping.
+ * Such programs are allowed to use helpers that may sleep like
+ * bpf_copy_from_user().
+ */
+#define BPF_F_SLEEPABLE		(1U << 4)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * two extensions:
  *
diff --git a/init/Kconfig b/init/Kconfig
index 58a4b705c1c2..ca4480fa9f50 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1652,6 +1652,7 @@ config BPF_SYSCALL
 	bool "Enable bpf() system call"
 	select BPF
 	select IRQ_WORK
+	select TASKS_TRACE_RCU
 	default n
 	help
 	  Enable the bpf() system call that allows to manipulate eBPF
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 11584618e861..cafb3c93c53d 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -10,6 +10,7 @@
 #include <linux/filter.h>
 #include <linux/perf_event.h>
 #include <uapi/linux/btf.h>
+#include <linux/rcupdate_trace.h>
 
 #include "map_in_map.h"
 
@@ -393,6 +394,11 @@ static void array_map_free(struct bpf_map *map)
 	 */
 	synchronize_rcu();
 
+	/* arrays could have been used by both sleepable and non-sleepable bpf
+	 * progs. Make sure to wait for both prog types to finish executing.
+	 */
+	synchronize_rcu_tasks_trace();
+
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
 		bpf_array_free_percpu(array);
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b4b288a3c3c9..c87d5d4e66a4 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -9,6 +9,7 @@
 #include <linux/rculist_nulls.h>
 #include <linux/random.h>
 #include <uapi/linux/btf.h>
+#include <linux/rcupdate_trace.h>
 #include "percpu_freelist.h"
 #include "bpf_lru_list.h"
 #include "map_in_map.h"
@@ -577,8 +578,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
 	struct htab_elem *l;
 	u32 hash, key_size;
 
-	/* Must be called with rcu_read_lock. */
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	/* Must be called with s?rcu_read_lock. */
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
 
 	key_size = map->key_size;
 
@@ -935,7 +936,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		/* unknown flags */
 		return -EINVAL;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
 
 	key_size = map->key_size;
 
@@ -1026,7 +1027,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
 		/* unknown flags */
 		return -EINVAL;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
 
 	key_size = map->key_size;
 
@@ -1214,7 +1215,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
 	u32 hash, key_size;
 	int ret = -ENOENT;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
 
 	key_size = map->key_size;
 
@@ -1246,7 +1247,7 @@ static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
 	u32 hash, key_size;
 	int ret = -ENOENT;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
 
 	key_size = map->key_size;
 
@@ -1297,6 +1298,13 @@ static void htab_map_free(struct bpf_map *map)
 	 */
 	synchronize_rcu();
 
+	/* preallocated hash map could have been used by both sleepable and
+	 * non-sleepable bpf progs. Make sure to wait for both prog types
+	 * to finish executing.
+	 */
+	if (htab_is_prealloc(htab))
+		synchronize_rcu_tasks_trace();
+
 	/* some of free_htab_elem() callbacks for elements of this map may
 	 * not have executed. Wait for them.
 	 */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4d530b1d5683..282bdf49ede3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -29,6 +29,7 @@
 #include <linux/bpf_lsm.h>
 #include <linux/poll.h>
 #include <linux/bpf-netns.h>
+#include <linux/rcupdate_trace.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -1741,10 +1742,14 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred)
 	btf_put(prog->aux->btf);
 	bpf_prog_free_linfo(prog);
 
-	if (deferred)
-		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
-	else
+	if (deferred) {
+		if (prog->aux->sleepable)
+			call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_rcu);
+		else
+			call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
+	} else {
 		__bpf_prog_put_rcu(&prog->aux->rcu);
+	}
 }
 
 static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
@@ -2109,6 +2114,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	if (attr->prog_flags & ~(BPF_F_STRICT_ALIGNMENT |
 				 BPF_F_ANY_ALIGNMENT |
 				 BPF_F_TEST_STATE_FREQ |
+				 BPF_F_SLEEPABLE |
 				 BPF_F_TEST_RND_HI32))
 		return -EINVAL;
 
@@ -2164,6 +2170,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	}
 
 	prog->aux->offload_requested = !!attr->prog_ifindex;
+	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 
 	err = security_bpf_prog_alloc(prog->aux);
 	if (err)
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 9be85aa4ec5f..71cee87a3115 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -7,6 +7,7 @@
 #include <linux/rbtree_latch.h>
 #include <linux/perf_event.h>
 #include <linux/btf.h>
+#include <linux/rcupdate_trace.h>
 
 /* dummy _ops. The verifier will operate on target program's ops. */
 const struct bpf_verifier_ops bpf_extension_verifier_ops = {
@@ -205,14 +206,12 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
 	    tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
 		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
 
-	/* Though the second half of trampoline page is unused a task could be
-	 * preempted in the middle of the first half of trampoline and two
-	 * updates to trampoline would change the code from underneath the
-	 * preempted task. Hence wait for tasks to voluntarily schedule or go
-	 * to userspace.
+	/* the same trampoline can hold both sleepable and non-sleepable progs.
+	 * synchronize_rcu_tasks_trace() is needed to make sure all sleepable
+	 * programs finish executing. It also ensures that the rest of
+	 * generated trampoline assembly finishes before updating trampoline.
 	 */
-
-	synchronize_rcu_tasks();
+	synchronize_rcu_tasks_trace();
 
 	err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
 					  &tr->func.model, flags, tprogs,
@@ -344,7 +343,14 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
 	if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT])))
 		goto out;
 	bpf_image_ksym_del(&tr->ksym);
-	/* wait for tasks to get out of trampoline before freeing it */
+	/* This code will be executed when all bpf progs (both sleepable and
+	 * non-sleepable) went through
+	 * bpf_prog_put()->call_rcu[_tasks_trace]()->bpf_prog_free_deferred().
+	 * Hence no need for another synchronize_rcu_tasks_trace() here,
+	 * but synchronize_rcu_tasks() is still needed, since trampoline
+	 * may not have had any sleepable programs and we need to wait
+	 * for tasks to get out of trampoline code before freeing it.
+	 */
 	synchronize_rcu_tasks();
 	bpf_jit_free_exec(tr->image);
 	hlist_del(&tr->hlist);
@@ -394,6 +400,21 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start)
 	rcu_read_unlock();
 }
 
+/* when rcu_read_lock_trace is held it means that some sleepable bpf program is
+ * running. Those programs can use bpf arrays and preallocated hash maps. These
+ * map types are waiting on programs to complete via
+ * synchronize_rcu_tasks_trace();
+ */
+void notrace __bpf_prog_enter_sleepable(void)
+{
+	rcu_read_lock_trace();
+}
+
+void notrace __bpf_prog_exit_sleepable(void)
+{
+	rcu_read_unlock_trace();
+}
+
 int __weak
 arch_prepare_bpf_trampoline(void *image, void *image_end,
 			    const struct btf_func_model *m, u32 flags,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5c7bbaac81ef..6dba7c656a92 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9019,6 +9019,23 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 		return -EINVAL;
 	}
 
+	if (prog->aux->sleepable)
+		switch (map->map_type) {
+		case BPF_MAP_TYPE_HASH:
+		case BPF_MAP_TYPE_LRU_HASH:
+		case BPF_MAP_TYPE_ARRAY:
+			if (!is_preallocated_map(map)) {
+				verbose(env,
+					"Sleepable programs can only use preallocated hash maps\n");
+				return -EINVAL;
+			}
+			break;
+		default:
+			verbose(env,
+				"Sleepable programs can only use array and hash maps\n");
+			return -EINVAL;
+		}
+
 	return 0;
 }
 
@@ -10629,6 +10646,22 @@ static int check_attach_modify_return(struct bpf_prog *prog, unsigned long addr)
 	return -EINVAL;
 }
 
+/* list of non-sleepable kernel functions that are otherwise
+ * available to attach by bpf_lsm or fmod_ret progs.
+ */
+static int check_sleepable_blacklist(unsigned long addr)
+{
+#ifdef CONFIG_BPF_LSM
+	if (addr == (long)bpf_lsm_task_free)
+		return -EINVAL;
+#endif
+#ifdef CONFIG_SECURITY
+	if (addr == (long)security_task_free)
+		return -EINVAL;
+#endif
+	return 0;
+}
+
 static int check_attach_btf_id(struct bpf_verifier_env *env)
 {
 	struct bpf_prog *prog = env->prog;
@@ -10646,6 +10679,12 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	long addr;
 	u64 key;
 
+	if (prog->aux->sleepable && prog->type != BPF_PROG_TYPE_TRACING &&
+	    prog->type != BPF_PROG_TYPE_LSM) {
+		verbose(env, "Only fentry/fexit/fmod_ret and lsm programs can be sleepable\n");
+		return -EINVAL;
+	}
+
 	if (prog->type == BPF_PROG_TYPE_STRUCT_OPS)
 		return check_struct_ops_btf_id(env);
 
@@ -10859,8 +10898,29 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 			if (ret)
 				verbose(env, "%s() is not modifiable\n",
 					prog->aux->attach_func_name);
+		} else if (prog->aux->sleepable) {
+			switch (prog->type) {
+			case BPF_PROG_TYPE_TRACING:
+				/* fentry/fexit progs can be sleepable only if they are
+				 * attached to ALLOW_ERROR_INJECTION or security_*() funcs.
+				 */
+				ret = check_attach_modify_return(prog, addr);
+				if (!ret)
+					ret = check_sleepable_blacklist(addr);
+				break;
+			case BPF_PROG_TYPE_LSM:
+				/* LSM progs check that they are attached to bpf_lsm_*() funcs
+				 * which are sleepable too.
+				 */
+				ret = check_sleepable_blacklist(addr);
+				break;
+			default:
+				break;
+			}
+			if (ret)
+				verbose(env, "%s is not sleepable\n",
+					prog->aux->attach_func_name);
 		}
-
 		if (ret)
 			goto out;
 		tr->func.addr = (void *)addr;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c65b374a5090..0bef454c9598 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -332,6 +332,14 @@ enum bpf_link_type {
 /* The verifier internal test flag. Behavior is undefined */
 #define BPF_F_TEST_STATE_FREQ	(1U << 3)
 
+/* If BPF_F_SLEEPABLE is used in BPF_PROG_LOAD command, the verifier will
+ * restrict map and helper usage for such programs. Sleepable BPF programs can
+ * only be attached to hooks where kernel execution context allows sleeping.
+ * Such programs are allowed to use helpers that may sleep like
+ * bpf_copy_from_user().
+ */
+#define BPF_F_SLEEPABLE		(1U << 4)
+
 /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
  * two extensions:
  *
-- 
2.23.0



* [PATCH RFC v3 bpf-next 2/4] bpf: Add bpf_copy_from_user() helper.
  2020-06-11 22:23 [PATCH RFC v3 bpf-next 0/4] bpf: Introduce minimal support for sleepable progs Alexei Starovoitov
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs Alexei Starovoitov
@ 2020-06-11 22:23 ` Alexei Starovoitov
  2020-06-18 22:33   ` Andrii Nakryiko
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 3/4] libbpf: support sleepable progs Alexei Starovoitov
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 4/4] selftests/bpf: basic sleepable tests Alexei Starovoitov
  3 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-06-11 22:23 UTC (permalink / raw)
  To: davem; +Cc: daniel, paulmck, netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Sleepable BPF programs can now use copy_from_user() to access user memory.
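
For example, a sleepable lsm program can read the current task's command
line from user memory. A sketch along the lines of the selftest in the last
patch (program and variable names here are illustrative):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("lsm.s/file_mprotect")
int BPF_PROG(check_mprotect, struct vm_area_struct *vma,
	     unsigned long reqprot, unsigned long prot, int ret)
{
	char args[64];

	/* may fault in user pages and sleep; only allowed in sleepable progs */
	bpf_copy_from_user(args, sizeof(args), (void *)vma->vm_mm->arg_start);
	return ret;
}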

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h            |  1 +
 include/uapi/linux/bpf.h       | 11 ++++++++++-
 kernel/bpf/helpers.c           | 22 ++++++++++++++++++++++
 kernel/trace/bpf_trace.c       |  2 ++
 tools/include/uapi/linux/bpf.h | 11 ++++++++++-
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6819000682a5..c8c9217f3ac9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1632,6 +1632,7 @@ extern const struct bpf_func_proto bpf_ringbuf_reserve_proto;
 extern const struct bpf_func_proto bpf_ringbuf_submit_proto;
 extern const struct bpf_func_proto bpf_ringbuf_discard_proto;
 extern const struct bpf_func_proto bpf_ringbuf_query_proto;
+extern const struct bpf_func_proto bpf_copy_from_user_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0bef454c9598..a38c806d34ad 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3260,6 +3260,13 @@ union bpf_attr {
  * 		case of **BPF_CSUM_LEVEL_QUERY**, the current skb->csum_level
  * 		is returned or the error code -EACCES in case the skb is not
  * 		subject to CHECKSUM_UNNECESSARY.
+ *
+ * int bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
+ * 	Description
+ * 		Read *size* bytes from user space address *user_ptr* and store
+ * 		the data in *dst*. This is a wrapper of copy_from_user().
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3397,7 +3404,9 @@ union bpf_attr {
 	FN(ringbuf_submit),		\
 	FN(ringbuf_discard),		\
 	FN(ringbuf_query),		\
-	FN(csum_level),
+	FN(csum_level),			\
+	FN(copy_from_user),		\
+	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index be43ab3e619f..5cc7425ee476 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -601,6 +601,28 @@ const struct bpf_func_proto bpf_event_output_data_proto =  {
 	.arg5_type      = ARG_CONST_SIZE_OR_ZERO,
 };
 
+BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size,
+	   const void __user *, user_ptr)
+{
+	int ret = copy_from_user(dst, user_ptr, size);
+
+	if (unlikely(ret)) {
+		memset(dst, 0, size);
+		ret = -EFAULT;
+	}
+
+	return ret;
+}
+
+const struct bpf_func_proto bpf_copy_from_user_proto = {
+	.func		= bpf_copy_from_user,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_str_proto __weak;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3744372a24e2..c1e833000fc3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1098,6 +1098,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_ringbuf_discard_proto;
 	case BPF_FUNC_ringbuf_query:
 		return &bpf_ringbuf_query_proto;
+	case BPF_FUNC_copy_from_user:
+		return prog->aux->sleepable ? &bpf_copy_from_user_proto : NULL;
 	default:
 		return NULL;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0bef454c9598..a38c806d34ad 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3260,6 +3260,13 @@ union bpf_attr {
  * 		case of **BPF_CSUM_LEVEL_QUERY**, the current skb->csum_level
  * 		is returned or the error code -EACCES in case the skb is not
  * 		subject to CHECKSUM_UNNECESSARY.
+ *
+ * int bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
+ * 	Description
+ * 		Read *size* bytes from user space address *user_ptr* and store
+ * 		the data in *dst*. This is a wrapper of copy_from_user().
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3397,7 +3404,9 @@ union bpf_attr {
 	FN(ringbuf_submit),		\
 	FN(ringbuf_discard),		\
 	FN(ringbuf_query),		\
-	FN(csum_level),
+	FN(csum_level),			\
+	FN(copy_from_user),		\
+	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.23.0



* [PATCH RFC v3 bpf-next 3/4] libbpf: support sleepable progs
  2020-06-11 22:23 [PATCH RFC v3 bpf-next 0/4] bpf: Introduce minimal support for sleepable progs Alexei Starovoitov
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs Alexei Starovoitov
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 2/4] bpf: Add bpf_copy_from_user() helper Alexei Starovoitov
@ 2020-06-11 22:23 ` Alexei Starovoitov
  2020-06-18 22:34   ` Andrii Nakryiko
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 4/4] selftests/bpf: basic sleepable tests Alexei Starovoitov
  3 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-06-11 22:23 UTC (permalink / raw)
  To: davem; +Cc: daniel, paulmck, netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Pass request to load program as sleepable via ".s" suffix in the section name.
If it happens in the future that all map types and helpers are allowed with the
BPF_F_SLEEPABLE flag, "fmod_ret/" and "lsm/" can be aliased to "fmod_ret.s/" and
"lsm.s/" to make all lsm and fmod_ret programs sleepable by default. The fentry
and fexit programs would always need to have sleepable vs non-sleepable
distinction, since not all fentry/fexit progs will be attached to sleepable
kernel functions.
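
For example (a sketch; these section names follow the SEC_DEF entries added
below and are exercised by the selftests in the next patch):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

/* the ".s" suffix makes libbpf pass BPF_F_SLEEPABLE at load time */
SEC("fentry.s/__x64_sys_getpgid")
int sleepable_fentry(void *ctx)
{
	return 0;
}

/* plain "fentry/" stays non-sleepable */
SEC("fentry/__x64_sys_getpgid")
int regular_fentry(void *ctx)
{
	return 0;
}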

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@google.com>
---
 tools/lib/bpf/libbpf.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 7f01be2b88b8..936ce9e1623a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -209,6 +209,7 @@ struct bpf_sec_def {
 	bool is_exp_attach_type_optional;
 	bool is_attachable;
 	bool is_attach_btf;
+	bool is_sleepable;
 	attach_fn_t attach_fn;
 };
 
@@ -5451,6 +5452,8 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
 			/* couldn't guess, but user might manually specify */
 			continue;
 
+		if (prog->sec_def->is_sleepable)
+			prog->prog_flags |= BPF_F_SLEEPABLE;
 		bpf_program__set_type(prog, prog->sec_def->prog_type);
 		bpf_program__set_expected_attach_type(prog,
 				prog->sec_def->expected_attach_type);
@@ -6646,6 +6649,21 @@ static const struct bpf_sec_def section_defs[] = {
 		.expected_attach_type = BPF_TRACE_FEXIT,
 		.is_attach_btf = true,
 		.attach_fn = attach_trace),
+	SEC_DEF("fentry.s/", TRACING,
+		.expected_attach_type = BPF_TRACE_FENTRY,
+		.is_attach_btf = true,
+		.is_sleepable = true,
+		.attach_fn = attach_trace),
+	SEC_DEF("fmod_ret.s/", TRACING,
+		.expected_attach_type = BPF_MODIFY_RETURN,
+		.is_attach_btf = true,
+		.is_sleepable = true,
+		.attach_fn = attach_trace),
+	SEC_DEF("fexit.s/", TRACING,
+		.expected_attach_type = BPF_TRACE_FEXIT,
+		.is_attach_btf = true,
+		.is_sleepable = true,
+		.attach_fn = attach_trace),
 	SEC_DEF("freplace/", EXT,
 		.is_attach_btf = true,
 		.attach_fn = attach_trace),
@@ -6653,6 +6671,11 @@ static const struct bpf_sec_def section_defs[] = {
 		.is_attach_btf = true,
 		.expected_attach_type = BPF_LSM_MAC,
 		.attach_fn = attach_lsm),
+	SEC_DEF("lsm.s/", LSM,
+		.is_attach_btf = true,
+		.is_sleepable = true,
+		.expected_attach_type = BPF_LSM_MAC,
+		.attach_fn = attach_lsm),
 	SEC_DEF("iter/", TRACING,
 		.expected_attach_type = BPF_TRACE_ITER,
 		.is_attach_btf = true,
@@ -7294,7 +7317,7 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 
 		prog->prog_ifindex = attr->ifindex;
 		prog->log_level = attr->log_level;
-		prog->prog_flags = attr->prog_flags;
+		prog->prog_flags |= attr->prog_flags;
 		if (!first_prog)
 			first_prog = prog;
 	}
-- 
2.23.0



* [PATCH RFC v3 bpf-next 4/4] selftests/bpf: basic sleepable tests
  2020-06-11 22:23 [PATCH RFC v3 bpf-next 0/4] bpf: Introduce minimal support for sleepable progs Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 3/4] libbpf: support sleepable progs Alexei Starovoitov
@ 2020-06-11 22:23 ` Alexei Starovoitov
  2020-06-18 22:43   ` Andrii Nakryiko
  3 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-06-11 22:23 UTC (permalink / raw)
  To: davem; +Cc: daniel, paulmck, netdev, bpf, kernel-team

From: Alexei Starovoitov <ast@kernel.org>

Modify a few tests to sanity-test sleepable bpf functionality.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@google.com>
---
 tools/testing/selftests/bpf/bench.c             |  2 ++
 .../selftests/bpf/benchs/bench_trigger.c        | 17 +++++++++++++++++
 tools/testing/selftests/bpf/progs/lsm.c         | 14 ++++++++++++--
 .../testing/selftests/bpf/progs/trigger_bench.c |  7 +++++++
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 944ad4721c83..1a427685a8a8 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -317,6 +317,7 @@ extern const struct bench bench_trig_tp;
 extern const struct bench bench_trig_rawtp;
 extern const struct bench bench_trig_kprobe;
 extern const struct bench bench_trig_fentry;
+extern const struct bench bench_trig_fentry_sleep;
 extern const struct bench bench_trig_fmodret;
 extern const struct bench bench_rb_libbpf;
 extern const struct bench bench_rb_custom;
@@ -338,6 +339,7 @@ static const struct bench *benchs[] = {
 	&bench_trig_rawtp,
 	&bench_trig_kprobe,
 	&bench_trig_fentry,
+	&bench_trig_fentry_sleep,
 	&bench_trig_fmodret,
 	&bench_rb_libbpf,
 	&bench_rb_custom,
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 49c22832f216..2a0b6c9885a4 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -90,6 +90,12 @@ static void trigger_fentry_setup()
 	attach_bpf(ctx.skel->progs.bench_trigger_fentry);
 }
 
+static void trigger_fentry_sleep_setup()
+{
+	setup_ctx();
+	attach_bpf(ctx.skel->progs.bench_trigger_fentry_sleep);
+}
+
 static void trigger_fmodret_setup()
 {
 	setup_ctx();
@@ -155,6 +161,17 @@ const struct bench bench_trig_fentry = {
 	.report_final = hits_drops_report_final,
 };
 
+const struct bench bench_trig_fentry_sleep = {
+	.name = "trig-fentry-sleep",
+	.validate = trigger_validate,
+	.setup = trigger_fentry_sleep_setup,
+	.producer_thread = trigger_producer,
+	.consumer_thread = trigger_consumer,
+	.measure = trigger_measure,
+	.report_progress = hits_drops_report_progress,
+	.report_final = hits_drops_report_final,
+};
+
 const struct bench bench_trig_fmodret = {
 	.name = "trig-fmodret",
 	.validate = trigger_validate,
diff --git a/tools/testing/selftests/bpf/progs/lsm.c b/tools/testing/selftests/bpf/progs/lsm.c
index b4598d4bc4f7..895445aa2b77 100644
--- a/tools/testing/selftests/bpf/progs/lsm.c
+++ b/tools/testing/selftests/bpf/progs/lsm.c
@@ -15,10 +15,12 @@ int monitored_pid = 0;
 int mprotect_count = 0;
 int bprm_count = 0;
 
-SEC("lsm/file_mprotect")
+SEC("lsm.s/file_mprotect")
 int BPF_PROG(test_int_hook, struct vm_area_struct *vma,
 	     unsigned long reqprot, unsigned long prot, int ret)
 {
+	char args[64];
+
 	if (ret != 0)
 		return ret;
 
@@ -28,6 +30,9 @@ int BPF_PROG(test_int_hook, struct vm_area_struct *vma,
 	is_stack = (vma->vm_start <= vma->vm_mm->start_stack &&
 		    vma->vm_end >= vma->vm_mm->start_stack);
 
+	bpf_copy_from_user(args, sizeof(args), (void *)vma->vm_mm->arg_start);
+	/*bpf_printk("args=%s\n", args);*/
+
 	if (is_stack && monitored_pid == pid) {
 		mprotect_count++;
 		ret = -EPERM;
@@ -36,7 +41,7 @@ int BPF_PROG(test_int_hook, struct vm_area_struct *vma,
 	return ret;
 }
 
-SEC("lsm/bprm_committed_creds")
+SEC("lsm.s/bprm_committed_creds")
 int BPF_PROG(test_void_hook, struct linux_binprm *bprm)
 {
 	__u32 pid = bpf_get_current_pid_tgid() >> 32;
@@ -46,3 +51,8 @@ int BPF_PROG(test_void_hook, struct linux_binprm *bprm)
 
 	return 0;
 }
+SEC("lsm/task_free") /* lsm/ is ok, lsm.s/ fails */
+int BPF_PROG(test_task_free, struct task_struct *task)
+{
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c
index 8b36b6640e7e..9a4d09590b3d 100644
--- a/tools/testing/selftests/bpf/progs/trigger_bench.c
+++ b/tools/testing/selftests/bpf/progs/trigger_bench.c
@@ -39,6 +39,13 @@ int bench_trigger_fentry(void *ctx)
 	return 0;
 }
 
+SEC("fentry.s/__x64_sys_getpgid")
+int bench_trigger_fentry_sleep(void *ctx)
+{
+	__sync_add_and_fetch(&hits, 1);
+	return 0;
+}
+
 SEC("fmod_ret/__x64_sys_getpgid")
 int bench_trigger_fmodret(void *ctx)
 {
-- 
2.23.0



* Re: [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs Alexei Starovoitov
@ 2020-06-11 22:29   ` Alexei Starovoitov
  2020-06-12  0:04     ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-06-11 22:29 UTC (permalink / raw)
  To: David S. Miller, Paul McKenney
  Cc: Daniel Borkmann, Paul E. McKenney, Network Development, bpf, Kernel Team

On Thu, Jun 11, 2020 at 3:23 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
>  /* dummy _ops. The verifier will operate on target program's ops. */
>  const struct bpf_verifier_ops bpf_extension_verifier_ops = {
> @@ -205,14 +206,12 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
>             tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
>                 flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
>
> -       /* Though the second half of trampoline page is unused a task could be
> -        * preempted in the middle of the first half of trampoline and two
> -        * updates to trampoline would change the code from underneath the
> -        * preempted task. Hence wait for tasks to voluntarily schedule or go
> -        * to userspace.
> +       /* the same trampoline can hold both sleepable and non-sleepable progs.
> +        * synchronize_rcu_tasks_trace() is needed to make sure all sleepable
> +        * programs finish executing. It also ensures that the rest of
> +        * generated trampoline assembly finishes before updating trampoline.
>          */
> -
> -       synchronize_rcu_tasks();
> +       synchronize_rcu_tasks_trace();

Hi Paul,

I've been looking at rcu_trace implementation and I think above change
is correct.
Could you please double check my understanding?

Also see benchmarking numbers in the cover letter :)

>         err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
>                                           &tr->func.model, flags, tprogs,
> @@ -344,7 +343,14 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
>         if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT])))
>                 goto out;
>         bpf_image_ksym_del(&tr->ksym);
> -       /* wait for tasks to get out of trampoline before freeing it */
> +       /* This code will be executed when all bpf progs (both sleepable and
> +        * non-sleepable) went through
> +        * bpf_prog_put()->call_rcu[_tasks_trace]()->bpf_prog_free_deferred().
> +        * Hence no need for another synchronize_rcu_tasks_trace() here,
> +        * but synchronize_rcu_tasks() is still needed, since trampoline
> +        * may not have had any sleepable programs and we need to wait
> +        * for tasks to get out of trampoline code before freeing it.
> +        */
>         synchronize_rcu_tasks();
>         bpf_jit_free_exec(tr->image);
>         hlist_del(&tr->hlist);
> @@ -394,6 +400,21 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start)
>         rcu_read_unlock();
>  }
>
> +/* when rcu_read_lock_trace is held it means that some sleepable bpf program is
> + * running. Those programs can use bpf arrays and preallocated hash maps. These
> + * map types are waiting on programs to complete via
> + * synchronize_rcu_tasks_trace();
> + */
> +void notrace __bpf_prog_enter_sleepable(void)
> +{
> +       rcu_read_lock_trace();
> +}
> +
> +void notrace __bpf_prog_exit_sleepable(void)
> +{
> +       rcu_read_unlock_trace();
> +}
> +


* Re: [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs
  2020-06-11 22:29   ` Alexei Starovoitov
@ 2020-06-12  0:04     ` Paul E. McKenney
  2020-06-12  2:13       ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2020-06-12  0:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Network Development, bpf, Kernel Team

On Thu, Jun 11, 2020 at 03:29:09PM -0700, Alexei Starovoitov wrote:
> On Thu, Jun 11, 2020 at 3:23 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> >  /* dummy _ops. The verifier will operate on target program's ops. */
> >  const struct bpf_verifier_ops bpf_extension_verifier_ops = {
> > @@ -205,14 +206,12 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
> >             tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
> >                 flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
> >
> > -       /* Though the second half of trampoline page is unused a task could be
> > -        * preempted in the middle of the first half of trampoline and two
> > -        * updates to trampoline would change the code from underneath the
> > -        * preempted task. Hence wait for tasks to voluntarily schedule or go
> > -        * to userspace.
> > +       /* the same trampoline can hold both sleepable and non-sleepable progs.
> > +        * synchronize_rcu_tasks_trace() is needed to make sure all sleepable
> > +        * programs finish executing. It also ensures that the rest of
> > +        * generated trampoline assembly finishes before updating trampoline.
> >          */
> > -
> > -       synchronize_rcu_tasks();
> > +       synchronize_rcu_tasks_trace();
> 
> Hi Paul,
> 
> I've been looking at rcu_trace implementation and I think above change
> is correct.
> Could you please double check my understanding?

From an RCU Tasks Trace perspective, it looks good to me!

You have rcu_read_lock_trace() and rcu_read_unlock_trace() protecting
the readers and synchronize_rcu_tasks_trace() waiting for them.

One question given my lack of understanding of BPF:  Are there still
trampolines for non-sleepable BPF programs?  If so, they might still
need to use synchronize_rcu_tasks() or some such.

The general principle is "never mix one type of RCU reader with another
type of RCU updater".

But in this case, one approach is to use synchronize_rcu_mult():

	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);

That would wait for both types of readers, and do so concurrently.
And if there is also a need to wait on rcu_read_lock() and friends,
you could do this:

	synchronize_rcu_mult(call_rcu, call_rcu_tasks, call_rcu_tasks_trace);

> Also see benchmarking numbers in the cover letter :)

Now -that- is what I am talking about!!!  Very nice!  ;-)

							Thanx, Paul

> >         err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
> >                                           &tr->func.model, flags, tprogs,
> > @@ -344,7 +343,14 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
> >         if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT])))
> >                 goto out;
> >         bpf_image_ksym_del(&tr->ksym);
> > -       /* wait for tasks to get out of trampoline before freeing it */
> > +       /* This code will be executed when all bpf progs (both sleepable and
> > +        * non-sleepable) went through
> > +        * bpf_prog_put()->call_rcu[_tasks_trace]()->bpf_prog_free_deferred().
> > +        * Hence no need for another synchronize_rcu_tasks_trace() here,
> > +        * but synchronize_rcu_tasks() is still needed, since trampoline
> > +        * may not have had any sleepable programs and we need to wait
> > +        * for tasks to get out of trampoline code before freeing it.
> > +        */
> >         synchronize_rcu_tasks();
> >         bpf_jit_free_exec(tr->image);
> >         hlist_del(&tr->hlist);
> > @@ -394,6 +400,21 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start)
> >         rcu_read_unlock();
> >  }
> >
> > +/* when rcu_read_lock_trace is held it means that some sleepable bpf program is
> > + * running. Those programs can use bpf arrays and preallocated hash maps. These
> > + * map types are waiting on programs to complete via
> > + * synchronize_rcu_tasks_trace();
> > + */
> > +void notrace __bpf_prog_enter_sleepable(void)
> > +{
> > +       rcu_read_lock_trace();
> > +}
> > +
> > +void notrace __bpf_prog_exit_sleepable(void)
> > +{
> > +       rcu_read_unlock_trace();
> > +}
> > +


* Re: [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs
  2020-06-12  0:04     ` Paul E. McKenney
@ 2020-06-12  2:13       ` Alexei Starovoitov
  2020-06-12  3:40         ` Paul E. McKenney
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-06-12  2:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David S. Miller, Daniel Borkmann, Network Development, bpf, Kernel Team

On Thu, Jun 11, 2020 at 05:04:47PM -0700, Paul E. McKenney wrote:
> On Thu, Jun 11, 2020 at 03:29:09PM -0700, Alexei Starovoitov wrote:
> > On Thu, Jun 11, 2020 at 3:23 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > >  /* dummy _ops. The verifier will operate on target program's ops. */
> > >  const struct bpf_verifier_ops bpf_extension_verifier_ops = {
> > > @@ -205,14 +206,12 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
> > >             tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
> > >                 flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
> > >
> > > -       /* Though the second half of trampoline page is unused a task could be
> > > -        * preempted in the middle of the first half of trampoline and two
> > > -        * updates to trampoline would change the code from underneath the
> > > -        * preempted task. Hence wait for tasks to voluntarily schedule or go
> > > -        * to userspace.
> > > +       /* the same trampoline can hold both sleepable and non-sleepable progs.
> > > +        * synchronize_rcu_tasks_trace() is needed to make sure all sleepable
> > > +        * programs finish executing. It also ensures that the rest of
> > > +        * generated trampoline assembly finishes before updating trampoline.
> > >          */
> > > -
> > > -       synchronize_rcu_tasks();
> > > +       synchronize_rcu_tasks_trace();
> > 
> > Hi Paul,
> > 
> > I've been looking at rcu_trace implementation and I think above change
> > is correct.
> > Could you please double check my understanding?
> 
> From an RCU Tasks Trace perspective, it looks good to me!
> 
> You have rcu_read_lock_trace() and rcu_read_unlock_trace() protecting
> the readers and synchronize_rcu_tasks_trace() waiting for them.
> 
> One question given my lack of understanding of BPF:  Are there still
> trampolines for non-sleepable BPF programs?  If so, they might still
> need to use synchronize_rcu_tasks() or some such.

The same trampoline can hold both sleepable and non-sleepable progs.
The following is possible:
. trampoline asm starts
  . rcu_read_lock + migrate_disable
    . non-sleepable prog_A
  . rcu_read_unlock + migrate_enable
. trampoline asm
  . rcu_read_lock_trace
    . sleepable prog_B
  . rcu_read_unlock_trace
. trampoline asm
  . rcu_read_lock + migrate_disable
    . non-sleepable prog_C
  . rcu_read_unlock + migrate_enable
. trampoline asm ends

> 
> The general principle is "never mix one type of RCU reader with another
> type of RCU updater".
> 
> But in this case, one approach is to use synchronize_rcu_mult():
> 
> 	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);

That was my first approach, but I've started looking deeper and it looks
like rcu_tasks_trace is stronger than rcu_tasks.
'never mix' is a valid concern, so for future proofing the rcu_mult()
is cleaner, but from a safety pov just sync*rcu_tasks_trace() is enough
even when the trampoline doesn't hold sleepable progs, right?

Also timing wise rcu_mult() is obviously faster than doing
one at a time, but how do you sort their speeds:
A: synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
B: synchronize_rcu_tasks();
C: synchronize_rcu_tasks_trace();

> That would wait for both types of readers, and do so concurrently.
> And if there is also a need to wait on rcu_read_lock() and friends,
> you could do this:
> 
> 	synchronize_rcu_mult(call_rcu, call_rcu_tasks, call_rcu_tasks_trace);

I was about to reply that trampoline doesn't need it and there is no such
case yet, but then realized that I can use it in hashtab freeing with:
synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace);
That would be a nice optimization.


* Re: [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs
  2020-06-12  2:13       ` Alexei Starovoitov
@ 2020-06-12  3:40         ` Paul E. McKenney
  0 siblings, 0 replies; 15+ messages in thread
From: Paul E. McKenney @ 2020-06-12  3:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Network Development, bpf, Kernel Team

On Thu, Jun 11, 2020 at 07:13:01PM -0700, Alexei Starovoitov wrote:
> On Thu, Jun 11, 2020 at 05:04:47PM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 11, 2020 at 03:29:09PM -0700, Alexei Starovoitov wrote:
> > > On Thu, Jun 11, 2020 at 3:23 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > >  /* dummy _ops. The verifier will operate on target program's ops. */
> > > >  const struct bpf_verifier_ops bpf_extension_verifier_ops = {
> > > > @@ -205,14 +206,12 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
> > > >             tprogs[BPF_TRAMP_MODIFY_RETURN].nr_progs)
> > > >                 flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
> > > >
> > > > -       /* Though the second half of trampoline page is unused a task could be
> > > > -        * preempted in the middle of the first half of trampoline and two
> > > > -        * updates to trampoline would change the code from underneath the
> > > > -        * preempted task. Hence wait for tasks to voluntarily schedule or go
> > > > -        * to userspace.
> > > > +       /* the same trampoline can hold both sleepable and non-sleepable progs.
> > > > +        * synchronize_rcu_tasks_trace() is needed to make sure all sleepable
> > > > +        * programs finish executing. It also ensures that the rest of
> > > > +        * generated trampoline assembly finishes before updating trampoline.
> > > >          */
> > > > -
> > > > -       synchronize_rcu_tasks();
> > > > +       synchronize_rcu_tasks_trace();
> > > 
> > > Hi Paul,
> > > 
> > > I've been looking at rcu_trace implementation and I think above change
> > > is correct.
> > > Could you please double check my understanding?
> > 
> > From an RCU Tasks Trace perspective, it looks good to me!
> > 
> > You have rcu_read_lock_trace() and rcu_read_unlock_trace() protecting
> > the readers and synchronize_rcu_tasks_trace() waiting for them.
> > 
> > One question given my lack of understanding of BPF:  Are there still
> > trampolines for non-sleepable BPF programs?  If so, they might still
> > need to use synchronize_rcu_tasks() or some such.
> 
> The same trampoline can hold both sleepable and non-sleepable progs.
> The following is possible:
> . trampoline asm starts
>   . rcu_read_lock + migrate_disable
>     . non-sleepable prog_A
>   . rcu_read_unlock + migrate_enable
> . trampoline asm
>   . rcu_read_lock_trace
>     . sleepable prog_B
>   . rcu_read_unlock_trace
> . trampoline asm
>   . rcu_read_lock + migrate_disable
>     . non-sleepable prog_C
>   . rcu_read_unlock + migrate_enable
> . trampoline asm ends

Ah, new one on me!

> > The general principle is "never mix one type of RCU reader with another
> > type of RCU updater".
> > 
> > But in this case, one approach is to use synchronize_rcu_mult():
> > 
> > 	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
> 
> That was my first approach, but I've started looking deeper and looks
> like rcu_tasks_trace is stronger than rcu_tasks.
> 'never mix' is a valid concern, so for future proofing the rcu_mult()
> is cleaner, but from safety pov just sync*rcu_tasks_trace() is enough
> even when trampoline doesn't hold sleepable progs, right ?

You really can have synchronize_rcu_tasks_trace() return before
synchronize_rcu_tasks().  And vice versa, though perhaps with less
probability.  So if you need both, you need to use both.

> Also timing wise rcu_mult() is obviously faster than doing
> one at a time, but how do you sort their speeds:
> A: synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
> B: synchronize_rcu_tasks();
> C: synchronize_rcu_tasks_trace();

duration(A) cannot be shorter than either duration(B) or duration(C).
In theory, duration(A) = max(duration(B), duration(C)).  In practice,
things are a bit messier, but the max() is not a bad rule of thumb.

> > That would wait for both types of readers, and do so concurrently.
> > And if there is also a need to wait on rcu_read_lock() and friends,
> > you could do this:
> > 
> > 	synchronize_rcu_mult(call_rcu, call_rcu_tasks, call_rcu_tasks_trace);
> 
> I was about to reply that trampoline doesn't need it and there is no such
> case yet, but then realized that I can use it in hashtab freeing with:
> synchronize_rcu_mult(call_rcu, call_rcu_tasks_trace);
> That would be a nice optimization.

Very good!  ;-)

							Thanx, Paul


* Re: [PATCH RFC v3 bpf-next 2/4] bpf: Add bpf_copy_from_user() helper.
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 2/4] bpf: Add bpf_copy_from_user() helper Alexei Starovoitov
@ 2020-06-18 22:33   ` Andrii Nakryiko
  2020-06-30  0:28     ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2020-06-18 22:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Paul E . McKenney, Networking,
	bpf, Kernel Team

On Thu, Jun 11, 2020 at 3:24 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Sleepable BPF programs can now use copy_from_user() to access user memory.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/bpf.h            |  1 +
>  include/uapi/linux/bpf.h       | 11 ++++++++++-
>  kernel/bpf/helpers.c           | 22 ++++++++++++++++++++++
>  kernel/trace/bpf_trace.c       |  2 ++
>  tools/include/uapi/linux/bpf.h | 11 ++++++++++-
>  5 files changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 6819000682a5..c8c9217f3ac9 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1632,6 +1632,7 @@ extern const struct bpf_func_proto bpf_ringbuf_reserve_proto;
>  extern const struct bpf_func_proto bpf_ringbuf_submit_proto;
>  extern const struct bpf_func_proto bpf_ringbuf_discard_proto;
>  extern const struct bpf_func_proto bpf_ringbuf_query_proto;
> +extern const struct bpf_func_proto bpf_copy_from_user_proto;
>
>  const struct bpf_func_proto *bpf_tracing_func_proto(
>         enum bpf_func_id func_id, const struct bpf_prog *prog);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 0bef454c9598..a38c806d34ad 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3260,6 +3260,13 @@ union bpf_attr {
>   *             case of **BPF_CSUM_LEVEL_QUERY**, the current skb->csum_level
>   *             is returned or the error code -EACCES in case the skb is not
>   *             subject to CHECKSUM_UNNECESSARY.
> + *
> + * int bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)

Can we also add bpf_copy_str_from_user (or bpf_copy_from_user_str,
whichever makes more sense)?

> + *     Description
> + *             Read *size* bytes from user space address *user_ptr* and store
> + *             the data in *dst*. This is a wrapper of copy_from_user().
> + *     Return
> + *             0 on success, or a negative error in case of failure.
>   */

[...]


* Re: [PATCH RFC v3 bpf-next 3/4] libbpf: support sleepable progs
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 3/4] libbpf: support sleepable progs Alexei Starovoitov
@ 2020-06-18 22:34   ` Andrii Nakryiko
  0 siblings, 0 replies; 15+ messages in thread
From: Andrii Nakryiko @ 2020-06-18 22:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Paul E . McKenney, Networking,
	bpf, Kernel Team

On Thu, Jun 11, 2020 at 3:25 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Pass request to load program as sleepable via ".s" suffix in the section name.
> If it happens in the future that all map types and helpers are allowed with the
> BPF_F_SLEEPABLE flag, "fmod_ret/" and "lsm/" can be aliased to "fmod_ret.s/" and
> "lsm.s/" to make all lsm and fmod_ret programs sleepable by default. The fentry
> and fexit programs would always need to have sleepable vs non-sleepable
> distinction, since not all fentry/fexit progs will be attached to sleepable
> kernel functions.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: KP Singh <kpsingh@google.com>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  tools/lib/bpf/libbpf.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
>

[...]


* Re: [PATCH RFC v3 bpf-next 4/4] selftests/bpf: basic sleepable tests
  2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 4/4] selftests/bpf: basic sleepable tests Alexei Starovoitov
@ 2020-06-18 22:43   ` Andrii Nakryiko
  0 siblings, 0 replies; 15+ messages in thread
From: Andrii Nakryiko @ 2020-06-18 22:43 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Paul E. McKenney, Networking,
	bpf, Kernel Team

On Thu, Jun 11, 2020 at 3:25 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Modify a few tests to sanity-test sleepable bpf functionality.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: KP Singh <kpsingh@google.com>
> ---
>  tools/testing/selftests/bpf/bench.c             |  2 ++
>  .../selftests/bpf/benchs/bench_trigger.c        | 17 +++++++++++++++++
>  tools/testing/selftests/bpf/progs/lsm.c         | 14 ++++++++++++--
>  .../testing/selftests/bpf/progs/trigger_bench.c |  7 +++++++
>  4 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
> index 944ad4721c83..1a427685a8a8 100644
> --- a/tools/testing/selftests/bpf/bench.c
> +++ b/tools/testing/selftests/bpf/bench.c
> @@ -317,6 +317,7 @@ extern const struct bench bench_trig_tp;
>  extern const struct bench bench_trig_rawtp;
>  extern const struct bench bench_trig_kprobe;
>  extern const struct bench bench_trig_fentry;
> +extern const struct bench bench_trig_fentry_sleep;
>  extern const struct bench bench_trig_fmodret;
>  extern const struct bench bench_rb_libbpf;
>  extern const struct bench bench_rb_custom;
> @@ -338,6 +339,7 @@ static const struct bench *benchs[] = {
>         &bench_trig_rawtp,
>         &bench_trig_kprobe,
>         &bench_trig_fentry,
> +       &bench_trig_fentry_sleep,

Can you please add results to the commit description for the fentry and
fentry_sleep benchmarks, just for comparison?

>         &bench_trig_fmodret,
>         &bench_rb_libbpf,
>         &bench_rb_custom,

[...]

>
> @@ -28,6 +30,9 @@ int BPF_PROG(test_int_hook, struct vm_area_struct *vma,
>         is_stack = (vma->vm_start <= vma->vm_mm->start_stack &&
>                     vma->vm_end >= vma->vm_mm->start_stack);
>
> +       bpf_copy_from_user(args, sizeof(args), (void *)vma->vm_mm->arg_start);

Is there some way to ensure that the user memory definitely is not paged
in (and to do that from the user-space selftest part), so that
bpf_copy_from_user() *definitely* sleeps? That would be the real test.
As is, even bpf_probe_read_user() might succeed, isn't that right?

It seems like madvise(MADV_DONTNEED) should be able to accomplish that.
So instead of reading arg_start, we could pre-set up an mmap()'ed file,
MADV_DONTNEED it, then trigger the LSM program and let it attempt to
read that chunk of memory?
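
A rough sketch of that user-space setup (illustrative only; the file name,
size and error handling are made up, not taken from the selftests):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void *setup_cold_page(const char *path, size_t len)
{
	void *p = MAP_FAILED;
	int fd;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return NULL;
	if (ftruncate(fd, len))
		goto out;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out;
	memset(p, 'x', len);		/* populate and dirty the page */
	msync(p, len, MS_SYNC);		/* flush the contents to the file */
	madvise(p, len, MADV_DONTNEED);	/* drop it; the next read must fault */
out:
	close(fd);
	return p == MAP_FAILED ? NULL : p;
}

The returned pointer would then be handed to the BPF side (e.g. via a global
variable in the skeleton) and the LSM hook triggered, e.g. by the mprotect()
call the existing test already performs.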


> +       /*bpf_printk("args=%s\n", args);*/

debugging leftover?

> +
>         if (is_stack && monitored_pid == pid) {
>                 mprotect_count++;
>                 ret = -EPERM;
> @@ -36,7 +41,7 @@ int BPF_PROG(test_int_hook, struct vm_area_struct *vma,
>         return ret;
>  }
>
> -SEC("lsm/bprm_committed_creds")
> +SEC("lsm.s/bprm_committed_creds")
>  int BPF_PROG(test_void_hook, struct linux_binprm *bprm)
>  {
>         __u32 pid = bpf_get_current_pid_tgid() >> 32;
> @@ -46,3 +51,8 @@ int BPF_PROG(test_void_hook, struct linux_binprm *bprm)
>
>         return 0;
>  }

nit: empty line here, don't squash those functions together :)


> +SEC("lsm/task_free") /* lsm/ is ok, lsm.s/ fails */
> +int BPF_PROG(test_task_free, struct task_struct *task)
> +{
> +       return 0;
> +}
> diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c
> index 8b36b6640e7e..9a4d09590b3d 100644
> --- a/tools/testing/selftests/bpf/progs/trigger_bench.c
> +++ b/tools/testing/selftests/bpf/progs/trigger_bench.c
> @@ -39,6 +39,13 @@ int bench_trigger_fentry(void *ctx)
>         return 0;
>  }
>
> +SEC("fentry.s/__x64_sys_getpgid")
> +int bench_trigger_fentry_sleep(void *ctx)
> +{
> +       __sync_add_and_fetch(&hits, 1);
> +       return 0;
> +}
> +
>  SEC("fmod_ret/__x64_sys_getpgid")
>  int bench_trigger_fmodret(void *ctx)
>  {
> --
> 2.23.0
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v3 bpf-next 2/4] bpf: Add bpf_copy_from_user() helper.
  2020-06-18 22:33   ` Andrii Nakryiko
@ 2020-06-30  0:28     ` Alexei Starovoitov
  2020-06-30 18:23       ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-06-30  0:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, Paul E. McKenney, Networking,
	bpf, Kernel Team

On Thu, Jun 18, 2020 at 3:33 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> > + *
> > + * int bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
>
> Can we also add bpf_copy_str_from_user (or bpf_copy_from_user_str,
> whichever makes more sense) as well?

Those would have to wait. I think strings need better long term design.
That would be separate patches.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v3 bpf-next 2/4] bpf: Add bpf_copy_from_user() helper.
  2020-06-30  0:28     ` Alexei Starovoitov
@ 2020-06-30 18:23       ` Andrii Nakryiko
  2020-06-30 18:53         ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2020-06-30 18:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Paul E. McKenney, Networking,
	bpf, Kernel Team

On Mon, Jun 29, 2020 at 5:28 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jun 18, 2020 at 3:33 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > > + *
> > > + * int bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
> >
> > Can we also add bpf_copy_str_from_user (or bpf_copy_from_user_str,
> > whichever makes more sense) as well?
>
> Those would have to wait. I think strings need better long term design.
> That would be separate patches.

I agree that it would be nice to have better support for strings long
term, but that's beside the point.

I think bpf_copy_from_user_str() is a must-have right now as a
sleepable counterpart to bpf_probe_read_user_str(), just like
bpf_copy_from_user() is a sleepable variant of bpf_probe_read_user().
Look at progs/strobemeta.h: it uses bpf_probe_read_user_str() to get
user-space zero-terminated strings. It's a well-defined interface and
behavior. Nothing extra is needed beyond a sleepable variant of
bpf_probe_read_user_str() to let Strobemeta reliably fetch data from
user space from inside a sleepable BPF program.
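
To make the shape concrete, a hypothetical description in the style of the
bpf.h helper docs could read as follows (this helper does not exist in this
series; the name, signature and return convention are assumptions mirroring
bpf_probe_read_user_str()):

 * long bpf_copy_from_user_str(void *dst, u32 size, const void *user_ptr)
 *	Description
 *		Sleepable counterpart of bpf_probe_read_user_str(): copy a
 *		NUL-terminated string from user space address *user_ptr* into
 *		*dst*, copying at most *size* bytes and always NUL-terminating
 *		the result, faulting pages in as needed.
 *	Return
 *		On success, the strictly positive length of the copied string,
 *		including the trailing NUL. On error, a negative value.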

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC v3 bpf-next 2/4] bpf: Add bpf_copy_from_user() helper.
  2020-06-30 18:23       ` Andrii Nakryiko
@ 2020-06-30 18:53         ` Alexei Starovoitov
  0 siblings, 0 replies; 15+ messages in thread
From: Alexei Starovoitov @ 2020-06-30 18:53 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S. Miller, Daniel Borkmann, Paul E. McKenney, Networking,
	bpf, Kernel Team

On Tue, Jun 30, 2020 at 11:23:07AM -0700, Andrii Nakryiko wrote:
> On Mon, Jun 29, 2020 at 5:28 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jun 18, 2020 at 3:33 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > > > + *
> > > > + * int bpf_copy_from_user(void *dst, u32 size, const void *user_ptr)
> > >
> > > Can we also add bpf_copy_str_from_user (or bpf_copy_from_user_str,
> > > whichever makes more sense) as well?
> >
> > Those would have to wait. I think strings need better long term design.
> > That would be separate patches.
> 
> I agree that it would be nice to have better support for strings long
> term, but that's beside the point.
> 
> I think bpf_copy_from_user_str() is a must-have right now as a
> sleepable counterpart to bpf_probe_read_user_str(), just like
> bpf_copy_from_user() is a sleepable variant of bpf_probe_read_user().
> Look at progs/strobemeta.h: it uses bpf_probe_read_user_str() to get
> user-space zero-terminated strings. It's a well-defined interface and
> behavior. Nothing extra is needed beyond a sleepable variant of
> bpf_probe_read_user_str() to let Strobemeta reliably fetch data from
> user space from inside a sleepable BPF program.

short answer: maybe.
long answer: I'm not sure that bpf_probe_read_user_str() is such a great interface.
Copy-pasting something just because it exists and is already known is imo
not the best approach.
I believe KP is thinking about string chunking logic to be able
to pass many NUL-terminated strings from a bpf prog to user space.
envvars are one such example: one giant array of strings separated
by zeros. The semantics of bpf_probe_read_user_str() may or may not
be a good fit.
strobemeta is 'cheating' with strings by putting a max bound on all of them.
For tracing applications like strobemeta that's ok, but for lsm it's not.
Hence I'd like to see an api that solves the lsm case along with the
strobemeta case.
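
To illustrate the shape of that problem, here is a rough sleepable-prog
sketch (not part of these patches; the buffer size and iteration bound are
arbitrary) that walks the env area with the helper from this series. The
area is one byte range, and splitting it into NUL-terminated strings is left
entirely to the prog:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

char buf[64];

SEC("lsm.s/bprm_committed_creds")
int BPF_PROG(walk_env, struct linux_binprm *bprm)
{
	unsigned long start = bprm->mm->env_start;
	unsigned long end = bprm->mm->env_end;
	int i;

	/* copy the env area in fixed-size chunks; finding the zero
	 * separators inside each chunk is the prog's problem
	 */
	for (i = 0; i < 16 && start < end; i++) {
		if (bpf_copy_from_user(buf, sizeof(buf), (void *)start))
			break;
		start += sizeof(buf);
	}
	return 0;
}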

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
2020-06-11 22:23 [PATCH RFC v3 bpf-next 0/4] bpf: Introduce minimal support for sleepable progs Alexei Starovoitov
2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 1/4] bpf: Introduce sleepable BPF programs Alexei Starovoitov
2020-06-11 22:29   ` Alexei Starovoitov
2020-06-12  0:04     ` Paul E. McKenney
2020-06-12  2:13       ` Alexei Starovoitov
2020-06-12  3:40         ` Paul E. McKenney
2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 2/4] bpf: Add bpf_copy_from_user() helper Alexei Starovoitov
2020-06-18 22:33   ` Andrii Nakryiko
2020-06-30  0:28     ` Alexei Starovoitov
2020-06-30 18:23       ` Andrii Nakryiko
2020-06-30 18:53         ` Alexei Starovoitov
2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 3/4] libbpf: support sleepable progs Alexei Starovoitov
2020-06-18 22:34   ` Andrii Nakryiko
2020-06-11 22:23 ` [PATCH RFC v3 bpf-next 4/4] selftests/bpf: basic sleepable tests Alexei Starovoitov
2020-06-18 22:43   ` Andrii Nakryiko
