BPF Archive on lore.kernel.org
* [RFC 0/5] bpf: Add trampoline helpers
@ 2019-12-29 14:37 Jiri Olsa
  2019-12-29 14:37 ` [PATCH 1/5] bpf: Allow non struct type for btf ctx access Jiri Olsa
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Jiri Olsa @ 2019-12-29 14:37 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, bpf, Andrii Nakryiko, Yonghong Song, Martin KaFai Lau,
	Jakub Kicinski, David Miller

hi,
I understand trampolines are brand new stuff and you guys
might already be working on this.

However, I was checking on the trampoline probes and could
get some really nice speedups for a few bcc programs.

Here's output of perf bench while running klockstat.py:

    Without:
            $ perf bench sched messaging -l 50000
            ...
                 Total time: 18.571 [sec]

    With current kprobe tracing:
            $ perf bench sched messaging -l 50000
            ...
                 Total time: 183.395 [sec]

    With kfunc tracing:
            $ perf bench sched messaging -l 50000
            ...
                 Total time: 39.773 [sec]


I needed to add a few perf_event_output and stack retrieval
helpers, plus trampoline lookup during ORC unwinding.

It's also available in here:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git bpf/kfunc


Apart from these helpers, several other patches (like perf
and ftrace ring buffer renames) are needed to make it all
work; they are pushed here:
  git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git bpf/kfunc_all

You can check the current bcc changes here:
  https://github.com/olsajiri/bcc/tree/kfunc

thanks,
jirka


---
Jiri Olsa (5):
      bpf: Allow non struct type for btf ctx access
      bpf: Add bpf_perf_event_output_kfunc
      bpf: Add bpf_get_stackid_kfunc
      bpf: Add bpf_get_stack_kfunc
      bpf: Allow to resolve bpf trampoline in unwind

 include/linux/bpf.h      |   6 ++++++
 kernel/bpf/btf.c         |   6 ------
 kernel/bpf/core.c        |   2 ++
 kernel/bpf/trampoline.c  |  35 +++++++++++++++++++++++++++++++++++
 kernel/trace/bpf_trace.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 166 insertions(+), 6 deletions(-)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/5] bpf: Allow non struct type for btf ctx access
  2019-12-29 14:37 [RFC 0/5] bpf: Add trampoline helpers Jiri Olsa
@ 2019-12-29 14:37 ` Jiri Olsa
  2020-01-06 21:36   ` Yonghong Song
  2019-12-29 14:37 ` [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc Jiri Olsa
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2019-12-29 14:37 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, bpf, Andrii Nakryiko, Yonghong Song, Martin KaFai Lau,
	Jakub Kicinski, David Miller

I'm not sure why the restriction was added,
but I can't access pointers to POD types like
const char * when probing the vfs_read function.

Remove the check and allow non-struct type
access in the context.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/bpf/btf.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ed2075884724..ae90f60ac1b8 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3712,12 +3712,6 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 	/* skip modifiers */
 	while (btf_type_is_modifier(t))
 		t = btf_type_by_id(btf, t->type);
-	if (!btf_type_is_struct(t)) {
-		bpf_log(log,
-			"func '%s' arg%d type %s is not a struct\n",
-			tname, arg, btf_kind_str[BTF_INFO_KIND(t->info)]);
-		return false;
-	}
 	bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n",
 		tname, arg, info->btf_id, btf_kind_str[BTF_INFO_KIND(t->info)],
 		__btf_name_by_offset(btf, t->name_off));
-- 
2.21.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc
  2019-12-29 14:37 [RFC 0/5] bpf: Add trampoline helpers Jiri Olsa
  2019-12-29 14:37 ` [PATCH 1/5] bpf: Allow non struct type for btf ctx access Jiri Olsa
@ 2019-12-29 14:37 ` Jiri Olsa
  2020-01-06 23:27   ` Alexei Starovoitov
  2019-12-29 14:37 ` [PATCH 3/5] bpf: Add bpf_get_stackid_kfunc Jiri Olsa
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2019-12-29 14:37 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, bpf, Andrii Nakryiko, Yonghong Song, Martin KaFai Lau,
	Jakub Kicinski, David Miller

Adding support to use perf_event_output in
BPF_TRACE_FENTRY/BPF_TRACE_FEXIT programs.

There are no pt_regs available in the trampoline,
so we get one via the per-cpu bpf_kfunc_regs array.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/trace/bpf_trace.c | 67 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e5ef4ae9edb5..1b270bbd9016 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1151,6 +1151,69 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	}
 }
 
+struct bpf_kfunc_regs {
+	struct pt_regs regs[3];
+};
+
+static DEFINE_PER_CPU(struct bpf_kfunc_regs, bpf_kfunc_regs);
+static DEFINE_PER_CPU(int, bpf_kfunc_nest_level);
+
+static struct pt_regs *get_bpf_kfunc_regs(void)
+{
+	struct bpf_kfunc_regs *tp_regs = this_cpu_ptr(&bpf_kfunc_regs);
+	int nest_level = this_cpu_inc_return(bpf_kfunc_nest_level);
+
+	if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(tp_regs->regs))) {
+		this_cpu_dec(bpf_kfunc_nest_level);
+		return ERR_PTR(-EBUSY);
+	}
+
+	return &tp_regs->regs[nest_level - 1];
+}
+
+static void put_bpf_kfunc_regs(void)
+{
+	this_cpu_dec(bpf_kfunc_nest_level);
+}
+
+BPF_CALL_5(bpf_perf_event_output_kfunc, void *, ctx, struct bpf_map *, map,
+	   u64, flags, void *, data, u64, size)
+{
+	struct pt_regs *regs = get_bpf_kfunc_regs();
+	int ret;
+
+	if (IS_ERR(regs))
+		return PTR_ERR(regs);
+
+	perf_fetch_caller_regs(regs);
+	ret = ____bpf_perf_event_output(regs, map, flags, data, size);
+
+	put_bpf_kfunc_regs();
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_perf_event_output_proto_kfunc = {
+	.func		= bpf_perf_event_output_kfunc,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
+};
+
+static const struct bpf_func_proto *
+kfunc_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_perf_event_output:
+		return &bpf_perf_event_output_proto_kfunc;
+	default:
+		return tracing_func_proto(func_id, prog);
+	}
+}
+
 static const struct bpf_func_proto *
 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1160,6 +1223,10 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_skb_output_proto;
 #endif
 	default:
+		if (prog->expected_attach_type == BPF_TRACE_FENTRY ||
+		    prog->expected_attach_type == BPF_TRACE_FEXIT)
+			return kfunc_prog_func_proto(func_id, prog);
+
 		return raw_tp_prog_func_proto(func_id, prog);
 	}
 }
-- 
2.21.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 3/5] bpf: Add bpf_get_stackid_kfunc
  2019-12-29 14:37 [RFC 0/5] bpf: Add trampoline helpers Jiri Olsa
  2019-12-29 14:37 ` [PATCH 1/5] bpf: Allow non struct type for btf ctx access Jiri Olsa
  2019-12-29 14:37 ` [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc Jiri Olsa
@ 2019-12-29 14:37 ` Jiri Olsa
  2019-12-29 14:37 ` [PATCH 4/5] bpf: Add bpf_get_stack_kfunc Jiri Olsa
  2019-12-29 14:37 ` [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind Jiri Olsa
  4 siblings, 0 replies; 26+ messages in thread
From: Jiri Olsa @ 2019-12-29 14:37 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, bpf, Andrii Nakryiko, Yonghong Song, Martin KaFai Lau,
	Jakub Kicinski, David Miller

Adding support to use the bpf_get_stackid helper in
BPF_TRACE_FENTRY/BPF_TRACE_FEXIT programs.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/trace/bpf_trace.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 1b270bbd9016..c8e0709704f5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1203,12 +1203,40 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_kfunc = {
 	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
 };
 
+BPF_CALL_3(bpf_get_stackid_kfunc, void*, args,
+	   struct bpf_map *, map, u64, flags)
+{
+	struct pt_regs *regs = get_bpf_kfunc_regs();
+	int ret;
+
+	if (IS_ERR(regs))
+		return PTR_ERR(regs);
+
+	perf_fetch_caller_regs(regs);
+	/* similar to bpf_perf_event_output_tp, but pt_regs fetched differently */
+	ret = bpf_get_stackid((unsigned long) regs, (unsigned long) map,
+			      flags, 0, 0);
+	put_bpf_kfunc_regs();
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_get_stackid_proto_kfunc = {
+	.func		= bpf_get_stackid_kfunc,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 kfunc_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
 		return &bpf_perf_event_output_proto_kfunc;
+	case BPF_FUNC_get_stackid:
+		return &bpf_get_stackid_proto_kfunc;
 	default:
 		return tracing_func_proto(func_id, prog);
 	}
-- 
2.21.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 4/5] bpf: Add bpf_get_stack_kfunc
  2019-12-29 14:37 [RFC 0/5] bpf: Add trampoline helpers Jiri Olsa
                   ` (2 preceding siblings ...)
  2019-12-29 14:37 ` [PATCH 3/5] bpf: Add bpf_get_stackid_kfunc Jiri Olsa
@ 2019-12-29 14:37 ` Jiri Olsa
  2019-12-29 14:37 ` [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind Jiri Olsa
  4 siblings, 0 replies; 26+ messages in thread
From: Jiri Olsa @ 2019-12-29 14:37 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, bpf, Andrii Nakryiko, Yonghong Song, Martin KaFai Lau,
	Jakub Kicinski, David Miller

Adding support to use the bpf_get_stack helper in
BPF_TRACE_FENTRY/BPF_TRACE_FEXIT programs.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/trace/bpf_trace.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index c8e0709704f5..02979c5d6357 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1229,6 +1229,32 @@ static const struct bpf_func_proto bpf_get_stackid_proto_kfunc = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_get_stack_kfunc, void*, args,
+	   void *, buf, u32, size, u64, flags)
+{
+	struct pt_regs *regs = get_bpf_kfunc_regs();
+	int ret;
+
+	if (IS_ERR(regs))
+		return PTR_ERR(regs);
+
+	perf_fetch_caller_regs(regs);
+	ret = bpf_get_stack((unsigned long) regs, (unsigned long) buf,
+			    (unsigned long) size, flags, 0);
+	put_bpf_kfunc_regs();
+	return ret;
+}
+
+static const struct bpf_func_proto bpf_get_stack_proto_kfunc = {
+	.func		= bpf_get_stack_kfunc,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 kfunc_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1237,6 +1263,8 @@ kfunc_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_perf_event_output_proto_kfunc;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto_kfunc;
+	case BPF_FUNC_get_stack:
+		return &bpf_get_stack_proto_kfunc;
 	default:
 		return tracing_func_proto(func_id, prog);
 	}
-- 
2.21.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2019-12-29 14:37 [RFC 0/5] bpf: Add trampoline helpers Jiri Olsa
                   ` (3 preceding siblings ...)
  2019-12-29 14:37 ` [PATCH 4/5] bpf: Add bpf_get_stack_kfunc Jiri Olsa
@ 2019-12-29 14:37 ` Jiri Olsa
  2020-01-06 23:46   ` Alexei Starovoitov
  4 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2019-12-29 14:37 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, bpf, Andrii Nakryiko, Yonghong Song, Martin KaFai Lau,
	Jakub Kicinski, David Miller

When unwinding the stack we need to identify each
address to successfully continue. Add a latch tree
keeping the trampolines for quick lookup during
the unwind.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 include/linux/bpf.h     |  6 ++++++
 kernel/bpf/core.c       |  2 ++
 kernel/bpf/trampoline.c | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b14e51d56a82..66825c821ac9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -470,6 +470,7 @@ struct bpf_trampoline {
 	/* Executable image of trampoline */
 	void *image;
 	u64 selector;
+	struct latch_tree_node tnode;
 };
 
 #define BPF_DISPATCHER_MAX 48 /* Fits in 2048B */
@@ -502,6 +503,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
+bool is_bpf_trampoline(void *addr);
 void *bpf_jit_alloc_exec_page(void);
 #define BPF_DISPATCHER_INIT(name) {			\
 	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
@@ -555,6 +557,10 @@ static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
 static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
 					      struct bpf_prog *from,
 					      struct bpf_prog *to) {}
+static inline bool is_bpf_trampoline(void *addr)
+{
+	return false;
+}
 #endif
 
 struct bpf_func_info_aux {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 29d47aae0dd1..63a515b5aa7b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -704,6 +704,8 @@ bool is_bpf_text_address(unsigned long addr)
 
 	rcu_read_lock();
 	ret = bpf_prog_kallsyms_find(addr) != NULL;
+	if (!ret)
+		ret = is_bpf_trampoline((void*) addr);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 505f4e4b31d2..4b5f0d0b0072 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -4,16 +4,44 @@
 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/ftrace.h>
+#include <linux/rbtree_latch.h>
 
 /* btf_vmlinux has ~22k attachable functions. 1k htab is enough. */
 #define TRAMPOLINE_HASH_BITS 10
 #define TRAMPOLINE_TABLE_SIZE (1 << TRAMPOLINE_HASH_BITS)
 
 static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
+static struct latch_tree_root tree __cacheline_aligned;
 
 /* serializes access to trampoline_table */
 static DEFINE_MUTEX(trampoline_mutex);
 
+static __always_inline bool tree_less(struct latch_tree_node *a,
+				      struct latch_tree_node *b)
+{
+	struct bpf_trampoline *ta = container_of(a, struct bpf_trampoline, tnode);
+	struct bpf_trampoline *tb = container_of(b, struct bpf_trampoline, tnode);
+
+	return ta->image < tb->image;
+}
+
+static __always_inline int tree_comp(void *addr, struct latch_tree_node *n)
+{
+	struct bpf_trampoline *tr = container_of(n, struct bpf_trampoline, tnode);
+
+	if (addr < tr->image)
+		return -1;
+	if (addr >= tr->image + PAGE_SIZE)
+		return  1;
+
+	return 0;
+}
+
+static const struct latch_tree_ops tree_ops = {
+	.less	= tree_less,
+	.comp	= tree_comp,
+};
+
 void *bpf_jit_alloc_exec_page(void)
 {
 	void *image;
@@ -30,6 +58,11 @@ void *bpf_jit_alloc_exec_page(void)
 	return image;
 }
 
+bool is_bpf_trampoline(void *addr)
+{
+	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
+}
+
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
 	struct bpf_trampoline *tr;
@@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 	for (i = 0; i < BPF_TRAMP_MAX; i++)
 		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
 	tr->image = image;
+	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
 out:
 	mutex_unlock(&trampoline_mutex);
 	return tr;
@@ -252,6 +286,7 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
 		goto out;
 	bpf_jit_free_exec(tr->image);
 	hlist_del(&tr->hlist);
+	latch_tree_erase(&tr->tnode, &tree, &tree_ops);
 	kfree(tr);
 out:
 	mutex_unlock(&trampoline_mutex);
-- 
2.21.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/5] bpf: Allow non struct type for btf ctx access
  2019-12-29 14:37 ` [PATCH 1/5] bpf: Allow non struct type for btf ctx access Jiri Olsa
@ 2020-01-06 21:36   ` Yonghong Song
  2020-01-07 12:13     ` Jiri Olsa
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2020-01-06 21:36 UTC (permalink / raw)
  To: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann
  Cc: netdev, bpf, Andrii Nakryiko, Martin Lau, Jakub Kicinski, David Miller



On 12/29/19 6:37 AM, Jiri Olsa wrote:
> I'm not sure why the restriction was added,
> but I can't access pointers to POD types like
> const char * when probing vfs_read function.
> 
> Removing the check and allow non struct type
> access in context.
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>   kernel/bpf/btf.c | 6 ------
>   1 file changed, 6 deletions(-)
> 
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index ed2075884724..ae90f60ac1b8 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3712,12 +3712,6 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>   	/* skip modifiers */
>   	while (btf_type_is_modifier(t))
>   		t = btf_type_by_id(btf, t->type);
> -	if (!btf_type_is_struct(t)) {
> -		bpf_log(log,
> -			"func '%s' arg%d type %s is not a struct\n",
> -			tname, arg, btf_kind_str[BTF_INFO_KIND(t->info)]);
> -		return false;
> -	}

Hi, Jiri, the RFC looks great! Especially as you also showed this
will give a great performance boost for bcc scripts.

Could you provide more context on why the above change is needed?
The function btf_ctx_access is used to check validity of accessing
function parameters which are wrapped inside a structure, I am wondering
what kinds of accesses you tried to address here.

>   	bpf_log(log, "func '%s' arg%d has btf_id %d type %s '%s'\n",
>   		tname, arg, info->btf_id, btf_kind_str[BTF_INFO_KIND(t->info)],
>   		__btf_name_by_offset(btf, t->name_off));
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc
  2019-12-29 14:37 ` [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc Jiri Olsa
@ 2020-01-06 23:27   ` Alexei Starovoitov
  2020-01-07 12:25     ` Jiri Olsa
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2020-01-06 23:27 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Yonghong Song, Martin KaFai Lau, Jakub Kicinski,
	David Miller

On Sun, Dec 29, 2019 at 03:37:37PM +0100, Jiri Olsa wrote:
> Adding support to use perf_event_output in
> BPF_TRACE_FENTRY/BPF_TRACE_FEXIT programs.
> 
> There are no pt_regs available in the trampoline,
> so getting one via bpf_kfunc_regs array.
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  kernel/trace/bpf_trace.c | 67 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 67 insertions(+)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index e5ef4ae9edb5..1b270bbd9016 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1151,6 +1151,69 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  	}
>  }
>  
> +struct bpf_kfunc_regs {
> +	struct pt_regs regs[3];
> +};
> +
> +static DEFINE_PER_CPU(struct bpf_kfunc_regs, bpf_kfunc_regs);
> +static DEFINE_PER_CPU(int, bpf_kfunc_nest_level);

Thanks a bunch for working on it.

I don't understand why new regs array and nest level is needed.
Can raw_tp_prog_func_proto() be reused as-is?
Instead of patches 2,3,4 ?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2019-12-29 14:37 ` [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind Jiri Olsa
@ 2020-01-06 23:46   ` Alexei Starovoitov
  2020-01-07  8:30     ` Daniel Borkmann
  2020-01-07 13:05     ` Jiri Olsa
  0 siblings, 2 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2020-01-06 23:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Yonghong Song, Martin KaFai Lau, Jakub Kicinski,
	David Miller, bjorn.topel

On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
> When unwinding the stack we need to identify each
> address to successfully continue. Adding latch tree
> to keep trampolines for quick lookup during the
> unwind.
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
...
> +bool is_bpf_trampoline(void *addr)
> +{
> +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
> +}
> +
>  struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>  {
>  	struct bpf_trampoline *tr;
> @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>  	for (i = 0; i < BPF_TRAMP_MAX; i++)
>  		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
>  	tr->image = image;
> +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);

Thanks for the fix. I was thinking to apply it, but then realized that bpf
dispatcher logic has the same issue.
Could you generalize the fix for both?
May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
Wdyt?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2020-01-06 23:46   ` Alexei Starovoitov
@ 2020-01-07  8:30     ` Daniel Borkmann
  2020-01-07 13:15       ` Jiri Olsa
  2020-01-07 13:05     ` Jiri Olsa
  1 sibling, 1 reply; 26+ messages in thread
From: Daniel Borkmann @ 2020-01-07  8:30 UTC (permalink / raw)
  To: Alexei Starovoitov, Jiri Olsa
  Cc: Alexei Starovoitov, netdev, bpf, Andrii Nakryiko, Yonghong Song,
	Martin KaFai Lau, Jakub Kicinski, David Miller, bjorn.topel

On 1/7/20 12:46 AM, Alexei Starovoitov wrote:
> On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
>> When unwinding the stack we need to identify each
>> address to successfully continue. Adding latch tree
>> to keep trampolines for quick lookup during the
>> unwind.
>>
>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ...
>> +bool is_bpf_trampoline(void *addr)
>> +{
>> +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
>> +}
>> +
>>   struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>   {
>>   	struct bpf_trampoline *tr;
>> @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>   	for (i = 0; i < BPF_TRAMP_MAX; i++)
>>   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
>>   	tr->image = image;
>> +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
> 
> Thanks for the fix. I was thinking to apply it, but then realized that bpf
> dispatcher logic has the same issue.
> Could you generalize the fix for both?
> May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
> and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
> Wdyt?

Also this patch is buggy since your latch lookup happens under RCU, but
I don't see anything that waits a grace period once you remove from the
tree. Instead you free the trampoline right away.

On a different question, given we have all the kallsym infrastructure
for BPF already in place, did you look into whether it's feasible to
make it a bit more generic to also cover JITed buffers from trampolines?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/5] bpf: Allow non struct type for btf ctx access
  2020-01-06 21:36   ` Yonghong Song
@ 2020-01-07 12:13     ` Jiri Olsa
  2020-01-07 15:50       ` Jiri Olsa
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2020-01-07 12:13 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Martin Lau, Jakub Kicinski, David Miller

On Mon, Jan 06, 2020 at 09:36:17PM +0000, Yonghong Song wrote:
> 
> 
> On 12/29/19 6:37 AM, Jiri Olsa wrote:
> > I'm not sure why the restriction was added,
> > but I can't access pointers to POD types like
> > const char * when probing vfs_read function.
> > 
> > Removing the check and allow non struct type
> > access in context.
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >   kernel/bpf/btf.c | 6 ------
> >   1 file changed, 6 deletions(-)
> > 
> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index ed2075884724..ae90f60ac1b8 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -3712,12 +3712,6 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >   	/* skip modifiers */
> >   	while (btf_type_is_modifier(t))
> >   		t = btf_type_by_id(btf, t->type);
> > -	if (!btf_type_is_struct(t)) {
> > -		bpf_log(log,
> > -			"func '%s' arg%d type %s is not a struct\n",
> > -			tname, arg, btf_kind_str[BTF_INFO_KIND(t->info)]);
> > -		return false;
> > -	}
> 
> Hi, Jiri, the RFC looks great! Especially, you also referenced this will
> give great performance boost for bcc scripts.
> 
> Could you provide more context on why the above change is needed?
> The function btf_ctx_access is used to check validity of accessing
> function parameters which are wrapped inside a structure, I am wondering
> what kinds of accesses you tried to address here.

when I was transforming opensnoop.py to use this, I got a failure
there when I tried to access the filename arg in do_sys_open

but actually it seems this should get recognized earlier by:

          if (btf_type_is_int(t))
                /* accessing a scalar */
                return true;

I'm not sure why it did not pass for const char*, I'll check

jirka


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc
  2020-01-06 23:27   ` Alexei Starovoitov
@ 2020-01-07 12:25     ` Jiri Olsa
  2020-01-07 22:13       ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2020-01-07 12:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Yonghong Song, Martin KaFai Lau, Jakub Kicinski,
	David Miller

On Mon, Jan 06, 2020 at 03:27:21PM -0800, Alexei Starovoitov wrote:
> On Sun, Dec 29, 2019 at 03:37:37PM +0100, Jiri Olsa wrote:
> > Adding support to use perf_event_output in
> > BPF_TRACE_FENTRY/BPF_TRACE_FEXIT programs.
> > 
> > There are no pt_regs available in the trampoline,
> > so getting one via bpf_kfunc_regs array.
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  kernel/trace/bpf_trace.c | 67 ++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 67 insertions(+)
> > 
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index e5ef4ae9edb5..1b270bbd9016 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -1151,6 +1151,69 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >  	}
> >  }
> >  
> > +struct bpf_kfunc_regs {
> > +	struct pt_regs regs[3];
> > +};
> > +
> > +static DEFINE_PER_CPU(struct bpf_kfunc_regs, bpf_kfunc_regs);
> > +static DEFINE_PER_CPU(int, bpf_kfunc_nest_level);
> 
> Thanks a bunch for working on it.
> 
> I don't understand why new regs array and nest level is needed.
> Can raw_tp_prog_func_proto() be reused as-is?
> Instead of patches 2,3,4 ?

I thought that we might want to trace functions within the
raw tracepoint call, which would be prevented if we used
the same nest variable

now I'm not sure whether there's some other issue with nesting
bpf programs like that.. I'll need to check

jirka


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2020-01-06 23:46   ` Alexei Starovoitov
  2020-01-07  8:30     ` Daniel Borkmann
@ 2020-01-07 13:05     ` Jiri Olsa
  2020-01-07 13:30       ` Björn Töpel
  1 sibling, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2020-01-07 13:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Yonghong Song, Martin KaFai Lau, Jakub Kicinski,
	David Miller, bjorn.topel

On Mon, Jan 06, 2020 at 03:46:40PM -0800, Alexei Starovoitov wrote:
> On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
> > When unwinding the stack we need to identify each
> > address to successfully continue. Adding latch tree
> > to keep trampolines for quick lookup during the
> > unwind.
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ...
> > +bool is_bpf_trampoline(void *addr)
> > +{
> > +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
> > +}
> > +
> >  struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> >  {
> >  	struct bpf_trampoline *tr;
> > @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> >  	for (i = 0; i < BPF_TRAMP_MAX; i++)
> >  		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
> >  	tr->image = image;
> > +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
> 
> Thanks for the fix. I was thinking to apply it, but then realized that bpf
> dispatcher logic has the same issue.
> Could you generalize the fix for both?
> May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
> and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
> Wdyt?

I need to check the dispatcher code, but seems ok.. will check

jirka


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2020-01-07  8:30     ` Daniel Borkmann
@ 2020-01-07 13:15       ` Jiri Olsa
  2020-01-07 19:30         ` Daniel Borkmann
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2020-01-07 13:15 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov, netdev, bpf,
	Andrii Nakryiko, Yonghong Song, Martin KaFai Lau, Jakub Kicinski,
	David Miller, bjorn.topel

On Tue, Jan 07, 2020 at 09:30:12AM +0100, Daniel Borkmann wrote:
> On 1/7/20 12:46 AM, Alexei Starovoitov wrote:
> > On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
> > > When unwinding the stack we need to identify each
> > > address to successfully continue. Adding latch tree
> > > to keep trampolines for quick lookup during the
> > > unwind.
> > > 
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ...
> > > +bool is_bpf_trampoline(void *addr)
> > > +{
> > > +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
> > > +}
> > > +
> > >   struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > >   {
> > >   	struct bpf_trampoline *tr;
> > > @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > >   	for (i = 0; i < BPF_TRAMP_MAX; i++)
> > >   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
> > >   	tr->image = image;
> > > +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
> > 
> > Thanks for the fix. I was thinking to apply it, but then realized that bpf
> > dispatcher logic has the same issue.
> > Could you generalize the fix for both?
> > May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
> > and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
> > Wdyt?
> 
> Also this patch is buggy since your latch lookup happens under RCU, but
> I don't see anything that waits a grace period once you remove from the
> tree. Instead you free the trampoline right away.

thanks, did not think of that.. will (try to) fix ;-)

> 
> On a different question, given we have all the kallsym infrastructure
> for BPF already in place, did you look into whether it's feasible to
> make it a bit more generic to also cover JITed buffers from trampolines?
> 

hum, it did not occur to me that we want to see it in kallsyms,
but sure.. how about: bpf_trampoline_<key> ?

the key would be taken from bpf_trampoline::key, the function's BTF id

jirka



* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2020-01-07 13:05     ` Jiri Olsa
@ 2020-01-07 13:30       ` Björn Töpel
  2020-01-13  9:43         ` Jiri Olsa
  0 siblings, 1 reply; 26+ messages in thread
From: Björn Töpel @ 2020-01-07 13:30 UTC (permalink / raw)
  To: Jiri Olsa, Alexei Starovoitov
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Yonghong Song, Martin KaFai Lau, Jakub Kicinski,
	David Miller

On 2020-01-07 14:05, Jiri Olsa wrote:
> On Mon, Jan 06, 2020 at 03:46:40PM -0800, Alexei Starovoitov wrote:
>> On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
>>> When unwinding the stack we need to identify each
>>> address to successfully continue. Adding latch tree
>>> to keep trampolines for quick lookup during the
>>> unwind.
>>>
>>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>> ...
>>> +bool is_bpf_trampoline(void *addr)
>>> +{
>>> +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
>>> +}
>>> +
>>>   struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>>   {
>>>   	struct bpf_trampoline *tr;
>>> @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>>   	for (i = 0; i < BPF_TRAMP_MAX; i++)
>>>   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
>>>   	tr->image = image;
>>> +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
>>
>> Thanks for the fix. I was thinking to apply it, but then realized that bpf
>> dispatcher logic has the same issue.
>> Could you generalize the fix for both?
>> May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
>> and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
>> Wdyt?
> 
> I need to check the dispatcher code, but seems ok.. will check
>

Thanks Jiri! The trampoline and dispatcher share the image allocation, 
so putting it there would make sense.

It's annoying that the dispatcher doesn't show up correctly in perf, and 
it's been on my list to fix that. Hopefully you beat me to it! :-D


Björn


* Re: [PATCH 1/5] bpf: Allow non struct type for btf ctx access
  2020-01-07 12:13     ` Jiri Olsa
@ 2020-01-07 15:50       ` Jiri Olsa
  2020-01-07 17:55         ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2020-01-07 15:50 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Martin Lau, Jakub Kicinski, David Miller

On Tue, Jan 07, 2020 at 01:13:23PM +0100, Jiri Olsa wrote:
> On Mon, Jan 06, 2020 at 09:36:17PM +0000, Yonghong Song wrote:
> > 
> > 
> > On 12/29/19 6:37 AM, Jiri Olsa wrote:
> > > I'm not sure why the restriction was added,
> > > but I can't access pointers to POD types like
> > > const char * when probing vfs_read function.
> > > 
> > > Removing the check and allow non struct type
> > > access in context.
> > > 
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >   kernel/bpf/btf.c | 6 ------
> > >   1 file changed, 6 deletions(-)
> > > 
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index ed2075884724..ae90f60ac1b8 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -3712,12 +3712,6 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > >   	/* skip modifiers */
> > >   	while (btf_type_is_modifier(t))
> > >   		t = btf_type_by_id(btf, t->type);
> > > -	if (!btf_type_is_struct(t)) {
> > > -		bpf_log(log,
> > > -			"func '%s' arg%d type %s is not a struct\n",
> > > -			tname, arg, btf_kind_str[BTF_INFO_KIND(t->info)]);
> > > -		return false;
> > > -	}
> > 
> > Hi, Jiri, the RFC looks great! Especially, you also referenced this will
> > give great performance boost for bcc scripts.
> > 
> > Could you provide more context on why the above change is needed?
> > The function btf_ctx_access is used to check validity of accessing
> > function parameters which are wrapped inside a structure, I am wondering
> > what kinds of accesses you tried to address here.
> 
> when I was transforming opensnoop.py to use this I got fail in
> there when I tried to access filename arg in do_sys_open
> 
> but actualy it seems this should get recognized earlier by:
> 
>           if (btf_type_is_int(t))
>                 /* accessing a scalar */
>                 return true;
> 
> I'm not sure why it did not pass for const char*, I'll check

it seems we don't check for pointer to scalar (just void),
which is the case in my example 'const char *filename'

I'll post this in v2 with other changes

jirka


---
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index ed2075884724..650df4ed346e 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3633,7 +3633,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		    const struct bpf_prog *prog,
 		    struct bpf_insn_access_aux *info)
 {
-	const struct btf_type *t = prog->aux->attach_func_proto;
+	const struct btf_type *tp, *t = prog->aux->attach_func_proto;
 	struct bpf_prog *tgt_prog = prog->aux->linked_prog;
 	struct btf *btf = bpf_prog_get_target_btf(prog);
 	const char *tname = prog->aux->attach_func_name;
@@ -3695,6 +3695,17 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		 */
 		return true;
 
+	tp = btf_type_by_id(btf, t->type);
+	/* skip modifiers */
+	while (btf_type_is_modifier(tp))
+		tp = btf_type_by_id(btf, tp->type);
+
+	if (btf_type_is_int(tp))
+	/* This is a pointer to a scalar.
+	 * From the verifier's safety pov it is
+	 * the same as a scalar.
+	 */
+		return true;
+
 	/* this is a pointer to another type */
 	info->reg_type = PTR_TO_BTF_ID;
 	info->btf_id = t->type;



* Re: [PATCH 1/5] bpf: Allow non struct type for btf ctx access
  2020-01-07 15:50       ` Jiri Olsa
@ 2020-01-07 17:55         ` Yonghong Song
  2020-01-07 18:28           ` Yonghong Song
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2020-01-07 17:55 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Martin Lau, Jakub Kicinski, David Miller



On 1/7/20 7:50 AM, Jiri Olsa wrote:
> On Tue, Jan 07, 2020 at 01:13:23PM +0100, Jiri Olsa wrote:
>> On Mon, Jan 06, 2020 at 09:36:17PM +0000, Yonghong Song wrote:
>>>
>>>
>>> On 12/29/19 6:37 AM, Jiri Olsa wrote:
>>>> I'm not sure why the restriction was added,
>>>> but I can't access pointers to POD types like
>>>> const char * when probing vfs_read function.
>>>>
>>>> Removing the check and allow non struct type
>>>> access in context.
>>>>
>>>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>>>> ---
>>>>    kernel/bpf/btf.c | 6 ------
>>>>    1 file changed, 6 deletions(-)
>>>>
>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>> index ed2075884724..ae90f60ac1b8 100644
>>>> --- a/kernel/bpf/btf.c
>>>> +++ b/kernel/bpf/btf.c
>>>> @@ -3712,12 +3712,6 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>>>>    	/* skip modifiers */
>>>>    	while (btf_type_is_modifier(t))
>>>>    		t = btf_type_by_id(btf, t->type);
>>>> -	if (!btf_type_is_struct(t)) {
>>>> -		bpf_log(log,
>>>> -			"func '%s' arg%d type %s is not a struct\n",
>>>> -			tname, arg, btf_kind_str[BTF_INFO_KIND(t->info)]);
>>>> -		return false;
>>>> -	}
>>>
>>> Hi, Jiri, the RFC looks great! Especially, you also referenced this will
>>> give great performance boost for bcc scripts.
>>>
>>> Could you provide more context on why the above change is needed?
>>> The function btf_ctx_access is used to check validity of accessing
>>> function parameters which are wrapped inside a structure, I am wondering
>>> what kinds of accesses you tried to address here.
>>
>> when I was transforming opensnoop.py to use this I got fail in
>> there when I tried to access filename arg in do_sys_open
>>
>> but actualy it seems this should get recognized earlier by:
>>
>>            if (btf_type_is_int(t))
>>                  /* accessing a scalar */
>>                  return true;
>>
>> I'm not sure why it did not pass for const char*, I'll check
> 
> it seems we don't check for pointer to scalar (just void),
> which is the case in my example 'const char *filename'

Thanks for clarification. See some comments below.

> 
> I'll post this in v2 with other changes
> 
> jirka
> 
> 
> ---
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index ed2075884724..650df4ed346e 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3633,7 +3633,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>   		    const struct bpf_prog *prog,
>   		    struct bpf_insn_access_aux *info)
>   {
> -	const struct btf_type *t = prog->aux->attach_func_proto;
> +	const struct btf_type *tp, *t = prog->aux->attach_func_proto;
>   	struct bpf_prog *tgt_prog = prog->aux->linked_prog;
>   	struct btf *btf = bpf_prog_get_target_btf(prog);
>   	const char *tname = prog->aux->attach_func_name;
> @@ -3695,6 +3695,17 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>   		 */
>   		return true;
>   
> +	tp = btf_type_by_id(btf, t->type);
> +	/* skip modifiers */
> +	while (btf_type_is_modifier(tp))
> +		tp = btf_type_by_id(btf, tp->type);
> +
> +	if (btf_type_is_int(tp))
> +		/* This is a pointer scalar.
> +		 * It is the same as scalar from the verifier safety pov.
> +		 */
> +		return true;

This should work since:
    - the int pointer will be treated as a scalar later on
    - bpf_probe_read() will be used to read the contents

I am wondering whether we should add proper verifier support
to allow pointer to int ctx access. There, users do not need
to use bpf_probe_read() to dereference the pointer.

Discussed with Martin, maybe somewhere in check_ptr_to_btf_access(),
before btf_struct_access(), checking if it is a pointer to int/enum,
it should just allow and return SCALAR_VALUE.

If you do verifier changes, please ensure bpf_probe_read() is not
needed any more. In bcc, you need a hack to prevent the rewriter
from re-introducing bpf_probe_read() :-).

> +
>   	/* this is a pointer to another type */
>   	info->reg_type = PTR_TO_BTF_ID;
>   	info->btf_id = t->type;
> 


* Re: Re: [PATCH 1/5] bpf: Allow non struct type for btf ctx access
  2020-01-07 17:55         ` Yonghong Song
@ 2020-01-07 18:28           ` Yonghong Song
  2020-01-08 14:38             ` Jiri Olsa
  0 siblings, 1 reply; 26+ messages in thread
From: Yonghong Song @ 2020-01-07 18:28 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Martin Lau, Jakub Kicinski, David Miller



On 1/7/20 9:55 AM, Yonghong Song wrote:
> 
> 
> On 1/7/20 7:50 AM, Jiri Olsa wrote:
>> On Tue, Jan 07, 2020 at 01:13:23PM +0100, Jiri Olsa wrote:
>>> On Mon, Jan 06, 2020 at 09:36:17PM +0000, Yonghong Song wrote:
>>>>
>>>>
>>>> On 12/29/19 6:37 AM, Jiri Olsa wrote:
>>>>> I'm not sure why the restriction was added,
>>>>> but I can't access pointers to POD types like
>>>>> const char * when probing vfs_read function.
>>>>>
>>>>> Removing the check and allow non struct type
>>>>> access in context.
>>>>>
>>>>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>>>>> ---
>>>>>     kernel/bpf/btf.c | 6 ------
>>>>>     1 file changed, 6 deletions(-)
>>>>>
>>>>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>>>>> index ed2075884724..ae90f60ac1b8 100644
>>>>> --- a/kernel/bpf/btf.c
>>>>> +++ b/kernel/bpf/btf.c
>>>>> @@ -3712,12 +3712,6 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>>>>>     	/* skip modifiers */
>>>>>     	while (btf_type_is_modifier(t))
>>>>>     		t = btf_type_by_id(btf, t->type);
>>>>> -	if (!btf_type_is_struct(t)) {
>>>>> -		bpf_log(log,
>>>>> -			"func '%s' arg%d type %s is not a struct\n",
>>>>> -			tname, arg, btf_kind_str[BTF_INFO_KIND(t->info)]);
>>>>> -		return false;
>>>>> -	}
>>>>
>>>> Hi, Jiri, the RFC looks great! Especially, you also referenced this will
>>>> give great performance boost for bcc scripts.
>>>>
>>>> Could you provide more context on why the above change is needed?
>>>> The function btf_ctx_access is used to check validity of accessing
>>>> function parameters which are wrapped inside a structure, I am wondering
>>>> what kinds of accesses you tried to address here.
>>>
>>> when I was transforming opensnoop.py to use this I got fail in
>>> there when I tried to access filename arg in do_sys_open
>>>
>>> but actualy it seems this should get recognized earlier by:
>>>
>>>             if (btf_type_is_int(t))
>>>                   /* accessing a scalar */
>>>                   return true;
>>>
>>> I'm not sure why it did not pass for const char*, I'll check
>>
>> it seems we don't check for pointer to scalar (just void),
>> which is the case in my example 'const char *filename'
> 
> Thanks for clarification. See some comments below.
> 
>>
>> I'll post this in v2 with other changes
>>
>> jirka
>>
>>
>> ---
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index ed2075884724..650df4ed346e 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -3633,7 +3633,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>>    		    const struct bpf_prog *prog,
>>    		    struct bpf_insn_access_aux *info)
>>    {
>> -	const struct btf_type *t = prog->aux->attach_func_proto;
>> +	const struct btf_type *tp, *t = prog->aux->attach_func_proto;
>>    	struct bpf_prog *tgt_prog = prog->aux->linked_prog;
>>    	struct btf *btf = bpf_prog_get_target_btf(prog);
>>    	const char *tname = prog->aux->attach_func_name;
>> @@ -3695,6 +3695,17 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>>    		 */
>>    		return true;
>>    
>> +	tp = btf_type_by_id(btf, t->type);
>> +	/* skip modifiers */
>> +	while (btf_type_is_modifier(tp))
>> +		tp = btf_type_by_id(btf, tp->type);
>> +
>> +	if (btf_type_is_int(tp))
>> +		/* This is a pointer scalar.
>> +		 * It is the same as scalar from the verifier safety pov.
>> +		 */
>> +		return true;
> 
> This should work since:
>      - the int pointer will be treated as a scalar later on
>      - bpf_probe_read() will be used to read the contents
> 
> I am wondering whether we should add proper verifier support
> to allow pointer to int ctx access. There, users do not need
> to use bpf_probe_read() to dereference the pointer.
> 
> Discussed with Martin, maybe somewhere in check_ptr_to_btf_access(),
> before btf_struct_access(), checking if it is a pointer to int/enum,
> it should just allow and return SCALAR_VALUE.

I double-checked check_ptr_to_btf_access() and btf_struct_access().
btf_struct_access() already returns SCALAR_VALUE for a pointer to
int/enum, so a verifier change is probably not needed.

In your above code, could you do
    btf_type_is_int(t) || btf_type_is_enum(t)
which will cover pointer to enum as well?

> 
> If you do verifier changes, please ensure bpf_probe_read() is not
> needed any more. In bcc, you need to hack to prevent rewriter to
> re-introduce bpf_probe_read() :-).
> 
>> +
>>    	/* this is a pointer to another type */
>>    	info->reg_type = PTR_TO_BTF_ID;
>>    	info->btf_id = t->type;
>>


* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2020-01-07 13:15       ` Jiri Olsa
@ 2020-01-07 19:30         ` Daniel Borkmann
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Borkmann @ 2020-01-07 19:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov, netdev, bpf,
	Andrii Nakryiko, Yonghong Song, Martin KaFai Lau, Jakub Kicinski,
	David Miller, bjorn.topel

On 1/7/20 2:15 PM, Jiri Olsa wrote:
> On Tue, Jan 07, 2020 at 09:30:12AM +0100, Daniel Borkmann wrote:
>> On 1/7/20 12:46 AM, Alexei Starovoitov wrote:
>>> On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
>>>> When unwinding the stack we need to identify each
>>>> address to successfully continue. Adding latch tree
>>>> to keep trampolines for quick lookup during the
>>>> unwind.
>>>>
>>>> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
>>> ...
>>>> +bool is_bpf_trampoline(void *addr)
>>>> +{
>>>> +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
>>>> +}
>>>> +
>>>>    struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>>>    {
>>>>    	struct bpf_trampoline *tr;
>>>> @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>>>>    	for (i = 0; i < BPF_TRAMP_MAX; i++)
>>>>    		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
>>>>    	tr->image = image;
>>>> +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
>>>
>>> Thanks for the fix. I was thinking to apply it, but then realized that bpf
>>> dispatcher logic has the same issue.
>>> Could you generalize the fix for both?
>>> May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
>>> and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
>>> Wdyt?
>>
>> Also this patch is buggy since your latch lookup happens under RCU, but
>> I don't see anything that waits a grace period once you remove from the
>> tree. Instead you free the trampoline right away.
> 
> thanks, did not think of that.. will (try to) fix ;-)
> 
>> On a different question, given we have all the kallsym infrastructure
>> for BPF already in place, did you look into whether it's feasible to
>> make it a bit more generic to also cover JITed buffers from trampolines?
> 
> hum, it did not occur to me that we want to see it in kallsyms,
> but sure.. how about: bpf_trampoline_<key> ?
> 
> key would be taken from bpf_trampoline::key as function's BTF id

Yeap, I think bpf_trampoline_<btf_id> would make sense here.

Thanks,
Daniel


* Re: [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc
  2020-01-07 12:25     ` Jiri Olsa
@ 2020-01-07 22:13       ` Alexei Starovoitov
  2020-01-08 10:24         ` Jiri Olsa
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2020-01-07 22:13 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Network Development, bpf, Andrii Nakryiko, Yonghong Song,
	Martin KaFai Lau, Jakub Kicinski, David Miller

On Tue, Jan 7, 2020 at 4:25 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Mon, Jan 06, 2020 at 03:27:21PM -0800, Alexei Starovoitov wrote:
> > On Sun, Dec 29, 2019 at 03:37:37PM +0100, Jiri Olsa wrote:
> > > Adding support to use perf_event_output in
> > > BPF_TRACE_FENTRY/BPF_TRACE_FEXIT programs.
> > >
> > > There are no pt_regs available in the trampoline,
> > > so getting one via bpf_kfunc_regs array.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> > >  kernel/trace/bpf_trace.c | 67 ++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 67 insertions(+)
> > >
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index e5ef4ae9edb5..1b270bbd9016 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -1151,6 +1151,69 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > >     }
> > >  }
> > >
> > > +struct bpf_kfunc_regs {
> > > +   struct pt_regs regs[3];
> > > +};
> > > +
> > > +static DEFINE_PER_CPU(struct bpf_kfunc_regs, bpf_kfunc_regs);
> > > +static DEFINE_PER_CPU(int, bpf_kfunc_nest_level);
> >
> > Thanks a bunch for working on it.
> >
> > I don't understand why new regs array and nest level is needed.
> > Can raw_tp_prog_func_proto() be reused as-is?
> > Instead of patches 2,3,4 ?
>
> I thought that we might want to trace functions within the
> raw tracepoint call, which would be prevented if we used
> the same nest variable
>
> now I'm not sure if there's not some other issue with nesting
> bpf programs like that.. I'll need to check

but nesting is what bpf_raw_tp_nest_level is supposed to solve, no?
I just realized that we already have three *_nest_level counters
in that file. Not sure why one is not enough.
There was an issue in the past when tracepoint, kprobe and skb
collided and we had nasty memory corruption, but that was before
_nest_level was introduced. Not sure how we got to three independent
counters.


* Re: [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc
  2020-01-07 22:13       ` Alexei Starovoitov
@ 2020-01-08 10:24         ` Jiri Olsa
  0 siblings, 0 replies; 26+ messages in thread
From: Jiri Olsa @ 2020-01-08 10:24 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Network Development, bpf, Andrii Nakryiko, Yonghong Song,
	Martin KaFai Lau, Jakub Kicinski, David Miller

On Tue, Jan 07, 2020 at 02:13:42PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 7, 2020 at 4:25 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Mon, Jan 06, 2020 at 03:27:21PM -0800, Alexei Starovoitov wrote:
> > > On Sun, Dec 29, 2019 at 03:37:37PM +0100, Jiri Olsa wrote:
> > > > Adding support to use perf_event_output in
> > > > BPF_TRACE_FENTRY/BPF_TRACE_FEXIT programs.
> > > >
> > > > There are no pt_regs available in the trampoline,
> > > > so getting one via bpf_kfunc_regs array.
> > > >
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > > ---
> > > >  kernel/trace/bpf_trace.c | 67 ++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 67 insertions(+)
> > > >
> > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > index e5ef4ae9edb5..1b270bbd9016 100644
> > > > --- a/kernel/trace/bpf_trace.c
> > > > +++ b/kernel/trace/bpf_trace.c
> > > > @@ -1151,6 +1151,69 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > > >     }
> > > >  }
> > > >
> > > > +struct bpf_kfunc_regs {
> > > > +   struct pt_regs regs[3];
> > > > +};
> > > > +
> > > > +static DEFINE_PER_CPU(struct bpf_kfunc_regs, bpf_kfunc_regs);
> > > > +static DEFINE_PER_CPU(int, bpf_kfunc_nest_level);
> > >
> > > Thanks a bunch for working on it.
> > >
> > > I don't understand why new regs array and nest level is needed.
> > > Can raw_tp_prog_func_proto() be reused as-is?
> > > Instead of patches 2,3,4 ?
> >
> > I thought that we might want to trace functions within the
> > raw tracepoint call, which would be prevented if we used
> > the same nest variable
> >
> > now I'm not sure if there's not some other issue with nesting
> > bpf programs like that.. I'll need to check
> 
> but nesting is what bpf_raw_tp_nest_level suppose to solve, no?
> I just realized that we already have three *_nest_level counters
> in that file. Not sure why one is not enough.
> There was an issue in the past when tracepoint, kprobe and skb
> collided and we had nasty memory corruption, but that was before
> _nest_level was introduced. Not sure how we got to three independent
> counters.

ok, I'm not sure now what the initial impulse for that was,
I'll make it share the counter with raw tracepoints

jirka



* Re: Re: [PATCH 1/5] bpf: Allow non struct type for btf ctx access
  2020-01-07 18:28           ` Yonghong Song
@ 2020-01-08 14:38             ` Jiri Olsa
  0 siblings, 0 replies; 26+ messages in thread
From: Jiri Olsa @ 2020-01-08 14:38 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, netdev, bpf,
	Andrii Nakryiko, Martin Lau, Jakub Kicinski, David Miller

On Tue, Jan 07, 2020 at 06:28:11PM +0000, Yonghong Song wrote:

SNIP

> >> index ed2075884724..650df4ed346e 100644
> >> --- a/kernel/bpf/btf.c
> >> +++ b/kernel/bpf/btf.c
> >> @@ -3633,7 +3633,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >>    		    const struct bpf_prog *prog,
> >>    		    struct bpf_insn_access_aux *info)
> >>    {
> >> -	const struct btf_type *t = prog->aux->attach_func_proto;
> >> +	const struct btf_type *tp, *t = prog->aux->attach_func_proto;
> >>    	struct bpf_prog *tgt_prog = prog->aux->linked_prog;
> >>    	struct btf *btf = bpf_prog_get_target_btf(prog);
> >>    	const char *tname = prog->aux->attach_func_name;
> >> @@ -3695,6 +3695,17 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >>    		 */
> >>    		return true;
> >>    
> >> +	tp = btf_type_by_id(btf, t->type);
> >> +	/* skip modifiers */
> >> +	while (btf_type_is_modifier(tp))
> >> +		tp = btf_type_by_id(btf, tp->type);
> >> +
> >> +	if (btf_type_is_int(tp))
> >> +		/* This is a pointer scalar.
> >> +		 * It is the same as scalar from the verifier safety pov.
> >> +		 */
> >> +		return true;
> > 
> > This should work since:
> >      - the int pointer will be treated as a scalar later on
> >      - bpf_probe_read() will be used to read the contents
> > 
> > I am wondering whether we should add proper verifier support
> > to allow pointer to int ctx access. There, users do not need
> > to use bpf_probe_read() to dereference the pointer.
> > 
> > Discussed with Martin, maybe somewhere in check_ptr_to_btf_access(),
> > before btf_struct_access(), checking if it is a pointer to int/enum,
> > it should just allow and return SCALAR_VALUE.
> 
> double checked check_ptr_to_btf_access() and btf_struct_access().
> btf_struct_access() already returns SCALAR_VALUE for pointer to 
> int/enum. So verifier change is probably not needed.

ok, great

> 
> In your above code, could you do
>     btf_type_is_int(t) || btf_type_is_enum(t)
> which will cover pointer to enum as well?

sure, I'll include that

thanks,
jirka



* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2020-01-07 13:30       ` Björn Töpel
@ 2020-01-13  9:43         ` Jiri Olsa
  2020-01-13 12:21           ` Björn Töpel
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Olsa @ 2020-01-13  9:43 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, netdev, bpf, Andrii Nakryiko, Yonghong Song,
	Martin KaFai Lau, Jakub Kicinski, David Miller

On Tue, Jan 07, 2020 at 02:30:35PM +0100, Björn Töpel wrote:
> On 2020-01-07 14:05, Jiri Olsa wrote:
> > On Mon, Jan 06, 2020 at 03:46:40PM -0800, Alexei Starovoitov wrote:
> > > On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
> > > > When unwinding the stack we need to identify each
> > > > address to successfully continue. Adding latch tree
> > > > to keep trampolines for quick lookup during the
> > > > unwind.
> > > > 
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ...
> > > > +bool is_bpf_trampoline(void *addr)
> > > > +{
> > > > +	return latch_tree_find(addr, &tree, &tree_ops) != NULL;
> > > > +}
> > > > +
> > > >   struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > > >   {
> > > >   	struct bpf_trampoline *tr;
> > > > @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > > >   	for (i = 0; i < BPF_TRAMP_MAX; i++)
> > > >   		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
> > > >   	tr->image = image;
> > > > +	latch_tree_insert(&tr->tnode, &tree, &tree_ops);
> > > 
> > > Thanks for the fix. I was thinking to apply it, but then realized that bpf
> > > dispatcher logic has the same issue.
> > > Could you generalize the fix for both?
> > > May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
> > > and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
> > > Wdyt?
> > 
> > I need to check the dispatcher code, but seems ok.. will check
> > 
> 
> Thanks Jiri! The trampoline and dispatcher share the image allocation, so
> putting it there would make sense.
> 
> It's annoying that the dispatcher doesn't show up correctly in perf, and
> it's been on my list to fix that. Hopefully you beat me to it! :-D

hi,
the attached patch seems to work for me (the trampoline use case), but I don't
know how to test it for the dispatcher.. also I need to check whether we need
to decrease BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time ;-)

jirka


---
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index aed2bc39d72b..e0ca8780dc7a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -510,7 +510,6 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
-void *bpf_jit_alloc_exec_page(void);
 #define BPF_DISPATCHER_INIT(name) {			\
 	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
 	.func = &name##func,				\
@@ -542,6 +541,13 @@ void *bpf_jit_alloc_exec_page(void);
 #define BPF_DISPATCHER_PTR(name) (&name)
 void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
 				struct bpf_prog *to);
+struct bpf_image {
+	struct latch_tree_node tnode;
+	unsigned char data[];
+};
+#define BPF_IMAGE_SIZE (PAGE_SIZE - sizeof(struct bpf_image))
+bool is_bpf_image(void *addr);
+void *bpf_image_alloc(void);
 #else
 static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
@@ -563,6 +569,10 @@ static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
 static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
 					      struct bpf_prog *from,
 					      struct bpf_prog *to) {}
+static inline bool is_bpf_image(void *addr)
+{
+	return false;
+}
 #endif
 
 struct bpf_func_info_aux {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 29d47aae0dd1..53dc3adf6077 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -704,6 +704,8 @@ bool is_bpf_text_address(unsigned long addr)
 
 	rcu_read_lock();
 	ret = bpf_prog_kallsyms_find(addr) != NULL;
+	if (!ret)
+		ret = is_bpf_image((void*) addr);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
index 204ee61a3904..b3e5b214fed8 100644
--- a/kernel/bpf/dispatcher.c
+++ b/kernel/bpf/dispatcher.c
@@ -113,7 +113,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
 		noff = 0;
 	} else {
 		old = d->image + d->image_off;
-		noff = d->image_off ^ (PAGE_SIZE / 2);
+		noff = d->image_off ^ (BPF_IMAGE_SIZE / 2);
 	}
 
 	new = d->num_progs ? d->image + noff : NULL;
@@ -140,7 +140,7 @@ void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
 
 	mutex_lock(&d->mutex);
 	if (!d->image) {
-		d->image = bpf_jit_alloc_exec_page();
+		d->image = bpf_image_alloc();
 		if (!d->image)
 			goto out;
 	}
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 79a04417050d..3ea56f89c68a 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -4,6 +4,7 @@
 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/ftrace.h>
+#include <linux/rbtree_latch.h>
 
 /* btf_vmlinux has ~22k attachable functions. 1k htab is enough. */
 #define TRAMPOLINE_HASH_BITS 10
@@ -14,7 +15,12 @@ static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
 /* serializes access to trampoline_table */
 static DEFINE_MUTEX(trampoline_mutex);
 
-void *bpf_jit_alloc_exec_page(void)
+static struct latch_tree_root image_tree __cacheline_aligned;
+
+/* serializes access to image_tree */
+static DEFINE_MUTEX(image_mutex);
+
+static void *bpf_jit_alloc_exec_page(void)
 {
 	void *image;
 
@@ -30,6 +36,62 @@ void *bpf_jit_alloc_exec_page(void)
 	return image;
 }
 
+static __always_inline bool image_tree_less(struct latch_tree_node *a,
+				      struct latch_tree_node *b)
+{
+	struct bpf_image *ia = container_of(a, struct bpf_image, tnode);
+	struct bpf_image *ib = container_of(b, struct bpf_image, tnode);
+
+	return ia < ib;
+}
+
+static __always_inline int image_tree_comp(void *addr, struct latch_tree_node *n)
+{
+	void *image = container_of(n, struct bpf_image, tnode);
+
+	if (addr < image)
+		return -1;
+	if (addr >= image + PAGE_SIZE)
+		return 1;
+
+	return 0;
+}
+
+static const struct latch_tree_ops image_tree_ops = {
+	.less	= image_tree_less,
+	.comp	= image_tree_comp,
+};
+
+void *bpf_image_alloc(void)
+{
+	struct bpf_image *image;
+
+	image = bpf_jit_alloc_exec_page();
+	if (!image)
+		return NULL;
+
+	mutex_lock(&image_mutex);
+	latch_tree_insert(&image->tnode, &image_tree, &image_tree_ops);
+	mutex_unlock(&image_mutex);
+	return image->data;
+}
+
+void bpf_image_delete(void *ptr)
+{
+	struct bpf_image *image = container_of(ptr, struct bpf_image, data);
+
+	mutex_lock(&image_mutex);
+	latch_tree_erase(&image->tnode, &image_tree, &image_tree_ops);
+	synchronize_rcu();
+	bpf_jit_free_exec(image);
+	mutex_unlock(&image_mutex);
+}
+
+bool is_bpf_image(void *addr)
+{
+	return latch_tree_find(addr, &image_tree, &image_tree_ops) != NULL;
+}
+
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
 	struct bpf_trampoline *tr;
@@ -50,7 +112,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 		goto out;
 
 	/* is_root was checked earlier. No need for bpf_jit_charge_modmem() */
-	image = bpf_jit_alloc_exec_page();
+	image = bpf_image_alloc();
 	if (!image) {
 		kfree(tr);
 		tr = NULL;
@@ -125,14 +187,14 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
 }
 
 /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
- * bytes on x86.  Pick a number to fit into PAGE_SIZE / 2
+ * bytes on x86.  Pick a number to fit into BPF_IMAGE_SIZE / 2
  */
 #define BPF_MAX_TRAMP_PROGS 40
 
 static int bpf_trampoline_update(struct bpf_trampoline *tr)
 {
-	void *old_image = tr->image + ((tr->selector + 1) & 1) * PAGE_SIZE/2;
-	void *new_image = tr->image + (tr->selector & 1) * PAGE_SIZE/2;
+	void *old_image = tr->image + ((tr->selector + 1) & 1) * BPF_IMAGE_SIZE/2;
+	void *new_image = tr->image + (tr->selector & 1) * BPF_IMAGE_SIZE/2;
 	struct bpf_prog *progs_to_run[BPF_MAX_TRAMP_PROGS];
 	int fentry_cnt = tr->progs_cnt[BPF_TRAMP_FENTRY];
 	int fexit_cnt = tr->progs_cnt[BPF_TRAMP_FEXIT];
@@ -160,7 +222,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
 	if (fexit_cnt)
 		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
 
-	err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
+	err = arch_prepare_bpf_trampoline(new_image, new_image + BPF_IMAGE_SIZE / 2,
 					  &tr->func.model, flags,
 					  fentry, fentry_cnt,
 					  fexit, fexit_cnt,
@@ -251,7 +313,7 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
 		goto out;
 	if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT])))
 		goto out;
-	bpf_jit_free_exec(tr->image);
+	bpf_image_delete(tr->image);
 	hlist_del(&tr->hlist);
 	kfree(tr);
 out:


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2020-01-13  9:43         ` Jiri Olsa
@ 2020-01-13 12:21           ` Björn Töpel
  2020-01-13 12:31             ` Björn Töpel
  0 siblings, 1 reply; 26+ messages in thread
From: Björn Töpel @ 2020-01-13 12:21 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, netdev, bpf, Andrii Nakryiko, Yonghong Song,
	Martin KaFai Lau, Jakub Kicinski, David Miller


On 2020-01-13 10:43, Jiri Olsa wrote:
> hi,
> attached patch seems to work for me (trampoline usecase), but I don't know
> how to test it for dispatcher.. also I need to check if we need to decrease
> BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time;-)
> 

Thanks for working on it! I'll take the patch for a spin.

To test the dispatcher, just run XDP!

With your change, the BPF_DISPATCHER_MAX is still valid. 48 entries =>
1890B which is < (BPF_IMAGE_SIZE / 2).


Cheers,
Björn

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2020-01-13 12:21           ` Björn Töpel
@ 2020-01-13 12:31             ` Björn Töpel
  2020-01-13 12:37               ` Jiri Olsa
  0 siblings, 1 reply; 26+ messages in thread
From: Björn Töpel @ 2020-01-13 12:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, netdev, bpf, Andrii Nakryiko, Yonghong Song,
	Martin KaFai Lau, Jakub Kicinski, David Miller

On 2020-01-13 13:21, Björn Töpel wrote:
> 
> On 2020-01-13 10:43, Jiri Olsa wrote:
>> hi,
>> attached patch seems to work for me (trampoline usecase), but I don't 
>> know
>> how to test it for dispatcher.. also I need to check if we need to 
>> decrease
>> BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time;-)
>>
> 
> Thanks for working on it! I'll take the patch for a spin.
> 
> To test the dispatcher, just run XDP!
> 
> With your change, the BPF_DISPATCHER_MAX is still valid. 48 entries =>
> 1890B which is < (BPF_IMAGE_SIZE / 2).
>

...and FWIW, it would be nice to have bpf_dispatcher_<...> entries in 
kallsyms as well. If that code could be shared with the trampoline code 
(bpf_trampoline_<btf_id>), that'd be great!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
  2020-01-13 12:31             ` Björn Töpel
@ 2020-01-13 12:37               ` Jiri Olsa
  0 siblings, 0 replies; 26+ messages in thread
From: Jiri Olsa @ 2020-01-13 12:37 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, netdev, bpf, Andrii Nakryiko, Yonghong Song,
	Martin KaFai Lau, Jakub Kicinski, David Miller

On Mon, Jan 13, 2020 at 01:31:38PM +0100, Björn Töpel wrote:
> On 2020-01-13 13:21, Björn Töpel wrote:
> > 
> > On 2020-01-13 10:43, Jiri Olsa wrote:
> > > hi,
> > > attached patch seems to work for me (trampoline usecase), but I
> > > don't know
> > > how to test it for dispatcher.. also I need to check if we need to
> > > decrease
> > > BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time;-)
> > > 
> > 
> > Thanks for working on it! I'll take the patch for a spin.
> > 
> > To test the dispatcher, just run XDP!
> > 
> > With your change, the BPF_DISPATCHER_MAX is still valid. 48 entries =>
> > 1890B which is < (BPF_IMAGE_SIZE / 2).

great

> > 
> 
> ...and FWIW, it would be nice with bpf_dispatcher_<...> entries in kallsyms

ok so it'd be 'bpf_dispatcher_<name>'

from DEFINE_BPF_DISPATCHER(name)

> as well. If that code could be shared with the trampoline code as well
> (bpf_trampoline_<btf_id>), that'd be great!
> 

ok, will add it

thanks,
jirka


^ permalink raw reply	[flat|nested] 26+ messages in thread


Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-29 14:37 [RFC 0/5] bpf: Add trampoline helpers Jiri Olsa
2019-12-29 14:37 ` [PATCH 1/5] bpf: Allow non struct type for btf ctx access Jiri Olsa
2020-01-06 21:36   ` Yonghong Song
2020-01-07 12:13     ` Jiri Olsa
2020-01-07 15:50       ` Jiri Olsa
2020-01-07 17:55         ` Yonghong Song
2020-01-07 18:28           ` Yonghong Song
2020-01-08 14:38             ` Jiri Olsa
2019-12-29 14:37 ` [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc Jiri Olsa
2020-01-06 23:27   ` Alexei Starovoitov
2020-01-07 12:25     ` Jiri Olsa
2020-01-07 22:13       ` Alexei Starovoitov
2020-01-08 10:24         ` Jiri Olsa
2019-12-29 14:37 ` [PATCH 3/5] bpf: Add bpf_get_stackid_kfunc Jiri Olsa
2019-12-29 14:37 ` [PATCH 4/5] bpf: Add bpf_get_stack_kfunc Jiri Olsa
2019-12-29 14:37 ` [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind Jiri Olsa
2020-01-06 23:46   ` Alexei Starovoitov
2020-01-07  8:30     ` Daniel Borkmann
2020-01-07 13:15       ` Jiri Olsa
2020-01-07 19:30         ` Daniel Borkmann
2020-01-07 13:05     ` Jiri Olsa
2020-01-07 13:30       ` Björn Töpel
2020-01-13  9:43         ` Jiri Olsa
2020-01-13 12:21           ` Björn Töpel
2020-01-13 12:31             ` Björn Töpel
2020-01-13 12:37               ` Jiri Olsa
