netdev.vger.kernel.org archive mirror
* [RFC PATCH bpf-next 0/6] bpf: add bpf_get_stack_helper
@ 2018-04-06 21:48 Yonghong Song
  2018-04-06 21:48 ` [RFC PATCH bpf-next 1/6] bpf: change prototype for stack_map_get_build_id_offset Yonghong Song
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Yonghong Song @ 2018-04-06 21:48 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

Currently, the stackmap and the bpf_get_stackid helper are provided
for bpf programs to get the stack trace. This approach has a
limitation, though: if two stack traces have the same hash, only
one is stored in the stackmap table, so some stack traces are
missing from the user's perspective.

This patch set implements a new helper, bpf_get_stack, which copies
the stack trace directly into a buffer supplied by the bpf program.
The bpf program can therefore see all stack traces, and can either
process them in the kernel or send them to user space through a
shared map or bpf_perf_event_output.
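
For illustration, a minimal sketch of how a kprobe program could use
the new helper (the program name, attach point and stack depth below
are just examples, not part of this series):

// SPDX-License-Identifier: GPL-2.0
#include <linux/ptrace.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

SEC("kprobe/sys_write")
int get_stack_example(struct pt_regs *ctx)
{
	/* room for up to 32 frames; the helper fills it for us */
	u64 stack[32];
	int len;

	/* flags == 0: kernel stack, no frames skipped */
	len = bpf_get_stack(ctx, stack, sizeof(stack), 0);
	if (len < 0)
		return 0;	/* no stack could be collected */

	/* 'len' bytes of instruction pointers are now in stack[];
	 * process them here or push them to user space, e.g. with
	 * bpf_perf_event_output().
	 */
	return 0;
}

char _license[] SEC("license") = "GPL";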

Patches #1 and #2 implement the core kernel support.
Patch #3 syncs the new helper to the tools headers.
Patches #4 and #5 add a test in samples/bpf that attaches
to a kprobe, and Patch #6 adds a test in tools/bpf that
attaches to a tracepoint.

Yonghong Song (6):
  bpf: change prototype for stack_map_get_build_id_offset
  bpf: add bpf_get_stack helper
  tools/bpf: add bpf_get_stack helper to tools headers
  samples/bpf: move common-purpose perf_event functions to bpf_load.c
  samples/bpf: add a test for bpf_get_stack helper
  tools/bpf: add a test case for bpf_get_stack helper

 include/linux/bpf.h                               |   1 +
 include/linux/filter.h                            |   3 +-
 include/uapi/linux/bpf.h                          |  17 ++-
 kernel/bpf/stackmap.c                             |  69 ++++++++--
 kernel/bpf/syscall.c                              |  12 +-
 kernel/bpf/verifier.c                             |   3 +
 kernel/trace/bpf_trace.c                          |  50 +++++++-
 samples/bpf/Makefile                              |   4 +
 samples/bpf/bpf_load.c                            | 104 +++++++++++++++
 samples/bpf/bpf_load.h                            |   5 +
 samples/bpf/trace_get_stack_kern.c                |  80 ++++++++++++
 samples/bpf/trace_get_stack_user.c                | 150 ++++++++++++++++++++++
 samples/bpf/trace_output_user.c                   | 113 ++--------------
 tools/include/uapi/linux/bpf.h                    |  17 ++-
 tools/testing/selftests/bpf/bpf_helpers.h         |   2 +
 tools/testing/selftests/bpf/test_progs.c          |  41 +++++-
 tools/testing/selftests/bpf/test_stacktrace_map.c |  20 ++-
 17 files changed, 568 insertions(+), 123 deletions(-)
 create mode 100644 samples/bpf/trace_get_stack_kern.c
 create mode 100644 samples/bpf/trace_get_stack_user.c

-- 
2.9.5

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC PATCH bpf-next 1/6] bpf: change prototype for stack_map_get_build_id_offset
  2018-04-06 21:48 [RFC PATCH bpf-next 0/6] bpf: add bpf_get_stack_helper Yonghong Song
@ 2018-04-06 21:48 ` Yonghong Song
  2018-04-06 21:48 ` [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper Yonghong Song
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2018-04-06 21:48 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

This patch does not introduce any functional change. The function
prototype is changed so that the same function can be reused later.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/stackmap.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 57eeb12..04f6ec1 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -262,16 +262,11 @@ static int stack_map_get_build_id(struct vm_area_struct *vma,
 	return ret;
 }
 
-static void stack_map_get_build_id_offset(struct bpf_map *map,
-					  struct stack_map_bucket *bucket,
+static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 					  u64 *ips, u32 trace_nr, bool user)
 {
 	int i;
 	struct vm_area_struct *vma;
-	struct bpf_stack_build_id *id_offs;
-
-	bucket->nr = trace_nr;
-	id_offs = (struct bpf_stack_build_id *)bucket->data;
 
 	/*
 	 * We cannot do up_read() in nmi context, so build_id lookup is
@@ -361,8 +356,10 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 			pcpu_freelist_pop(&smap->freelist);
 		if (unlikely(!new_bucket))
 			return -ENOMEM;
-		stack_map_get_build_id_offset(map, new_bucket, ips,
-					      trace_nr, user);
+		new_bucket->nr = trace_nr;
+		stack_map_get_build_id_offset(
+			(struct bpf_stack_build_id *)new_bucket->data,
+			ips, trace_nr, user);
 		trace_len = trace_nr * sizeof(struct bpf_stack_build_id);
 		if (hash_matches && bucket->nr == trace_nr &&
 		    memcmp(bucket->data, new_bucket->data, trace_len) == 0) {
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
  2018-04-06 21:48 [RFC PATCH bpf-next 0/6] bpf: add bpf_get_stack_helper Yonghong Song
  2018-04-06 21:48 ` [RFC PATCH bpf-next 1/6] bpf: change prototype for stack_map_get_build_id_offset Yonghong Song
@ 2018-04-06 21:48 ` Yonghong Song
  2018-04-09  3:34   ` Alexei Starovoitov
  2018-04-06 21:48 ` [RFC PATCH bpf-next 3/6] tools/bpf: add bpf_get_stack helper to tools headers Yonghong Song
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Yonghong Song @ 2018-04-06 21:48 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

Currently, the stackmap and the bpf_get_stackid helper are provided
for bpf programs to get the stack trace. This approach has a
limitation, though: if two stack traces have the same hash, only
one is stored in the stackmap table, so some stack traces are
missing from the user's perspective.

This patch implements a new helper, bpf_get_stack, which copies
the stack trace directly into a buffer supplied by the bpf program.
The bpf program can therefore see all stack traces, and can either
process them in the kernel or send them to user space through a
shared map or bpf_perf_event_output.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h      |  1 +
 include/linux/filter.h   |  3 ++-
 include/uapi/linux/bpf.h | 17 +++++++++++++--
 kernel/bpf/stackmap.c    | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c     | 12 ++++++++++-
 kernel/bpf/verifier.c    |  3 +++
 kernel/trace/bpf_trace.c | 50 +++++++++++++++++++++++++++++++++++++++++-
 7 files changed, 137 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 95a7abd..72ccb9a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -676,6 +676,7 @@ extern const struct bpf_func_proto bpf_get_current_comm_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
 extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
+extern const struct bpf_func_proto bpf_get_stack_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
 
 /* Shared helpers among cBPF and eBPF. */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index fc4e8f9..9b64f63 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -467,7 +467,8 @@ struct bpf_prog {
 				dst_needed:1,	/* Do we need dst entry? */
 				blinded:1,	/* Was blinded */
 				is_func:1,	/* program is a bpf function */
-				kprobe_override:1; /* Do we override a kprobe? */
+				kprobe_override:1, /* Do we override a kprobe? */
+				need_callchain_buf:1; /* Needs callchain buffer? */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec897..a4ff5b7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -517,6 +517,17 @@ union bpf_attr {
  *             other bits - reserved
  *     Return: >= 0 stackid on success or negative error
  *
+ * int bpf_get_stack(ctx, buf, size, flags)
+ *     walk user or kernel stack and store the ips in buf
+ *     @ctx: struct pt_regs*
+ *     @buf: user buffer to fill stack
+ *     @size: the buf size
+ *     @flags: bits 0-7 - number of stack frames to skip
+ *             bit 8 - collect user stack instead of kernel
+ *             bit 11 - get build-id as well if user stack
+ *             other bits - reserved
+ *     Return: >= 0 size copied on success or negative error
+ *
  * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
  *     calculate csum diff
  *     @from: raw from buffer
@@ -821,7 +832,8 @@ union bpf_attr {
 	FN(msg_apply_bytes),		\
 	FN(msg_cork_bytes),		\
 	FN(msg_pull_data),		\
-	FN(bind),
+	FN(bind),			\
+	FN(get_stack),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -855,11 +867,12 @@ enum bpf_func_id {
 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
 #define BPF_F_TUNINFO_IPV6		(1ULL << 0)
 
-/* BPF_FUNC_get_stackid flags. */
+/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
 #define BPF_F_SKIP_FIELD_MASK		0xffULL
 #define BPF_F_USER_STACK		(1ULL << 8)
 #define BPF_F_FAST_STACK_CMP		(1ULL << 9)
 #define BPF_F_REUSE_STACKID		(1ULL << 10)
+#define BPF_F_USER_BUILD_ID		(1ULL << 11)
 
 /* BPF_FUNC_skb_set_tunnel_key flags. */
 #define BPF_F_ZERO_CSUM_TX		(1ULL << 1)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 04f6ec1..371c72e 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -402,6 +402,62 @@ const struct bpf_func_proto bpf_get_stackid_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
+	   u64, flags)
+{
+	u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
+	bool user_build_id = flags & BPF_F_USER_BUILD_ID;
+	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
+	bool user = flags & BPF_F_USER_STACK;
+	struct perf_callchain_entry *trace;
+	bool kernel = !user;
+	u64 *ips;
+
+	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
+			       BPF_F_USER_BUILD_ID)))
+		return -EINVAL;
+
+	elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id)
+					    : sizeof(u64);
+	if (unlikely(size % elem_size))
+		return -EINVAL;
+
+	num_elem = size / elem_size;
+	if (sysctl_perf_event_max_stack < num_elem)
+		init_nr = 0;
+	else
+		init_nr = sysctl_perf_event_max_stack - num_elem;
+	trace = get_perf_callchain(regs, init_nr, kernel, user,
+				   sysctl_perf_event_max_stack, false, false);
+	if (unlikely(!trace))
+		return -EFAULT;
+
+	trace_nr = trace->nr - init_nr;
+	if (trace_nr <= skip)
+		return -EFAULT;
+
+	trace_nr -= skip;
+	trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
+	copy_len = trace_nr * elem_size;
+	ips = trace->ip + skip + init_nr;
+	if (user && user_build_id)
+		stack_map_get_build_id_offset(buf, ips, trace_nr, user);
+	else
+		memcpy(buf, ips, copy_len);
+
+	return copy_len;
+}
+
+const struct bpf_func_proto bpf_get_stack_proto = {
+	.func		= bpf_get_stack,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 /* Called from eBPF program */
 static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0244973..2aa3a65 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -984,10 +984,13 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
 static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 {
 	struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
+	bool need_callchain_buf = aux->prog->need_callchain_buf;
 
 	free_used_maps(aux);
 	bpf_prog_uncharge_memlock(aux->prog);
 	security_bpf_prog_free(aux);
+	if (need_callchain_buf)
+		put_callchain_buffers();
 	bpf_prog_free(aux->prog);
 }
 
@@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 			bpf_prog_kallsyms_del(prog->aux->func[i]);
 		bpf_prog_kallsyms_del(prog);
 
-		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
+		synchronize_rcu();
+		__bpf_prog_put_rcu(&prog->aux->rcu);
 	}
 }
 
@@ -1341,6 +1345,12 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (err)
 		goto free_used_maps;
 
+	if (prog->need_callchain_buf) {
+		err = get_callchain_buffers(sysctl_perf_event_max_stack);
+		if (err)
+			goto free_used_maps;
+	}
+
 	err = bpf_prog_new_fd(prog);
 	if (err < 0) {
 		/* failed to allocate fd.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5dd1dcb..aba9425 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2460,6 +2460,9 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 	if (err)
 		return err;
 
+	if (func_id == BPF_FUNC_get_stack)
+		env->prog->need_callchain_buf = true;
+
 	if (changes_data)
 		clear_all_pkt_pointers(env);
 	return 0;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d88e96d..fe8476f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -20,6 +20,7 @@
 #include "trace.h"
 
 u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+u64 bpf_get_stack(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
 
 /**
  * trace_call_bpf - invoke BPF program
@@ -577,6 +578,8 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_perf_event_output_proto;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto;
+	case BPF_FUNC_get_stack:
+		return &bpf_get_stack_proto;
 	case BPF_FUNC_perf_event_read_value:
 		return &bpf_perf_event_read_value_proto;
 #ifdef CONFIG_BPF_KPROBE_OVERRIDE
@@ -664,6 +667,25 @@ static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_get_stack_tp, void *, tp_buff, void *, buf, u32, size,
+	   u64, flags)
+{
+	struct pt_regs *regs = *(struct pt_regs **)tp_buff;
+
+	return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
+			     (unsigned long) size, flags, 0);
+}
+
+static const struct bpf_func_proto bpf_get_stack_proto_tp = {
+	.func		= bpf_get_stack_tp,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -672,6 +694,8 @@ tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_perf_event_output_proto_tp;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto_tp;
+	case BPF_FUNC_get_stack:
+		return &bpf_get_stack_proto_tp;
 	default:
 		return tracing_func_proto(func_id, prog);
 	}
@@ -734,6 +758,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_perf_event_output_proto_tp;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto_tp;
+	case BPF_FUNC_get_stack:
+		return &bpf_get_stack_proto_tp;
 	case BPF_FUNC_perf_prog_read_value:
 		return &bpf_perf_prog_read_value_proto;
 	default:
@@ -744,7 +770,7 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 /*
  * bpf_raw_tp_regs are separate from bpf_pt_regs used from skb/xdp
  * to avoid potential recursive reuse issue when/if tracepoints are added
- * inside bpf_*_event_output and/or bpf_get_stack_id
+ * inside bpf_*_event_output, bpf_get_stackid and/or bpf_get_stack
  */
 static DEFINE_PER_CPU(struct pt_regs, bpf_raw_tp_regs);
 BPF_CALL_5(bpf_perf_event_output_raw_tp, struct bpf_raw_tracepoint_args *, args,
@@ -787,6 +813,26 @@ static const struct bpf_func_proto bpf_get_stackid_proto_raw_tp = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_get_stack_raw_tp, struct bpf_raw_tracepoint_args *, args,
+	   void *, buf, u32, size, u64, flags)
+{
+	struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
+
+	perf_fetch_caller_regs(regs);
+	return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
+			     (unsigned long) size, flags, 0);
+}
+
+static const struct bpf_func_proto bpf_get_stack_proto_raw_tp = {
+	.func		= bpf_get_stack_raw_tp,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -795,6 +841,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_perf_event_output_proto_raw_tp;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto_raw_tp;
+	case BPF_FUNC_get_stack:
+		return &bpf_get_stack_proto_raw_tp;
 	default:
 		return tracing_func_proto(func_id, prog);
 	}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH bpf-next 3/6] tools/bpf: add bpf_get_stack helper to tools headers
  2018-04-06 21:48 [RFC PATCH bpf-next 0/6] bpf: add bpf_get_stack_helper Yonghong Song
  2018-04-06 21:48 ` [RFC PATCH bpf-next 1/6] bpf: change prototype for stack_map_get_build_id_offset Yonghong Song
  2018-04-06 21:48 ` [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper Yonghong Song
@ 2018-04-06 21:48 ` Yonghong Song
  2018-04-06 21:48 ` [RFC PATCH bpf-next 4/6] samples/bpf: move common-purpose perf_event functions to bpf_load.c Yonghong Song
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2018-04-06 21:48 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/bpf.h            | 17 +++++++++++++++--
 tools/testing/selftests/bpf/bpf_helpers.h |  2 ++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9d07465..3930463 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -517,6 +517,17 @@ union bpf_attr {
  *             other bits - reserved
  *     Return: >= 0 stackid on success or negative error
  *
+ * int bpf_get_stack(ctx, buf, size, flags)
+ *     walk user or kernel stack and store the ips in buf
+ *     @ctx: struct pt_regs*
+ *     @buf: user buffer to fill stack
+ *     @size: the buf size
+ *     @flags: bits 0-7 - number of stack frames to skip
+ *             bit 8 - collect user stack instead of kernel
+ *             bit 11 - get build-id as well if user stack
+ *             other bits - reserved
+ *     Return: >= 0 size copied on success or negative error
+ *
  * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
  *     calculate csum diff
  *     @from: raw from buffer
@@ -821,7 +832,8 @@ union bpf_attr {
 	FN(msg_apply_bytes),		\
 	FN(msg_cork_bytes),		\
 	FN(msg_pull_data),		\
-	FN(bind),
+	FN(bind),			\
+	FN(get_stack),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -855,11 +867,12 @@ enum bpf_func_id {
 /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
 #define BPF_F_TUNINFO_IPV6		(1ULL << 0)
 
-/* BPF_FUNC_get_stackid flags. */
+/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
 #define BPF_F_SKIP_FIELD_MASK		0xffULL
 #define BPF_F_USER_STACK		(1ULL << 8)
 #define BPF_F_FAST_STACK_CMP		(1ULL << 9)
 #define BPF_F_REUSE_STACKID		(1ULL << 10)
+#define BPF_F_USER_BUILD_ID		(1ULL << 11)
 
 /* BPF_FUNC_skb_set_tunnel_key flags. */
 #define BPF_F_ZERO_CSUM_TX		(1ULL << 1)
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index d8223d9..acaed02 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -96,6 +96,8 @@ static int (*bpf_msg_pull_data)(void *ctx, int start, int end, int flags) =
 	(void *) BPF_FUNC_msg_pull_data;
 static int (*bpf_bind)(void *ctx, void *addr, int addr_len) =
 	(void *) BPF_FUNC_bind;
+static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
+	(void *) BPF_FUNC_get_stack;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH bpf-next 4/6] samples/bpf: move common-purpose perf_event functions to bpf_load.c
  2018-04-06 21:48 [RFC PATCH bpf-next 0/6] bpf: add bpf_get_stack_helper Yonghong Song
                   ` (2 preceding siblings ...)
  2018-04-06 21:48 ` [RFC PATCH bpf-next 3/6] tools/bpf: add bpf_get_stack helper to tools headers Yonghong Song
@ 2018-04-06 21:48 ` Yonghong Song
  2018-04-06 21:48 ` [RFC PATCH bpf-next 5/6] samples/bpf: add a test for bpf_get_stack helper Yonghong Song
  2018-04-06 21:48 ` [RFC PATCH bpf-next 6/6] tools/bpf: add a test case " Yonghong Song
  5 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2018-04-06 21:48 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

There is no functional change in this patch. The common-purpose
perf_event functions are moved from trace_output_user.c to bpf_load.c
so that these functions can be reused later.
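
To give an idea of the resulting interface, a rough sketch of what a
sample's user space side looks like after the move (the actual
conversion of trace_output_user.c is in the diff below):

#include <stdio.h>
#include "bpf_load.h"

static int pmu_fd;	/* perf event fd, set up as before */

static void print_output(void *data, int size)
{
	/* called once per PERF_RECORD_SAMPLE payload */
	printf("received %d byte sample\n", size);
}

static void start_workload(void)
{
	/* kick off whatever generates the events */
	(void) popen("taskset 1 dd if=/dev/zero of=/dev/null", "r");
}

int main(int argc, char **argv)
{
	/* ... load the bpf program and open/enable pmu_fd as before ... */
	return perf_event_poller(pmu_fd, start_workload, print_output);
}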

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/bpf_load.c          | 104 ++++++++++++++++++++++++++++++++++++
 samples/bpf/bpf_load.h          |   5 ++
 samples/bpf/trace_output_user.c | 113 ++++------------------------------------
 3 files changed, 118 insertions(+), 104 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index bebe418..62aa5cc 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -713,3 +713,107 @@ struct ksym *ksym_search(long key)
 	return &syms[0];
 }
 
+static int page_size;
+static int page_cnt = 8;
+static volatile struct perf_event_mmap_page *header;
+
+static int perf_event_mmap(int fd)
+{
+	void *base;
+	int mmap_size;
+
+	page_size = getpagesize();
+	mmap_size = page_size * (page_cnt + 1);
+
+	base = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	if (base == MAP_FAILED) {
+		printf("mmap err\n");
+		return -1;
+	}
+
+	header = base;
+	return 0;
+}
+
+static int perf_event_poll(int fd)
+{
+	struct pollfd pfd = { .fd = fd, .events = POLLIN };
+
+	return poll(&pfd, 1, 1000);
+}
+
+struct perf_event_sample {
+	struct perf_event_header header;
+	__u32 size;
+	char data[];
+};
+
+static void perf_event_read(perf_event_print_fn fn)
+{
+	__u64 data_tail = header->data_tail;
+	__u64 data_head = header->data_head;
+	__u64 buffer_size = page_cnt * page_size;
+	void *base, *begin, *end;
+	char buf[256];
+
+	asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
+	if (data_head == data_tail)
+		return;
+
+	base = ((char *)header) + page_size;
+
+	begin = base + data_tail % buffer_size;
+	end = base + data_head % buffer_size;
+
+	while (begin != end) {
+		struct perf_event_sample *e;
+
+		e = begin;
+		if (begin + e->header.size > base + buffer_size) {
+			long len = base + buffer_size - begin;
+
+			assert(len < e->header.size);
+			memcpy(buf, begin, len);
+			memcpy(buf + len, base, e->header.size - len);
+			e = (void *) buf;
+			begin = base + e->header.size - len;
+		} else if (begin + e->header.size == base + buffer_size) {
+			begin = base;
+		} else {
+			begin += e->header.size;
+		}
+
+		if (e->header.type == PERF_RECORD_SAMPLE) {
+			fn(e->data, e->size);
+		} else if (e->header.type == PERF_RECORD_LOST) {
+			struct {
+				struct perf_event_header header;
+				__u64 id;
+				__u64 lost;
+			} *lost = (void *) e;
+			printf("lost %lld events\n", lost->lost);
+		} else {
+			printf("unknown event type=%d size=%d\n",
+			       e->header.type, e->header.size);
+		}
+	}
+
+	__sync_synchronize(); /* smp_mb() */
+	header->data_tail = data_head;
+}
+
+int perf_event_poller(int fd, perf_event_exec_fn exec_fn,
+		      perf_event_print_fn output_fn)
+{
+	if (perf_event_mmap(fd) < 0)
+		return 1;
+
+	exec_fn();
+
+	for (;;) {
+		perf_event_poll(fd);
+		perf_event_read(output_fn);
+	}
+
+	return 0;
+}
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index 453c200..d618750 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -62,4 +62,9 @@ struct ksym {
 int load_kallsyms(void);
 struct ksym *ksym_search(long key);
 int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
+
+typedef void (*perf_event_exec_fn)(void);
+typedef void (*perf_event_print_fn)(void *data, int size);
+int perf_event_poller(int fd, perf_event_exec_fn exec_fn,
+		      perf_event_print_fn output_fn);
 #endif
diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c
index ccca1e3..3d3991f 100644
--- a/samples/bpf/trace_output_user.c
+++ b/samples/bpf/trace_output_user.c
@@ -24,97 +24,6 @@
 
 static int pmu_fd;
 
-int page_size;
-int page_cnt = 8;
-volatile struct perf_event_mmap_page *header;
-
-typedef void (*print_fn)(void *data, int size);
-
-static int perf_event_mmap(int fd)
-{
-	void *base;
-	int mmap_size;
-
-	page_size = getpagesize();
-	mmap_size = page_size * (page_cnt + 1);
-
-	base = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-	if (base == MAP_FAILED) {
-		printf("mmap err\n");
-		return -1;
-	}
-
-	header = base;
-	return 0;
-}
-
-static int perf_event_poll(int fd)
-{
-	struct pollfd pfd = { .fd = fd, .events = POLLIN };
-
-	return poll(&pfd, 1, 1000);
-}
-
-struct perf_event_sample {
-	struct perf_event_header header;
-	__u32 size;
-	char data[];
-};
-
-static void perf_event_read(print_fn fn)
-{
-	__u64 data_tail = header->data_tail;
-	__u64 data_head = header->data_head;
-	__u64 buffer_size = page_cnt * page_size;
-	void *base, *begin, *end;
-	char buf[256];
-
-	asm volatile("" ::: "memory"); /* in real code it should be smp_rmb() */
-	if (data_head == data_tail)
-		return;
-
-	base = ((char *)header) + page_size;
-
-	begin = base + data_tail % buffer_size;
-	end = base + data_head % buffer_size;
-
-	while (begin != end) {
-		struct perf_event_sample *e;
-
-		e = begin;
-		if (begin + e->header.size > base + buffer_size) {
-			long len = base + buffer_size - begin;
-
-			assert(len < e->header.size);
-			memcpy(buf, begin, len);
-			memcpy(buf + len, base, e->header.size - len);
-			e = (void *) buf;
-			begin = base + e->header.size - len;
-		} else if (begin + e->header.size == base + buffer_size) {
-			begin = base;
-		} else {
-			begin += e->header.size;
-		}
-
-		if (e->header.type == PERF_RECORD_SAMPLE) {
-			fn(e->data, e->size);
-		} else if (e->header.type == PERF_RECORD_LOST) {
-			struct {
-				struct perf_event_header header;
-				__u64 id;
-				__u64 lost;
-			} *lost = (void *) e;
-			printf("lost %lld events\n", lost->lost);
-		} else {
-			printf("unknown event type=%d size=%d\n",
-			       e->header.type, e->header.size);
-		}
-	}
-
-	__sync_synchronize(); /* smp_mb() */
-	header->data_tail = data_head;
-}
-
 static __u64 time_get_ns(void)
 {
 	struct timespec ts;
@@ -166,10 +75,17 @@ static void test_bpf_perf_event(void)
 	ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
 }
 
+static void exec_action(void)
+{
+	FILE *f;
+
+	f = popen("taskset 1 dd if=/dev/zero of=/dev/null", "r");
+	(void) f;
+}
+
 int main(int argc, char **argv)
 {
 	char filename[256];
-	FILE *f;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 
@@ -180,17 +96,6 @@ int main(int argc, char **argv)
 
 	test_bpf_perf_event();
 
-	if (perf_event_mmap(pmu_fd) < 0)
-		return 1;
-
-	f = popen("taskset 1 dd if=/dev/zero of=/dev/null", "r");
-	(void) f;
-
 	start_time = time_get_ns();
-	for (;;) {
-		perf_event_poll(pmu_fd);
-		perf_event_read(print_bpf_output);
-	}
-
-	return 0;
+	return perf_event_poller(pmu_fd, exec_action, print_bpf_output);
 }
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH bpf-next 5/6] samples/bpf: add a test for bpf_get_stack helper
  2018-04-06 21:48 [RFC PATCH bpf-next 0/6] bpf: add bpf_get_stack_helper Yonghong Song
                   ` (3 preceding siblings ...)
  2018-04-06 21:48 ` [RFC PATCH bpf-next 4/6] samples/bpf: move common-purpose perf_event functions to bpf_load.c Yonghong Song
@ 2018-04-06 21:48 ` Yonghong Song
  2018-04-06 21:48 ` [RFC PATCH bpf-next 6/6] tools/bpf: add a test case " Yonghong Song
  5 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2018-04-06 21:48 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

The test attaches a kprobe program to the kernel function sys_write.
It retrieves the stack for user space, for kernel space, and for
user space with the build_id request. It also retrieves the user
and kernel stacks into the same buffer with back-to-back
bpf_get_stack helper calls.

Whenever a kernel stack is available, the user space
application checks that sys_write/SyS_write
is part of the stack.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/Makefile               |   4 +
 samples/bpf/trace_get_stack_kern.c |  80 ++++++++++++++++++++
 samples/bpf/trace_get_stack_user.c | 150 +++++++++++++++++++++++++++++++++++++
 3 files changed, 234 insertions(+)
 create mode 100644 samples/bpf/trace_get_stack_kern.c
 create mode 100644 samples/bpf/trace_get_stack_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 4d6a6ed..94e7b10 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -44,6 +44,7 @@ hostprogs-y += xdp_monitor
 hostprogs-y += xdp_rxq_info
 hostprogs-y += syscall_tp
 hostprogs-y += cpustat
+hostprogs-y += trace_get_stack
 
 # Libbpf dependencies
 LIBBPF := ../../tools/lib/bpf/bpf.o ../../tools/lib/bpf/nlattr.o
@@ -95,6 +96,7 @@ xdp_monitor-objs := bpf_load.o $(LIBBPF) xdp_monitor_user.o
 xdp_rxq_info-objs := bpf_load.o $(LIBBPF) xdp_rxq_info_user.o
 syscall_tp-objs := bpf_load.o $(LIBBPF) syscall_tp_user.o
 cpustat-objs := bpf_load.o $(LIBBPF) cpustat_user.o
+trace_get_stack-objs := bpf_load.o $(LIBBPF) trace_get_stack_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -148,6 +150,7 @@ always += xdp_rxq_info_kern.o
 always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
 always += cpustat_kern.o
+always += trace_get_stack_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -193,6 +196,7 @@ HOSTLOADLIBES_xdp_monitor += -lelf
 HOSTLOADLIBES_xdp_rxq_info += -lelf
 HOSTLOADLIBES_syscall_tp += -lelf
 HOSTLOADLIBES_cpustat += -lelf
+HOSTLOADLIBES_trace_get_stack += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/trace_get_stack_kern.c b/samples/bpf/trace_get_stack_kern.c
new file mode 100644
index 0000000..c7cc7b1
--- /dev/null
+++ b/samples/bpf/trace_get_stack_kern.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/ptrace.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+/* Permit pretty deep stack traces */
+#define MAX_STACK 100
+struct stack_trace_t {
+	int pid;
+	int kern_stack_size;
+	int user_stack_size;
+	int user_stack_buildid_size;
+	u64 kern_stack[MAX_STACK];
+	u64 user_stack[MAX_STACK];
+	struct bpf_stack_build_id user_stack_buildid[MAX_STACK];
+};
+
+struct bpf_map_def SEC("maps") perfmap = {
+	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(u32),
+	.max_entries = 2,
+};
+
+struct bpf_map_def SEC("maps") stackdata_map = {
+	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
+	.key_size = sizeof(u32),
+	.value_size = sizeof(struct stack_trace_t),
+	.max_entries = 1,
+};
+
+SEC("kprobe/sys_write")
+int bpf_prog1(struct pt_regs *ctx)
+{
+	int max_len, max_buildid_len, usize, ksize, total_size;
+	struct stack_trace_t *data;
+	void *raw_data;
+	u32 key = 0;
+
+	data = bpf_map_lookup_elem(&stackdata_map, &key);
+	if (!data)
+		return 0;
+
+	max_len = MAX_STACK * sizeof(u64);
+	max_buildid_len = MAX_STACK * sizeof(struct bpf_stack_build_id);
+	data->pid = bpf_get_current_pid_tgid();
+	data->kern_stack_size = bpf_get_stack(ctx, data->kern_stack,
+					      max_len, 0);
+	data->user_stack_size = bpf_get_stack(ctx, data->user_stack, max_len,
+					    BPF_F_USER_STACK);
+	data->user_stack_buildid_size = bpf_get_stack(
+		ctx, data->user_stack_buildid, max_buildid_len,
+		BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
+	bpf_perf_event_output(ctx, &perfmap, 0, data, sizeof(*data));
+
+	/* write both kernel and user stacks to the same buffer */
+	raw_data = (void *)data;
+	usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
+	if (usize < 0)
+		return 0;
+
+	ksize = 0;
+	if (usize < max_len) {
+		ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize,
+				      0);
+		if (ksize < 0)
+			return 0;
+	}
+	total_size = (usize < max_len ? usize : 0) +
+		     (ksize < max_len ? ksize : 0);
+	if (total_size > 0 && total_size < max_len)
+		bpf_perf_event_output(ctx, &perfmap, 0, raw_data, total_size);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/trace_get_stack_user.c b/samples/bpf/trace_get_stack_user.c
new file mode 100644
index 0000000..f64f5a5
--- /dev/null
+++ b/samples/bpf/trace_get_stack_user.c
@@ -0,0 +1,150 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <linux/perf_event.h>
+#include <linux/bpf.h>
+#include <errno.h>
+#include <assert.h>
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <signal.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+#include "perf-sys.h"
+
+static int pmu_fd;
+
+#define MAX_CNT 10ull
+#define MAX_STACK 100
+struct stack_trace_t {
+	int pid;
+	int kern_stack_size;
+	int user_stack_size;
+	int user_stack_buildid_size;
+	__u64 kern_stack[MAX_STACK];
+	__u64 user_stack[MAX_STACK];
+	struct bpf_stack_build_id user_stack_buildid[MAX_STACK];
+};
+
+static void print_bpf_output(void *data, int size)
+{
+	struct stack_trace_t *e = data;
+	int i, num_stack;
+	static __u64 cnt;
+	bool found = false;
+
+	cnt++;
+
+	if (size < sizeof(struct stack_trace_t)) {
+		__u64 *raw_data = data;
+
+		num_stack = size / sizeof(__u64);
+		printf("sample size = %d, raw stack\n\t", size);
+		for (i = 0; i < num_stack; i++) {
+			struct ksym *ks = ksym_search(raw_data[i]);
+
+			printf("0x%llx ", raw_data[i]);
+			if (ks && (strcmp(ks->name, "sys_write") == 0 ||
+				   strcmp(ks->name, "SyS_write") == 0))
+				found = true;
+		}
+		printf("\n");
+	} else {
+		printf("sample size = %d, pid %d\n", size, e->pid);
+		if (e->kern_stack_size > 0) {
+			num_stack = e->kern_stack_size / sizeof(__u64);
+			printf("\tkernel_stack(%d): ", num_stack);
+			for (i = 0; i < num_stack; i++) {
+				struct ksym *ks = ksym_search(e->kern_stack[i]);
+
+				printf("0x%llx ", e->kern_stack[i]);
+				if (ks && (strcmp(ks->name, "sys_write") == 0 ||
+					   strcmp(ks->name, "SyS_write") == 0))
+					found = true;
+			}
+			printf("\n");
+		}
+		if (e->user_stack_size > 0) {
+			num_stack = e->user_stack_size / sizeof(__u64);
+			printf("\tuser_stack(%d): ", num_stack);
+			for (i = 0; i < num_stack; i++)
+				printf("0x%llx ", e->user_stack[i]);
+			printf("\n");
+		}
+		if (e->user_stack_buildid_size > 0) {
+			num_stack = e->user_stack_buildid_size /
+				    sizeof(struct bpf_stack_build_id);
+			printf("\tuser_stack_buildid(%d): ", num_stack);
+			for (i = 0; i < num_stack; i++) {
+				int j;
+
+				printf("(%d, 0x", e->user_stack_buildid[i].status);
+				for (j = 0; j < BPF_BUILD_ID_SIZE; j++)
+					printf("%02x", e->user_stack_buildid[i].build_id[j]);
+				printf(", %llx) ", e->user_stack_buildid[i].offset);
+			}
+			printf("\n");
+		}
+	}
+	if (!found) {
+		printf("received %lld events, kern symbol not found, exiting ...\n", cnt);
+		kill(0, SIGINT);
+	}
+
+	if (cnt == MAX_CNT) {
+		printf("received max %lld events, exiting ...\n", cnt);
+		kill(0, SIGINT);
+	}
+}
+
+static void test_bpf_perf_event(void)
+{
+	struct perf_event_attr attr = {
+		.sample_type = PERF_SAMPLE_RAW,
+		.type = PERF_TYPE_SOFTWARE,
+		.config = PERF_COUNT_SW_BPF_OUTPUT,
+	};
+	int key = 0;
+
+	pmu_fd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
+
+	assert(pmu_fd >= 0);
+	assert(bpf_map_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
+	ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+}
+
+static void action(void)
+{
+	FILE *f;
+
+	f = popen("taskset 1 dd if=/dev/zero of=/dev/null", "r");
+	(void) f;
+}
+
+int main(int argc, char **argv)
+{
+	char filename[256];
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (load_kallsyms()) {
+		printf("failed to process /proc/kallsyms\n");
+		return 2;
+	}
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	test_bpf_perf_event();
+	return perf_event_poller(pmu_fd, action, print_bpf_output);
+}
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH bpf-next 6/6] tools/bpf: add a test case for bpf_get_stack helper
  2018-04-06 21:48 [RFC PATCH bpf-next 0/6] bpf: add bpf_get_stack_helper Yonghong Song
                   ` (4 preceding siblings ...)
  2018-04-06 21:48 ` [RFC PATCH bpf-next 5/6] samples/bpf: add a test for bpf_get_stack helper Yonghong Song
@ 2018-04-06 21:48 ` Yonghong Song
  5 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2018-04-06 21:48 UTC (permalink / raw)
  To: ast, daniel, netdev; +Cc: kernel-team

The test_stacktrace_map test is enhanced to also call bpf_get_stack
in the bpf program to collect the stack trace.
The stack traces from bpf_get_stack and bpf_get_stackid
are compared to ensure that, for the same stack
(i.e. the same hash), their ip addresses
are the same.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/test_progs.c          | 41 ++++++++++++++++++++++-
 tools/testing/selftests/bpf/test_stacktrace_map.c | 20 +++++++++--
 2 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index faadbe2..8aa2844 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -865,9 +865,39 @@ static int compare_map_keys(int map1_fd, int map2_fd)
 	return 0;
 }
 
+static int compare_stack_ips(int smap_fd, int amap_fd)
+{
+	int max_len = PERF_MAX_STACK_DEPTH * sizeof(__u64);
+	__u32 key, next_key, *cur_key_p, *next_key_p;
+	char val_buf1[max_len], val_buf2[max_len];
+	int i, err;
+
+	cur_key_p = NULL;
+	next_key_p = &key;
+	while (bpf_map_get_next_key(smap_fd, cur_key_p, next_key_p) == 0) {
+		err = bpf_map_lookup_elem(smap_fd, next_key_p, val_buf1);
+		if (err)
+			return err;
+		err = bpf_map_lookup_elem(amap_fd, next_key_p, val_buf2);
+		if (err)
+			return err;
+		for (i = 0; i < max_len; i++) {
+			if (val_buf1[i] != val_buf2[i])
+				return -1;
+		}
+		key = *next_key_p;
+		cur_key_p = &key;
+		next_key_p = &next_key;
+	}
+	if (errno != ENOENT)
+		return -1;
+
+	return 0;
+}
+
 static void test_stacktrace_map()
 {
-	int control_map_fd, stackid_hmap_fd, stackmap_fd;
+	int control_map_fd, stackid_hmap_fd, stackmap_fd, stack_amap_fd;
 	const char *file = "./test_stacktrace_map.o";
 	int bytes, efd, err, pmu_fd, prog_fd;
 	struct perf_event_attr attr = {};
@@ -925,6 +955,10 @@ static void test_stacktrace_map()
 	if (stackmap_fd < 0)
 		goto disable_pmu;
 
+	stack_amap_fd = bpf_find_map(__func__, obj, "stack_amap");
+	if (stack_amap_fd < 0)
+		goto disable_pmu;
+
 	/* give some time for bpf program run */
 	sleep(1);
 
@@ -946,6 +980,11 @@ static void test_stacktrace_map()
 		  "err %d errno %d\n", err, errno))
 		goto disable_pmu_noerr;
 
+	err = compare_stack_ips(stackmap_fd, stack_amap_fd);
+	if (CHECK(err, "compare_stack_ips stackmap vs. stack_amap",
+		  "err %d errno %d\n", err, errno))
+		goto disable_pmu_noerr;
+
 	goto disable_pmu_noerr;
 disable_pmu:
 	error_cnt++;
diff --git a/tools/testing/selftests/bpf/test_stacktrace_map.c b/tools/testing/selftests/bpf/test_stacktrace_map.c
index 76d85c5d..f83c7b6 100644
--- a/tools/testing/selftests/bpf/test_stacktrace_map.c
+++ b/tools/testing/selftests/bpf/test_stacktrace_map.c
@@ -19,14 +19,21 @@ struct bpf_map_def SEC("maps") stackid_hmap = {
 	.type = BPF_MAP_TYPE_HASH,
 	.key_size = sizeof(__u32),
 	.value_size = sizeof(__u32),
-	.max_entries = 10000,
+	.max_entries = 16384,
 };
 
 struct bpf_map_def SEC("maps") stackmap = {
 	.type = BPF_MAP_TYPE_STACK_TRACE,
 	.key_size = sizeof(__u32),
 	.value_size = sizeof(__u64) * PERF_MAX_STACK_DEPTH,
-	.max_entries = 10000,
+	.max_entries = 16384,
+};
+
+struct bpf_map_def SEC("maps") stack_amap = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64) * PERF_MAX_STACK_DEPTH,
+	.max_entries = 16384,
 };
 
 /* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
@@ -44,7 +51,10 @@ struct sched_switch_args {
 SEC("tracepoint/sched/sched_switch")
 int oncpu(struct sched_switch_args *ctx)
 {
+	__u32 max_len = PERF_MAX_STACK_DEPTH * sizeof(__u64);
 	__u32 key = 0, val = 0, *value_p;
+	void *stack_p;
+
 
 	value_p = bpf_map_lookup_elem(&control_map, &key);
 	if (value_p && *value_p)
@@ -52,8 +62,12 @@ int oncpu(struct sched_switch_args *ctx)
 
 	/* The size of stackmap and stackid_hmap should be the same */
 	key = bpf_get_stackid(ctx, &stackmap, 0);
-	if ((int)key >= 0)
+	if ((int)key >= 0) {
 		bpf_map_update_elem(&stackid_hmap, &key, &val, 0);
+		stack_p = bpf_map_lookup_elem(&stack_amap, &key);
+		if (stack_p)
+			bpf_get_stack(ctx, stack_p, max_len, 0);
+	}
 
 	return 0;
 }
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
  2018-04-06 21:48 ` [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper Yonghong Song
@ 2018-04-09  3:34   ` Alexei Starovoitov
  2018-04-09  4:53     ` Yonghong Song
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2018-04-09  3:34 UTC (permalink / raw)
  To: Yonghong Song, daniel, netdev; +Cc: kernel-team

On 4/6/18 2:48 PM, Yonghong Song wrote:
> Currently, the stackmap and the bpf_get_stackid helper are provided
> for bpf programs to get the stack trace. This approach has a
> limitation, though: if two stack traces have the same hash, only
> one is stored in the stackmap table, so some stack traces are
> missing from the user's perspective.
>
> This patch implements a new helper, bpf_get_stack, which copies
> the stack trace directly into a buffer supplied by the bpf program.
> The bpf program can therefore see all stack traces, and can either
> process them in the kernel or send them to user space through a
> shared map or bpf_perf_event_output.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/linux/bpf.h      |  1 +
>  include/linux/filter.h   |  3 ++-
>  include/uapi/linux/bpf.h | 17 +++++++++++++--
>  kernel/bpf/stackmap.c    | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c     | 12 ++++++++++-
>  kernel/bpf/verifier.c    |  3 +++
>  kernel/trace/bpf_trace.c | 50 +++++++++++++++++++++++++++++++++++++++++-
>  7 files changed, 137 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 95a7abd..72ccb9a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -676,6 +676,7 @@ extern const struct bpf_func_proto bpf_get_current_comm_proto;
>  extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
>  extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
>  extern const struct bpf_func_proto bpf_get_stackid_proto;
> +extern const struct bpf_func_proto bpf_get_stack_proto;
>  extern const struct bpf_func_proto bpf_sock_map_update_proto;
>
>  /* Shared helpers among cBPF and eBPF. */
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index fc4e8f9..9b64f63 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -467,7 +467,8 @@ struct bpf_prog {
>  				dst_needed:1,	/* Do we need dst entry? */
>  				blinded:1,	/* Was blinded */
>  				is_func:1,	/* program is a bpf function */
> -				kprobe_override:1; /* Do we override a kprobe? */
> +				kprobe_override:1, /* Do we override a kprobe? */
> +				need_callchain_buf:1; /* Needs callchain buffer? */
>  	enum bpf_prog_type	type;		/* Type of BPF program */
>  	enum bpf_attach_type	expected_attach_type; /* For some prog types */
>  	u32			len;		/* Number of filter blocks */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec897..a4ff5b7 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -517,6 +517,17 @@ union bpf_attr {
>   *             other bits - reserved
>   *     Return: >= 0 stackid on success or negative error
>   *
> + * int bpf_get_stack(ctx, buf, size, flags)
> + *     walk user or kernel stack and store the ips in buf
> + *     @ctx: struct pt_regs*
> + *     @buf: user buffer to fill stack
> + *     @size: the buf size
> + *     @flags: bits 0-7 - number of stack frames to skip
> + *             bit 8 - collect user stack instead of kernel
> + *             bit 11 - get build-id as well if user stack
> + *             other bits - reserved
> + *     Return: >= 0 size copied on success or negative error
> + *
>   * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
>   *     calculate csum diff
>   *     @from: raw from buffer
> @@ -821,7 +832,8 @@ union bpf_attr {
>  	FN(msg_apply_bytes),		\
>  	FN(msg_cork_bytes),		\
>  	FN(msg_pull_data),		\
> -	FN(bind),
> +	FN(bind),			\
> +	FN(get_stack),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> @@ -855,11 +867,12 @@ enum bpf_func_id {
>  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
>  #define BPF_F_TUNINFO_IPV6		(1ULL << 0)
>
> -/* BPF_FUNC_get_stackid flags. */
> +/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
>  #define BPF_F_SKIP_FIELD_MASK		0xffULL
>  #define BPF_F_USER_STACK		(1ULL << 8)
>  #define BPF_F_FAST_STACK_CMP		(1ULL << 9)
>  #define BPF_F_REUSE_STACKID		(1ULL << 10)
> +#define BPF_F_USER_BUILD_ID		(1ULL << 11)

the comment above is not quite correct.
This new flag is only available to the new helper.
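
Something like the following grouping would be clearer, for example
(just a sketch):

/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
#define BPF_F_SKIP_FIELD_MASK		0xffULL
#define BPF_F_USER_STACK		(1ULL << 8)
/* flags used by BPF_FUNC_get_stackid only. */
#define BPF_F_FAST_STACK_CMP		(1ULL << 9)
#define BPF_F_REUSE_STACKID		(1ULL << 10)
/* flag used by BPF_FUNC_get_stack only. */
#define BPF_F_USER_BUILD_ID		(1ULL << 11)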

>
>  /* BPF_FUNC_skb_set_tunnel_key flags. */
>  #define BPF_F_ZERO_CSUM_TX		(1ULL << 1)
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 04f6ec1..371c72e 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -402,6 +402,62 @@ const struct bpf_func_proto bpf_get_stackid_proto = {
>  	.arg3_type	= ARG_ANYTHING,
>  };
>
> +BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
> +	   u64, flags)
> +{
> +	u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
> +	bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> +	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> +	bool user = flags & BPF_F_USER_STACK;
> +	struct perf_callchain_entry *trace;
> +	bool kernel = !user;
> +	u64 *ips;
> +
> +	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
> +			       BPF_F_USER_BUILD_ID)))
> +		return -EINVAL;
> +
> +	elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id)
> +					    : sizeof(u64);
> +	if (unlikely(size % elem_size))
> +		return -EINVAL;
> +
> +	num_elem = size / elem_size;
> +	if (sysctl_perf_event_max_stack < num_elem)
> +		init_nr = 0;

prog's buffer should be zero padded in this case since it
points to uninit_mem.
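
E.g., a sketch of what the padding could look like, right after the
copy into buf (error paths that return earlier would need the same
treatment):

	if (copy_len < size)
		memset(buf + copy_len, 0, size - copy_len);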

> +	else
> +		init_nr = sysctl_perf_event_max_stack - num_elem;
> +	trace = get_perf_callchain(regs, init_nr, kernel, user,
> +				   sysctl_perf_event_max_stack, false, false);
> +	if (unlikely(!trace))
> +		return -EFAULT;
> +
> +	trace_nr = trace->nr - init_nr;
> +	if (trace_nr <= skip)
> +		return -EFAULT;
> +
> +	trace_nr -= skip;
> +	trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
> +	copy_len = trace_nr * elem_size;
> +	ips = trace->ip + skip + init_nr;
> +	if (user && user_build_id)

the combination of kern + user_build_id should probably be rejected
earlier with einval.
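
I.e., next to the existing flags check, something like (sketch):

	if (kernel && user_build_id)
		return -EINVAL;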

> +		stack_map_get_build_id_offset(buf, ips, trace_nr, user);
> +	else
> +		memcpy(buf, ips, copy_len);
> +
> +	return copy_len;
> +}
> +
> +const struct bpf_func_proto bpf_get_stack_proto = {
> +	.func		= bpf_get_stack,
> +	.gpl_only	= true,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg3_type	= ARG_CONST_SIZE_OR_ZERO,

why allow zero size?
I'm not sure the helper will work correctly when size=0

> +	.arg4_type	= ARG_ANYTHING,
> +};
> +
>  /* Called from eBPF program */
>  static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
>  {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 0244973..2aa3a65 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -984,10 +984,13 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
>  static void __bpf_prog_put_rcu(struct rcu_head *rcu)
>  {
>  	struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
> +	bool need_callchain_buf = aux->prog->need_callchain_buf;
>
>  	free_used_maps(aux);
>  	bpf_prog_uncharge_memlock(aux->prog);
>  	security_bpf_prog_free(aux);
> +	if (need_callchain_buf)
> +		put_callchain_buffers();
>  	bpf_prog_free(aux->prog);
>  }
>
> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
>  			bpf_prog_kallsyms_del(prog->aux->func[i]);
>  		bpf_prog_kallsyms_del(prog);
>
> -		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
> +		synchronize_rcu();
> +		__bpf_prog_put_rcu(&prog->aux->rcu);

there should have been a lockdep splat.
We cannot call synchronize_rcu() here, since we cannot sleep
in some cases.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
  2018-04-09  3:34   ` Alexei Starovoitov
@ 2018-04-09  4:53     ` Yonghong Song
  2018-04-09  5:02       ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Yonghong Song @ 2018-04-09  4:53 UTC (permalink / raw)
  To: Alexei Starovoitov, daniel, netdev; +Cc: kernel-team



On 4/8/18 8:34 PM, Alexei Starovoitov wrote:
> On 4/6/18 2:48 PM, Yonghong Song wrote:
>> Currently, the stackmap and the bpf_get_stackid helper are provided
>> for bpf programs to get the stack trace. This approach has a
>> limitation, though: if two stack traces have the same hash, only
>> one is stored in the stackmap table, so some stack traces are
>> missing from the user's perspective.
>>
>> This patch implements a new helper, bpf_get_stack, which copies
>> the stack trace directly into a buffer supplied by the bpf program.
>> The bpf program can therefore see all stack traces, and can either
>> process them in the kernel or send them to user space through a
>> shared map or bpf_perf_event_output.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>  include/linux/bpf.h      |  1 +
>>  include/linux/filter.h   |  3 ++-
>>  include/uapi/linux/bpf.h | 17 +++++++++++++--
>>  kernel/bpf/stackmap.c    | 56 
>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>  kernel/bpf/syscall.c     | 12 ++++++++++-
>>  kernel/bpf/verifier.c    |  3 +++
>>  kernel/trace/bpf_trace.c | 50 +++++++++++++++++++++++++++++++++++++++++-
>>  7 files changed, 137 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 95a7abd..72ccb9a 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -676,6 +676,7 @@ extern const struct bpf_func_proto 
>> bpf_get_current_comm_proto;
>>  extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
>>  extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
>>  extern const struct bpf_func_proto bpf_get_stackid_proto;
>> +extern const struct bpf_func_proto bpf_get_stack_proto;
>>  extern const struct bpf_func_proto bpf_sock_map_update_proto;
>>
>>  /* Shared helpers among cBPF and eBPF. */
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index fc4e8f9..9b64f63 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -467,7 +467,8 @@ struct bpf_prog {
>>                  dst_needed:1,    /* Do we need dst entry? */
>>                  blinded:1,    /* Was blinded */
>>                  is_func:1,    /* program is a bpf function */
>> -                kprobe_override:1; /* Do we override a kprobe? */
>> +                kprobe_override:1, /* Do we override a kprobe? */
>> +                need_callchain_buf:1; /* Needs callchain buffer? */
>>      enum bpf_prog_type    type;        /* Type of BPF program */
>>      enum bpf_attach_type    expected_attach_type; /* For some prog 
>> types */
>>      u32            len;        /* Number of filter blocks */
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index c5ec897..a4ff5b7 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -517,6 +517,17 @@ union bpf_attr {
>>   *             other bits - reserved
>>   *     Return: >= 0 stackid on success or negative error
>>   *
>> + * int bpf_get_stack(ctx, buf, size, flags)
>> + *     walk user or kernel stack and store the ips in buf
>> + *     @ctx: struct pt_regs*
>> + *     @buf: user buffer to fill stack
>> + *     @size: the buf size
>> + *     @flags: bits 0-7 - number of stack frames to skip
>> + *             bit 8 - collect user stack instead of kernel
>> + *             bit 11 - get build-id as well if user stack
>> + *             other bits - reserved
>> + *     Return: >= 0 size copied on success or negative error
>> + *
>>   * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
>>   *     calculate csum diff
>>   *     @from: raw from buffer
>> @@ -821,7 +832,8 @@ union bpf_attr {
>>      FN(msg_apply_bytes),        \
>>      FN(msg_cork_bytes),        \
>>      FN(msg_pull_data),        \
>> -    FN(bind),
>> +    FN(bind),            \
>> +    FN(get_stack),
>>
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which 
>> helper
>>   * function eBPF program intends to call
>> @@ -855,11 +867,12 @@ enum bpf_func_id {
>>  /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
>>  #define BPF_F_TUNINFO_IPV6        (1ULL << 0)
>>
>> -/* BPF_FUNC_get_stackid flags. */
>> +/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
>>  #define BPF_F_SKIP_FIELD_MASK        0xffULL
>>  #define BPF_F_USER_STACK        (1ULL << 8)
>>  #define BPF_F_FAST_STACK_CMP        (1ULL << 9)
>>  #define BPF_F_REUSE_STACKID        (1ULL << 10)
>> +#define BPF_F_USER_BUILD_ID        (1ULL << 11)
> 
> the comment above is not quite correct.
> This new flag is only available to the new helper.

Right, some flags are used for both helpers and some are only used for 
one of them. Will make it clear in the next revision.

> 
>>
>>  /* BPF_FUNC_skb_set_tunnel_key flags. */
>>  #define BPF_F_ZERO_CSUM_TX        (1ULL << 1)
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index 04f6ec1..371c72e 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -402,6 +402,62 @@ const struct bpf_func_proto bpf_get_stackid_proto 
>> = {
>>      .arg3_type    = ARG_ANYTHING,
>>  };
>>
>> +BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, 
>> size,
>> +       u64, flags)
>> +{
>> +    u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
>> +    bool user_build_id = flags & BPF_F_USER_BUILD_ID;
>> +    u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
>> +    bool user = flags & BPF_F_USER_STACK;
>> +    struct perf_callchain_entry *trace;
>> +    bool kernel = !user;
>> +    u64 *ips;
>> +
>> +    if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
>> +                   BPF_F_USER_BUILD_ID)))
>> +        return -EINVAL;
>> +
>> +    elem_size = (user && user_build_id) ? sizeof(struct 
>> bpf_stack_build_id)
>> +                        : sizeof(u64);
>> +    if (unlikely(size % elem_size))
>> +        return -EINVAL;
>> +
>> +    num_elem = size / elem_size;
>> +    if (sysctl_perf_event_max_stack < num_elem)
>> +        init_nr = 0;
> 
> prog's buffer should be zero padded in this case since it
> points to uninit_mem.

Will make the change in the next revision.

> 
>> +    else
>> +        init_nr = sysctl_perf_event_max_stack - num_elem;
>> +    trace = get_perf_callchain(regs, init_nr, kernel, user,
>> +                   sysctl_perf_event_max_stack, false, false);
>> +    if (unlikely(!trace))
>> +        return -EFAULT;
>> +
>> +    trace_nr = trace->nr - init_nr;
>> +    if (trace_nr <= skip)
>> +        return -EFAULT;
>> +
>> +    trace_nr -= skip;
>> +    trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
>> +    copy_len = trace_nr * elem_size;
>> +    ips = trace->ip + skip + init_nr;
>> +    if (user && user_build_id)
> 
> the combination of kern + user_build_id should probably be rejected
> earlier with einval.

Right, I missed this case. It should be rejected.
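
A sketch of the extra check, next to the existing flags validation
(again just the direction, not the final patch):

	/* build ids only exist for user stacks; reject the combination */
	if (kernel && user_build_id)
		return -EINVAL;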

> 
>> +        stack_map_get_build_id_offset(buf, ips, trace_nr, user);
>> +    else
>> +        memcpy(buf, ips, copy_len);
>> +
>> +    return copy_len;
>> +}
>> +
>> +const struct bpf_func_proto bpf_get_stack_proto = {
>> +    .func        = bpf_get_stack,
>> +    .gpl_only    = true,
>> +    .ret_type    = RET_INTEGER,
>> +    .arg1_type    = ARG_PTR_TO_CTX,
>> +    .arg2_type    = ARG_PTR_TO_UNINIT_MEM,
>> +    .arg3_type    = ARG_CONST_SIZE_OR_ZERO,
> 
> why allow zero size?
> I'm not sure the helper will work correctly when size=0

The only reason I had is to make the bpf program easier to
write, so it does not need to test for zero size, similar to
bpf_probe_read/bpf_perf_event_output.

I have double checked my implementation and it should
handle zero size properly.

Let me double check whether disallowing zero still permits
reasonable bpf program code that passes the verifier.
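
As a hypothetical example of the coding pattern this is meant to allow
(the attach point, buffer size and use of bpf_get_prandom_u32() are
made up for illustration):

#define STACK_BUF_SIZE 128	/* bytes, a multiple of sizeof(__u64) */

SEC("kprobe/sys_write")
int sample_stack(struct pt_regs *ctx)
{
	__u64 buf[STACK_BUF_SIZE / sizeof(__u64)];
	__u32 len;

	/* len is data dependent and may be 0; the mask keeps it within
	 * the buffer and a multiple of 8 so the helper accepts it
	 */
	len = bpf_get_prandom_u32() & (STACK_BUF_SIZE - sizeof(__u64));

	/* with ARG_CONST_SIZE_OR_ZERO no explicit "if (len == 0)"
	 * branch is needed to satisfy the verifier
	 */
	bpf_get_stack(ctx, buf, len, 0);
	return 0;
}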

> 
>> +    .arg4_type    = ARG_ANYTHING,
>> +};
>> +
>>  /* Called from eBPF program */
>>  static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
>>  {
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 0244973..2aa3a65 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -984,10 +984,13 @@ void bpf_prog_free_id(struct bpf_prog *prog, 
>> bool do_idr_lock)
>>  static void __bpf_prog_put_rcu(struct rcu_head *rcu)
>>  {
>>      struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
>> +    bool need_callchain_buf = aux->prog->need_callchain_buf;
>>
>>      free_used_maps(aux);
>>      bpf_prog_uncharge_memlock(aux->prog);
>>      security_bpf_prog_free(aux);
>> +    if (need_callchain_buf)
>> +        put_callchain_buffers();
>>      bpf_prog_free(aux->prog);
>>  }
>>
>> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
>>              bpf_prog_kallsyms_del(prog->aux->func[i]);
>>          bpf_prog_kallsyms_del(prog);
>>
>> -        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>> +        synchronize_rcu();
>> +        __bpf_prog_put_rcu(&prog->aux->rcu);
> 
> there should have been lockdep splat.
> We cannot call synchronize_rcu here, since we cannot sleep
> in some cases.

Let me double check this. The following is the reason
why I am using synchronize_rcu().

With call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu),
__bpf_prog_put_rcu calls put_callchain_buffers(), which
takes mutex_lock(); a kernel with CONFIG_DEBUG_ATOMIC_SLEEP=y
will complain since a potential sleep inside the call_rcu
callback is not allowed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
  2018-04-09  4:53     ` Yonghong Song
@ 2018-04-09  5:02       ` Alexei Starovoitov
  2018-04-09 10:01         ` Daniel Borkmann
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2018-04-09  5:02 UTC (permalink / raw)
  To: Yonghong Song, daniel, netdev; +Cc: kernel-team

On 4/8/18 9:53 PM, Yonghong Song wrote:
>>> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
>>>              bpf_prog_kallsyms_del(prog->aux->func[i]);
>>>          bpf_prog_kallsyms_del(prog);
>>>
>>> -        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>>> +        synchronize_rcu();
>>> +        __bpf_prog_put_rcu(&prog->aux->rcu);
>>
>> there should have been lockdep splat.
>> We cannot call synchronize_rcu here, since we cannot sleep
>> in some cases.
>
> Let me double check this. The following is the reason
> why I am using synchronize_rcu().
>
> With call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu),
> __bpf_prog_put_rcu calls put_callchain_buffers(), which
> takes mutex_lock(); a kernel with CONFIG_DEBUG_ATOMIC_SLEEP=y
> will complain since a potential sleep inside the call_rcu
> callback is not allowed.

I see. Indeed. We cannot call put_callchain_buffers() from rcu callback,
but doing synchronize_rcu() here is also not possible.
How about moving put_callchain into bpf_prog_free_deferred() ?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
  2018-04-09  5:02       ` Alexei Starovoitov
@ 2018-04-09 10:01         ` Daniel Borkmann
  2018-04-09 16:52           ` Yonghong Song
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2018-04-09 10:01 UTC (permalink / raw)
  To: Alexei Starovoitov, Yonghong Song, netdev; +Cc: kernel-team

On 04/09/2018 07:02 AM, Alexei Starovoitov wrote:
> On 4/8/18 9:53 PM, Yonghong Song wrote:
>>>> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
>>>>              bpf_prog_kallsyms_del(prog->aux->func[i]);
>>>>          bpf_prog_kallsyms_del(prog);
>>>>
>>>> -        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>>>> +        synchronize_rcu();
>>>> +        __bpf_prog_put_rcu(&prog->aux->rcu);
>>>
>>> there should have been lockdep splat.
>>> We cannot call synchronize_rcu here, since we cannot sleep
>>> in some cases.
>>
>> Let me double check this. The following is the reason
>> why I am using synchronize_rcu().
>>
>> With call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu),
>> __bpf_prog_put_rcu calls put_callchain_buffers(), which
>> takes mutex_lock(); a kernel with CONFIG_DEBUG_ATOMIC_SLEEP=y
>> will complain since a potential sleep inside the call_rcu
>> callback is not allowed.
> 
> I see. Indeed. We cannot call put_callchain_buffers() from rcu callback,
> but doing synchronize_rcu() here is also not possible.
> How about moving put_callchain into bpf_prog_free_deferred() ?

+1, the assumption is that you can call bpf_prog_put() and also the
bpf_map_put() from any context. Sleeping here for a long time might
subtly break things badly.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
  2018-04-09 10:01         ` Daniel Borkmann
@ 2018-04-09 16:52           ` Yonghong Song
  0 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2018-04-09 16:52 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, netdev; +Cc: kernel-team



On 4/9/18 3:01 AM, Daniel Borkmann wrote:
> On 04/09/2018 07:02 AM, Alexei Starovoitov wrote:
>> On 4/8/18 9:53 PM, Yonghong Song wrote:
>>>>> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
>>>>>               bpf_prog_kallsyms_del(prog->aux->func[i]);
>>>>>           bpf_prog_kallsyms_del(prog);
>>>>>
>>>>> -        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
>>>>> +        synchronize_rcu();
>>>>> +        __bpf_prog_put_rcu(&prog->aux->rcu);
>>>>
>>>> there should have been lockdep splat.
>>>> We cannot call synchronize_rcu here, since we cannot sleep
>>>> in some cases.
>>>
>>> Let me double check this. The following is the reason
>>> why I am using synchronize_rcu().
>>>
>>> With call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu),
>>> __bpf_prog_put_rcu calls put_callchain_buffers(), which
>>> takes mutex_lock(); a kernel with CONFIG_DEBUG_ATOMIC_SLEEP=y
>>> will complain since a potential sleep inside the call_rcu
>>> callback is not allowed.
>>
>> I see. Indeed. We cannot call put_callchain_buffers() from rcu callback,
>> but doing synchronize_rcu() here is also not possible.
>> How about moving put_callchain into bpf_prog_free_deferred() ?
> 
> +1, the assumption is that you can call bpf_prog_put() and also the
> bpf_map_put() from any context. Sleeping here for a long time might
> subtly break things badly.

Thanks for the suggestion! This should work.
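
For reference, a rough sketch of the direction suggested above.
bpf_prog_free_deferred() runs from a workqueue in process context, so
the mutex taken by put_callchain_buffers() is fine there; the
need_callchain_buf field name follows the quoted RFC patch and the rest
is illustrative, not the final code:

static void bpf_prog_free_deferred(struct work_struct *work)
{
	struct bpf_prog_aux *aux;

	aux = container_of(work, struct bpf_prog_aux, work);
	if (aux->prog->need_callchain_buf)
		put_callchain_buffers();	/* may sleep, allowed here */
	/* ... existing teardown of the program continues as before ... */
}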

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-04-09 16:53 UTC | newest]

Thread overview: 12+ messages
2018-04-06 21:48 [RFC PATCH bpf-next 0/6] bpf: add bpf_get_stack_helper Yonghong Song
2018-04-06 21:48 ` [RFC PATCH bpf-next 1/6] bpf: change prototype for stack_map_get_build_id_offset Yonghong Song
2018-04-06 21:48 ` [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper Yonghong Song
2018-04-09  3:34   ` Alexei Starovoitov
2018-04-09  4:53     ` Yonghong Song
2018-04-09  5:02       ` Alexei Starovoitov
2018-04-09 10:01         ` Daniel Borkmann
2018-04-09 16:52           ` Yonghong Song
2018-04-06 21:48 ` [RFC PATCH bpf-next 3/6] tools/bpf: add bpf_get_stack helper to tools headers Yonghong Song
2018-04-06 21:48 ` [RFC PATCH bpf-next 4/6] samples/bpf: move common-purpose perf_event functions to bpf_load.c Yonghong Song
2018-04-06 21:48 ` [RFC PATCH bpf-next 5/6] samples/bpf: add a test for bpf_get_stack helper Yonghong Song
2018-04-06 21:48 ` [RFC PATCH bpf-next 6/6] tools/bpf: add a test case " Yonghong Song
