bpf.vger.kernel.org archive mirror
* [PATCH bpf-next 0/5] bpf: fix stackmap on perf_events with PEBS
@ 2020-07-11  1:26 Song Liu
  2020-07-11  1:26 ` [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries Song Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Song Liu @ 2020-07-11  1:26 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer,
	peterz, Song Liu

Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the perf subsystem fetches the callchain
early and marks such perf_events with __PERF_SAMPLE_CALLCHAIN_EARLY.
A similar issue exists when a BPF program calls get_perf_callchain() via
helper functions. For more information about this issue, please refer to
the discussion in [1].

This set provides a solution for this problem.

1/5 blocks ioctl(PERF_EVENT_IOC_SET_BPF) from attaching BPF programs that
    call get_perf_callchain() to perf_events with PEBS entries.
2/5 exposes the callchain fetched by the perf subsystem to the BPF program.
3/5 introduces bpf_get_callchain_stackid(), an alternative to
    bpf_get_stackid() for perf_events with PEBS.
4/5 adds selftests for 1/5.
5/5 adds selftests for 2/5 and 3/5.

[1] https://lore.kernel.org/lkml/ED7B9430-6489-4260-B3C5-9CFA2E3AA87A@fb.com/
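
For illustration, here is a minimal sketch of how a perf_event BPF program
is expected to use the new context field and helper once this series is
applied. It mirrors the selftest added in 5/5; the map sizing is arbitrary
and the snippet is not tested on its own:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
	__uint(max_entries, 16384);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u64) * 127);
} stackmap SEC(".maps");

long stack_id = -1;

SEC("perf_event")
int oncpu(struct bpf_perf_event_data *ctx)
{
	/* ctx->callchain is the callchain fetched early by perf (2/5);
	 * bpf_get_callchain_stackid() stores it in the stack map (3/5).
	 */
	stack_id = bpf_get_callchain_stackid(ctx->callchain, &stackmap, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";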

Song Liu (5):
  bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries
  bpf: add callchain to bpf_perf_event_data
  bpf: introduce bpf_get_callchain_stackid
  selftests/bpf: add get_stackid_cannot_attach
  selftests/bpf: add callchain_stackid

 include/linux/bpf.h                           |  1 +
 include/linux/filter.h                        |  3 +-
 include/linux/perf_event.h                    |  5 --
 include/linux/trace_events.h                  |  5 ++
 include/uapi/linux/bpf.h                      | 43 +++++++++++++
 include/uapi/linux/bpf_perf_event.h           |  7 +++
 kernel/bpf/btf.c                              |  5 ++
 kernel/bpf/stackmap.c                         | 63 ++++++++++++++-----
 kernel/bpf/verifier.c                         |  7 ++-
 kernel/events/core.c                          | 10 +++
 kernel/trace/bpf_trace.c                      | 29 +++++++++
 scripts/bpf_helpers_doc.py                    |  2 +
 tools/include/uapi/linux/bpf.h                | 43 +++++++++++++
 tools/include/uapi/linux/bpf_perf_event.h     |  8 +++
 .../bpf/prog_tests/callchain_stackid.c        | 61 ++++++++++++++++++
 .../prog_tests/get_stackid_cannot_attach.c    | 57 +++++++++++++++++
 .../selftests/bpf/progs/callchain_stackid.c   | 37 +++++++++++
 17 files changed, 364 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/callchain_stackid.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c
 create mode 100644 tools/testing/selftests/bpf/progs/callchain_stackid.c

--
2.24.1


* [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries
  2020-07-11  1:26 [PATCH bpf-next 0/5] bpf: fix stackmap on perf_events with PEBS Song Liu
@ 2020-07-11  1:26 ` Song Liu
  2020-07-11  3:53   ` Andrii Nakryiko
  2020-07-11  1:26 ` [PATCH bpf-next 2/5] bpf: add callchain to bpf_perf_event_data Song Liu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Song Liu @ 2020-07-11  1:26 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer,
	peterz, Song Liu

Calling get_perf_callchain() on perf_events from PEBS entries may cause
unwinder errors. To fix this issue, the callchain is fetched early. Such
perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.

Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
also cause unwinder errors. To fix this, block bpf_get_[stack|stackid] on
these perf_events. Unfortunately, bpf verifier cannot tell whether the
program will be attached to perf_event with PEBS entries. Therefore,
block such programs during ioctl(PERF_EVENT_IOC_SET_BPF).
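
For illustration, a rough user-space sketch of the path this patch changes.
prog_fd is assumed to be the fd of a loaded perf_event BPF program that
calls bpf_get_stackid(); whether the event actually gets
__PERF_SAMPLE_CALLCHAIN_EARLY depends on the PMU and on attr (see the attr
used by the selftest in 4/5), so this is a sketch, not a guaranteed
reproducer:

#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* With this patch, the ioctl() below is expected to fail with EINVAL
 * when the event is marked __PERF_SAMPLE_CALLCHAIN_EARLY.
 */
static int attach_prog_to_pebs_event(int prog_fd)
{
	struct perf_event_attr attr = {
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_CPU_CYCLES,
		.precise_ip = 2,			/* request PEBS */
		.sample_type = PERF_SAMPLE_CALLCHAIN,
		.sample_period = 5000,
		.size = sizeof(struct perf_event_attr),
	};
	int pmu_fd, err = 0;

	pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
			 0 /* cpu */, -1 /* group fd */, 0 /* flags */);
	if (pmu_fd < 0)
		return -errno;

	if (ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd) < 0)
		err = -errno;
	close(pmu_fd);
	return err;
}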

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/filter.h |  3 ++-
 kernel/bpf/verifier.c  |  3 +++
 kernel/events/core.c   | 10 ++++++++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 2593777236037..fb34dc40f039b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -534,7 +534,8 @@ struct bpf_prog {
 				is_func:1,	/* program is a bpf function */
 				kprobe_override:1, /* Do we override a kprobe? */
 				has_callchain_buf:1, /* callchain buffer allocated? */
-				enforce_expected_attach_type:1; /* Enforce expected_attach_type checking at attach time */
+				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
+				call_get_perf_callchain:1; /* Do we call helpers that use get_perf_callchain()? */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b608185e1ffd5..1e11b0f6fba31 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4884,6 +4884,9 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		env->prog->has_callchain_buf = true;
 	}
 
+	if (func_id == BPF_FUNC_get_stackid || func_id == BPF_FUNC_get_stack)
+		env->prog->call_get_perf_callchain = true;
+
 	if (changes_data)
 		clear_all_pkt_pointers(env);
 	return 0;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 856d98c36f562..f2f575a286bb4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9544,6 +9544,16 @@ static int perf_event_set_bpf_handler(struct perf_event *event, u32 prog_fd)
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
 
+	if ((event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY) &&
+	    prog->call_get_perf_callchain) {
+		/*
+		 * The perf_event fetches its callchain early; the attached
+		 * BPF program shouldn't call get_perf_callchain() again.
+		 */
+		bpf_prog_put(prog);
+		return -EINVAL;
+	}
+
 	event->prog = prog;
 	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
 	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
-- 
2.24.1



* [PATCH bpf-next 2/5] bpf: add callchain to bpf_perf_event_data
  2020-07-11  1:26 [PATCH bpf-next 0/5] bpf: fix stackmap on perf_events with PEBS Song Liu
  2020-07-11  1:26 ` [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries Song Liu
@ 2020-07-11  1:26 ` Song Liu
  2020-07-11  1:26 ` [PATCH bpf-next 3/5] bpf: introduce bpf_get_callchain_stackid Song Liu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Song Liu @ 2020-07-11  1:26 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer,
	peterz, Song Liu

Expose the callchain fetched by the perf subsystem to BPF programs by adding
it to bpf_perf_event_data. If the callchain is available, the BPF program can
use bpf_probe_read_kernel() to read it, or pass it to a BPF helper.
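
As a rough, untested sketch (names and the 16-frame cap are made up for the
example), a perf_event program could read the new field like this:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define MAX_IPS 16	/* arbitrary cap for this example */

__u64 nr_frames;
__u64 ips[MAX_IPS];

SEC("perf_event")
int read_callchain(struct bpf_perf_event_data *ctx)
{
	struct perf_callchain_entry *cc = ctx->callchain;
	__u64 nr;

	/* cc points into kernel memory, so copy it out with
	 * bpf_probe_read_kernel(); the read simply fails if no
	 * callchain was collected for this sample.
	 */
	if (bpf_probe_read_kernel(&nr, sizeof(nr), &cc->nr))
		return 0;
	if (nr > MAX_IPS)
		nr = MAX_IPS;
	nr_frames = nr;
	bpf_probe_read_kernel(ips, nr * sizeof(__u64), cc->ip);

	return 0;
}

char LICENSE[] SEC("license") = "GPL";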

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/perf_event.h                |  5 -----
 include/linux/trace_events.h              |  5 +++++
 include/uapi/linux/bpf_perf_event.h       |  7 ++++++
 kernel/bpf/btf.c                          |  5 +++++
 kernel/trace/bpf_trace.c                  | 27 +++++++++++++++++++++++
 tools/include/uapi/linux/bpf_perf_event.h |  8 +++++++
 6 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 00ab5efa38334..3a68c999f50d1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -59,11 +59,6 @@ struct perf_guest_info_callbacks {
 #include <linux/security.h>
 #include <asm/local.h>
 
-struct perf_callchain_entry {
-	__u64				nr;
-	__u64				ip[]; /* /proc/sys/kernel/perf_event_max_stack */
-};
-
 struct perf_callchain_entry_ctx {
 	struct perf_callchain_entry *entry;
 	u32			    max_stack;
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 5c69433540494..8e1e88f40eef9 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -631,6 +631,7 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
 			    u32 *fd_type, const char **buf,
 			    u64 *probe_offset, u64 *probe_addr);
+int bpf_trace_init_btf_ids(struct btf *btf);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
@@ -672,6 +673,10 @@ static inline int bpf_get_perf_event_info(const struct perf_event *event,
 {
 	return -EOPNOTSUPP;
 }
+static inline int bpf_trace_init_btf_ids(struct btf *btf)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 enum {
diff --git a/include/uapi/linux/bpf_perf_event.h b/include/uapi/linux/bpf_perf_event.h
index eb1b9d21250c6..40f4df80ab4fa 100644
--- a/include/uapi/linux/bpf_perf_event.h
+++ b/include/uapi/linux/bpf_perf_event.h
@@ -9,11 +9,18 @@
 #define _UAPI__LINUX_BPF_PERF_EVENT_H__
 
 #include <asm/bpf_perf_event.h>
+#include <linux/bpf.h>
+
+struct perf_callchain_entry {
+	__u64				nr;
+	__u64				ip[]; /* /proc/sys/kernel/perf_event_max_stack */
+};
 
 struct bpf_perf_event_data {
 	bpf_user_pt_regs_t regs;
 	__u64 sample_period;
 	__u64 addr;
+	__bpf_md_ptr(struct perf_callchain_entry *, callchain);
 };
 
 #endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 4c3007f428b16..cb122e14dba38 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -20,6 +20,7 @@
 #include <linux/btf.h>
 #include <linux/skmsg.h>
 #include <linux/perf_event.h>
+#include <linux/trace_events.h>
 #include <net/sock.h>
 
 /* BTF (BPF Type Format) is the meta data format which describes
@@ -3673,6 +3674,10 @@ struct btf *btf_parse_vmlinux(void)
 	if (err < 0)
 		goto errout;
 
+	err = bpf_trace_init_btf_ids(btf);
+	if (err < 0)
+		goto errout;
+
 	bpf_struct_ops_init(btf, log);
 	init_btf_sock_ids(btf);
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e0b7775039ab9..c014846c2723c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 #include <linux/slab.h>
 #include <linux/bpf.h>
+#include <linux/btf.h>
 #include <linux/bpf_perf_event.h>
 #include <linux/filter.h>
 #include <linux/uaccess.h>
@@ -31,6 +32,20 @@ struct bpf_trace_module {
 static LIST_HEAD(bpf_trace_modules);
 static DEFINE_MUTEX(bpf_module_mutex);
 
+static u32 perf_callchain_entry_btf_id;
+
+int bpf_trace_init_btf_ids(struct btf *btf)
+{
+	s32 type_id;
+
+	type_id = btf_find_by_name_kind(btf, "perf_callchain_entry",
+					BTF_KIND_STRUCT);
+	if (type_id < 0)
+		return -EINVAL;
+	perf_callchain_entry_btf_id = type_id;
+	return 0;
+}
+
 static struct bpf_raw_event_map *bpf_get_raw_tracepoint_module(const char *name)
 {
 	struct bpf_raw_event_map *btp, *ret = NULL;
@@ -1650,6 +1665,10 @@ static bool pe_prog_is_valid_access(int off, int size, enum bpf_access_type type
 		if (!bpf_ctx_narrow_access_ok(off, size, size_u64))
 			return false;
 		break;
+	case bpf_ctx_range(struct bpf_perf_event_data, callchain):
+		info->reg_type = PTR_TO_BTF_ID;
+		info->btf_id = perf_callchain_entry_btf_id;
+		break;
 	default:
 		if (size != sizeof(long))
 			return false;
@@ -1682,6 +1701,14 @@ static u32 pe_prog_convert_ctx_access(enum bpf_access_type type,
 				      bpf_target_off(struct perf_sample_data, addr, 8,
 						     target_size));
 		break;
+	case offsetof(struct bpf_perf_event_data, callchain):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_perf_event_data_kern,
+						       data), si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_perf_event_data_kern, data));
+		*insn++ = BPF_LDX_MEM(BPF_DW, si->dst_reg, si->dst_reg,
+				      bpf_target_off(struct perf_sample_data, callchain,
+						     8, target_size));
+		break;
 	default:
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_perf_event_data_kern,
 						       regs), si->dst_reg, si->src_reg,
diff --git a/tools/include/uapi/linux/bpf_perf_event.h b/tools/include/uapi/linux/bpf_perf_event.h
index 8f95303f9d807..40f4df80ab4fa 100644
--- a/tools/include/uapi/linux/bpf_perf_event.h
+++ b/tools/include/uapi/linux/bpf_perf_event.h
@@ -9,10 +9,18 @@
 #define _UAPI__LINUX_BPF_PERF_EVENT_H__
 
 #include <asm/bpf_perf_event.h>
+#include <linux/bpf.h>
+
+struct perf_callchain_entry {
+	__u64				nr;
+	__u64				ip[]; /* /proc/sys/kernel/perf_event_max_stack */
+};
 
 struct bpf_perf_event_data {
 	bpf_user_pt_regs_t regs;
 	__u64 sample_period;
+	__u64 addr;
+	__bpf_md_ptr(struct perf_callchain_entry *, callchain);
 };
 
 #endif /* _UAPI__LINUX_BPF_PERF_EVENT_H__ */
-- 
2.24.1



* [PATCH bpf-next 3/5] bpf: introduce bpf_get_callchain_stackid
  2020-07-11  1:26 [PATCH bpf-next 0/5] bpf: fix stackmap on perf_events with PEBS Song Liu
  2020-07-11  1:26 ` [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries Song Liu
  2020-07-11  1:26 ` [PATCH bpf-next 2/5] bpf: add callchain to bpf_perf_event_data Song Liu
@ 2020-07-11  1:26 ` Song Liu
  2020-07-11  1:26 ` [PATCH bpf-next 4/5] selftests/bpf: add get_stackid_cannot_attach Song Liu
  2020-07-11  1:26 ` [PATCH bpf-next 5/5] selftests/bpf: add callchain_stackid Song Liu
  4 siblings, 0 replies; 10+ messages in thread
From: Song Liu @ 2020-07-11  1:26 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer,
	peterz, Song Liu

This helper is only used by BPF programs attached to perf_events. If the
perf_event has PEBS entries, calling get_perf_callchain() from a BPF program
may cause unwinder errors. bpf_get_callchain_stackid() serves as an
alternative to bpf_get_stackid() for these BPF programs.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/bpf.h            |  1 +
 include/uapi/linux/bpf.h       | 43 +++++++++++++++++++++++
 kernel/bpf/stackmap.c          | 63 ++++++++++++++++++++++++++--------
 kernel/bpf/verifier.c          |  4 ++-
 kernel/trace/bpf_trace.c       |  2 ++
 scripts/bpf_helpers_doc.py     |  2 ++
 tools/include/uapi/linux/bpf.h | 43 +++++++++++++++++++++++
 7 files changed, 142 insertions(+), 16 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0cd7f6884c5cd..45cf12acb0e26 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1628,6 +1628,7 @@ extern const struct bpf_func_proto bpf_get_current_comm_proto;
 extern const struct bpf_func_proto bpf_get_stackid_proto;
 extern const struct bpf_func_proto bpf_get_stack_proto;
 extern const struct bpf_func_proto bpf_get_task_stack_proto;
+extern const struct bpf_func_proto bpf_get_callchain_stackid_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
 extern const struct bpf_func_proto bpf_sock_hash_update_proto;
 extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 548a749aebb3e..a808accfbd457 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3319,6 +3319,48 @@ union bpf_attr {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
+ * long bpf_get_callchain_stackid(struct perf_callchain_entry *callchain, struct bpf_map *map, u64 flags)
+ *	Description
+ *		Walk a user or a kernel stack and return its id. To achieve
+ *		this, the helper needs *callchain*, which is a pointer to a
+ *		valid perf_callchain_entry, and a pointer to a *map* of type
+ *		**BPF_MAP_TYPE_STACK_TRACE**.
+ *
+ *		The last argument, *flags*, holds the number of stack frames to
+ *		skip (from 0 to 255), masked with
+ *		**BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ *		a combination of the following flags:
+ *
+ *		**BPF_F_USER_STACK**
+ *			Collect a user space stack instead of a kernel stack.
+ *		**BPF_F_FAST_STACK_CMP**
+ *			Compare stacks by hash only.
+ *		**BPF_F_REUSE_STACKID**
+ *			If two different stacks hash into the same *stackid*,
+ *			discard the old one.
+ *
+ *		The stack id retrieved is a 32 bit long integer handle which
+ *		can be further combined with other data (including other stack
+ *		ids) and used as a key into maps. This can be useful for
+ *		generating a variety of graphs (such as flame graphs or off-cpu
+ *		graphs).
+ *
+ *		For walking a stack, this helper is an improvement over
+ *		**bpf_probe_read**\ (), which can be used with unrolled loops
+ *		but is not efficient and consumes a lot of eBPF instructions.
+ *		Instead, **bpf_get_callchain_stackid**\ () can collect up to
+ *		**PERF_MAX_STACK_DEPTH** both kernel and user frames. Note that
+ *		this limit can be controlled with the **sysctl** program, and
+ *		that it should be manually increased in order to profile long
+ *		user stacks (such as stacks for Java programs). To do so, use:
+ *
+ *		::
+ *
+ *			# sysctl kernel.perf_event_max_stack=<new value>
+ *	Return
+ *		The positive or null stack id on success, or a negative error
+ *		in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3463,6 +3505,7 @@ union bpf_attr {
 	FN(skc_to_tcp_request_sock),	\
 	FN(skc_to_udp6_sock),		\
 	FN(get_task_stack),		\
+	FN(get_callchain_stackid),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index a6c361ed7937b..28acc610f7f94 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -386,11 +386,10 @@ get_callchain_entry_for_task(struct task_struct *task, u32 init_nr)
 #endif
 }
 
-BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
-	   u64, flags)
+static long __bpf_get_stackid(struct bpf_map *map, struct perf_callchain_entry *trace,
+			      u64 flags)
 {
 	struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
-	struct perf_callchain_entry *trace;
 	struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
 	u32 max_depth = map->value_size / stack_map_data_size(map);
 	/* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */
@@ -398,21 +397,9 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
 	u32 hash, id, trace_nr, trace_len;
 	bool user = flags & BPF_F_USER_STACK;
-	bool kernel = !user;
 	u64 *ips;
 	bool hash_matches;
 
-	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
-			       BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
-		return -EINVAL;
-
-	trace = get_perf_callchain(regs, init_nr, kernel, user,
-				   sysctl_perf_event_max_stack, false, false);
-
-	if (unlikely(!trace))
-		/* couldn't fetch the stack trace */
-		return -EFAULT;
-
 	/* get_perf_callchain() guarantees that trace->nr >= init_nr
 	 * and trace-nr <= sysctl_perf_event_max_stack, so trace_nr <= max_depth
 	 */
@@ -477,6 +464,30 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 	return id;
 }
 
+BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
+	   u64, flags)
+{
+	u32 max_depth = map->value_size / stack_map_data_size(map);
+	/* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */
+	u32 init_nr = sysctl_perf_event_max_stack - max_depth;
+	bool user = flags & BPF_F_USER_STACK;
+	struct perf_callchain_entry *trace;
+	bool kernel = !user;
+
+	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
+			       BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
+		return -EINVAL;
+
+	trace = get_perf_callchain(regs, init_nr, kernel, user,
+				   sysctl_perf_event_max_stack, false, false);
+
+	if (unlikely(!trace))
+		/* couldn't fetch the stack trace */
+		return -EFAULT;
+
+	return __bpf_get_stackid(map, trace, flags);
+}
+
 const struct bpf_func_proto bpf_get_stackid_proto = {
 	.func		= bpf_get_stackid,
 	.gpl_only	= true,
@@ -486,6 +497,28 @@ const struct bpf_func_proto bpf_get_stackid_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+BPF_CALL_3(bpf_get_callchain_stackid, struct perf_callchain_entry *, callchain,
+	   struct bpf_map *, map, u64, flags)
+{
+	if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
+			       BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
+		return -EINVAL;
+	if (!callchain)
+		return -EFAULT;
+	return __bpf_get_stackid(map, callchain, flags);
+}
+
+static int bpf_get_callchain_stackid_btf_ids[5];
+const struct bpf_func_proto bpf_get_callchain_stackid_proto = {
+	.func		= bpf_get_callchain_stackid,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_ANYTHING,
+	.btf_id		= bpf_get_callchain_stackid_btf_ids,
+};
+
 static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 			    void *buf, u32 size, u64 flags)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1e11b0f6fba31..07be75550ca93 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4094,7 +4094,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 			goto error;
 		break;
 	case BPF_MAP_TYPE_STACK_TRACE:
-		if (func_id != BPF_FUNC_get_stackid)
+		if (func_id != BPF_FUNC_get_stackid &&
+		    func_id != BPF_FUNC_get_callchain_stackid)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_CGROUP_ARRAY:
@@ -4187,6 +4188,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 			goto error;
 		break;
 	case BPF_FUNC_get_stackid:
+	case BPF_FUNC_get_callchain_stackid:
 		if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
 			goto error;
 		break;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index c014846c2723c..7a504f734a025 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1396,6 +1396,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_perf_prog_read_value_proto;
 	case BPF_FUNC_read_branch_records:
 		return &bpf_read_branch_records_proto;
+	case BPF_FUNC_get_callchain_stackid:
+		return &bpf_get_callchain_stackid_proto;
 	default:
 		return bpf_tracing_func_proto(func_id, prog);
 	}
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 6843376733df8..1b99e3618e492 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -427,6 +427,7 @@ class PrinterHelpers(Printer):
             'struct tcp_request_sock',
             'struct udp6_sock',
             'struct task_struct',
+            'struct perf_callchain_entry',
 
             'struct __sk_buff',
             'struct sk_msg_md',
@@ -470,6 +471,7 @@ class PrinterHelpers(Printer):
             'struct tcp_request_sock',
             'struct udp6_sock',
             'struct task_struct',
+            'struct perf_callchain_entry',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 548a749aebb3e..a808accfbd457 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3319,6 +3319,48 @@ union bpf_attr {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
+ * long bpf_get_callchain_stackid(struct perf_callchain_entry *callchain, struct bpf_map *map, u64 flags)
+ *	Description
+ *		Walk a user or a kernel stack and return its id. To achieve
+ *		this, the helper needs *callchain*, which is a pointer to a
+ *		valid perf_callchain_entry, and a pointer to a *map* of type
+ *		**BPF_MAP_TYPE_STACK_TRACE**.
+ *
+ *		The last argument, *flags*, holds the number of stack frames to
+ *		skip (from 0 to 255), masked with
+ *		**BPF_F_SKIP_FIELD_MASK**. The next bits can be used to set
+ *		a combination of the following flags:
+ *
+ *		**BPF_F_USER_STACK**
+ *			Collect a user space stack instead of a kernel stack.
+ *		**BPF_F_FAST_STACK_CMP**
+ *			Compare stacks by hash only.
+ *		**BPF_F_REUSE_STACKID**
+ *			If two different stacks hash into the same *stackid*,
+ *			discard the old one.
+ *
+ *		The stack id retrieved is a 32 bit long integer handle which
+ *		can be further combined with other data (including other stack
+ *		ids) and used as a key into maps. This can be useful for
+ *		generating a variety of graphs (such as flame graphs or off-cpu
+ *		graphs).
+ *
+ *		For walking a stack, this helper is an improvement over
+ *		**bpf_probe_read**\ (), which can be used with unrolled loops
+ *		but is not efficient and consumes a lot of eBPF instructions.
+ *		Instead, **bpf_get_callchain_stackid**\ () can collect up to
+ *		**PERF_MAX_STACK_DEPTH** both kernel and user frames. Note that
+ *		this limit can be controlled with the **sysctl** program, and
+ *		that it should be manually increased in order to profile long
+ *		user stacks (such as stacks for Java programs). To do so, use:
+ *
+ *		::
+ *
+ *			# sysctl kernel.perf_event_max_stack=<new value>
+ *	Return
+ *		The positive or null stack id on success, or a negative error
+ *		in case of failure.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3463,6 +3505,7 @@ union bpf_attr {
 	FN(skc_to_tcp_request_sock),	\
 	FN(skc_to_udp6_sock),		\
 	FN(get_task_stack),		\
+	FN(get_callchain_stackid),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.24.1



* [PATCH bpf-next 4/5] selftests/bpf: add get_stackid_cannot_attach
  2020-07-11  1:26 [PATCH bpf-next 0/5] bpf: fix stackmap on perf_events with PEBS Song Liu
                   ` (2 preceding siblings ...)
  2020-07-11  1:26 ` [PATCH bpf-next 3/5] bpf: introduce bpf_get_callchain_stackid Song Liu
@ 2020-07-11  1:26 ` Song Liu
  2020-07-11  1:26 ` [PATCH bpf-next 5/5] selftests/bpf: add callchain_stackid Song Liu
  4 siblings, 0 replies; 10+ messages in thread
From: Song Liu @ 2020-07-11  1:26 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer,
	peterz, Song Liu

This test confirms that a BPF program that calls bpf_get_stackid() cannot
be attached to a perf_event with PEBS entries.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 .../prog_tests/get_stackid_cannot_attach.c    | 57 +++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c

diff --git a/tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c b/tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c
new file mode 100644
index 0000000000000..ae943c502b62b
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/get_stackid_cannot_attach.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+#include <test_progs.h>
+#include "test_stacktrace_build_id.skel.h"
+
+void test_get_stackid_cannot_attach(void)
+{
+	struct perf_event_attr attr = {
+		/* .type = PERF_TYPE_SOFTWARE, */
+		.type = PERF_TYPE_HARDWARE,
+		.config = PERF_COUNT_HW_CPU_CYCLES,
+		.precise_ip = 2,
+		.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK |
+			PERF_SAMPLE_CALLCHAIN,
+		.branch_sample_type = PERF_SAMPLE_BRANCH_USER |
+			PERF_SAMPLE_BRANCH_NO_FLAGS |
+			PERF_SAMPLE_BRANCH_NO_CYCLES |
+			PERF_SAMPLE_BRANCH_CALL_STACK,
+		.sample_period = 5000,
+		.size = sizeof(struct perf_event_attr),
+	};
+	struct test_stacktrace_build_id *skel;
+	__u32 duration = 0;
+	int pmu_fd, err;
+
+	skel = test_stacktrace_build_id__open();
+	if (CHECK(!skel, "skel_open", "skeleton open failed\n"))
+		return;
+
+	/* override program type */
+	bpf_program__set_perf_event(skel->progs.oncpu);
+
+	err = test_stacktrace_build_id__load(skel);
+	if (CHECK(err, "skel_load", "skeleton load failed: %d\n", err))
+		goto cleanup;
+
+	pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
+			 0 /* cpu 0 */, -1 /* group id */,
+			 0 /* flags */);
+	if (pmu_fd < 0 && errno == ENOENT) {
+		printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n", __func__);
+		test__skip();
+		goto cleanup;
+	}
+	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n",
+		  pmu_fd, errno))
+		goto cleanup;
+
+	skel->links.oncpu = bpf_program__attach_perf_event(skel->progs.oncpu,
+							   pmu_fd);
+	CHECK(!IS_ERR(skel->links.oncpu), "attach_perf_event",
+	      "should have failed\n");
+	close(pmu_fd);
+
+cleanup:
+	test_stacktrace_build_id__destroy(skel);
+}
-- 
2.24.1



* [PATCH bpf-next 5/5] selftests/bpf: add callchain_stackid
  2020-07-11  1:26 [PATCH bpf-next 0/5] bpf: fix stackmap on perf_events with PEBS Song Liu
                   ` (3 preceding siblings ...)
  2020-07-11  1:26 ` [PATCH bpf-next 4/5] selftests/bpf: add get_stackid_cannot_attach Song Liu
@ 2020-07-11  1:26 ` Song Liu
  4 siblings, 0 replies; 10+ messages in thread
From: Song Liu @ 2020-07-11  1:26 UTC (permalink / raw)
  To: linux-kernel, bpf, netdev
  Cc: ast, daniel, kernel-team, john.fastabend, kpsingh, brouer,
	peterz, Song Liu

This tests the new helper bpf_get_callchain_stackid(), which is the
alternative to bpf_get_stackid() for perf_events with PEBS entries.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 .../bpf/prog_tests/callchain_stackid.c        | 61 +++++++++++++++++++
 .../selftests/bpf/progs/callchain_stackid.c   | 37 +++++++++++
 2 files changed, 98 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/callchain_stackid.c
 create mode 100644 tools/testing/selftests/bpf/progs/callchain_stackid.c

diff --git a/tools/testing/selftests/bpf/prog_tests/callchain_stackid.c b/tools/testing/selftests/bpf/prog_tests/callchain_stackid.c
new file mode 100644
index 0000000000000..ebe6251324a1a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/callchain_stackid.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+#include <test_progs.h>
+#include "callchain_stackid.skel.h"
+
+void test_callchain_stackid(void)
+{
+	struct perf_event_attr attr = {
+		/* .type = PERF_TYPE_SOFTWARE, */
+		.type = PERF_TYPE_HARDWARE,
+		.config = PERF_COUNT_HW_CPU_CYCLES,
+		.precise_ip = 2,
+		.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK |
+			PERF_SAMPLE_CALLCHAIN,
+		.branch_sample_type = PERF_SAMPLE_BRANCH_USER |
+			PERF_SAMPLE_BRANCH_NO_FLAGS |
+			PERF_SAMPLE_BRANCH_NO_CYCLES |
+			PERF_SAMPLE_BRANCH_CALL_STACK,
+		.sample_period = 5000,
+		.size = sizeof(struct perf_event_attr),
+	};
+	struct callchain_stackid *skel;
+	__u32 duration = 0;
+	int pmu_fd, err;
+
+	skel = callchain_stackid__open();
+
+	if (CHECK(!skel, "skel_open", "skeleton open failed\n"))
+		return;
+
+	/* override program type */
+	bpf_program__set_perf_event(skel->progs.oncpu);
+
+	err = callchain_stackid__load(skel);
+	if (CHECK(err, "skel_load", "skeleton load failed: %d\n", err))
+		goto cleanup;
+
+	pmu_fd = syscall(__NR_perf_event_open, &attr, -1 /* pid */,
+			 0 /* cpu 0 */, -1 /* group id */,
+			 0 /* flags */);
+	if (pmu_fd < 0) {
+		printf("%s:SKIP:cpu doesn't support the event\n", __func__);
+		test__skip();
+		goto cleanup;
+	}
+
+	skel->links.oncpu = bpf_program__attach_perf_event(skel->progs.oncpu,
+							   pmu_fd);
+	if (CHECK(IS_ERR(skel->links.oncpu), "attach_perf_event",
+		  "err %ld\n", PTR_ERR(skel->links.oncpu))) {
+		close(pmu_fd);
+		goto cleanup;
+	}
+	usleep(500000);
+
+	CHECK(skel->data->total_val == 1, "get_callchain_stack", "failed\n");
+	close(pmu_fd);
+
+cleanup:
+	callchain_stackid__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/callchain_stackid.c b/tools/testing/selftests/bpf/progs/callchain_stackid.c
new file mode 100644
index 0000000000000..aab2c736a0a45
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/callchain_stackid.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2020 Facebook
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+#ifndef PERF_MAX_STACK_DEPTH
+#define PERF_MAX_STACK_DEPTH         127
+#endif
+
+#ifndef BPF_F_USER_STACK
+#define BPF_F_USER_STACK		(1ULL << 8)
+#endif
+
+typedef __u64 stack_trace_t[PERF_MAX_STACK_DEPTH];
+struct {
+	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
+	__uint(max_entries, 16384);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(stack_trace_t));
+} stackmap SEC(".maps");
+
+long total_val = 1;
+
+SEC("perf_event")
+int oncpu(struct bpf_perf_event_data *ctx)
+{
+	long val;
+
+	val = bpf_get_callchain_stackid(ctx->callchain, &stackmap, 0);
+
+	if (val > 0)
+		total_val += val;
+
+	return 0;
+}
+
+char LICENSE[] SEC("license") = "GPL";
-- 
2.24.1



* Re: [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries
  2020-07-11  1:26 ` [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries Song Liu
@ 2020-07-11  3:53   ` Andrii Nakryiko
  2020-07-11  6:28     ` Song Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2020-07-11  3:53 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Peter Ziljstra

On Fri, Jul 10, 2020 at 6:30 PM Song Liu <songliubraving@fb.com> wrote:
>
> Calling get_perf_callchain() on perf_events from PEBS entries may cause
> unwinder errors. To fix this issue, the callchain is fetched early. Such
> perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
>
> Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
> also cause unwinder errors. To fix this, block bpf_get_[stack|stackid] on
> these perf_events. Unfortunately, bpf verifier cannot tell whether the
> program will be attached to perf_event with PEBS entries. Therefore,
> block such programs during ioctl(PERF_EVENT_IOC_SET_BPF).
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---

Perhaps it's a stupid question, but why bpf_get_stack/bpf_get_stackid
can't figure out automatically that they are called from
__PERF_SAMPLE_CALLCHAIN_EARLY perf event and use different callchain,
if necessary?

It is quite suboptimal from a user experience point of view to require
two different BPF helpers depending on PEBS or non-PEBS perf events.

[...]


* Re: [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries
  2020-07-11  3:53   ` Andrii Nakryiko
@ 2020-07-11  6:28     ` Song Liu
  2020-07-12  5:06       ` Andrii Nakryiko
  0 siblings, 1 reply; 10+ messages in thread
From: Song Liu @ 2020-07-11  6:28 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Peter Ziljstra



> On Jul 10, 2020, at 8:53 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Fri, Jul 10, 2020 at 6:30 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> Calling get_perf_callchain() on perf_events from PEBS entries may cause
>> unwinder errors. To fix this issue, the callchain is fetched early. Such
>> perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
>> 
>> Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
>> also cause unwinder errors. To fix this, block bpf_get_[stack|stackid] on
>> these perf_events. Unfortunately, bpf verifier cannot tell whether the
>> program will be attached to perf_event with PEBS entries. Therefore,
>> block such programs during ioctl(PERF_EVENT_IOC_SET_BPF).
>> 
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
> 
> Perhaps it's a stupid question, but why bpf_get_stack/bpf_get_stackid
> can't figure out automatically that they are called from
> __PERF_SAMPLE_CALLCHAIN_EARLY perf event and use different callchain,
> if necessary?
> 
> It is quite suboptimal from a user experience point of view to require
> two different BPF helpers depending on PEBS or non-PEBS perf events.

I am not aware of an easy way to tell the difference in bpf_get_stack. 
But I do agree that would be much better. 

Thanks,
Song


* Re: [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries
  2020-07-11  6:28     ` Song Liu
@ 2020-07-12  5:06       ` Andrii Nakryiko
  2020-07-12  6:34         ` Song Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Andrii Nakryiko @ 2020-07-12  5:06 UTC (permalink / raw)
  To: Song Liu
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Peter Ziljstra

On Fri, Jul 10, 2020 at 11:28 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Jul 10, 2020, at 8:53 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Jul 10, 2020 at 6:30 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >> Calling get_perf_callchain() on perf_events from PEBS entries may cause
> >> unwinder errors. To fix this issue, the callchain is fetched early. Such
> >> perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
> >>
> >> Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
> >> also cause unwinder errors. To fix this, block bpf_get_[stack|stackid] on
> >> these perf_events. Unfortunately, bpf verifier cannot tell whether the
> >> program will be attached to perf_event with PEBS entries. Therefore,
> >> block such programs during ioctl(PERF_EVENT_IOC_SET_BPF).
> >>
> >> Signed-off-by: Song Liu <songliubraving@fb.com>
> >> ---
> >
> > Perhaps it's a stupid question, but why bpf_get_stack/bpf_get_stackid
> > can't figure out automatically that they are called from
> > __PERF_SAMPLE_CALLCHAIN_EARLY perf event and use different callchain,
> > if necessary?
> >
> > It is quite suboptimal from a user experience point of view to require
> > two different BPF helpers depending on PEBS or non-PEBS perf events.
>
> I am not aware of an easy way to tell the difference in bpf_get_stack.
> But I do agree that would be much better.
>

Hm... Looking a bit more how all this is tied together in the kernel,
I think it's actually quite easy. So, for perf_event BPF program type:

1. return a special prototype for bpf_get_stack/bpf_get_stackid, which
will have this extra bit of logic for callchain. All other program
types with access to bpf_get_stack/bpf_get_stackid should use the
current one, probably.
2. For that special program, just like for bpf_read_branch_records(),
we know that context is actually `struct bpf_perf_event_data_kern *`,
and it has pt_regs, perf_sample_data and perf_event itself.
3. With that, it seems like you'll have everything you need to
automatically choose a proper callchain.

All this absolutely transparently to the BPF program.

Am I missing something?
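
Something roughly like this, building on the __bpf_get_stackid() split from
patch 3/5 (completely untested, flags validation and BPF_F_USER_STACK
handling glossed over):

/* sketch: a perf_event-specific variant that reuses the early callchain */
BPF_CALL_3(bpf_get_stackid_pe, struct bpf_perf_event_data_kern *, ctx,
	   struct bpf_map *, map, u64, flags)
{
	struct perf_event *event = ctx->event;

	if (!(event->attr.sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
		/* no early callchain, fall back to the existing helper */
		return bpf_get_stackid((unsigned long)ctx->regs,
				       (unsigned long)map, flags, 0, 0);

	/* perf has already walked the stack for this sample */
	return __bpf_get_stackid(map, ctx->data->callchain, flags);
}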

> Thanks,
> Song


* Re: [PATCH bpf-next 1/5] bpf: block bpf_get_[stack|stackid] on perf_event with PEBS entries
  2020-07-12  5:06       ` Andrii Nakryiko
@ 2020-07-12  6:34         ` Song Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Song Liu @ 2020-07-12  6:34 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: open list, bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, john fastabend, KP Singh, Jesper Dangaard Brouer,
	Peter Ziljstra



> On Jul 11, 2020, at 10:06 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Fri, Jul 10, 2020 at 11:28 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> 
>> 
>>> On Jul 10, 2020, at 8:53 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>> 
>>> On Fri, Jul 10, 2020 at 6:30 PM Song Liu <songliubraving@fb.com> wrote:
>>>> 
>>>> Calling get_perf_callchain() on perf_events from PEBS entries may cause
>>>> unwinder errors. To fix this issue, the callchain is fetched early. Such
>>>> perf_events are marked with __PERF_SAMPLE_CALLCHAIN_EARLY.
>>>> 
>>>> Similarly, calling bpf_get_[stack|stackid] on perf_events from PEBS may
>>>> also cause unwinder errors. To fix this, block bpf_get_[stack|stackid] on
>>>> these perf_events. Unfortunately, bpf verifier cannot tell whether the
>>>> program will be attached to perf_event with PEBS entries. Therefore,
>>>> block such programs during ioctl(PERF_EVENT_IOC_SET_BPF).
>>>> 
>>>> Signed-off-by: Song Liu <songliubraving@fb.com>
>>>> ---
>>> 
>>> Perhaps it's a stupid question, but why bpf_get_stack/bpf_get_stackid
>>> can't figure out automatically that they are called from
>>> __PERF_SAMPLE_CALLCHAIN_EARLY perf event and use different callchain,
>>> if necessary?
>>> 
>>> It is quite suboptimal from a user experience point of view to require
>>> two different BPF helpers depending on PEBS or non-PEBS perf events.
>> 
>> I am not aware of an easy way to tell the difference in bpf_get_stack.
>> But I do agree that would be much better.
>> 
> 
> Hm... Looking a bit more how all this is tied together in the kernel,
> I think it's actually quite easy. So, for perf_event BPF program type:
> 
> 1. return a special prototype for bpf_get_stack/bpf_get_stackid, which
> will have this extra bit of logic for callchain. All other program
> types with access to bpf_get_stack/bpf_get_stackid should use the
> current one, probably.
> 2. For that special program, just like for bpf_read_branch_records(),
> we know that context is actually `struct bpf_perf_event_data_kern *`,
> and it has pt_regs, perf_sample_data and perf_event itself.
> 3. With that, it seems like you'll have everything you need to
> automatically choose a proper callchain.
> 
> All this absolutely transparently to the BPF program.
> 
> Am I missing something?

Good idea! A separate prototype should work here. 

Thanks,
Song


