linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 1/2] bpf: Adjust BPF stack helper functions to accommodate skip > 0
@ 2022-03-14 18:20 Namhyung Kim
  2022-03-14 18:20 ` [PATCH v3 2/2] bpf/selftests: Test skipping stacktrace Namhyung Kim
  2022-03-21  2:30 ` [PATCH v3 1/2] bpf: Adjust BPF stack helper functions to accommodate skip > 0 patchwork-bot+netdevbpf
  0 siblings, 2 replies; 4+ messages in thread
From: Namhyung Kim @ 2022-03-14 18:20 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, netdev, bpf, LKML, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Eugene Loh, Hao Luo

Let's say that the caller has storage for num_elem stack frames.  Then,
the BPF stack helper functions walk the stack for only num_elem frames.
This means that if skip > 0, one keeps only 'num_elem - skip' frames.

This is because it sets init_nr in the perf_callchain_entry to the end
of the buffer to save num_elem entries only.  I believe it was because
the perf callchain code unwound the stack frames until it reached the
global max size (sysctl_perf_event_max_stack).

However, the perf code now has perf_callchain_entry_ctx.max_stack to
limit the iteration locally, so the BPF callstack code can drop its
init_nr handling entirely.  This simplifies the code and removes the
confusion with perf_event's __PERF_SAMPLE_CALLCHAIN_EARLY, which sets
init_nr to 0.
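
To make the arithmetic concrete, here is a small userspace sketch with
made-up numbers (not taken from the patch) comparing how many frames
the caller can actually receive under the old and new schemes:

  /* Sketch only -- made-up numbers, not from the patch.
   * Assumes sysctl kernel.perf_event_max_stack = 127.
   */
  #include <stdio.h>

  #define PERF_EVENT_MAX_STACK  127

  int main(void)
  {
          unsigned int num_elem = 4, skip = 2;

          /* old scheme: anchor the walk at the end of the entry buffer */
          unsigned int init_nr = PERF_EVENT_MAX_STACK - num_elem; /* 123 */
          /* the unwinder stores at most num_elem frames, so after dropping
           * 'skip' of them only num_elem - skip frames reach the caller
           */
          printf("old: %u usable frames (init_nr = %u)\n",
                 num_elem - skip, init_nr);

          /* new scheme: walk 'skip' extra frames, clamped to the limit */
          unsigned int max_depth = num_elem + skip;
          if (max_depth > PERF_EVENT_MAX_STACK)
                  max_depth = PERF_EVENT_MAX_STACK;
          printf("new: up to %u usable frames (max_depth = %u)\n",
                 num_elem, max_depth);

          return 0;
  }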

Also change the comment on bpf_get_stack() in the header file to be
more explicit about what the return value means.
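
For illustration, a hypothetical BPF program (not part of this series;
the attach point, buffer size and skip value are only assumptions for
the example) calling bpf_get_stack() with a small buffer and skip > 0
could look like this, with the return value being the copied length in
bytes:

  // SPDX-License-Identifier: GPL-2.0
  /* Hypothetical usage sketch, not part of this patch. */
  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  #define NUM_ELEM 4

  SEC("kprobe/do_nanosleep")
  int dump_partial_stack(struct pt_regs *ctx)
  {
          __u64 buf[NUM_ELEM] = {};
          long ret;

          /* Skip the two innermost frames.  With this change the helper
           * walks NUM_ELEM + 2 frames (clamped to
           * sysctl_perf_event_max_stack), so all NUM_ELEM slots can
           * still be filled.
           */
          ret = bpf_get_stack(ctx, buf, sizeof(buf), 2);
          if (ret < 0)
                  return 0; /* e.g. -EFAULT if fewer than 2 frames exist */

          /* ret is the number of bytes copied into buf, <= sizeof(buf) */
          bpf_printk("got %ld frames", ret / (long)sizeof(__u64));
          return 0;
  }

  char _license[] SEC("license") = "GPL";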

Link: https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@oracle.com

Fixes: c195651e565a ("bpf: add bpf_get_stack helper")
Based-on-patch-by: Eugene Loh <eugene.loh@oracle.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/uapi/linux/bpf.h |  8 +++---
 kernel/bpf/stackmap.c    | 56 +++++++++++++++++-----------------------
 2 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b0383d371b9a..f09f20845904 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2975,8 +2975,8 @@ union bpf_attr {
  *
  * 			# sysctl kernel.perf_event_max_stack=<new value>
  * 	Return
- * 		A non-negative value equal to or less than *size* on success,
- * 		or a negative error in case of failure.
+ * 		The non-negative copied *buf* length equal to or less than
+ * 		*size* on success, or a negative error in case of failure.
  *
  * long bpf_skb_load_bytes_relative(const void *skb, u32 offset, void *to, u32 len, u32 start_header)
  * 	Description
@@ -4279,8 +4279,8 @@ union bpf_attr {
  *
  *			# sysctl kernel.perf_event_max_stack=<new value>
  *	Return
- *		A non-negative value equal to or less than *size* on success,
- *		or a negative error in case of failure.
+ * 		The non-negative copied *buf* length equal to or less than
+ * 		*size* on success, or a negative error in case of failure.
  *
  * long bpf_load_hdr_opt(struct bpf_sock_ops *skops, void *searchby_res, u32 len, u64 flags)
  *	Description
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 22c8ae94e4c1..2823dcefae10 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -166,7 +166,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 }
 
 static struct perf_callchain_entry *
-get_callchain_entry_for_task(struct task_struct *task, u32 init_nr)
+get_callchain_entry_for_task(struct task_struct *task, u32 max_depth)
 {
 #ifdef CONFIG_STACKTRACE
 	struct perf_callchain_entry *entry;
@@ -177,9 +177,8 @@ get_callchain_entry_for_task(struct task_struct *task, u32 init_nr)
 	if (!entry)
 		return NULL;
 
-	entry->nr = init_nr +
-		stack_trace_save_tsk(task, (unsigned long *)(entry->ip + init_nr),
-				     sysctl_perf_event_max_stack - init_nr, 0);
+	entry->nr = stack_trace_save_tsk(task, (unsigned long *)entry->ip,
+					 max_depth, 0);
 
 	/* stack_trace_save_tsk() works on unsigned long array, while
 	 * perf_callchain_entry uses u64 array. For 32-bit systems, it is
@@ -191,7 +190,7 @@ get_callchain_entry_for_task(struct task_struct *task, u32 init_nr)
 		int i;
 
 		/* copy data from the end to avoid using extra buffer */
-		for (i = entry->nr - 1; i >= (int)init_nr; i--)
+		for (i = entry->nr - 1; i >= 0; i--)
 			to[i] = (u64)(from[i]);
 	}
 
@@ -208,27 +207,19 @@ static long __bpf_get_stackid(struct bpf_map *map,
 {
 	struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
 	struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
-	u32 max_depth = map->value_size / stack_map_data_size(map);
-	/* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */
-	u32 init_nr = sysctl_perf_event_max_stack - max_depth;
 	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
 	u32 hash, id, trace_nr, trace_len;
 	bool user = flags & BPF_F_USER_STACK;
 	u64 *ips;
 	bool hash_matches;
 
-	/* get_perf_callchain() guarantees that trace->nr >= init_nr
-	 * and trace-nr <= sysctl_perf_event_max_stack, so trace_nr <= max_depth
-	 */
-	trace_nr = trace->nr - init_nr;
-
-	if (trace_nr <= skip)
+	if (trace->nr <= skip)
 		/* skipping more than usable stack trace */
 		return -EFAULT;
 
-	trace_nr -= skip;
+	trace_nr = trace->nr - skip;
 	trace_len = trace_nr * sizeof(u64);
-	ips = trace->ip + skip + init_nr;
+	ips = trace->ip + skip;
 	hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0);
 	id = hash & (smap->n_buckets - 1);
 	bucket = READ_ONCE(smap->buckets[id]);
@@ -285,8 +276,7 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 	   u64, flags)
 {
 	u32 max_depth = map->value_size / stack_map_data_size(map);
-	/* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */
-	u32 init_nr = sysctl_perf_event_max_stack - max_depth;
+	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
 	bool user = flags & BPF_F_USER_STACK;
 	struct perf_callchain_entry *trace;
 	bool kernel = !user;
@@ -295,8 +285,12 @@ BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map,
 			       BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)))
 		return -EINVAL;
 
-	trace = get_perf_callchain(regs, init_nr, kernel, user,
-				   sysctl_perf_event_max_stack, false, false);
+	max_depth += skip;
+	if (max_depth > sysctl_perf_event_max_stack)
+		max_depth = sysctl_perf_event_max_stack;
+
+	trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
+				   false, false);
 
 	if (unlikely(!trace))
 		/* couldn't fetch the stack trace */
@@ -387,7 +381,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 			    struct perf_callchain_entry *trace_in,
 			    void *buf, u32 size, u64 flags)
 {
-	u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
+	u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
 	bool user_build_id = flags & BPF_F_USER_BUILD_ID;
 	u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
 	bool user = flags & BPF_F_USER_STACK;
@@ -412,30 +406,28 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
 		goto err_fault;
 
 	num_elem = size / elem_size;
-	if (sysctl_perf_event_max_stack < num_elem)
-		init_nr = 0;
-	else
-		init_nr = sysctl_perf_event_max_stack - num_elem;
+	max_depth = num_elem + skip;
+	if (sysctl_perf_event_max_stack < max_depth)
+		max_depth = sysctl_perf_event_max_stack;
 
 	if (trace_in)
 		trace = trace_in;
 	else if (kernel && task)
-		trace = get_callchain_entry_for_task(task, init_nr);
+		trace = get_callchain_entry_for_task(task, max_depth);
 	else
-		trace = get_perf_callchain(regs, init_nr, kernel, user,
-					   sysctl_perf_event_max_stack,
+		trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
 					   false, false);
 	if (unlikely(!trace))
 		goto err_fault;
 
-	trace_nr = trace->nr - init_nr;
-	if (trace_nr < skip)
+	if (trace->nr < skip)
 		goto err_fault;
 
-	trace_nr -= skip;
+	trace_nr = trace->nr - skip;
 	trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
 	copy_len = trace_nr * elem_size;
-	ips = trace->ip + skip + init_nr;
+
+	ips = trace->ip + skip;
 	if (user && user_build_id)
 		stack_map_get_build_id_offset(buf, ips, trace_nr, user);
 	else
-- 
2.35.1.723.g4982287a31-goog


* [PATCH v3 2/2] bpf/selftests: Test skipping stacktrace
  2022-03-14 18:20 [PATCH v3 1/2] bpf: Adjust BPF stack helper functions to accommodate skip > 0 Namhyung Kim
@ 2022-03-14 18:20 ` Namhyung Kim
  2022-03-15  5:18   ` Yonghong Song
  2022-03-21  2:30 ` [PATCH v3 1/2] bpf: Adjust BPF stack helper functions to accommodate skip > 0 patchwork-bot+netdevbpf
  1 sibling, 1 reply; 4+ messages in thread
From: Namhyung Kim @ 2022-03-14 18:20 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, netdev, bpf, LKML, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Eugene Loh, Hao Luo

Add a test case for stacktrace with skip > 0 using a small-sized
buffer.  Previously, skipping a number of entries greater than or
equal to the buffer size was not supported, and the skipped part was
filled with 0.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
v3)
 * add pid filter
 * change assert condition

 .../bpf/prog_tests/stacktrace_map_skip.c      | 63 +++++++++++++++++
 .../selftests/bpf/progs/stacktrace_map_skip.c | 68 +++++++++++++++++++
 2 files changed, 131 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c
 create mode 100644 tools/testing/selftests/bpf/progs/stacktrace_map_skip.c

diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c
new file mode 100644
index 000000000000..1932b1e0685c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "stacktrace_map_skip.skel.h"
+
+#define TEST_STACK_DEPTH  2
+
+void test_stacktrace_map_skip(void)
+{
+	struct stacktrace_map_skip *skel;
+	int stackid_hmap_fd, stackmap_fd, stack_amap_fd;
+	int err, stack_trace_len;
+
+	skel = stacktrace_map_skip__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	/* find map fds */
+	stackid_hmap_fd = bpf_map__fd(skel->maps.stackid_hmap);
+	if (!ASSERT_GE(stackid_hmap_fd, 0, "stackid_hmap fd"))
+		goto out;
+
+	stackmap_fd = bpf_map__fd(skel->maps.stackmap);
+	if (!ASSERT_GE(stackmap_fd, 0, "stackmap fd"))
+		goto out;
+
+	stack_amap_fd = bpf_map__fd(skel->maps.stack_amap);
+	if (!ASSERT_GE(stack_amap_fd, 0, "stack_amap fd"))
+		goto out;
+
+	skel->bss->pid = getpid();
+
+	err = stacktrace_map_skip__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto out;
+
+	/* give some time for the bpf program to run */
+	sleep(1);
+
+	/* disable stack trace collection */
+	skel->bss->control = 1;
+
+	/* for every element in stackid_hmap, we can find a corresponding one
+	 * in stackmap, and vice versa.
+	 */
+	err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
+	if (!ASSERT_OK(err, "compare_map_keys stackid_hmap vs. stackmap"))
+		goto out;
+
+	err = compare_map_keys(stackmap_fd, stackid_hmap_fd);
+	if (!ASSERT_OK(err, "compare_map_keys stackmap vs. stackid_hmap"))
+		goto out;
+
+	stack_trace_len = TEST_STACK_DEPTH * sizeof(__u64);
+	err = compare_stack_ips(stackmap_fd, stack_amap_fd, stack_trace_len);
+	if (!ASSERT_OK(err, "compare_stack_ips stackmap vs. stack_amap"))
+		goto out;
+
+	if (!ASSERT_EQ(skel->bss->failed, 0, "skip_failed"))
+		goto out;
+
+out:
+	stacktrace_map_skip__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/stacktrace_map_skip.c b/tools/testing/selftests/bpf/progs/stacktrace_map_skip.c
new file mode 100644
index 000000000000..2eb297df3dd6
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/stacktrace_map_skip.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+#define TEST_STACK_DEPTH	2
+#define TEST_MAX_ENTRIES	16384
+
+typedef __u64 stack_trace_t[TEST_STACK_DEPTH];
+
+struct {
+	__uint(type, BPF_MAP_TYPE_STACK_TRACE);
+	__uint(max_entries, TEST_MAX_ENTRIES);
+	__type(key, __u32);
+	__type(value, stack_trace_t);
+} stackmap SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, TEST_MAX_ENTRIES);
+	__type(key, __u32);
+	__type(value, __u32);
+} stackid_hmap SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, TEST_MAX_ENTRIES);
+	__type(key, __u32);
+	__type(value, stack_trace_t);
+} stack_amap SEC(".maps");
+
+int pid = 0;
+int control = 0;
+int failed = 0;
+
+SEC("tracepoint/sched/sched_switch")
+int oncpu(struct trace_event_raw_sched_switch *ctx)
+{
+	__u32 max_len = TEST_STACK_DEPTH * sizeof(__u64);
+	__u32 key = 0, val = 0;
+	__u64 *stack_p;
+
+	if (pid != (bpf_get_current_pid_tgid() >> 32))
+		return 0;
+
+	if (control)
+		return 0;
+
+	/* skipping as many entries as the whole buffer holds should be allowed */
+	key = bpf_get_stackid(ctx, &stackmap, TEST_STACK_DEPTH);
+	if ((int)key >= 0) {
+		/* The size of stackmap and stack_amap should be the same */
+		bpf_map_update_elem(&stackid_hmap, &key, &val, 0);
+		stack_p = bpf_map_lookup_elem(&stack_amap, &key);
+		if (stack_p) {
+			bpf_get_stack(ctx, stack_p, max_len, TEST_STACK_DEPTH);
+			/* it wrongly skipped all the entries and filled the buffer with zeros */
+			if (stack_p[0] == 0)
+				failed = 1;
+		}
+	} else {
+		/* old kernel doesn't support skipping that many entries */
+		failed = 2;
+	}
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.35.1.723.g4982287a31-goog


* Re: [PATCH v3 2/2] bpf/selftests: Test skipping stacktrace
  2022-03-14 18:20 ` [PATCH v3 2/2] bpf/selftests: Test skipping stacktrace Namhyung Kim
@ 2022-03-15  5:18   ` Yonghong Song
  0 siblings, 0 replies; 4+ messages in thread
From: Yonghong Song @ 2022-03-15  5:18 UTC (permalink / raw)
  To: Namhyung Kim, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Song Liu, John Fastabend, KP Singh, netdev,
	bpf, LKML, Arnaldo Carvalho de Melo, Peter Zijlstra, Eugene Loh,
	Hao Luo



On 3/14/22 11:20 AM, Namhyung Kim wrote:
> Add a test case for stacktrace with skip > 0 using a small-sized
> buffer.  Previously, skipping a number of entries greater than or
> equal to the buffer size was not supported, and the skipped part was
> filled with 0.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Yonghong Song <yhs@fb.com>

* Re: [PATCH v3 1/2] bpf: Adjust BPF stack helper functions to accommodate skip > 0
  2022-03-14 18:20 [PATCH v3 1/2] bpf: Adjust BPF stack helper functions to accommodate skip > 0 Namhyung Kim
  2022-03-14 18:20 ` [PATCH v3 2/2] bpf/selftests: Test skipping stacktrace Namhyung Kim
@ 2022-03-21  2:30 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-03-21  2:30 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, netdev, bpf, linux-kernel, acme, peterz, eugene.loh,
	haoluo

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Mon, 14 Mar 2022 11:20:41 -0700 you wrote:
> Let's say that the caller has storage for num_elem stack frames.  Then,
> the BPF stack helper functions walk the stack for only num_elem frames.
> This means that if skip > 0, one keeps only 'num_elem - skip' frames.
> 
> This is because it sets init_nr in the perf_callchain_entry to the end
> of the buffer to save num_elem entries only.  I believe it was because
> the perf callchain code unwound the stack frames until it reached the
> global max size (sysctl_perf_event_max_stack).
> 
> [...]

Here is the summary with links:
  - [v3,1/2] bpf: Adjust BPF stack helper functions to accommodate skip > 0
    https://git.kernel.org/bpf/bpf-next/c/ee2a098851bf
  - [v3,2/2] bpf/selftests: Test skipping stacktrace
    https://git.kernel.org/bpf/bpf-next/c/e1cc1f39981b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


