* [PATCH v3 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper
@ 2020-01-23 21:23 Daniel Xu
  2020-01-23 21:23 ` [PATCH v3 bpf-next 1/3] bpf: " Daniel Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Daniel Xu @ 2020-01-23 21:23 UTC
  To: bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile-guided optimizations. perf has had
branch record support for a while, but the data collection can be a bit
coarse-grained.

We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.

Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.

Changes in v3:
- Document filling unused buffer with zero
- Formatting fixes
- Rebase

Changes in v2:
- Change to a bpf helper instead of context access
- Avoid mentioning Intel-specific things

Daniel Xu (3):
  bpf: Add bpf_perf_prog_read_branches() helper
  tools/bpf: Sync uapi header bpf.h
  selftests/bpf: add bpf_perf_prog_read_branches() selftest

 include/uapi/linux/bpf.h                      |  15 ++-
 kernel/trace/bpf_trace.c                      |  31 +++++
 tools/include/uapi/linux/bpf.h                |  15 ++-
 .../selftests/bpf/prog_tests/perf_branches.c  | 106 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_branches.c  |  39 +++++++
 5 files changed, 204 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c

-- 
2.21.1



* [PATCH v3 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 21:23 [PATCH v3 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper Daniel Xu
@ 2020-01-23 21:23 ` Daniel Xu
  2020-01-23 23:48   ` Yonghong Song
  2020-01-24  0:49   ` John Fastabend
  2020-01-23 21:23 ` [PATCH v3 bpf-next 2/3] tools/bpf: Sync uapi header bpf.h Daniel Xu
  2020-01-23 21:23 ` [PATCH v3 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest Daniel Xu
  2 siblings, 2 replies; 11+ messages in thread
From: Daniel Xu @ 2020-01-23 21:23 UTC
  To: bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile-guided optimizations. perf has had
branch record support for a while, but the data collection can be a bit
coarse-grained.

We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.

Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 include/uapi/linux/bpf.h | 15 ++++++++++++++-
 kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f1d74a2bd234..50c580c8a201 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2892,6 +2892,18 @@ union bpf_attr {
  *		Obtain the 64bit jiffies
  *	Return
  *		The 64 bit jiffies
+ *
+ * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
+ *	Description
+ *		For en eBPF program attached to a perf event, retrieve the
+ *		branch records (struct perf_branch_entry) associated to *ctx*
+ *		and store it in	the buffer pointed by *buf* up to size
+ *		*buf_size* bytes.
+ *
+ *		Any unused parts of *buf* will be filled with zeros.
+ *	Return
+ *		On success, number of bytes written to *buf*. On error, a
+ *		negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3012,7 +3024,8 @@ union bpf_attr {
 	FN(probe_read_kernel_str),	\
 	FN(tcp_send_ack),		\
 	FN(send_signal_thread),		\
-	FN(jiffies64),
+	FN(jiffies64),			\
+	FN(perf_prog_read_branches),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 19e793aa441a..24c51272a1f7 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
          .arg3_type      = ARG_CONST_SIZE,
 };
 
+BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
+	   void *, buf, u32, size)
+{
+	struct perf_branch_stack *br_stack = ctx->data->br_stack;
+	u32 to_copy = 0, to_clear = size;
+	int err = -EINVAL;
+
+	if (unlikely(!br_stack))
+		goto clear;
+
+	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
+	to_clear -= to_copy;
+
+	memcpy(buf, br_stack->entries, to_copy);
+	err = to_copy;
+clear:
+	memset(buf + to_copy, 0, to_clear);
+	return err;
+}
+
+static const struct bpf_func_proto bpf_perf_prog_read_branches_proto = {
+         .func           = bpf_perf_prog_read_branches,
+         .gpl_only       = true,
+         .ret_type       = RET_INTEGER,
+         .arg1_type      = ARG_PTR_TO_CTX,
+         .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
+         .arg3_type      = ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *
 pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1040,6 +1069,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_stack_proto_tp;
 	case BPF_FUNC_perf_prog_read_value:
 		return &bpf_perf_prog_read_value_proto;
+	case BPF_FUNC_perf_prog_read_branches:
+		return &bpf_perf_prog_read_branches_proto;
 	default:
 		return tracing_func_proto(func_id, prog);
 	}
-- 
2.21.1
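
For illustration, a minimal sketch of a perf_event program calling the
new helper (a hypothetical program, not part of this series; assumes
bpf_helper_defs.h has been regenerated from the synced uapi header):

	/* SPDX-License-Identifier: GPL-2.0 */
	#include <linux/bpf.h>
	#include <linux/perf_event.h>	/* struct perf_branch_entry */
	#include <bpf/bpf_helpers.h>

	SEC("perf_event")
	int dump_branches(void *ctx)
	{
		struct perf_branch_entry entries[8];
		int nr, ret;

		ret = bpf_perf_prog_read_branches(ctx, entries, sizeof(entries));
		if (ret <= 0)	/* error, or no records for this sample */
			return 0;

		/* the return value is in bytes; each record has a fixed size */
		nr = ret / sizeof(struct perf_branch_entry);
		bpf_printk("%d branch entries, first from %llx\n",
			   nr, entries[0].from);
		return 0;
	}

	char _license[] SEC("license") = "GPL";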



* [PATCH v3 bpf-next 2/3] tools/bpf: Sync uapi header bpf.h
  2020-01-23 21:23 [PATCH v3 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper Daniel Xu
  2020-01-23 21:23 ` [PATCH v3 bpf-next 1/3] bpf: " Daniel Xu
@ 2020-01-23 21:23 ` Daniel Xu
  2020-01-23 21:23 ` [PATCH v3 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest Daniel Xu
  2 siblings, 0 replies; 11+ messages in thread
From: Daniel Xu @ 2020-01-23 21:23 UTC
  To: bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 tools/include/uapi/linux/bpf.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f1d74a2bd234..50c580c8a201 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2892,6 +2892,18 @@ union bpf_attr {
  *		Obtain the 64bit jiffies
  *	Return
  *		The 64 bit jiffies
+ *
+ * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
+ *	Description
+ *		For en eBPF program attached to a perf event, retrieve the
+ *		branch records (struct perf_branch_entry) associated to *ctx*
+ *		and store it in	the buffer pointed by *buf* up to size
+ *		*buf_size* bytes.
+ *
+ *		Any unused parts of *buf* will be filled with zeros.
+ *	Return
+ *		On success, number of bytes written to *buf*. On error, a
+ *		negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3012,7 +3024,8 @@ union bpf_attr {
 	FN(probe_read_kernel_str),	\
 	FN(tcp_send_ack),		\
 	FN(send_signal_thread),		\
-	FN(jiffies64),
+	FN(jiffies64),			\
+	FN(perf_prog_read_branches),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.21.1



* [PATCH v3 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest
  2020-01-23 21:23 [PATCH v3 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper Daniel Xu
  2020-01-23 21:23 ` [PATCH v3 bpf-next 1/3] bpf: " Daniel Xu
  2020-01-23 21:23 ` [PATCH v3 bpf-next 2/3] tools/bpf: Sync uapi header bpf.h Daniel Xu
@ 2020-01-23 21:23 ` Daniel Xu
  2020-01-23 23:20   ` Martin Lau
  2 siblings, 1 reply; 11+ messages in thread
From: Daniel Xu @ 2020-01-23 21:23 UTC
  To: bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 .../selftests/bpf/prog_tests/perf_branches.c  | 106 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_branches.c  |  39 +++++++
 2 files changed, 145 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_branches.c b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
new file mode 100644
index 000000000000..f8d7356a6507
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <sys/socket.h>
+#include <test_progs.h>
+#include "bpf/libbpf_internal.h"
+
+static void on_sample(void *ctx, int cpu, void *data, __u32 size)
+{
+	int pbe_size = sizeof(struct perf_branch_entry);
+	int ret = *(int *)data, duration = 0;
+
+	// It's hard to validate the contents of the branch entries b/c it
+	// would require some kind of disassembler and also encoding the
+	// valid jump instructions for supported architectures. So just check
+	// the easy stuff for now.
+	CHECK(ret < 0, "read_branches", "err %d\n", ret);
+	CHECK(ret % pbe_size != 0, "read_branches",
+	      "bytes written=%d not multiple of struct size=%d\n",
+	      ret, pbe_size);
+
+	*(int *)ctx = 1;
+}
+
+void test_perf_branches(void)
+{
+	int err, prog_fd, i, pfd = -1, duration = 0, ok = 0;
+	const char *file = "./test_perf_branches.o";
+	const char *prog_name = "perf_event";
+	struct perf_buffer_opts pb_opts = {};
+	struct perf_event_attr attr = {};
+	struct bpf_map *perf_buf_map;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	struct perf_buffer *pb;
+	struct bpf_link *link;
+	volatile int j = 0;
+	cpu_set_t cpu_set;
+
+	/* load program */
+	err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd);
+	if (CHECK(err, "obj_load", "err %d errno %d\n", err, errno)) {
+		obj = NULL;
+		goto out_close;
+	}
+
+	prog = bpf_object__find_program_by_title(obj, prog_name);
+	if (CHECK(!prog, "find_probe", "prog '%s' not found\n", prog_name))
+		goto out_close;
+
+	/* load map */
+	perf_buf_map = bpf_object__find_map_by_name(obj, "perf_buf_map");
+	if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
+		goto out_close;
+
+	/* create perf event */
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.config = PERF_COUNT_HW_CPU_CYCLES;
+	attr.freq = 1;
+	attr.sample_freq = 4000;
+	attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
+	attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_ANY;
+	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+	if (CHECK(pfd < 0, "perf_event_open", "err %d\n", pfd))
+		goto out_close;
+
+	/* attach perf_event */
+	link = bpf_program__attach_perf_event(prog, pfd);
+	if (CHECK(IS_ERR(link), "attach_perf_event", "err %ld\n", PTR_ERR(link)))
+		goto out_close_perf;
+
+	/* set up perf buffer */
+	pb_opts.sample_cb = on_sample;
+	pb_opts.ctx = &ok;
+	pb = perf_buffer__new(bpf_map__fd(perf_buf_map), 1, &pb_opts);
+	if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
+		goto out_detach;
+
+	/* generate some branches on cpu 0 */
+	CPU_ZERO(&cpu_set);
+	CPU_SET(0, &cpu_set);
+	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
+	if (err && CHECK(err, "set_affinity", "cpu #0, err %d\n", err))
+		goto out_free_pb;
+	for (i = 0; i < 1000000; ++i)
+		++j;
+
+	/* read perf buffer */
+	err = perf_buffer__poll(pb, 500);
+	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
+		goto out_free_pb;
+
+	if (CHECK(!ok, "ok", "not ok\n"))
+		goto out_free_pb;
+
+out_free_pb:
+	perf_buffer__free(pb);
+out_detach:
+	bpf_link__destroy(link);
+out_close_perf:
+	close(pfd);
+out_close:
+	bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_branches.c b/tools/testing/selftests/bpf/progs/test_perf_branches.c
new file mode 100644
index 000000000000..d818079c7778
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_branches.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/ptrace.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_trace_helpers.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+} perf_buf_map SEC(".maps");
+
+struct fake_perf_branch_entry {
+	__u64 _a;
+	__u64 _b;
+	__u64 _c;
+};
+
+SEC("perf_event")
+int perf_branches(void *ctx)
+{
+	int ret;
+	struct fake_perf_branch_entry entries[4];
+
+	ret = bpf_perf_prog_read_branches(ctx,
+					  entries,
+					  sizeof(entries));
+	/* ignore spurious events */
+	if (!ret)
+		return 1;
+
+	bpf_perf_event_output(ctx, &perf_buf_map, BPF_F_CURRENT_CPU,
+			      &ret, sizeof(ret));
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.21.1
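
For reference, once the series is applied, the new test can be run on
its own with the usual selftests runner:

	$ cd tools/testing/selftests/bpf
	$ make
	$ sudo ./test_progs -t perf_branches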



* Re: [PATCH v3 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest
  2020-01-23 21:23 ` [PATCH v3 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest Daniel Xu
@ 2020-01-23 23:20   ` Martin Lau
  2020-01-24  2:55     ` Daniel Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Lau @ 2020-01-23 23:20 UTC
  To: Daniel Xu
  Cc: bpf, ast, daniel, Song Liu, Yonghong Song, Andrii Nakryiko,
	linux-kernel, Kernel Team, peterz, mingo, acme

On Thu, Jan 23, 2020 at 01:23:12PM -0800, Daniel Xu wrote:
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Please add some details to avoid an empty commit message.
Same for patch 2.

> ---
>  .../selftests/bpf/prog_tests/perf_branches.c  | 106 ++++++++++++++++++
>  .../selftests/bpf/progs/test_perf_branches.c  |  39 +++++++
>  2 files changed, 145 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_branches.c b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
> new file mode 100644
> index 000000000000..f8d7356a6507
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
> @@ -0,0 +1,106 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <pthread.h>
> +#include <sched.h>
> +#include <sys/socket.h>
> +#include <test_progs.h>
> +#include "bpf/libbpf_internal.h"
> +
> +static void on_sample(void *ctx, int cpu, void *data, __u32 size)
> +{
> +	int pbe_size = sizeof(struct perf_branch_entry);
> +	int ret = *(int *)data, duration = 0;
> +
> +	// It's hard to validate the contents of the branch entries b/c it
> +	// would require some kind of disassembler and also encoding the
> +	// valid jump instructions for supported architectures. So just check
> +	// the easy stuff for now.
/* ... */ comment style

> +	CHECK(ret < 0, "read_branches", "err %d\n", ret);
> +	CHECK(ret % pbe_size != 0, "read_branches",
> +	      "bytes written=%d not multiple of struct size=%d\n",
> +	      ret, pbe_size);
> +
> +	*(int *)ctx = 1;
> +}
> +
> +void test_perf_branches(void)
> +{
> +	int err, prog_fd, i, pfd = -1, duration = 0, ok = 0;
> +	const char *file = "./test_perf_branches.o";
> +	const char *prog_name = "perf_event";
> +	struct perf_buffer_opts pb_opts = {};
> +	struct perf_event_attr attr = {};
> +	struct bpf_map *perf_buf_map;
> +	struct bpf_program *prog;
> +	struct bpf_object *obj;
> +	struct perf_buffer *pb;
> +	struct bpf_link *link;
> +	volatile int j = 0;
> +	cpu_set_t cpu_set;
> +
> +	/* load program */
> +	err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd);
> +	if (CHECK(err, "obj_load", "err %d errno %d\n", err, errno)) {
> +		obj = NULL;
> +		goto out_close;
> +	}
> +
> +	prog = bpf_object__find_program_by_title(obj, prog_name);
> +	if (CHECK(!prog, "find_probe", "prog '%s' not found\n", prog_name))
> +		goto out_close;
> +
> +	/* load map */
> +	perf_buf_map = bpf_object__find_map_by_name(obj, "perf_buf_map");
> +	if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
> +		goto out_close;
Using skel may be able to cut some lines.
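
Something along these lines, perhaps (a rough sketch, assuming a
bpftool-generated test_perf_branches.skel.h):

	struct test_perf_branches *skel;

	skel = test_perf_branches__open_and_load();
	if (CHECK(!skel, "skel_load", "failed to open/load skeleton\n"))
		return;

	/* skel->progs.perf_branches and skel->maps.perf_buf_map replace
	 * the find_program_by_title()/find_map_by_name() lookups, and
	 * test_perf_branches__destroy(skel) replaces bpf_object__close().
	 */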

> +
> +	/* create perf event */
> +	attr.size = sizeof(attr);
> +	attr.type = PERF_TYPE_HARDWARE;
> +	attr.config = PERF_COUNT_HW_CPU_CYCLES;
> +	attr.freq = 1;
> +	attr.sample_freq = 4000;
> +	attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
> +	attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_ANY;
> +	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
> +	if (CHECK(pfd < 0, "perf_event_open", "err %d\n", pfd))
> +		goto out_close;
> +
> +	/* attach perf_event */
> +	link = bpf_program__attach_perf_event(prog, pfd);
> +	if (CHECK(IS_ERR(link), "attach_perf_event", "err %ld\n", PTR_ERR(link)))
> +		goto out_close_perf;
> +
> +	/* set up perf buffer */
> +	pb_opts.sample_cb = on_sample;
> +	pb_opts.ctx = &ok;
> +	pb = perf_buffer__new(bpf_map__fd(perf_buf_map), 1, &pb_opts);
> +	if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
> +		goto out_detach;
> +
> +	/* generate some branches on cpu 0 */
> +	CPU_ZERO(&cpu_set);
> +	CPU_SET(0, &cpu_set);
> +	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
> +	if (err && CHECK(err, "set_affinity", "cpu #0, err %d\n", err))
'err &&' seems unnecessary.

> +		goto out_free_pb;
> +	for (i = 0; i < 1000000; ++i)
Maybe add a comment on the 1000000?

> +		++j;
> +
> +	/* read perf buffer */
> +	err = perf_buffer__poll(pb, 500);
> +	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
> +		goto out_free_pb;
> +
> +	if (CHECK(!ok, "ok", "not ok\n"))
> +		goto out_free_pb;
> +
> +out_free_pb:
> +	perf_buffer__free(pb);
> +out_detach:
> +	bpf_link__destroy(link);
> +out_close_perf:
> +	close(pfd);
> +out_close:
> +	bpf_object__close(obj);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_perf_branches.c b/tools/testing/selftests/bpf/progs/test_perf_branches.c
> new file mode 100644
> index 000000000000..d818079c7778
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_perf_branches.c
> @@ -0,0 +1,39 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2019 Facebook
> +
> +#include <linux/ptrace.h>
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include "bpf_trace_helpers.h"
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
> +	__uint(key_size, sizeof(int));
> +	__uint(value_size, sizeof(int));
> +} perf_buf_map SEC(".maps");
> +
> +struct fake_perf_branch_entry {
> +	__u64 _a;
> +	__u64 _b;
> +	__u64 _c;
> +};
> +
> +SEC("perf_event")
> +int perf_branches(void *ctx)
> +{
> +	int ret;
> +	struct fake_perf_branch_entry entries[4];
Try to keep the reverse xmas tree.

> +
> +	ret = bpf_perf_prog_read_branches(ctx,
> +					  entries,
> +					  sizeof(entries));
> +	/* ignore spurious events */
> +	if (!ret)
Check for -ve also?

> +		return 1;
> +
> +	bpf_perf_event_output(ctx, &perf_buf_map, BPF_F_CURRENT_CPU,
> +			      &ret, sizeof(ret));
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> -- 
> 2.21.1
> 


* Re: [PATCH v3 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 21:23 ` [PATCH v3 bpf-next 1/3] bpf: " Daniel Xu
@ 2020-01-23 23:48   ` Yonghong Song
  2020-01-24  0:49   ` John Fastabend
  1 sibling, 0 replies; 11+ messages in thread
From: Yonghong Song @ 2020-01-23 23:48 UTC
  To: Daniel Xu, bpf, ast, daniel, Song Liu, Andrii Nakryiko
  Cc: linux-kernel, Kernel Team, peterz, mingo, acme



On 1/23/20 1:23 PM, Daniel Xu wrote:
> Branch records are a CPU feature that can be configured to record
> certain branches that are taken during code execution. This data is
> particularly interesting for profile-guided optimizations. perf has had
> branch record support for a while, but the data collection can be a bit
> coarse-grained.
> 
> We (Facebook) have seen in experiments that associating metadata with
> branch records can improve results (after postprocessing). We generally
> use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
> support for branch records is useful.
> 
> Aside from this particular use case, having branch data available to bpf
> progs can be useful to get stack traces out of userspace applications
> that omit frame pointers.
> 
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>   include/uapi/linux/bpf.h | 15 ++++++++++++++-
>   kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++++++
>   2 files changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index f1d74a2bd234..50c580c8a201 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2892,6 +2892,18 @@ union bpf_attr {
>    *		Obtain the 64bit jiffies
>    *	Return
>    *		The 64 bit jiffies
> + *
> + * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
> + *	Description
> + *		For en eBPF program attached to a perf event, retrieve the

en => an

> + *		branch records (struct perf_branch_entry) associated to *ctx*
> + *		and store it in	the buffer pointed by *buf* up to size
> + *		*buf_size* bytes.
> + *
> + *		Any unused parts of *buf* will be filled with zeros.
> + *	Return
> + *		On success, number of bytes written to *buf*. On error, a
> + *		negative value.
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -3012,7 +3024,8 @@ union bpf_attr {
>   	FN(probe_read_kernel_str),	\
>   	FN(tcp_send_ack),		\
>   	FN(send_signal_thread),		\
> -	FN(jiffies64),
> +	FN(jiffies64),			\
> +	FN(perf_prog_read_branches),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 19e793aa441a..24c51272a1f7 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
>            .arg3_type      = ARG_CONST_SIZE,
>   };
>   
> +BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
> +	   void *, buf, u32, size)
> +{
> +	struct perf_branch_stack *br_stack = ctx->data->br_stack;
> +	u32 to_copy = 0, to_clear = size;
> +	int err = -EINVAL;
> +
> +	if (unlikely(!br_stack))
> +		goto clear;
> +
> +	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
> +	to_clear -= to_copy;
> +
> +	memcpy(buf, br_stack->entries, to_copy);
> +	err = to_copy;
> +clear:
> +	memset(buf + to_copy, 0, to_clear);
> +	return err;

If *size* is smaller than br_stack->nr * sizeof(struct perf_branch_entry),
the user has no way to know that some entries were not copied, except by
repeatedly trying larger buffers until the return value is smaller than
the input buffer size.

I think returning the expected buffer size to users would be a good
thing. We may not have malloc in bpf today, but a future malloc-like
facility would help in this case.

In user space, a user with a fixed buffer can issue repeated `read`
calls to retrieve all values.

Using bpf_probe_read(), repeated reads with an adjusted source pointer
can likewise retrieve everything.

One possible design is to add a flags argument to the function: e.g., if
flag == GET_BR_STACK_NR, return br_stack->nr in buf/size; if
flag == GET_BR_STACK, return br_stack->entries in buf/size.

What do you think?
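
Roughly, with a hypothetical flag name and a fourth flags argument added
to the helper:

	__u64 nr = 0;

	/* ask only for the number of branch records... */
	bpf_perf_prog_read_branches(ctx, &nr, sizeof(nr),
				    BPF_F_GET_BR_STACK_NR);
	/* ...then read the records themselves */
	bpf_perf_prog_read_branches(ctx, buf, size, 0);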


> +}
> +
> +static const struct bpf_func_proto bpf_perf_prog_read_branches_proto = {
> +         .func           = bpf_perf_prog_read_branches,
> +         .gpl_only       = true,
> +         .ret_type       = RET_INTEGER,
> +         .arg1_type      = ARG_PTR_TO_CTX,
> +         .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
> +         .arg3_type      = ARG_CONST_SIZE,
> +};
> +
>   static const struct bpf_func_proto *
>   pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   {
> @@ -1040,6 +1069,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   		return &bpf_get_stack_proto_tp;
>   	case BPF_FUNC_perf_prog_read_value:
>   		return &bpf_perf_prog_read_value_proto;
> +	case BPF_FUNC_perf_prog_read_branches:
> +		return &bpf_perf_prog_read_branches_proto;
>   	default:
>   		return tracing_func_proto(func_id, prog);
>   	}
> 


* RE: [PATCH v3 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 21:23 ` [PATCH v3 bpf-next 1/3] bpf: " Daniel Xu
  2020-01-23 23:48   ` Yonghong Song
@ 2020-01-24  0:49   ` John Fastabend
  2020-01-24  2:02     ` Daniel Xu
  1 sibling, 1 reply; 11+ messages in thread
From: John Fastabend @ 2020-01-24  0:49 UTC
  To: Daniel Xu, bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Daniel Xu wrote:
> Branch records are a CPU feature that can be configured to record
> certain branches that are taken during code execution. This data is
> particularly interesting for profile-guided optimizations. perf has had
> branch record support for a while, but the data collection can be a bit
> coarse-grained.
> 
> We (Facebook) have seen in experiments that associating metadata with
> branch records can improve results (after postprocessing). We generally
> use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
> support for branch records is useful.
> 
> Aside from this particular use case, having branch data available to bpf
> progs can be useful to get stack traces out of userspace applications
> that omit frame pointers.
> 
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  include/uapi/linux/bpf.h | 15 ++++++++++++++-
>  kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 45 insertions(+), 1 deletion(-)
> 

[...]

>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 19e793aa441a..24c51272a1f7 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
>           .arg3_type      = ARG_CONST_SIZE,
>  };
>  
> +BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
> +	   void *, buf, u32, size)
> +{
> +	struct perf_branch_stack *br_stack = ctx->data->br_stack;
> +	u32 to_copy = 0, to_clear = size;
> +	int err = -EINVAL;
> +
> +	if (unlikely(!br_stack))
> +		goto clear;
> +
> +	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
> +	to_clear -= to_copy;
> +
> +	memcpy(buf, br_stack->entries, to_copy);
> +	err = to_copy;
> +clear:

There appears to be agreement to clear the extra buffer on error, but what
about in the non-error case? I expect one usage pattern is to submit a
fairly large buffer, large enough to handle the worst-case nr; in this case
we end up zeroing memory even in the successful case. Can we skip the clear
then? Maybe it's not too important either way, but it seems unnecessary.

> +	memset(buf + to_copy, 0, to_clear);
> +	return err;
> +}
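
A sketch of that variant (note it would leave the tail of *buf*
uninitialized on success, so the buffer could no longer be declared
ARG_PTR_TO_UNINIT_MEM as-is):

	if (unlikely(!br_stack)) {
		memset(buf, 0, size);
		return -EINVAL;
	}

	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry),
			size);
	memcpy(buf, br_stack->entries, to_copy);
	return to_copy;	/* tail of buf left untouched on success */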


* RE: [PATCH v3 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-24  0:49   ` John Fastabend
@ 2020-01-24  2:02     ` Daniel Xu
  2020-01-24  8:25       ` Martin Lau
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Xu @ 2020-01-24  2:02 UTC
  To: John Fastabend, bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

On Thu Jan 23, 2020 at 4:49 PM, John Fastabend wrote:
[...]
> >   * function eBPF program intends to call
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 19e793aa441a..24c51272a1f7 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
> >           .arg3_type      = ARG_CONST_SIZE,
> >  };
> >  
> > +BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
> > +	   void *, buf, u32, size)
> > +{
> > +	struct perf_branch_stack *br_stack = ctx->data->br_stack;
> > +	u32 to_copy = 0, to_clear = size;
> > +	int err = -EINVAL;
> > +
> > +	if (unlikely(!br_stack))
> > +		goto clear;
> > +
> > +	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
> > +	to_clear -= to_copy;
> > +
> > +	memcpy(buf, br_stack->entries, to_copy);
> > +	err = to_copy;
> > +clear:
>
> 
> There appears to be agreement to clear the extra buffer on error, but
> what about in the non-error case? I expect one usage pattern is to
> submit a fairly large buffer, large enough to handle the worst-case nr;
> in this case we end up zeroing memory even in the successful case. Can
> we skip the clear then? Maybe it's not too important either way, but it
> seems unnecessary.
>
> 
> > +	memset(buf + to_copy, 0, to_clear);
> > +	return err;
> > +}
>

Given Yonghong's suggestion of a flag argument, we need to allow users
to pass in a null ptr when querying the buffer size. So I'll change the
`buf` argument to ARG_PTR_TO_MEM_OR_NULL, which requires the buffer to
be initialized. We can then skip zeroing it out in the helper altogether.

Although I think the end result is the same -- now the user has to zero it
out. Unfortunately ARG_PTR_TO_UNINITIALIZED_MEM_OR_NULL is not
implemented yet.
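
In other words, the proto would move in this direction (a sketch of the
idea, not the final v4; the flags argument follows Yonghong's
suggestion):

	static const struct bpf_func_proto bpf_perf_prog_read_branches_proto = {
		.func		= bpf_perf_prog_read_branches,
		.gpl_only	= true,
		.ret_type	= RET_INTEGER,
		.arg1_type	= ARG_PTR_TO_CTX,
		.arg2_type	= ARG_PTR_TO_MEM_OR_NULL, /* was ARG_PTR_TO_UNINIT_MEM */
		.arg3_type	= ARG_CONST_SIZE_OR_ZERO, /* NULL buf implies zero size */
		.arg4_type	= ARG_ANYTHING,		  /* flags */
	};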


* Re: [PATCH v3 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest
  2020-01-23 23:20   ` Martin Lau
@ 2020-01-24  2:55     ` Daniel Xu
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Xu @ 2020-01-24  2:55 UTC
  To: Martin Lau
  Cc: bpf, ast, daniel, Song Liu, Yonghong Song, Andrii Nakryiko,
	linux-kernel, Kernel Team, peterz, mingo, acme

On Thu Jan 23, 2020 at 11:20 PM, Martin Lau wrote:
> On Thu, Jan 23, 2020 at 01:23:12PM -0800, Daniel Xu wrote:
> > Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> Please add some details to avoid an empty commit message.
> Same for patch 2.

Ok.
>
> 
> > ---
> >  .../selftests/bpf/prog_tests/perf_branches.c  | 106 ++++++++++++++++++
> >  .../selftests/bpf/progs/test_perf_branches.c  |  39 +++++++
> >  2 files changed, 145 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c
> > 
> > diff --git a/tools/testing/selftests/bpf/prog_tests/perf_branches.c b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
> > new file mode 100644
> > index 000000000000..f8d7356a6507
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
> > @@ -0,0 +1,106 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#define _GNU_SOURCE
> > +#include <pthread.h>
> > +#include <sched.h>
> > +#include <sys/socket.h>
> > +#include <test_progs.h>
> > +#include "bpf/libbpf_internal.h"
> > +
> > +static void on_sample(void *ctx, int cpu, void *data, __u32 size)
> > +{
> > +	int pbe_size = sizeof(struct perf_branch_entry);
> > +	int ret = *(int *)data, duration = 0;
> > +
> > +	// It's hard to validate the contents of the branch entries b/c it
> > +	// would require some kind of disassembler and also encoding the
> > +	// valid jump instructions for supported architectures. So just check
> > +	// the easy stuff for now.
> /* ... */ comment style

Whoops, sorry.

>
> 
> > +	CHECK(ret < 0, "read_branches", "err %d\n", ret);
> > +	CHECK(ret % pbe_size != 0, "read_branches",
> > +	      "bytes written=%d not multiple of struct size=%d\n",
> > +	      ret, pbe_size);
> > +
> > +	*(int *)ctx = 1;
> > +}
> > +
> > +void test_perf_branches(void)
> > +{
> > +	int err, prog_fd, i, pfd = -1, duration = 0, ok = 0;
> > +	const char *file = "./test_perf_branches.o";
> > +	const char *prog_name = "perf_event";
> > +	struct perf_buffer_opts pb_opts = {};
> > +	struct perf_event_attr attr = {};
> > +	struct bpf_map *perf_buf_map;
> > +	struct bpf_program *prog;
> > +	struct bpf_object *obj;
> > +	struct perf_buffer *pb;
> > +	struct bpf_link *link;
> > +	volatile int j = 0;
> > +	cpu_set_t cpu_set;
> > +
> > +	/* load program */
> > +	err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd);
> > +	if (CHECK(err, "obj_load", "err %d errno %d\n", err, errno)) {
> > +		obj = NULL;
> > +		goto out_close;
> > +	}
> > +
> > +	prog = bpf_object__find_program_by_title(obj, prog_name);
> > +	if (CHECK(!prog, "find_probe", "prog '%s' not found\n", prog_name))
> > +		goto out_close;
> > +
> > +	/* load map */
> > +	perf_buf_map = bpf_object__find_map_by_name(obj, "perf_buf_map");
> > +	if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
> > +		goto out_close;
> Using skel may be able to cut some lines.

Ok, will take a look.

>
> 
> > +
> > +	/* create perf event */
> > +	attr.size = sizeof(attr);
> > +	attr.type = PERF_TYPE_HARDWARE;
> > +	attr.config = PERF_COUNT_HW_CPU_CYCLES;
> > +	attr.freq = 1;
> > +	attr.sample_freq = 4000;
> > +	attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
> > +	attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_ANY;
> > +	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
> > +	if (CHECK(pfd < 0, "perf_event_open", "err %d\n", pfd))
> > +		goto out_close;
> > +
> > +	/* attach perf_event */
> > +	link = bpf_program__attach_perf_event(prog, pfd);
> > +	if (CHECK(IS_ERR(link), "attach_perf_event", "err %ld\n", PTR_ERR(link)))
> > +		goto out_close_perf;
> > +
> > +	/* set up perf buffer */
> > +	pb_opts.sample_cb = on_sample;
> > +	pb_opts.ctx = &ok;
> > +	pb = perf_buffer__new(bpf_map__fd(perf_buf_map), 1, &pb_opts);
> > +	if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
> > +		goto out_detach;
> > +
> > +	/* generate some branches on cpu 0 */
> > +	CPU_ZERO(&cpu_set);
> > +	CPU_SET(0, &cpu_set);
> > +	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
> > +	if (err && CHECK(err, "set_affinity", "cpu #0, err %d\n", err))
> 'err &&' seems unnecessary.

Will remove.

>
> 
> > +		goto out_free_pb;
> > +	for (i = 0; i < 1000000; ++i)
> Maybe add a comment on the 1000000?
>
> 
> > +		++j;
> > +
> > +	/* read perf buffer */
> > +	err = perf_buffer__poll(pb, 500);
> > +	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
> > +		goto out_free_pb;
> > +
> > +	if (CHECK(!ok, "ok", "not ok\n"))
> > +		goto out_free_pb;
> > +
> > +out_free_pb:
> > +	perf_buffer__free(pb);
> > +out_detach:
> > +	bpf_link__destroy(link);
> > +out_close_perf:
> > +	close(pfd);
> > +out_close:
> > +	bpf_object__close(obj);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/test_perf_branches.c b/tools/testing/selftests/bpf/progs/test_perf_branches.c
> > new file mode 100644
> > index 000000000000..d818079c7778
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/test_perf_branches.c
> > @@ -0,0 +1,39 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright (c) 2019 Facebook
> > +
> > +#include <linux/ptrace.h>
> > +#include <linux/bpf.h>
> > +#include <bpf/bpf_helpers.h>
> > +#include "bpf_trace_helpers.h"
> > +
> > +struct {
> > +	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
> > +	__uint(key_size, sizeof(int));
> > +	__uint(value_size, sizeof(int));
> > +} perf_buf_map SEC(".maps");
> > +
> > +struct fake_perf_branch_entry {
> > +	__u64 _a;
> > +	__u64 _b;
> > +	__u64 _c;
> > +};
> > +
> > +SEC("perf_event")
> > +int perf_branches(void *ctx)
> > +{
> > +	int ret;
> > +	struct fake_perf_branch_entry entries[4];
> Try to keep the reverse xmas tree.
>
> 
> > +
> > +	ret = bpf_perf_prog_read_branches(ctx,
> > +					  entries,
> > +					  sizeof(entries));
> > +	/* ignore spurious events */
> > +	if (!ret)
> Check for -ve also?

Assuming that means negative, no. Sometimes there aren't any branch
events stored. That's ok and we want to ignore that. If there's an error
(negative), we should pass that up to the selftest in userspace and fail
the test.

>
> 
> > +		return 1;
> > +
> > +	bpf_perf_event_output(ctx, &perf_buf_map, BPF_F_CURRENT_CPU,
> > +			      &ret, sizeof(ret));
> > +	return 0;
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
> > -- 
> > 2.21.1
> > 
>
> 
>
> 



* Re: [PATCH v3 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-24  2:02     ` Daniel Xu
@ 2020-01-24  8:25       ` Martin Lau
  2020-01-24 20:28         ` Daniel Xu
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Lau @ 2020-01-24  8:25 UTC
  To: Daniel Xu
  Cc: John Fastabend, bpf, ast, daniel, Song Liu, Yonghong Song,
	Andrii Nakryiko, linux-kernel, Kernel Team, peterz, mingo, acme

On Thu, Jan 23, 2020 at 06:02:58PM -0800, Daniel Xu wrote:
> On Thu Jan 23, 2020 at 4:49 PM, John Fastabend wrote:
> [...]
> > >   * function eBPF program intends to call
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index 19e793aa441a..24c51272a1f7 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
> > >           .arg3_type      = ARG_CONST_SIZE,
> > >  };
> > >  
> > > +BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
> > > +	   void *, buf, u32, size)
> > > +{
> > > +	struct perf_branch_stack *br_stack = ctx->data->br_stack;
> > > +	u32 to_copy = 0, to_clear = size;
> > > +	int err = -EINVAL;
> > > +
> > > +	if (unlikely(!br_stack))
> > > +		goto clear;
> > > +
> > > +	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
> > > +	to_clear -= to_copy;
> > > +
> > > +	memcpy(buf, br_stack->entries, to_copy);
> > > +	err = to_copy;
> > > +clear:
> >
> > 
> > There appears to be agreement to clear the extra buffer on error,
> > but what about in the non-error case? I expect one usage pattern is
> > to submit a fairly large buffer, large enough to handle the
> > worst-case nr; in this case we end up zeroing memory even in the
> > successful case. Can we skip the clear then? Maybe it's not too
> > important either way, but it seems unnecessary.
After some thought, I also think clearing in the non-error case is not
ideal. Daniel Xu, is the common use case to always have a buf size
large enough to capture the data of interest?

> >
> > 
> > > +	memset(buf + to_copy, 0, to_clear);
> > > +	return err;
> > > +}
> >
> 
> Given Yonghong's suggestion of a flag argument, we need to allow users
> to pass in a null ptr when querying the buffer size. So I'll change the
> `buf` argument to ARG_PTR_TO_MEM_OR_NULL, which requires the buffer to
> be initialized. We can then skip zeroing it out in the helper altogether.
> 
> Although I think the end result is the same -- now the user has to zero it
> out. Unfortunately ARG_PTR_TO_UNINITIALIZED_MEM_OR_NULL is not
> implemented yet.
A "flags" arg can be added but left unused, to keep our options open
for the future. Not sure it has to be implemented now, though. I would
question whether there is an immediate use case for learning
br_stack->nr through an extra bpf helper call on every event.

When there is a use case for learning br_stack->nr, there may be
multiple ways to do it: this "flags" arg, another helper, or reading
br_stack->nr directly with the help of BTF.

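The reserved-flags option, roughly (a sketch, reusing the copy logic
from the patch):

	BPF_CALL_4(bpf_perf_prog_read_branches,
		   struct bpf_perf_event_data_kern *, ctx,
		   void *, buf, u32, size, u64, flags)
	{
		if (unlikely(flags))	/* all flag bits reserved for now */
			return -EINVAL;

		/* ... existing br_stack copy logic from the patch ... */
	}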

* Re: [PATCH v3 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-24  8:25       ` Martin Lau
@ 2020-01-24 20:28         ` Daniel Xu
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Xu @ 2020-01-24 20:28 UTC
  To: Martin Lau
  Cc: John Fastabend, bpf, ast, daniel, Song Liu, Yonghong Song,
	Andrii Nakryiko, linux-kernel, Kernel Team, peterz, mingo, acme

On Fri Jan 24, 2020 at 8:25 AM, Martin Lau wrote:
> On Thu, Jan 23, 2020 at 06:02:58PM -0800, Daniel Xu wrote:
> > On Thu Jan 23, 2020 at 4:49 PM, John Fastabend wrote:
> > [...]
> > > >   * function eBPF program intends to call
> > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > index 19e793aa441a..24c51272a1f7 100644
> > > > --- a/kernel/trace/bpf_trace.c
> > > > +++ b/kernel/trace/bpf_trace.c
> > > > @@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
> > > >           .arg3_type      = ARG_CONST_SIZE,
> > > >  };
> > > >  
> > > > +BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
> > > > +	   void *, buf, u32, size)
> > > > +{
> > > > +	struct perf_branch_stack *br_stack = ctx->data->br_stack;
> > > > +	u32 to_copy = 0, to_clear = size;
> > > > +	int err = -EINVAL;
> > > > +
> > > > +	if (unlikely(!br_stack))
> > > > +		goto clear;
> > > > +
> > > > +	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
> > > > +	to_clear -= to_copy;
> > > > +
> > > > +	memcpy(buf, br_stack->entries, to_copy);
> > > > +	err = to_copy;
> > > > +clear:
> > >
> > > 
> > > There appears to be agreement to clear the extra buffer on error,
> > > but what about in the non-error case? I expect one usage pattern
> > > is to submit a fairly large buffer, large enough to handle the
> > > worst-case nr; in this case we end up zeroing memory even in the
> > > successful case. Can we skip the clear then? Maybe it's not too
> > > important either way, but it seems unnecessary.
> After some thought, I also think clearing in the non-error case is not
> ideal. Daniel Xu, is the common use case to always have a buf size
> large enough to capture the data of interest?

Yeah, ideally you want all of the data. But since branch data is already
sampled, it's not too bad to lose some events as long as they're
consistently lost (e.g., only keeping 4 branch entries).

I personally don't have strong opinions about clearing the unused buffer
on success. However, the internal documentation does say the helper must
fill the entire buffer, regardless of success. From this conversation, it
seems there's no functional difference either way.

>
> 
> > >
> > > 
> > > > +	memset(buf + to_copy, 0, to_clear);
> > > > +	return err;
> > > > +}
> > >
> > 
> > Given Yonghong's suggestion of a flag argument, we need to allow users
> > to pass in a null ptr when querying the buffer size. So I'll change the
> > `buf` argument to ARG_PTR_TO_MEM_OR_NULL, which requires the buffer to
> > be initialized. We can then skip zeroing it out in the helper altogether.
> > 
> > Although I think the end result is the same -- now the user has to zero it
> > out. Unfortunately ARG_PTR_TO_UNINITIALIZED_MEM_OR_NULL is not
> > implemented yet.
> A "flags" arg can be added but left unused, to keep our options open
> for the future. Not sure it has to be implemented now, though. I would
> question whether there is an immediate use case for learning
> br_stack->nr through an extra bpf helper call on every event.
>
> When there is a use case for learning br_stack->nr, there may be
> multiple ways to do it: this "flags" arg, another helper, or reading
> br_stack->nr directly with the help of BTF.

I thought about this more and I think one use case could be to figure
out how many branch entries you failed to collect and report it to
userspace for aggregation later. I agree there are multiple ways to do
it, but they would all require another helper call.

Since right now you have to statically define your buffer size, it's
quite easy to lose entries. So for completeness of the API, it would be
good to know how much data you lost.

Doing it via a flag seems fairly natural to me. Another helper seems a
little overkill. BTF could work, but it's a weaker API guarantee than
the helper (i.e., field names can change).
