* [PATCH v2 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper
@ 2020-01-22 20:22 Daniel Xu
  2020-01-22 20:22 ` [PATCH v2 bpf-next 1/3] bpf: " Daniel Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Daniel Xu @ 2020-01-22 20:22 UTC (permalink / raw)
  To: bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile-guided optimizations. perf has had
branch record support for a while, but the data collection can be a bit
coarse-grained.

We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.

Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.
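
As an illustration, a minimal sketch of how a perf_event bpf prog might
call the proposed helper (hypothetical buffer size; patch 3 contains the
actual selftest):

	struct perf_branch_entry entries[16];
	int written;

	/* copies up to sizeof(entries) bytes of branch records into
	 * entries; returns bytes written on success, negative on error
	 */
	written = bpf_perf_prog_read_branches(ctx, entries, sizeof(entries));
	if (written < 0)
		return 1;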

Changes in v2:
- Change to a bpf helper instead of context access
- Avoid mentioning Intel-specific things

Daniel Xu (3):
  bpf: Add bpf_perf_prog_read_branches() helper
  tools/bpf: Sync uapi header bpf.h
  selftests/bpf: add bpf_perf_prog_read_branches() selftest

 include/uapi/linux/bpf.h                      |  13 ++-
 kernel/trace/bpf_trace.c                      |  31 +++++
 tools/include/uapi/linux/bpf.h                |  13 ++-
 .../selftests/bpf/prog_tests/perf_branches.c  | 106 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_branches.c  |  39 +++++++
 5 files changed, 200 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c

-- 
2.21.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-22 20:22 [PATCH v2 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper Daniel Xu
@ 2020-01-22 20:22 ` Daniel Xu
  2020-01-23  5:39   ` John Fastabend
  2020-01-22 20:22 ` [PATCH v2 bpf-next 2/3] tools/bpf: Sync uapi header bpf.h Daniel Xu
  2020-01-22 20:22 ` [PATCH v2 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest Daniel Xu
  2 siblings, 1 reply; 13+ messages in thread
From: Daniel Xu @ 2020-01-22 20:22 UTC (permalink / raw)
  To: bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile-guided optimizations. perf has had
branch record support for a while, but the data collection can be a bit
coarse-grained.

We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.

Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 include/uapi/linux/bpf.h | 13 ++++++++++++-
 kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 033d90a2282d..7350c5be6158 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2885,6 +2885,16 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
+ * 	Description
+ * 		For an eBPF program attached to a perf event, retrieve the
+ * 		branch records (struct perf_branch_entry) associated with
+ * 		*ctx* and store them in the buffer pointed to by *buf*, up
+ * 		to *buf_size* bytes.
+ * 	Return
+ *		On success, number of bytes written to *buf*. On error, a
+ *		negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3004,7 +3014,8 @@ union bpf_attr {
 	FN(probe_read_user_str),	\
 	FN(probe_read_kernel_str),	\
 	FN(tcp_send_ack),		\
-	FN(send_signal_thread),
+	FN(send_signal_thread),		\
+	FN(perf_prog_read_branches),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 19e793aa441a..24c51272a1f7 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
          .arg3_type      = ARG_CONST_SIZE,
 };
 
+BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
+	   void *, buf, u32, size)
+{
+	struct perf_branch_stack *br_stack = ctx->data->br_stack;
+	u32 to_copy = 0, to_clear = size;
+	int err = -EINVAL;
+
+	if (unlikely(!br_stack))
+		goto clear;
+
+	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
+	to_clear -= to_copy;
+
+	memcpy(buf, br_stack->entries, to_copy);
+	err = to_copy;
+clear:
+	memset(buf + to_copy, 0, to_clear);
+	return err;
+}
+
+static const struct bpf_func_proto bpf_perf_prog_read_branches_proto = {
+         .func           = bpf_perf_prog_read_branches,
+         .gpl_only       = true,
+         .ret_type       = RET_INTEGER,
+         .arg1_type      = ARG_PTR_TO_CTX,
+         .arg2_type      = ARG_PTR_TO_UNINIT_MEM,
+         .arg3_type      = ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *
 pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1040,6 +1069,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_stack_proto_tp;
 	case BPF_FUNC_perf_prog_read_value:
 		return &bpf_perf_prog_read_value_proto;
+	case BPF_FUNC_perf_prog_read_branches:
+		return &bpf_perf_prog_read_branches_proto;
 	default:
 		return tracing_func_proto(func_id, prog);
 	}
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 bpf-next 2/3] tools/bpf: Sync uapi header bpf.h
  2020-01-22 20:22 [PATCH v2 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper Daniel Xu
  2020-01-22 20:22 ` [PATCH v2 bpf-next 1/3] bpf: " Daniel Xu
@ 2020-01-22 20:22 ` Daniel Xu
  2020-01-22 20:22 ` [PATCH v2 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest Daniel Xu
  2 siblings, 0 replies; 13+ messages in thread
From: Daniel Xu @ 2020-01-22 20:22 UTC (permalink / raw)
  To: bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 tools/include/uapi/linux/bpf.h | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 033d90a2282d..7350c5be6158 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2885,6 +2885,16 @@ union bpf_attr {
  *		**-EPERM** if no permission to send the *sig*.
  *
  *		**-EAGAIN** if bpf program can try again.
+ *
+ * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
+ * 	Description
+ * 		For an eBPF program attached to a perf event, retrieve the
+ * 		branch records (struct perf_branch_entry) associated with
+ * 		*ctx* and store them in the buffer pointed to by *buf*, up
+ * 		to *buf_size* bytes.
+ * 	Return
+ *		On success, number of bytes written to *buf*. On error, a
+ *		negative value.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3004,7 +3014,8 @@ union bpf_attr {
 	FN(probe_read_user_str),	\
 	FN(probe_read_kernel_str),	\
 	FN(tcp_send_ack),		\
-	FN(send_signal_thread),
+	FN(send_signal_thread),		\
+	FN(perf_prog_read_branches),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest
  2020-01-22 20:22 [PATCH v2 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper Daniel Xu
  2020-01-22 20:22 ` [PATCH v2 bpf-next 1/3] bpf: " Daniel Xu
  2020-01-22 20:22 ` [PATCH v2 bpf-next 2/3] tools/bpf: Sync uapi header bpf.h Daniel Xu
@ 2020-01-22 20:22 ` Daniel Xu
  2 siblings, 0 replies; 13+ messages in thread
From: Daniel Xu @ 2020-01-22 20:22 UTC (permalink / raw)
  To: bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 .../selftests/bpf/prog_tests/perf_branches.c  | 106 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_branches.c  |  39 +++++++
 2 files changed, 145 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_branches.c b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
new file mode 100644
index 000000000000..1d8c3bf3ab39
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <sys/socket.h>
+#include <test_progs.h>
+#include "libbpf_internal.h"
+
+static void on_sample(void *ctx, int cpu, void *data, __u32 size)
+{
+	int pbe_size = sizeof(struct perf_branch_entry);
+	int ret = *(int *)data, duration = 0;
+
+	/* It's hard to validate the contents of the branch entries
+	 * because it would require some kind of disassembler and also
+	 * encoding the valid jump instructions for supported
+	 * architectures. So just check the easy stuff for now.
+	 */
+	CHECK(ret < 0, "read_branches", "err %d\n", ret);
+	CHECK(ret % pbe_size != 0, "read_branches",
+	      "bytes written=%d not multiple of struct size=%d\n",
+	      ret, pbe_size);
+
+	*(int *)ctx = 1;
+}
+
+void test_perf_branches(void)
+{
+	int err, prog_fd, i, pfd = -1, duration = 0, ok = 0;
+	const char *file = "./test_perf_branches.o";
+	const char *prog_name = "perf_event";
+	struct perf_buffer_opts pb_opts = {};
+	struct perf_event_attr attr = {};
+	struct bpf_map *perf_buf_map;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	struct perf_buffer *pb;
+	struct bpf_link *link;
+	volatile int j = 0;
+	cpu_set_t cpu_set;
+
+	/* load program */
+	err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd);
+	if (CHECK(err, "obj_load", "err %d errno %d\n", err, errno)) {
+		obj = NULL;
+		goto out_close;
+	}
+
+	prog = bpf_object__find_program_by_title(obj, prog_name);
+	if (CHECK(!prog, "find_probe", "prog '%s' not found\n", prog_name))
+		goto out_close;
+
+	/* load map */
+	perf_buf_map = bpf_object__find_map_by_name(obj, "perf_buf_map");
+	if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
+		goto out_close;
+
+	/* create perf event */
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.config = PERF_COUNT_HW_CPU_CYCLES;
+	attr.freq = 1;
+	attr.sample_freq = 4000;
+	attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
+	attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_ANY;
+	pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+	if (CHECK(pfd < 0, "perf_event_open", "err %d\n", pfd))
+		goto out_close;
+
+	/* attach perf_event */
+	link = bpf_program__attach_perf_event(prog, pfd);
+	if (CHECK(IS_ERR(link), "attach_perf_event", "err %ld\n", PTR_ERR(link)))
+		goto out_close_perf;
+
+	/* set up perf buffer */
+	pb_opts.sample_cb = on_sample;
+	pb_opts.ctx = &ok;
+	pb = perf_buffer__new(bpf_map__fd(perf_buf_map), 1, &pb_opts);
+	if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
+		goto out_detach;
+
+	/* generate some branches on cpu 0 */
+	CPU_ZERO(&cpu_set);
+	CPU_SET(0, &cpu_set);
+	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
+	if (CHECK(err, "set_affinity", "cpu #0, err %d\n", err))
+		goto out_free_pb;
+	for (i = 0; i < 1000000; ++i)
+		++j;
+
+	/* read perf buffer */
+	err = perf_buffer__poll(pb, 500);
+	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
+		goto out_free_pb;
+
+	if (CHECK(!ok, "ok", "not ok\n"))
+		goto out_free_pb;
+
+out_free_pb:
+	perf_buffer__free(pb);
+out_detach:
+	bpf_link__destroy(link);
+out_close_perf:
+	close(pfd);
+out_close:
+	bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_branches.c b/tools/testing/selftests/bpf/progs/test_perf_branches.c
new file mode 100644
index 000000000000..c210065e21c8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_branches.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/ptrace.h>
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#include "bpf_trace_helpers.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+} perf_buf_map SEC(".maps");
+
+struct fake_perf_branch_entry {
+	__u64 _a;
+	__u64 _b;
+	__u64 _c;
+};
+
+SEC("perf_event")
+int perf_branches(void *ctx)
+{
+	int ret;
+	struct fake_perf_branch_entry entries[4];
+
+	ret = bpf_perf_prog_read_branches(ctx,
+					  entries,
+					  sizeof(entries));
+	/* ignore spurious events */
+	if (!ret)
+		return 1;
+
+	bpf_perf_event_output(ctx, &perf_buf_map, BPF_F_CURRENT_CPU,
+			      &ret, sizeof(ret));
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-22 20:22 ` [PATCH v2 bpf-next 1/3] bpf: " Daniel Xu
@ 2020-01-23  5:39   ` John Fastabend
  2020-01-23 20:09     ` Daniel Xu
  0 siblings, 1 reply; 13+ messages in thread
From: John Fastabend @ 2020-01-23  5:39 UTC (permalink / raw)
  To: Daniel Xu, bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Daniel Xu wrote:
> Branch records are a CPU feature that can be configured to record
> certain branches that are taken during code execution. This data is
> particularly interesting for profile-guided optimizations. perf has had
> branch record support for a while, but the data collection can be a bit
> coarse-grained.
> 
> We (Facebook) have seen in experiments that associating metadata with
> branch records can improve results (after postprocessing). We generally
> use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
> support for branch records is useful.
> 
> Aside from this particular use case, having branch data available to bpf
> progs can be useful to get stack traces out of userspace applications
> that omit frame pointers.
> 
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
> ---
>  include/uapi/linux/bpf.h | 13 ++++++++++++-
>  kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 033d90a2282d..7350c5be6158 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2885,6 +2885,16 @@ union bpf_attr {
>   *		**-EPERM** if no permission to send the *sig*.
>   *
>   *		**-EAGAIN** if bpf program can try again.
> + *
> + * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
> + * 	Description
> + * 		For an eBPF program attached to a perf event, retrieve the
> + * 		branch records (struct perf_branch_entry) associated with
> + * 		*ctx* and store them in the buffer pointed to by *buf*, up
> + * 		to *buf_size* bytes.

It seems extra bytes in buf will be cleared. The number of bytes
copied is returned, so I don't see any reason to clear the extra bytes;
I would just let the BPF program do this if it cares. But it should be
noted in the description at least.

> + * 	Return
> + *		On success, number of bytes written to *buf*. On error, a
> + *		negative value.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -3004,7 +3014,8 @@ union bpf_attr {
>  	FN(probe_read_user_str),	\
>  	FN(probe_read_kernel_str),	\
>  	FN(tcp_send_ack),		\
> -	FN(send_signal_thread),
> +	FN(send_signal_thread),		\
> +	FN(perf_prog_read_branches),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 19e793aa441a..24c51272a1f7 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
>           .arg3_type      = ARG_CONST_SIZE,
>  };
>  
> +BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
> +	   void *, buf, u32, size)
> +{
> +	struct perf_branch_stack *br_stack = ctx->data->br_stack;
> +	u32 to_copy = 0, to_clear = size;
> +	int err = -EINVAL;
> +
> +	if (unlikely(!br_stack))
> +		goto clear;
> +
> +	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
> +	to_clear -= to_copy;
> +
> +	memcpy(buf, br_stack->entries, to_copy);
> +	err = to_copy;
> +clear:
> +	memset(buf + to_copy, 0, to_clear);

Here, why do this at all? If the user cares they can clear the bytes
directly from the BPF program. I suspect it's probably going to be
wasted work in most cases. If it's needed for some reason, provide
a comment with it.

> +	return err;
> +}

[...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23  5:39   ` John Fastabend
@ 2020-01-23 20:09     ` Daniel Xu
  2020-01-23 22:23       ` Daniel Borkmann
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Xu @ 2020-01-23 20:09 UTC (permalink / raw)
  To: John Fastabend, bpf, ast, daniel, songliubraving, yhs, andriin
  Cc: Daniel Xu, linux-kernel, kernel-team, peterz, mingo, acme

Hi John, thanks for looking.

On Wed Jan 22, 2020 at 9:39 PM, John Fastabend wrote:
[...]
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 033d90a2282d..7350c5be6158 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2885,6 +2885,16 @@ union bpf_attr {
> >   *		**-EPERM** if no permission to send the *sig*.
> >   *
> >   *		**-EAGAIN** if bpf program can try again.
> > + *
> > + * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
> > + * 	Description
> > + * 		For an eBPF program attached to a perf event, retrieve the
> > + * 		branch records (struct perf_branch_entry) associated with
> > + * 		*ctx* and store them in the buffer pointed to by *buf*, up
> > + * 		to *buf_size* bytes.
>
> 
> It seems extra bytes in buf will be cleared. The number of bytes
> copied is returned, so I don't see any reason to clear the extra bytes;
> I would just let the BPF program do this if it cares. But it should be
> noted in the description at least.

In include/linux/bpf.h:

        /* the following constraints used to prototype bpf_memcmp() and other
         * functions that access data on eBPF program stack
         */
        ARG_PTR_TO_UNINIT_MEM,  /* pointer to memory does not need to be initialized,
                                 * helper function must fill all bytes or clear
                                 * them in error case.
                                 */

I figured it would be good to clear out the stack because this helper
writes data onto the program stack.

Also bpf_perf_prog_read_value() does something similar (fill zeros on
failure).
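
For reference, the shape of that convention (simplified sketch of the
bpf_perf_prog_read_value() error path, not the exact kernel code):

	err = perf_event_read_local(ctx->event, &buf->counter,
				    &buf->enabled, &buf->running);
	if (unlikely(err))
		goto clear;
	return 0;
clear:
	memset(buf, 0, size);	/* fill zeros on failure */
	return err;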

[...]
> > +	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
> > +	to_clear -= to_copy;
> > +
> > +	memcpy(buf, br_stack->entries, to_copy);
> > +	err = to_copy;
> > +clear:
> > +	memset(buf + to_copy, 0, to_clear);
>
> 
> Here, why do this at all? If the user cares they can clear the bytes
> directly from the BPF program. I suspect it's probably going to be
> wasted work in most cases. If it's needed for some reason, provide
> a comment with it.

Same concern as above, right?

I can send a V3 with updated uapi/linux/bpf.h description (and a rebase).

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 20:09     ` Daniel Xu
@ 2020-01-23 22:23       ` Daniel Borkmann
  2020-01-23 22:30         ` Daniel Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Borkmann @ 2020-01-23 22:23 UTC (permalink / raw)
  To: Daniel Xu, John Fastabend, bpf, ast, songliubraving, yhs, andriin
  Cc: linux-kernel, kernel-team, peterz, mingo, acme

On 1/23/20 9:09 PM, Daniel Xu wrote:
> Hi John, thanks for looking.
> 
> On Wed Jan 22, 2020 at 9:39 PM, John Fastabend wrote:
> [...]
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 033d90a2282d..7350c5be6158 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -2885,6 +2885,16 @@ union bpf_attr {
>>>    *		**-EPERM** if no permission to send the *sig*.
>>>    *
>>>    *		**-EAGAIN** if bpf program can try again.
>>> + *
>>> + * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
>>> + * 	Description
>>> + * 		For an eBPF program attached to a perf event, retrieve the
>>> + * 		branch records (struct perf_branch_entry) associated with
>>> + * 		*ctx* and store them in the buffer pointed to by *buf*, up
>>> + * 		to *buf_size* bytes.
>>
>> It seems extra bytes in buf will be cleared. The number of bytes
>> copied is returned, so I don't see any reason to clear the extra bytes;
>> I would just let the BPF program do this if it cares. But it should be
>> noted in the description at least.
> 
> In include/linux/bpf.h:
> 
>          /* the following constraints used to prototype bpf_memcmp() and other
>           * functions that access data on eBPF program stack
>           */
>          ARG_PTR_TO_UNINIT_MEM,  /* pointer to memory does not need to be initialized,
>                                   * helper function must fill all bytes or clear
>                                   * them in error case.
>                                   */
> 
> I figured it would be good to clear out the stack because this helper
> writes data onto the program stack.
> 
> Also bpf_perf_prog_read_value() does something similar (fill zeros on
> failure).
> 
> [...]
>>> +	to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
>>> +	to_clear -= to_copy;
>>> +
>>> +	memcpy(buf, br_stack->entries, to_copy);
>>> +	err = to_copy;
>>> +clear:
>>> +	memset(buf + to_copy, 0, to_clear);
>>
>>
>> Here, why do this at all? If the user cares they can clear the bytes
>> directly from the BPF program. I suspect it's probably going to be
>> wasted work in most cases. If it's needed for some reason, provide
>> a comment with it.
> 
> Same concern as above, right?

Yes, so we've been following this practice for all the BPF helpers, no matter
the program type. Though for tracing it may be up for debate whether it still
makes sense, given there's nothing to be leaked here since you can read this
data anyway via probe read if you wanted to. So we might as well get rid of
the clearing for all tracing helpers.

Different question related to your set. It looks like br_stack is only
available on x86, is that correct? For other archs this will always bail out
on the !br_stack test. Perhaps we should document this fact so users are not
surprised that their prog using this helper is not working on !x86. Wdyt?

Thanks,
Daniel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 22:23       ` Daniel Borkmann
@ 2020-01-23 22:30         ` Daniel Xu
  2020-01-23 22:41           ` Andrii Nakryiko
  2020-01-23 22:44           ` Daniel Borkmann
  0 siblings, 2 replies; 13+ messages in thread
From: Daniel Xu @ 2020-01-23 22:30 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend, bpf, ast, songliubraving, yhs, andriin
  Cc: linux-kernel, kernel-team, peterz, mingo, acme

On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
[...]
> 
> Yes, so we've been following this practice for all the BPF helpers, no
> matter the program type. Though for tracing it may be up for debate
> whether it still makes sense, given there's nothing to be leaked here
> since you can read this data anyway via probe read if you wanted to. So
> we might as well get rid of the clearing for all tracing helpers.

Right, that makes sense. Do you want me to leave it in for this patchset
and then remove all of them in a followup patchset?

> 
> Different question related to your set. It looks like br_stack is only
> available on x86, is that correct? For other archs this will always bail
> out on the !br_stack test. Perhaps we should document this fact so users
> are not surprised that their prog using this helper is not working on
> !x86. Wdyt?

I think perf_event_open() should fail on !x86 if a user tries to configure
it with branch stack collection. So there would not be the opportunity for
the bpf prog to be attached and run. I haven't tested this, though. I'll
look through the code / install a VM and test it.

[...]

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 22:30         ` Daniel Xu
@ 2020-01-23 22:41           ` Andrii Nakryiko
  2020-01-23 23:09             ` Daniel Borkmann
  2020-01-23 22:44           ` Daniel Borkmann
  1 sibling, 1 reply; 13+ messages in thread
From: Andrii Nakryiko @ 2020-01-23 22:41 UTC (permalink / raw)
  To: Daniel Xu, Daniel Borkmann, John Fastabend, bpf, ast, Song Liu,
	Yonghong Song
  Cc: linux-kernel, Kernel Team, peterz, mingo, acme

On 1/23/20 2:30 PM, Daniel Xu wrote:
> On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
> [...]
>>
>> Yes, so we've been following this practice for all the BPF helpers, no
>> matter the program type. Though for tracing it may be up for debate
>> whether it still makes sense, given there's nothing to be leaked here
>> since you can read this data anyway via probe read if you wanted to. So
>> we might as well get rid of the clearing for all tracing helpers.
> 
> Right, that makes sense. Do you want me to leave it in for this patchset
> and then remove all of them in a followup patchset?
> 

I don't think we can remove that for existing tracing helpers (e.g., 
bpf_probe_read). There are applications that explicitly expect 
destination memory to be zeroed out on failure. It's a BPF world's 
memset(0).
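
E.g., a minimal sketch of that pattern (hypothetical struct and pointer
names):

	struct event e;

	/* if the read faults, e is fully zeroed rather than left as
	 * stack garbage, so it can be emitted unconditionally
	 */
	bpf_probe_read(&e, sizeof(e), unsafe_ptr);
	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
			      &e, sizeof(e));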

I also wonder if BPF verifier has any extra assumptions for 
ARG_PTR_TO_UNINIT_MEM w.r.t. it being initialized after helper call 
(e.g., for liveness tracking).

>>
>> Different question related to your set. It looks like br_stack is only
>> available on x86, is that correct? For other archs this will always bail
>> out on the !br_stack test. Perhaps we should document this fact so users
>> are not surprised that their prog using this helper is not working on
>> !x86. Wdyt?
> 
> I think perf_event_open() should fail on !x86 if a user tries to configure
> it with branch stack collection. So there would not be the opportunity for
> the bpf prog to be attached and run. I haven't tested this, though. I'll
> look through the code / install a VM and test it.
> 
> [...]
> 
> Thanks,
> Daniel
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 22:30         ` Daniel Xu
  2020-01-23 22:41           ` Andrii Nakryiko
@ 2020-01-23 22:44           ` Daniel Borkmann
  2020-01-23 23:07             ` Martin Lau
  2020-01-23 23:27             ` Daniel Xu
  1 sibling, 2 replies; 13+ messages in thread
From: Daniel Borkmann @ 2020-01-23 22:44 UTC (permalink / raw)
  To: Daniel Xu, John Fastabend, bpf, ast, songliubraving, yhs, andriin
  Cc: linux-kernel, kernel-team, peterz, mingo, acme

On 1/23/20 11:30 PM, Daniel Xu wrote:
> On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
> [...]
>>
>> Yes, so we've been following this practice for all the BPF helpers, no
>> matter the program type. Though for tracing it may be up for debate
>> whether it still makes sense, given there's nothing to be leaked here
>> since you can read this data anyway via probe read if you wanted to. So
>> we might as well get rid of the clearing for all tracing helpers.
> 
> Right, that makes sense. Do you want me to leave it in for this patchset
> and then remove all of them in a followup patchset?

Lets leave it in and in a different set, we can clean this up for all tracing
related helpers at once.

>> Different question related to your set. It looks like br_stack is only
>> available on x86, is that correct? For other archs this will always bail
>> out on the !br_stack test. Perhaps we should document this fact so users
>> are not surprised that their prog using this helper is not working on
>> !x86. Wdyt?
> 
> I think perf_event_open() should fail on !x86 if a user tries to configure
> it with branch stack collection. So there would not be the opportunity for
> the bpf prog to be attached and run. I haven't tested this, though. I'll
> look through the code / install a VM and test it.

As far as I can see the prog would still be attachable and runnable, just that
the helper will always return -EINVAL on these archs. Maybe the error code
should be changed to -ENOENT to avoid confusion wrt whether the user provided
some invalid input args. Should this actually bail out with -EINVAL if size is
not a multiple of sizeof(struct perf_branch_entry)? Otherwise we'd end up
copying half-broken branch entry information.
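
A minimal sketch of the checks being suggested here (assuming the helper
keeps its current goto-clear structure):

	if (unlikely(size % sizeof(struct perf_branch_entry))) {
		err = -EINVAL;	/* reject partial entries up front */
		goto clear;
	}
	if (unlikely(!br_stack)) {
		err = -ENOENT;	/* no branch records on this arch/config */
		goto clear;
	}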

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 22:44           ` Daniel Borkmann
@ 2020-01-23 23:07             ` Martin Lau
  2020-01-23 23:27             ` Daniel Xu
  1 sibling, 0 replies; 13+ messages in thread
From: Martin Lau @ 2020-01-23 23:07 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Daniel Xu, John Fastabend, bpf, ast, Song Liu, Yonghong Song,
	Andrii Nakryiko, linux-kernel, Kernel Team, peterz, mingo, acme

On Thu, Jan 23, 2020 at 11:44:53PM +0100, Daniel Borkmann wrote:
> On 1/23/20 11:30 PM, Daniel Xu wrote:
> > On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
> > [...]
> > > 
> > > Yes, so we've been following this practice for all the BPF helpers,
> > > no matter the program type. Though for tracing it may be up for
> > > debate whether it still makes sense, given there's nothing to be
> > > leaked here since you can read this data anyway via probe read if
> > > you wanted to. So we might as well get rid of the clearing for all
> > > tracing helpers.
> > 
> > Right, that makes sense. Do you want me to leave it in for this patchset
> > and then remove all of them in a followup patchset?
> 
> Lets leave it in and in a different set, we can clean this up for all tracing
> related helpers at once.
> 
> > > Different question related to your set. It looks like br_stack is
> > > only available on x86, is that correct? For other archs this will
> > > always bail out on the !br_stack test. Perhaps we should document
> > > this fact so users are not surprised that their prog using this
> > > helper is not working on !x86. Wdyt?
> > 
> > I think perf_event_open() should fail on !x86 if a user tries to configure
> > it with branch stack collection. So there would not be the opportunity for
> > the bpf prog to be attached and run. I haven't tested this, though. I'll
> > look through the code / install a VM and test it.
> 
> As far as I can see the prog would still be attachable and runnable, just
> that the helper will always return -EINVAL on these archs. Maybe the error
> code should be changed to -ENOENT to avoid confusion wrt whether the user
> provided some invalid
+1 on -ENOENT.

> input args. Should this actually bail out with -EINVAL if size is not a
> multiple of sizeof(struct perf_branch_entry)? Otherwise we'd end up
> copying half-broken branch entry information.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 22:41           ` Andrii Nakryiko
@ 2020-01-23 23:09             ` Daniel Borkmann
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Borkmann @ 2020-01-23 23:09 UTC (permalink / raw)
  To: Andrii Nakryiko, Daniel Xu, John Fastabend, bpf, ast, Song Liu,
	Yonghong Song
  Cc: linux-kernel, Kernel Team, peterz, mingo, acme

On 1/23/20 11:41 PM, Andrii Nakryiko wrote:
> On 1/23/20 2:30 PM, Daniel Xu wrote:
>> On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
>> [...]
>>>
>>> Yes, so we've been following this practice for all the BPF helpers, no
>>> matter the program type. Though for tracing it may be up for debate
>>> whether it still makes sense, given there's nothing to be leaked here
>>> since you can read this data anyway via probe read if you wanted to. So
>>> we might as well get rid of the clearing for all tracing helpers.
>>
>> Right, that makes sense. Do you want me to leave it in for this patchset
>> and then remove all of them in a followup patchset?
> 
> I don't think we can remove that for existing tracing helpers (e.g.,
> bpf_probe_read). There are applications that explicitly expect
> destination memory to be zeroed out on failure. It's a BPF world's
> memset(0).

Due to avoiding error checks that way, when the expected outcome of the buf
is non-zero anyway? Agreed, those would break, so yeah, they cannot be
removed then.

> I also wonder if BPF verifier has any extra assumptions for
> ARG_PTR_TO_UNINIT_MEM w.r.t. it being initialized after helper call
> (e.g., for liveness tracking).

There are no extra assumptions other than memory being written after the
helper call (whether success or failure of the helper itself doesn't matter,
so there are no assumptions about the content); the data that has been
written to the buffer is marked as initialized but unknown (e.g. in
check_stack_write() the case where reg remains NULL since value_regno is
negative).
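
E.g., a minimal sketch of what that means for a program (hypothetical
buffer size):

	__u64 buf[8];	/* no initialization required before the call */

	/* afterwards the verifier treats all of buf as written, with
	 * unknown contents, regardless of the helper's return value
	 */
	bpf_perf_prog_read_branches(ctx, buf, sizeof(buf));
	return buf[0] ? 1 : 0;	/* reading back is now allowed */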

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper
  2020-01-23 22:44           ` Daniel Borkmann
  2020-01-23 23:07             ` Martin Lau
@ 2020-01-23 23:27             ` Daniel Xu
  1 sibling, 0 replies; 13+ messages in thread
From: Daniel Xu @ 2020-01-23 23:27 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend, bpf, ast, songliubraving, yhs, andriin
  Cc: linux-kernel, kernel-team, peterz, mingo, acme

On Thu Jan 23, 2020 at 11:44 PM, Daniel Borkmann wrote:
[...]
> >> Different question related to your set. It looks like br_stack is
> >> only available on x86, is that correct? For other archs this will
> >> always bail out on the !br_stack test. Perhaps we should document
> >> this fact so users are not surprised that their prog using this
> >> helper is not working on !x86. Wdyt?
> > 
> > I think perf_event_open() should fail on !x86 if a user tries to configure
> > it with branch stack collection. So there would not be the opportunity for
> > the bpf prog to be attached and run. I haven't tested this, though. I'll
> > look through the code / install a VM and test it.
>
> 
> As far as I can see the prog would still be attachable and runnable, just
> that the helper will always return -EINVAL on these archs. Maybe the error
> code should be changed to -ENOENT to avoid confusion wrt whether the user
> provided some invalid input args.

Ok, will add.

> Should this actually bail out with -EINVAL if size is not a multiple of
> sizeof(struct perf_branch_entry)? Otherwise we'd end up copying
> half-broken branch entry information.

Sure, makes sense.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-01-23 23:27 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-22 20:22 [PATCH v2 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper Daniel Xu
2020-01-22 20:22 ` [PATCH v2 bpf-next 1/3] bpf: " Daniel Xu
2020-01-23  5:39   ` John Fastabend
2020-01-23 20:09     ` Daniel Xu
2020-01-23 22:23       ` Daniel Borkmann
2020-01-23 22:30         ` Daniel Xu
2020-01-23 22:41           ` Andrii Nakryiko
2020-01-23 23:09             ` Daniel Borkmann
2020-01-23 22:44           ` Daniel Borkmann
2020-01-23 23:07             ` Martin Lau
2020-01-23 23:27             ` Daniel Xu
2020-01-22 20:22 ` [PATCH v2 bpf-next 2/3] tools/bpf: Sync uapi header bpf.h Daniel Xu
2020-01-22 20:22 ` [PATCH v2 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest Daniel Xu
