* [PATCH bpf-next 0/3] bpf: Add bpf_perf_event_read_sample() helper (v1)
@ 2022-11-01  5:23 Namhyung Kim
  2022-11-01  5:23 ` [PATCH bpf-next 1/3] perf/core: Prepare sample data before calling BPF Namhyung Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Namhyung Kim @ 2022-11-01  5:23 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Peter Zijlstra
  Cc: Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Jiri Olsa,
	Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo

Hello,

I'd like to add a bpf_perf_event_read_sample() helper to get the sample data
of a perf_event from BPF programs.  This enables more sophisticated filtering
of perf samples.  Initially I'm thinking of code- and data-address-based
filtering.

The original discussion can be seen here:
  https://lore.kernel.org/r/20220823210354.1407473-1-namhyung@kernel.org

The bpf_perf_event_read_sample() helper takes a buffer and size to save the
data, as well as a flag to specify the perf sample type.  The flag argument
should be a single value from enum perf_event_sample_format, like
PERF_SAMPLE_IP.  If the buffer is NULL, it returns the size of the data
instead.  This is to support variable-length data in the future.

The first patch adds bpf_prepare_sample() to set up the necessary perf sample
data before calling the bpf overflow handler for the perf event.  The existing
callchain logic is moved into this function, and it also sets the IP and ADDR
data if they are not set already.

The second patch adds the bpf_perf_event_read_sample() helper itself,
supporting IP and ADDR data.  The last patch adds test code for this.

The code is available at 'bpf/perf-sample-v1' branch in

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung

Namhyung Kim (3):
  perf/core: Prepare sample data before calling BPF
  bpf: Add bpf_perf_event_read_sample() helper
  bpf: Add perf_event_read_sample test cases

 include/uapi/linux/bpf.h                      |  23 +++
 kernel/events/core.c                          |  40 +++-
 kernel/trace/bpf_trace.c                      |  49 +++++
 tools/include/uapi/linux/bpf.h                |  23 +++
 .../selftests/bpf/prog_tests/perf_sample.c    | 172 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_sample.c    |  28 +++
 6 files changed, 326 insertions(+), 9 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_sample.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_sample.c


base-commit: e39e739ab57399f46167d453bbdb8ef8d57c6488
-- 
2.38.1.273.g43a17bfeac-goog



* [PATCH bpf-next 1/3] perf/core: Prepare sample data before calling BPF
  2022-11-01  5:23 [PATCH bpf-next 0/3] bpf: Add bpf_perf_event_read_sample() helper (v1) Namhyung Kim
@ 2022-11-01  5:23 ` Namhyung Kim
  2022-11-01 10:03   ` Jiri Olsa
  2022-11-01  5:23 ` [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper Namhyung Kim
  2022-11-01  5:23 ` [PATCH bpf-next 3/3] bpf: Add perf_event_read_sample test cases Namhyung Kim
  2 siblings, 1 reply; 19+ messages in thread
From: Namhyung Kim @ 2022-11-01  5:23 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Peter Zijlstra
  Cc: Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Jiri Olsa,
	Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo

To allow the bpf overflow handler to access the perf sample data, the kernel
needs to prepare missing but requested data before calling the handler.

I'm taking a conservative approach and allow only a short list of sample
formats instead of allowing them all.  For now, IP and ADDR data are allowed,
and I think that's good enough to build and verify general BPF-based sample
filters for perf events.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/events/core.c | 40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index aefc1e08e015..519f30c33a24 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7329,8 +7329,10 @@ void perf_prepare_sample(struct perf_event_header *header,
 	filtered_sample_type = sample_type & ~data->sample_flags;
 	__perf_event_header__init_id(header, data, event, filtered_sample_type);
 
-	if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
-		data->ip = perf_instruction_pointer(regs);
+	if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) {
+		if (filtered_sample_type & PERF_SAMPLE_IP)
+			data->ip = perf_instruction_pointer(regs);
+	}
 
 	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
 		int size = 1;
@@ -10006,6 +10008,32 @@ static void perf_event_free_filter(struct perf_event *event)
 }
 
 #ifdef CONFIG_BPF_SYSCALL
+static void bpf_prepare_sample(struct bpf_prog *prog,
+			       struct perf_event *event,
+			       struct perf_sample_data *data,
+			       struct pt_regs *regs)
+{
+	u64 filtered_sample_type;
+
+	filtered_sample_type = event->attr.sample_type & ~data->sample_flags;
+
+	if (prog->call_get_stack &&
+	    (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)) {
+		data->callchain = perf_callchain(event, regs);
+		data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
+	}
+
+	if (filtered_sample_type & PERF_SAMPLE_IP) {
+		data->ip = perf_instruction_pointer(regs);
+		data->sample_flags |= PERF_SAMPLE_IP;
+	}
+
+	if (filtered_sample_type & PERF_SAMPLE_ADDR) {
+		data->addr = 0;
+		data->sample_flags |= PERF_SAMPLE_ADDR;
+	}
+}
+
 static void bpf_overflow_handler(struct perf_event *event,
 				 struct perf_sample_data *data,
 				 struct pt_regs *regs)
@@ -10023,13 +10051,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 	rcu_read_lock();
 	prog = READ_ONCE(event->prog);
 	if (prog) {
-		if (prog->call_get_stack &&
-		    (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
-		    !(data->sample_flags & PERF_SAMPLE_CALLCHAIN)) {
-			data->callchain = perf_callchain(event, regs);
-			data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
-		}
-
+		bpf_prepare_sample(prog, event, data, regs);
 		ret = bpf_prog_run(prog, &ctx);
 	}
 	rcu_read_unlock();
-- 
2.38.1.273.g43a17bfeac-goog



* [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-01  5:23 [PATCH bpf-next 0/3] bpf: Add bpf_perf_event_read_sample() helper (v1) Namhyung Kim
  2022-11-01  5:23 ` [PATCH bpf-next 1/3] perf/core: Prepare sample data before calling BPF Namhyung Kim
@ 2022-11-01  5:23 ` Namhyung Kim
  2022-11-01 10:02   ` Jiri Olsa
  2022-11-01  5:23 ` [PATCH bpf-next 3/3] bpf: Add perf_event_read_sample test cases Namhyung Kim
  2 siblings, 1 reply; 19+ messages in thread
From: Namhyung Kim @ 2022-11-01  5:23 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Peter Zijlstra
  Cc: Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Jiri Olsa,
	Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo

The bpf_perf_event_read_sample() helper gets the specified sample data
(selected by a PERF_SAMPLE_* flag in the argument) from BPF to make
filtering decisions on samples.  Currently only the PERF_SAMPLE_IP and
PERF_SAMPLE_ADDR flags are supported.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/uapi/linux/bpf.h       | 23 ++++++++++++++++
 kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
 3 files changed, 95 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 94659f6b3395..cba501de9373 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5481,6 +5481,28 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
+ *	Description
+ *		For an eBPF program attached to a perf event, retrieve the
+ *		sample data associated with *ctx* and store it in the buffer
+ *		pointed to by *buf*, up to *size* bytes.
+ *
+ *		The *sample_flags* should contain a single value in the
+ *		**enum perf_event_sample_format**.
+ *	Return
+ *		On success, number of bytes written to *buf*. On error, a
+ *		negative value.
+ *
+ *		The *buf* can be set to **NULL** to return the number of bytes
+ *		required to store the requested sample data.
+ *
+ *		**-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
+ *
+ *		**-ENOENT** if the associated perf event doesn't have the data.
+ *
+ *		**-ENOSYS** if the system doesn't support retrieving the
+ *		requested sample data.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
@@ -5695,6 +5717,7 @@ union bpf_attr {
 	FN(user_ringbuf_drain, 209, ##ctx)		\
 	FN(cgrp_storage_get, 210, ##ctx)		\
 	FN(cgrp_storage_delete, 211, ##ctx)		\
+	FN(perf_event_read_sample, 212, ##ctx)		\
 	/* */
 
 /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ce0228c72a93..befd937afa3c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -28,6 +28,7 @@
 
 #include <uapi/linux/bpf.h>
 #include <uapi/linux/btf.h>
+#include <uapi/linux/perf_event.h>
 
 #include <asm/tlb.h>
 
@@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
 	.arg4_type      = ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
+	   void *, buf, u32, size, u64, flags)
+{
+	struct perf_sample_data *sd = ctx->data;
+	void *data;
+	u32 to_copy = sizeof(u64);
+
+	/* only allow a single sample flag */
+	if (!is_power_of_2(flags))
+		return -EINVAL;
+
+	/* support reading only already populated info */
+	if (flags & ~sd->sample_flags)
+		return -ENOENT;
+
+	switch (flags) {
+	case PERF_SAMPLE_IP:
+		data = &sd->ip;
+		break;
+	case PERF_SAMPLE_ADDR:
+		data = &sd->addr;
+		break;
+	default:
+		return -ENOSYS;
+	}
+
+	if (!buf)
+		return to_copy;
+
+	if (size < to_copy)
+		to_copy = size;
+
+	memcpy(buf, data, to_copy);
+	return to_copy;
+}
+
+static const struct bpf_func_proto bpf_perf_event_read_sample_proto = {
+	.func           = bpf_perf_event_read_sample,
+	.gpl_only       = true,
+	.ret_type       = RET_INTEGER,
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_PTR_TO_MEM_OR_NULL,
+	.arg3_type      = ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type      = ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1759,6 +1806,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_read_branch_records_proto;
 	case BPF_FUNC_get_attach_cookie:
 		return &bpf_get_attach_cookie_proto_pe;
+	case BPF_FUNC_perf_event_read_sample:
+		return &bpf_perf_event_read_sample_proto;
 	default:
 		return bpf_tracing_func_proto(func_id, prog);
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 94659f6b3395..cba501de9373 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5481,6 +5481,28 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
+ *	Description
+ *		For an eBPF program attached to a perf event, retrieve the
+ *		sample data associated with *ctx* and store it in the buffer
+ *		pointed to by *buf*, up to *size* bytes.
+ *
+ *		The *sample_flags* should contain a single value in the
+ *		**enum perf_event_sample_format**.
+ *	Return
+ *		On success, number of bytes written to *buf*. On error, a
+ *		negative value.
+ *
+ *		The *buf* can be set to **NULL** to return the number of bytes
+ *		required to store the requested sample data.
+ *
+ *		**-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
+ *
+ *		**-ENOENT** if the associated perf event doesn't have the data.
+ *
+ *		**-ENOSYS** if the system doesn't support retrieving the
+ *		requested sample data.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
@@ -5695,6 +5717,7 @@ union bpf_attr {
 	FN(user_ringbuf_drain, 209, ##ctx)		\
 	FN(cgrp_storage_get, 210, ##ctx)		\
 	FN(cgrp_storage_delete, 211, ##ctx)		\
+	FN(perf_event_read_sample, 212, ##ctx)		\
 	/* */
 
 /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
-- 
2.38.1.273.g43a17bfeac-goog



* [PATCH bpf-next 3/3] bpf: Add perf_event_read_sample test cases
  2022-11-01  5:23 [PATCH bpf-next 0/3] bpf: Add bpf_perf_event_read_sample() helper (v1) Namhyung Kim
  2022-11-01  5:23 ` [PATCH bpf-next 1/3] perf/core: Prepare sample data before calling BPF Namhyung Kim
  2022-11-01  5:23 ` [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper Namhyung Kim
@ 2022-11-01  5:23 ` Namhyung Kim
  2 siblings, 0 replies; 19+ messages in thread
From: Namhyung Kim @ 2022-11-01  5:23 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Peter Zijlstra
  Cc: Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Jiri Olsa,
	Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo

It checks the bpf_perf_event_read_sample() helper with and without a buffer
for the supported PERF_SAMPLE_* flags.  The BPF program can control whether
the sample is recorded using its return value, after checking the sample
data and size.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 .../selftests/bpf/prog_tests/perf_sample.c    | 172 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_sample.c    |  28 +++
 2 files changed, 200 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_sample.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_sample.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_sample.c b/tools/testing/selftests/bpf/prog_tests/perf_sample.c
new file mode 100644
index 000000000000..eee11f23196c
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_sample.c
@@ -0,0 +1,172 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+
+#include <test_progs.h>
+#include "test_perf_sample.skel.h"
+
+#ifndef noinline
+#define noinline __attribute__((noinline))
+#endif
+
+/* treat user-stack data as invalid (for testing only) */
+#define PERF_SAMPLE_INVALID  PERF_SAMPLE_STACK_USER
+
+#define PERF_MMAP_SIZE  8192
+#define DATA_MMAP_SIZE  4096
+
+static int perf_fd = -1;
+static void *perf_ringbuf;
+static struct test_perf_sample *skel;
+
+static int open_perf_event(u64 sample_flags)
+{
+	struct perf_event_attr attr = {
+		.type = PERF_TYPE_SOFTWARE,
+		.config = PERF_COUNT_SW_PAGE_FAULTS,
+		.sample_type = sample_flags,
+		.sample_period = 1,
+		.disabled = 1,
+		.size = sizeof(attr),
+	};
+	int fd;
+	void *ptr;
+
+	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
+	if (!ASSERT_GT(fd, 0, "perf_event_open"))
+		return -1;
+
+	ptr = mmap(NULL, PERF_MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+	if (!ASSERT_NEQ(ptr, MAP_FAILED, "mmap")) {
+		close(fd);
+		return -1;
+	}
+
+	perf_fd = fd;
+	perf_ringbuf = ptr;
+
+	return 0;
+}
+
+static void close_perf_event(void)
+{
+	if (perf_fd == -1)
+		return;
+
+	munmap(perf_ringbuf, PERF_MMAP_SIZE);
+	close(perf_fd);
+
+	perf_fd = -1;
+	perf_ringbuf = NULL;
+}
+
+static noinline void trigger_perf_event(void)
+{
+	int *buf = mmap(NULL, DATA_MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
+
+	if (!ASSERT_NEQ(buf, MAP_FAILED, "mmap"))
+		return;
+
+	ioctl(perf_fd, PERF_EVENT_IOC_ENABLE);
+
+	/* it should generate a page fault which triggers the perf_event */
+	*buf = 1;
+
+	ioctl(perf_fd, PERF_EVENT_IOC_DISABLE);
+
+	munmap(buf, DATA_MMAP_SIZE);
+}
+
+/* check if the perf ringbuf has a sample data */
+static int check_perf_event(void)
+{
+	struct perf_event_mmap_page *page = perf_ringbuf;
+	struct perf_event_header *hdr;
+
+	if (page->data_head == page->data_tail)
+		return 0;
+
+	hdr = perf_ringbuf + page->data_offset;
+
+	if (hdr->type != PERF_RECORD_SAMPLE)
+		return 0;
+
+	return 1;
+}
+
+static void setup_perf_sample_bpf_skel(u64 sample_flags)
+{
+	struct bpf_link *link;
+
+	skel = test_perf_sample__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_perf_sample_open_and_load"))
+		return;
+
+	skel->bss->sample_flag = sample_flags;
+	skel->bss->sample_size = sizeof(sample_flags);
+
+	link = bpf_program__attach_perf_event(skel->progs.perf_sample_filter, perf_fd);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_perf_event"))
+		return;
+}
+
+static void clean_perf_sample_bpf_skel(void)
+{
+	test_perf_sample__detach(skel);
+	test_perf_sample__destroy(skel);
+}
+
+static void test_perf_event_read_sample_invalid(void)
+{
+	u64 flags = PERF_SAMPLE_INVALID;
+
+	if (open_perf_event(flags) < 0)
+		return;
+	setup_perf_sample_bpf_skel(flags);
+	trigger_perf_event();
+	ASSERT_EQ(check_perf_event(), 0, "number of sample");
+	clean_perf_sample_bpf_skel();
+	close_perf_event();
+}
+
+static void test_perf_event_read_sample_ip(void)
+{
+	u64 flags = PERF_SAMPLE_IP;
+
+	if (open_perf_event(flags) < 0)
+		return;
+	setup_perf_sample_bpf_skel(flags);
+	trigger_perf_event();
+	ASSERT_EQ(check_perf_event(), 1, "number of sample");
+	clean_perf_sample_bpf_skel();
+	close_perf_event();
+}
+
+static void test_perf_event_read_sample_addr(void)
+{
+	u64 flags = PERF_SAMPLE_ADDR;
+
+	if (open_perf_event(flags) < 0)
+		return;
+	setup_perf_sample_bpf_skel(flags);
+	trigger_perf_event();
+	ASSERT_EQ(check_perf_event(), 1, "number of sample");
+	clean_perf_sample_bpf_skel();
+	close_perf_event();
+}
+
+void test_perf_event_read_sample(void)
+{
+	if (test__start_subtest("perf_event_read_sample_invalid"))
+		test_perf_event_read_sample_invalid();
+	if (test__start_subtest("perf_event_read_sample_ip"))
+		test_perf_event_read_sample_ip();
+	if (test__start_subtest("perf_event_read_sample_addr"))
+		test_perf_event_read_sample_addr();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_sample.c b/tools/testing/selftests/bpf/progs/test_perf_sample.c
new file mode 100644
index 000000000000..79664acafcd9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_sample.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2022 Google
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+unsigned long long sample_flag;
+unsigned long long sample_size;
+
+SEC("perf_event")
+int perf_sample_filter(void *ctx)
+{
+	long size;
+	unsigned long long buf[1] = {};
+
+	size = bpf_perf_event_read_sample(ctx, NULL, 0, sample_flag);
+	if (size != sample_size)
+		return 0;
+
+	if (bpf_perf_event_read_sample(ctx, buf, sizeof(buf), sample_flag) < 0)
+		return 0;
+
+	/* generate sample data */
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.38.1.273.g43a17bfeac-goog



* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-01  5:23 ` [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper Namhyung Kim
@ 2022-11-01 10:02   ` Jiri Olsa
  2022-11-01 18:26     ` Alexei Starovoitov
  2022-11-03 19:45     ` Yonghong Song
  0 siblings, 2 replies; 19+ messages in thread
From: Jiri Olsa @ 2022-11-01 10:02 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Peter Zijlstra, Martin KaFai Lau, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
> The bpf_perf_event_read_sample() helper gets the specified sample data
> (selected by a PERF_SAMPLE_* flag in the argument) from BPF to make
> filtering decisions on samples.  Currently only the PERF_SAMPLE_IP and
> PERF_SAMPLE_ADDR flags are supported.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  include/uapi/linux/bpf.h       | 23 ++++++++++++++++
>  kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
>  3 files changed, 95 insertions(+)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 94659f6b3395..cba501de9373 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5481,6 +5481,28 @@ union bpf_attr {
>   *		0 on success.
>   *
>   *		**-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
> + *	Description
> + *		For an eBPF program attached to a perf event, retrieve the
> + *		sample data associated with *ctx* and store it in the buffer
> + *		pointed to by *buf*, up to *size* bytes.
> + *
> + *		The *sample_flags* should contain a single value in the
> + *		**enum perf_event_sample_format**.
> + *	Return
> + *		On success, number of bytes written to *buf*. On error, a
> + *		negative value.
> + *
> + *		The *buf* can be set to **NULL** to return the number of bytes
> + *		required to store the requested sample data.
> + *
> + *		**-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
> + *
> + *		**-ENOENT** if the associated perf event doesn't have the data.
> + *
> + *		**-ENOSYS** if system doesn't support the sample data to be
> + *		retrieved.
>   */
>  #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
>  	FN(unspec, 0, ##ctx)				\
> @@ -5695,6 +5717,7 @@ union bpf_attr {
>  	FN(user_ringbuf_drain, 209, ##ctx)		\
>  	FN(cgrp_storage_get, 210, ##ctx)		\
>  	FN(cgrp_storage_delete, 211, ##ctx)		\
> +	FN(perf_event_read_sample, 212, ##ctx)		\
>  	/* */
>  
>  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ce0228c72a93..befd937afa3c 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -28,6 +28,7 @@
>  
>  #include <uapi/linux/bpf.h>
>  #include <uapi/linux/btf.h>
> +#include <uapi/linux/perf_event.h>
>  
>  #include <asm/tlb.h>
>  
> @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
>  	.arg4_type      = ARG_ANYTHING,
>  };
>  
> +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
> +	   void *, buf, u32, size, u64, flags)
> +{

I wonder if we could add a perf_btf program type (like we have tp_btf) that
could access ctx->data directly, without helpers

> +	struct perf_sample_data *sd = ctx->data;
> +	void *data;
> +	u32 to_copy = sizeof(u64);
> +
> +	/* only allow a single sample flag */
> +	if (!is_power_of_2(flags))
> +		return -EINVAL;
> +
> +	/* support reading only already populated info */
> +	if (flags & ~sd->sample_flags)
> +		return -ENOENT;
> +
> +	switch (flags) {
> +	case PERF_SAMPLE_IP:
> +		data = &sd->ip;
> +		break;
> +	case PERF_SAMPLE_ADDR:
> +		data = &sd->addr;
> +		break;

AFAICS from pe_prog_convert_ctx_access you should be able to read addr
directly from the context, right?  Same as sample_period.  So if this is
going to be the generic way to read sample data, should we add sample_period
as well?


> +	default:
> +		return -ENOSYS;
> +	}
> +
> +	if (!buf)
> +		return to_copy;
> +
> +	if (size < to_copy)
> +		to_copy = size;

should we fail here instead? is there any point in returning
incomplete data?

jirka


> +
> +	memcpy(buf, data, to_copy);
> +	return to_copy;
> +}
> +
> +static const struct bpf_func_proto bpf_perf_event_read_sample_proto = {
> +	.func           = bpf_perf_event_read_sample,
> +	.gpl_only       = true,
> +	.ret_type       = RET_INTEGER,
> +	.arg1_type      = ARG_PTR_TO_CTX,
> +	.arg2_type      = ARG_PTR_TO_MEM_OR_NULL,
> +	.arg3_type      = ARG_CONST_SIZE_OR_ZERO,
> +	.arg4_type      = ARG_ANYTHING,
> +};
> +
>  static const struct bpf_func_proto *
>  pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -1759,6 +1806,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_read_branch_records_proto;
>  	case BPF_FUNC_get_attach_cookie:
>  		return &bpf_get_attach_cookie_proto_pe;
> +	case BPF_FUNC_perf_event_read_sample:
> +		return &bpf_perf_event_read_sample_proto;
>  	default:
>  		return bpf_tracing_func_proto(func_id, prog);
>  	}
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 94659f6b3395..cba501de9373 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5481,6 +5481,28 @@ union bpf_attr {
>   *		0 on success.
>   *
>   *		**-ENOENT** if the bpf_local_storage cannot be found.
> + *
> + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
> + *	Description
> + *		For an eBPF program attached to a perf event, retrieve the
> + *		sample data associated to *ctx*	and store it in the buffer
> + *		pointed by *buf* up to size *size* bytes.
> + *
> + *		The *sample_flags* should contain a single value in the
> + *		**enum perf_event_sample_format**.
> + *	Return
> + *		On success, number of bytes written to *buf*. On error, a
> + *		negative value.
> + *
> + *		The *buf* can be set to **NULL** to return the number of bytes
> + *		required to store the requested sample data.
> + *
> + *		**-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
> + *
> + *		**-ENOENT** if the associated perf event doesn't have the data.
> + *
> + *		**-ENOSYS** if system doesn't support the sample data to be
> + *		retrieved.
>   */
>  #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
>  	FN(unspec, 0, ##ctx)				\
> @@ -5695,6 +5717,7 @@ union bpf_attr {
>  	FN(user_ringbuf_drain, 209, ##ctx)		\
>  	FN(cgrp_storage_get, 210, ##ctx)		\
>  	FN(cgrp_storage_delete, 211, ##ctx)		\
> +	FN(perf_event_read_sample, 212, ##ctx)		\
>  	/* */
>  
>  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> -- 
> 2.38.1.273.g43a17bfeac-goog
> 


* Re: [PATCH bpf-next 1/3] perf/core: Prepare sample data before calling BPF
  2022-11-01  5:23 ` [PATCH bpf-next 1/3] perf/core: Prepare sample data before calling BPF Namhyung Kim
@ 2022-11-01 10:03   ` Jiri Olsa
  2022-11-04  6:03     ` Namhyung Kim
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Olsa @ 2022-11-01 10:03 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Peter Zijlstra, Martin KaFai Lau, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Mon, Oct 31, 2022 at 10:23:38PM -0700, Namhyung Kim wrote:
> To allow the bpf overflow handler to access the perf sample data, the kernel
> needs to prepare missing but requested data before calling the handler.
> 
> I'm taking a conservative approach and allow only a short list of sample
> formats instead of allowing them all.  For now, IP and ADDR data are allowed,
> and I think that's good enough to build and verify general BPF-based sample
> filters for perf events.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/events/core.c | 40 +++++++++++++++++++++++++++++++---------
>  1 file changed, 31 insertions(+), 9 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index aefc1e08e015..519f30c33a24 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7329,8 +7329,10 @@ void perf_prepare_sample(struct perf_event_header *header,
>  	filtered_sample_type = sample_type & ~data->sample_flags;
>  	__perf_event_header__init_id(header, data, event, filtered_sample_type);
>  
> -	if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
> -		data->ip = perf_instruction_pointer(regs);
> +	if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) {
> +		if (filtered_sample_type & PERF_SAMPLE_IP)
> +			data->ip = perf_instruction_pointer(regs);
> +	}
>  
>  	if (sample_type & PERF_SAMPLE_CALLCHAIN) {
>  		int size = 1;
> @@ -10006,6 +10008,32 @@ static void perf_event_free_filter(struct perf_event *event)
>  }
>  
>  #ifdef CONFIG_BPF_SYSCALL
> +static void bpf_prepare_sample(struct bpf_prog *prog,
> +			       struct perf_event *event,
> +			       struct perf_sample_data *data,
> +			       struct pt_regs *regs)
> +{
> +	u64 filtered_sample_type;
> +
> +	filtered_sample_type = event->attr.sample_type & ~data->sample_flags;

could we add the same comment here as in perf_prepare_sample:

        /*
         * Clear the sample flags that have already been done by the
         * PMU driver.
         */

it took me a while to recall why we set addr to 0 in here ;-)

thanks,
jirka

> +
> +	if (prog->call_get_stack &&
> +	    (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)) {
> +		data->callchain = perf_callchain(event, regs);
> +		data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
> +	}
> +
> +	if (filtered_sample_type & PERF_SAMPLE_IP) {
> +		data->ip = perf_instruction_pointer(regs);
> +		data->sample_flags |= PERF_SAMPLE_IP;
> +	}
> +
> +	if (filtered_sample_type & PERF_SAMPLE_ADDR) {
> +		data->addr = 0;
> +		data->sample_flags |= PERF_SAMPLE_ADDR;
> +	}
> +}
> +
>  static void bpf_overflow_handler(struct perf_event *event,
>  				 struct perf_sample_data *data,
>  				 struct pt_regs *regs)
> @@ -10023,13 +10051,7 @@ static void bpf_overflow_handler(struct perf_event *event,
>  	rcu_read_lock();
>  	prog = READ_ONCE(event->prog);
>  	if (prog) {
> -		if (prog->call_get_stack &&
> -		    (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
> -		    !(data->sample_flags & PERF_SAMPLE_CALLCHAIN)) {
> -			data->callchain = perf_callchain(event, regs);
> -			data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
> -		}
> -
> +		bpf_prepare_sample(prog, event, data, regs);
>  		ret = bpf_prog_run(prog, &ctx);
>  	}
>  	rcu_read_unlock();
> -- 
> 2.38.1.273.g43a17bfeac-goog
> 


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-01 10:02   ` Jiri Olsa
@ 2022-11-01 18:26     ` Alexei Starovoitov
  2022-11-01 18:46       ` Song Liu
  2022-11-03 19:45     ` Yonghong Song
  1 sibling, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2022-11-01 18:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Peter Zijlstra, Martin KaFai Lau,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo,
	Stanislav Fomichev, LKML, bpf, Steven Rostedt, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Tue, Nov 1, 2022 at 3:03 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
> > The bpf_perf_event_read_sample() helper gets the specified sample data
> > (selected by a PERF_SAMPLE_* flag in the argument) from BPF to make
> > filtering decisions on samples.  Currently only the PERF_SAMPLE_IP and
> > PERF_SAMPLE_ADDR flags are supported.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  include/uapi/linux/bpf.h       | 23 ++++++++++++++++
> >  kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
> >  3 files changed, 95 insertions(+)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 94659f6b3395..cba501de9373 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -5481,6 +5481,28 @@ union bpf_attr {
> >   *           0 on success.
> >   *
> >   *           **-ENOENT** if the bpf_local_storage cannot be found.
> > + *
> > + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
> > + *   Description
> > + *           For an eBPF program attached to a perf event, retrieve the
> > + *           sample data associated to *ctx* and store it in the buffer
> > + *           pointed by *buf* up to size *size* bytes.
> > + *
> > + *           The *sample_flags* should contain a single value in the
> > + *           **enum perf_event_sample_format**.
> > + *   Return
> > + *           On success, number of bytes written to *buf*. On error, a
> > + *           negative value.
> > + *
> > + *           The *buf* can be set to **NULL** to return the number of bytes
> > + *           required to store the requested sample data.
> > + *
> > + *           **-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
> > + *
> > + *           **-ENOENT** if the associated perf event doesn't have the data.
> > + *
> > + *           **-ENOSYS** if system doesn't support the sample data to be
> > + *           retrieved.
> >   */
> >  #define ___BPF_FUNC_MAPPER(FN, ctx...)                       \
> >       FN(unspec, 0, ##ctx)                            \
> > @@ -5695,6 +5717,7 @@ union bpf_attr {
> >       FN(user_ringbuf_drain, 209, ##ctx)              \
> >       FN(cgrp_storage_get, 210, ##ctx)                \
> >       FN(cgrp_storage_delete, 211, ##ctx)             \
> > +     FN(perf_event_read_sample, 212, ##ctx)          \
> >       /* */
> >
> >  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ce0228c72a93..befd937afa3c 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -28,6 +28,7 @@
> >
> >  #include <uapi/linux/bpf.h>
> >  #include <uapi/linux/btf.h>
> > +#include <uapi/linux/perf_event.h>
> >
> >  #include <asm/tlb.h>
> >
> > @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
> >       .arg4_type      = ARG_ANYTHING,
> >  };
> >
> > +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
> > +        void *, buf, u32, size, u64, flags)
> > +{
>
> I wonder we could add perf_btf (like we have tp_btf) program type that
> could access ctx->data directly without helpers
>
> > +     struct perf_sample_data *sd = ctx->data;
> > +     void *data;
> > +     u32 to_copy = sizeof(u64);
> > +
> > +     /* only allow a single sample flag */
> > +     if (!is_power_of_2(flags))
> > +             return -EINVAL;
> > +
> > +     /* support reading only already populated info */
> > +     if (flags & ~sd->sample_flags)
> > +             return -ENOENT;
> > +
> > +     switch (flags) {
> > +     case PERF_SAMPLE_IP:
> > +             data = &sd->ip;
> > +             break;
> > +     case PERF_SAMPLE_ADDR:
> > +             data = &sd->addr;
> > +             break;
>
> AFAICS from pe_prog_convert_ctx_access you should be able to read addr
> directly from context right? same as sample_period.. so I think if this
> will be generic way to read sample data, should we add sample_period
> as well?

+1
Let's avoid new stable helpers for this.
Pls use CORE and read perf_sample_data directly.


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-01 18:26     ` Alexei Starovoitov
@ 2022-11-01 18:46       ` Song Liu
  2022-11-01 18:52         ` Alexei Starovoitov
  0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2022-11-01 18:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Peter Zijlstra, Martin KaFai Lau,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo,
	Stanislav Fomichev, LKML, bpf, Steven Rostedt, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Tue, Nov 1, 2022 at 11:26 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Nov 1, 2022 at 3:03 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
> > > The bpf_perf_event_read_sample() helper is to get the specified sample
> > > data (by using PERF_SAMPLE_* flag in the argument) from BPF to make a
> > > decision for filtering on samples.  Currently PERF_SAMPLE_IP and
> > > PERF_SAMPLE_DATA flags are supported only.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  include/uapi/linux/bpf.h       | 23 ++++++++++++++++
> > >  kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
> > >  tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
> > >  3 files changed, 95 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 94659f6b3395..cba501de9373 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -5481,6 +5481,28 @@ union bpf_attr {
> > >   *           0 on success.
> > >   *
> > >   *           **-ENOENT** if the bpf_local_storage cannot be found.
> > > + *
> > > + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
> > > + *   Description
> > > + *           For an eBPF program attached to a perf event, retrieve the
> > > + *           sample data associated to *ctx* and store it in the buffer
> > > + *           pointed by *buf* up to size *size* bytes.
> > > + *
> > > + *           The *sample_flags* should contain a single value in the
> > > + *           **enum perf_event_sample_format**.
> > > + *   Return
> > > + *           On success, number of bytes written to *buf*. On error, a
> > > + *           negative value.
> > > + *
> > > + *           The *buf* can be set to **NULL** to return the number of bytes
> > > + *           required to store the requested sample data.
> > > + *
> > > + *           **-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
> > > + *
> > > + *           **-ENOENT** if the associated perf event doesn't have the data.
> > > + *
> > > + *           **-ENOSYS** if system doesn't support the sample data to be
> > > + *           retrieved.
> > >   */
> > >  #define ___BPF_FUNC_MAPPER(FN, ctx...)                       \
> > >       FN(unspec, 0, ##ctx)                            \
> > > @@ -5695,6 +5717,7 @@ union bpf_attr {
> > >       FN(user_ringbuf_drain, 209, ##ctx)              \
> > >       FN(cgrp_storage_get, 210, ##ctx)                \
> > >       FN(cgrp_storage_delete, 211, ##ctx)             \
> > > +     FN(perf_event_read_sample, 212, ##ctx)          \
> > >       /* */
> > >
> > >  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index ce0228c72a93..befd937afa3c 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -28,6 +28,7 @@
> > >
> > >  #include <uapi/linux/bpf.h>
> > >  #include <uapi/linux/btf.h>
> > > +#include <uapi/linux/perf_event.h>
> > >
> > >  #include <asm/tlb.h>
> > >
> > > @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
> > >       .arg4_type      = ARG_ANYTHING,
> > >  };
> > >
> > > +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
> > > +        void *, buf, u32, size, u64, flags)
> > > +{
> >
> > I wonder we could add perf_btf (like we have tp_btf) program type that
> > could access ctx->data directly without helpers
> >
> > > +     struct perf_sample_data *sd = ctx->data;
> > > +     void *data;
> > > +     u32 to_copy = sizeof(u64);
> > > +
> > > +     /* only allow a single sample flag */
> > > +     if (!is_power_of_2(flags))
> > > +             return -EINVAL;
> > > +
> > > +     /* support reading only already populated info */
> > > +     if (flags & ~sd->sample_flags)
> > > +             return -ENOENT;
> > > +
> > > +     switch (flags) {
> > > +     case PERF_SAMPLE_IP:
> > > +             data = &sd->ip;
> > > +             break;
> > > +     case PERF_SAMPLE_ADDR:
> > > +             data = &sd->addr;
> > > +             break;
> >
> > AFAICS from pe_prog_convert_ctx_access you should be able to read addr
> > directly from context right? same as sample_period.. so I think if this
> > will be generic way to read sample data, should we add sample_period
> > as well?
>
> +1
> Let's avoid new stable helpers for this.
> Pls use CORE and read perf_sample_data directly.

We have legacy ways to access sample_period and addr with
struct bpf_perf_event_data and struct bpf_perf_event_data_kern.
I think mixing that with CORE makes it confusing for the user.
And a helper or a kfunc would make it easier to follow.
perf_btf might also be a good approach for this.

Thanks,
Song


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-01 18:46       ` Song Liu
@ 2022-11-01 18:52         ` Alexei Starovoitov
  2022-11-01 20:04           ` Song Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2022-11-01 18:52 UTC (permalink / raw)
  To: Song Liu
  Cc: Jiri Olsa, Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Peter Zijlstra, Martin KaFai Lau,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo,
	Stanislav Fomichev, LKML, bpf, Steven Rostedt, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Tue, Nov 1, 2022 at 11:47 AM Song Liu <song@kernel.org> wrote:
>
> On Tue, Nov 1, 2022 at 11:26 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Nov 1, 2022 at 3:03 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
> > > > The bpf_perf_event_read_sample() helper is to get the specified sample
> > > > data (by using PERF_SAMPLE_* flag in the argument) from BPF to make a
> > > > decision for filtering on samples.  Currently PERF_SAMPLE_IP and
> > > > PERF_SAMPLE_DATA flags are supported only.
> > > >
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > >  include/uapi/linux/bpf.h       | 23 ++++++++++++++++
> > > >  kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
> > > >  tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
> > > >  3 files changed, 95 insertions(+)
> > > >
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index 94659f6b3395..cba501de9373 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -5481,6 +5481,28 @@ union bpf_attr {
> > > >   *           0 on success.
> > > >   *
> > > >   *           **-ENOENT** if the bpf_local_storage cannot be found.
> > > > + *
> > > > + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
> > > > + *   Description
> > > > + *           For an eBPF program attached to a perf event, retrieve the
> > > > + *           sample data associated to *ctx* and store it in the buffer
> > > > + *           pointed by *buf* up to size *size* bytes.
> > > > + *
> > > > + *           The *sample_flags* should contain a single value in the
> > > > + *           **enum perf_event_sample_format**.
> > > > + *   Return
> > > > + *           On success, number of bytes written to *buf*. On error, a
> > > > + *           negative value.
> > > > + *
> > > > + *           The *buf* can be set to **NULL** to return the number of bytes
> > > > + *           required to store the requested sample data.
> > > > + *
> > > > + *           **-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
> > > > + *
> > > > + *           **-ENOENT** if the associated perf event doesn't have the data.
> > > > + *
> > > > + *           **-ENOSYS** if system doesn't support the sample data to be
> > > > + *           retrieved.
> > > >   */
> > > >  #define ___BPF_FUNC_MAPPER(FN, ctx...)                       \
> > > >       FN(unspec, 0, ##ctx)                            \
> > > > @@ -5695,6 +5717,7 @@ union bpf_attr {
> > > >       FN(user_ringbuf_drain, 209, ##ctx)              \
> > > >       FN(cgrp_storage_get, 210, ##ctx)                \
> > > >       FN(cgrp_storage_delete, 211, ##ctx)             \
> > > > +     FN(perf_event_read_sample, 212, ##ctx)          \
> > > >       /* */
> > > >
> > > >  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > index ce0228c72a93..befd937afa3c 100644
> > > > --- a/kernel/trace/bpf_trace.c
> > > > +++ b/kernel/trace/bpf_trace.c
> > > > @@ -28,6 +28,7 @@
> > > >
> > > >  #include <uapi/linux/bpf.h>
> > > >  #include <uapi/linux/btf.h>
> > > > +#include <uapi/linux/perf_event.h>
> > > >
> > > >  #include <asm/tlb.h>
> > > >
> > > > @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
> > > >       .arg4_type      = ARG_ANYTHING,
> > > >  };
> > > >
> > > > +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
> > > > +        void *, buf, u32, size, u64, flags)
> > > > +{
> > >
> > > I wonder we could add perf_btf (like we have tp_btf) program type that
> > > could access ctx->data directly without helpers
> > >
> > > > +     struct perf_sample_data *sd = ctx->data;
> > > > +     void *data;
> > > > +     u32 to_copy = sizeof(u64);
> > > > +
> > > > +     /* only allow a single sample flag */
> > > > +     if (!is_power_of_2(flags))
> > > > +             return -EINVAL;
> > > > +
> > > > +     /* support reading only already populated info */
> > > > +     if (flags & ~sd->sample_flags)
> > > > +             return -ENOENT;
> > > > +
> > > > +     switch (flags) {
> > > > +     case PERF_SAMPLE_IP:
> > > > +             data = &sd->ip;
> > > > +             break;
> > > > +     case PERF_SAMPLE_ADDR:
> > > > +             data = &sd->addr;
> > > > +             break;
> > >
> > > AFAICS from pe_prog_convert_ctx_access you should be able to read addr
> > > directly from context right? same as sample_period.. so I think if this
> > > will be generic way to read sample data, should we add sample_period
> > > as well?
> >
> > +1
> > Let's avoid new stable helpers for this.
> > Pls use CORE and read perf_sample_data directly.
>
> We have legacy ways to access sample_period and addr with
> struct bpf_perf_event_data and struct bpf_perf_event_data_kern. I
> think mixing that
> with CORE makes it confusing for the user. And a helper or a kfunc would make it
> easier to follow. perf_btf might also be a good approach for this.

imo that's a counter argument to non-CORE style.
struct bpf_perf_event_data has sample_period and addr,
and as soon as we pushed the boundaries it turned out it's not enough.
Now we're proposing to extend uapi a bit with sample_ip.
That will repeat the same mistake.
Just use CORE and read everything that is there today
and will be there in the future.


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-01 18:52         ` Alexei Starovoitov
@ 2022-11-01 20:04           ` Song Liu
  2022-11-01 22:16             ` Namhyung Kim
  0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2022-11-01 20:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Song Liu, Peter Zijlstra, Martin KaFai Lau,
	Yonghong Song, John Fastabend, KP Singh, Hao Luo,
	Stanislav Fomichev, LKML, bpf, Steven Rostedt, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Tue, Nov 1, 2022 at 11:53 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Nov 1, 2022 at 11:47 AM Song Liu <song@kernel.org> wrote:
> >
> > On Tue, Nov 1, 2022 at 11:26 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Nov 1, 2022 at 3:03 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > >
> > > > On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
> > > > > The bpf_perf_event_read_sample() helper is to get the specified sample
> > > > > data (by using PERF_SAMPLE_* flag in the argument) from BPF to make a
> > > > > decision for filtering on samples.  Currently PERF_SAMPLE_IP and
> > > > > PERF_SAMPLE_DATA flags are supported only.
> > > > >
> > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > ---
> > > > >  include/uapi/linux/bpf.h       | 23 ++++++++++++++++
> > > > >  kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
> > > > >  tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
> > > > >  3 files changed, 95 insertions(+)
> > > > >
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index 94659f6b3395..cba501de9373 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -5481,6 +5481,28 @@ union bpf_attr {
> > > > >   *           0 on success.
> > > > >   *
> > > > >   *           **-ENOENT** if the bpf_local_storage cannot be found.
> > > > > + *
> > > > > + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
> > > > > + *   Description
> > > > > + *           For an eBPF program attached to a perf event, retrieve the
> > > > > + *           sample data associated to *ctx* and store it in the buffer
> > > > > + *           pointed by *buf* up to size *size* bytes.
> > > > > + *
> > > > > + *           The *sample_flags* should contain a single value in the
> > > > > + *           **enum perf_event_sample_format**.
> > > > > + *   Return
> > > > > + *           On success, number of bytes written to *buf*. On error, a
> > > > > + *           negative value.
> > > > > + *
> > > > > + *           The *buf* can be set to **NULL** to return the number of bytes
> > > > > + *           required to store the requested sample data.
> > > > > + *
> > > > > + *           **-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
> > > > > + *
> > > > > + *           **-ENOENT** if the associated perf event doesn't have the data.
> > > > > + *
> > > > > + *           **-ENOSYS** if system doesn't support the sample data to be
> > > > > + *           retrieved.
> > > > >   */
> > > > >  #define ___BPF_FUNC_MAPPER(FN, ctx...)                       \
> > > > >       FN(unspec, 0, ##ctx)                            \
> > > > > @@ -5695,6 +5717,7 @@ union bpf_attr {
> > > > >       FN(user_ringbuf_drain, 209, ##ctx)              \
> > > > >       FN(cgrp_storage_get, 210, ##ctx)                \
> > > > >       FN(cgrp_storage_delete, 211, ##ctx)             \
> > > > > +     FN(perf_event_read_sample, 212, ##ctx)          \
> > > > >       /* */
> > > > >
> > > > >  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > > index ce0228c72a93..befd937afa3c 100644
> > > > > --- a/kernel/trace/bpf_trace.c
> > > > > +++ b/kernel/trace/bpf_trace.c
> > > > > @@ -28,6 +28,7 @@
> > > > >
> > > > >  #include <uapi/linux/bpf.h>
> > > > >  #include <uapi/linux/btf.h>
> > > > > +#include <uapi/linux/perf_event.h>
> > > > >
> > > > >  #include <asm/tlb.h>
> > > > >
> > > > > @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
> > > > >       .arg4_type      = ARG_ANYTHING,
> > > > >  };
> > > > >
> > > > > +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
> > > > > +        void *, buf, u32, size, u64, flags)
> > > > > +{
> > > >
> > > > I wonder we could add perf_btf (like we have tp_btf) program type that
> > > > could access ctx->data directly without helpers
> > > >
> > > > > +     struct perf_sample_data *sd = ctx->data;
> > > > > +     void *data;
> > > > > +     u32 to_copy = sizeof(u64);
> > > > > +
> > > > > +     /* only allow a single sample flag */
> > > > > +     if (!is_power_of_2(flags))
> > > > > +             return -EINVAL;
> > > > > +
> > > > > +     /* support reading only already populated info */
> > > > > +     if (flags & ~sd->sample_flags)
> > > > > +             return -ENOENT;
> > > > > +
> > > > > +     switch (flags) {
> > > > > +     case PERF_SAMPLE_IP:
> > > > > +             data = &sd->ip;
> > > > > +             break;
> > > > > +     case PERF_SAMPLE_ADDR:
> > > > > +             data = &sd->addr;
> > > > > +             break;
> > > >
> > > > AFAICS from pe_prog_convert_ctx_access you should be able to read addr
> > > > directly from context right? same as sample_period.. so I think if this
> > > > will be generic way to read sample data, should we add sample_period
> > > > as well?
> > >
> > > +1
> > > Let's avoid new stable helpers for this.
> > > Pls use CORE and read perf_sample_data directly.
> >
> > We have legacy ways to access sample_period and addr with
> > struct bpf_perf_event_data and struct bpf_perf_event_data_kern. I
> > think mixing that
> > with CORE makes it confusing for the user. And a helper or a kfunc would make it
> > easier to follow. perf_btf might also be a good approach for this.
>
> imo that's a counter argument to non-CORE style.
> struct bpf_perf_event_data has sample_period and addr,
> and as soon as we pushed the boundaries it turned out it's not enough.
> Now we're proposing to extend uapi a bit with sample_ip.
> That will repeat the same mistake.
> Just use CORE and read everything that is there today
> and will be there in the future.

Another part of this effort is that we need the perf_event to prepare
the required fields before calling the BPF program. I think we will
need some logic in addition to CORE to get that right. How about we
add perf_btf, where the perf_event prepares all fields before calling
the BPF program? perf_btf + CORE will be able to read all fields in
the sample.

Thanks,
Song


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-01 20:04           ` Song Liu
@ 2022-11-01 22:16             ` Namhyung Kim
  2022-11-02  0:13               ` Song Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Namhyung Kim @ 2022-11-01 22:16 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Peter Zijlstra,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo

Hi,

On Tue, Nov 1, 2022 at 1:04 PM Song Liu <song@kernel.org> wrote:
>
> On Tue, Nov 1, 2022 at 11:53 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Nov 1, 2022 at 11:47 AM Song Liu <song@kernel.org> wrote:
> > >
> > > On Tue, Nov 1, 2022 at 11:26 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Tue, Nov 1, 2022 at 3:03 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > > > >
> > > > > On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
> > > > > > The bpf_perf_event_read_sample() helper is to get the specified sample
> > > > > > data (by using PERF_SAMPLE_* flag in the argument) from BPF to make a
> > > > > > decision for filtering on samples.  Currently PERF_SAMPLE_IP and
> > > > > > PERF_SAMPLE_DATA flags are supported only.
> > > > > >
> > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > ---
> > > > > >  include/uapi/linux/bpf.h       | 23 ++++++++++++++++
> > > > > >  kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
> > > > > >  tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
> > > > > >  3 files changed, 95 insertions(+)
> > > > > >
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index 94659f6b3395..cba501de9373 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -5481,6 +5481,28 @@ union bpf_attr {
> > > > > >   *           0 on success.
> > > > > >   *
> > > > > >   *           **-ENOENT** if the bpf_local_storage cannot be found.
> > > > > > + *
> > > > > > + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
> > > > > > + *   Description
> > > > > > + *           For an eBPF program attached to a perf event, retrieve the
> > > > > > + *           sample data associated to *ctx* and store it in the buffer
> > > > > > + *           pointed by *buf* up to size *size* bytes.
> > > > > > + *
> > > > > > + *           The *sample_flags* should contain a single value in the
> > > > > > + *           **enum perf_event_sample_format**.
> > > > > > + *   Return
> > > > > > + *           On success, number of bytes written to *buf*. On error, a
> > > > > > + *           negative value.
> > > > > > + *
> > > > > > + *           The *buf* can be set to **NULL** to return the number of bytes
> > > > > > + *           required to store the requested sample data.
> > > > > > + *
> > > > > > + *           **-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
> > > > > > + *
> > > > > > + *           **-ENOENT** if the associated perf event doesn't have the data.
> > > > > > + *
> > > > > > + *           **-ENOSYS** if system doesn't support the sample data to be
> > > > > > + *           retrieved.
> > > > > >   */
> > > > > >  #define ___BPF_FUNC_MAPPER(FN, ctx...)                       \
> > > > > >       FN(unspec, 0, ##ctx)                            \
> > > > > > @@ -5695,6 +5717,7 @@ union bpf_attr {
> > > > > >       FN(user_ringbuf_drain, 209, ##ctx)              \
> > > > > >       FN(cgrp_storage_get, 210, ##ctx)                \
> > > > > >       FN(cgrp_storage_delete, 211, ##ctx)             \
> > > > > > +     FN(perf_event_read_sample, 212, ##ctx)          \
> > > > > >       /* */
> > > > > >
> > > > > >  /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> > > > > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > > > > index ce0228c72a93..befd937afa3c 100644
> > > > > > --- a/kernel/trace/bpf_trace.c
> > > > > > +++ b/kernel/trace/bpf_trace.c
> > > > > > @@ -28,6 +28,7 @@
> > > > > >
> > > > > >  #include <uapi/linux/bpf.h>
> > > > > >  #include <uapi/linux/btf.h>
> > > > > > +#include <uapi/linux/perf_event.h>
> > > > > >
> > > > > >  #include <asm/tlb.h>
> > > > > >
> > > > > > @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
> > > > > >       .arg4_type      = ARG_ANYTHING,
> > > > > >  };
> > > > > >
> > > > > > +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
> > > > > > +        void *, buf, u32, size, u64, flags)
> > > > > > +{
> > > > >
> > > > > I wonder we could add perf_btf (like we have tp_btf) program type that
> > > > > could access ctx->data directly without helpers
> > > > >
> > > > > > +     struct perf_sample_data *sd = ctx->data;
> > > > > > +     void *data;
> > > > > > +     u32 to_copy = sizeof(u64);
> > > > > > +
> > > > > > +     /* only allow a single sample flag */
> > > > > > +     if (!is_power_of_2(flags))
> > > > > > +             return -EINVAL;
> > > > > > +
> > > > > > +     /* support reading only already populated info */
> > > > > > +     if (flags & ~sd->sample_flags)
> > > > > > +             return -ENOENT;
> > > > > > +
> > > > > > +     switch (flags) {
> > > > > > +     case PERF_SAMPLE_IP:
> > > > > > +             data = &sd->ip;
> > > > > > +             break;
> > > > > > +     case PERF_SAMPLE_ADDR:
> > > > > > +             data = &sd->addr;
> > > > > > +             break;
> > > > >
> > > > > AFAICS from pe_prog_convert_ctx_access you should be able to read addr
> > > > > directly from context right? same as sample_period.. so I think if this
> > > > > will be generic way to read sample data, should we add sample_period
> > > > > as well?
> > > >
> > > > +1
> > > > Let's avoid new stable helpers for this.
> > > > Pls use CORE and read perf_sample_data directly.
> > >
> > > We have legacy ways to access sample_period and addr with
> > > struct bpf_perf_event_data and struct bpf_perf_event_data_kern. I
> > > think mixing that
> > > with CORE makes it confusing for the user. And a helper or a kfunc would make it
> > > easier to follow. perf_btf might also be a good approach for this.
> >
> > imo that's a counter argument to non-CORE style.
> > struct bpf_perf_event_data has sample_period and addr,
> > and as soon as we pushed the boundaries it turned out it's not enough.
> > Now we're proposing to extend uapi a bit with sample_ip.
> > That will repeat the same mistake.
> > Just use CORE and read everything that is there today
> > and will be there in the future.
>
> Another work of this effort is that we need the perf_event to prepare
> required fields before calling the BPF program. I think we will need
> some logic in addition to CORE to get that right. How about we add
> perf_btf where the perf_event prepare all fields before calling the
> BPF program? perf_btf + CORE will be able to read all fields in the
> sample.

IIUC we want something like below to access sample data directly,
right?

  BPF_CORE_READ(ctx, data, ip);

Some fields, like raw and callchains, will have variable-length data,
so it'd be hard to check the boundary at load time.  Also it's possible
that some fields are not set (according to the sample type), and it'd
be the user's (or programmer's) responsibility to check if the data is
valid.  If these are not concerns, I think I'm good.

Thanks,
Namhyung


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-01 22:16             ` Namhyung Kim
@ 2022-11-02  0:13               ` Song Liu
  2022-11-02 22:18                 ` Namhyung Kim
  0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2022-11-02  0:13 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Peter Zijlstra,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Tue, Nov 1, 2022 at 3:17 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > +1
> > > > > Let's avoid new stable helpers for this.
> > > > > Pls use CORE and read perf_sample_data directly.
> > > >
> > > > We have legacy ways to access sample_period and addr with
> > > > struct bpf_perf_event_data and struct bpf_perf_event_data_kern. I
> > > > think mixing that
> > > > with CORE makes it confusing for the user. And a helper or a kfunc would make it
> > > > easier to follow. perf_btf might also be a good approach for this.
> > >
> > > imo that's a counter argument to non-CORE style.
> > > struct bpf_perf_event_data has sample_period and addr,
> > > and as soon as we pushed the boundaries it turned out it's not enough.
> > > Now we're proposing to extend uapi a bit with sample_ip.
> > > That will repeat the same mistake.
> > > Just use CORE and read everything that is there today
> > > and will be there in the future.
> >
> > Another work of this effort is that we need the perf_event to prepare
> > required fields before calling the BPF program. I think we will need
> > some logic in addition to CORE to get that right. How about we add
> > perf_btf where the perf_event prepare all fields before calling the
> > BPF program? perf_btf + CORE will be able to read all fields in the
> > sample.
>
> IIUC we want something like below to access sample data directly,
> right?
>
>   BPF_CORE_READ(ctx, data, ip);
>

I haven't tried this, but I guess we may need something like

data = ctx->data;
BPF_CORE_READ(data, ip);

> Some fields like raw and callchains will have variable length data
> so it'd be hard to check the boundary at load time.

I think we are fine as long as we can check boundaries at run time.

> Also it's possible
> that some fields are not set (according to sample type), and it'd be
> the user's (or programmer's) responsibility to check if the data is
> valid.  If these are not the concerns, I think I'm good.

So we still need 1/3 of the set to make sure the data is valid?

Thanks,
Song

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-02  0:13               ` Song Liu
@ 2022-11-02 22:18                 ` Namhyung Kim
  2022-11-03 18:41                   ` Song Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Namhyung Kim @ 2022-11-02 22:18 UTC (permalink / raw)
  To: Song Liu
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Peter Zijlstra,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Tue, Nov 1, 2022 at 5:13 PM Song Liu <song@kernel.org> wrote:
>
> On Tue, Nov 1, 2022 at 3:17 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > IIUC we want something like below to access sample data directly,
> > right?
> >
> >   BPF_CORE_READ(ctx, data, ip);
> >
>
> I haven't tried this, but I guess we may need something like
>
> data = ctx->data;
> BPF_CORE_READ(data, ip);

Ok, will try.

>
> > Some fields like raw and callchains will have variable length data
> > so it'd be hard to check the boundary at load time.
>
> I think we are fine as long as we can check boundaries at run time.

Sure, that means it's the responsibility of BPF writers, right?

>
> > Also it's possible
> > that some fields are not set (according to sample type), and it'd be
> > the user's (or programmer's) responsibility to check if the data is
> > valid.  If these are not the concerns, I think I'm good.
>
> So we still need 1/3 of the set to make sure the data is valid?

Of course, I'll keep it in the v2.

Thanks,
Namhyung


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-02 22:18                 ` Namhyung Kim
@ 2022-11-03 18:41                   ` Song Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Song Liu @ 2022-11-03 18:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Jiri Olsa, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Peter Zijlstra,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Wed, Nov 2, 2022 at 3:18 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Nov 1, 2022 at 5:13 PM Song Liu <song@kernel.org> wrote:
> >
> > On Tue, Nov 1, 2022 at 3:17 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > IIUC we want something like below to access sample data directly,
> > > right?
> > >
> > >   BPF_CORE_READ(ctx, data, ip);
> > >
> >
> > I haven't tried this, but I guess we may need something like
> >
> > data = ctx->data;
> > BPF_CORE_READ(data, ip);
>
> Ok, will try.
>
> >
> > > Some fields like raw and callchains will have variable length data
> > > so it'd be hard to check the boundary at load time.
> >
> > I think we are fine as long as we can check boundaries at run time.
>
> Sure, that means it's the responsibility of BPF writers, right?

Right, the author of the BPF program could check whether the data
is valid.

Song

>
> >
> > > Also it's possible
> > > that some fields are not set (according to sample type), and it'd be
> > > the user's (or programmer's) responsibility to check if the data is
> > > valid.  If these are not the concerns, I think I'm good.
> >
> > So we still need 1/3 of the set to make sure the data is valid?
>
> Of course, I'll keep it in the v2.
>
> Thanks,
> Namhyung


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-01 10:02   ` Jiri Olsa
  2022-11-01 18:26     ` Alexei Starovoitov
@ 2022-11-03 19:45     ` Yonghong Song
  2022-11-03 20:55       ` Song Liu
  1 sibling, 1 reply; 19+ messages in thread
From: Yonghong Song @ 2022-11-03 19:45 UTC (permalink / raw)
  To: Jiri Olsa, Namhyung Kim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Peter Zijlstra, Martin KaFai Lau, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo



On 11/1/22 3:02 AM, Jiri Olsa wrote:
> On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
>> The bpf_perf_event_read_sample() helper is to get the specified sample
>> data (by using PERF_SAMPLE_* flag in the argument) from BPF to make a
>> decision for filtering on samples.  Currently PERF_SAMPLE_IP and
>> PERF_SAMPLE_ADDR flags are supported only.
>>
>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>> ---
>>   include/uapi/linux/bpf.h       | 23 ++++++++++++++++
>>   kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
>>   tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
>>   3 files changed, 95 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 94659f6b3395..cba501de9373 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -5481,6 +5481,28 @@ union bpf_attr {
>>    *		0 on success.
>>    *
>>    *		**-ENOENT** if the bpf_local_storage cannot be found.
>> + *
>> + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
>> + *	Description
>> + *		For an eBPF program attached to a perf event, retrieve the
>> + *		sample data associated to *ctx*	and store it in the buffer
>> + *		pointed by *buf* up to size *size* bytes.
>> + *
>> + *		The *sample_flags* should contain a single value in the
>> + *		**enum perf_event_sample_format**.
>> + *	Return
>> + *		On success, number of bytes written to *buf*. On error, a
>> + *		negative value.
>> + *
>> + *		The *buf* can be set to **NULL** to return the number of bytes
>> + *		required to store the requested sample data.
>> + *
>> + *		**-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
>> + *
>> + *		**-ENOENT** if the associated perf event doesn't have the data.
>> + *
>> + *		**-ENOSYS** if system doesn't support the sample data to be
>> + *		retrieved.
>>    */
>>   #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
>>   	FN(unspec, 0, ##ctx)				\
>> @@ -5695,6 +5717,7 @@ union bpf_attr {
>>   	FN(user_ringbuf_drain, 209, ##ctx)		\
>>   	FN(cgrp_storage_get, 210, ##ctx)		\
>>   	FN(cgrp_storage_delete, 211, ##ctx)		\
>> +	FN(perf_event_read_sample, 212, ##ctx)		\
>>   	/* */
>>   
>>   /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ce0228c72a93..befd937afa3c 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -28,6 +28,7 @@
>>   
>>   #include <uapi/linux/bpf.h>
>>   #include <uapi/linux/btf.h>
>> +#include <uapi/linux/perf_event.h>
>>   
>>   #include <asm/tlb.h>
>>   
>> @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
>>   	.arg4_type      = ARG_ANYTHING,
>>   };
>>   
>> +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
>> +	   void *, buf, u32, size, u64, flags)
>> +{
> 
> I wonder if we could add a perf_btf (like we have tp_btf) program type that
> could access ctx->data directly without helpers

Martin and I have discussed an idea to introduce a generic helper like
     bpf_get_kern_ctx(void *ctx)
Given a context, the helper will return a PTR_TO_BTF_ID representing the
corresponding kernel ctx. So in the above example, user could call

     struct bpf_perf_event_data_kern *kctx = bpf_get_kern_ctx(ctx);
     ...

To implement bpf_get_kern_ctx helper, the verifier can find the type
of the context and provide a hidden btf_id as the second parameter of
the actual kernel helper function like
     bpf_get_kern_ctx(ctx) {
        return ctx;
     }
     /* based on ctx_btf_id, find kctx_btf_id and return it to verifier */

     The bpf_get_kern_ctx helper can be inlined as well.

> 
>> +	struct perf_sample_data *sd = ctx->data;
>> +	void *data;
>> +	u32 to_copy = sizeof(u64);
>> +
>> +	/* only allow a single sample flag */
>> +	if (!is_power_of_2(flags))
>> +		return -EINVAL;
>> +
>> +	/* support reading only already populated info */
>> +	if (flags & ~sd->sample_flags)
>> +		return -ENOENT;
>> +
>> +	switch (flags) {
>> +	case PERF_SAMPLE_IP:
>> +		data = &sd->ip;
>> +		break;
>> +	case PERF_SAMPLE_ADDR:
>> +		data = &sd->addr;
>> +		break;
> 
> AFAICS from pe_prog_convert_ctx_access you should be able to read addr
> directly from context right? same as sample_period.. so I think if this
> will be generic way to read sample data, should we add sample_period
> as well?
> 
> 
>> +	default:
>> +		return -ENOSYS;
>> +	}
>> +
>> +	if (!buf)
>> +		return to_copy;
>> +
>> +	if (size < to_copy)
>> +		to_copy = size;
> 
> should we fail in here instead? is there any point in returning
> not complete data?
> 
> jirka
> 
> 
>> +
>> +	memcpy(buf, data, to_copy);
>> +	return to_copy;
>> +}
>> +
>> +static const struct bpf_func_proto bpf_perf_event_read_sample_proto = {
>> +	.func           = bpf_perf_event_read_sample,
>> +	.gpl_only       = true,
>> +	.ret_type       = RET_INTEGER,
>> +	.arg1_type      = ARG_PTR_TO_CTX,
>> +	.arg2_type      = ARG_PTR_TO_MEM_OR_NULL,
>> +	.arg3_type      = ARG_CONST_SIZE_OR_ZERO,
>> +	.arg4_type      = ARG_ANYTHING,
>> +};
>> +
>[...]


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-03 19:45     ` Yonghong Song
@ 2022-11-03 20:55       ` Song Liu
  2022-11-03 21:21         ` Yonghong Song
  0 siblings, 1 reply; 19+ messages in thread
From: Song Liu @ 2022-11-03 20:55 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Jiri Olsa, Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Peter Zijlstra, Martin Lau, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf,
	Steven Rostedt, Ingo Molnar, Arnaldo Carvalho de Melo



> On Nov 3, 2022, at 12:45 PM, Yonghong Song <yhs@meta.com> wrote:
> 
> 
> 
> On 11/1/22 3:02 AM, Jiri Olsa wrote:
>> On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
>>> The bpf_perf_event_read_sample() helper is to get the specified sample
>>> data (by using PERF_SAMPLE_* flag in the argument) from BPF to make a
>>> decision for filtering on samples.  Currently PERF_SAMPLE_IP and
>>> PERF_SAMPLE_ADDR flags are supported only.
>>> 
>>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>>> ---
>>>  include/uapi/linux/bpf.h       | 23 ++++++++++++++++
>>>  kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
>>>  tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
>>>  3 files changed, 95 insertions(+)
>>> 
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 94659f6b3395..cba501de9373 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -5481,6 +5481,28 @@ union bpf_attr {
>>>   *		0 on success.
>>>   *
>>>   *		**-ENOENT** if the bpf_local_storage cannot be found.
>>> + *
>>> + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
>>> + *	Description
>>> + *		For an eBPF program attached to a perf event, retrieve the
>>> + *		sample data associated to *ctx*	and store it in the buffer
>>> + *		pointed by *buf* up to size *size* bytes.
>>> + *
>>> + *		The *sample_flags* should contain a single value in the
>>> + *		**enum perf_event_sample_format**.
>>> + *	Return
>>> + *		On success, number of bytes written to *buf*. On error, a
>>> + *		negative value.
>>> + *
>>> + *		The *buf* can be set to **NULL** to return the number of bytes
>>> + *		required to store the requested sample data.
>>> + *
>>> + *		**-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
>>> + *
>>> + *		**-ENOENT** if the associated perf event doesn't have the data.
>>> + *
>>> + *		**-ENOSYS** if system doesn't support the sample data to be
>>> + *		retrieved.
>>>   */
>>>  #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
>>>  	FN(unspec, 0, ##ctx)				\
>>> @@ -5695,6 +5717,7 @@ union bpf_attr {
>>>  	FN(user_ringbuf_drain, 209, ##ctx)		\
>>>  	FN(cgrp_storage_get, 210, ##ctx)		\
>>>  	FN(cgrp_storage_delete, 211, ##ctx)		\
>>> +	FN(perf_event_read_sample, 212, ##ctx)		\
>>>  	/* */
>>>    /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>> index ce0228c72a93..befd937afa3c 100644
>>> --- a/kernel/trace/bpf_trace.c
>>> +++ b/kernel/trace/bpf_trace.c
>>> @@ -28,6 +28,7 @@
>>>    #include <uapi/linux/bpf.h>
>>>  #include <uapi/linux/btf.h>
>>> +#include <uapi/linux/perf_event.h>
>>>    #include <asm/tlb.h>
>>>  @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
>>>  	.arg4_type      = ARG_ANYTHING,
>>>  };
>>>  +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
>>> +	   void *, buf, u32, size, u64, flags)
>>> +{
>> I wonder if we could add a perf_btf (like we have tp_btf) program type that
>> could access ctx->data directly without helpers
> 
> Martin and I have discussed an idea to introduce a generic helper like
>    bpf_get_kern_ctx(void *ctx)
> Given a context, the helper will return a PTR_TO_BTF_ID representing the
> corresponding kernel ctx. So in the above example, user could call
> 
>    struct bpf_perf_event_data_kern *kctx = bpf_get_kern_ctx(ctx);
>    ...

This is an interesting idea! 

> To implement bpf_get_kern_ctx helper, the verifier can find the type
> of the context and provide a hidden btf_id as the second parameter of
> the actual kernel helper function like
>    bpf_get_kern_ctx(ctx) {
>       return ctx;
>    }
>    /* based on ctx_btf_id, find kctx_btf_id and return it to verifier */

I think we will need a map of ctx_btf_id => kctx_btf_id. Shall we somehow
expose this to the user? 

Thanks,
Song


>    The bpf_get_kern_ctx helper can be inlined as well.




* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-03 20:55       ` Song Liu
@ 2022-11-03 21:21         ` Yonghong Song
  2022-11-04  6:18           ` Namhyung Kim
  0 siblings, 1 reply; 19+ messages in thread
From: Yonghong Song @ 2022-11-03 21:21 UTC (permalink / raw)
  To: Song Liu
  Cc: Jiri Olsa, Namhyung Kim, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Peter Zijlstra, Martin Lau, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo



On 11/3/22 1:55 PM, Song Liu wrote:
> 
> 
>> On Nov 3, 2022, at 12:45 PM, Yonghong Song <yhs@meta.com> wrote:
>>
>>
>>
>> On 11/1/22 3:02 AM, Jiri Olsa wrote:
>>> On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
>>>> The bpf_perf_event_read_sample() helper is to get the specified sample
>>>> data (by using PERF_SAMPLE_* flag in the argument) from BPF to make a
>>>> decision for filtering on samples.  Currently PERF_SAMPLE_IP and
>>>> PERF_SAMPLE_ADDR flags are supported only.
>>>>
>>>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>>>> ---
>>>>   include/uapi/linux/bpf.h       | 23 ++++++++++++++++
>>>>   kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
>>>>   tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
>>>>   3 files changed, 95 insertions(+)
>>>>
>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>> index 94659f6b3395..cba501de9373 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -5481,6 +5481,28 @@ union bpf_attr {
>>>>    *		0 on success.
>>>>    *
>>>>    *		**-ENOENT** if the bpf_local_storage cannot be found.
>>>> + *
>>>> + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
>>>> + *	Description
>>>> + *		For an eBPF program attached to a perf event, retrieve the
>>>> + *		sample data associated to *ctx*	and store it in the buffer
>>>> + *		pointed by *buf* up to size *size* bytes.
>>>> + *
>>>> + *		The *sample_flags* should contain a single value in the
>>>> + *		**enum perf_event_sample_format**.
>>>> + *	Return
>>>> + *		On success, number of bytes written to *buf*. On error, a
>>>> + *		negative value.
>>>> + *
>>>> + *		The *buf* can be set to **NULL** to return the number of bytes
>>>> + *		required to store the requested sample data.
>>>> + *
>>>> + *		**-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
>>>> + *
>>>> + *		**-ENOENT** if the associated perf event doesn't have the data.
>>>> + *
>>>> + *		**-ENOSYS** if system doesn't support the sample data to be
>>>> + *		retrieved.
>>>>    */
>>>>   #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
>>>>   	FN(unspec, 0, ##ctx)				\
>>>> @@ -5695,6 +5717,7 @@ union bpf_attr {
>>>>   	FN(user_ringbuf_drain, 209, ##ctx)		\
>>>>   	FN(cgrp_storage_get, 210, ##ctx)		\
>>>>   	FN(cgrp_storage_delete, 211, ##ctx)		\
>>>> +	FN(perf_event_read_sample, 212, ##ctx)		\
>>>>   	/* */
>>>>     /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>>> index ce0228c72a93..befd937afa3c 100644
>>>> --- a/kernel/trace/bpf_trace.c
>>>> +++ b/kernel/trace/bpf_trace.c
>>>> @@ -28,6 +28,7 @@
>>>>     #include <uapi/linux/bpf.h>
>>>>   #include <uapi/linux/btf.h>
>>>> +#include <uapi/linux/perf_event.h>
>>>>     #include <asm/tlb.h>
>>>>   @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
>>>>   	.arg4_type      = ARG_ANYTHING,
>>>>   };
>>>>   +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
>>>> +	   void *, buf, u32, size, u64, flags)
>>>> +{
>>> I wonder if we could add a perf_btf (like we have tp_btf) program type that
>>> could access ctx->data directly without helpers
>>
>> Martin and I have discussed an idea to introduce a generic helper like
>>     bpf_get_kern_ctx(void *ctx)
>> Given a context, the helper will return a PTR_TO_BTF_ID representing the
>> corresponding kernel ctx. So in the above example, user could call
>>
>>     struct bpf_perf_event_data_kern *kctx = bpf_get_kern_ctx(ctx);
>>     ...
> 
> This is an interesting idea!
> 
>> To implement bpf_get_kern_ctx helper, the verifier can find the type
>> of the context and provide a hidden btf_id as the second parameter of
>> the actual kernel helper function like
>>     bpf_get_kern_ctx(ctx) {
>>        return ctx;
>>     }
>>     /* based on ctx_btf_id, find kctx_btf_id and return it to verifier */
> 
> I think we will need a map of ctx_btf_id => kctx_btf_id. Shall we somehow
> expose this to the user?

Yes, inside the kernel we need a ctx_btf_id -> kctx_btf_id mapping.
Good question. We might not want to expose this mapping as a stable API.
So using a kfunc might be more appropriate.

> 
> Thanks,
> Song
> 
> 
>>     The bpf_get_kern_ctx helper can be inlined as well.
> 
> 


* Re: [PATCH bpf-next 1/3] perf/core: Prepare sample data before calling BPF
  2022-11-01 10:03   ` Jiri Olsa
@ 2022-11-04  6:03     ` Namhyung Kim
  0 siblings, 0 replies; 19+ messages in thread
From: Namhyung Kim @ 2022-11-04  6:03 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Peter Zijlstra, Martin KaFai Lau, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo

Hi Jiri,

On Tue, Nov 1, 2022 at 3:03 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Oct 31, 2022 at 10:23:38PM -0700, Namhyung Kim wrote:
> > To allow bpf overflow handler to access the perf sample data, it needs to
> > prepare missing but requested data before calling the handler.
> >
> > I'm taking a conservative approach to allow a list of sample formats only
> > instead of allowing them all.  For now, IP and ADDR data are allowed and
> > I think it's good enough to build and verify general BPF-based sample
> > filters for perf events.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  kernel/events/core.c | 40 +++++++++++++++++++++++++++++++---------
> >  1 file changed, 31 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index aefc1e08e015..519f30c33a24 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -7329,8 +7329,10 @@ void perf_prepare_sample(struct perf_event_header *header,
> >       filtered_sample_type = sample_type & ~data->sample_flags;
> >       __perf_event_header__init_id(header, data, event, filtered_sample_type);
> >
> > -     if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE))
> > -             data->ip = perf_instruction_pointer(regs);
> > +     if (sample_type & (PERF_SAMPLE_IP | PERF_SAMPLE_CODE_PAGE_SIZE)) {
> > +             if (filtered_sample_type & PERF_SAMPLE_IP)
> > +                     data->ip = perf_instruction_pointer(regs);
> > +     }
> >
> >       if (sample_type & PERF_SAMPLE_CALLCHAIN) {
> >               int size = 1;
> > @@ -10006,6 +10008,32 @@ static void perf_event_free_filter(struct perf_event *event)
> >  }
> >
> >  #ifdef CONFIG_BPF_SYSCALL
> > +static void bpf_prepare_sample(struct bpf_prog *prog,
> > +                            struct perf_event *event,
> > +                            struct perf_sample_data *data,
> > +                            struct pt_regs *regs)
> > +{
> > +     u64 filtered_sample_type;
> > +
> > +     filtered_sample_type = event->attr.sample_type & ~data->sample_flags;
>
> could we add the same comment in here as is in perf_prepare_sample
>
>         /*
>          * Clear the sample flags that have already been done by the
>          * PMU driver.
>          */
>
> it took me while to recall while we set addr to 0 in here ;-)

Sorry about that! :)  I'll add the comment.

Thanks,
Namhyung


>
> > +
> > +     if (prog->call_get_stack &&
> > +         (filtered_sample_type & PERF_SAMPLE_CALLCHAIN)) {
> > +             data->callchain = perf_callchain(event, regs);
> > +             data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
> > +     }
> > +
> > +     if (filtered_sample_type & PERF_SAMPLE_IP) {
> > +             data->ip = perf_instruction_pointer(regs);
> > +             data->sample_flags |= PERF_SAMPLE_IP;
> > +     }
> > +
> > +     if (filtered_sample_type & PERF_SAMPLE_ADDR) {
> > +             data->addr = 0;
> > +             data->sample_flags |= PERF_SAMPLE_ADDR;
> > +     }
> > +}
> > +
> >  static void bpf_overflow_handler(struct perf_event *event,
> >                                struct perf_sample_data *data,
> >                                struct pt_regs *regs)
> > @@ -10023,13 +10051,7 @@ static void bpf_overflow_handler(struct perf_event *event,
> >       rcu_read_lock();
> >       prog = READ_ONCE(event->prog);
> >       if (prog) {
> > -             if (prog->call_get_stack &&
> > -                 (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) &&
> > -                 !(data->sample_flags & PERF_SAMPLE_CALLCHAIN)) {
> > -                     data->callchain = perf_callchain(event, regs);
> > -                     data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
> > -             }
> > -
> > +             bpf_prepare_sample(prog, event, data, regs);
> >               ret = bpf_prog_run(prog, &ctx);
> >       }
> >       rcu_read_unlock();
> > --
> > 2.38.1.273.g43a17bfeac-goog
> >


* Re: [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper
  2022-11-03 21:21         ` Yonghong Song
@ 2022-11-04  6:18           ` Namhyung Kim
  0 siblings, 0 replies; 19+ messages in thread
From: Namhyung Kim @ 2022-11-04  6:18 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Song Liu, Jiri Olsa, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Peter Zijlstra, Martin Lau, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Steven Rostedt,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Thu, Nov 3, 2022 at 2:21 PM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 11/3/22 1:55 PM, Song Liu wrote:
> >
> >
> >> On Nov 3, 2022, at 12:45 PM, Yonghong Song <yhs@meta.com> wrote:
> >>
> >>
> >>
> >> On 11/1/22 3:02 AM, Jiri Olsa wrote:
> >>> On Mon, Oct 31, 2022 at 10:23:39PM -0700, Namhyung Kim wrote:
> >>>> The bpf_perf_event_read_sample() helper is to get the specified sample
> >>>> data (by using PERF_SAMPLE_* flag in the argument) from BPF to make a
> >>>> decision for filtering on samples.  Currently PERF_SAMPLE_IP and
> >>>> PERF_SAMPLE_ADDR flags are supported only.
> >>>>
> >>>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> >>>> ---
> >>>>   include/uapi/linux/bpf.h       | 23 ++++++++++++++++
> >>>>   kernel/trace/bpf_trace.c       | 49 ++++++++++++++++++++++++++++++++++
> >>>>   tools/include/uapi/linux/bpf.h | 23 ++++++++++++++++
> >>>>   3 files changed, 95 insertions(+)
> >>>>
> >>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>>> index 94659f6b3395..cba501de9373 100644
> >>>> --- a/include/uapi/linux/bpf.h
> >>>> +++ b/include/uapi/linux/bpf.h
> >>>> @@ -5481,6 +5481,28 @@ union bpf_attr {
> >>>>    *               0 on success.
> >>>>    *
> >>>>    *               **-ENOENT** if the bpf_local_storage cannot be found.
> >>>> + *
> >>>> + * long bpf_perf_event_read_sample(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 sample_flags)
> >>>> + *        Description
> >>>> + *                For an eBPF program attached to a perf event, retrieve the
> >>>> + *                sample data associated to *ctx* and store it in the buffer
> >>>> + *                pointed by *buf* up to size *size* bytes.
> >>>> + *
> >>>> + *                The *sample_flags* should contain a single value in the
> >>>> + *                **enum perf_event_sample_format**.
> >>>> + *        Return
> >>>> + *                On success, number of bytes written to *buf*. On error, a
> >>>> + *                negative value.
> >>>> + *
> >>>> + *                The *buf* can be set to **NULL** to return the number of bytes
> >>>> + *                required to store the requested sample data.
> >>>> + *
> >>>> + *                **-EINVAL** if *sample_flags* is not a PERF_SAMPLE_* flag.
> >>>> + *
> >>>> + *                **-ENOENT** if the associated perf event doesn't have the data.
> >>>> + *
> >>>> + *                **-ENOSYS** if system doesn't support the sample data to be
> >>>> + *                retrieved.
> >>>>    */
> >>>>   #define ___BPF_FUNC_MAPPER(FN, ctx...)                   \
> >>>>    FN(unspec, 0, ##ctx)                            \
> >>>> @@ -5695,6 +5717,7 @@ union bpf_attr {
> >>>>    FN(user_ringbuf_drain, 209, ##ctx)              \
> >>>>    FN(cgrp_storage_get, 210, ##ctx)                \
> >>>>    FN(cgrp_storage_delete, 211, ##ctx)             \
> >>>> +  FN(perf_event_read_sample, 212, ##ctx)          \
> >>>>    /* */
> >>>>     /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
> >>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> >>>> index ce0228c72a93..befd937afa3c 100644
> >>>> --- a/kernel/trace/bpf_trace.c
> >>>> +++ b/kernel/trace/bpf_trace.c
> >>>> @@ -28,6 +28,7 @@
> >>>>     #include <uapi/linux/bpf.h>
> >>>>   #include <uapi/linux/btf.h>
> >>>> +#include <uapi/linux/perf_event.h>
> >>>>     #include <asm/tlb.h>
> >>>>   @@ -1743,6 +1744,52 @@ static const struct bpf_func_proto bpf_read_branch_records_proto = {
> >>>>    .arg4_type      = ARG_ANYTHING,
> >>>>   };
> >>>>   +BPF_CALL_4(bpf_perf_event_read_sample, struct bpf_perf_event_data_kern *, ctx,
> >>>> +     void *, buf, u32, size, u64, flags)
> >>>> +{
> >>> I wonder if we could add a perf_btf (like we have tp_btf) program type that
> >>> could access ctx->data directly without helpers
> >>
> >> Martin and I have discussed an idea to introduce a generic helper like
> >>     bpf_get_kern_ctx(void *ctx)
> >> Given a context, the helper will return a PTR_TO_BTF_ID representing the
> >> corresponding kernel ctx. So in the above example, user could call
> >>
> >>     struct bpf_perf_event_data_kern *kctx = bpf_get_kern_ctx(ctx);
> >>     ...
> >
> > This is an interesting idea!
> >
> >> To implement bpf_get_kern_ctx helper, the verifier can find the type
> >> of the context and provide a hidden btf_id as the second parameter of
> >> the actual kernel helper function like
> >>     bpf_get_kern_ctx(ctx) {
> >>        return ctx;
> >>     }
> >>     /* based on ctx_btf_id, find kctx_btf_id and return it to verifier */
> >
> > I think we will need a map of ctx_btf_id => kctx_btf_id. Shall we somehow
> > expose this to the user?
>
> Yes, inside the kernel we need ctx_btf_id -> kctx_btf_id mapping.
> Good question. We might not want to this mapping as a stable API.
> So using kfunc might be more appropriate.

Ok, now I don't think I'm following well.. ;-)

So currently perf event type BPF programs can have perf_event
data context directly as an argument, but we want to disallow it?
I guess the context id mapping can be done implicitly based on
the prog type and/or attach type, but probably I'm missing
something here. :)

Thanks,
Namhyung


end of thread (newest: ~2022-11-04  6:18 UTC)

Thread overview: 19+ messages
2022-11-01  5:23 [PATCH bpf-next 0/3] bpf: Add bpf_perf_event_read_sample() helper (v1) Namhyung Kim
2022-11-01  5:23 ` [PATCH bpf-next 1/3] perf/core: Prepare sample data before calling BPF Namhyung Kim
2022-11-01 10:03   ` Jiri Olsa
2022-11-04  6:03     ` Namhyung Kim
2022-11-01  5:23 ` [PATCH bpf-next 2/3] bpf: Add bpf_perf_event_read_sample() helper Namhyung Kim
2022-11-01 10:02   ` Jiri Olsa
2022-11-01 18:26     ` Alexei Starovoitov
2022-11-01 18:46       ` Song Liu
2022-11-01 18:52         ` Alexei Starovoitov
2022-11-01 20:04           ` Song Liu
2022-11-01 22:16             ` Namhyung Kim
2022-11-02  0:13               ` Song Liu
2022-11-02 22:18                 ` Namhyung Kim
2022-11-03 18:41                   ` Song Liu
2022-11-03 19:45     ` Yonghong Song
2022-11-03 20:55       ` Song Liu
2022-11-03 21:21         ` Yonghong Song
2022-11-04  6:18           ` Namhyung Kim
2022-11-01  5:23 ` [PATCH bpf-next 3/3] bpf: Add perf_event_read_sample test cases Namhyung Kim
