* [PATCH bpf-next 0/2] bpf: Allow access to perf sample data (v2)
From: Namhyung Kim @ 2022-12-20 22:01 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Jiri Olsa, Peter Zijlstra
  Cc: Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Ingo Molnar,
	Arnaldo Carvalho de Melo

Hello,

I'm working on perf event sample filtering using BPF.  To do that, a BPF
program needs to access the perf sample data and return 0 or 1 to drop or
keep each sample.

Changes in v2)
 - reuse perf_prepare_sample() instead of adding new bpf_prepare_sample()
 - drop bpf_perf_event_read_helper() and access ctx->data directly using
   bpf_cast_to_kern_ctx().

v1) https://lore.kernel.org/r/20221101052340.1210239-1-namhyung@kernel.org

Thanks to the bpf_cast_to_kern_ctx() kfunc, a BPF program can now access
the sample data easily.  But the problem is that perf has not populated
the sample data at the time it calls bpf_prog_run().  I changed the code
to simply call perf_prepare_sample() before running the BPF program.

But it also checks whether the BPF program calls bpf_cast_to_kern_ctx(),
since calling perf_prepare_sample() is unnecessary if the program doesn't
access the sample.  Previously, perf_prepare_sample() was called only
right before writing the sample to the perf ring buffer.  I think I can
add a small optimization to skip fields that are already set, since the
function can now be called twice.  That can be a separate patch for perf.

Another issue is that the perf sample data has only selected fields
populated, according to the sample_type flags in the perf_event_attr.
Accessing other fields can result in an uninitialized read.  I'm not sure
how much of a problem this will be, but there seems to be no way to
prevent it completely.  So properly written programs should check the
sample_type flags first when reading the sample data, as the selftest in
patch 2 does.

The code is available in the 'bpf/perf-sample-v2' branch of

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (2):
  bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
  selftests/bpf: Add perf_event_read_sample test cases

 include/linux/bpf.h                           |   1 +
 kernel/bpf/verifier.c                         |   1 +
 kernel/events/core.c                          |   3 +
 .../selftests/bpf/prog_tests/perf_sample.c    | 167 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_sample.c    |  33 ++++
 5 files changed, 205 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_sample.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_sample.c

-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH bpf-next 1/2] bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
From: Namhyung Kim @ 2022-12-20 22:01 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Jiri Olsa, Peter Zijlstra
  Cc: Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Ingo Molnar,
	Arnaldo Carvalho de Melo

When the BPF program calls bpf_cast_to_kern_ctx(), it assumes the program will
access perf sample data directly and call perf_prepare_sample() to make sure
the sample data is populated.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 include/linux/bpf.h   | 1 +
 kernel/bpf/verifier.c | 1 +
 kernel/events/core.c  | 3 +++
 3 files changed, 5 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5fec2d1be6d7..6bd4c21a6dd4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1341,6 +1341,7 @@ struct bpf_prog {
 				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
 				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
 				call_get_func_ip:1, /* Do we call get_func_ip() */
+				call_cast_kctx:1, /* Do we call bpf_cast_to_kern_ctx() */
 				tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index faa358b3d5d7..23a9dc187292 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9236,6 +9236,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
 				regs[BPF_REG_0].btf = desc_btf;
 				regs[BPF_REG_0].btf_id = meta.ret_btf_id;
+				env->prog->call_cast_kctx = 1;
 			} else if (meta.func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
 				ret_t = btf_type_by_id(desc_btf, meta.arg_constant.value);
 				if (!ret_t || !btf_type_is_struct(ret_t)) {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e47914ac8732..a654a0cb6842 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10332,6 +10332,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 		.event = event,
 	};
 	struct bpf_prog *prog;
+	struct perf_event_header dummy;
 	int ret = 0;
 
 	ctx.regs = perf_arch_bpf_user_pt_regs(regs);
@@ -10346,6 +10347,8 @@ static void bpf_overflow_handler(struct perf_event *event,
 			data->callchain = perf_callchain(event, regs);
 			data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
 		}
+		if (prog->call_cast_kctx)
+			perf_prepare_sample(&dummy, data, event, regs);
 
 		ret = bpf_prog_run(prog, &ctx);
 	}
-- 
2.39.0.314.g84b9a713c41-goog



* [PATCH bpf-next 2/2] selftests/bpf: Add perf_event_read_sample test cases
From: Namhyung Kim @ 2022-12-20 22:01 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Jiri Olsa, Peter Zijlstra
  Cc: Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Ingo Molnar,
	Arnaldo Carvalho de Melo

It checks perf event sample access with bpf_cast_to_kern_ctx().
A program should access sample data only as event->attr.sample_type
allows; other fields might not be initialized.

  $ ./vmtest.sh ./test_progs -t perf_event_read_sample
  ...
  #135/1   perf_event_read_sample/perf_event_read_sample_ok:OK
  #135/2   perf_event_read_sample/perf_event_read_sample_invalid:OK
  #135     perf_event_read_sample:OK

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 .../selftests/bpf/prog_tests/perf_sample.c    | 167 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_sample.c    |  33 ++++
 2 files changed, 200 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_sample.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_sample.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_sample.c b/tools/testing/selftests/bpf/prog_tests/perf_sample.c
new file mode 100644
index 000000000000..dc1e88711e23
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_sample.c
@@ -0,0 +1,167 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+
+#include <test_progs.h>
+#include "test_perf_sample.skel.h"
+
+#ifndef noinline
+#define noinline __attribute__((noinline))
+#endif
+
+/* treat user-stack data as invalid (for testing only) */
+#define PERF_SAMPLE_INVALID  PERF_SAMPLE_STACK_USER
+
+#define PERF_MMAP_SIZE  8192
+#define DATA_MMAP_SIZE  4096
+
+static int perf_fd = -1;
+static void *perf_ringbuf;
+static struct test_perf_sample *skel;
+
+static int open_perf_event(u64 sample_flags)
+{
+	struct perf_event_attr attr = {
+		.type = PERF_TYPE_SOFTWARE,
+		.config = PERF_COUNT_SW_PAGE_FAULTS,
+		.sample_type = sample_flags,
+		.sample_period = 1,
+		.disabled = 1,
+		.size = sizeof(attr),
+	};
+	int fd;
+	void *ptr;
+
+	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
+	if (!ASSERT_GT(fd, 0, "perf_event_open"))
+		return -1;
+
+	ptr = mmap(NULL, PERF_MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
+	if (!ASSERT_NEQ(ptr, MAP_FAILED, "mmap")) {
+		close(fd);
+		return -1;
+	}
+
+	perf_fd = fd;
+	perf_ringbuf = ptr;
+
+	return 0;
+}
+
+static void close_perf_event(void)
+{
+	if (perf_fd == -1)
+		return;
+
+	munmap(perf_ringbuf, PERF_MMAP_SIZE);
+	close(perf_fd);
+
+	perf_fd = -1;
+	perf_ringbuf = NULL;
+}
+
+static noinline void *trigger_perf_event(void)
+{
+	int *buf = mmap(NULL, DATA_MMAP_SIZE, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
+
+	if (!ASSERT_NEQ(buf, MAP_FAILED, "mmap"))
+		return NULL;
+
+	ioctl(perf_fd, PERF_EVENT_IOC_ENABLE);
+
+	/* it should generate a page fault which triggers the perf_event */
+	*buf = 1;
+
+	ioctl(perf_fd, PERF_EVENT_IOC_DISABLE);
+
+	munmap(buf, DATA_MMAP_SIZE);
+
+	/* return the map address to check the sample addr */
+	return buf;
+}
+
+/* check if the perf ringbuf has a sample data */
+static int check_perf_event(void)
+{
+	struct perf_event_mmap_page *page = perf_ringbuf;
+	struct perf_event_header *hdr;
+
+	if (page->data_head == page->data_tail)
+		return 0;
+
+	hdr = perf_ringbuf + page->data_offset;
+
+	if (hdr->type != PERF_RECORD_SAMPLE)
+		return 0;
+
+	return 1;
+}
+
+static void setup_perf_sample_bpf_skel(void)
+{
+	struct bpf_link *link;
+
+	skel = test_perf_sample__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_perf_sample_open_and_load"))
+		return;
+
+	link = bpf_program__attach_perf_event(skel->progs.perf_sample_filter, perf_fd);
+	if (!ASSERT_OK_PTR(link, "bpf_program__attach_perf_event"))
+		return;
+}
+
+static void clean_perf_sample_bpf_skel(void)
+{
+	test_perf_sample__detach(skel);
+	test_perf_sample__destroy(skel);
+}
+
+static void test_perf_event_read_sample_ok(void)
+{
+	u64 flags = PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR;
+	uintptr_t map_addr;
+
+	if (open_perf_event(flags) < 0)
+		return;
+	setup_perf_sample_bpf_skel();
+	map_addr = (uintptr_t)trigger_perf_event();
+
+	ASSERT_EQ(check_perf_event(), 1, "number of sample");
+	ASSERT_EQ(skel->bss->sample_addr, map_addr, "sample addr");
+	ASSERT_EQ((int)skel->bss->sample_pid, getpid(), "sample pid");
+	/* just assume the IP in (trigger_perf_event, +4096) */
+	ASSERT_GT(skel->bss->sample_ip, (uintptr_t)trigger_perf_event, "sample ip");
+	ASSERT_LT(skel->bss->sample_ip, (uintptr_t)trigger_perf_event + 4096, "sample ip");
+
+	clean_perf_sample_bpf_skel();
+	close_perf_event();
+}
+
+static void test_perf_event_read_sample_invalid(void)
+{
+	u64 flags = PERF_SAMPLE_INVALID;
+
+	if (open_perf_event(flags) < 0)
+		return;
+	setup_perf_sample_bpf_skel();
+	trigger_perf_event();
+
+	ASSERT_EQ(check_perf_event(), 0, "number of sample");
+
+	clean_perf_sample_bpf_skel();
+	close_perf_event();
+}
+
+void test_perf_event_read_sample(void)
+{
+	if (test__start_subtest("perf_event_read_sample_ok"))
+		test_perf_event_read_sample_ok();
+	if (test__start_subtest("perf_event_read_sample_invalid"))
+		test_perf_event_read_sample_invalid();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_sample.c b/tools/testing/selftests/bpf/progs/test_perf_sample.c
new file mode 100644
index 000000000000..b1f498d447b9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_sample.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2022 Google
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+unsigned long long sample_ip;
+unsigned long long sample_pid;
+unsigned long long sample_addr;
+
+void *bpf_cast_to_kern_ctx(void *) __ksym;
+
+#define SAMPLE_FLAGS  (PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR)
+
+SEC("perf_event")
+int perf_sample_filter(void *ctx)
+{
+	struct bpf_perf_event_data_kern *kctx;
+
+	kctx = bpf_cast_to_kern_ctx(ctx);
+
+	if ((kctx->event->attr.sample_type & SAMPLE_FLAGS) != SAMPLE_FLAGS)
+		return 0;
+
+	sample_ip = kctx->data->ip;
+	sample_pid = kctx->data->tid_entry.pid;
+	sample_addr = kctx->data->addr;
+
+	/* generate sample data */
+	return 1;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.39.0.314.g84b9a713c41-goog



* Re: [PATCH bpf-next 1/2] bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
From: Peter Zijlstra @ 2022-12-22 12:55 UTC
  To: Namhyung Kim
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
	Jiri Olsa, Martin KaFai Lau, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Tue, Dec 20, 2022 at 02:01:43PM -0800, Namhyung Kim wrote:
> When the BPF program calls bpf_cast_to_kern_ctx(), it assumes the program will
> access perf sample data directly and call perf_prepare_sample() to make sure
> the sample data is populated.

I don't understand a word of this :/ What are you doing and why?

> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  include/linux/bpf.h   | 1 +
>  kernel/bpf/verifier.c | 1 +
>  kernel/events/core.c  | 3 +++
>  3 files changed, 5 insertions(+)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5fec2d1be6d7..6bd4c21a6dd4 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1341,6 +1341,7 @@ struct bpf_prog {
>  				enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */
>  				call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */
>  				call_get_func_ip:1, /* Do we call get_func_ip() */
> +				call_cast_kctx:1, /* Do we call bpf_cast_to_kern_ctx() */
>  				tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */
>  	enum bpf_prog_type	type;		/* Type of BPF program */
>  	enum bpf_attach_type	expected_attach_type; /* For some prog types */
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index faa358b3d5d7..23a9dc187292 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -9236,6 +9236,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  				regs[BPF_REG_0].type = PTR_TO_BTF_ID | PTR_TRUSTED;
>  				regs[BPF_REG_0].btf = desc_btf;
>  				regs[BPF_REG_0].btf_id = meta.ret_btf_id;
> +				env->prog->call_cast_kctx = 1;
>  			} else if (meta.func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
>  				ret_t = btf_type_by_id(desc_btf, meta.arg_constant.value);
>  				if (!ret_t || !btf_type_is_struct(ret_t)) {
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index e47914ac8732..a654a0cb6842 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10332,6 +10332,7 @@ static void bpf_overflow_handler(struct perf_event *event,
>  		.event = event,
>  	};
>  	struct bpf_prog *prog;
> +	struct perf_event_header dummy;
>  	int ret = 0;
>  
>  	ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> @@ -10346,6 +10347,8 @@ static void bpf_overflow_handler(struct perf_event *event,
>  			data->callchain = perf_callchain(event, regs);
>  			data->sample_flags |= PERF_SAMPLE_CALLCHAIN;
>  		}
> +		if (prog->call_cast_kctx)
> +			perf_prepare_sample(&dummy, data, event, regs);
>  
>  		ret = bpf_prog_run(prog, &ctx);
>  	}
> -- 
> 2.39.0.314.g84b9a713c41-goog
> 


* Re: [PATCH bpf-next 1/2] bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
From: Daniel Borkmann @ 2022-12-22 13:17 UTC
  To: Peter Zijlstra, Namhyung Kim
  Cc: Alexei Starovoitov, Andrii Nakryiko, Song Liu, Jiri Olsa,
	Martin KaFai Lau, Yonghong Song, John Fastabend, KP Singh,
	Hao Luo, Stanislav Fomichev, LKML, bpf, Ingo Molnar,
	Arnaldo Carvalho de Melo

On 12/22/22 1:55 PM, Peter Zijlstra wrote:
> On Tue, Dec 20, 2022 at 02:01:43PM -0800, Namhyung Kim wrote:
>> When the BPF program calls bpf_cast_to_kern_ctx(), it assumes the program will
>> access perf sample data directly and call perf_prepare_sample() to make sure
>> the sample data is populated.
> 
> I don't understand a word of this :/ What are you doing and why?

Yeah, above commit message is too terse and unclear. Also, not following where
this assumption comes from; bpf_cast_to_kern_ctx() can be used elsewhere, too,
not just tracing. Recent example from CI side can be found [0].

Thanks,
Daniel

   [0] bpf tree, 70a00e2f1dba ("selftests/bpf: Test bpf_skb_adjust_room on CHECKSUM_PARTIAL")
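
For instance, a tc program can do the same kind of cast; a minimal sketch
(illustrative only, not taken from [0]) would be:

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  void *bpf_cast_to_kern_ctx(void *) __ksym;

  SEC("tc")
  int kctx_user(struct __sk_buff *ctx)
  {
  	/* get the in-kernel sk_buff behind the __sk_buff context */
  	struct sk_buff *kskb = bpf_cast_to_kern_ctx(ctx);

  	/* read a field directly from the kernel struct */
  	if (kskb->len == 0)
  		return 2;	/* TC_ACT_SHOT */

  	return 0;		/* TC_ACT_OK */
  }

  char _license[] SEC("license") = "GPL";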


* Re: [PATCH bpf-next 1/2] bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
From: Namhyung Kim @ 2022-12-22 17:34 UTC
  To: Daniel Borkmann
  Cc: Peter Zijlstra, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Jiri Olsa, Martin KaFai Lau, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Ingo Molnar,
	Arnaldo Carvalho de Melo

Hello,

On Thu, Dec 22, 2022 at 5:17 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/22/22 1:55 PM, Peter Zijlstra wrote:
> > On Tue, Dec 20, 2022 at 02:01:43PM -0800, Namhyung Kim wrote:
> >> When the BPF program calls bpf_cast_to_kern_ctx(), it assumes the program will
> >> access perf sample data directly and call perf_prepare_sample() to make sure
> >> the sample data is populated.
> >
> > I don't understand a word of this :/ What are you doing and why?
>
> Yeah, above commit message is too terse and unclear. Also, not following where
> this assumption comes from; bpf_cast_to_kern_ctx() can be used elsewhere, too,
> not just tracing. Recent example from CI side can be found [0].

Sorry about that.  Let me rephrase it like below:

With bpf_cast_to_kern_ctx(), BPF programs attached to a perf event
can access perf sample data directly from the ctx.  But the perf sample
data is not fully prepared at this point, and some fields can have invalid
uninitialized values.  So it needs to call perf_prepare_sample() before
calling the BPF overflow handler.

But just calling perf_prepare_sample() can be costly when the BPF
doesn't access the sample data.  It's needed only if the BPF program
uses the sample data, but it seems hard for the BPF verifier to detect
whether a program would access perf sample data.  So I just checked
whether it calls bpf_cast_to_kern_ctx() and assumed that it does.

Thanks,
Namhyung


* Re: [PATCH bpf-next 1/2] bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
From: Peter Zijlstra @ 2022-12-22 20:16 UTC
  To: Namhyung Kim
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Jiri Olsa, Martin KaFai Lau, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Thu, Dec 22, 2022 at 09:34:42AM -0800, Namhyung Kim wrote:

> Sorry about that.  Let me rephrase it like below:
> 
> With bpf_cast_to_kern_ctx(), BPF programs attached to a perf event
> can access perf sample data directly from the ctx. 

This is the bpf_prog_run() in bpf_overflow_handler(), right?

> But the perf sample
> data is not fully prepared at this point, and some fields can have invalid
> uninitialized values.  So it needs to call perf_prepare_sample() before
> calling the BPF overflow handler.

It never was, why is it a problem now?

> But just calling perf_prepare_sample() can be costly when the BPF

So you potentially call it twice now, how's that useful?



* Re: [PATCH bpf-next 1/2] bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
From: Namhyung Kim @ 2022-12-22 22:25 UTC
  To: Peter Zijlstra
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko, Song Liu,
	Jiri Olsa, Martin KaFai Lau, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf, Ingo Molnar,
	Arnaldo Carvalho de Melo

On Thu, Dec 22, 2022 at 12:16 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Dec 22, 2022 at 09:34:42AM -0800, Namhyung Kim wrote:
>
> > Sorry about that.  Let me rephrase it like below:
> >
> > With bpf_cast_to_kern_ctx(), BPF programs attached to a perf event
> > can access perf sample data directly from the ctx.
>
> This is the bpf_prog_run() in bpf_overflow_handler(), right?

Yes.

>
> > But the perf sample
> > data is not fully prepared at this point, and some fields can have invalid
> > uninitialized values.  So it needs to call perf_prepare_sample() before
> > calling the BPF overflow handler.
>
> It never was, why is it a problem now?

BPF used to allow only selected fields like period and addr, and they
are always initialized by perf_sample_data_init().  This is relaxed by
bpf_cast_to_kern_ctx(), and a program can now easily access arbitrary
fields of perf_sample_data.
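
For reference, perf_sample_data_init() at this point does roughly the
following (paraphrased from include/linux/perf_event.h):

  static inline void perf_sample_data_init(struct perf_sample_data *data,
  					   u64 addr, u64 period)
  {
  	/* remaining struct members are filled in perf_prepare_sample() */
  	data->sample_flags = PERF_SAMPLE_PERIOD;
  	data->period = period;

  	if (addr) {
  		data->addr = addr;
  		data->sample_flags |= PERF_SAMPLE_ADDR;
  	}
  }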

The background of this change is to use BPF as a filter for perf
event samples.  The code is there already and returning 0 from
BPF can drop perf samples.  With access to more sample data,
it'd make more educated decisions.

For example, I got some requests to limit perf samples in a
selected region of address (code or data).  Or it can collect
samples only if some hardware specific information is set in
the raw data like in AMD IBS.  We can easily extend it to other
sample info based on users' needs.
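
To illustrate the address case, such a filter could look roughly like
this (a sketch only, with hypothetical bounds and assuming the event
requested PERF_SAMPLE_ADDR):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  void *bpf_cast_to_kern_ctx(void *) __ksym;

  /* hypothetical region of interest */
  #define FILTER_START  0x100000UL
  #define FILTER_END    0x200000UL

  SEC("perf_event")
  int filter_addr(void *ctx)
  {
  	struct bpf_perf_event_data_kern *kctx = bpf_cast_to_kern_ctx(ctx);

  	/* addr is valid only if PERF_SAMPLE_ADDR was requested */
  	if (!(kctx->event->attr.sample_type & PERF_SAMPLE_ADDR))
  		return 0;

  	/* keep (1) samples inside the region, drop (0) the rest */
  	return kctx->data->addr >= FILTER_START &&
  	       kctx->data->addr < FILTER_END;
  }

  char _license[] SEC("license") = "GPL";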

>
> > But just calling perf_prepare_sample() can be costly when the BPF
>
> So you potentially call it twice now, how's that useful?

Right.  I think we can check data->sample_flags in
perf_prepare_sample() to minimize the duplicate work.
It already does it for some fields, but misses others.
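
For example, a field-by-field guard in perf_prepare_sample() could look
like this (sketch only, not what the current code does for every field):

  u64 filtered = event->attr.sample_type & ~data->sample_flags;

  if (filtered & PERF_SAMPLE_ADDR) {
  	data->addr = 0;
  	data->sample_flags |= PERF_SAMPLE_ADDR;
  }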

Thanks,
Namhyung


* Re: [PATCH bpf-next 1/2] bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
From: Jiri Olsa @ 2022-12-23  7:53 UTC
  To: Namhyung Kim
  Cc: Peter Zijlstra, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, Song Liu, Martin KaFai Lau, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Thu, Dec 22, 2022 at 02:25:49PM -0800, Namhyung Kim wrote:
> On Thu, Dec 22, 2022 at 12:16 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Dec 22, 2022 at 09:34:42AM -0800, Namhyung Kim wrote:
> >
> > > Sorry about that.  Let me rephrase it like below:
> > >
> > > With bpf_cast_to_kern_ctx(), BPF programs attached to a perf event
> > > can access perf sample data directly from the ctx.
> >
> > This is the bpf_prog_run() in bpf_overflow_handler(), right?
> 
> Yes.
> 
> >
> > > But the perf sample
> > > data is not fully prepared at this point, and some fields can have invalid
> > > uninitialized values.  So it needs to call perf_prepare_sample() before
> > > calling the BPF overflow handler.
> >
> > It never was, why is it a problem now?
> 
> BPF used to allow only selected fields like period and addr, and they
> are always initialized by perf_sample_data_init().  This is relaxed by
> bpf_cast_to_kern_ctx(), and a program can now easily access arbitrary
> fields of perf_sample_data.
> 
> The background of this change is to use BPF as a filter for perf
> event samples.  The code is there already and returning 0 from
> BPF can drop perf samples.  With access to more sample data,
> it'd make more educated decisions.
> 
> For example, I got some requests to limit perf samples in a
> selected region of address (code or data).  Or it can collect
> samples only if some hardware specific information is set in
> the raw data like in AMD IBS.  We can easily extend it to other
> sample info based on users' needs.
> 
> >
> > > But just calling perf_prepare_sample() can be costly when the BPF
> >
> > So you potentially call it twice now, how's that useful?
> 
> Right.  I think we can check data->sample_flags in
> perf_prepare_sample() to minimize the duplicate work.
> It already does it for some fields, but misses others.

we used to have __PERF_SAMPLE_CALLCHAIN_EARLY to avoid extra perf_callchain,
could we add some flag like __PERF_SAMPLE_INIT_EARLY to avoid double call to
perf_prepare_sample?
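
something like below maybe (a hypothetical sketch; neither the flag nor
this code exists yet):

  /* in bpf_overflow_handler(), before bpf_prog_run(): */
  if (prog->call_cast_kctx) {
  	perf_prepare_sample(&dummy, data, event, regs);
  	data->sample_flags |= __PERF_SAMPLE_INIT_EARLY;
  }

  /* later, in the normal output path: */
  if (!(data->sample_flags & __PERF_SAMPLE_INIT_EARLY))
  	perf_prepare_sample(header, data, event, regs);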

jirka 

> 
> Thanks,
> Namhyung


* Re: [PATCH bpf-next 1/2] bpf/perf: Call perf_prepare_sample() before bpf_prog_run()
From: Namhyung Kim @ 2022-12-27 23:19 UTC
  To: Jiri Olsa
  Cc: Peter Zijlstra, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, Song Liu, Martin KaFai Lau, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Stanislav Fomichev, LKML, bpf,
	Ingo Molnar, Arnaldo Carvalho de Melo

On Thu, Dec 22, 2022 at 11:53 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Dec 22, 2022 at 02:25:49PM -0800, Namhyung Kim wrote:
> > On Thu, Dec 22, 2022 at 12:16 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Dec 22, 2022 at 09:34:42AM -0800, Namhyung Kim wrote:
> > >
> > > > Sorry about that.  Let me rephrase it like below:
> > > >
> > > > With bpf_cast_to_kern_ctx(), BPF programs attached to a perf event
> > > > can access perf sample data directly from the ctx.
> > >
> > > This is the bpf_prog_run() in bpf_overflow_handler(), right?
> >
> > Yes.
> >
> > >
> > > > But the perf sample
> > > > data is not fully prepared at this point, and some fields can have invalid
> > > > uninitialized values.  So it needs to call perf_prepare_sample() before
> > > > calling the BPF overflow handler.
> > >
> > > It never was, why is it a problem now?
> >
> > BPF used to allow only selected fields like period and addr, and they
> > are always initialized by perf_sample_data_init().  This is relaxed by
> > bpf_cast_to_kern_ctx(), and a program can now easily access arbitrary
> > fields of perf_sample_data.
> >
> > The background of this change is to use BPF as a filter for perf
> > event samples.  The code is there already and returning 0 from
> > BPF can drop perf samples.  With access to more sample data,
> > it'd make more educated decisions.
> >
> > For example, I got some requests to limit perf samples in a
> > selected region of address (code or data).  Or it can collect
> > samples only if some hardware specific information is set in
> > the raw data like in AMD IBS.  We can easily extend it to other
> > sample info based on users' needs.
> >
> > >
> > > > But just calling perf_prepare_sample() can be costly when the BPF
> > >
> > > So you potentially call it twice now, how's that useful?
> >
> > Right.  I think we can check data->sample_flags in
> > perf_prepare_sample() to minimize the duplicate work.
> > It already does it for some fields, but misses others.
>
> we used to have __PERF_SAMPLE_CALLCHAIN_EARLY to avoid extra perf_callchain,
> could we add some flag like __PERF_SAMPLE_INIT_EARLY to avoid double call to
> perf_prepare_sample?

I think we can check whether the filtered_sample_type is 0.  But it
still needs to update the perf_event_header, so I think we need to
save the calculated size separately.
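
Something along these lines, where saved_size would be a new
perf_sample_data field recording the size calculated by the first call
(a hypothetical sketch):

  /* at the top of perf_prepare_sample(): */
  u64 filtered_sample_type = event->attr.sample_type & ~data->sample_flags;

  if (filtered_sample_type == 0) {
  	/* data is already populated; only the header needs a refresh */
  	header->size = data->saved_size;
  	return;
  }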

Thanks,
Namhyung

