[PATCH 0/2] Combine perf and bpf for fast eval of hw breakpoint conditions


* [PATCH 0/2] Combine perf and bpf for fast eval of hw breakpoint conditions
@ 2023-12-04 20:14 Kyle Huey
  2023-12-04 20:14 ` [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals Kyle Huey
  2023-12-04 20:14 ` [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO Kyle Huey
  0 siblings, 2 replies; 15+ messages in thread
From: Kyle Huey @ 2023-12-04 20:14 UTC (permalink / raw)
  To: Kyle Huey, linux-kernel; +Cc: Robert O'Callahan, bpf

rr, a userspace record and replay debugger[0], replays asynchronous events
such as signals and context switches by essentially[1] setting a breakpoint
at the address where the asynchronous event was delivered during recording
with a condition that the program state matches the state when the event
was delivered.

Currently, rr uses software breakpoints that trap (via ptrace) to the
supervisor, and evaluates the condition from the supervisor. If the
asynchronous event is delivered in a tight loop (thus requiring the
breakpoint condition to be repeatedly evaluated) the overhead can be
immense. A patch to rr that uses hardware breakpoints via perf events with
an attached bpf program to reject breakpoint hits where the condition is
not satisfied reduces rr's replay overhead by 94% on a pathological (but
real and customer-provided, not contrived) rr trace.

The only obstacle to this approach is that while the kernel allows a bpf
program to suppress sample output when a perf event overflows, it does not
suppress signalling the perf event fd. This appears to be a simple
oversight in the code. This patch set fixes that oversight and adds a
selftest.

[0] https://rr-project.org/
[1] Various optimizations exist to skip as much execution as possible
before setting a breakpoint, and to determine a set of program state that
is practical to check and verify.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals.
  2023-12-04 20:14 [PATCH 0/2] Combine perf and bpf for fast eval of hw breakpoint conditions Kyle Huey
@ 2023-12-04 20:14 ` Kyle Huey
  2023-12-04 22:18   ` Andrii Nakryiko
  2023-12-04 20:14 ` [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO Kyle Huey
  1 sibling, 1 reply; 15+ messages in thread
From: Kyle Huey @ 2023-12-04 20:14 UTC (permalink / raw)
  To: Kyle Huey, linux-kernel
  Cc: Robert O'Callahan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Ian Rogers, Adrian Hunter,
	linux-perf-users, bpf

Returning zero from a bpf program attached to a perf event already
suppresses any data output. This allows it to suppress I/O availability
signals too.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 kernel/events/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b704d83a28b2..34d7b19d45eb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10417,8 +10417,10 @@ static void bpf_overflow_handler(struct perf_event *event,
 	rcu_read_unlock();
 out:
 	__this_cpu_dec(bpf_prog_active);
-	if (!ret)
+	if (!ret) {
+		event->pending_kill = 0;
 		return;
+	}
 
 	event->orig_overflow_handler(event, data, regs);
 }
-- 
2.34.1


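
The effect of the kernel/events/core.c hunk in [PATCH 1/2] can be sketched
as a small userspace model. This is not kernel code: struct event_model,
orig_handler_model() and overflow_handler_model() are hypothetical names,
and the value 1 only stands in for whatever the perf core stored in
pending_kill before the bpf program ran.

```c
#include <assert.h>

/* Hypothetical stand-in for the perf_event state that matters here. */
struct event_model {
	int pending_kill;	/* nonzero: an I/O signal is pending for the fd owner */
	int samples;		/* samples emitted by the original overflow handler */
};

/* Stand-in for event->orig_overflow_handler(): records one sample. */
static void orig_handler_model(struct event_model *ev)
{
	ev->samples++;
}

/*
 * Model of the patched tail of bpf_overflow_handler(). ret is the value
 * returned by the attached bpf program.
 */
static void overflow_handler_model(struct event_model *ev, int ret)
{
	if (!ret) {
		ev->pending_kill = 0;	/* added by the patch: drop the signal */
		return;			/* pre-existing: skip sample output */
	}
	orig_handler_model(ev);
}

/* Usage: a zero return suppresses both outputs, nonzero keeps both. */
static void demo_overflow_handler(void)
{
	struct event_model ev = { .pending_kill = 1, .samples = 0 };

	overflow_handler_model(&ev, 0);
	assert(ev.pending_kill == 0 && ev.samples == 0);

	ev.pending_kill = 1;
	overflow_handler_model(&ev, 1);
	assert(ev.pending_kill == 1 && ev.samples == 1);
}
```

The model leaves pending_kill untouched on the nonzero path, mirroring the
hunk, which only adds a store on the early-return path.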

* [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO.
  2023-12-04 20:14 [PATCH 0/2] Combine perf and bpf for fast eval of hw breakpoint conditions Kyle Huey
  2023-12-04 20:14 ` [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals Kyle Huey
@ 2023-12-04 20:14 ` Kyle Huey
  2023-12-04 22:14   ` Andrii Nakryiko
                     ` (2 more replies)
  1 sibling, 3 replies; 15+ messages in thread
From: Kyle Huey @ 2023-12-04 20:14 UTC (permalink / raw)
  To: Kyle Huey, linux-kernel
  Cc: Robert O'Callahan, Andrii Nakryiko, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, Shuah Khan, bpf, linux-kselftest

The test sets a hardware breakpoint and uses a bpf program to suppress the
I/O availability signal if the ip matches the expected value.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 .../selftests/bpf/prog_tests/perf_skip.c      | 95 +++++++++++++++++++
 .../selftests/bpf/progs/test_perf_skip.c      | 23 +++++
 2 files changed, 118 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
new file mode 100644
index 000000000000..b269a31669b7
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include "test_perf_skip.skel.h"
+#include <linux/hw_breakpoint.h>
+#include <sys/mman.h>
+
+#define BPF_OBJECT            "test_perf_skip.bpf.o"
+
+static void handle_sig(int)
+{
+	ASSERT_OK(1, "perf event not skipped");
+}
+
+static noinline int test_function(void)
+{
+	return 0;
+}
+
+void serial_test_perf_skip(void)
+{
+	sighandler_t previous;
+	int duration = 0;
+	struct test_perf_skip *skel = NULL;
+	int map_fd = -1;
+	long page_size = sysconf(_SC_PAGE_SIZE);
+	uintptr_t *ip = NULL;
+	int prog_fd = -1;
+	struct perf_event_attr attr = {0};
+	int perf_fd = -1;
+	struct f_owner_ex owner;
+	int err;
+
+	previous = signal(SIGIO, handle_sig);
+
+	skel = test_perf_skip__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	prog_fd = bpf_program__fd(skel->progs.handler);
+	if (!ASSERT_OK(prog_fd < 0, "bpf_program__fd"))
+		goto cleanup;
+
+	map_fd = bpf_map__fd(skel->maps.ip);
+	if (!ASSERT_OK(map_fd < 0, "bpf_map__fd"))
+		goto cleanup;
+
+	ip = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
+	if (!ASSERT_OK_PTR(ip, "mmap bpf map"))
+		goto cleanup;
+
+	*ip = (uintptr_t)test_function;
+
+	attr.type = PERF_TYPE_BREAKPOINT;
+	attr.size = sizeof(attr);
+	attr.bp_type = HW_BREAKPOINT_X;
+	attr.bp_addr = (uintptr_t)test_function;
+	attr.bp_len = sizeof(long);
+	attr.sample_period = 1;
+	attr.sample_type = PERF_SAMPLE_IP;
+	attr.pinned = 1;
+	attr.exclude_kernel = 1;
+	attr.exclude_hv = 1;
+	attr.precise_ip = 3;
+
+	perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
+	if (CHECK(perf_fd < 0, "perf_event_open", "err %d\n", perf_fd))
+		goto cleanup;
+
+	err = fcntl(perf_fd, F_SETFL, O_ASYNC);
+	if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
+		goto cleanup;
+
+	owner.type = F_OWNER_TID;
+	owner.pid = gettid();
+	err = fcntl(perf_fd, F_SETOWN_EX, &owner);
+	if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
+		goto cleanup;
+
+	err = ioctl(perf_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_SET_BPF)"))
+		goto cleanup;
+
+	test_function();
+
+cleanup:
+	if (perf_fd >= 0)
+		close(perf_fd);
+	if (ip)
+		munmap(ip, page_size);
+	if (skel)
+		test_perf_skip__destroy(skel);
+
+	signal(SIGIO, previous);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
new file mode 100644
index 000000000000..ef01a9161afe
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__uint(map_flags, BPF_F_MMAPABLE);
+	__type(key, uint32_t);
+	__type(value, uintptr_t);
+} ip SEC(".maps");
+
+SEC("perf_event")
+int handler(struct bpf_perf_event_data *data)
+{
+	const uint32_t index = 0;
+	uintptr_t *v = bpf_map_lookup_elem(&ip, &index);
+
+	return !(v && *v == PT_REGS_IP(&data->regs));
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1


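
The selftest in [PATCH 2/2] relies on O_ASYNC plus an fd owner to get
SIGIO delivered. A minimal sketch of that mechanism on its own follows,
using a socketpair instead of a perf event fd so it needs no special
hardware or privileges. count_sigio_on_socketpair() and on_sigio() are
hypothetical names, and the F_SETOWN fallback is an addition for libc
headers that do not expose F_SETOWN_EX.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

static volatile sig_atomic_t sigio_count;

static void on_sigio(int sig)
{
	(void)sig;
	sigio_count++;
}

/*
 * Returns how many SIGIO deliveries were seen after one end of a
 * socketpair became readable, or -1 if a setup step failed.
 */
static int count_sigio_on_socketpair(void)
{
	struct sigaction sa = { 0 }, old_sa;
	int sv[2], flags, tries, err, result = -1;

	sa.sa_handler = on_sigio;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGIO, &sa, &old_sa) < 0)
		return -1;
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		goto restore;

#ifdef F_SETOWN_EX
	{
		/* Route the signal to this thread, as the selftest does. */
		struct f_owner_ex owner = { .type = F_OWNER_TID };

		owner.pid = (pid_t)syscall(SYS_gettid);
		err = fcntl(sv[0], F_SETOWN_EX, &owner);
	}
#else
	/* Fallback: route the signal to the whole process. */
	err = fcntl(sv[0], F_SETOWN, getpid());
#endif
	if (err < 0)
		goto close_fds;

	/* O_ASYNC asks the kernel to raise SIGIO when sv[0] has input. */
	flags = fcntl(sv[0], F_GETFL);
	if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_ASYNC) < 0)
		goto close_fds;

	sigio_count = 0;
	if (write(sv[1], "x", 1) != 1)
		goto close_fds;

	/* Short grace period in case delivery is not immediate. */
	for (tries = 0; tries < 100 && sigio_count == 0; tries++)
		usleep(1000);
	result = sigio_count;

close_fds:
	close(sv[0]);
	close(sv[1]);
restore:
	sigaction(SIGIO, &old_sa, NULL);
	return result;
}
```

In the selftest the same two fcntl() steps are applied to the perf event
fd, so a breakpoint hit that is not suppressed would reach handle_sig().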

* Re: [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO.
  2023-12-04 20:14 ` [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO Kyle Huey
@ 2023-12-04 22:14   ` Andrii Nakryiko
  2023-12-05 18:21     ` Kyle Huey
  2023-12-05 11:17   ` Jiri Olsa
  2023-12-05 16:54   ` Yonghong Song
  2 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2023-12-04 22:14 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Kyle Huey, linux-kernel, Robert O'Callahan, Andrii Nakryiko,
	Mykola Lysenko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	bpf, linux-kselftest

On Mon, Dec 4, 2023 at 12:14 PM Kyle Huey <me@kylehuey.com> wrote:
>
> The test sets a hardware breakpoint and uses a bpf program to suppress the
> I/O availability signal if the ip matches the expected value.
>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---
>  .../selftests/bpf/prog_tests/perf_skip.c      | 95 +++++++++++++++++++
>  .../selftests/bpf/progs/test_perf_skip.c      | 23 +++++
>  2 files changed, 118 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> new file mode 100644
> index 000000000000..b269a31669b7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <test_progs.h>
> +#include "test_perf_skip.skel.h"
> +#include <linux/hw_breakpoint.h>
> +#include <sys/mman.h>
> +
> +#define BPF_OBJECT            "test_perf_skip.bpf.o"

leftover?

> +
> +static void handle_sig(int)
> +{
> +       ASSERT_OK(1, "perf event not skipped");
> +}
> +
> +static noinline int test_function(void)
> +{

please add

asm volatile ("");

here to prevent the compiler from actually inlining at the call site

> +       return 0;
> +}
> +
> +void serial_test_perf_skip(void)
> +{
> +       sighandler_t previous;
> +       int duration = 0;
> +       struct test_perf_skip *skel = NULL;
> +       int map_fd = -1;
> +       long page_size = sysconf(_SC_PAGE_SIZE);
> +       uintptr_t *ip = NULL;
> +       int prog_fd = -1;
> +       struct perf_event_attr attr = {0};
> +       int perf_fd = -1;
> +       struct f_owner_ex owner;
> +       int err;
> +
> +       previous = signal(SIGIO, handle_sig);
> +
> +       skel = test_perf_skip__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "skel_load"))
> +               goto cleanup;
> +
> +       prog_fd = bpf_program__fd(skel->progs.handler);
> +       if (!ASSERT_OK(prog_fd < 0, "bpf_program__fd"))
> +               goto cleanup;
> +
> +       map_fd = bpf_map__fd(skel->maps.ip);
> +       if (!ASSERT_OK(map_fd < 0, "bpf_map__fd"))
> +               goto cleanup;
> +
> +       ip = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
> +       if (!ASSERT_OK_PTR(ip, "mmap bpf map"))
> +               goto cleanup;
> +
> +       *ip = (uintptr_t)test_function;
> +
> +       attr.type = PERF_TYPE_BREAKPOINT;
> +       attr.size = sizeof(attr);
> +       attr.bp_type = HW_BREAKPOINT_X;
> +       attr.bp_addr = (uintptr_t)test_function;
> +       attr.bp_len = sizeof(long);
> +       attr.sample_period = 1;
> +       attr.sample_type = PERF_SAMPLE_IP;
> +       attr.pinned = 1;
> +       attr.exclude_kernel = 1;
> +       attr.exclude_hv = 1;
> +       attr.precise_ip = 3;
> +
> +       perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
> +       if (CHECK(perf_fd < 0, "perf_event_open", "err %d\n", perf_fd))

please don't use CHECK() macro, stick to ASSERT_xxx()

also, we are going to run all this on different hardware and VMs, see
how we skip tests if hardware support is not there. See test__skip
usage in prog_tests/perf_branches.c, as one example

> +               goto cleanup;
> +
> +       err = fcntl(perf_fd, F_SETFL, O_ASYNC);

I assume this is what will send SIGIO, right? Can you add a small
comment explicitly saying this?

> +       if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
> +               goto cleanup;
> +
> +       owner.type = F_OWNER_TID;
> +       owner.pid = gettid();
> +       err = fcntl(perf_fd, F_SETOWN_EX, &owner);
> +       if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
> +               goto cleanup;
> +
> +       err = ioctl(perf_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +       if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_SET_BPF)"))
> +               goto cleanup;

we have a better way to do this, please use
bpf_program__attach_perf_event() instead

> +
> +       test_function();
> +
> +cleanup:
> +       if (perf_fd >= 0)
> +               close(perf_fd);
> +       if (ip)
> +               munmap(ip, page_size);
> +       if (skel)
> +               test_perf_skip__destroy(skel);
> +
> +       signal(SIGIO, previous);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> new file mode 100644
> index 000000000000..ef01a9161afe
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +struct {
> +       __uint(type, BPF_MAP_TYPE_ARRAY);
> +       __uint(max_entries, 1);
> +       __uint(map_flags, BPF_F_MMAPABLE);
> +       __type(key, uint32_t);
> +       __type(value, uintptr_t);
> +} ip SEC(".maps");

please use global variable:

__u64 ip;

and then access it from user-space side through skeleton

skel->bss->ip = (__u64)test_function;

> +
> +SEC("perf_event")
> +int handler(struct bpf_perf_event_data *data)
> +{
> +       const uint32_t index = 0;
> +       uintptr_t *v = bpf_map_lookup_elem(&ip, &index);
> +
> +       return !(v && *v == PT_REGS_IP(&data->regs));

and so with the above global var suggestion this will be just:

return ip != PT_REGS_IP(&data->regs);

> +}
> +
> +char _license[] SEC("license") = "GPL";
> --
> 2.34.1
>

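
The global-variable suggestion from the review of [PATCH 2/2] can be
compared with the original map lookup in plain C. This is a userspace model
rather than a bpf program: event_ip is a hypothetical parameter standing in
for PT_REGS_IP(&data->regs), and the address 0x401000 is arbitrary. Note
the direction of the comparison. The original handler returns 0 (skip the
event) when the ip matches, so the global-variable form has to use != to
keep that convention.

```c
#include <assert.h>
#include <stdint.h>

/* Plays the role of the bpf global that user space fills in via the skeleton. */
static uint64_t ip;

/* Original shape: the expected ip arrives through a pointer that may be NULL. */
static int handler_map_model(const uint64_t *v, uint64_t event_ip)
{
	return !(v && *v == event_ip);
}

/* Global-variable shape: no lookup and no NULL check. */
static int handler_global_model(uint64_t event_ip)
{
	return ip != event_ip;
}

/* Usage: both shapes agree for a matching and a non-matching ip. */
static void demo_handlers(void)
{
	ip = 0x401000;	/* arbitrary address, for illustration only */

	assert(handler_global_model(0x401000) == 0);	/* match: skip */
	assert(handler_global_model(0x402000) == 1);	/* mismatch: keep */
	assert(handler_map_model(&ip, 0x401000) == handler_global_model(0x401000));
	assert(handler_map_model(&ip, 0x402000) == handler_global_model(0x402000));
}
```

The only case where the two shapes differ is a failed lookup (v == NULL),
which cannot happen with a global.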

* Re: [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals.
  2023-12-04 20:14 ` [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals Kyle Huey
@ 2023-12-04 22:18   ` Andrii Nakryiko
  2023-12-05 11:16     ` Jiri Olsa
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2023-12-04 22:18 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Kyle Huey, linux-kernel, Robert O'Callahan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Ian Rogers,
	Adrian Hunter, linux-perf-users, bpf

On Mon, Dec 4, 2023 at 12:14 PM Kyle Huey <me@kylehuey.com> wrote:
>
> Returning zero from a bpf program attached to a perf event already
> suppresses any data output. This allows it to suppress I/O availability
> signals too.

makes sense, just one question below

>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---
>  kernel/events/core.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index b704d83a28b2..34d7b19d45eb 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10417,8 +10417,10 @@ static void bpf_overflow_handler(struct perf_event *event,
>         rcu_read_unlock();
>  out:
>         __this_cpu_dec(bpf_prog_active);
> -       if (!ret)
> +       if (!ret) {
> +               event->pending_kill = 0;
>                 return;
> +       }

What's the distinction between event->pending_kill and
event->pending_wakeup? Should we do something about pending_wakeup?
Asking out of complete ignorance of all these perf specifics.


>
>         event->orig_overflow_handler(event, data, regs);
>  }
> --
> 2.34.1
>
>


* Re: [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals.
  2023-12-04 22:18   ` Andrii Nakryiko
@ 2023-12-05 11:16     ` Jiri Olsa
  2023-12-05 18:07       ` Namhyung Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2023-12-05 11:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kyle Huey, Kyle Huey, linux-kernel, Robert O'Callahan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Ian Rogers,
	Adrian Hunter, linux-perf-users, bpf

On Mon, Dec 04, 2023 at 02:18:49PM -0800, Andrii Nakryiko wrote:
> On Mon, Dec 4, 2023 at 12:14 PM Kyle Huey <me@kylehuey.com> wrote:
> >
> > Returning zero from a bpf program attached to a perf event already
> > suppresses any data output. This allows it to suppress I/O availability
> > signals too.
> 
> make sense, just one question below
> 
> >
> > Signed-off-by: Kyle Huey <khuey@kylehuey.com>

Acked-by: Jiri Olsa <jolsa@kernel.org>

> > ---
> >  kernel/events/core.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index b704d83a28b2..34d7b19d45eb 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -10417,8 +10417,10 @@ static void bpf_overflow_handler(struct perf_event *event,
> >         rcu_read_unlock();
> >  out:
> >         __this_cpu_dec(bpf_prog_active);
> > -       if (!ret)
> > +       if (!ret) {
> > +               event->pending_kill = 0;
> >                 return;
> > +       }
> 
> What's the distinction between event->pending_kill and
> event->pending_wakeup? Should we do something about pending_wakeup?
> Asking out of complete ignorance of all these perf specifics.
> 

I think zeroing pending_kill is enough. When it's set, the perf code sets
pending_wakeup so that perf_event_wakeup gets called from irq code; that
wakes up the event's ring buffer readers and sends SIGIO if pending_kill is set.

jirka

> 
> >
> >         event->orig_overflow_handler(event, data, regs);
> >  }
> > --
> > 2.34.1
> >
> >

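
The relationship between pending_kill and pending_wakeup described in the
reply above can be sketched as a two-stage model: the overflow path marks
work as pending and the deferred irq path acts on it. This is a simplified
reading of that description, not the kernel's code. struct wakeup_model,
overflow_model() and irq_work_model() are hypothetical names, and other
sources of wakeups (such as ring buffer output) are deliberately left out.

```c
#include <assert.h>

struct wakeup_model {
	int pending_kill;	/* nonzero: a signal should be sent */
	int pending_wakeup;	/* nonzero: deferred wakeup work is queued */
	int reader_wakeups;	/* times ring buffer readers were woken */
	int sigio_sent;		/* times SIGIO was sent */
};

/* Overflow path. bpf_ret is the attached program's return value. */
static void overflow_model(struct wakeup_model *ev, int bpf_ret)
{
	ev->pending_kill = 1;		/* perf core marks a signal as pending */
	if (!bpf_ret)
		ev->pending_kill = 0;	/* the patch: the bpf program vetoed it */
	if (ev->pending_kill)
		ev->pending_wakeup = 1;	/* queue the deferred work */
}

/* Deferred path, standing in for the irq work that calls perf_event_wakeup(). */
static void irq_work_model(struct wakeup_model *ev)
{
	if (!ev->pending_wakeup)
		return;
	ev->pending_wakeup = 0;
	ev->reader_wakeups++;
	if (ev->pending_kill) {
		ev->sigio_sent++;
		ev->pending_kill = 0;
	}
}

/* Usage: a vetoed overflow queues nothing, an accepted one wakes and signals. */
static void demo_wakeup(void)
{
	struct wakeup_model ev = { 0 };

	overflow_model(&ev, 0);
	irq_work_model(&ev);
	assert(ev.reader_wakeups == 0 && ev.sigio_sent == 0);

	overflow_model(&ev, 1);
	irq_work_model(&ev);
	assert(ev.reader_wakeups == 1 && ev.sigio_sent == 1);
}
```

In this model, clearing pending_kill before the check is what keeps
pending_wakeup from being set, which is why no separate handling of
pending_wakeup appears in the patch.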

* Re: [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO.
  2023-12-04 20:14 ` [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO Kyle Huey
  2023-12-04 22:14   ` Andrii Nakryiko
@ 2023-12-05 11:17   ` Jiri Olsa
  2023-12-05 16:54   ` Yonghong Song
  2 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2023-12-05 11:17 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Kyle Huey, linux-kernel, Robert O'Callahan, Andrii Nakryiko,
	Mykola Lysenko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Shuah Khan, bpf,
	linux-kselftest

On Mon, Dec 04, 2023 at 12:14:06PM -0800, Kyle Huey wrote:
> The test sets a hardware breakpoint and uses a bpf program to suppress the
> I/O availability signal if the ip matches the expected value.
> 
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---
>  .../selftests/bpf/prog_tests/perf_skip.c      | 95 +++++++++++++++++++
>  .../selftests/bpf/progs/test_perf_skip.c      | 23 +++++
>  2 files changed, 118 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> new file mode 100644
> index 000000000000..b269a31669b7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <test_progs.h>
> +#include "test_perf_skip.skel.h"
> +#include <linux/hw_breakpoint.h>
> +#include <sys/mman.h>
> +
> +#define BPF_OBJECT            "test_perf_skip.bpf.o"
> +
> +static void handle_sig(int)
> +{
> +	ASSERT_OK(1, "perf event not skipped");
> +}
> +
> +static noinline int test_function(void)
> +{
> +	return 0;
> +}
> +
> +void serial_test_perf_skip(void)

does it need to be serial?

jirka


* Re: [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO.
  2023-12-04 20:14 ` [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO Kyle Huey
  2023-12-04 22:14   ` Andrii Nakryiko
  2023-12-05 11:17   ` Jiri Olsa
@ 2023-12-05 16:54   ` Yonghong Song
  2023-12-05 17:52     ` Kyle Huey
  2 siblings, 1 reply; 15+ messages in thread
From: Yonghong Song @ 2023-12-05 16:54 UTC (permalink / raw)
  To: Kyle Huey, Kyle Huey, linux-kernel
  Cc: Robert O'Callahan, Andrii Nakryiko, Mykola Lysenko,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, bpf, linux-kselftest


On 12/4/23 3:14 PM, Kyle Huey wrote:
> The test sets a hardware breakpoint and uses a bpf program to suppress the
> I/O availability signal if the ip matches the expected value.
>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---
>   .../selftests/bpf/prog_tests/perf_skip.c      | 95 +++++++++++++++++++
>   .../selftests/bpf/progs/test_perf_skip.c      | 23 +++++
>   2 files changed, 118 insertions(+)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
>   create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> new file mode 100644
> index 000000000000..b269a31669b7
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <test_progs.h>
> +#include "test_perf_skip.skel.h"
> +#include <linux/hw_breakpoint.h>
> +#include <sys/mman.h>
> +
> +#define BPF_OBJECT            "test_perf_skip.bpf.o"
> +
> +static void handle_sig(int)

I hit a warning here:
home/yhs/work/bpf-next/tools/testing/selftests/bpf/prog_tests/perf_skip.c:10:27: error: omitting the parameter name in a function definition is a C23 extension [-Werror,-Wc23-extensions]
    10 | static void handle_sig(int)
       |

Adding a parameter and marking it as unused resolves the issue.

#define __always_unused         __attribute__((__unused__))

static void handle_sig(int unused __always_unused)
{
         ASSERT_OK(1, "perf event not skipped");
}


> +{
> +	ASSERT_OK(1, "perf event not skipped");
> +}
> +
> +static noinline int test_function(void)
> +{
> +	return 0;
> +}
> +
> +void serial_test_perf_skip(void)
> +{
> +	sighandler_t previous;
> +	int duration = 0;
> +	struct test_perf_skip *skel = NULL;
> +	int map_fd = -1;
> +	long page_size = sysconf(_SC_PAGE_SIZE);
> +	uintptr_t *ip = NULL;
> +	int prog_fd = -1;
> +	struct perf_event_attr attr = {0};
> +	int perf_fd = -1;
> +	struct f_owner_ex owner;
> +	int err;
> +
> +	previous = signal(SIGIO, handle_sig);
> +
> +	skel = test_perf_skip__open_and_load();
> +	if (!ASSERT_OK_PTR(skel, "skel_load"))
> +		goto cleanup;
> +
> +	prog_fd = bpf_program__fd(skel->progs.handler);
> +	if (!ASSERT_OK(prog_fd < 0, "bpf_program__fd"))
> +		goto cleanup;
> +
> +	map_fd = bpf_map__fd(skel->maps.ip);
> +	if (!ASSERT_OK(map_fd < 0, "bpf_map__fd"))
> +		goto cleanup;
> +
> +	ip = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
> +	if (!ASSERT_OK_PTR(ip, "mmap bpf map"))
> +		goto cleanup;
> +
> +	*ip = (uintptr_t)test_function;
> +
> +	attr.type = PERF_TYPE_BREAKPOINT;
> +	attr.size = sizeof(attr);
> +	attr.bp_type = HW_BREAKPOINT_X;
> +	attr.bp_addr = (uintptr_t)test_function;
> +	attr.bp_len = sizeof(long);
> +	attr.sample_period = 1;
> +	attr.sample_type = PERF_SAMPLE_IP;
> +	attr.pinned = 1;
> +	attr.exclude_kernel = 1;
> +	attr.exclude_hv = 1;
> +	attr.precise_ip = 3;
> +
> +	perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
> +	if (CHECK(perf_fd < 0, "perf_event_open", "err %d\n", perf_fd))
> +		goto cleanup;
> +
> +	err = fcntl(perf_fd, F_SETFL, O_ASYNC);
> +	if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
> +		goto cleanup;
> +
> +	owner.type = F_OWNER_TID;
> +	owner.pid = gettid();

I hit a compilation failure here:

/home/yhs/work/bpf-next/tools/testing/selftests/bpf/prog_tests/perf_skip.c:75:14: error: call to undeclared function 'gettid'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    75 |         owner.pid = gettid();
       |                     ^

If you look at other examples, the common usage is 'syscall(SYS_gettid)'.
So the following patch should fix the compilation error:

#include <sys/syscall.h>
...
         owner.pid = syscall(SYS_gettid);
...

> +	err = fcntl(perf_fd, F_SETOWN_EX, &owner);
> +	if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
> +		goto cleanup;
> +
> +	err = ioctl(perf_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +	if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_SET_BPF)"))
> +		goto cleanup;
> +
> +	test_function();

As Andrii mentioned in his earlier comments, we will have an
issue if the RELEASE version of the selftests is built:
   RELEASE=1 make ...

See https://lore.kernel.org/bpf/20231127050342.1945270-1-yonghong.song@linux.dev

> +
> +cleanup:
> +	if (perf_fd >= 0)
> +		close(perf_fd);
> +	if (ip)
> +		munmap(ip, page_size);
> +	if (skel)
> +		test_perf_skip__destroy(skel);
> +
> +	signal(SIGIO, previous);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> new file mode 100644
> index 000000000000..ef01a9161afe
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> @@ -0,0 +1,23 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__uint(map_flags, BPF_F_MMAPABLE);
> +	__type(key, uint32_t);
> +	__type(value, uintptr_t);
> +} ip SEC(".maps");
> +
> +SEC("perf_event")
> +int handler(struct bpf_perf_event_data *data)
> +{
> +	const uint32_t index = 0;
> +	uintptr_t *v = bpf_map_lookup_elem(&ip, &index);
> +
> +	return !(v && *v == PT_REGS_IP(&data->regs));
> +}
> +
> +char _license[] SEC("license") = "GPL";

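
The two build fixes requested in the review of perf_skip.c can be tried in
isolation with the sketch below. current_tid() and demo_build_fixes() are
hypothetical helper names, and SIGUSR1 is used only so the handler can be
exercised without a perf event. One likely reason gettid() compiles on some
machines and not others is that glibc only gained a gettid() wrapper in
version 2.30, whereas syscall(SYS_gettid) works with older headers too.

```c
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t seen_signal;

/*
 * Naming the parameter keeps the definition valid before C23; the
 * attribute silences unused-parameter warnings.
 */
static void handle_sig(int sig __attribute__((unused)))
{
	seen_signal = 1;
}

/* Thread id via the raw syscall, independent of a libc gettid() wrapper. */
static pid_t current_tid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/* Usage: install the handler, raise a signal, and read the thread id. */
static void demo_build_fixes(void)
{
	void (*previous)(int) = signal(SIGUSR1, handle_sig);

	assert(previous != SIG_ERR);
	seen_signal = 0;
	raise(SIGUSR1);
	assert(seen_signal == 1);
	assert(current_tid() > 0);

	signal(SIGUSR1, previous);
}
```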

* Re: [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO.
  2023-12-05 16:54   ` Yonghong Song
@ 2023-12-05 17:52     ` Kyle Huey
  0 siblings, 0 replies; 15+ messages in thread
From: Kyle Huey @ 2023-12-05 17:52 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Kyle Huey, linux-kernel, Robert O'Callahan, Andrii Nakryiko,
	Mykola Lysenko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan, bpf,
	linux-kselftest

On Tue, Dec 5, 2023 at 8:54 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>
>
> On 12/4/23 3:14 PM, Kyle Huey wrote:
> > The test sets a hardware breakpoint and uses a bpf program to suppress the
> > I/O availability signal if the ip matches the expected value.
> >
> > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> > ---
> >   .../selftests/bpf/prog_tests/perf_skip.c      | 95 +++++++++++++++++++
> >   .../selftests/bpf/progs/test_perf_skip.c      | 23 +++++
> >   2 files changed, 118 insertions(+)
> >   create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
> >   create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > new file mode 100644
> > index 000000000000..b269a31669b7
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > @@ -0,0 +1,95 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#define _GNU_SOURCE
> > +#include <test_progs.h>
> > +#include "test_perf_skip.skel.h"
> > +#include <linux/hw_breakpoint.h>
> > +#include <sys/mman.h>
> > +
> > +#define BPF_OBJECT            "test_perf_skip.bpf.o"
> > +
> > +static void handle_sig(int)
>
> I hit a warning here:
> home/yhs/work/bpf-next/tools/testing/selftests/bpf/prog_tests/perf_skip.c:10:27: error: omitting the parameter name in a function definition is a C23 extension [-Werror,-Wc23-extensions]

Yeah, Meta's kernel-ci bot sent me off-list email about this one.

>
>     10 | static void handle_sig(int)
>        |
>
> Add a parameter and marked as unused can resolve the issue.
>
> #define __always_unused         __attribute__((__unused__))
>
> static void handle_sig(int unused __always_unused)
> {
>          ASSERT_OK(1, "perf event not skipped");
> }
>
>
> > +{
> > +     ASSERT_OK(1, "perf event not skipped");
> > +}
> > +
> > +static noinline int test_function(void)
> > +{
> > +     return 0;
> > +}
> > +
> > +void serial_test_perf_skip(void)
> > +{
> > +     sighandler_t previous;
> > +     int duration = 0;
> > +     struct test_perf_skip *skel = NULL;
> > +     int map_fd = -1;
> > +     long page_size = sysconf(_SC_PAGE_SIZE);
> > +     uintptr_t *ip = NULL;
> > +     int prog_fd = -1;
> > +     struct perf_event_attr attr = {0};
> > +     int perf_fd = -1;
> > +     struct f_owner_ex owner;
> > +     int err;
> > +
> > +     previous = signal(SIGIO, handle_sig);
> > +
> > +     skel = test_perf_skip__open_and_load();
> > +     if (!ASSERT_OK_PTR(skel, "skel_load"))
> > +             goto cleanup;
> > +
> > +     prog_fd = bpf_program__fd(skel->progs.handler);
> > +     if (!ASSERT_OK(prog_fd < 0, "bpf_program__fd"))
> > +             goto cleanup;
> > +
> > +     map_fd = bpf_map__fd(skel->maps.ip);
> > +     if (!ASSERT_OK(map_fd < 0, "bpf_map__fd"))
> > +             goto cleanup;
> > +
> > +     ip = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
> > +     if (!ASSERT_OK_PTR(ip, "mmap bpf map"))
> > +             goto cleanup;
> > +
> > +     *ip = (uintptr_t)test_function;
> > +
> > +     attr.type = PERF_TYPE_BREAKPOINT;
> > +     attr.size = sizeof(attr);
> > +     attr.bp_type = HW_BREAKPOINT_X;
> > +     attr.bp_addr = (uintptr_t)test_function;
> > +     attr.bp_len = sizeof(long);
> > +     attr.sample_period = 1;
> > +     attr.sample_type = PERF_SAMPLE_IP;
> > +     attr.pinned = 1;
> > +     attr.exclude_kernel = 1;
> > +     attr.exclude_hv = 1;
> > +     attr.precise_ip = 3;
> > +
> > +     perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
> > +     if (CHECK(perf_fd < 0, "perf_event_open", "err %d\n", perf_fd))
> > +             goto cleanup;
> > +
> > +     err = fcntl(perf_fd, F_SETFL, O_ASYNC);
> > +     if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
> > +             goto cleanup;
> > +
> > +     owner.type = F_OWNER_TID;
> > +     owner.pid = gettid();
>
> I hit a compilation failure here:
>
> /home/yhs/work/bpf-next/tools/testing/selftests/bpf/prog_tests/perf_skip.c:75:14: error: call to undeclared function 'gettid'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>     75 |         owner.pid = gettid();
>        |                     ^
>
> If you looked at some other examples, the common usage is do 'syscall(SYS_gettid)'.

Not clear why this works for me but sure I'll change that.

>
> So the following patch should fix the compilation error:
>
> #include <sys/syscall.h>
> ...
>          owner.pid = syscall(SYS_gettid);
> ...
>
> > +     err = fcntl(perf_fd, F_SETOWN_EX, &owner);
> > +     if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
> > +             goto cleanup;
> > +
> > +     err = ioctl(perf_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> > +     if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_SET_BPF)"))
> > +             goto cleanup;
> > +
> > +     test_function();
>
> As Andrii has mentioned in previous comments, we will have an
> issue if the RELEASE version of the selftests is built:
>    RELEASE=1 make ...
>
> See https://lore.kernel.org/bpf/20231127050342.1945270-1-yonghong.song@linux.dev

Not sure I follow this one. Are you saying adding asm volatile ("") in
test_function() is *not* sufficient?

- Kyle
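
For reference, the pattern being asked about looks roughly like the sketch below. The empty asm statement follows the suggestion Andrii made in review; __asm__ and __attribute__((noinline)) are the GCC/Clang spellings, used here as an assumption so the snippet builds without the selftest headers. Whether this alone is sufficient for RELEASE=1 builds is the open question above, and this sketch does not settle it.

```c
/* Hypothetical standalone copy of the test's breakpoint target. */
static __attribute__((noinline)) int test_function(void)
{
	/* The empty volatile asm acts as a side effect, so calls to this
	 * otherwise-empty function are not optimized away. */
	__asm__ volatile ("");
	return 0;
}

/* The test takes the function's address for attr.bp_addr; doing so
 * requires an out-of-line copy of the function to exist. */
static int (*const bp_target)(void) = test_function;
```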

>
> > +
> > +cleanup:
> > +     if (perf_fd >= 0)
> > +             close(perf_fd);
> > +     if (ip)
> > +             munmap(ip, page_size);
> > +     if (skel)
> > +             test_perf_skip__destroy(skel);
> > +
> > +     signal(SIGIO, previous);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> > new file mode 100644
> > index 000000000000..ef01a9161afe
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> > @@ -0,0 +1,23 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include "vmlinux.h"
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +struct {
> > +     __uint(type, BPF_MAP_TYPE_ARRAY);
> > +     __uint(max_entries, 1);
> > +     __uint(map_flags, BPF_F_MMAPABLE);
> > +     __type(key, uint32_t);
> > +     __type(value, uintptr_t);
> > +} ip SEC(".maps");
> > +
> > +SEC("perf_event")
> > +int handler(struct bpf_perf_event_data *data)
> > +{
> > +     const uint32_t index = 0;
> > +     uintptr_t *v = bpf_map_lookup_elem(&ip, &index);
> > +
> > +     return !(v && *v == PT_REGS_IP(&data->regs));
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";

* Re: [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals.
  2023-12-05 11:16     ` Jiri Olsa
@ 2023-12-05 18:07       ` Namhyung Kim
  2023-12-05 18:16         ` Marco Elver
  2023-12-05 19:19         ` Kyle Huey
  0 siblings, 2 replies; 15+ messages in thread
From: Namhyung Kim @ 2023-12-05 18:07 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, Kyle Huey, Kyle Huey, linux-kernel,
	Robert O'Callahan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Ian Rogers, Adrian Hunter, linux-perf-users, bpf, Marco Elver

Hello,

Add Marco Elver to CC.

On Tue, Dec 5, 2023 at 3:16 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Dec 04, 2023 at 02:18:49PM -0800, Andrii Nakryiko wrote:
> > On Mon, Dec 4, 2023 at 12:14 PM Kyle Huey <me@kylehuey.com> wrote:
> > >
> > > Returning zero from a bpf program attached to a perf event already
> > > suppresses any data output. This allows it to suppress I/O availability
> > > signals too.
> >
> > make sense, just one question below
> >
> > >
> > > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
>
> > > ---
> > >  kernel/events/core.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index b704d83a28b2..34d7b19d45eb 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -10417,8 +10417,10 @@ static void bpf_overflow_handler(struct perf_event *event,
> > >         rcu_read_unlock();
> > >  out:
> > >         __this_cpu_dec(bpf_prog_active);
> > > -       if (!ret)
> > > +       if (!ret) {
> > > +               event->pending_kill = 0;
> > >                 return;
> > > +       }
> >
> > What's the distinction between event->pending_kill and
> > event->pending_wakeup? Should we do something about pending_wakeup?
> > Asking out of complete ignorance of all these perf specifics.
> >
>
> I think zeroing pending_kill is enough.. when it's set the perf code
> sets pending_wakeup to call perf_event_wakeup in irq code that wakes
> up event's ring buffer readers and sends sigio if pending_kill is set

Right, IIUC pending_wakeup is set by the ring buffer code when
a task is waiting for events and it gets enough events (watermark).
So I think it's good for the ring buffer to manage pending_wakeup.

And pending_kill is set when a task wants a signal delivery even
without getting enough events.  Clearing pending_kill looks ok
to suppress normal signals but I'm not sure if it's ok for SIGTRAP.

If we want to handle returning 0 from bpf as if the event didn't
happen, I think SIGTRAP and event_limit logic should be done
after the overflow handler depending on pending_kill or something.

Thanks,
Namhyung
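
The interaction described in this message and in Jiri's reply can be pictured with a small user-space model. This is a toy sketch, not kernel code: the struct, the toy_* names, and the exact ordering of steps are assumptions made for illustration, based only on the description in this thread (pending_kill is armed before the overflow handler runs, and the deferred wakeup sends the signal only if pending_kill is still set).

```c
#include <stdbool.h>

#define TOY_POLL_IN 1

/* Toy stand-in for the fields discussed; NOT struct perf_event. */
struct toy_event {
	int pending_kill;   /* nonzero: a signal is wanted for this event */
	int pending_wakeup; /* nonzero: deferred wakeup work is queued */
	bool fasync;        /* the fd owner asked for O_ASYNC notification */
	int sigio_sent;     /* signals the model would have delivered */
};

/* Models the patched handler: a zero return from the bpf program
 * clears pending_kill before returning. */
static void toy_overflow_handler(struct toy_event *ev, int bpf_ret)
{
	if (!bpf_ret) {
		ev->pending_kill = 0;
		return;
	}
	/* Otherwise sample output would be written here. */
}

/* Models one overflow: arm the signal, run the handler, then queue
 * the deferred wakeup only if a signal is still wanted. */
static void toy_overflow(struct toy_event *ev, int bpf_ret)
{
	ev->pending_kill = TOY_POLL_IN;
	toy_overflow_handler(ev, bpf_ret);
	if (ev->fasync && ev->pending_kill)
		ev->pending_wakeup = 1;
}

/* Models the deferred (irq) work that delivers the signal. */
static void toy_irq_work(struct toy_event *ev)
{
	if (!ev->pending_wakeup)
		return;
	ev->pending_wakeup = 0;
	if (ev->pending_kill) {
		ev->sigio_sent++;
		ev->pending_kill = 0;
	}
}

/* Number of signals the model delivers for a single overflow. */
static int toy_signals_for(int bpf_ret)
{
	struct toy_event ev = { .fasync = true };

	toy_overflow(&ev, bpf_ret);
	toy_irq_work(&ev);
	return ev.sigio_sent;
}
```

In this model, anything decided before toy_overflow_handler() runs cannot be influenced by the bpf program's return value, which is the concern raised above for SIGTRAP and event_limit.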

* Re: [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals.
  2023-12-05 18:07       ` Namhyung Kim
@ 2023-12-05 18:16         ` Marco Elver
  2023-12-05 18:23           ` Kyle Huey
  2023-12-05 18:26           ` Namhyung Kim
  2023-12-05 19:19         ` Kyle Huey
  1 sibling, 2 replies; 15+ messages in thread
From: Marco Elver @ 2023-12-05 18:16 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Andrii Nakryiko, Kyle Huey, Kyle Huey, linux-kernel,
	Robert O'Callahan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Ian Rogers, Adrian Hunter, linux-perf-users, bpf

On Tue, 5 Dec 2023 at 19:07, Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> Add Marco Elver to CC.
>
> On Tue, Dec 5, 2023 at 3:16 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Dec 04, 2023 at 02:18:49PM -0800, Andrii Nakryiko wrote:
> > > On Mon, Dec 4, 2023 at 12:14 PM Kyle Huey <me@kylehuey.com> wrote:
> > > >
> > > > Returning zero from a bpf program attached to a perf event already
> > > > suppresses any data output. This allows it to suppress I/O availability
> > > > signals too.
> > >
> > > make sense, just one question below
> > >
> > > >
> > > > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> >
> > Acked-by: Jiri Olsa <jolsa@kernel.org>
> >
> > > > ---
> > > >  kernel/events/core.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index b704d83a28b2..34d7b19d45eb 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -10417,8 +10417,10 @@ static void bpf_overflow_handler(struct perf_event *event,
> > > >         rcu_read_unlock();
> > > >  out:
> > > >         __this_cpu_dec(bpf_prog_active);
> > > > -       if (!ret)
> > > > +       if (!ret) {
> > > > +               event->pending_kill = 0;
> > > >                 return;
> > > > +       }
> > >
> > > What's the distinction between event->pending_kill and
> > > event->pending_wakeup? Should we do something about pending_wakeup?
> > > Asking out of complete ignorance of all these perf specifics.
> > >
> >
> > I think zeroing pending_kill is enough.. when it's set the perf code
> > sets pending_wakeup to call perf_event_wakeup in irq code that wakes
> > up event's ring buffer readers and sends sigio if pending_kill is set
>
> Right, IIUC pending_wakeup is set by the ring buffer code when
> a task is waiting for events and it gets enough events (watermark).
> So I think it's good for ring buffer to manage the pending_wakeup.
>
> And pending_kill is set when a task wants a signal delivery even
> without getting enough events.  Clearing pending_kill looks ok
> to suppress normal signals but I'm not sure if it's ok for SIGTRAP.
>
> If we want to handle returning 0 from bpf as if the event didn't
> happen, I think SIGTRAP and event_limit logic should be done
> after the overflow handler depending on pending_kill or something.

I'm not sure which kernel version this is for, but in recent kernels,
the SIGTRAP logic was changed to no longer "abuse" event_limit, and
uses its own "pending_sigtrap" + "pending_work" (on reschedule
transitions).

Thanks,
-- Marco

* Re: [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO.
  2023-12-04 22:14   ` Andrii Nakryiko
@ 2023-12-05 18:21     ` Kyle Huey
  0 siblings, 0 replies; 15+ messages in thread
From: Kyle Huey @ 2023-12-05 18:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kyle Huey, linux-kernel, Robert O'Callahan, Andrii Nakryiko,
	Mykola Lysenko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	bpf, linux-kselftest

On Mon, Dec 4, 2023 at 2:14 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Dec 4, 2023 at 12:14 PM Kyle Huey <me@kylehuey.com> wrote:
> >
> > The test sets a hardware breakpoint and uses a bpf program to suppress the
> > I/O availability signal if the ip matches the expected value.
> >
> > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> > ---
> >  .../selftests/bpf/prog_tests/perf_skip.c      | 95 +++++++++++++++++++
> >  .../selftests/bpf/progs/test_perf_skip.c      | 23 +++++
> >  2 files changed, 118 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > new file mode 100644
> > index 000000000000..b269a31669b7
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > @@ -0,0 +1,95 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#define _GNU_SOURCE
> > +#include <test_progs.h>
> > +#include "test_perf_skip.skel.h"
> > +#include <linux/hw_breakpoint.h>
> > +#include <sys/mman.h>
> > +
> > +#define BPF_OBJECT            "test_perf_skip.bpf.o"
>
> leftover?

Indeed. Fixed.

> > +
> > +static void handle_sig(int)
> > +{
> > +       ASSERT_OK(1, "perf event not skipped");
> > +}
> > +
> > +static noinline int test_function(void)
> > +{
>
> please add
>
> asm volatile ("");
>
> here to prevent compiler from actually inlining at the call site

Ok.

> > +       return 0;
> > +}
> > +
> > +void serial_test_perf_skip(void)
> > +{
> > +       sighandler_t previous;
> > +       int duration = 0;
> > +       struct test_perf_skip *skel = NULL;
> > +       int map_fd = -1;
> > +       long page_size = sysconf(_SC_PAGE_SIZE);
> > +       uintptr_t *ip = NULL;
> > +       int prog_fd = -1;
> > +       struct perf_event_attr attr = {0};
> > +       int perf_fd = -1;
> > +       struct f_owner_ex owner;
> > +       int err;
> > +
> > +       previous = signal(SIGIO, handle_sig);
> > +
> > +       skel = test_perf_skip__open_and_load();
> > +       if (!ASSERT_OK_PTR(skel, "skel_load"))
> > +               goto cleanup;
> > +
> > +       prog_fd = bpf_program__fd(skel->progs.handler);
> > +       if (!ASSERT_OK(prog_fd < 0, "bpf_program__fd"))
> > +               goto cleanup;
> > +
> > +       map_fd = bpf_map__fd(skel->maps.ip);
> > +       if (!ASSERT_OK(map_fd < 0, "bpf_map__fd"))
> > +               goto cleanup;
> > +
> > +       ip = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, map_fd, 0);
> > +       if (!ASSERT_OK_PTR(ip, "mmap bpf map"))
> > +               goto cleanup;
> > +
> > +       *ip = (uintptr_t)test_function;
> > +
> > +       attr.type = PERF_TYPE_BREAKPOINT;
> > +       attr.size = sizeof(attr);
> > +       attr.bp_type = HW_BREAKPOINT_X;
> > +       attr.bp_addr = (uintptr_t)test_function;
> > +       attr.bp_len = sizeof(long);
> > +       attr.sample_period = 1;
> > +       attr.sample_type = PERF_SAMPLE_IP;
> > +       attr.pinned = 1;
> > +       attr.exclude_kernel = 1;
> > +       attr.exclude_hv = 1;
> > +       attr.precise_ip = 3;
> > +
> > +       perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
> > +       if (CHECK(perf_fd < 0, "perf_event_open", "err %d\n", perf_fd))
>
> please don't use CHECK() macro, stick to ASSERT_xxx()

Done.

> also, we are going to run all this on different hardware and VMs, see
> how we skip tests if hardware support is not there. See test__skip
> usage in prog_tests/perf_branches.c, as one example

Hmm, I suppose it should be conditioned on CONFIG_HAVE_HW_BREAKPOINT.
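
One way to make that decision at run time rather than from the kernel config is to probe perf_event_open() and skip on failure. The sketch below is an assumption-laden illustration, not what the patch does: the names bp_probe, probe_target, and probe_hw_exec_breakpoint are hypothetical, and treating ENOENT and EOPNOTSUPP as "no hardware breakpoint support" is an assumption about how the kernel reports a missing breakpoint PMU.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>

enum bp_probe { BP_SUPPORTED, BP_UNSUPPORTED, BP_ERROR };

/* Hypothetical address to place the probe breakpoint on. */
static int probe_target(void)
{
	return 0;
}

/* Try to open an execute breakpoint on probe_target() and classify
 * the outcome; the event is closed again immediately on success. */
static enum bp_probe probe_hw_exec_breakpoint(void)
{
	struct perf_event_attr attr;
	long fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_BREAKPOINT;
	attr.size = sizeof(attr);
	attr.bp_type = HW_BREAKPOINT_X;
	attr.bp_addr = (uintptr_t)probe_target;
	attr.bp_len = sizeof(long);
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd >= 0) {
		close((int)fd);
		return BP_SUPPORTED;
	}
	/* Assumption: these errno values mean the event type is absent. */
	if (errno == ENOENT || errno == EOPNOTSUPP)
		return BP_UNSUPPORTED;
	return BP_ERROR;
}
```

A test could call test__skip() when the probe reports BP_UNSUPPORTED and fail only on BP_ERROR.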

> > +               goto cleanup;
> > +
> > +       err = fcntl(perf_fd, F_SETFL, O_ASYNC);
>
> I assume this is what will send SIGIO, right? Can you add a small
> comment explicitly saying this?

Done.

> > +       if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
> > +               goto cleanup;
> > +
> > +       owner.type = F_OWNER_TID;
> > +       owner.pid = gettid();
> > +       err = fcntl(perf_fd, F_SETOWN_EX, &owner);
> > +       if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
> > +               goto cleanup;
> > +
> > +       err = ioctl(perf_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> > +       if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_SET_BPF)"))
> > +               goto cleanup;
>
> we have a better way to do this, please use
> bpf_program__attach_perf_event() instead

Done.

> > +
> > +       test_function();
> > +
> > +cleanup:
> > +       if (perf_fd >= 0)
> > +               close(perf_fd);
> > +       if (ip)
> > +               munmap(ip, page_size);
> > +       if (skel)
> > +               test_perf_skip__destroy(skel);
> > +
> > +       signal(SIGIO, previous);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> > new file mode 100644
> > index 000000000000..ef01a9161afe
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> > @@ -0,0 +1,23 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include "vmlinux.h"
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +struct {
> > +       __uint(type, BPF_MAP_TYPE_ARRAY);
> > +       __uint(max_entries, 1);
> > +       __uint(map_flags, BPF_F_MMAPABLE);
> > +       __type(key, uint32_t);
> > +       __type(value, uintptr_t);
> > +} ip SEC(".maps");
>
> please use global variable:
>
> __u64 ip;
>
> and then access it from user-space side through skeleton
>
> skel->bss->ip = (uintptr_t)test_function;

Done.

> > +
> > +SEC("perf_event")
> > +int handler(struct bpf_perf_event_data *data)
> > +{
> > +       const uint32_t index = 0;
> > +       uintptr_t *v = bpf_map_lookup_elem(&ip, &index);
> > +
> > +       return !(v && *v == PT_REGS_IP(&data->regs));
>
> and so with the above global var suggestion this will be just:
>
> return ip == PT_REGS_IP(&data->regs);
>
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
> > --
> > 2.34.1
> >

- Kyle
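
A note on the return value when moving from the map lookup to a global variable: the handler in the patch returns 0 (suppress) when the stored address matches the sampled IP, so the global-variable form that keeps this behaviour compares with !=. Written with ==, it would return 1 on a match instead. The sketch below checks the two forms against each other in plain C; the function names are hypothetical and uint64_t stands in for the types used in the bpf program.

```c
#include <stddef.h>
#include <stdint.h>

/* Map-based form from the patch: v is the looked-up value, or NULL
 * if the lookup failed. Returns 0 only when the addresses match. */
static int ret_with_map(const uint64_t *v, uint64_t regs_ip)
{
	return !(v && *v == regs_ip);
}

/* Global-variable form that gives the same result for a valid entry. */
static int ret_with_global(uint64_t ip, uint64_t regs_ip)
{
	return ip != regs_ip;
}
```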

* Re: [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals.
  2023-12-05 18:16         ` Marco Elver
@ 2023-12-05 18:23           ` Kyle Huey
  2023-12-05 18:26           ` Namhyung Kim
  1 sibling, 0 replies; 15+ messages in thread
From: Kyle Huey @ 2023-12-05 18:23 UTC (permalink / raw)
  To: Marco Elver
  Cc: Namhyung Kim, Jiri Olsa, Andrii Nakryiko, Kyle Huey,
	linux-kernel, Robert O'Callahan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Ian Rogers, Adrian Hunter, linux-perf-users, bpf

On Tue, Dec 5, 2023 at 10:17 AM Marco Elver <elver@google.com> wrote:
>
> On Tue, 5 Dec 2023 at 19:07, Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hello,
> >
> > Add Marco Elver to CC.
> >
> > On Tue, Dec 5, 2023 at 3:16 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Mon, Dec 04, 2023 at 02:18:49PM -0800, Andrii Nakryiko wrote:
> > > > On Mon, Dec 4, 2023 at 12:14 PM Kyle Huey <me@kylehuey.com> wrote:
> > > > >
> > > > > Returning zero from a bpf program attached to a perf event already
> > > > > suppresses any data output. This allows it to suppress I/O availability
> > > > > signals too.
> > > >
> > > > make sense, just one question below
> > > >
> > > > >
> > > > > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> > >
> > > Acked-by: Jiri Olsa <jolsa@kernel.org>
> > >
> > > > > ---
> > > > >  kernel/events/core.c | 4 +++-
> > > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > > index b704d83a28b2..34d7b19d45eb 100644
> > > > > --- a/kernel/events/core.c
> > > > > +++ b/kernel/events/core.c
> > > > > @@ -10417,8 +10417,10 @@ static void bpf_overflow_handler(struct perf_event *event,
> > > > >         rcu_read_unlock();
> > > > >  out:
> > > > >         __this_cpu_dec(bpf_prog_active);
> > > > > -       if (!ret)
> > > > > +       if (!ret) {
> > > > > +               event->pending_kill = 0;
> > > > >                 return;
> > > > > +       }
> > > >
> > > > What's the distinction between event->pending_kill and
> > > > event->pending_wakeup? Should we do something about pending_wakeup?
> > > > Asking out of complete ignorance of all these perf specifics.
> > > >
> > >
> > > I think zeroing pending_kill is enough.. when it's set the perf code
> > > sets pending_wakeup to call perf_event_wakeup in irq code that wakes
> > > up event's ring buffer readers and sends sigio if pending_kill is set
> >
> > Right, IIUC pending_wakeup is set by the ring buffer code when
> > a task is waiting for events and it gets enough events (watermark).
> > So I think it's good for ring buffer to manage the pending_wakeup.
> >
> > And pending_kill is set when a task wants a signal delivery even
> > without getting enough events.  Clearing pending_kill looks ok
> > to suppress normal signals but I'm not sure if it's ok for SIGTRAP.
> >
> > If we want to handle returning 0 from bpf as if the event didn't
> > happen, I think SIGTRAP and event_limit logic should be done
> > after the overflow handler depending on pending_kill or something.
>
> I'm not sure which kernel version this is for, but in recent kernels,
> the SIGTRAP logic was changed to no longer "abuse" event_limit, and
> uses its own "pending_sigtrap" + "pending_work" (on reschedule
> transitions).
>
> Thanks,
> -- Marco

The patch was prepared against a 6.7 release candidate.

- Kyle

* Re: [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals.
  2023-12-05 18:16         ` Marco Elver
  2023-12-05 18:23           ` Kyle Huey
@ 2023-12-05 18:26           ` Namhyung Kim
  1 sibling, 0 replies; 15+ messages in thread
From: Namhyung Kim @ 2023-12-05 18:26 UTC (permalink / raw)
  To: Marco Elver
  Cc: Jiri Olsa, Andrii Nakryiko, Kyle Huey, Kyle Huey, linux-kernel,
	Robert O'Callahan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Ian Rogers, Adrian Hunter, linux-perf-users, bpf

On Tue, Dec 5, 2023 at 10:17 AM Marco Elver <elver@google.com> wrote:
>
> On Tue, 5 Dec 2023 at 19:07, Namhyung Kim <namhyung@kernel.org> wrote:
> > If we want to handle returning 0 from bpf as if the event didn't
> > happen, I think SIGTRAP and event_limit logic should be done
> > after the overflow handler depending on pending_kill or something.
>
> I'm not sure which kernel version this is for, but in recent kernels,
> the SIGTRAP logic was changed to no longer "abuse" event_limit, and
> uses its own "pending_sigtrap" + "pending_work" (on reschedule
> transitions).

Oh, I didn't mean SIGTRAP and event_limit together.
Maybe they have an issue separately.

Thanks,
Namhyung

* Re: [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals.
  2023-12-05 18:07       ` Namhyung Kim
  2023-12-05 18:16         ` Marco Elver
@ 2023-12-05 19:19         ` Kyle Huey
  1 sibling, 0 replies; 15+ messages in thread
From: Kyle Huey @ 2023-12-05 19:19 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Andrii Nakryiko, Kyle Huey, linux-kernel,
	Robert O'Callahan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Ian Rogers, Adrian Hunter, linux-perf-users, bpf, Marco Elver

On Tue, Dec 5, 2023 at 10:07 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> Add Marco Elver to CC.
>
> On Tue, Dec 5, 2023 at 3:16 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Dec 04, 2023 at 02:18:49PM -0800, Andrii Nakryiko wrote:
> > > On Mon, Dec 4, 2023 at 12:14 PM Kyle Huey <me@kylehuey.com> wrote:
> > > >
> > > > Returning zero from a bpf program attached to a perf event already
> > > > suppresses any data output. This allows it to suppress I/O availability
> > > > signals too.
> > >
> > > make sense, just one question below
> > >
> > > >
> > > > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> >
> > Acked-by: Jiri Olsa <jolsa@kernel.org>
> >
> > > > ---
> > > >  kernel/events/core.c | 4 +++-
> > > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index b704d83a28b2..34d7b19d45eb 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -10417,8 +10417,10 @@ static void bpf_overflow_handler(struct perf_event *event,
> > > >         rcu_read_unlock();
> > > >  out:
> > > >         __this_cpu_dec(bpf_prog_active);
> > > > -       if (!ret)
> > > > +       if (!ret) {
> > > > +               event->pending_kill = 0;
> > > >                 return;
> > > > +       }
> > >
> > > What's the distinction between event->pending_kill and
> > > event->pending_wakeup? Should we do something about pending_wakeup?
> > > Asking out of complete ignorance of all these perf specifics.
> > >
> >
> > I think zeroing pending_kill is enough.. when it's set the perf code
> > sets pending_wakeup to call perf_event_wakeup in irq code that wakes
> > up event's ring buffer readers and sends sigio if pending_kill is set
>
> Right, IIUC pending_wakeup is set by the ring buffer code when
> a task is waiting for events and it gets enough events (watermark).
> So I think it's good for ring buffer to manage the pending_wakeup.
>
> And pending_kill is set when a task wants a signal delivery even
> without getting enough events.  Clearing pending_kill looks ok
> to suppress normal signals but I'm not sure if it's ok for SIGTRAP.
>
> If we want to handle returning 0 from bpf as if the event didn't
> happen, I think SIGTRAP and event_limit logic should be done
> after the overflow handler depending on pending_kill or something.

Hmm, yes, perhaps. The SIGTRAP thing (which I was previously unaware
of) would actually be more useful to us than an I/O signal.

I am slightly wary that event_limit appears to have no tests in the kernel tree.

- Kyle

> Thanks,
> Namhyung

Thread overview: 15+ messages
2023-12-04 20:14 [PATCH 0/2] Combine perf and bpf for fast eval of hw breakpoint conditions Kyle Huey
2023-12-04 20:14 ` [PATCH 1/2] perf/bpf: Allow a bpf program to suppress I/O signals Kyle Huey
2023-12-04 22:18   ` Andrii Nakryiko
2023-12-05 11:16     ` Jiri Olsa
2023-12-05 18:07       ` Namhyung Kim
2023-12-05 18:16         ` Marco Elver
2023-12-05 18:23           ` Kyle Huey
2023-12-05 18:26           ` Namhyung Kim
2023-12-05 19:19         ` Kyle Huey
2023-12-04 20:14 ` [PATCH 2/2] selftest/bpf: Test returning zero from a perf bpf program suppresses SIGIO Kyle Huey
2023-12-04 22:14   ` Andrii Nakryiko
2023-12-05 18:21     ` Kyle Huey
2023-12-05 11:17   ` Jiri Olsa
2023-12-05 16:54   ` Yonghong Song
2023-12-05 17:52     ` Kyle Huey
