* [PATCH v2 0/3] Combine perf and bpf for fast eval of hw breakpoint conditions
@ 2023-12-07 16:34 Kyle Huey
  2023-12-07 16:34 ` [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap Kyle Huey
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Kyle Huey @ 2023-12-07 16:34 UTC (permalink / raw)
  To: Kyle Huey, linux-kernel, Andrii Nakryiko, Jiri Olsa,
	Namhyung Kim, Marco Elver, Yonghong Song
  Cc: Robert O'Callahan, bpf

rr, a userspace record and replay debugger[0], replays asynchronous events
such as signals and context switches by essentially[1] setting a breakpoint
at the address where the asynchronous event was delivered during recording
with a condition that the program state matches the state when the event
was delivered.

Currently, rr uses software breakpoints that trap (via ptrace) to the
supervisor, which then evaluates the condition. If the
asynchronous event is delivered in a tight loop (thus requiring the
breakpoint condition to be repeatedly evaluated) the overhead can be
immense. A patch to rr that uses hardware breakpoints via perf events with
an attached BPF program to reject breakpoint hits where the condition is
not satisfied reduces rr's replay overhead by 94% on a pathological (but
real, customer-provided, not contrived) rr trace.
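
For concreteness, a minimal sketch of that userspace setup (illustrative
only, not rr's actual code; it assumes libbpf and an already-loaded BPF
program, here called cond_prog, that returns 0 whenever the recorded state
does not match):

#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <bpf/libbpf.h>

/* Set a hardware execution breakpoint on the tracee and gate it with a
 * BPF program so that only accepted hits signal the supervisor.
 */
static int set_conditional_breakpoint(pid_t tracee, uintptr_t addr,
				      struct bpf_program *cond_prog)
{
	struct perf_event_attr attr = {
		.type = PERF_TYPE_BREAKPOINT,
		.size = sizeof(attr),
		.bp_type = HW_BREAKPOINT_X,
		.bp_addr = addr,
		.bp_len = sizeof(long),
		.sample_period = 1,
		.exclude_kernel = 1,
		.exclude_hv = 1,
	};
	int fd = syscall(__NR_perf_event_open, &attr, tracee, -1, -1, 0);

	if (fd < 0)
		return -1;
	/* With libbpf 1.x this returns NULL (and sets errno) on failure. */
	if (!bpf_program__attach_perf_event(cond_prog, fd)) {
		close(fd);
		return -1;
	}
	return fd;
}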

The only obstacle to this approach is that while the kernel allows a BPF
program to suppress sample output when a perf event overflows, it does not
suppress signalling the perf event fd. This appears to be a simple
oversight in the code. This patch set reorders the overflow handler
callback ahead of the other side effects of perf event overflow so that an
overflow handler can suppress all of them, changes bpf_overflow_handler()
to suppress those side effects when the BPF program returns zero, and adds
a selftest.

The previous version of this patchset can be found at
https://lore.kernel.org/linux-kernel/20231204201406.341074-1-khuey@kylehuey.com/

Changes since v1:

Patch 1 was added so that a sample suppressed by this mechanism also does
not generate a SIGTRAP or count against the event limit.

Patch 2 is v1's patch 1.

Patch 3 is v1's patch 2; it addresses a number of review comments about
the selftest and adds testing for the behavior introduced by patch 1.

[0] https://rr-project.org/
[1] Various optimizations exist to skip as much execution as possible
before setting a breakpoint, and to determine a set of program state that
is practical to check and verify.




* [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap
  2023-12-07 16:34 [PATCH v2 0/3] Combine perf and bpf for fast eval of hw breakpoint conditions Kyle Huey
@ 2023-12-07 16:34 ` Kyle Huey
  2023-12-07 17:05   ` Marco Elver
  2023-12-07 16:34 ` [PATCH v2 2/3] perf/bpf: Allow a bpf program to suppress all sample side effects Kyle Huey
  2023-12-07 16:34 ` [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses " Kyle Huey
  2 siblings, 1 reply; 16+ messages in thread
From: Kyle Huey @ 2023-12-07 16:34 UTC (permalink / raw)
  To: Kyle Huey, linux-kernel, Andrii Nakryiko, Jiri Olsa,
	Namhyung Kim, Marco Elver, Yonghong Song
  Cc: Robert O'Callahan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Ian Rogers, Adrian Hunter, linux-perf-users

The perf subsystem already allows an overflow handler to clear pending_kill
to suppress a fasync signal (although nobody currently does this). Allow an
overflow handler to suppress the other visible side effects of an overflow,
namely event_limit accounting and SIGTRAP generation.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 kernel/events/core.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b704d83a28b2..19fddfc27a4a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event,
 	 */
 
 	event->pending_kill = POLL_IN;
+
+	READ_ONCE(event->overflow_handler)(event, data, regs);
+
+	if (!event->pending_kill)
+		return ret;
+
 	if (events && atomic_dec_and_test(&event->event_limit)) {
 		ret = 1;
 		event->pending_kill = POLL_HUP;
@@ -9584,9 +9590,7 @@ static int __perf_event_overflow(struct perf_event *event,
 		irq_work_queue(&event->pending_irq);
 	}
 
-	READ_ONCE(event->overflow_handler)(event, data, regs);
-
-	if (*perf_event_fasync(event) && event->pending_kill) {
+	if (*perf_event_fasync(event)) {
 		event->pending_wakeup = 1;
 		irq_work_queue(&event->pending_irq);
 	}
-- 
2.34.1



* [PATCH v2 2/3] perf/bpf: Allow a bpf program to suppress all sample side effects.
  2023-12-07 16:34 [PATCH v2 0/3] Combine perf and bpf for fast eval of hw breakpoint conditions Kyle Huey
  2023-12-07 16:34 ` [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap Kyle Huey
@ 2023-12-07 16:34 ` Kyle Huey
  2023-12-07 19:12   ` Andrii Nakryiko
  2023-12-07 16:34 ` [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses " Kyle Huey
  2 siblings, 1 reply; 16+ messages in thread
From: Kyle Huey @ 2023-12-07 16:34 UTC (permalink / raw)
  To: Kyle Huey, linux-kernel, Andrii Nakryiko, Jiri Olsa,
	Namhyung Kim, Marco Elver, Yonghong Song
  Cc: Robert O'Callahan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Ian Rogers, Adrian Hunter, linux-perf-users, bpf

Returning zero from a bpf program attached to a perf event already
suppresses any data output. By clearing pending_kill, returning zero from a
bpf program will effectively pretend the sample never happened for all
userspace purposes.
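
As a sketch (not part of this series; the selftest added in patch 3 is the
real example), a perf_event bpf program relying on this could filter
samples by a hypothetical userspace-set skip_ip like so:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

/* Hypothetical filter value, set from userspace before the event fires. */
uintptr_t skip_ip;

SEC("perf_event")
int filter(struct bpf_perf_event_data *ctx)
{
	/* Returning 0 drops the sample and, with this change, all of its
	 * userspace-visible side effects; nonzero lets it through.
	 */
	return PT_REGS_IP(&ctx->regs) != skip_ip;
}

char _license[] SEC("license") = "GPL";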

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 kernel/events/core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 19fddfc27a4a..6cda05a4969d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10421,8 +10421,10 @@ static void bpf_overflow_handler(struct perf_event *event,
 	rcu_read_unlock();
 out:
 	__this_cpu_dec(bpf_prog_active);
-	if (!ret)
+	if (!ret) {
+		event->pending_kill = 0;
 		return;
+	}
 
 	event->orig_overflow_handler(event, data, regs);
 }
-- 
2.34.1



* [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses side effects.
  2023-12-07 16:34 [PATCH v2 0/3] Combine perf and bpf for fast eval of hw breakpoint conditions Kyle Huey
  2023-12-07 16:34 ` [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap Kyle Huey
  2023-12-07 16:34 ` [PATCH v2 2/3] perf/bpf: Allow a bpf program to suppress all sample side effects Kyle Huey
@ 2023-12-07 16:34 ` Kyle Huey
  2023-12-07 19:12   ` Andrii Nakryiko
  2 siblings, 1 reply; 16+ messages in thread
From: Kyle Huey @ 2023-12-07 16:34 UTC (permalink / raw)
  To: Kyle Huey, linux-kernel, Andrii Nakryiko, Jiri Olsa,
	Namhyung Kim, Marco Elver, Yonghong Song
  Cc: Robert O'Callahan, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Mykola Lysenko,
	Shuah Khan, bpf, linux-kselftest

The test sets a hardware breakpoint and uses a bpf program to suppress the
side effects of a perf event sample, including I/O availability signals,
SIGTRAPs, and decrementing the event counter limit, if the ip matches the
expected value. Then the function with the breakpoint is executed multiple
times to test that all effects behave as expected.

Signed-off-by: Kyle Huey <khuey@kylehuey.com>
---
 .../selftests/bpf/prog_tests/perf_skip.c      | 145 ++++++++++++++++++
 .../selftests/bpf/progs/test_perf_skip.c      |  15 ++
 2 files changed, 160 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
new file mode 100644
index 000000000000..f6fa9bfd9efa
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+/* We need the latest siginfo from the kernel repo. */
+#include <asm/siginfo.h>
+#define __have_siginfo_t 1
+#define __have_sigval_t 1
+#define __have_sigevent_t 1
+#define __siginfo_t_defined
+#define __sigval_t_defined
+#define __sigevent_t_defined
+#define _BITS_SIGINFO_CONSTS_H 1
+#define _BITS_SIGEVENT_CONSTS_H 1
+
+#include <test_progs.h>
+#include "test_perf_skip.skel.h"
+#include <linux/compiler.h>
+#include <linux/hw_breakpoint.h>
+#include <sys/mman.h>
+
+int signals_unexpected = 1;
+int sigio_count, sigtrap_count;
+
+static void handle_sigio(int sig __always_unused)
+{
+	ASSERT_OK(signals_unexpected, "perf event not skipped");
+	++sigio_count;
+}
+
+static void handle_sigtrap(int signum __always_unused,
+			   siginfo_t *info,
+			   void *ucontext __always_unused)
+{
+	ASSERT_OK(signals_unexpected, "perf event not skipped");
+	ASSERT_EQ(info->si_code, TRAP_PERF, "wrong si_code");
+	++sigtrap_count;
+}
+
+static noinline int test_function(void)
+{
+	asm volatile ("");
+	return 0;
+}
+
+void serial_test_perf_skip(void)
+{
+	struct sigaction action = {};
+	struct sigaction previous_sigtrap;
+	sighandler_t previous_sigio;
+	struct test_perf_skip *skel = NULL;
+	struct perf_event_attr attr = {};
+	int perf_fd = -1;
+	int err;
+	struct f_owner_ex owner;
+	struct bpf_link *prog_link = NULL;
+
+	action.sa_flags = SA_SIGINFO | SA_NODEFER;
+	action.sa_sigaction = handle_sigtrap;
+	sigemptyset(&action.sa_mask);
+	if (!ASSERT_OK(sigaction(SIGTRAP, &action, &previous_sigtrap), "sigaction"))
+		return;
+
+	previous_sigio = signal(SIGIO, handle_sigio);
+
+	skel = test_perf_skip__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	attr.type = PERF_TYPE_BREAKPOINT;
+	attr.size = sizeof(attr);
+	attr.bp_type = HW_BREAKPOINT_X;
+	attr.bp_addr = (uintptr_t)test_function;
+	attr.bp_len = sizeof(long);
+	attr.sample_period = 1;
+	attr.sample_type = PERF_SAMPLE_IP;
+	attr.pinned = 1;
+	attr.exclude_kernel = 1;
+	attr.exclude_hv = 1;
+	attr.precise_ip = 3;
+	attr.sigtrap = 1;
+	attr.remove_on_exec = 1;
+
+	perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
+	if (perf_fd < 0 && (errno == ENOENT || errno == EOPNOTSUPP)) {
+		printf("SKIP:no PERF_TYPE_BREAKPOINT/HW_BREAKPOINT_X\n");
+		test__skip();
+		goto cleanup;
+	}
+	if (!ASSERT_OK(perf_fd < 0, "perf_event_open"))
+		goto cleanup;
+
+	// Configure the perf event to signal on sample.
+	err = fcntl(perf_fd, F_SETFL, O_ASYNC);
+	if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
+		goto cleanup;
+
+	owner.type = F_OWNER_TID;
+	owner.pid = syscall(__NR_gettid);
+	err = fcntl(perf_fd, F_SETOWN_EX, &owner);
+	if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
+		goto cleanup;
+
+	// Allow at most one sample. A sample rejected by bpf should
+	// not count against this.
+	err = ioctl(perf_fd, PERF_EVENT_IOC_REFRESH, 1);
+	if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_REFRESH)"))
+		goto cleanup;
+
+	prog_link = bpf_program__attach_perf_event(skel->progs.handler, perf_fd);
+	if (!ASSERT_OK_PTR(prog_link, "bpf_program__attach_perf_event"))
+		goto cleanup;
+
+	// Configure the bpf program to suppress the sample.
+	skel->bss->ip = (uintptr_t)test_function;
+	test_function();
+
+	ASSERT_EQ(sigio_count, 0, "sigio_count");
+	ASSERT_EQ(sigtrap_count, 0, "sigtrap_count");
+
+	// Configure the bpf program to allow the sample.
+	skel->bss->ip = 0;
+	signals_unexpected = 0;
+	test_function();
+
+	ASSERT_EQ(sigio_count, 1, "sigio_count");
+	ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
+
+	// Test that the sample above is the only one allowed (by perf, not
+	// by bpf)
+	test_function();
+
+	ASSERT_EQ(sigio_count, 1, "sigio_count");
+	ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
+
+cleanup:
+	if (prog_link)
+		bpf_link__destroy(prog_link);
+	if (perf_fd >= 0)
+		close(perf_fd);
+	if (skel)
+		test_perf_skip__destroy(skel);
+
+	signal(SIGIO, previous_sigio);
+	sigaction(SIGTRAP, &previous_sigtrap, NULL);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
new file mode 100644
index 000000000000..417a9db3b770
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+uintptr_t ip;
+
+SEC("perf_event")
+int handler(struct bpf_perf_event_data *data)
+{
+	// Skip events that have the correct ip.
+	return ip != PT_REGS_IP(&data->regs);
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1



* Re: [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap
  2023-12-07 16:34 ` [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap Kyle Huey
@ 2023-12-07 17:05   ` Marco Elver
       [not found]     ` <CAP045Ap8z0qUpuYtbf9hpBqfnngNU7wVT0HM0XwQMrYYt9CAkg@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Marco Elver @ 2023-12-07 17:05 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Kyle Huey, linux-kernel, Andrii Nakryiko, Jiri Olsa,
	Namhyung Kim, Yonghong Song, Robert O'Callahan,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Ian Rogers, Adrian Hunter,
	linux-perf-users

On Thu, 7 Dec 2023 at 17:35, Kyle Huey <me@kylehuey.com> wrote:
>
> The perf subsystem already allows an overflow handler to clear pending_kill
> to suppress a fasync signal (although nobody currently does this). Allow an
> overflow handler to suppress the other visible side effects of an overflow,
> namely event_limit accounting and SIGTRAP generation.
>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---
>  kernel/events/core.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index b704d83a28b2..19fddfc27a4a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event,
>          */
>
>         event->pending_kill = POLL_IN;
> +
> +       READ_ONCE(event->overflow_handler)(event, data, regs);
> +
> +       if (!event->pending_kill)
> +               return ret;

It's not at all intuitive that resetting pending_kill to 0 will
suppress everything else, too. There is no relationship between the
fasync signals and SIGTRAP. pending_kill is for the former and
pending_sigtrap is for the latter. One should not affect the other.

A nicer solution would be to properly undo the various pending_*
states (in the case of pending_sigtrap being set it should be enough
to reset pending_sigtrap to 0, and also decrement
event->ctx->nr_pending).

Although I can see why this solution is simpler. Perhaps with enough
comments it might be clearer.

Preferences?


* Re: [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap
       [not found]     ` <CAP045Ap8z0qUpuYtbf9hpBqfnngNU7wVT0HM0XwQMrYYt9CAkg@mail.gmail.com>
@ 2023-12-07 17:53       ` Marco Elver
  2023-12-07 22:38         ` Namhyung Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Marco Elver @ 2023-12-07 17:53 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Kyle Huey, open list, Andrii Nakryiko, Jiri Olsa, Namhyung Kim,
	Yonghong Song, Robert O'Callahan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Ian Rogers, Adrian Hunter, linux-perf-users

On Thu, 7 Dec 2023 at 18:47, Kyle Huey <me@kylehuey.com> wrote:
>
>
>
> On Thu, Dec 7, 2023, 9:05 AM Marco Elver <elver@google.com> wrote:
>>
>> On Thu, 7 Dec 2023 at 17:35, Kyle Huey <me@kylehuey.com> wrote:
>> >
>> > The perf subsystem already allows an overflow handler to clear pending_kill
>> > to suppress a fasync signal (although nobody currently does this). Allow an
>> > overflow handler to suppress the other visible side effects of an overflow,
>> > namely event_limit accounting and SIGTRAP generation.
>> >
>> > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
>> > ---
>> >  kernel/events/core.c | 10 +++++++---
>> >  1 file changed, 7 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/kernel/events/core.c b/kernel/events/core.c
>> > index b704d83a28b2..19fddfc27a4a 100644
>> > --- a/kernel/events/core.c
>> > +++ b/kernel/events/core.c
>> > @@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event,
>> >          */
>> >
>> >         event->pending_kill = POLL_IN;
>> > +
>> > +       READ_ONCE(event->overflow_handler)(event, data, regs);
>> > +
>> > +       if (!event->pending_kill)
>> > +               return ret;
>>
>> It's not at all intuitive that resetting pending_kill to 0 will
>> suppress everything else, too. There is no relationship between the
>> fasync signals and SIGTRAP. pending_kill is for the former and
>> pending_sigtrap is for the latter. One should not affect the other.
>>
>> A nicer solution would be to properly undo the various pending_*
>> states (in the case of pending_sigtrap being set it should be enough
>> to reset pending_sigtrap to 0, and also decrement
>> event->ctx->nr_pending).
>
>
> I don't believe it's possible to correctly undo the event_limit decrement after the fact (if it's e.g. racing with the ioctl that adds to the event limit).
>
>> Although I can see why this solution is simpler. Perhaps with enough
>> comments it might be clearer.
>>
>> Preferences?
>
>
> The cleanest way would probably be to add a return value to the overflow handler function that controls this. It requires changing a bunch of arch specific code on arches I don't have access to though.

Hmm.

Maybe wait for perf maintainers to say what is preferable. (I could
live with just making sure this has no other weird side effects and
more comments.)


* Re: [PATCH v2 2/3] perf/bpf: Allow a bpf program to suppress all sample side effects.
  2023-12-07 16:34 ` [PATCH v2 2/3] perf/bpf: Allow a bpf program to suppress all sample side effects Kyle Huey
@ 2023-12-07 19:12   ` Andrii Nakryiko
  0 siblings, 0 replies; 16+ messages in thread
From: Andrii Nakryiko @ 2023-12-07 19:12 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Kyle Huey, linux-kernel, Jiri Olsa, Namhyung Kim, Marco Elver,
	Yonghong Song, Robert O'Callahan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Ian Rogers, Adrian Hunter, linux-perf-users,
	bpf

On Thu, Dec 7, 2023 at 8:35 AM Kyle Huey <me@kylehuey.com> wrote:
>
> Returning zero from a bpf program attached to a perf event already
> suppresses any data output. By clearing pending_kill, returning zero from a
> bpf program will effectively pretend the sample never happened for all
> userspace purposes.
>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---
>  kernel/events/core.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>

LGTM

Acked-by: Andrii Nakryiko <andrii@kernel.org>

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 19fddfc27a4a..6cda05a4969d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -10421,8 +10421,10 @@ static void bpf_overflow_handler(struct perf_event *event,
>         rcu_read_unlock();
>  out:
>         __this_cpu_dec(bpf_prog_active);
> -       if (!ret)
> +       if (!ret) {
> +               event->pending_kill = 0;
>                 return;
> +       }
>
>         event->orig_overflow_handler(event, data, regs);
>  }
> --
> 2.34.1
>


* Re: [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses side effects.
  2023-12-07 16:34 ` [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses " Kyle Huey
@ 2023-12-07 19:12   ` Andrii Nakryiko
  2023-12-07 19:20     ` Marco Elver
  0 siblings, 1 reply; 16+ messages in thread
From: Andrii Nakryiko @ 2023-12-07 19:12 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Kyle Huey, linux-kernel, Jiri Olsa, Namhyung Kim, Marco Elver,
	Yonghong Song, Robert O'Callahan, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
	Mykola Lysenko, Shuah Khan, bpf, linux-kselftest

On Thu, Dec 7, 2023 at 8:35 AM Kyle Huey <me@kylehuey.com> wrote:
>
> The test sets a hardware breakpoint and uses a bpf program to suppress the
> side effects of a perf event sample, including I/O availability signals,
> SIGTRAPs, and decrementing the event counter limit, if the ip matches the
> expected value. Then the function with the breakpoint is executed multiple
> times to test that all effects behave as expected.
>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> ---
>  .../selftests/bpf/prog_tests/perf_skip.c      | 145 ++++++++++++++++++
>  .../selftests/bpf/progs/test_perf_skip.c      |  15 ++
>  2 files changed, 160 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> new file mode 100644
> index 000000000000..f6fa9bfd9efa
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> @@ -0,0 +1,145 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +
> +/* We need the latest siginfo from the kernel repo. */
> +#include <asm/siginfo.h>

selftests are built with UAPI headers' copies under tools/include, so
CI did catch a real issue, I think. Try copying
include/uapi/asm-generic/siginfo.h into
tools/include/uapi/asm-generic/siginfo.h ?

> +#define __have_siginfo_t 1
> +#define __have_sigval_t 1
> +#define __have_sigevent_t 1
> +#define __siginfo_t_defined
> +#define __sigval_t_defined
> +#define __sigevent_t_defined
> +#define _BITS_SIGINFO_CONSTS_H 1
> +#define _BITS_SIGEVENT_CONSTS_H 1
> +
> +#include <test_progs.h>
> +#include "test_perf_skip.skel.h"
> +#include <linux/compiler.h>
> +#include <linux/hw_breakpoint.h>
> +#include <sys/mman.h>
> +
> +int signals_unexpected = 1;
> +int sigio_count, sigtrap_count;
> +
> +static void handle_sigio(int sig __always_unused)
> +{
> +       ASSERT_OK(signals_unexpected, "perf event not skipped");
> +       ++sigio_count;
> +}
> +
> +static void handle_sigtrap(int signum __always_unused,
> +                          siginfo_t *info,
> +                          void *ucontext __always_unused)
> +{
> +       ASSERT_OK(signals_unexpected, "perf event not skipped");
> +       ASSERT_EQ(info->si_code, TRAP_PERF, "wrong si_code");
> +       ++sigtrap_count;
> +}
> +
> +static noinline int test_function(void)
> +{
> +       asm volatile ("");
> +       return 0;
> +}
> +
> +void serial_test_perf_skip(void)
> +{
> +       struct sigaction action = {};
> +       struct sigaction previous_sigtrap;
> +       sighandler_t previous_sigio;
> +       struct test_perf_skip *skel = NULL;
> +       struct perf_event_attr attr = {};
> +       int perf_fd = -1;
> +       int err;
> +       struct f_owner_ex owner;
> +       struct bpf_link *prog_link = NULL;
> +
> +       action.sa_flags = SA_SIGINFO | SA_NODEFER;
> +       action.sa_sigaction = handle_sigtrap;
> +       sigemptyset(&action.sa_mask);
> +       if (!ASSERT_OK(sigaction(SIGTRAP, &action, &previous_sigtrap), "sigaction"))
> +               return;
> +
> +       previous_sigio = signal(SIGIO, handle_sigio);
> +
> +       skel = test_perf_skip__open_and_load();
> +       if (!ASSERT_OK_PTR(skel, "skel_load"))
> +               goto cleanup;
> +
> +       attr.type = PERF_TYPE_BREAKPOINT;
> +       attr.size = sizeof(attr);
> +       attr.bp_type = HW_BREAKPOINT_X;
> +       attr.bp_addr = (uintptr_t)test_function;
> +       attr.bp_len = sizeof(long);
> +       attr.sample_period = 1;
> +       attr.sample_type = PERF_SAMPLE_IP;
> +       attr.pinned = 1;
> +       attr.exclude_kernel = 1;
> +       attr.exclude_hv = 1;
> +       attr.precise_ip = 3;
> +       attr.sigtrap = 1;
> +       attr.remove_on_exec = 1;
> +
> +       perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
> +       if (perf_fd < 0 && (errno == ENOENT || errno == EOPNOTSUPP)) {
> +               printf("SKIP:no PERF_TYPE_BREAKPOINT/HW_BREAKPOINT_X\n");
> +               test__skip();
> +               goto cleanup;
> +       }
> +       if (!ASSERT_OK(perf_fd < 0, "perf_event_open"))
> +               goto cleanup;
> +
> +       // Configure the perf event to signal on sample.

please don't use C++ style comments

> +       err = fcntl(perf_fd, F_SETFL, O_ASYNC);
> +       if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
> +               goto cleanup;
> +
> +       owner.type = F_OWNER_TID;
> +       owner.pid = syscall(__NR_gettid);
> +       err = fcntl(perf_fd, F_SETOWN_EX, &owner);
> +       if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
> +               goto cleanup;
> +
> +       // Allow at most one sample. A sample rejected by bpf should
> +       // not count against this.
> +       err = ioctl(perf_fd, PERF_EVENT_IOC_REFRESH, 1);
> +       if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_REFRESH)"))
> +               goto cleanup;
> +
> +       prog_link = bpf_program__attach_perf_event(skel->progs.handler, perf_fd);
> +       if (!ASSERT_OK_PTR(prog_link, "bpf_program__attach_perf_event"))
> +               goto cleanup;
> +
> +       // Configure the bpf program to suppress the sample.
> +       skel->bss->ip = (uintptr_t)test_function;
> +       test_function();
> +
> +       ASSERT_EQ(sigio_count, 0, "sigio_count");
> +       ASSERT_EQ(sigtrap_count, 0, "sigtrap_count");
> +
> +       // Configure the bpf program to allow the sample.
> +       skel->bss->ip = 0;
> +       signals_unexpected = 0;
> +       test_function();
> +
> +       ASSERT_EQ(sigio_count, 1, "sigio_count");
> +       ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
> +
> +       // Test that the sample above is the only one allowed (by perf, not
> +       // by bpf)
> +       test_function();
> +
> +       ASSERT_EQ(sigio_count, 1, "sigio_count");
> +       ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
> +
> +cleanup:
> +       if (prog_link)

nit: no need for a check, bpf_link__destroy() handles NULLs just fine

> +               bpf_link__destroy(prog_link);
> +       if (perf_fd >= 0)
> +               close(perf_fd);
> +       if (skel)
> +               test_perf_skip__destroy(skel);

same, no need for NULL check

> +
> +       signal(SIGIO, previous_sigio);
> +       sigaction(SIGTRAP, &previous_sigtrap, NULL);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> new file mode 100644
> index 000000000000..417a9db3b770
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +uintptr_t ip;
> +
> +SEC("perf_event")
> +int handler(struct bpf_perf_event_data *data)
> +{
> +       // Skip events that have the correct ip.

C++ comments


> +       return ip != PT_REGS_IP(&data->regs);
> +}
> +
> +char _license[] SEC("license") = "GPL";
> --
> 2.34.1
>


* Re: [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses side effects.
  2023-12-07 19:12   ` Andrii Nakryiko
@ 2023-12-07 19:20     ` Marco Elver
  2023-12-07 22:56       ` Kyle Huey
  0 siblings, 1 reply; 16+ messages in thread
From: Marco Elver @ 2023-12-07 19:20 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Kyle Huey, Kyle Huey, linux-kernel, Jiri Olsa, Namhyung Kim,
	Yonghong Song, Robert O'Callahan, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
	Mykola Lysenko, Shuah Khan, bpf, linux-kselftest

On Thu, 7 Dec 2023 at 20:12, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Dec 7, 2023 at 8:35 AM Kyle Huey <me@kylehuey.com> wrote:
> >
> > The test sets a hardware breakpoint and uses a bpf program to suppress the
> > side effects of a perf event sample, including I/O availability signals,
> > SIGTRAPs, and decrementing the event counter limit, if the ip matches the
> > expected value. Then the function with the breakpoint is executed multiple
> > times to test that all effects behave as expected.
> >
> > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> > ---
> >  .../selftests/bpf/prog_tests/perf_skip.c      | 145 ++++++++++++++++++
> >  .../selftests/bpf/progs/test_perf_skip.c      |  15 ++
> >  2 files changed, 160 insertions(+)
> >  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
> >  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > new file mode 100644
> > index 000000000000..f6fa9bfd9efa
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > @@ -0,0 +1,145 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#define _GNU_SOURCE
> > +
> > +/* We need the latest siginfo from the kernel repo. */
> > +#include <asm/siginfo.h>
>
> selftests are built with UAPI headers' copies under tools/include, so
> CI did catch a real issue, I think. Try copying
> include/uapi/asm-generic/siginfo.h into
> tools/include/uapi/asm-generic/siginfo.h ?

I believe parts of this were inspired by
tools/testing/selftests/perf_events/sigtrap_threads.c - getting the
kernel headers is allowed, as long as $(KHDR_INCLUDES) is added to
CFLAGS. See tools/testing/selftests/perf_events/Makefile. Not sure
it's appropriate for this test though, if you don't want to add
KHDR_INCLUDES for everything.


* Re: [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap
  2023-12-07 17:53       ` Marco Elver
@ 2023-12-07 22:38         ` Namhyung Kim
  2023-12-07 23:02           ` Kyle Huey
  0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2023-12-07 22:38 UTC (permalink / raw)
  To: Marco Elver
  Cc: Kyle Huey, Kyle Huey, open list, Andrii Nakryiko, Jiri Olsa,
	Yonghong Song, Robert O'Callahan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Ian Rogers, Adrian Hunter, linux-perf-users

Hello,

On Thu, Dec 07, 2023 at 06:53:58PM +0100, Marco Elver wrote:
> On Thu, 7 Dec 2023 at 18:47, Kyle Huey <me@kylehuey.com> wrote:
> >
> >
> >
> > On Thu, Dec 7, 2023, 9:05 AM Marco Elver <elver@google.com> wrote:
> >>
> >> On Thu, 7 Dec 2023 at 17:35, Kyle Huey <me@kylehuey.com> wrote:
> >> >
> >> > The perf subsystem already allows an overflow handler to clear pending_kill
> >> > to suppress a fasync signal (although nobody currently does this). Allow an
> >> > overflow handler to suppress the other visible side effects of an overflow,
> >> > namely event_limit accounting and SIGTRAP generation.

Well, I think it can still hit the throttling logic and generate
a PERF_RECORD_THROTTLE.  But it should be rare..

> >> >
> >> > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> >> > ---
> >> >  kernel/events/core.c | 10 +++++++---
> >> >  1 file changed, 7 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> >> > index b704d83a28b2..19fddfc27a4a 100644
> >> > --- a/kernel/events/core.c
> >> > +++ b/kernel/events/core.c
> >> > @@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event,
> >> >          */
> >> >
> >> >         event->pending_kill = POLL_IN;
> >> > +
> >> > +       READ_ONCE(event->overflow_handler)(event, data, regs);
> >> > +
> >> > +       if (!event->pending_kill)
> >> > +               return ret;
> >>
> >> It's not at all intuitive that resetting pending_kill to 0 will
> >> suppress everything else, too. There is no relationship between the
> >> fasync signals and SIGTRAP. pending_kill is for the former and
> >> pending_sigtrap is for the latter. One should not affect the other.
> >>
> >> A nicer solution would be to properly undo the various pending_*
> >> states (in the case of pending_sigtrap being set it should be enough
> >> to reset pending_sigtrap to 0, and also decrement
> >> event->ctx->nr_pending).
> >
> >
> > I don't believe it's possible to correctly undo the event_limit decrement after the fact (if it's e.g. racing with the ioctl that adds to the event limit).
> >
> >> Although I can see why this solution is simpler. Perhaps with enough
> >> comments it might be clearer.
> >>
> >> Preferences?
> >
> >
> > The cleanest way would probably be to add a return value to the overflow handler function that controls this. It requires changing a bunch of arch specific code on arches I don't have access to though.
> 
> Hmm.
> 
> Maybe wait for perf maintainers to say what is preferable. (I could
> live with just making sure this has no other weird side effects and
> more comments.)

What if we can call bpf handler directly and check the return value?
Then I think we can also get rid of the original overflow handler.

Something like this (untested..)

Thanks,
Namhyung


---8<---

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e85cd1c0eaf3..1eba6f5bb70b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -809,7 +809,6 @@ struct perf_event {
 	perf_overflow_handler_t		overflow_handler;
 	void				*overflow_handler_context;
 #ifdef CONFIG_BPF_SYSCALL
-	perf_overflow_handler_t		orig_overflow_handler;
 	struct bpf_prog			*prog;
 	u64				bpf_cookie;
 #endif
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4c72a41f11af..e1a00646dbbe 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9471,6 +9471,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
  * Generic event overflow handling, sampling.
  */
 
+#ifdef CONFIG_BPF_SYSCALL
+static int bpf_overflow_handler(struct perf_event *event,
+				struct perf_sample_data *data,
+				struct pt_regs *regs);
+#endif
+
 static int __perf_event_overflow(struct perf_event *event,
 				 int throttle, struct perf_sample_data *data,
 				 struct pt_regs *regs)
@@ -9487,6 +9493,11 @@ static int __perf_event_overflow(struct perf_event *event,
 
 	ret = __perf_event_account_interrupt(event, throttle);
 
+#ifdef CONFIG_BPF_SYSCALL
+	if (event->prog && bpf_overflow_handler(event, data, regs) == 0)
+		return ret;
+#endif
+
 	/*
 	 * XXX event_limit might not quite work as expected on inherited
 	 * events
@@ -10346,7 +10357,7 @@ static void perf_event_free_filter(struct perf_event *event)
 }
 
 #ifdef CONFIG_BPF_SYSCALL
-static void bpf_overflow_handler(struct perf_event *event,
+static int bpf_overflow_handler(struct perf_event *event,
 				 struct perf_sample_data *data,
 				 struct pt_regs *regs)
 {
@@ -10355,7 +10366,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 		.event = event,
 	};
 	struct bpf_prog *prog;
-	int ret = 0;
+	int ret = 1;
 
 	ctx.regs = perf_arch_bpf_user_pt_regs(regs);
 	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
@@ -10369,10 +10380,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 	rcu_read_unlock();
 out:
 	__this_cpu_dec(bpf_prog_active);
-	if (!ret)
-		return;
-
-	event->orig_overflow_handler(event, data, regs);
+	return ret;
 }
 
 static int perf_event_set_bpf_handler(struct perf_event *event,
@@ -10408,8 +10416,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,
 
 	event->prog = prog;
 	event->bpf_cookie = bpf_cookie;
-	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
-	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
 	return 0;
 }
 
@@ -10420,7 +10426,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
 	if (!prog)
 		return;
 
-	WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
 	event->prog = NULL;
 	bpf_prog_put(prog);
 }
@@ -11880,13 +11885,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		overflow_handler = parent_event->overflow_handler;
 		context = parent_event->overflow_handler_context;
 #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
-		if (overflow_handler == bpf_overflow_handler) {
+		if (parent_event->prog) {
 			struct bpf_prog *prog = parent_event->prog;
 
 			bpf_prog_inc(prog);
 			event->prog = prog;
-			event->orig_overflow_handler =
-				parent_event->orig_overflow_handler;
 		}
 #endif
 	}


* Re: [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses side effects.
  2023-12-07 19:20     ` Marco Elver
@ 2023-12-07 22:56       ` Kyle Huey
  2023-12-08  1:08         ` Kyle Huey
  0 siblings, 1 reply; 16+ messages in thread
From: Kyle Huey @ 2023-12-07 22:56 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrii Nakryiko, Kyle Huey, linux-kernel, Jiri Olsa,
	Namhyung Kim, Yonghong Song, Robert O'Callahan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Mykola Lysenko, Shuah Khan, bpf,
	linux-kselftest

On Thu, Dec 7, 2023 at 11:20 AM Marco Elver <elver@google.com> wrote:
>
> On Thu, 7 Dec 2023 at 20:12, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Dec 7, 2023 at 8:35 AM Kyle Huey <me@kylehuey.com> wrote:
> > >
> > > The test sets a hardware breakpoint and uses a bpf program to suppress the
> > > side effects of a perf event sample, including I/O availability signals,
> > > SIGTRAPs, and decrementing the event counter limit, if the ip matches the
> > > expected value. Then the function with the breakpoint is executed multiple
> > > times to test that all effects behave as expected.
> > >
> > > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> > > ---
> > >  .../selftests/bpf/prog_tests/perf_skip.c      | 145 ++++++++++++++++++
> > >  .../selftests/bpf/progs/test_perf_skip.c      |  15 ++
> > >  2 files changed, 160 insertions(+)
> > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > >  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
> > >
> > > diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > > new file mode 100644
> > > index 000000000000..f6fa9bfd9efa
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > > @@ -0,0 +1,145 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +#define _GNU_SOURCE
> > > +
> > > +/* We need the latest siginfo from the kernel repo. */
> > > +#include <asm/siginfo.h>
> >
> > selftests are built with UAPI headers' copies under tools/include, so
> > CI did catch a real issue, I think. Try copying
> > include/uapi/asm-generic/siginfo.h into
> > tools/include/uapi/asm-generic/siginfo.h ?
>
> I believe parts of this were inspired by
> tools/testing/selftests/perf_events/sigtrap_threads.c - getting the
> kernel headers is allowed, as long as $(KHDR_INCLUDES) is added to
> CFLAGS. See tools/testing/selftests/perf_events/Makefile. Not sure
> it's appropriate for this test though, if you don't want to add
> KHDR_INCLUDES for everything.

Yes, that's right. Namhyung's commit message for 91c97b36bd69 leads me
to believe that I should copy siginfo.h over into tools/include and
fix the perf_events self tests too.

- Kyle


* Re: [PATCH v2 1/3] perf: Reorder overflow handler ahead of event_limit/sigtrap
  2023-12-07 22:38         ` Namhyung Kim
@ 2023-12-07 23:02           ` Kyle Huey
  0 siblings, 0 replies; 16+ messages in thread
From: Kyle Huey @ 2023-12-07 23:02 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Marco Elver, Kyle Huey, open list, Andrii Nakryiko, Jiri Olsa,
	Yonghong Song, Robert O'Callahan, Peter Zijlstra,
	Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
	Alexander Shishkin, Ian Rogers, Adrian Hunter, linux-perf-users

On Thu, Dec 7, 2023 at 2:38 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Thu, Dec 07, 2023 at 06:53:58PM +0100, Marco Elver wrote:
> > On Thu, 7 Dec 2023 at 18:47, Kyle Huey <me@kylehuey.com> wrote:
> > >
> > >
> > >
> > > On Thu, Dec 7, 2023, 9:05 AM Marco Elver <elver@google.com> wrote:
> > >>
> > >> On Thu, 7 Dec 2023 at 17:35, Kyle Huey <me@kylehuey.com> wrote:
> > >> >
> > >> > The perf subsystem already allows an overflow handler to clear pending_kill
> > >> > to suppress a fasync signal (although nobody currently does this). Allow an
> > >> > overflow handler to suppress the other visible side effects of an overflow,
> > >> > namely event_limit accounting and SIGTRAP generation.
>
> Well, I think it can still hit the throttling logic and generate
> a PERF_RECORD_THROTTLE.  But it should be rare..
>
> > >> >
> > >> > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> > >> > ---
> > >> >  kernel/events/core.c | 10 +++++++---
> > >> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > >> >
> > >> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > >> > index b704d83a28b2..19fddfc27a4a 100644
> > >> > --- a/kernel/events/core.c
> > >> > +++ b/kernel/events/core.c
> > >> > @@ -9541,6 +9541,12 @@ static int __perf_event_overflow(struct perf_event *event,
> > >> >          */
> > >> >
> > >> >         event->pending_kill = POLL_IN;
> > >> > +
> > >> > +       READ_ONCE(event->overflow_handler)(event, data, regs);
> > >> > +
> > >> > +       if (!event->pending_kill)
> > >> > +               return ret;
> > >>
> > >> It's not at all intuitive that resetting pending_kill to 0 will
> > >> suppress everything else, too. There is no relationship between the
> > >> fasync signals and SIGTRAP. pending_kill is for the former and
> > >> pending_sigtrap is for the latter. One should not affect the other.
> > >>
> > >> A nicer solution would be to properly undo the various pending_*
> > >> states (in the case of pending_sigtrap being set it should be enough
> > >> to reset pending_sigtrap to 0, and also decrement
> > >> event->ctx->nr_pending).
> > >
> > >
> > > I don't believe it's possible to correctly undo the event_limit decrement after the fact (if it's e.g. racing with the ioctl that adds to the event limit).
> > >
> > >> Although I can see why this solution is simpler. Perhaps with enough
> > >> comments it might be clearer.
> > >>
> > >> Preferences?
> > >
> > >
> > > The cleanest way would probably be to add a return value to the overflow handler function that controls this. It requires changing a bunch of arch specific code on arches I don't have access to though.
> >
> > Hmm.
> >
> > Maybe wait for perf maintainers to say what is preferable. (I could
> > live with just making sure this has no other weird side effects and
> > more comments.)
>
> What if we can call bpf handler directly and check the return value?
> Then I think we can also get rid of the original overflow handler.
>
> Something like this (untested..)

mmm, that's an interesting idea. I'll experiment with it.

- Kyle

> Thanks,
> Namhyung
>
>
> ---8<---
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index e85cd1c0eaf3..1eba6f5bb70b 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -809,7 +809,6 @@ struct perf_event {
>         perf_overflow_handler_t         overflow_handler;
>         void                            *overflow_handler_context;
>  #ifdef CONFIG_BPF_SYSCALL
> -       perf_overflow_handler_t         orig_overflow_handler;
>         struct bpf_prog                 *prog;
>         u64                             bpf_cookie;
>  #endif
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4c72a41f11af..e1a00646dbbe 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9471,6 +9471,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
>   * Generic event overflow handling, sampling.
>   */
>
> +#ifdef CONFIG_BPF_SYSCALL
> +static int bpf_overflow_handler(struct perf_event *event,
> +                               struct perf_sample_data *data,
> +                               struct pt_regs *regs);
> +#endif
> +
>  static int __perf_event_overflow(struct perf_event *event,
>                                  int throttle, struct perf_sample_data *data,
>                                  struct pt_regs *regs)
> @@ -9487,6 +9493,11 @@ static int __perf_event_overflow(struct perf_event *event,
>
>         ret = __perf_event_account_interrupt(event, throttle);
>
> +#ifdef CONFIG_BPF_SYSCALL
> +       if (event->prog && bpf_overflow_handler(event, data, regs) == 0)
> +               return ret;
> +#endif
> +
>         /*
>          * XXX event_limit might not quite work as expected on inherited
>          * events
> @@ -10346,7 +10357,7 @@ static void perf_event_free_filter(struct perf_event *event)
>  }
>
>  #ifdef CONFIG_BPF_SYSCALL
> -static void bpf_overflow_handler(struct perf_event *event,
> +static int bpf_overflow_handler(struct perf_event *event,
>                                  struct perf_sample_data *data,
>                                  struct pt_regs *regs)
>  {
> @@ -10355,7 +10366,7 @@ static void bpf_overflow_handler(struct perf_event *event,
>                 .event = event,
>         };
>         struct bpf_prog *prog;
> -       int ret = 0;
> +       int ret = 1;
>
>         ctx.regs = perf_arch_bpf_user_pt_regs(regs);
>         if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
> @@ -10369,10 +10380,7 @@ static void bpf_overflow_handler(struct perf_event *event,
>         rcu_read_unlock();
>  out:
>         __this_cpu_dec(bpf_prog_active);
> -       if (!ret)
> -               return;
> -
> -       event->orig_overflow_handler(event, data, regs);
> +       return ret;
>  }
>
>  static int perf_event_set_bpf_handler(struct perf_event *event,
> @@ -10408,8 +10416,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,
>
>         event->prog = prog;
>         event->bpf_cookie = bpf_cookie;
> -       event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
> -       WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
>         return 0;
>  }
>
> @@ -10420,7 +10426,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
>         if (!prog)
>                 return;
>
> -       WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
>         event->prog = NULL;
>         bpf_prog_put(prog);
>  }
> @@ -11880,13 +11885,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
>                 overflow_handler = parent_event->overflow_handler;
>                 context = parent_event->overflow_handler_context;
>  #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
> -               if (overflow_handler == bpf_overflow_handler) {
> +               if (parent_event->prog) {
>                         struct bpf_prog *prog = parent_event->prog;
>
>                         bpf_prog_inc(prog);
>                         event->prog = prog;
> -                       event->orig_overflow_handler =
> -                               parent_event->orig_overflow_handler;
>                 }
>  #endif
>         }


* Re: [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses side effects.
  2023-12-07 22:56       ` Kyle Huey
@ 2023-12-08  1:08         ` Kyle Huey
  2023-12-08  3:46           ` Yonghong Song
  2023-12-08  8:06           ` Marco Elver
  0 siblings, 2 replies; 16+ messages in thread
From: Kyle Huey @ 2023-12-08  1:08 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrii Nakryiko, Kyle Huey, linux-kernel, Jiri Olsa,
	Namhyung Kim, Yonghong Song, Robert O'Callahan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Mykola Lysenko, Shuah Khan, bpf,
	linux-kselftest

On Thu, Dec 7, 2023 at 2:56 PM Kyle Huey <me@kylehuey.com> wrote:
>
> On Thu, Dec 7, 2023 at 11:20 AM Marco Elver <elver@google.com> wrote:
> >
> > On Thu, 7 Dec 2023 at 20:12, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Thu, Dec 7, 2023 at 8:35 AM Kyle Huey <me@kylehuey.com> wrote:
> > > >
> > > > The test sets a hardware breakpoint and uses a bpf program to suppress the
> > > > side effects of a perf event sample, including I/O availability signals,
> > > > SIGTRAPs, and decrementing the event counter limit, if the ip matches the
> > > > expected value. Then the function with the breakpoint is executed multiple
> > > > times to test that all effects behave as expected.
> > > >
> > > > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> > > > ---
> > > >  .../selftests/bpf/prog_tests/perf_skip.c      | 145 ++++++++++++++++++
> > > >  .../selftests/bpf/progs/test_perf_skip.c      |  15 ++
> > > >  2 files changed, 160 insertions(+)
> > > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > > >  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > > > new file mode 100644
> > > > index 000000000000..f6fa9bfd9efa
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > > > @@ -0,0 +1,145 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +#define _GNU_SOURCE
> > > > +
> > > > +/* We need the latest siginfo from the kernel repo. */
> > > > +#include <asm/siginfo.h>
> > >
> > > selftests are built with UAPI headers' copies under tools/include, so
> > > CI did catch a real issue, I think. Try copying
> > > include/uapi/asm-generic/siginfo.h into
> > > tools/include/uapi/asm-generic/siginfo.h ?
> >
> > I believe parts of this were inspired by
> > tools/testing/selftests/perf_events/sigtrap_threads.c - getting the
> > kernel headers is allowed, as long as $(KHDR_INCLUDES) is added to
> > CFLAGS. See tools/testing/selftests/perf_events/Makefile. Not sure
> > it's appropriate for this test though, if you don't want to add
> > KHDR_INCLUDES for everything.
>
> Yes, that's right. Namhyung's commit message for 91c97b36bd69 leads me
> to believe that I should copy siginfo.h over into tools/include and
> fix the perf_events self tests too.
>
> - Kyle

That doesn't really help (though perhaps it should be done anyway so
the selftests aren't reaching into include/) because the glibc headers
still redefine a ton of stuff in asm-generic/siginfo.h.

- Kyle


* Re: [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses side effects.
  2023-12-08  1:08         ` Kyle Huey
@ 2023-12-08  3:46           ` Yonghong Song
  2023-12-08  8:06           ` Marco Elver
  1 sibling, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2023-12-08  3:46 UTC (permalink / raw)
  To: Kyle Huey, Marco Elver
  Cc: Andrii Nakryiko, Kyle Huey, linux-kernel, Jiri Olsa,
	Namhyung Kim, Robert O'Callahan, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
	Mykola Lysenko, Shuah Khan, bpf, linux-kselftest


On 12/7/23 5:08 PM, Kyle Huey wrote:
> On Thu, Dec 7, 2023 at 2:56 PM Kyle Huey <me@kylehuey.com> wrote:
>> On Thu, Dec 7, 2023 at 11:20 AM Marco Elver <elver@google.com> wrote:
>>> On Thu, 7 Dec 2023 at 20:12, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>> On Thu, Dec 7, 2023 at 8:35 AM Kyle Huey <me@kylehuey.com> wrote:
>>>>> The test sets a hardware breakpoint and uses a bpf program to suppress the
>>>>> side effects of a perf event sample, including I/O availability signals,
>>>>> SIGTRAPs, and decrementing the event counter limit, if the ip matches the
>>>>> expected value. Then the function with the breakpoint is executed multiple
>>>>> times to test that all effects behave as expected.
>>>>>
>>>>> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
>>>>> ---
>>>>>   .../selftests/bpf/prog_tests/perf_skip.c      | 145 ++++++++++++++++++
>>>>>   .../selftests/bpf/progs/test_perf_skip.c      |  15 ++
>>>>>   2 files changed, 160 insertions(+)
>>>>>   create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
>>>>>   create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
>>>>>
>>>>> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
>>>>> new file mode 100644
>>>>> index 000000000000..f6fa9bfd9efa
>>>>> --- /dev/null
>>>>> +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
>>>>> @@ -0,0 +1,145 @@
>>>>> +// SPDX-License-Identifier: GPL-2.0
>>>>> +#define _GNU_SOURCE
>>>>> +
>>>>> +/* We need the latest siginfo from the kernel repo. */
>>>>> +#include <asm/siginfo.h>
>>>> selftests are built with UAPI headers' copies under tools/include, so
>>>> CI did catch a real issue, I think. Try copying
>>>> include/uapi/asm-generic/siginfo.h into
>>>> tools/include/uapi/asm-generic/siginfo.h ?
>>> I believe parts of this were inspired by
>>> tools/testing/selftests/perf_events/sigtrap_threads.c - getting the
>>> kernel headers is allowed, as long as $(KHDR_INCLUDES) is added to
>>> CFLAGS. See tools/testing/selftests/perf_events/Makefile. Not sure
>>> it's appropriate for this test though, if you don't want to add
>>> KHDR_INCLUDES for everything.
>> Yes, that's right. Namhyung's commit message for 91c97b36bd69 leads me
>> to believe that I should copy siginfo.h over into tools/include and
>> fix the perf_events self tests too.
>>
>> - Kyle
> That doesn't really help (though perhaps it should be done anyway so
> the selftests aren't reaching into include/) because the glibc headers
> still redefine a ton of stuff in asm-generic/siginfo.h.

Just for testing purposes, I think you can avoid including asm/siginfo.h
and directly define the necessary structures in the C file, right?

>
> - Kyle


* Re: [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses side effects.
  2023-12-08  1:08         ` Kyle Huey
  2023-12-08  3:46           ` Yonghong Song
@ 2023-12-08  8:06           ` Marco Elver
  2023-12-08 17:09             ` Kyle Huey
  1 sibling, 1 reply; 16+ messages in thread
From: Marco Elver @ 2023-12-08  8:06 UTC (permalink / raw)
  To: Kyle Huey
  Cc: Andrii Nakryiko, Kyle Huey, linux-kernel, Jiri Olsa,
	Namhyung Kim, Yonghong Song, Robert O'Callahan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Mykola Lysenko, Shuah Khan, bpf,
	linux-kselftest

On Fri, 8 Dec 2023 at 02:08, Kyle Huey <me@kylehuey.com> wrote:
>
> On Thu, Dec 7, 2023 at 2:56 PM Kyle Huey <me@kylehuey.com> wrote:
> >
> > On Thu, Dec 7, 2023 at 11:20 AM Marco Elver <elver@google.com> wrote:
> > >
> > > On Thu, 7 Dec 2023 at 20:12, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Thu, Dec 7, 2023 at 8:35 AM Kyle Huey <me@kylehuey.com> wrote:
> > > > >
> > > > > The test sets a hardware breakpoint and uses a bpf program to suppress the
> > > > > side effects of a perf event sample, including I/O availability signals,
> > > > > SIGTRAPs, and decrementing the event counter limit, if the ip matches the
> > > > > expected value. Then the function with the breakpoint is executed multiple
> > > > > times to test that all effects behave as expected.
> > > > >
> > > > > Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> > > > > ---
> > > > >  .../selftests/bpf/prog_tests/perf_skip.c      | 145 ++++++++++++++++++
> > > > >  .../selftests/bpf/progs/test_perf_skip.c      |  15 ++
> > > > >  2 files changed, 160 insertions(+)
> > > > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > > > >  create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
> > > > >
> > > > > diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > > > > new file mode 100644
> > > > > index 000000000000..f6fa9bfd9efa
> > > > > --- /dev/null
> > > > > +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > > > > @@ -0,0 +1,145 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0
> > > > > +#define _GNU_SOURCE
> > > > > +
> > > > > +/* We need the latest siginfo from the kernel repo. */
> > > > > +#include <asm/siginfo.h>
> > > >
> > > > selftests are built with UAPI headers' copies under tools/include, so
> > > > CI did catch a real issue, I think. Try copying
> > > > include/uapi/asm-generic/siginfo.h into
> > > > tools/include/uapi/asm-generic/siginfo.h ?
> > >
> > > I believe parts of this were inspired by
> > > tools/testing/selftests/perf_events/sigtrap_threads.c - getting the
> > > kernel headers is allowed, as long as $(KHDR_INCLUDES) is added to
> > > CFLAGS. See tools/testing/selftests/perf_events/Makefile. Not sure
> > > it's appropriate for this test though, if you don't want to add
> > > KHDR_INCLUDES for everything.
> >
> > Yes, that's right. Namhyung's commit message for 91c97b36bd69 leads me
> > to believe that I should copy siginfo.h over into tools/include and
> > fix the perf_events self tests too.
> >
> > - Kyle
>
> That doesn't really help (though perhaps it should be done anyway so
> the selftests aren't reaching into include/) because the glibc headers
> still redefine a ton of stuff in asm-generic/siginfo.h.

From what I can see your test doesn't actually refer to any of the
si_perf_* fields, but only needs it for this:

> +       ASSERT_EQ(info->si_code, TRAP_PERF, "wrong si_code");

I think that's easy to fix by just defining TRAP_PERF yourself or
leaving out the assert completely. If you get an unexpected SIGTRAP
your asserts checking sigtrap_count elsewhere will fail, too. It'd
just be harder to debug.
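
For reference, a self-contained fallback along those lines might be (the
value mirrors the kernel's asm-generic/siginfo.h definition):

#include <signal.h>

#ifndef TRAP_PERF
/* si_code for a SIGTRAP delivered by a perf event with sigtrap=1 */
#define TRAP_PERF 6
#endif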


* Re: [PATCH v2 3/3] selftest/bpf: Test a perf bpf program that suppresses side effects.
  2023-12-08  8:06           ` Marco Elver
@ 2023-12-08 17:09             ` Kyle Huey
  0 siblings, 0 replies; 16+ messages in thread
From: Kyle Huey @ 2023-12-08 17:09 UTC (permalink / raw)
  To: Marco Elver
  Cc: Andrii Nakryiko, Kyle Huey, linux-kernel, Jiri Olsa,
	Namhyung Kim, Yonghong Song, Robert O'Callahan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, John Fastabend, KP Singh,
	Stanislav Fomichev, Hao Luo, Mykola Lysenko, Shuah Khan, bpf,
	linux-kselftest

On Fri, Dec 8, 2023 at 12:07 AM Marco Elver <elver@google.com> wrote:
> I think that's easy to fix by just defining TRAP_PERF yourself

Yeah that would work here.

- Kyle

