* [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse @ 2022-09-27 12:13 Marco Elver 2022-09-27 12:30 ` Marco Elver ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Marco Elver @ 2022-09-27 12:13 UTC (permalink / raw) To: elver, Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov Due to the implementation of how SIGTRAP are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues: 1. Missing SIGTRAP due to a race with event_sched_out() (more details below). 2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it. 3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH. The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows: CPU0 | CPU1 --------------------------------+--------------------------- __perf_event_overflow() | perf_event_disable_inatomic() | pending_disable = CPU0 | ... | _perf_event_enable() | event_function_call() | task_function_call() | /* sends IPI to CPU0 */ <IPI> | ... __perf_event_enable() +--------------------------- ctx_resched() task_ctx_sched_out() ctx_sched_out() group_sched_out() event_sched_out() pending_disable = -1 </IPI> <IRQ-work> perf_pending_event() perf_pending_event_disable() /* Fails to send SIGTRAP because no pending_disable! */ </IRQ-work> In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs because 'event_limit' is not reset back to 1. To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction of a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery. 
During testing, this also revealed several more possible races between reschedules and pending IRQ work; see code comments for details. Doing so makes it possible to use 'event_limit' normally (thereby enabling use of PERF_EVENT_IOC_REFRESH), perf_event_overflow() no longer returns 1 on SIGTRAP causing disabling of hardware PMUs, and finally the race is no longer possible due to event_sched_out() not consuming 'pending_disable'. Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Debugged-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> --- include/linux/perf_event.h | 2 + kernel/events/core.c | 85 ++++++++++++++++++++++++++++++++------ 2 files changed, 75 insertions(+), 12 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 907b0e3f1318..dff3430844a2 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -740,8 +740,10 @@ struct perf_event { int pending_wakeup; int pending_kill; int pending_disable; + int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending; + struct irq_work pending_resched; atomic_t event_limit; diff --git a/kernel/events/core.c b/kernel/events/core.c index 75f5705b6892..df90777262bf 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2527,6 +2527,14 @@ event_sched_in(struct perf_event *event, if (event->attr.exclusive) cpuctx->exclusive = 1; + if (event->pending_sigtrap) { + /* + * The task and event might have been moved to another CPU: + * queue another IRQ work. See perf_pending_event_sigtrap(). 
+ */ + WARN_ON_ONCE(!irq_work_queue(&event->pending_resched)); + } + out: perf_pmu_enable(event->pmu); @@ -4938,6 +4946,7 @@ static void perf_addr_filters_splice(struct perf_event *event, static void _free_event(struct perf_event *event) { irq_work_sync(&event->pending); + irq_work_sync(&event->pending_resched); unaccount_event(event); @@ -6446,6 +6455,37 @@ static void perf_sigtrap(struct perf_event *event) event->attr.type, event->attr.sig_data); } +static void perf_pending_event_sigtrap(struct perf_event *event) +{ + if (!event->pending_sigtrap) + return; + + /* + * If we're racing with disabling of the event, consume pending_sigtrap + * and don't send the SIGTRAP. This avoids potentially delaying a signal + * indefinitely (oncpu mismatch) until the event is enabled again, which + * could happen after already returning to user space; in that case the + * signal would erroneously become asynchronous. + */ + if (event->state == PERF_EVENT_STATE_OFF) { + event->pending_sigtrap = 0; + return; + } + + /* + * Only process this pending SIGTRAP if this IRQ work is running on the + * right CPU: the scheduler is able to run before the IRQ work, which + * moved the task to another CPU. In event_sched_in() another IRQ work + * is scheduled, so that the signal is not lost; given the kernel has + * not yet returned to user space, the signal remains synchronous. 
+ */ + if (READ_ONCE(event->oncpu) != smp_processor_id()) + return; + + event->pending_sigtrap = 0; + perf_sigtrap(event); +} + static void perf_pending_event_disable(struct perf_event *event) { int cpu = READ_ONCE(event->pending_disable); @@ -6455,13 +6495,6 @@ static void perf_pending_event_disable(struct perf_event *event) if (cpu == smp_processor_id()) { WRITE_ONCE(event->pending_disable, -1); - - if (event->attr.sigtrap) { - perf_sigtrap(event); - atomic_set_release(&event->event_limit, 1); /* rearm event */ - return; - } - perf_event_disable_local(event); return; } @@ -6500,6 +6533,7 @@ static void perf_pending_event(struct irq_work *entry) * and we won't recurse 'further'. */ + perf_pending_event_sigtrap(event); perf_pending_event_disable(event); if (event->pending_wakeup) { @@ -6511,6 +6545,26 @@ static void perf_pending_event(struct irq_work *entry) perf_swevent_put_recursion_context(rctx); } +/* + * If handling of a pending action must occur before returning to user space, + * and it is possible to reschedule an event (to another CPU) with pending + * actions, where the moved-from CPU may not yet have run event->pending (and + * irq_work_queue() would fail on reuse), we'll use a separate IRQ work that + * runs perf_pending_event_resched(). 
+ */ +static void perf_pending_event_resched(struct irq_work *entry) +{ + struct perf_event *event = container_of(entry, struct perf_event, pending_resched); + int rctx; + + rctx = perf_swevent_get_recursion_context(); + + perf_pending_event_sigtrap(event); + + if (rctx >= 0) + perf_swevent_put_recursion_context(rctx); +} + #ifdef CONFIG_GUEST_PERF_EVENTS struct perf_guest_info_callbacks __rcu *perf_guest_cbs; @@ -9209,11 +9263,20 @@ static int __perf_event_overflow(struct perf_event *event, if (events && atomic_dec_and_test(&event->event_limit)) { ret = 1; event->pending_kill = POLL_HUP; - event->pending_addr = data->addr; - perf_event_disable_inatomic(event); } + if (event->attr.sigtrap) { + /* + * Should not be able to return to user space without processing + * pending_sigtrap (kernel events can overflow multiple times). + */ + WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel); + event->pending_sigtrap = 1; + event->pending_addr = data->addr; + irq_work_queue(&event->pending); + } + READ_ONCE(event->overflow_handler)(event, data, regs); if (*perf_event_fasync(event) && event->pending_kill) { @@ -11536,6 +11599,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, init_waitqueue_head(&event->waitq); event->pending_disable = -1; init_irq_work(&event->pending, perf_pending_event); + init_irq_work(&event->pending_resched, perf_pending_event_resched); mutex_init(&event->mmap_mutex); raw_spin_lock_init(&event->addr_filters.lock); @@ -11557,9 +11621,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, if (parent_event) event->event_caps = parent_event->event_caps; - if (event->attr.sigtrap) - atomic_set(&event->event_limit, 1); - if (task) { event->attach_state = PERF_ATTACH_TASK; /* -- 2.37.3.998.g577e59143f-goog ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-09-27 12:13 [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse Marco Elver @ 2022-09-27 12:30 ` Marco Elver 2022-09-27 18:20 ` Peter Zijlstra 2022-10-06 13:33 ` [PATCH] perf: Fix missing SIGTRAPs Peter Zijlstra 2 siblings, 0 replies; 27+ messages in thread From: Marco Elver @ 2022-09-27 12:30 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Tue, Sep 27, 2022 at 02:13PM +0200, Marco Elver wrote: > Due to the implementation of how SIGTRAP are delivered if > perf_event_attr::sigtrap is set, we've noticed 3 issues: > > 1. Missing SIGTRAP due to a race with event_sched_out() (more > details below). > > 2. Hardware PMU events being disabled due to returning 1 from > perf_event_overflow(). The only way to re-enable the event is > for user space to first "properly" disable the event and then > re-enable it. > > 3. The inability to automatically disable an event after a > specified number of overflows via PERF_EVENT_IOC_REFRESH. > > The worst of the 3 issues is problem (1), which occurs when a > pending_disable is "consumed" by a racing event_sched_out(), observed as > follows: > > CPU0 | CPU1 > --------------------------------+--------------------------- > __perf_event_overflow() | > perf_event_disable_inatomic() | > pending_disable = CPU0 | ... > | _perf_event_enable() > | event_function_call() > | task_function_call() > | /* sends IPI to CPU0 */ > <IPI> | ... > __perf_event_enable() +--------------------------- > ctx_resched() > task_ctx_sched_out() > ctx_sched_out() > group_sched_out() > event_sched_out() > pending_disable = -1 > </IPI> > <IRQ-work> > perf_pending_event() > perf_pending_event_disable() > /* Fails to send SIGTRAP because no pending_disable! 
*/ > </IRQ-work> > > In the above case, not only is that particular SIGTRAP missed, but also > all future SIGTRAPs because 'event_limit' is not reset back to 1. > > To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction > of a separate 'pending_sigtrap', no longer using 'event_limit' and > 'pending_disable' for its delivery. > > During testing, this also revealed several more possible races between > reschedules and pending IRQ work; see code comments for details. > > Doing so makes it possible to use 'event_limit' normally (thereby > enabling use of PERF_EVENT_IOC_REFRESH), perf_event_overflow() no longer > returns 1 on SIGTRAP causing disabling of hardware PMUs, and finally the > race is no longer possible due to event_sched_out() not consuming > 'pending_disable'. > > Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Debugged-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Marco Elver <elver@google.com> > --- > include/linux/perf_event.h | 2 + > kernel/events/core.c | 85 ++++++++++++++++++++++++++++++++------ > 2 files changed, 75 insertions(+), 12 deletions(-) > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index 907b0e3f1318..dff3430844a2 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -740,8 +740,10 @@ struct perf_event { > int pending_wakeup; > int pending_kill; > int pending_disable; > + int pending_sigtrap; > unsigned long pending_addr; /* SIGTRAP */ > struct irq_work pending; > + struct irq_work pending_resched; > > atomic_t event_limit; > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 75f5705b6892..df90777262bf 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -2527,6 +2527,14 @@ event_sched_in(struct perf_event *event, > if (event->attr.exclusive) > cpuctx->exclusive = 1; > > + if (event->pending_sigtrap) { > + /* > + * The task and event might have been moved 
to another CPU: > + * queue another IRQ work. See perf_pending_event_sigtrap(). > + */ > + WARN_ON_ONCE(!irq_work_queue(&event->pending_resched)); One question we had is if it's possible for an event to be scheduled in, immediately scheduled out, and then scheduled in on a 3rd CPU. I.e. we'd still be in trouble if we can do this: CPU0 sched-out CPU1 sched-in sched-out CPU2 sched-in without any IRQ work ever running. Some naive solutions so the pending_resched IRQ work isn't needed, like trying to send a signal right here (or in event_sched_out()), don't work because we've seen syzkaller produce programs where there's a pending event and then the scheduler moves the task; because we're in the scheduler we can deadlock if we try to send the signal here. Thanks, -- Marco ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-09-27 12:13 [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse Marco Elver 2022-09-27 12:30 ` Marco Elver @ 2022-09-27 18:20 ` Peter Zijlstra 2022-09-27 21:45 ` Marco Elver 2022-10-06 13:33 ` [PATCH] perf: Fix missing SIGTRAPs Peter Zijlstra 2 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2022-09-27 18:20 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Tue, Sep 27, 2022 at 02:13:22PM +0200, Marco Elver wrote: > Due to the implementation of how SIGTRAP are delivered if > perf_event_attr::sigtrap is set, we've noticed 3 issues: > > 1. Missing SIGTRAP due to a race with event_sched_out() (more > details below). > > 2. Hardware PMU events being disabled due to returning 1 from > perf_event_overflow(). The only way to re-enable the event is > for user space to first "properly" disable the event and then > re-enable it. > > 3. The inability to automatically disable an event after a > specified number of overflows via PERF_EVENT_IOC_REFRESH. > > The worst of the 3 issues is problem (1), which occurs when a > pending_disable is "consumed" by a racing event_sched_out(), observed as > follows: > > CPU0 | CPU1 > --------------------------------+--------------------------- > __perf_event_overflow() | > perf_event_disable_inatomic() | > pending_disable = CPU0 | ... > | _perf_event_enable() > | event_function_call() > | task_function_call() > | /* sends IPI to CPU0 */ > <IPI> | ... > __perf_event_enable() +--------------------------- > ctx_resched() > task_ctx_sched_out() > ctx_sched_out() > group_sched_out() > event_sched_out() > pending_disable = -1 > </IPI> > <IRQ-work> > perf_pending_event() > perf_pending_event_disable() > /* Fails to send SIGTRAP because no pending_disable! 
*/ > </IRQ-work> > > In the above case, not only is that particular SIGTRAP missed, but also > all future SIGTRAPs because 'event_limit' is not reset back to 1. > > To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction > of a separate 'pending_sigtrap', no longer using 'event_limit' and > 'pending_disable' for its delivery. > > During testing, this also revealed several more possible races between > reschedules and pending IRQ work; see code comments for details. Perhaps use task_work_add() for this case? That runs on the return-to-user path, so then it doesn't matter how many reschedules happen in between. The only concern is that task_work_add() uses kasan_record_aux_stack() which obviously isn't NMI clean, so that would need to get removed or made conditional. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-09-27 18:20 ` Peter Zijlstra @ 2022-09-27 21:45 ` Marco Elver 2022-09-28 10:06 ` Marco Elver 0 siblings, 1 reply; 27+ messages in thread From: Marco Elver @ 2022-09-27 21:45 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Tue, Sep 27, 2022 at 08:20PM +0200, Peter Zijlstra wrote: > On Tue, Sep 27, 2022 at 02:13:22PM +0200, Marco Elver wrote: > > Due to the implementation of how SIGTRAP are delivered if > > perf_event_attr::sigtrap is set, we've noticed 3 issues: > > > > 1. Missing SIGTRAP due to a race with event_sched_out() (more > > details below). > > > > 2. Hardware PMU events being disabled due to returning 1 from > > perf_event_overflow(). The only way to re-enable the event is > > for user space to first "properly" disable the event and then > > re-enable it. > > > > 3. The inability to automatically disable an event after a > > specified number of overflows via PERF_EVENT_IOC_REFRESH. > > > > The worst of the 3 issues is problem (1), which occurs when a > > pending_disable is "consumed" by a racing event_sched_out(), observed as > > follows: > > > > CPU0 | CPU1 > > --------------------------------+--------------------------- > > __perf_event_overflow() | > > perf_event_disable_inatomic() | > > pending_disable = CPU0 | ... > > | _perf_event_enable() > > | event_function_call() > > | task_function_call() > > | /* sends IPI to CPU0 */ > > <IPI> | ... > > __perf_event_enable() +--------------------------- > > ctx_resched() > > task_ctx_sched_out() > > ctx_sched_out() > > group_sched_out() > > event_sched_out() > > pending_disable = -1 > > </IPI> > > <IRQ-work> > > perf_pending_event() > > perf_pending_event_disable() > > /* Fails to send SIGTRAP because no pending_disable! 
*/ > > </IRQ-work> > > > > In the above case, not only is that particular SIGTRAP missed, but also > > all future SIGTRAPs because 'event_limit' is not reset back to 1. > > > > To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction > > of a separate 'pending_sigtrap', no longer using 'event_limit' and > > 'pending_disable' for its delivery. > > > > During testing, this also revealed several more possible races between > > reschedules and pending IRQ work; see code comments for details. > > Perhaps use task_work_add() for this case? That runs on the > return-to-user path, so then it doesn't matter how many reschedules > happen in between. Hmm, I tried the below (on top of this patch), but then all the tests fail (including tools/testing/selftests/perf_events/sigtrap_threads.c) because of lots of missing SIGTRAP. (The missing SIGTRAP happen with or without the kernel/entry/ change.) So something is wrong with task_work, and the irq_work solution thus far is more robust (ran many hours of tests and fuzzing without failure). Thoughts? 
------ >8 ------ diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index dff3430844a2..928fb9e2b655 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -743,7 +743,7 @@ struct perf_event { int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending; - struct irq_work pending_resched; + struct callback_head pending_twork; atomic_t event_limit; diff --git a/kernel/entry/common.c b/kernel/entry/common.c index 063068a9ea9b..7cacaefc97fe 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -162,12 +162,12 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs, if (ti_work & _TIF_PATCH_PENDING) klp_update_patch_state(current); - if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) - arch_do_signal_or_restart(regs); - if (ti_work & _TIF_NOTIFY_RESUME) resume_user_mode_work(regs); + if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) + arch_do_signal_or_restart(regs); + /* Architecture specific TIF work */ arch_exit_to_user_mode_work(regs, ti_work); diff --git a/kernel/events/core.c b/kernel/events/core.c index 007a87c1599c..7f93dd91d572 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -17,6 +17,7 @@ #include <linux/poll.h> #include <linux/slab.h> #include <linux/hash.h> +#include <linux/task_work.h> #include <linux/tick.h> #include <linux/sysfs.h> #include <linux/dcache.h> @@ -2527,14 +2528,6 @@ event_sched_in(struct perf_event *event, if (event->attr.exclusive) cpuctx->exclusive = 1; - if (event->pending_sigtrap) { - /* - * The task and event might have been moved to another CPU: - * queue another IRQ work. See perf_pending_event_sigtrap(). 
- */ - WARN_ON_ONCE(!irq_work_queue(&event->pending_resched)); - } - out: perf_pmu_enable(event->pmu); @@ -4942,11 +4935,13 @@ static bool exclusive_event_installable(struct perf_event *event, static void perf_addr_filters_splice(struct perf_event *event, struct list_head *head); +static void perf_pending_event_task_work(struct callback_head *work); static void _free_event(struct perf_event *event) { irq_work_sync(&event->pending); - irq_work_sync(&event->pending_resched); + if (event->hw.target) + task_work_cancel(event->hw.target, perf_pending_event_task_work); unaccount_event(event); @@ -6438,15 +6433,7 @@ void perf_event_wakeup(struct perf_event *event) static void perf_sigtrap(struct perf_event *event) { /* - * We'd expect this to only occur if the irq_work is delayed and either - * ctx->task or current has changed in the meantime. This can be the - * case on architectures that do not implement arch_irq_work_raise(). - */ - if (WARN_ON_ONCE(event->ctx->task != current)) - return; - - /* - * perf_pending_event() can race with the task exiting. + * Can be called while the task is exiting. */ if (current->flags & PF_EXITING) return; @@ -6455,35 +6442,22 @@ static void perf_sigtrap(struct perf_event *event) event->attr.type, event->attr.sig_data); } -static void perf_pending_event_sigtrap(struct perf_event *event) +static void perf_pending_event_task_work(struct callback_head *work) { - if (!event->pending_sigtrap) - return; + struct perf_event *event = container_of(work, struct perf_event, pending_twork); + int rctx; - /* - * If we're racing with disabling of the event, consume pending_sigtrap - * and don't send the SIGTRAP. This avoids potentially delaying a signal - * indefinitely (oncpu mismatch) until the event is enabled again, which - * could happen after already returning to user space; in that case the - * signal would erroneously become asynchronous. 
- */ - if (event->state == PERF_EVENT_STATE_OFF) { + preempt_disable_notrace(); + rctx = perf_swevent_get_recursion_context(); + + if (event->pending_sigtrap) { event->pending_sigtrap = 0; - return; + perf_sigtrap(event); } - /* - * Only process this pending SIGTRAP if this IRQ work is running on the - * right CPU: the scheduler is able to run before the IRQ work, which - * moved the task to another CPU. In event_sched_in() another IRQ work - * is scheduled, so that the signal is not lost; given the kernel has - * not yet returned to user space, the signal remains synchronous. - */ - if (READ_ONCE(event->oncpu) != smp_processor_id()) - return; - - event->pending_sigtrap = 0; - perf_sigtrap(event); + if (rctx >= 0) + perf_swevent_put_recursion_context(rctx); + preempt_enable_notrace(); } static void perf_pending_event_disable(struct perf_event *event) @@ -6533,7 +6507,6 @@ static void perf_pending_event(struct irq_work *entry) * and we won't recurse 'further'. */ - perf_pending_event_sigtrap(event); perf_pending_event_disable(event); if (event->pending_wakeup) { @@ -6545,26 +6518,6 @@ static void perf_pending_event(struct irq_work *entry) perf_swevent_put_recursion_context(rctx); } -/* - * If handling of a pending action must occur before returning to user space, - * and it is possible to reschedule an event (to another CPU) with pending - * actions, where the moved-from CPU may not yet have run event->pending (and - * irq_work_queue() would fail on reuse), we'll use a separate IRQ work that - * runs perf_pending_event_resched(). 
- */ -static void perf_pending_event_resched(struct irq_work *entry) -{ - struct perf_event *event = container_of(entry, struct perf_event, pending_resched); - int rctx; - - rctx = perf_swevent_get_recursion_context(); - - perf_pending_event_sigtrap(event); - - if (rctx >= 0) - perf_swevent_put_recursion_context(rctx); -} - #ifdef CONFIG_GUEST_PERF_EVENTS struct perf_guest_info_callbacks __rcu *perf_guest_cbs; @@ -9274,7 +9227,7 @@ static int __perf_event_overflow(struct perf_event *event, WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel); event->pending_sigtrap = 1; event->pending_addr = data->addr; - irq_work_queue(&event->pending); + task_work_add(current, &event->pending_twork, TWA_RESUME); } READ_ONCE(event->overflow_handler)(event, data, regs); @@ -11599,7 +11552,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, init_waitqueue_head(&event->waitq); event->pending_disable = -1; init_irq_work(&event->pending, perf_pending_event); - init_irq_work(&event->pending_resched, perf_pending_event_resched); + init_task_work(&event->pending_twork, perf_pending_event_task_work); mutex_init(&event->mmap_mutex); raw_spin_lock_init(&event->addr_filters.lock); ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-09-27 21:45 ` Marco Elver @ 2022-09-28 10:06 ` Marco Elver 2022-09-28 14:55 ` Marco Elver 0 siblings, 1 reply; 27+ messages in thread From: Marco Elver @ 2022-09-28 10:06 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Tue, Sep 27, 2022 at 11:45PM +0200, Marco Elver wrote: > On Tue, Sep 27, 2022 at 08:20PM +0200, Peter Zijlstra wrote: > > On Tue, Sep 27, 2022 at 02:13:22PM +0200, Marco Elver wrote: > > > Due to the implementation of how SIGTRAP are delivered if > > > perf_event_attr::sigtrap is set, we've noticed 3 issues: > > > > > > 1. Missing SIGTRAP due to a race with event_sched_out() (more > > > details below). > > > > > > 2. Hardware PMU events being disabled due to returning 1 from > > > perf_event_overflow(). The only way to re-enable the event is > > > for user space to first "properly" disable the event and then > > > re-enable it. > > > > > > 3. The inability to automatically disable an event after a > > > specified number of overflows via PERF_EVENT_IOC_REFRESH. > > > > > > The worst of the 3 issues is problem (1), which occurs when a > > > pending_disable is "consumed" by a racing event_sched_out(), observed as > > > follows: > > > > > > CPU0 | CPU1 > > > --------------------------------+--------------------------- > > > __perf_event_overflow() | > > > perf_event_disable_inatomic() | > > > pending_disable = CPU0 | ... > > > | _perf_event_enable() > > > | event_function_call() > > > | task_function_call() > > > | /* sends IPI to CPU0 */ > > > <IPI> | ... 
> > > __perf_event_enable() +--------------------------- > > > ctx_resched() > > > task_ctx_sched_out() > > > ctx_sched_out() > > > group_sched_out() > > > event_sched_out() > > > pending_disable = -1 > > > </IPI> > > > <IRQ-work> > > > perf_pending_event() > > > perf_pending_event_disable() > > > /* Fails to send SIGTRAP because no pending_disable! */ > > > </IRQ-work> > > > > > > In the above case, not only is that particular SIGTRAP missed, but also > > > all future SIGTRAPs because 'event_limit' is not reset back to 1. > > > > > > To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction > > > of a separate 'pending_sigtrap', no longer using 'event_limit' and > > > 'pending_disable' for its delivery. > > > > > > During testing, this also revealed several more possible races between > > > reschedules and pending IRQ work; see code comments for details. > > > > Perhaps use task_work_add() for this case? That runs on the > > return-to-user path, so then it doesn't matter how many reschedules > > happen in between. > > Hmm, I tried the below (on top of this patch), but then all the tests > fail (including tools/testing/selftests/perf_events/sigtrap_threads.c) > because of lots of missing SIGTRAP. (The missing SIGTRAP happen with or > without the kernel/entry/ change.) > > So something is wrong with task_work, and the irq_work solution thus far > is more robust (ran many hours of tests and fuzzing without failure). My second idea about introducing something like irq_work_raw_sync(). Maybe it's not that crazy if it is actually safe. I expect this case where we need the irq_work_raw_sync() to be very very rare. 
------ >8 ------ diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h index 8cd11a223260..490adecbb4be 100644 --- a/include/linux/irq_work.h +++ b/include/linux/irq_work.h @@ -59,6 +59,7 @@ bool irq_work_queue_on(struct irq_work *work, int cpu); void irq_work_tick(void); void irq_work_sync(struct irq_work *work); +bool irq_work_raw_sync(struct irq_work *work); #ifdef CONFIG_IRQ_WORK #include <asm/irq_work.h> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index dff3430844a2..c119fa7b70d6 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -743,7 +743,6 @@ struct perf_event { int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending; - struct irq_work pending_resched; atomic_t event_limit; diff --git a/kernel/events/core.c b/kernel/events/core.c index 007a87c1599c..6ba02a1b5c5d 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2532,7 +2532,8 @@ event_sched_in(struct perf_event *event, * The task and event might have been moved to another CPU: * queue another IRQ work. See perf_pending_event_sigtrap(). */ - WARN_ON_ONCE(!irq_work_queue(&event->pending_resched)); + irq_work_raw_sync(&event->pending); /* Syncs if pending on other CPU. 
*/ + irq_work_queue(&event->pending); } out: @@ -4946,7 +4947,6 @@ static void perf_addr_filters_splice(struct perf_event *event, static void _free_event(struct perf_event *event) { irq_work_sync(&event->pending); - irq_work_sync(&event->pending_resched); unaccount_event(event); @@ -6545,26 +6545,6 @@ static void perf_pending_event(struct irq_work *entry) perf_swevent_put_recursion_context(rctx); } -/* - * If handling of a pending action must occur before returning to user space, - * and it is possible to reschedule an event (to another CPU) with pending - * actions, where the moved-from CPU may not yet have run event->pending (and - * irq_work_queue() would fail on reuse), we'll use a separate IRQ work that - * runs perf_pending_event_resched(). - */ -static void perf_pending_event_resched(struct irq_work *entry) -{ - struct perf_event *event = container_of(entry, struct perf_event, pending_resched); - int rctx; - - rctx = perf_swevent_get_recursion_context(); - - perf_pending_event_sigtrap(event); - - if (rctx >= 0) - perf_swevent_put_recursion_context(rctx); -} - #ifdef CONFIG_GUEST_PERF_EVENTS struct perf_guest_info_callbacks __rcu *perf_guest_cbs; @@ -11599,7 +11579,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, init_waitqueue_head(&event->waitq); event->pending_disable = -1; init_irq_work(&event->pending, perf_pending_event); - init_irq_work(&event->pending_resched, perf_pending_event_resched); mutex_init(&event->mmap_mutex); raw_spin_lock_init(&event->addr_filters.lock); diff --git a/kernel/irq_work.c b/kernel/irq_work.c index 7afa40fe5cc4..2d21be0c0f3e 100644 --- a/kernel/irq_work.c +++ b/kernel/irq_work.c @@ -290,6 +290,40 @@ void irq_work_sync(struct irq_work *work) } EXPORT_SYMBOL_GPL(irq_work_sync); +/* + * Synchronize against the irq_work @work, ensuring the entry is not currently + * in use after returning true. If it returns false, it was not possible to + * synchronize against the irq_work. 
Requires that interrupts are already + * disabled (prefer irq_work_sync() in all other cases). + */ +bool irq_work_raw_sync(struct irq_work *work) +{ + struct irq_work *entry; + struct llist_head *list; + + lockdep_assert_irqs_disabled(); + + if (!irq_work_is_busy(work)) + return true; + + list = this_cpu_ptr(&raised_list); + llist_for_each_entry(entry, list->first, node.llist) { + if (entry == work) + return false; + } + list = this_cpu_ptr(&lazy_list); + llist_for_each_entry(entry, list->first, node.llist) { + if (entry == work) + return false; + } + + while (irq_work_is_busy(work)) + cpu_relax(); + + return true; +} +EXPORT_SYMBOL_GPL(irq_work_raw_sync); + static void run_irq_workd(unsigned int cpu) { irq_work_run_list(this_cpu_ptr(&lazy_list)); ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-09-28 10:06 ` Marco Elver @ 2022-09-28 14:55 ` Marco Elver 2022-10-04 17:09 ` Peter Zijlstra 0 siblings, 1 reply; 27+ messages in thread From: Marco Elver @ 2022-09-28 14:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov [-- Attachment #1: Type: text/plain, Size: 620 bytes --] On Wed, Sep 28, 2022 at 12:06PM +0200, Marco Elver wrote: > My second idea about introducing something like irq_work_raw_sync(). > Maybe it's not that crazy if it is actually safe. I expect this case > where we need the irq_work_raw_sync() to be very very rare. The previous irq_work_raw_sync() forgot about irq_work_queue_on(). Alas, I might still be missing something obvious, because "it's never that easy". ;-) And for completeness, the full perf patch of what it would look like together with irq_work_raw_sync() (consider it v1.5). It's already survived some shorter stress tests and fuzzing. Thanks, -- Marco [-- Attachment #2: 0001-irq_work-Introduce-irq_work_raw_sync.patch --] [-- Type: text/x-diff, Size: 2750 bytes --] From 5fcc38d87b2cd8c05c5306c0140ccc076c5bf963 Mon Sep 17 00:00:00 2001 From: Marco Elver <elver@google.com> Date: Wed, 28 Sep 2022 16:33:27 +0200 Subject: [PATCH 1/2] irq_work: Introduce irq_work_raw_sync() Introduce a non-sleeping spinning variant of irq_work_sync(), called irq_work_raw_sync(). Its usage is limited to contexts where interrupts are disabled, and unlike irq_work_sync(), may fail if the work is pending in the current CPU. Signed-off-by: Marco Elver <elver@google.com> --- v2: * New patch. 
--- include/linux/irq_work.h | 1 + kernel/irq_work.c | 41 ++++++++++++++++++++++++++++++++++++++++ 2 files changed, 42 insertions(+) diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h index 8cd11a223260..490adecbb4be 100644 --- a/include/linux/irq_work.h +++ b/include/linux/irq_work.h @@ -59,6 +59,7 @@ bool irq_work_queue_on(struct irq_work *work, int cpu); void irq_work_tick(void); void irq_work_sync(struct irq_work *work); +bool irq_work_raw_sync(struct irq_work *work); #ifdef CONFIG_IRQ_WORK #include <asm/irq_work.h> diff --git a/kernel/irq_work.c b/kernel/irq_work.c index 7afa40fe5cc4..b251b3437db1 100644 --- a/kernel/irq_work.c +++ b/kernel/irq_work.c @@ -290,6 +290,47 @@ void irq_work_sync(struct irq_work *work) } EXPORT_SYMBOL_GPL(irq_work_sync); +/* + * Synchronize against the irq_work @work, ensuring the entry is not currently + * in use after returning true; returns false if it's impossible to synchronize + * due to being queued on the current CPU. Requires that interrupts are already + * disabled (prefer irq_work_sync() in all other cases). + */ +bool irq_work_raw_sync(struct irq_work *work) +{ + struct llist_node *head; + struct irq_work *entry; + + /* + * Interrupts should be disabled, so that we can be sure the current + * CPU's work queues aren't concurrently run, cleared, and potentially + * some of its entries becoming invalid in the below iterations. + */ + lockdep_assert_irqs_disabled(); + + while (irq_work_is_busy(work)) { + /* + * It is only safe to wait if the work is not on this CPU's work + * queues. Also beware of concurrent irq_work_queue_on(), so we + * need to keep re-checking this CPU's queues in this busy loop. 
+ */ + head = READ_ONCE(this_cpu_ptr(&raised_list)->first); + llist_for_each_entry(entry, head, node.llist) { + if (entry == work) + return false; + } + head = READ_ONCE(this_cpu_ptr(&lazy_list)->first); + llist_for_each_entry(entry, head, node.llist) { + if (entry == work) + return false; + } + cpu_relax(); + } + + return true; +} +EXPORT_SYMBOL_GPL(irq_work_raw_sync); + static void run_irq_workd(unsigned int cpu) { irq_work_run_list(this_cpu_ptr(&lazy_list)); -- 2.37.3.998.g577e59143f-goog [-- Attachment #3: 0002-perf-Fix-missing-SIGTRAPs-due-to-pending_disable-abu.patch --] [-- Type: text/x-diff, Size: 6677 bytes --] From 4467a6520c8f59065220a651167c4cd8da7a6e9b Mon Sep 17 00:00:00 2001 From: Marco Elver <elver@google.com> Date: Fri, 23 Sep 2022 16:43:19 +0200 Subject: [PATCH 2/2] perf: Fix missing SIGTRAPs due to pending_disable abuse Due to the implementation of how SIGTRAP are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues: 1. Missing SIGTRAP due to a race with event_sched_out() (more details below). 2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it. 3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH. The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows: CPU0 | CPU1 --------------------------------+--------------------------- __perf_event_overflow() | perf_event_disable_inatomic() | pending_disable = CPU0 | ... | _perf_event_enable() | event_function_call() | task_function_call() | /* sends IPI to CPU0 */ <IPI> | ... 
__perf_event_enable() +--------------------------- ctx_resched() task_ctx_sched_out() ctx_sched_out() group_sched_out() event_sched_out() pending_disable = -1 </IPI> <IRQ-work> perf_pending_event() perf_pending_event_disable() /* Fails to send SIGTRAP because no pending_disable! */ </IRQ-work> In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs because 'event_limit' is not reset back to 1. To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction of a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery. During testing, this also revealed several more possible races between reschedules and pending IRQ work; see code comments for details. Doing so makes it possible to use 'event_limit' normally (thereby enabling use of PERF_EVENT_IOC_REFRESH), perf_event_overflow() no longer returns 1 on SIGTRAP causing disabling of hardware PMUs, and finally the race is no longer possible due to event_sched_out() not consuming 'pending_disable'. Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Debugged-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> --- v2: * Use irq_work_raw_sync(). 
--- include/linux/perf_event.h | 1 + kernel/events/core.c | 64 +++++++++++++++++++++++++++++++------- 2 files changed, 53 insertions(+), 12 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 907b0e3f1318..c119fa7b70d6 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -740,6 +740,7 @@ struct perf_event { int pending_wakeup; int pending_kill; int pending_disable; + int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending; diff --git a/kernel/events/core.c b/kernel/events/core.c index c37ba0909078..6ba02a1b5c5d 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2527,6 +2527,15 @@ event_sched_in(struct perf_event *event, if (event->attr.exclusive) cpuctx->exclusive = 1; + if (event->pending_sigtrap) { + /* + * The task and event might have been moved to another CPU: + * queue another IRQ work. See perf_pending_event_sigtrap(). + */ + irq_work_raw_sync(&event->pending); /* Syncs if pending on other CPU. */ + irq_work_queue(&event->pending); + } + out: perf_pmu_enable(event->pmu); @@ -6446,6 +6455,37 @@ static void perf_sigtrap(struct perf_event *event) event->attr.type, event->attr.sig_data); } +static void perf_pending_event_sigtrap(struct perf_event *event) +{ + if (!event->pending_sigtrap) + return; + + /* + * If we're racing with disabling of the event, consume pending_sigtrap + * and don't send the SIGTRAP. This avoids potentially delaying a signal + * indefinitely (oncpu mismatch) until the event is enabled again, which + * could happen after already returning to user space; in that case the + * signal would erroneously become asynchronous. + */ + if (event->state == PERF_EVENT_STATE_OFF) { + event->pending_sigtrap = 0; + return; + } + + /* + * Only process this pending SIGTRAP if this IRQ work is running on the + * right CPU: the scheduler is able to run before the IRQ work, which + * moved the task to another CPU. 
In event_sched_in() another IRQ work + * is scheduled, so that the signal is not lost; given the kernel has + * not yet returned to user space, the signal remains synchronous. + */ + if (READ_ONCE(event->oncpu) != smp_processor_id()) + return; + + event->pending_sigtrap = 0; + perf_sigtrap(event); +} + static void perf_pending_event_disable(struct perf_event *event) { int cpu = READ_ONCE(event->pending_disable); @@ -6455,13 +6495,6 @@ static void perf_pending_event_disable(struct perf_event *event) if (cpu == smp_processor_id()) { WRITE_ONCE(event->pending_disable, -1); - - if (event->attr.sigtrap) { - perf_sigtrap(event); - atomic_set_release(&event->event_limit, 1); /* rearm event */ - return; - } - perf_event_disable_local(event); return; } @@ -6500,6 +6533,7 @@ static void perf_pending_event(struct irq_work *entry) * and we won't recurse 'further'. */ + perf_pending_event_sigtrap(event); perf_pending_event_disable(event); if (event->pending_wakeup) { @@ -9209,11 +9243,20 @@ static int __perf_event_overflow(struct perf_event *event, if (events && atomic_dec_and_test(&event->event_limit)) { ret = 1; event->pending_kill = POLL_HUP; - event->pending_addr = data->addr; - perf_event_disable_inatomic(event); } + if (event->attr.sigtrap) { + /* + * Should not be able to return to user space without processing + * pending_sigtrap (kernel events can overflow multiple times). 
+ */ + WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel); + event->pending_sigtrap = 1; + event->pending_addr = data->addr; + irq_work_queue(&event->pending); + } + READ_ONCE(event->overflow_handler)(event, data, regs); if (*perf_event_fasync(event) && event->pending_kill) { @@ -11557,9 +11600,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, if (parent_event) event->event_caps = parent_event->event_caps; - if (event->attr.sigtrap) - atomic_set(&event->event_limit, 1); - if (task) { event->attach_state = PERF_ATTACH_TASK; /* -- 2.37.3.998.g577e59143f-goog ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-09-28 14:55 ` Marco Elver @ 2022-10-04 17:09 ` Peter Zijlstra 2022-10-04 17:21 ` Peter Zijlstra 2022-10-04 17:33 ` Marco Elver 0 siblings, 2 replies; 27+ messages in thread From: Peter Zijlstra @ 2022-10-04 17:09 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Wed, Sep 28, 2022 at 04:55:46PM +0200, Marco Elver wrote: > On Wed, Sep 28, 2022 at 12:06PM +0200, Marco Elver wrote: > > > My second idea about introducing something like irq_work_raw_sync(). > > Maybe it's not that crazy if it is actually safe. I expect this case > > where we need the irq_work_raw_sync() to be very very rare. > > The previous irq_work_raw_sync() forgot about irq_work_queue_on(). Alas, > I might still be missing something obvious, because "it's never that > easy". ;-) > > And for completeness, the full perf patch of what it would look like > together with irq_work_raw_sync() (consider it v1.5). It's already > survived some shorter stress tests and fuzzing. So.... I don't like it. But I cooked up the below, which _almost_ works :-/ For some raisin it sometimes fails with 14999 out of 15000 events delivered and I've not yet figured out where it goes sideways. I'm currently thinking it's that sigtrap clear on OFF. Still, what do you think of the approach? 
--- include/linux/perf_event.h | 8 ++-- kernel/events/core.c | 92 +++++++++++++++++++++++++--------------------- 2 files changed, 55 insertions(+), 45 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index ee8b9ecdc03b..c54161719d37 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -736,9 +736,11 @@ struct perf_event { struct fasync_struct *fasync; /* delayed work for NMIs and such */ - int pending_wakeup; - int pending_kill; - int pending_disable; + unsigned int pending_wakeup :1; + unsigned int pending_disable :1; + unsigned int pending_sigtrap :1; + unsigned int pending_kill :3; + unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending; diff --git a/kernel/events/core.c b/kernel/events/core.c index 2621fd24ad26..8e5dbe971d9e 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2268,11 +2268,15 @@ event_sched_out(struct perf_event *event, event->pmu->del(event, 0); event->oncpu = -1; - if (READ_ONCE(event->pending_disable) >= 0) { - WRITE_ONCE(event->pending_disable, -1); + if (event->pending_disable) { + event->pending_disable = 0; perf_cgroup_event_disable(event, ctx); state = PERF_EVENT_STATE_OFF; } + + if (event->pending_sigtrap && state == PERF_EVENT_STATE_OFF) + event->pending_sigtrap = 0; + perf_event_set_state(event, state); if (!is_software_event(event)) @@ -2463,8 +2467,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable); void perf_event_disable_inatomic(struct perf_event *event) { - WRITE_ONCE(event->pending_disable, smp_processor_id()); - /* can fail, see perf_pending_event_disable() */ + event->pending_disable = 1; irq_work_queue(&event->pending); } @@ -2527,6 +2530,9 @@ event_sched_in(struct perf_event *event, if (event->attr.exclusive) cpuctx->exclusive = 1; + if (event->pending_disable || event->pending_sigtrap) + irq_work_queue(&event->pending); + out: perf_pmu_enable(event->pmu); @@ -6440,47 +6446,40 @@ static void perf_sigtrap(struct perf_event *event) event->attr.type, 
event->attr.sig_data); } -static void perf_pending_event_disable(struct perf_event *event) +/* + * Deliver the pending work in-event-context or follow the context. + */ +static void __perf_pending_event(struct perf_event *event) { - int cpu = READ_ONCE(event->pending_disable); + int cpu = READ_ONCE(event->oncpu); + /* + * If the event isn't running; we done. event_sched_in() will restart + * the irq_work when needed. + */ if (cpu < 0) return; + /* + * Yay, we hit home and are in the context of the event. + */ if (cpu == smp_processor_id()) { - WRITE_ONCE(event->pending_disable, -1); - - if (event->attr.sigtrap) { + if (event->pending_sigtrap) { + event->pending_sigtrap = 0; perf_sigtrap(event); - atomic_set_release(&event->event_limit, 1); /* rearm event */ - return; } - - perf_event_disable_local(event); - return; + if (event->pending_disable) { + event->pending_disable = 0; + perf_event_disable_local(event); + } } /* - * CPU-A CPU-B - * - * perf_event_disable_inatomic() - * @pending_disable = CPU-A; - * irq_work_queue(); - * - * sched-out - * @pending_disable = -1; - * - * sched-in - * perf_event_disable_inatomic() - * @pending_disable = CPU-B; - * irq_work_queue(); // FAILS - * - * irq_work_run() - * perf_pending_event() - * - * But the event runs on CPU-B and wants disabling there. + * Requeue if there's still any pending work left, make sure to follow + * where the event went. */ - irq_work_queue_on(&event->pending, cpu); + if (event->pending_disable || event->pending_sigtrap) + irq_work_queue_on(&event->pending, cpu); } static void perf_pending_event(struct irq_work *entry) @@ -6488,19 +6487,23 @@ static void perf_pending_event(struct irq_work *entry) struct perf_event *event = container_of(entry, struct perf_event, pending); int rctx; - rctx = perf_swevent_get_recursion_context(); /* * If we 'fail' here, that's OK, it means recursion is already disabled * and we won't recurse 'further'. 
*/ + rctx = perf_swevent_get_recursion_context(); - perf_pending_event_disable(event); - + /* + * The wakeup isn't bound to the context of the event -- it can happen + * irrespective of where the event is. + */ if (event->pending_wakeup) { event->pending_wakeup = 0; perf_event_wakeup(event); } + __perf_pending_event(event); + if (rctx >= 0) perf_swevent_put_recursion_context(rctx); } @@ -9203,11 +9206,20 @@ static int __perf_event_overflow(struct perf_event *event, if (events && atomic_dec_and_test(&event->event_limit)) { ret = 1; event->pending_kill = POLL_HUP; - event->pending_addr = data->addr; - perf_event_disable_inatomic(event); } + if (event->attr.sigtrap) { + /* + * Should not be able to return to user space without processing + * pending_sigtrap (kernel events can overflow multiple times). + */ + WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel); + event->pending_sigtrap = 1; + event->pending_addr = data->addr; + irq_work_queue(&event->pending); + } + READ_ONCE(event->overflow_handler)(event, data, regs); if (*perf_event_fasync(event) && event->pending_kill) { @@ -11528,7 +11540,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, init_waitqueue_head(&event->waitq); - event->pending_disable = -1; init_irq_work(&event->pending, perf_pending_event); mutex_init(&event->mmap_mutex); @@ -11551,9 +11562,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, if (parent_event) event->event_caps = parent_event->event_caps; - if (event->attr.sigtrap) - atomic_set(&event->event_limit, 1); - if (task) { event->attach_state = PERF_ATTACH_TASK; /* ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-10-04 17:09 ` Peter Zijlstra @ 2022-10-04 17:21 ` Peter Zijlstra 2022-10-04 17:33 ` Marco Elver 1 sibling, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2022-10-04 17:21 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Tue, Oct 04, 2022 at 07:09:15PM +0200, Peter Zijlstra wrote: > On Wed, Sep 28, 2022 at 04:55:46PM +0200, Marco Elver wrote: > > On Wed, Sep 28, 2022 at 12:06PM +0200, Marco Elver wrote: > > > > > My second idea about introducing something like irq_work_raw_sync(). > > > Maybe it's not that crazy if it is actually safe. I expect this case > > > where we need the irq_work_raw_sync() to be very very rare. > > > > The previous irq_work_raw_sync() forgot about irq_work_queue_on(). Alas, > > I might still be missing something obvious, because "it's never that > > easy". ;-) > > > > And for completeness, the full perf patch of what it would look like > > together with irq_work_raw_sync() (consider it v1.5). It's already > > survived some shorter stress tests and fuzzing. > > So.... I don't like it. But I cooked up the below, which _almost_ works :-/ > > For some raisin it sometimes fails with 14999 out of 15000 events > delivered and I've not yet figured out where it goes sideways. I'm > currently thinking it's that sigtrap clear on OFF. Oh Urgh, this is of course the case where an IPI races with a migration and we lose the race with return to user. Effectively this gives the signal skid vs the hardware event. Bah.. I really hate having one CPU wait for another... Let me see if I can find another way to close that hole.
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-10-04 17:09 ` Peter Zijlstra 2022-10-04 17:21 ` Peter Zijlstra @ 2022-10-04 17:33 ` Marco Elver 2022-10-05 7:37 ` Peter Zijlstra 1 sibling, 1 reply; 27+ messages in thread From: Marco Elver @ 2022-10-04 17:33 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Tue, 4 Oct 2022 at 19:09, Peter Zijlstra <peterz@infradead.org> wrote: > > On Wed, Sep 28, 2022 at 04:55:46PM +0200, Marco Elver wrote: > > On Wed, Sep 28, 2022 at 12:06PM +0200, Marco Elver wrote: > > > > > My second idea about introducing something like irq_work_raw_sync(). > > > Maybe it's not that crazy if it is actually safe. I expect this case > > > where we need the irq_work_raw_sync() to be very very rare. > > > > The previous irq_work_raw_sync() forgot about irq_work_queue_on(). Alas, > > I might still be missing something obvious, because "it's never that > > easy". ;-) > > > > And for completeness, the full perf patch of what it would look like > > together with irq_work_raw_sync() (consider it v1.5). It's already > > survived some shorter stress tests and fuzzing. > > So.... I don't like it. But I cooked up the below, which _almost_ works :-/ > > For some raisin it sometimes fails with 14999 out of 15000 events > delivered and I've not yet figured out where it goes sideways. I'm > currently thinking it's that sigtrap clear on OFF. > > Still, what do you think of the approach? It looks reasonable, but obviously needs to pass tests. :-) Also, see comment below (I think you're still turning signals asynchronous, which we shouldn't do). 
> --- > include/linux/perf_event.h | 8 ++-- > kernel/events/core.c | 92 +++++++++++++++++++++++++--------------------- > 2 files changed, 55 insertions(+), 45 deletions(-) > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index ee8b9ecdc03b..c54161719d37 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -736,9 +736,11 @@ struct perf_event { > struct fasync_struct *fasync; > > /* delayed work for NMIs and such */ > - int pending_wakeup; > - int pending_kill; > - int pending_disable; > + unsigned int pending_wakeup :1; > + unsigned int pending_disable :1; > + unsigned int pending_sigtrap :1; > + unsigned int pending_kill :3; > + > unsigned long pending_addr; /* SIGTRAP */ > struct irq_work pending; > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 2621fd24ad26..8e5dbe971d9e 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -2268,11 +2268,15 @@ event_sched_out(struct perf_event *event, > event->pmu->del(event, 0); > event->oncpu = -1; > > - if (READ_ONCE(event->pending_disable) >= 0) { > - WRITE_ONCE(event->pending_disable, -1); > + if (event->pending_disable) { > + event->pending_disable = 0; > perf_cgroup_event_disable(event, ctx); > state = PERF_EVENT_STATE_OFF; > } > + > + if (event->pending_sigtrap && state == PERF_EVENT_STATE_OFF) > + event->pending_sigtrap = 0; > + > perf_event_set_state(event, state); > > if (!is_software_event(event)) > @@ -2463,8 +2467,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable); > > void perf_event_disable_inatomic(struct perf_event *event) > { > - WRITE_ONCE(event->pending_disable, smp_processor_id()); > - /* can fail, see perf_pending_event_disable() */ > + event->pending_disable = 1; > irq_work_queue(&event->pending); > } > > @@ -2527,6 +2530,9 @@ event_sched_in(struct perf_event *event, > if (event->attr.exclusive) > cpuctx->exclusive = 1; > > + if (event->pending_disable || event->pending_sigtrap) > + irq_work_queue(&event->pending); > + > 
out: > perf_pmu_enable(event->pmu); > > @@ -6440,47 +6446,40 @@ static void perf_sigtrap(struct perf_event *event) > event->attr.type, event->attr.sig_data); > } > > -static void perf_pending_event_disable(struct perf_event *event) > +/* > + * Deliver the pending work in-event-context or follow the context. > + */ > +static void __perf_pending_event(struct perf_event *event) > { > - int cpu = READ_ONCE(event->pending_disable); > + int cpu = READ_ONCE(event->oncpu); > > + /* > + * If the event isn't running; we done. event_sched_in() will restart > + * the irq_work when needed. > + */ > if (cpu < 0) > return; > > + /* > + * Yay, we hit home and are in the context of the event. > + */ > if (cpu == smp_processor_id()) { > - WRITE_ONCE(event->pending_disable, -1); > - > - if (event->attr.sigtrap) { > + if (event->pending_sigtrap) { > + event->pending_sigtrap = 0; > perf_sigtrap(event); > - atomic_set_release(&event->event_limit, 1); /* rearm event */ > - return; > } > - > - perf_event_disable_local(event); > - return; > + if (event->pending_disable) { > + event->pending_disable = 0; > + perf_event_disable_local(event); > + } > } > > /* > - * CPU-A CPU-B > - * > - * perf_event_disable_inatomic() > - * @pending_disable = CPU-A; > - * irq_work_queue(); > - * > - * sched-out > - * @pending_disable = -1; > - * > - * sched-in > - * perf_event_disable_inatomic() > - * @pending_disable = CPU-B; > - * irq_work_queue(); // FAILS > - * > - * irq_work_run() > - * perf_pending_event() > - * > - * But the event runs on CPU-B and wants disabling there. > + * Requeue if there's still any pending work left, make sure to follow > + * where the event went. > */ > - irq_work_queue_on(&event->pending, cpu); > + if (event->pending_disable || event->pending_sigtrap) > + irq_work_queue_on(&event->pending, cpu); I considered making the irq_work "chase" the right CPU but it doesn't work for sigtrap. 
This will make the signal asynchronous (it should be synchronous), which is the reason I had to do irq_work_raw_sync(). Thanks, -- Marco
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-10-04 17:33 ` Marco Elver @ 2022-10-05 7:37 ` Peter Zijlstra 2022-10-05 7:49 ` Marco Elver 2022-10-05 8:23 ` Peter Zijlstra 0 siblings, 2 replies; 27+ messages in thread From: Peter Zijlstra @ 2022-10-05 7:37 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Tue, Oct 04, 2022 at 07:33:55PM +0200, Marco Elver wrote: > It looks reasonable, but obviously needs to pass tests. :-) Ikr :-) > Also, see comment below (I think you're still turning signals > asynchronous, which we shouldn't do). Indeed so; I tried fixing that this morning, but so far that doesn't seem to want to actually cure things :/ I'll need to stomp on this harder. Current hackery below. The main difference is that instead of trying to restart the irq_work on sched_in, sched_out will now queue a task-work. The event scheduling is done from 'regular' IRQ context and as such there should be a return-to-userspace for the relevant task in the immediate future (either directly or after scheduling). Alas, something still isn't right... 
--- include/linux/perf_event.h | 9 ++-- kernel/events/core.c | 115 ++++++++++++++++++++++++++++----------------- 2 files changed, 79 insertions(+), 45 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 853f64b6c8c2..f15726a6c127 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -756,11 +756,14 @@ struct perf_event { struct fasync_struct *fasync; /* delayed work for NMIs and such */ - int pending_wakeup; - int pending_kill; - int pending_disable; + unsigned int pending_wakeup :1; + unsigned int pending_disable :1; + unsigned int pending_sigtrap :1; + unsigned int pending_kill :3; + unsigned long pending_addr; /* SIGTRAP */ struct irq_work pending; + struct callback_head pending_sig; atomic_t event_limit; diff --git a/kernel/events/core.c b/kernel/events/core.c index b981b879bcd8..e28257fb6f00 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -54,6 +54,7 @@ #include <linux/highmem.h> #include <linux/pgtable.h> #include <linux/buildid.h> +#include <linux/task_work.h> #include "internal.h" @@ -2276,11 +2277,19 @@ event_sched_out(struct perf_event *event, event->pmu->del(event, 0); event->oncpu = -1; - if (READ_ONCE(event->pending_disable) >= 0) { - WRITE_ONCE(event->pending_disable, -1); + if (event->pending_disable) { + event->pending_disable = 0; perf_cgroup_event_disable(event, ctx); state = PERF_EVENT_STATE_OFF; } + + if (event->pending_sigtrap) { + if (state != PERF_EVENT_STATE_OFF) + task_work_add(current, &event->pending_sig, TWA_NONE); + else + event->pending_sigtrap = 0; + } + perf_event_set_state(event, state); if (!is_software_event(event)) @@ -2471,8 +2480,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable); void perf_event_disable_inatomic(struct perf_event *event) { - WRITE_ONCE(event->pending_disable, smp_processor_id()); - /* can fail, see perf_pending_event_disable() */ + event->pending_disable = 1; irq_work_queue(&event->pending); } @@ -6448,47 +6456,40 @@ static void 
perf_sigtrap(struct perf_event *event) event->attr.type, event->attr.sig_data); } -static void perf_pending_event_disable(struct perf_event *event) +/* + * Deliver the pending work in-event-context or follow the context. + */ +static void __perf_pending_event(struct perf_event *event) { - int cpu = READ_ONCE(event->pending_disable); + int cpu = READ_ONCE(event->oncpu); + /* + * If the event isn't running; we done. event_sched_in() will restart + * the irq_work when needed. + */ if (cpu < 0) return; + /* + * Yay, we hit home and are in the context of the event. + */ if (cpu == smp_processor_id()) { - WRITE_ONCE(event->pending_disable, -1); - - if (event->attr.sigtrap) { + if (event->pending_sigtrap) { + event->pending_sigtrap = 0; perf_sigtrap(event); - atomic_set_release(&event->event_limit, 1); /* rearm event */ - return; } - - perf_event_disable_local(event); - return; + if (event->pending_disable) { + event->pending_disable = 0; + perf_event_disable_local(event); + } } /* - * CPU-A CPU-B - * - * perf_event_disable_inatomic() - * @pending_disable = CPU-A; - * irq_work_queue(); - * - * sched-out - * @pending_disable = -1; - * - * sched-in - * perf_event_disable_inatomic() - * @pending_disable = CPU-B; - * irq_work_queue(); // FAILS - * - * irq_work_run() - * perf_pending_event() - * - * But the event runs on CPU-B and wants disabling there. + * Requeue if there's still any pending work left, make sure to follow + * where the event went. */ - irq_work_queue_on(&event->pending, cpu); + if (event->pending_disable || event->pending_sigtrap) + irq_work_queue_on(&event->pending, cpu); } static void perf_pending_event(struct irq_work *entry) @@ -6496,19 +6497,43 @@ static void perf_pending_event(struct irq_work *entry) struct perf_event *event = container_of(entry, struct perf_event, pending); int rctx; - rctx = perf_swevent_get_recursion_context(); /* * If we 'fail' here, that's OK, it means recursion is already disabled * and we won't recurse 'further'. 
*/ + rctx = perf_swevent_get_recursion_context(); - perf_pending_event_disable(event); - + /* + * The wakeup isn't bound to the context of the event -- it can happen + * irrespective of where the event is. + */ if (event->pending_wakeup) { event->pending_wakeup = 0; perf_event_wakeup(event); } + __perf_pending_event(event); + + if (rctx >= 0) + perf_swevent_put_recursion_context(rctx); +} + +static void perf_pending_sig(struct callback_head *head) +{ + struct perf_event *event = container_of(head, struct perf_event, pending_sig); + int rctx; + + /* + * If we 'fail' here, that's OK, it means recursion is already disabled + * and we won't recurse 'further'. + */ + rctx = perf_swevent_get_recursion_context(); + + if (event->pending_sigtrap) { + event->pending_sigtrap = 0; + perf_sigtrap(event); + } + if (rctx >= 0) perf_swevent_put_recursion_context(rctx); } @@ -9227,11 +9252,20 @@ static int __perf_event_overflow(struct perf_event *event, if (events && atomic_dec_and_test(&event->event_limit)) { ret = 1; event->pending_kill = POLL_HUP; - event->pending_addr = data->addr; - perf_event_disable_inatomic(event); } + if (event->attr.sigtrap) { + /* + * Should not be able to return to user space without processing + * pending_sigtrap (kernel events can overflow multiple times). 
+ */ + WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel); + event->pending_sigtrap = 1; + event->pending_addr = data->addr; + irq_work_queue(&event->pending); + } + READ_ONCE(event->overflow_handler)(event, data, regs); if (*perf_event_fasync(event) && event->pending_kill) { @@ -11560,8 +11594,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, init_waitqueue_head(&event->waitq); - event->pending_disable = -1; init_irq_work(&event->pending, perf_pending_event); + init_task_work(&event->pending_sig, perf_pending_sig); mutex_init(&event->mmap_mutex); raw_spin_lock_init(&event->addr_filters.lock); @@ -11583,9 +11617,6 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, if (parent_event) event->event_caps = parent_event->event_caps; - if (event->attr.sigtrap) - atomic_set(&event->event_limit, 1); - if (task) { event->attach_state = PERF_ATTACH_TASK; /* ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-10-05 7:37 ` Peter Zijlstra @ 2022-10-05 7:49 ` Marco Elver 2022-10-05 8:23 ` Peter Zijlstra 1 sibling, 0 replies; 27+ messages in thread From: Marco Elver @ 2022-10-05 7:49 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Wed, 5 Oct 2022 at 09:37, Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, Oct 04, 2022 at 07:33:55PM +0200, Marco Elver wrote: > > It looks reasonable, but obviously needs to pass tests. :-) > > Ikr :-) > > > Also, see comment below (I think you're still turning signals > > asynchronous, which we shouldn't do). > > Indeed so; I tried fixing that this morning, but so far that doesn't > seem to want to actually cure things :/ I'll need to stomp on this > harder. > > Current hackery below. The main difference is that instead of trying to > restart the irq_work on sched_in, sched_out will now queue a task-work. > > The event scheduling is done from 'regular' IRQ context and as such > there should be a return-to-userspace for the relevant task in the > immediate future (either directly or after scheduling). Does this work if we get a __perf_event_enable() IPI as described in the commit message of the patch I sent? I.e. it does a sched-out immediately followed by a sched-in aka resched; presumably in that case it should still have the irq_work on the same CPU, but the task_work will be a noop? > Alas, something still isn't right... 
> > --- > include/linux/perf_event.h | 9 ++-- > kernel/events/core.c | 115 ++++++++++++++++++++++++++++----------------- > 2 files changed, 79 insertions(+), 45 deletions(-) > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > index 853f64b6c8c2..f15726a6c127 100644 > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -756,11 +756,14 @@ struct perf_event { > struct fasync_struct *fasync; > > /* delayed work for NMIs and such */ > - int pending_wakeup; > - int pending_kill; > - int pending_disable; > + unsigned int pending_wakeup :1; > + unsigned int pending_disable :1; > + unsigned int pending_sigtrap :1; > + unsigned int pending_kill :3; > + > unsigned long pending_addr; /* SIGTRAP */ > struct irq_work pending; > + struct callback_head pending_sig; > > atomic_t event_limit; > > diff --git a/kernel/events/core.c b/kernel/events/core.c > index b981b879bcd8..e28257fb6f00 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -54,6 +54,7 @@ > #include <linux/highmem.h> > #include <linux/pgtable.h> > #include <linux/buildid.h> > +#include <linux/task_work.h> > > #include "internal.h" > > @@ -2276,11 +2277,19 @@ event_sched_out(struct perf_event *event, > event->pmu->del(event, 0); > event->oncpu = -1; > > - if (READ_ONCE(event->pending_disable) >= 0) { > - WRITE_ONCE(event->pending_disable, -1); > + if (event->pending_disable) { > + event->pending_disable = 0; > perf_cgroup_event_disable(event, ctx); > state = PERF_EVENT_STATE_OFF; > } > + > + if (event->pending_sigtrap) { > + if (state != PERF_EVENT_STATE_OFF) > + task_work_add(current, &event->pending_sig, TWA_NONE); > + else > + event->pending_sigtrap = 0; > + } > + > perf_event_set_state(event, state); > > if (!is_software_event(event)) > @@ -2471,8 +2480,7 @@ EXPORT_SYMBOL_GPL(perf_event_disable); > > void perf_event_disable_inatomic(struct perf_event *event) > { > - WRITE_ONCE(event->pending_disable, smp_processor_id()); > - /* can fail, see 
perf_pending_event_disable() */ > + event->pending_disable = 1; > irq_work_queue(&event->pending); > } > > @@ -6448,47 +6456,40 @@ static void perf_sigtrap(struct perf_event *event) > event->attr.type, event->attr.sig_data); > } > > -static void perf_pending_event_disable(struct perf_event *event) > +/* > + * Deliver the pending work in-event-context or follow the context. > + */ > +static void __perf_pending_event(struct perf_event *event) > { > - int cpu = READ_ONCE(event->pending_disable); > + int cpu = READ_ONCE(event->oncpu); > > + /* > + * If the event isn't running, we're done. event_sched_in() will restart > + * the irq_work when needed. > + */ > if (cpu < 0) > return; > > + /* > + * Yay, we hit home and are in the context of the event. > + */ > if (cpu == smp_processor_id()) { > - WRITE_ONCE(event->pending_disable, -1); > - > - if (event->attr.sigtrap) { > + if (event->pending_sigtrap) { > + event->pending_sigtrap = 0; > perf_sigtrap(event); > - atomic_set_release(&event->event_limit, 1); /* rearm event */ > - return; > } > - > - perf_event_disable_local(event); > - return; > + if (event->pending_disable) { > + event->pending_disable = 0; > + perf_event_disable_local(event); > + } > } > > /* > - * CPU-A CPU-B > - * > - * perf_event_disable_inatomic() > - * @pending_disable = CPU-A; > - * irq_work_queue(); > - * > - * sched-out > - * @pending_disable = -1; > - * > - * sched-in > - * perf_event_disable_inatomic() > - * @pending_disable = CPU-B; > - * irq_work_queue(); // FAILS > - * > - * irq_work_run() > - * perf_pending_event() > - * > - * But the event runs on CPU-B and wants disabling there. > + * Requeue if there's still any pending work left, make sure to follow > + * where the event went. > */ > - irq_work_queue_on(&event->pending, cpu); > + if (event->pending_disable || event->pending_sigtrap) > + irq_work_queue_on(&event->pending, cpu); This probably should not queue an irq_work if pending_sigtrap, given it just doesn't work.
It probably should just ignore? Thanks, -- Marco
* Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse 2022-10-05 7:37 ` Peter Zijlstra 2022-10-05 7:49 ` Marco Elver @ 2022-10-05 8:23 ` Peter Zijlstra 1 sibling, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2022-10-05 8:23 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Wed, Oct 05, 2022 at 09:37:06AM +0200, Peter Zijlstra wrote: > On Tue, Oct 04, 2022 at 07:33:55PM +0200, Marco Elver wrote: > > It looks reasonable, but obviously needs to pass tests. :-) > > Ikr :-) > > > Also, see comment below (I think you're still turning signals > > asynchronous, which we shouldn't do). > > Indeed so; I tried fixing that this morning, but so far that doesn't > seem to want to actually cure things :/ I'll need to stomp on this > harder. > > Current hackery below. The main difference is that instead of trying to > restart the irq_work on sched_in, sched_out will now queue a task-work. > > The event scheduling is done from 'regular' IRQ context and as such > there should be a return-to-userspace for the relevant task in the > immediate future (either directly or after scheduling). > > Alas, something still isn't right... Oh, lol, *groan*... this fixes it: Now to find a sane way to inhibit this while a sig thing is pending :/ diff --git a/kernel/events/core.c b/kernel/events/core.c index b981b879bcd8..92b6a2f6de1a 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3426,7 +3434,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn, */ raw_spin_lock(&ctx->lock); raw_spin_lock_nested(&next_ctx->lock, SINGLE_DEPTH_NESTING); - if (context_equiv(ctx, next_ctx)) { + if (0 && context_equiv(ctx, next_ctx)) { WRITE_ONCE(ctx->task, next); WRITE_ONCE(next_ctx->task, task); ^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH] perf: Fix missing SIGTRAPs 2022-09-27 12:13 [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse Marco Elver 2022-09-27 12:30 ` Marco Elver 2022-09-27 18:20 ` Peter Zijlstra @ 2022-10-06 13:33 ` Peter Zijlstra 2022-10-06 13:59 ` Marco Elver 2022-10-11 7:44 ` [PATCH v2] " Peter Zijlstra 2 siblings, 2 replies; 27+ messages in thread From: Peter Zijlstra @ 2022-10-06 13:33 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov OK, so the below seems to pass the concurrent sigtrap_threads test for me and doesn't have that horrible irq_work_sync hackery. Does it work for you too? --- Subject: perf: Fix missing SIGTRAPs From: Peter Zijlstra <peterz@infradead.org> Date: Thu Oct 6 15:00:39 CEST 2022 Marco reported: Due to the implementation of how SIGTRAP are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues: 1. Missing SIGTRAP due to a race with event_sched_out() (more details below). 2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it. 3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH. The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows: CPU0 | CPU1 --------------------------------+--------------------------- __perf_event_overflow() | perf_event_disable_inatomic() | pending_disable = CPU0 | ... | _perf_event_enable() | event_function_call() | task_function_call() | /* sends IPI to CPU0 */ <IPI> | ... 
__perf_event_enable() +--------------------------- ctx_resched() task_ctx_sched_out() ctx_sched_out() group_sched_out() event_sched_out() pending_disable = -1 </IPI> <IRQ-work> perf_pending_event() perf_pending_event_disable() /* Fails to send SIGTRAP because no pending_disable! */ </IRQ-work> In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs because 'event_limit' is not reset back to 1. To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction of a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery. Additionally; and different to Marco's proposed patch: - recognise that pending_disable effectively duplicates oncpu for the case where it is set. As such, change the irq_work handler to use ->oncpu to target the event and use pending_* as boolean toggles. - observe that SIGTRAP targets the ctx->task, so the context switch optimization that carries contexts between tasks is invalid. If the irq_work were delayed enough to hit after a context switch the SIGTRAP would be delivered to the wrong task. - observe that if the event gets scheduled out (rotation/migration/context-switch/...) the irq-work would be insufficient to deliver the SIGTRAP when the event gets scheduled back in (the irq-work might still be pending on the old CPU). Therefore have event_sched_out() convert the pending sigtrap into a task_work which will deliver the signal at return_to_user. 
Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Marco Elver <elver@google.com> Debugged-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/perf_event.h | 19 ++++- kernel/events/core.c | 149 ++++++++++++++++++++++++++++++++------------ kernel/events/ring_buffer.c | 2 3 files changed, 127 insertions(+), 43 deletions(-) --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -736,11 +736,14 @@ struct perf_event { struct fasync_struct *fasync; /* delayed work for NMIs and such */ - int pending_wakeup; - int pending_kill; - int pending_disable; + unsigned int pending_wakeup; + unsigned int pending_kill; + unsigned int pending_disable; + unsigned int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ - struct irq_work pending; + struct irq_work pending_irq; + struct callback_head pending_task; + unsigned int pending_work; atomic_t event_limit; @@ -857,6 +860,14 @@ struct perf_event_context { #endif void *task_ctx_data; /* pmu specific data */ struct rcu_head rcu_head; + + /* + * Sum (event->pending_sigtrap + event->pending_work) + * + * The SIGTRAP is targeted at ctx->task, as such it won't do changing + * that until the signal is delivered. 
+ */ + local_t nr_pending; }; /* --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -54,6 +54,7 @@ #include <linux/highmem.h> #include <linux/pgtable.h> #include <linux/buildid.h> +#include <linux/task_work.h> #include "internal.h" @@ -2268,11 +2269,28 @@ event_sched_out(struct perf_event *event event->pmu->del(event, 0); event->oncpu = -1; - if (READ_ONCE(event->pending_disable) >= 0) { - WRITE_ONCE(event->pending_disable, -1); + if (event->pending_disable) { + event->pending_disable = 0; perf_cgroup_event_disable(event, ctx); state = PERF_EVENT_STATE_OFF; } + + if (event->pending_sigtrap) { + event->pending_sigtrap = 0; + if (state == PERF_EVENT_STATE_OFF) { + /* + * If we're racing with disabling the event; consume + * the event to avoid it becoming asynchronous by + * mistake. + */ + local_dec(&event->ctx->nr_pending); + } else { + WARN_ON_ONCE(event->pending_work); + event->pending_work = 1; + task_work_add(current, &event->pending_task, TWA_RESUME); + } + } + perf_event_set_state(event, state); if (!is_software_event(event)) @@ -2424,7 +2442,7 @@ static void __perf_event_disable(struct * hold the top-level event's child_mutex, so any descendant that * goes to exit will block in perf_event_exit_event(). * - * When called from perf_pending_event it's OK because event->ctx + * When called from perf_pending_irq it's OK because event->ctx * is the current context on this CPU and preemption is disabled, * hence we can't get into perf_event_task_sched_out for this context.
*/ @@ -2463,9 +2481,8 @@ EXPORT_SYMBOL_GPL(perf_event_disable); void perf_event_disable_inatomic(struct perf_event *event) { - WRITE_ONCE(event->pending_disable, smp_processor_id()); - /* can fail, see perf_pending_event_disable() */ - irq_work_queue(&event->pending); + event->pending_disable = 1; + irq_work_queue(&event->pending_irq); } #define MAX_INTERRUPTS (~0ULL) @@ -3420,11 +3437,22 @@ static void perf_event_context_sched_out raw_spin_lock_nested(&next_ctx->lock, SINGLE_DEPTH_NESTING); if (context_equiv(ctx, next_ctx)) { + perf_pmu_disable(pmu); + + /* PMIs are disabled; ctx->nr_pending is stable. */ + if (local_read(&ctx->nr_pending)) { + /* + * Must not swap out ctx when there's pending + * events that rely on the ctx->task relation. + */ + raw_spin_unlock(&next_ctx->lock); + rcu_read_unlock(); + goto inside_switch; + } + WRITE_ONCE(ctx->task, next); WRITE_ONCE(next_ctx->task, task); - perf_pmu_disable(pmu); - if (cpuctx->sched_cb_usage && pmu->sched_task) pmu->sched_task(ctx, false); @@ -3465,6 +3493,7 @@ static void perf_event_context_sched_out raw_spin_lock(&ctx->lock); perf_pmu_disable(pmu); +inside_switch: if (cpuctx->sched_cb_usage && pmu->sched_task) pmu->sched_task(ctx, false); task_ctx_sched_out(cpuctx, ctx, EVENT_ALL); @@ -4931,7 +4960,7 @@ static void perf_addr_filters_splice(str static void _free_event(struct perf_event *event) { - irq_work_sync(&event->pending); + irq_work_sync(&event->pending_irq); unaccount_event(event); @@ -6431,7 +6460,7 @@ static void perf_sigtrap(struct perf_eve return; /* - * perf_pending_event() can race with the task exiting. + * perf_pending_irq() can race with the task exiting. */ if (current->flags & PF_EXITING) return; @@ -6440,23 +6469,33 @@ static void perf_sigtrap(struct perf_eve event->attr.type, event->attr.sig_data); } -static void perf_pending_event_disable(struct perf_event *event) +/* + * Deliver the pending work in-event-context or follow the context. 
+ */ +static void __perf_pending_irq(struct perf_event *event) { - int cpu = READ_ONCE(event->pending_disable); + int cpu = READ_ONCE(event->oncpu); + /* + * If the event isn't running, we're done. event_sched_out() will have + * taken care of things. + */ if (cpu < 0) return; + /* + * Yay, we hit home and are in the context of the event. + */ if (cpu == smp_processor_id()) { - WRITE_ONCE(event->pending_disable, -1); - - if (event->attr.sigtrap) { + if (event->pending_sigtrap) { + event->pending_sigtrap = 0; + local_dec(&event->ctx->nr_pending); perf_sigtrap(event); - atomic_set_release(&event->event_limit, 1); /* rearm event */ - return; } - - perf_event_disable_local(event); + if (event->pending_disable) { + event->pending_disable = 0; + perf_event_disable_local(event); + } return; } @@ -6476,31 +6515,56 @@ static void perf_pending_event_disable(s * irq_work_queue(); // FAILS * * irq_work_run() - * perf_pending_event() + * perf_pending_irq() * * But the event runs on CPU-B and wants disabling there. */ - irq_work_queue_on(&event->pending, cpu); + irq_work_queue_on(&event->pending_irq, cpu); } -static void perf_pending_event(struct irq_work *entry) +static void perf_pending_irq(struct irq_work *entry) { - struct perf_event *event = container_of(entry, struct perf_event, pending); + struct perf_event *event = container_of(entry, struct perf_event, pending_irq); int rctx; - rctx = perf_swevent_get_recursion_context(); /* * If we 'fail' here, that's OK, it means recursion is already disabled * and we won't recurse 'further'. + */ + rctx = perf_swevent_get_recursion_context(); - perf_pending_event_disable(event); - + /* + * The wakeup isn't bound to the context of the event -- it can happen + * irrespective of where the event is.
+ */ if (event->pending_wakeup) { event->pending_wakeup = 0; perf_event_wakeup(event); } + __perf_pending_irq(event); + + if (rctx >= 0) + perf_swevent_put_recursion_context(rctx); +} + +static void perf_pending_task(struct callback_head *head) +{ + struct perf_event *event = container_of(head, struct perf_event, pending_task); + int rctx; + + /* + * If we 'fail' here, that's OK, it means recursion is already disabled + * and we won't recurse 'further'. + */ + rctx = perf_swevent_get_recursion_context(); + + if (event->pending_work) { + event->pending_work = 0; + local_dec(&event->ctx->nr_pending); + perf_sigtrap(event); + } + if (rctx >= 0) perf_swevent_put_recursion_context(rctx); } @@ -9179,8 +9243,8 @@ int perf_event_account_interrupt(struct */ static int __perf_event_overflow(struct perf_event *event, - int throttle, struct perf_sample_data *data, - struct pt_regs *regs) + int throttle, struct perf_sample_data *data, + struct pt_regs *regs) { int events = atomic_read(&event->event_limit); int ret = 0; @@ -9203,24 +9267,36 @@ static int __perf_event_overflow(struct if (events && atomic_dec_and_test(&event->event_limit)) { ret = 1; event->pending_kill = POLL_HUP; - event->pending_addr = data->addr; - perf_event_disable_inatomic(event); } + if (event->attr.sigtrap) { + /* + * Should not be able to return to user space without processing + * pending_sigtrap (kernel events can overflow multiple times). 
+ */ + WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel); + if (!event->pending_sigtrap) { + event->pending_sigtrap = 1; + local_inc(&event->ctx->nr_pending); + } + event->pending_addr = data->addr; + irq_work_queue(&event->pending_irq); + } + READ_ONCE(event->overflow_handler)(event, data, regs); if (*perf_event_fasync(event) && event->pending_kill) { event->pending_wakeup = 1; - irq_work_queue(&event->pending); + irq_work_queue(&event->pending_irq); } return ret; } int perf_event_overflow(struct perf_event *event, - struct perf_sample_data *data, - struct pt_regs *regs) + struct perf_sample_data *data, + struct pt_regs *regs) { return __perf_event_overflow(event, 1, data, regs); } @@ -11528,8 +11604,8 @@ perf_event_alloc(struct perf_event_attr init_waitqueue_head(&event->waitq); - event->pending_disable = -1; - init_irq_work(&event->pending, perf_pending_event); + init_irq_work(&event->pending_irq, perf_pending_irq); + init_task_work(&event->pending_task, perf_pending_task); mutex_init(&event->mmap_mutex); raw_spin_lock_init(&event->addr_filters.lock); @@ -11551,9 +11627,6 @@ perf_event_alloc(struct perf_event_attr if (parent_event) event->event_caps = parent_event->event_caps; - if (event->attr.sigtrap) - atomic_set(&event->event_limit, 1); - if (task) { event->attach_state = PERF_ATTACH_TASK; /* --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -22,7 +22,7 @@ static void perf_output_wakeup(struct pe atomic_set(&handle->rb->poll, EPOLLIN); handle->event->pending_wakeup = 1; - irq_work_queue(&handle->event->pending); + irq_work_queue(&handle->event->pending_irq); } /* ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-06 13:33 ` [PATCH] perf: Fix missing SIGTRAPs Peter Zijlstra @ 2022-10-06 13:59 ` Marco Elver 2022-10-06 16:02 ` Peter Zijlstra 2022-10-11 7:44 ` [PATCH v2] " Peter Zijlstra 1 sibling, 1 reply; 27+ messages in thread From: Marco Elver @ 2022-10-06 13:59 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Thu, Oct 06, 2022 at 03:33PM +0200, Peter Zijlstra wrote: > > OK, so the below seems to pass the concurrent sigtrap_threads test for > me and doesn't have that horrible irq_work_sync hackery. > > Does it work for you too? I'm getting this (courtesy of syzkaller): | BUG: using smp_processor_id() in preemptible [00000000] code: syz-executor.8/22848 | caller is perf_swevent_get_recursion_context+0x13/0x80 | CPU: 0 PID: 22860 Comm: syz-executor.6 Not tainted 6.0.0-rc3-00017-g1472d7e42f41 #64 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014 | Call Trace: | <TASK> | dump_stack_lvl+0x6a/0x86 | check_preemption_disabled+0xdf/0xf0 | perf_swevent_get_recursion_context+0x13/0x80 | perf_pending_task+0xf/0x80 | task_work_run+0x73/0xc0 | do_exit+0x459/0xf20 | do_group_exit+0x3f/0xe0 | get_signal+0xe04/0xe60 | arch_do_signal_or_restart+0x3f/0x780 | exit_to_user_mode_prepare+0x135/0x1a0 | irqentry_exit_to_user_mode+0x6/0x30 | asm_sysvec_irq_work+0x16/0x20 That one I could fix up with: | diff --git a/kernel/events/core.c b/kernel/events/core.c | index 9319af6013f1..2f1d51b50be7 100644 | --- a/kernel/events/core.c | +++ b/kernel/events/core.c | @@ -6563,6 +6563,7 @@ static void perf_pending_task(struct callback_head *head) | * If we 'fail' here, that's OK, it means recursion is already disabled | * and we won't recurse 'further'. 
| */ | + preempt_disable_notrace(); | rctx = perf_swevent_get_recursion_context(); | | if (event->pending_work) { | @@ -6573,6 +6574,7 @@ static void perf_pending_task(struct callback_head *head) | | if (rctx >= 0) | perf_swevent_put_recursion_context(rctx); | + preempt_enable_notrace(); | } | | #ifdef CONFIG_GUEST_PERF_EVENTS But following that, I get: | ====================================================== | WARNING: possible circular locking dependency detected | 6.0.0-rc3-00017-g1472d7e42f41-dirty #65 Not tainted | ------------------------------------------------------ | syz-executor.11/13018 is trying to acquire lock: | ffffffffbb754a18 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0xa/0x30 kernel/locking/semaphore.c:139 | | but task is already holding lock: | ffff8ea992e00e20 (&ctx->lock){-.-.}-{2:2}, at: perf_event_context_sched_out kernel/events/core.c:3499 [inline] | ffff8ea992e00e20 (&ctx->lock){-.-.}-{2:2}, at: __perf_event_task_sched_out+0x29e/0xb50 kernel/events/core.c:3608 | | which lock already depends on the new lock. | | << snip ... 
lockdep unhappy we're trying to WARN >> | | WARNING: CPU: 3 PID: 13018 at kernel/events/core.c:2288 event_sched_out+0x3f2/0x410 kernel/events/core.c:2288 | Modules linked in: | CPU: 3 PID: 13018 Comm: syz-executor.11 Not tainted 6.0.0-rc3-00017-g1472d7e42f41-dirty #65 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014 | RIP: 0010:event_sched_out+0x3f2/0x410 kernel/events/core.c:2288 | Code: ff ff e8 21 b2 f9 ff 65 8b 05 76 67 7f 46 85 c0 0f 84 0f ff ff ff e8 0d b2 f9 ff 90 0f 0b 90 e9 01 ff ff ff e8 ff b1 f9 ff 90 <0f> 0b 90 e9 3b fe ff ff e8 f1 b1 f9 ff 90 0f 0b 90 e9 01 ff ff ff | RSP: 0018:ffffa69c8931f9a8 EFLAGS: 00010012 | RAX: 0000000040000000 RBX: ffff8ea99526f1c8 RCX: ffffffffb9824d01 | RDX: ffff8ea9934a0040 RSI: 0000000000000000 RDI: ffff8ea99526f1c8 | RBP: ffff8ea992e00e00 R08: 0000000000000000 R09: 0000000000000000 | R10: 00000000820822cd R11: 00000000d820822c R12: ffff8eacafcf2e50 | R13: ffff8eacafcf2e58 R14: ffffffffbb62e9a0 R15: ffff8ea992e00ef8 | FS: 00007fdcddbb6640(0000) GS:ffff8eacafcc0000(0000) knlGS:0000000000000000 | CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 | CR2: 00007fdcddbb5fa8 CR3: 0000000112846004 CR4: 0000000000770ee0 | DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 | DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 | PKRU: 55555554 | Call Trace: | <TASK> | group_sched_out.part.0+0x5c/0xe0 kernel/events/core.c:2320 | group_sched_out kernel/events/core.c:2315 [inline] | ctx_sched_out+0x35d/0x3c0 kernel/events/core.c:3288 | task_ctx_sched_out+0x3d/0x60 kernel/events/core.c:2657 | perf_event_context_sched_out kernel/events/core.c:3505 [inline] | __perf_event_task_sched_out+0x31b/0xb50 kernel/events/core.c:3608 | perf_event_task_sched_out include/linux/perf_event.h:1266 [inline] | prepare_task_switch kernel/sched/core.c:4992 [inline] | context_switch kernel/sched/core.c:5134 [inline] | __schedule+0x4f8/0xb20 kernel/sched/core.c:6494 | 
preempt_schedule_irq+0x39/0x70 kernel/sched/core.c:6806 | irqentry_exit+0x32/0x90 kernel/entry/common.c:428 | asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:649 | RIP: 0010:try_to_freeze include/linux/freezer.h:66 [inline] | RIP: 0010:freezer_count include/linux/freezer.h:128 [inline] | RIP: 0010:coredump_wait fs/coredump.c:407 [inline] | RIP: 0010:do_coredump+0x1193/0x1b60 fs/coredump.c:563 | Code: d3 25 df ff 83 e3 08 0f 84 f0 03 00 00 e8 c5 25 df ff 48 f7 85 88 fe ff ff 00 01 00 00 0f 85 7e fc ff ff 31 db e9 83 fc ff ff <e8> a8 25 df ff e8 63 43 d3 ff e9 d2 f1 ff ff e8 99 25 df ff 48 85 | RSP: 0018:ffffa69c8931fc30 EFLAGS: 00000246 | RAX: 7fffffffffffffff RBX: ffff8ea9934a0040 RCX: 0000000000000000 | RDX: 0000000000000001 RSI: ffffffffbb4ab491 RDI: 00000000ffffffff | RBP: ffffa69c8931fdc0 R08: 0000000000000001 R09: 0000000000000001 | R10: 00000000ffffffff R11: 00000000ffffffff R12: ffff8ea9934a0040 | R13: ffffffffbb792620 R14: 0000000000000108 R15: 0000000000000001 | get_signal+0xe56/0xe60 kernel/signal.c:2843 | arch_do_signal_or_restart+0x3f/0x780 arch/x86/kernel/signal.c:869 | exit_to_user_mode_loop kernel/entry/common.c:166 [inline] | exit_to_user_mode_prepare+0x135/0x1a0 kernel/entry/common.c:201 | __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] | syscall_exit_to_user_mode+0x1a/0x50 kernel/entry/common.c:294 | do_syscall_64+0x48/0x90 arch/x86/entry/common.c:86 | entry_SYSCALL_64_after_hwframe+0x64/0xce | RIP: 0033:0x7fdcddc48549 | Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 | RSP: 002b:00007fdcddbb60f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca | RAX: 0000000000000001 RBX: 00007fdcddd62f88 RCX: 00007fdcddc48549 | RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007fdcddd62f8c | RBP: 00007fdcddd62f80 R08: 000000000000001e R09: 0000000000000000 | R10: 
0000000000000003 R11: 0000000000000246 R12: 00007fdcddd62f8c | R13: 00007ffc4136118f R14: 0000000000000000 R15: 00007fdcddbb6640 | </TASK> | irq event stamp: 128 | hardirqs last enabled at (127): [<ffffffffba9f7237>] irqentry_exit+0x37/0x90 kernel/entry/common.c:431 | hardirqs last disabled at (128): [<ffffffffba9faa5b>] __schedule+0x6cb/0xb20 kernel/sched/core.c:6393 | softirqs last enabled at (106): [<ffffffffbae0034f>] softirq_handle_end kernel/softirq.c:414 [inline] | softirqs last enabled at (106): [<ffffffffbae0034f>] __do_softirq+0x34f/0x4d5 kernel/softirq.c:600 | softirqs last disabled at (99): [<ffffffffb9693821>] invoke_softirq kernel/softirq.c:445 [inline] | softirqs last disabled at (99): [<ffffffffb9693821>] __irq_exit_rcu+0xb1/0x120 kernel/softirq.c:650 | ---[ end trace 0000000000000000 ]--- | BUG: kernel NULL pointer dereference, address: 0000000000000000 | #PF: supervisor instruction fetch in kernel mode | #PF: error_code(0x0010) - not-present page | PGD 8000000112e91067 P4D 8000000112e91067 PUD 114481067 PMD 0 | Oops: 0010 [#1] PREEMPT SMP PTI | CPU: 1 PID: 13018 Comm: syz-executor.11 Tainted: G W 6.0.0-rc3-00017-g1472d7e42f41-dirty #65 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014 | RIP: 0010:0x0 | Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. 
| RSP: 0018:ffffa69c8931fd18 EFLAGS: 00010293 | RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb96be917 | RDX: ffff8ea9934a0040 RSI: 0000000000000000 RDI: ffff8ea99526f620 | RBP: ffff8ea9934a0040 R08: 0000000000000000 R09: 0000000000000000 | R10: 0000000000000001 R11: ffffffffb9cb7eaf R12: ffff8ea9934a08f0 | R13: ffff8ea992fc9cf8 R14: ffff8ea98c65dec0 R15: ffff8ea9934a0828 | FS: 0000000000000000(0000) GS:ffff8eacafc40000(0000) knlGS:0000000000000000 | CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 | CR2: ffffffffffffffd6 CR3: 000000011343a003 CR4: 0000000000770ee0 | DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 | DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 | PKRU: 55555554 | Call Trace: | <TASK> | task_work_run+0x73/0xc0 kernel/task_work.c:177 | exit_task_work include/linux/task_work.h:38 [inline] | do_exit+0x459/0xf20 kernel/exit.c:795 | do_group_exit+0x3f/0xe0 kernel/exit.c:925 | get_signal+0xe04/0xe60 kernel/signal.c:2857 | arch_do_signal_or_restart+0x3f/0x780 arch/x86/kernel/signal.c:869 | exit_to_user_mode_loop kernel/entry/common.c:166 [inline] | exit_to_user_mode_prepare+0x135/0x1a0 kernel/entry/common.c:201 | __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline] | syscall_exit_to_user_mode+0x1a/0x50 kernel/entry/common.c:294 | do_syscall_64+0x48/0x90 arch/x86/entry/common.c:86 | entry_SYSCALL_64_after_hwframe+0x64/0xce | RIP: 0033:0x7fdcddc48549 | Code: Unable to access opcode bytes at RIP 0x7fdcddc4851f. 
| RSP: 002b:00007fdcddbb60f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca | RAX: 0000000000000001 RBX: 00007fdcddd62f88 RCX: 00007fdcddc48549 | RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007fdcddd62f8c | RBP: 00007fdcddd62f80 R08: 000000000000001e R09: 0000000000000000 | R10: 0000000000000003 R11: 0000000000000246 R12: 00007fdcddd62f8c | R13: 00007ffc4136118f R14: 0000000000000000 R15: 00007fdcddbb6640 | </TASK> | Modules linked in: | CR2: 0000000000000000 | ---[ end trace 0000000000000000 ]--- | RIP: 0010:0x0 | Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6. | RSP: 0018:ffffa69c8931fd18 EFLAGS: 00010293 | RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffb96be917 | RDX: ffff8ea9934a0040 RSI: 0000000000000000 RDI: ffff8ea99526f620 | RBP: ffff8ea9934a0040 R08: 0000000000000000 R09: 0000000000000000 | R10: 0000000000000001 R11: ffffffffb9cb7eaf R12: ffff8ea9934a08f0 | R13: ffff8ea992fc9cf8 R14: ffff8ea98c65dec0 R15: ffff8ea9934a0828 | FS: 0000000000000000(0000) GS:ffff8eacafc40000(0000) knlGS:0000000000000000 | CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 | CR2: ffffffffffffffd6 CR3: 000000011343a003 CR4: 0000000000770ee0 | DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 | DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 | PKRU: 55555554 | ---------------- | Code disassembly (best guess): | 0: d3 25 df ff 83 e3 shll %cl,-0x1c7c0021(%rip) # 0xe383ffe5 | 6: 08 0f or %cl,(%rdi) | 8: 84 f0 test %dh,%al | a: 03 00 add (%rax),%eax | c: 00 e8 add %ch,%al | e: c5 25 df ff vpandn %ymm7,%ymm11,%ymm15 | 12: 48 f7 85 88 fe ff ff testq $0x100,-0x178(%rbp) | 19: 00 01 00 00 | 1d: 0f 85 7e fc ff ff jne 0xfffffca1 | 23: 31 db xor %ebx,%ebx | 25: e9 83 fc ff ff jmp 0xfffffcad | * 2a: e8 a8 25 df ff call 0xffdf25d7 <-- trapping instruction | 2f: e8 63 43 d3 ff call 0xffd34397 | 34: e9 d2 f1 ff ff jmp 0xfffff20b | 39: e8 99 25 df ff call 0xffdf25d7 | 3e: 48 rex.W | 3f: 85 .byte 0x85 So something isn't quite right yet. 
Unfortunately I don't have a good reproducer. :-/ Thanks, -- Marco ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-06 13:59 ` Marco Elver @ 2022-10-06 16:02 ` Peter Zijlstra 2022-10-07 9:37 ` Marco Elver 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2022-10-06 16:02 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Thu, Oct 06, 2022 at 03:59:55PM +0200, Marco Elver wrote: > That one I could fix up with: > > | diff --git a/kernel/events/core.c b/kernel/events/core.c > | index 9319af6013f1..2f1d51b50be7 100644 > | --- a/kernel/events/core.c > | +++ b/kernel/events/core.c > | @@ -6563,6 +6563,7 @@ static void perf_pending_task(struct callback_head *head) > | * If we 'fail' here, that's OK, it means recursion is already disabled > | * and we won't recurse 'further'. > | */ > | + preempt_disable_notrace(); > | rctx = perf_swevent_get_recursion_context(); > | > | if (event->pending_work) { > | @@ -6573,6 +6574,7 @@ static void perf_pending_task(struct callback_head *head) > | > | if (rctx >= 0) > | perf_swevent_put_recursion_context(rctx); > | + preempt_enable_notrace(); > | } > | > | #ifdef CONFIG_GUEST_PERF_EVENTS Right, thanks! It appears I only have lockdep enabled but not the preempt warning :/ > But following that, I get: > > | WARNING: CPU: 3 PID: 13018 at kernel/events/core.c:2288 event_sched_out+0x3f2/0x410 kernel/events/core.c:2288 I'm taking this is (my line numbers are slightly different): WARN_ON_ONCE(event->pending_work); > So something isn't quite right yet. Unfortunately I don't have a good > reproducer. :-/ This can happen if we get two consecutive event_sched_out() and both instances will have pending_sigtrap set. This can happen when the event that has sigtrap set also triggers in kernel space. You then get task_work list corruption and *boom*. I'm thinking the below might be the simplest solution; we can only send a single signal after all. 
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2293,9 +2293,10 @@ event_sched_out(struct perf_event *event */ local_dec(&event->ctx->nr_pending); } else { - WARN_ON_ONCE(event->pending_work); - event->pending_work = 1; - task_work_add(current, &event->pending_task, TWA_RESUME); + if (!event->pending_work) { + event->pending_work = 1; + task_work_add(current, &event->pending_task, TWA_RESUME); + } } } ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-06 16:02 ` Peter Zijlstra @ 2022-10-07 9:37 ` Marco Elver 2022-10-07 13:09 ` Peter Zijlstra 0 siblings, 1 reply; 27+ messages in thread From: Marco Elver @ 2022-10-07 9:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Thu, Oct 06, 2022 at 06:02PM +0200, Peter Zijlstra wrote: > This can happen if we get two consecutive event_sched_out() and both > instances will have pending_sigtrap set. This can happen when the event > that has sigtrap set also triggers in kernel space. > > You then get task_work list corruption and *boom*. > > I'm thinking the below might be the simplest solution; we can only send > a single signal after all. That worked. In addition I had to disable the ctx->task != current check if we're in task_work, because presumably the event might have already been disabled/moved?? At least with all the below fixups, things seem to work (tests + light fuzzing). Thanks, -- Marco ------ >8 ------ diff --git a/kernel/events/core.c b/kernel/events/core.c index 9319af6013f1..29ed6e58906b 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2285,9 +2285,10 @@ event_sched_out(struct perf_event *event, */ local_dec(&event->ctx->nr_pending); } else { - WARN_ON_ONCE(event->pending_work); - event->pending_work = 1; - task_work_add(current, &event->pending_task, TWA_RESUME); + if (!event->pending_work) { + event->pending_work = 1; + task_work_add(current, &event->pending_task, TWA_RESUME); + } } } @@ -6455,18 +6456,19 @@ void perf_event_wakeup(struct perf_event *event) } } -static void perf_sigtrap(struct perf_event *event) +static void perf_sigtrap(struct perf_event *event, bool in_task_work) { /* * We'd expect this to only occur if the irq_work is delayed and either * ctx->task or current has changed in the meantime. 
This can be the * case on architectures that do not implement arch_irq_work_raise(). */ - if (WARN_ON_ONCE(event->ctx->task != current)) + if (WARN_ON_ONCE(!in_task_work && event->ctx->task != current)) return; /* - * perf_pending_irq() can race with the task exiting. + * Both perf_pending_task() and perf_pending_irq() can race with the + * task exiting. */ if (current->flags & PF_EXITING) return; @@ -6496,7 +6498,7 @@ static void __perf_pending_irq(struct perf_event *event) if (event->pending_sigtrap) { event->pending_sigtrap = 0; local_dec(&event->ctx->nr_pending); - perf_sigtrap(event); + perf_sigtrap(event, false); } if (event->pending_disable) { event->pending_disable = 0; @@ -6563,16 +6565,18 @@ static void perf_pending_task(struct callback_head *head) * If we 'fail' here, that's OK, it means recursion is already disabled * and we won't recurse 'further'. */ + preempt_disable_notrace(); rctx = perf_swevent_get_recursion_context(); if (event->pending_work) { event->pending_work = 0; local_dec(&event->ctx->nr_pending); - perf_sigtrap(event); + perf_sigtrap(event, true); } if (rctx >= 0) perf_swevent_put_recursion_context(rctx); + preempt_enable_notrace(); } #ifdef CONFIG_GUEST_PERF_EVENTS ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-07 9:37 ` Marco Elver @ 2022-10-07 13:09 ` Peter Zijlstra 2022-10-07 13:58 ` Marco Elver 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2022-10-07 13:09 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Fri, Oct 07, 2022 at 11:37:34AM +0200, Marco Elver wrote: > That worked. In addition I had to disable the ctx->task != current check > if we're in task_work, because presumably the event might have already > been disabled/moved?? Uhmmm... uhhh... damn. (wall-time was significantly longer) Does this help? --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -6490,8 +6490,8 @@ static void __perf_pending_irq(struct pe if (cpu == smp_processor_id()) { if (event->pending_sigtrap) { event->pending_sigtrap = 0; - local_dec(&event->ctx->nr_pending); perf_sigtrap(event); + local_dec(&event->ctx->nr_pending); } if (event->pending_disable) { event->pending_disable = 0; @@ -6563,8 +6563,8 @@ static void perf_pending_task(struct cal if (event->pending_work) { event->pending_work = 0; - local_dec(&event->ctx->nr_pending); perf_sigtrap(event); + local_dec(&event->ctx->nr_pending); } if (rctx >= 0) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-07 13:09 ` Peter Zijlstra @ 2022-10-07 13:58 ` Marco Elver 2022-10-07 16:14 ` Marco Elver 2022-10-08 13:51 ` Peter Zijlstra 0 siblings, 2 replies; 27+ messages in thread From: Marco Elver @ 2022-10-07 13:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Fri, Oct 07, 2022 at 03:09PM +0200, Peter Zijlstra wrote: > On Fri, Oct 07, 2022 at 11:37:34AM +0200, Marco Elver wrote: > > > That worked. In addition I had to disable the ctx->task != current check > > if we're in task_work, because presumably the event might have already > > been disabled/moved?? > > Uhmmm... uhhh... damn. (wall-time was significantly longer) > > Does this help? No unfortunately - still see: [ 82.300827] ------------[ cut here ]------------ [ 82.301680] WARNING: CPU: 0 PID: 976 at kernel/events/core.c:6466 perf_sigtrap+0x60/0x70 [ 82.303069] Modules linked in: [ 82.303524] CPU: 0 PID: 976 Comm: missed_breakpoi Not tainted 6.0.0-rc3-00017-g1472d7e42f41-dirty #68 [ 82.304825] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014 [ 82.306204] RIP: 0010:perf_sigtrap+0x60/0x70 [ 82.306858] Code: e6 59 fa ff 48 8b 93 50 01 00 00 8b b3 d8 00 00 00 48 8b bb 30 04 00 00 e8 dd cf e8 ff 5b 5d e9 c6 59 fa ff e8 c1 59 fa ff 90 <0f> 0b 90 5b 5d e9 b6 59 fa ff 66 0f 1f 44 00 00 e8 ab 59 fa ff bf [ 82.309515] RSP: 0000:ffffa52041cbbee0 EFLAGS: 00010293 [ 82.310295] RAX: 0000000000000000 RBX: ffff902fc966a228 RCX: ffffffffa761a53f [ 82.311336] RDX: ffff902fca39c340 RSI: 0000000000000000 RDI: ffff902fc966a228 [ 82.312376] RBP: ffff902fca39c340 R08: 0000000000000001 R09: 0000000000000001 [ 82.313412] R10: 00000000ffffffff R11: 00000000ffffffff R12: ffff902fca39cbf0 [ 82.314456] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 82.315561] FS: 
00007fbae0636700(0000) GS:ffff9032efc00000(0000) knlGS:0000000000000000 [ 82.316815] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 82.317708] CR2: 000000001082d318 CR3: 0000000109430002 CR4: 0000000000770ef0 [ 82.318839] DR0: 00000000008aca98 DR1: 00000000008acb38 DR2: 00000000008acae8 [ 82.319955] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 [ 82.321068] PKRU: 55555554 [ 82.321505] Call Trace: [ 82.321913] <TASK> [ 82.322267] perf_pending_task+0x7d/0xa0 [ 82.322900] task_work_run+0x73/0xc0 [ 82.323476] exit_to_user_mode_prepare+0x19d/0x1a0 [ 82.324209] irqentry_exit_to_user_mode+0x6/0x30 [ 82.324887] asm_sysvec_call_function_single+0x16/0x20 [ 82.325623] RIP: 0033:0x27d10b [ 82.326092] Code: 43 08 48 8d 04 80 48 c1 e0 04 48 8d 0d 5e f9 62 00 48 01 c8 48 83 c0 08 b9 01 00 00 00 66 90 48 8b 10 48 39 ca 75 f8 88 48 41 <f0> 48 ff 40 08 48 8b 50 10 48 39 ca 75 f7 88 48 43 f0 48 ff 40 18 [ 82.328696] RSP: 002b:00007fbae0635a60 EFLAGS: 00000246 [ 82.329470] RAX: 00000000008acaa8 RBX: 000024073fc007d0 RCX: 0000000000001add [ 82.330521] RDX: 0000000000001add RSI: 0000000000000070 RDI: 0000000000000007 [ 82.331557] RBP: 00007fbae0635a70 R08: 00007fbae0636700 R09: 00007fbae0636700 [ 82.332593] R10: 00007fbae06369d0 R11: 0000000000000202 R12: 00007fbae06369d0 [ 82.333630] R13: 00007ffe8139de16 R14: 00007fbae0636d1c R15: 00007fbae0635a80 [ 82.334713] </TASK> [ 82.335093] irq event stamp: 546455 [ 82.335657] hardirqs last enabled at (546465): [<ffffffffa7513ef6>] __up_console_sem+0x66/0x70 [ 82.337032] hardirqs last disabled at (546476): [<ffffffffa7513edb>] __up_console_sem+0x4b/0x70 [ 82.338414] softirqs last enabled at (546084): [<ffffffffa8c0034f>] __do_softirq+0x34f/0x4d5 [ 82.339769] softirqs last disabled at (546079): [<ffffffffa7493821>] __irq_exit_rcu+0xb1/0x120 [ 82.341128] ---[ end trace 0000000000000000 ]--- I now have this on top: diff --git a/kernel/events/core.c b/kernel/events/core.c index 9319af6013f1..7de83c42d312 100644 --- 
a/kernel/events/core.c +++ b/kernel/events/core.c @@ -2285,9 +2285,10 @@ event_sched_out(struct perf_event *event, */ local_dec(&event->ctx->nr_pending); } else { - WARN_ON_ONCE(event->pending_work); - event->pending_work = 1; - task_work_add(current, &event->pending_task, TWA_RESUME); + if (!event->pending_work) { + event->pending_work = 1; + task_work_add(current, &event->pending_task, TWA_RESUME); + } } } @@ -6466,7 +6467,8 @@ static void perf_sigtrap(struct perf_event *event) return; /* - * perf_pending_irq() can race with the task exiting. + * Both perf_pending_task() and perf_pending_irq() can race with the + * task exiting. */ if (current->flags & PF_EXITING) return; @@ -6495,8 +6497,8 @@ static void __perf_pending_irq(struct perf_event *event) if (cpu == smp_processor_id()) { if (event->pending_sigtrap) { event->pending_sigtrap = 0; - local_dec(&event->ctx->nr_pending); perf_sigtrap(event); + local_dec(&event->ctx->nr_pending); } if (event->pending_disable) { event->pending_disable = 0; @@ -6563,16 +6565,18 @@ static void perf_pending_task(struct callback_head *head) * If we 'fail' here, that's OK, it means recursion is already disabled * and we won't recurse 'further'. */ + preempt_disable_notrace(); rctx = perf_swevent_get_recursion_context(); if (event->pending_work) { event->pending_work = 0; - local_dec(&event->ctx->nr_pending); perf_sigtrap(event); + local_dec(&event->ctx->nr_pending); } if (rctx >= 0) perf_swevent_put_recursion_context(rctx); + preempt_enable_notrace(); } #ifdef CONFIG_GUEST_PERF_EVENTS I'm throwing more WARN_ON()s at it to see what's going on... ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-07 13:58 ` Marco Elver @ 2022-10-07 16:14 ` Marco Elver 2022-10-08 8:41 ` Marco Elver 2022-10-08 13:51 ` Peter Zijlstra 1 sibling, 1 reply; 27+ messages in thread From: Marco Elver @ 2022-10-07 16:14 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Fri, Oct 07, 2022 at 03:58PM +0200, Marco Elver wrote: > On Fri, Oct 07, 2022 at 03:09PM +0200, Peter Zijlstra wrote: > > On Fri, Oct 07, 2022 at 11:37:34AM +0200, Marco Elver wrote: > > > > > That worked. In addition I had to disable the ctx->task != current check > > > if we're in task_work, because presumably the event might have already > > > been disabled/moved?? > > > > Uhmmm... uhhh... damn. (wall-time was significantly longer) > > > > Does this help? > > No unfortunately - still see: > > [ 82.300827] ------------[ cut here ]------------ > [ 82.301680] WARNING: CPU: 0 PID: 976 at kernel/events/core.c:6466 perf_sigtrap+0x60/0x70 Whenever the warning fires, I see that event->state is OFF. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-07 16:14 ` Marco Elver @ 2022-10-08 8:41 ` Marco Elver 2022-10-08 12:41 ` Peter Zijlstra 0 siblings, 1 reply; 27+ messages in thread From: Marco Elver @ 2022-10-08 8:41 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Fri, Oct 07, 2022 at 06:14PM +0200, Marco Elver wrote: > On Fri, Oct 07, 2022 at 03:58PM +0200, Marco Elver wrote: > > On Fri, Oct 07, 2022 at 03:09PM +0200, Peter Zijlstra wrote: > > > On Fri, Oct 07, 2022 at 11:37:34AM +0200, Marco Elver wrote: > > > > > > > That worked. In addition I had to disable the ctx->task != current check > > > > if we're in task_work, because presumably the event might have already > > > > been disabled/moved?? > > > > > > Uhmmm... uhhh... damn. (wall-time was significantly longer) > > > > > > Does this help? > > > > No unfortunately - still see: > > > > [ 82.300827] ------------[ cut here ]------------ > > [ 82.301680] WARNING: CPU: 0 PID: 976 at kernel/events/core.c:6466 perf_sigtrap+0x60/0x70 > > Whenever the warning fires, I see that event->state is OFF. The below patch to the sigtrap_threads test can repro the issue (when run lots of them concurrently again). It also illustrates the original problem we're trying to solve, where the event never gets rearmed again and the test times out (doesn't happen with the almost-working fix). Thanks, -- Marco ------ >8 ------ From 98d225bda6d94dd793a1d0c77ae4b301c364166e Mon Sep 17 00:00:00 2001 From: Marco Elver <elver@google.com> Date: Sat, 8 Oct 2022 10:26:58 +0200 Subject: [PATCH] selftests/perf_events: Add a SIGTRAP stress test with disables Add a SIGTRAP stress test that exercises repeatedly enabling/disabling an event while it concurrently keeps firing. 
Signed-off-by: Marco Elver <elver@google.com> --- .../selftests/perf_events/sigtrap_threads.c | 35 +++++++++++++++++-- 1 file changed, 32 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/perf_events/sigtrap_threads.c b/tools/testing/selftests/perf_events/sigtrap_threads.c index 6d849dc2bee0..d1d8483ac628 100644 --- a/tools/testing/selftests/perf_events/sigtrap_threads.c +++ b/tools/testing/selftests/perf_events/sigtrap_threads.c @@ -62,6 +62,8 @@ static struct perf_event_attr make_event_attr(bool enabled, volatile void *addr, .remove_on_exec = 1, /* Required by sigtrap. */ .sigtrap = 1, /* Request synchronous SIGTRAP on event. */ .sig_data = TEST_SIG_DATA(addr, id), + .exclude_kernel = 1, /* To allow */ + .exclude_hv = 1, /* running as !root */ }; return attr; } @@ -93,9 +95,13 @@ static void *test_thread(void *arg) __atomic_fetch_add(&ctx.tids_want_signal, tid, __ATOMIC_RELAXED); iter = ctx.iterate_on; /* read */ - for (i = 0; i < iter - 1; i++) { - __atomic_fetch_add(&ctx.tids_want_signal, tid, __ATOMIC_RELAXED); - ctx.iterate_on = iter; /* idempotent write */ + if (iter >= 0) { + for (i = 0; i < iter - 1; i++) { + __atomic_fetch_add(&ctx.tids_want_signal, tid, __ATOMIC_RELAXED); + ctx.iterate_on = iter; /* idempotent write */ + } + } else { + while (ctx.iterate_on); } return NULL; @@ -208,4 +214,27 @@ TEST_F(sigtrap_threads, signal_stress) EXPECT_EQ(ctx.first_siginfo.si_perf_data, TEST_SIG_DATA(&ctx.iterate_on, 0)); } +TEST_F(sigtrap_threads, signal_stress_with_disable) +{ + const int target_count = NUM_THREADS * 3000; + int i; + + ctx.iterate_on = -1; + + EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_ENABLE, 0), 0); + pthread_barrier_wait(&self->barrier); + while (__atomic_load_n(&ctx.signal_count, __ATOMIC_RELAXED) < target_count) { + EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_DISABLE, 0), 0); + EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_ENABLE, 0), 0); + } + ctx.iterate_on = 0; + for (i = 0; i < NUM_THREADS; i++) + 
ASSERT_EQ(pthread_join(self->threads[i], NULL), 0); + EXPECT_EQ(ioctl(self->fd, PERF_EVENT_IOC_DISABLE, 0), 0); + + EXPECT_EQ(ctx.first_siginfo.si_addr, &ctx.iterate_on); + EXPECT_EQ(ctx.first_siginfo.si_perf_type, PERF_TYPE_BREAKPOINT); + EXPECT_EQ(ctx.first_siginfo.si_perf_data, TEST_SIG_DATA(&ctx.iterate_on, 0)); +} + TEST_HARNESS_MAIN -- 2.38.0.rc1.362.ged0d419d3c-goog ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-08 8:41 ` Marco Elver @ 2022-10-08 12:41 ` Peter Zijlstra 2022-10-10 20:52 ` Marco Elver 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2022-10-08 12:41 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Sat, Oct 08, 2022 at 10:41:28AM +0200, Marco Elver wrote: > The below patch to the sigtrap_threads test can repro the issue (when > run lots of them concurrently again). It also illustrates the original > problem we're trying to solve, where the event never gets rearmed again > and the test times out (doesn't happen with the almost-working fix). Excellent, that helps. Also, I'm an idiot ;-) The below seems to fix it for me. --- --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3441,7 +3448,8 @@ static void perf_event_context_sched_out perf_pmu_disable(pmu); /* PMIs are disabled; ctx->nr_pending is stable. */ - if (local_read(&ctx->nr_pending)) { + if (local_read(&ctx->nr_pending) || + local_read(&next_ctx->nr_pending)) { /* * Must not swap out ctx when there's pending * events that rely on the ctx->task relation. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-08 12:41 ` Peter Zijlstra @ 2022-10-10 20:52 ` Marco Elver 0 siblings, 0 replies; 27+ messages in thread From: Marco Elver @ 2022-10-10 20:52 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Sat, Oct 08, 2022 at 02:41PM +0200, Peter Zijlstra wrote: > On Sat, Oct 08, 2022 at 10:41:28AM +0200, Marco Elver wrote: > > The below patch to the sigtrap_threads test can repro the issue (when > > run lots of them concurrently again). It also illustrates the original > > problem we're trying to solve, where the event never gets rearmed again > > and the test times out (doesn't happen with the almost-working fix). > > Excellent, that helps. Also, I'm an idiot ;-) > > The below seems to fix it for me. > > --- > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -3441,7 +3448,8 @@ static void perf_event_context_sched_out > perf_pmu_disable(pmu); > > /* PMIs are disabled; ctx->nr_pending is stable. */ > - if (local_read(&ctx->nr_pending)) { > + if (local_read(&ctx->nr_pending) || > + local_read(&next_ctx->nr_pending)) { > /* > * Must not swap out ctx when there's pending > * events that rely on the ctx->task relation. Yup, that fixes it. Can you send a v2 with all the fixups? Just to make sure I've tested the right thing. I'll also send the patch for the selftest addition once I gave it a good spin. Thanks, -- Marco ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-07 13:58 ` Marco Elver 2022-10-07 16:14 ` Marco Elver @ 2022-10-08 13:51 ` Peter Zijlstra 2022-10-08 14:08 ` Peter Zijlstra 1 sibling, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2022-10-08 13:51 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Fri, Oct 07, 2022 at 03:58:03PM +0200, Marco Elver wrote: > diff --git a/kernel/events/core.c b/kernel/events/core.c > index 9319af6013f1..7de83c42d312 100644 > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -2285,9 +2285,10 @@ event_sched_out(struct perf_event *event, > */ > local_dec(&event->ctx->nr_pending); > } else { > - WARN_ON_ONCE(event->pending_work); > - event->pending_work = 1; > - task_work_add(current, &event->pending_task, TWA_RESUME); > + if (!event->pending_work) { > + event->pending_work = 1; > + task_work_add(current, &event->pending_task, TWA_RESUME); > + } else { local_dec(&event->ctx->nr_pending); } > } > } That whole thing can be written much saner like: if (event->pending_sigtrap) { event->pending_sigtrap = 0; if (state != PERF_EVENT_STATE_OFF && !event->pending_work) { event->pending_work = 1; local_inc(&event->ctx->nr_pending); task_work_add(current, &event->pending_task, TWA_RESUME); } local_dec(&event->ctx->nr_pending); } Except now we have two nr_pending ops -- I'm torn. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] perf: Fix missing SIGTRAPs 2022-10-08 13:51 ` Peter Zijlstra @ 2022-10-08 14:08 ` Peter Zijlstra 0 siblings, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2022-10-08 14:08 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Sat, Oct 08, 2022 at 03:51:24PM +0200, Peter Zijlstra wrote: > On Fri, Oct 07, 2022 at 03:58:03PM +0200, Marco Elver wrote: > > diff --git a/kernel/events/core.c b/kernel/events/core.c > > index 9319af6013f1..7de83c42d312 100644 > > --- a/kernel/events/core.c > > +++ b/kernel/events/core.c > > @@ -2285,9 +2285,10 @@ event_sched_out(struct perf_event *event, > > */ > > local_dec(&event->ctx->nr_pending); > > } else { > > - WARN_ON_ONCE(event->pending_work); > > - event->pending_work = 1; > > - task_work_add(current, &event->pending_task, TWA_RESUME); > > + if (!event->pending_work) { > > + event->pending_work = 1; > > + task_work_add(current, &event->pending_task, TWA_RESUME); > > + } > else { > local_dec(&event->ctx->nr_pending); > } > > } > > } > > That whole thing can be written much saner like: > > if (event->pending_sigtrap) { > event->pending_sigtrap = 0; > if (state != PERF_EVENT_STATE_OFF && > !event->pending_work) { > event->pending_work = 1; > local_inc(&event->ctx->nr_pending); > task_work_add(current, &event->pending_task, TWA_RESUME); > } > local_dec(&event->ctx->nr_pending); > } > > Except now we have two nr_pending ops -- I'm torn. I've settled for: + if (event->pending_sigtrap) { + bool dec = true; + + event->pending_sigtrap = 0; + if (state != PERF_EVENT_STATE_OFF && + !event->pending_work) { + event->pending_work = 1; + dec = false; + task_work_add(current, &event->pending_task, TWA_RESUME); + } + if (dec) + local_dec(&event->ctx->nr_pending); + } ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2] perf: Fix missing SIGTRAPs 2022-10-06 13:33 ` [PATCH] perf: Fix missing SIGTRAPs Peter Zijlstra 2022-10-06 13:59 ` Marco Elver @ 2022-10-11 7:44 ` Peter Zijlstra 2022-10-11 12:58 ` Marco Elver 1 sibling, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2022-10-11 7:44 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov Subject: perf: Fix missing SIGTRAPs From: Peter Zijlstra <peterz@infradead.org> Date: Thu Oct 6 15:00:39 CEST 2022 Marco reported: Due to the implementation of how SIGTRAP are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues: 1. Missing SIGTRAP due to a race with event_sched_out() (more details below). 2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it. 3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH. The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows: CPU0 | CPU1 --------------------------------+--------------------------- __perf_event_overflow() | perf_event_disable_inatomic() | pending_disable = CPU0 | ... | _perf_event_enable() | event_function_call() | task_function_call() | /* sends IPI to CPU0 */ <IPI> | ... __perf_event_enable() +--------------------------- ctx_resched() task_ctx_sched_out() ctx_sched_out() group_sched_out() event_sched_out() pending_disable = -1 </IPI> <IRQ-work> perf_pending_event() perf_pending_event_disable() /* Fails to send SIGTRAP because no pending_disable! */ </IRQ-work> In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs because 'event_limit' is not reset back to 1. 
To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction of a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery. Additionally; and different to Marco's proposed patch: - recognise that pending_disable effectively duplicates oncpu for the case where it is set. As such, change the irq_work handler to use ->oncpu to target the event and use pending_* as boolean toggles. - observe that SIGTRAP targets the ctx->task, so the context switch optimization that carries contexts between tasks is invalid. If the irq_work were delayed enough to hit after a context switch the SIGTRAP would be delivered to the wrong task. - observe that if the event gets scheduled out (rotation/migration/context-switch/...) the irq-work would be insufficient to deliver the SIGTRAP when the event gets scheduled back in (the irq-work might still be pending on the old CPU). Therefore have event_sched_out() convert the pending sigtrap into a task_work which will deliver the signal at return_to_user. 
Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Marco Elver <elver@google.com> Debugged-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- include/linux/perf_event.h | 19 ++++- kernel/events/core.c | 151 ++++++++++++++++++++++++++++++++------------ kernel/events/ring_buffer.c | 2 3 files changed, 129 insertions(+), 43 deletions(-) --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -736,11 +736,14 @@ struct perf_event { struct fasync_struct *fasync; /* delayed work for NMIs and such */ - int pending_wakeup; - int pending_kill; - int pending_disable; + unsigned int pending_wakeup; + unsigned int pending_kill; + unsigned int pending_disable; + unsigned int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ - struct irq_work pending; + struct irq_work pending_irq; + struct callback_head pending_task; + unsigned int pending_work; atomic_t event_limit; @@ -857,6 +860,14 @@ struct perf_event_context { #endif void *task_ctx_data; /* pmu specific data */ struct rcu_head rcu_head; + + /* + * Sum (event->pending_sigtrap + event->pending_work) + * + * The SIGTRAP is targeted at ctx->task, as such it won't do changing + * that until the signal is delivered. 
+ */ + local_t nr_pending; }; /* --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -54,6 +54,7 @@ #include <linux/highmem.h> #include <linux/pgtable.h> #include <linux/buildid.h> +#include <linux/task_work.h> #include "internal.h" @@ -2268,11 +2269,26 @@ event_sched_out(struct perf_event *event event->pmu->del(event, 0); event->oncpu = -1; - if (READ_ONCE(event->pending_disable) >= 0) { - WRITE_ONCE(event->pending_disable, -1); + if (event->pending_disable) { + event->pending_disable = 0; perf_cgroup_event_disable(event, ctx); state = PERF_EVENT_STATE_OFF; } + + if (event->pending_sigtrap) { + bool dec = true; + + event->pending_sigtrap = 0; + if (state != PERF_EVENT_STATE_OFF && + !event->pending_work) { + event->pending_work = 1; + dec = false; + task_work_add(current, &event->pending_task, TWA_RESUME); + } + if (dec) + local_dec(&event->ctx->nr_pending); + } + perf_event_set_state(event, state); if (!is_software_event(event)) @@ -2424,7 +2440,7 @@ static void __perf_event_disable(struct * hold the top-level event's child_mutex, so any descendant that * goes to exit will block in perf_event_exit_event(). * - * When called from perf_pending_event it's OK because event->ctx + * When called from perf_pending_irq it's OK because event->ctx * is the current context on this CPU and preemption is disabled, * hence we can't get into perf_event_task_sched_out for this context. 
*/ @@ -2463,9 +2479,8 @@ EXPORT_SYMBOL_GPL(perf_event_disable); void perf_event_disable_inatomic(struct perf_event *event) { - WRITE_ONCE(event->pending_disable, smp_processor_id()); - /* can fail, see perf_pending_event_disable() */ - irq_work_queue(&event->pending); + event->pending_disable = 1; + irq_work_queue(&event->pending_irq); } #define MAX_INTERRUPTS (~0ULL) @@ -3420,11 +3435,23 @@ static void perf_event_context_sched_out raw_spin_lock_nested(&next_ctx->lock, SINGLE_DEPTH_NESTING); if (context_equiv(ctx, next_ctx)) { + perf_pmu_disable(pmu); + + /* PMIs are disabled; ctx->nr_pending is stable. */ + if (local_read(&ctx->nr_pending) || + local_read(&next_ctx->nr_pending)) { + /* + * Must not swap out ctx when there's pending + * events that rely on the ctx->task relation. + */ + raw_spin_unlock(&next_ctx->lock); + rcu_read_unlock(); + goto inside_switch; + } + WRITE_ONCE(ctx->task, next); WRITE_ONCE(next_ctx->task, task); - perf_pmu_disable(pmu); - if (cpuctx->sched_cb_usage && pmu->sched_task) pmu->sched_task(ctx, false); @@ -3465,6 +3492,7 @@ static void perf_event_context_sched_out raw_spin_lock(&ctx->lock); perf_pmu_disable(pmu); +inside_switch: if (cpuctx->sched_cb_usage && pmu->sched_task) pmu->sched_task(ctx, false); task_ctx_sched_out(cpuctx, ctx, EVENT_ALL); @@ -4931,7 +4959,7 @@ static void perf_addr_filters_splice(str static void _free_event(struct perf_event *event) { - irq_work_sync(&event->pending); + irq_work_sync(&event->pending_irq); unaccount_event(event); @@ -6431,7 +6459,8 @@ static void perf_sigtrap(struct perf_eve return; /* - * perf_pending_event() can race with the task exiting. + * Both perf_pending_task() and perf_pending_irq() can race with the + * task exiting. 
*/ if (current->flags & PF_EXITING) return; @@ -6440,23 +6469,33 @@ static void perf_sigtrap(struct perf_eve event->attr.type, event->attr.sig_data); } -static void perf_pending_event_disable(struct perf_event *event) +/* + * Deliver the pending work in-event-context or follow the context. + */ +static void __perf_pending_irq(struct perf_event *event) { - int cpu = READ_ONCE(event->pending_disable); + int cpu = READ_ONCE(event->oncpu); + /* + * If the event isn't running; we done. event_sched_out() will have + * taken care of things. + */ if (cpu < 0) return; + /* + * Yay, we hit home and are in the context of the event. + */ if (cpu == smp_processor_id()) { - WRITE_ONCE(event->pending_disable, -1); - - if (event->attr.sigtrap) { + if (event->pending_sigtrap) { + event->pending_sigtrap = 0; perf_sigtrap(event); - atomic_set_release(&event->event_limit, 1); /* rearm event */ - return; + local_dec(&event->ctx->nr_pending); + } + if (event->pending_disable) { + event->pending_disable = 0; + perf_event_disable_local(event); } - - perf_event_disable_local(event); return; } @@ -6476,35 +6515,62 @@ static void perf_pending_event_disable(s * irq_work_queue(); // FAILS * * irq_work_run() - * perf_pending_event() + * perf_pending_irq() * * But the event runs on CPU-B and wants disabling there. */ - irq_work_queue_on(&event->pending, cpu); + irq_work_queue_on(&event->pending_irq, cpu); } -static void perf_pending_event(struct irq_work *entry) +static void perf_pending_irq(struct irq_work *entry) { - struct perf_event *event = container_of(entry, struct perf_event, pending); + struct perf_event *event = container_of(entry, struct perf_event, pending_irq); int rctx; - rctx = perf_swevent_get_recursion_context(); /* * If we 'fail' here, that's OK, it means recursion is already disabled * and we won't recurse 'further'. 
*/ + rctx = perf_swevent_get_recursion_context(); - perf_pending_event_disable(event); - + /* + * The wakeup isn't bound to the context of the event -- it can happen + * irrespective of where the event is. + */ if (event->pending_wakeup) { event->pending_wakeup = 0; perf_event_wakeup(event); } + __perf_pending_irq(event); + if (rctx >= 0) perf_swevent_put_recursion_context(rctx); } +static void perf_pending_task(struct callback_head *head) +{ + struct perf_event *event = container_of(head, struct perf_event, pending_task); + int rctx; + + /* + * If we 'fail' here, that's OK, it means recursion is already disabled + * and we won't recurse 'further'. + */ + preempt_disable_notrace(); + rctx = perf_swevent_get_recursion_context(); + + if (event->pending_work) { + event->pending_work = 0; + perf_sigtrap(event); + local_dec(&event->ctx->nr_pending); + } + + if (rctx >= 0) + perf_swevent_put_recursion_context(rctx); + preempt_enable_notrace(); +} + #ifdef CONFIG_GUEST_PERF_EVENTS struct perf_guest_info_callbacks __rcu *perf_guest_cbs; @@ -9188,8 +9254,8 @@ int perf_event_account_interrupt(struct */ static int __perf_event_overflow(struct perf_event *event, - int throttle, struct perf_sample_data *data, - struct pt_regs *regs) + int throttle, struct perf_sample_data *data, + struct pt_regs *regs) { int events = atomic_read(&event->event_limit); int ret = 0; @@ -9212,24 +9278,36 @@ static int __perf_event_overflow(struct if (events && atomic_dec_and_test(&event->event_limit)) { ret = 1; event->pending_kill = POLL_HUP; - event->pending_addr = data->addr; - perf_event_disable_inatomic(event); } + if (event->attr.sigtrap) { + /* + * Should not be able to return to user space without processing + * pending_sigtrap (kernel events can overflow multiple times). 
+ */ + WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel); + if (!event->pending_sigtrap) { + event->pending_sigtrap = 1; + local_inc(&event->ctx->nr_pending); + } + event->pending_addr = data->addr; + irq_work_queue(&event->pending_irq); + } + READ_ONCE(event->overflow_handler)(event, data, regs); if (*perf_event_fasync(event) && event->pending_kill) { event->pending_wakeup = 1; - irq_work_queue(&event->pending); + irq_work_queue(&event->pending_irq); } return ret; } int perf_event_overflow(struct perf_event *event, - struct perf_sample_data *data, - struct pt_regs *regs) + struct perf_sample_data *data, + struct pt_regs *regs) { return __perf_event_overflow(event, 1, data, regs); } @@ -11537,8 +11615,8 @@ perf_event_alloc(struct perf_event_attr init_waitqueue_head(&event->waitq); - event->pending_disable = -1; - init_irq_work(&event->pending, perf_pending_event); + init_irq_work(&event->pending_irq, perf_pending_irq); + init_task_work(&event->pending_task, perf_pending_task); mutex_init(&event->mmap_mutex); raw_spin_lock_init(&event->addr_filters.lock); @@ -11560,9 +11638,6 @@ perf_event_alloc(struct perf_event_attr if (parent_event) event->event_caps = parent_event->event_caps; - if (event->attr.sigtrap) - atomic_set(&event->event_limit, 1); - if (task) { event->attach_state = PERF_ATTACH_TASK; /* --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -22,7 +22,7 @@ static void perf_output_wakeup(struct pe atomic_set(&handle->rb->poll, EPOLLIN); handle->event->pending_wakeup = 1; - irq_work_queue(&handle->event->pending); + irq_work_queue(&handle->event->pending_irq); } /* ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2] perf: Fix missing SIGTRAPs 2022-10-11 7:44 ` [PATCH v2] " Peter Zijlstra @ 2022-10-11 12:58 ` Marco Elver 2022-10-11 13:06 ` Peter Zijlstra 0 siblings, 1 reply; 27+ messages in thread From: Marco Elver @ 2022-10-11 12:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Tue, Oct 11, 2022 at 09:44AM +0200, Peter Zijlstra wrote: > Subject: perf: Fix missing SIGTRAPs > From: Peter Zijlstra <peterz@infradead.org> > Date: Thu Oct 6 15:00:39 CEST 2022 > > Marco reported: > > Due to the implementation of how SIGTRAP are delivered if > perf_event_attr::sigtrap is set, we've noticed 3 issues: > > 1. Missing SIGTRAP due to a race with event_sched_out() (more > details below). > > 2. Hardware PMU events being disabled due to returning 1 from > perf_event_overflow(). The only way to re-enable the event is > for user space to first "properly" disable the event and then > re-enable it. > > 3. The inability to automatically disable an event after a > specified number of overflows via PERF_EVENT_IOC_REFRESH. > > The worst of the 3 issues is problem (1), which occurs when a > pending_disable is "consumed" by a racing event_sched_out(), observed > as follows: > > CPU0 | CPU1 > --------------------------------+--------------------------- > __perf_event_overflow() | > perf_event_disable_inatomic() | > pending_disable = CPU0 | ... > | _perf_event_enable() > | event_function_call() > | task_function_call() > | /* sends IPI to CPU0 */ > <IPI> | ... > __perf_event_enable() +--------------------------- > ctx_resched() > task_ctx_sched_out() > ctx_sched_out() > group_sched_out() > event_sched_out() > pending_disable = -1 > </IPI> > <IRQ-work> > perf_pending_event() > perf_pending_event_disable() > /* Fails to send SIGTRAP because no pending_disable! 
*/ > </IRQ-work> > > In the above case, not only is that particular SIGTRAP missed, but also > all future SIGTRAPs because 'event_limit' is not reset back to 1. > > To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction > of a separate 'pending_sigtrap', no longer using 'event_limit' and > 'pending_disable' for its delivery. > > Additionally; and different to Marco's proposed patch: > > - recognise that pending_disable effectively duplicates oncpu for > the case where it is set. As such, change the irq_work handler to > use ->oncpu to target the event and use pending_* as boolean toggles. > > - observe that SIGTRAP targets the ctx->task, so the context switch > optimization that carries contexts between tasks is invalid. If > the irq_work were delayed enough to hit after a context switch the > SIGTRAP would be delivered to the wrong task. > > - observe that if the event gets scheduled out > (rotation/migration/context-switch/...) the irq-work would be > insufficient to deliver the SIGTRAP when the event gets scheduled > back in (the irq-work might still be pending on the old CPU). > > Therefore have event_sched_out() convert the pending sigtrap into a > task_work which will deliver the signal at return_to_user. > > Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") > Reported-by: Marco Elver <elver@google.com> > Debugged-by: Marco Elver <elver@google.com> Reviewed-by: Marco Elver <elver@google.com> Tested-by: Marco Elver <elver@google.com> .. fuzzing, and lots of concurrent sigtrap_threads with this patch: https://lore.kernel.org/all/20221011124534.84907-1-elver@google.com/ > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> My original patch also attributed Dmitry: Reported-by: Dmitry Vyukov <dvyukov@google.com> Debugged-by: Dmitry Vyukov <dvyukov@google.com> ... we all melted our brains on this one. :-) Would be good to get the fix into one of the upcoming 6.1-rc. Thanks! 
-- Marco > --- > include/linux/perf_event.h | 19 ++++- > kernel/events/core.c | 151 ++++++++++++++++++++++++++++++++------------ > kernel/events/ring_buffer.c | 2 > 3 files changed, 129 insertions(+), 43 deletions(-) > > --- a/include/linux/perf_event.h > +++ b/include/linux/perf_event.h > @@ -736,11 +736,14 @@ struct perf_event { > struct fasync_struct *fasync; > > /* delayed work for NMIs and such */ > - int pending_wakeup; > - int pending_kill; > - int pending_disable; > + unsigned int pending_wakeup; > + unsigned int pending_kill; > + unsigned int pending_disable; > + unsigned int pending_sigtrap; > unsigned long pending_addr; /* SIGTRAP */ > - struct irq_work pending; > + struct irq_work pending_irq; > + struct callback_head pending_task; > + unsigned int pending_work; > > atomic_t event_limit; > > @@ -857,6 +860,14 @@ struct perf_event_context { > #endif > void *task_ctx_data; /* pmu specific data */ > struct rcu_head rcu_head; > + > + /* > + * Sum (event->pending_sigtrap + event->pending_work) > + * > + * The SIGTRAP is targeted at ctx->task, as such it won't do changing > + * that until the signal is delivered. 
> + */ > + local_t nr_pending; > }; > > /* > --- a/kernel/events/core.c > +++ b/kernel/events/core.c > @@ -54,6 +54,7 @@ > #include <linux/highmem.h> > #include <linux/pgtable.h> > #include <linux/buildid.h> > +#include <linux/task_work.h> > > #include "internal.h" > > @@ -2268,11 +2269,26 @@ event_sched_out(struct perf_event *event > event->pmu->del(event, 0); > event->oncpu = -1; > > - if (READ_ONCE(event->pending_disable) >= 0) { > - WRITE_ONCE(event->pending_disable, -1); > + if (event->pending_disable) { > + event->pending_disable = 0; > perf_cgroup_event_disable(event, ctx); > state = PERF_EVENT_STATE_OFF; > } > + > + if (event->pending_sigtrap) { > + bool dec = true; > + > + event->pending_sigtrap = 0; > + if (state != PERF_EVENT_STATE_OFF && > + !event->pending_work) { > + event->pending_work = 1; > + dec = false; > + task_work_add(current, &event->pending_task, TWA_RESUME); > + } > + if (dec) > + local_dec(&event->ctx->nr_pending); > + } > + > perf_event_set_state(event, state); > > if (!is_software_event(event)) > @@ -2424,7 +2440,7 @@ static void __perf_event_disable(struct > * hold the top-level event's child_mutex, so any descendant that > * goes to exit will block in perf_event_exit_event(). > * > - * When called from perf_pending_event it's OK because event->ctx > + * When called from perf_pending_irq it's OK because event->ctx > * is the current context on this CPU and preemption is disabled, > * hence we can't get into perf_event_task_sched_out for this context. 
> */ > @@ -2463,9 +2479,8 @@ EXPORT_SYMBOL_GPL(perf_event_disable); > > void perf_event_disable_inatomic(struct perf_event *event) > { > - WRITE_ONCE(event->pending_disable, smp_processor_id()); > - /* can fail, see perf_pending_event_disable() */ > - irq_work_queue(&event->pending); > + event->pending_disable = 1; > + irq_work_queue(&event->pending_irq); > } > > #define MAX_INTERRUPTS (~0ULL) > @@ -3420,11 +3435,23 @@ static void perf_event_context_sched_out > raw_spin_lock_nested(&next_ctx->lock, SINGLE_DEPTH_NESTING); > if (context_equiv(ctx, next_ctx)) { > > + perf_pmu_disable(pmu); > + > + /* PMIs are disabled; ctx->nr_pending is stable. */ > + if (local_read(&ctx->nr_pending) || > + local_read(&next_ctx->nr_pending)) { > + /* > + * Must not swap out ctx when there's pending > + * events that rely on the ctx->task relation. > + */ > + raw_spin_unlock(&next_ctx->lock); > + rcu_read_unlock(); > + goto inside_switch; > + } > + > WRITE_ONCE(ctx->task, next); > WRITE_ONCE(next_ctx->task, task); > > - perf_pmu_disable(pmu); > - > if (cpuctx->sched_cb_usage && pmu->sched_task) > pmu->sched_task(ctx, false); > > @@ -3465,6 +3492,7 @@ static void perf_event_context_sched_out > raw_spin_lock(&ctx->lock); > perf_pmu_disable(pmu); > > +inside_switch: > if (cpuctx->sched_cb_usage && pmu->sched_task) > pmu->sched_task(ctx, false); > task_ctx_sched_out(cpuctx, ctx, EVENT_ALL); > @@ -4931,7 +4959,7 @@ static void perf_addr_filters_splice(str > > static void _free_event(struct perf_event *event) > { > - irq_work_sync(&event->pending); > + irq_work_sync(&event->pending_irq); > > unaccount_event(event); > > @@ -6431,7 +6459,8 @@ static void perf_sigtrap(struct perf_eve > return; > > /* > - * perf_pending_event() can race with the task exiting. > + * Both perf_pending_task() and perf_pending_irq() can race with the > + * task exiting. 
> */ > if (current->flags & PF_EXITING) > return; > @@ -6440,23 +6469,33 @@ static void perf_sigtrap(struct perf_eve > event->attr.type, event->attr.sig_data); > } > > -static void perf_pending_event_disable(struct perf_event *event) > +/* > + * Deliver the pending work in-event-context or follow the context. > + */ > +static void __perf_pending_irq(struct perf_event *event) > { > - int cpu = READ_ONCE(event->pending_disable); > + int cpu = READ_ONCE(event->oncpu); > > + /* > + * If the event isn't running; we done. event_sched_out() will have > + * taken care of things. > + */ > if (cpu < 0) > return; > > + /* > + * Yay, we hit home and are in the context of the event. > + */ > if (cpu == smp_processor_id()) { > - WRITE_ONCE(event->pending_disable, -1); > - > - if (event->attr.sigtrap) { > + if (event->pending_sigtrap) { > + event->pending_sigtrap = 0; > perf_sigtrap(event); > - atomic_set_release(&event->event_limit, 1); /* rearm event */ > - return; > + local_dec(&event->ctx->nr_pending); > + } > + if (event->pending_disable) { > + event->pending_disable = 0; > + perf_event_disable_local(event); > } > - > - perf_event_disable_local(event); > return; > } > > @@ -6476,35 +6515,62 @@ static void perf_pending_event_disable(s > * irq_work_queue(); // FAILS > * > * irq_work_run() > - * perf_pending_event() > + * perf_pending_irq() > * > * But the event runs on CPU-B and wants disabling there. > */ > - irq_work_queue_on(&event->pending, cpu); > + irq_work_queue_on(&event->pending_irq, cpu); > } > > -static void perf_pending_event(struct irq_work *entry) > +static void perf_pending_irq(struct irq_work *entry) > { > - struct perf_event *event = container_of(entry, struct perf_event, pending); > + struct perf_event *event = container_of(entry, struct perf_event, pending_irq); > int rctx; > > - rctx = perf_swevent_get_recursion_context(); > /* > * If we 'fail' here, that's OK, it means recursion is already disabled > * and we won't recurse 'further'. 
> */ > + rctx = perf_swevent_get_recursion_context(); > > - perf_pending_event_disable(event); > - > + /* > + * The wakeup isn't bound to the context of the event -- it can happen > + * irrespective of where the event is. > + */ > if (event->pending_wakeup) { > event->pending_wakeup = 0; > perf_event_wakeup(event); > } > > + __perf_pending_irq(event); > + > if (rctx >= 0) > perf_swevent_put_recursion_context(rctx); > } > > +static void perf_pending_task(struct callback_head *head) > +{ > + struct perf_event *event = container_of(head, struct perf_event, pending_task); > + int rctx; > + > + /* > + * If we 'fail' here, that's OK, it means recursion is already disabled > + * and we won't recurse 'further'. > + */ > + preempt_disable_notrace(); > + rctx = perf_swevent_get_recursion_context(); > + > + if (event->pending_work) { > + event->pending_work = 0; > + perf_sigtrap(event); > + local_dec(&event->ctx->nr_pending); > + } > + > + if (rctx >= 0) > + perf_swevent_put_recursion_context(rctx); > + preempt_enable_notrace(); > +} > + > #ifdef CONFIG_GUEST_PERF_EVENTS > struct perf_guest_info_callbacks __rcu *perf_guest_cbs; > > @@ -9188,8 +9254,8 @@ int perf_event_account_interrupt(struct > */ > > static int __perf_event_overflow(struct perf_event *event, > - int throttle, struct perf_sample_data *data, > - struct pt_regs *regs) > + int throttle, struct perf_sample_data *data, > + struct pt_regs *regs) > { > int events = atomic_read(&event->event_limit); > int ret = 0; > @@ -9212,24 +9278,36 @@ static int __perf_event_overflow(struct > if (events && atomic_dec_and_test(&event->event_limit)) { > ret = 1; > event->pending_kill = POLL_HUP; > - event->pending_addr = data->addr; > - > perf_event_disable_inatomic(event); > } > > + if (event->attr.sigtrap) { > + /* > + * Should not be able to return to user space without processing > + * pending_sigtrap (kernel events can overflow multiple times). 
> + */ > + WARN_ON_ONCE(event->pending_sigtrap && event->attr.exclude_kernel); > + if (!event->pending_sigtrap) { > + event->pending_sigtrap = 1; > + local_inc(&event->ctx->nr_pending); > + } > + event->pending_addr = data->addr; > + irq_work_queue(&event->pending_irq); > + } > + > READ_ONCE(event->overflow_handler)(event, data, regs); > > if (*perf_event_fasync(event) && event->pending_kill) { > event->pending_wakeup = 1; > - irq_work_queue(&event->pending); > + irq_work_queue(&event->pending_irq); > } > > return ret; > } > > int perf_event_overflow(struct perf_event *event, > - struct perf_sample_data *data, > - struct pt_regs *regs) > + struct perf_sample_data *data, > + struct pt_regs *regs) > { > return __perf_event_overflow(event, 1, data, regs); > } > @@ -11537,8 +11615,8 @@ perf_event_alloc(struct perf_event_attr > > > init_waitqueue_head(&event->waitq); > - event->pending_disable = -1; > - init_irq_work(&event->pending, perf_pending_event); > + init_irq_work(&event->pending_irq, perf_pending_irq); > + init_task_work(&event->pending_task, perf_pending_task); > > mutex_init(&event->mmap_mutex); > raw_spin_lock_init(&event->addr_filters.lock); > @@ -11560,9 +11638,6 @@ perf_event_alloc(struct perf_event_attr > if (parent_event) > event->event_caps = parent_event->event_caps; > > - if (event->attr.sigtrap) > - atomic_set(&event->event_limit, 1); > - > if (task) { > event->attach_state = PERF_ATTACH_TASK; > /* > --- a/kernel/events/ring_buffer.c > +++ b/kernel/events/ring_buffer.c > @@ -22,7 +22,7 @@ static void perf_output_wakeup(struct pe > atomic_set(&handle->rb->poll, EPOLLIN); > > handle->event->pending_wakeup = 1; > - irq_work_queue(&handle->event->pending); > + irq_work_queue(&handle->event->pending_irq); > } > > /* ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2] perf: Fix missing SIGTRAPs 2022-10-11 12:58 ` Marco Elver @ 2022-10-11 13:06 ` Peter Zijlstra 0 siblings, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2022-10-11 13:06 UTC (permalink / raw) To: Marco Elver Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, kasan-dev, Dmitry Vyukov On Tue, Oct 11, 2022 at 02:58:36PM +0200, Marco Elver wrote: > On Tue, Oct 11, 2022 at 09:44AM +0200, Peter Zijlstra wrote: > > Subject: perf: Fix missing SIGTRAPs > > From: Peter Zijlstra <peterz@infradead.org> > > Date: Thu Oct 6 15:00:39 CEST 2022 > > > > Marco reported: > > > > Due to the implementation of how SIGTRAP are delivered if > > perf_event_attr::sigtrap is set, we've noticed 3 issues: > > > > 1. Missing SIGTRAP due to a race with event_sched_out() (more > > details below). > > > > 2. Hardware PMU events being disabled due to returning 1 from > > perf_event_overflow(). The only way to re-enable the event is > > for user space to first "properly" disable the event and then > > re-enable it. > > > > 3. The inability to automatically disable an event after a > > specified number of overflows via PERF_EVENT_IOC_REFRESH. > > > > The worst of the 3 issues is problem (1), which occurs when a > > pending_disable is "consumed" by a racing event_sched_out(), observed > > as follows: > > > > CPU0 | CPU1 > > --------------------------------+--------------------------- > > __perf_event_overflow() | > > perf_event_disable_inatomic() | > > pending_disable = CPU0 | ... > > | _perf_event_enable() > > | event_function_call() > > | task_function_call() > > | /* sends IPI to CPU0 */ > > <IPI> | ... 
> > __perf_event_enable() +--------------------------- > > ctx_resched() > > task_ctx_sched_out() > > ctx_sched_out() > > group_sched_out() > > event_sched_out() > > pending_disable = -1 > > </IPI> > > <IRQ-work> > > perf_pending_event() > > perf_pending_event_disable() > > /* Fails to send SIGTRAP because no pending_disable! */ > > </IRQ-work> > > > > In the above case, not only is that particular SIGTRAP missed, but also > > all future SIGTRAPs because 'event_limit' is not reset back to 1. > > > > To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction > > of a separate 'pending_sigtrap', no longer using 'event_limit' and > > 'pending_disable' for its delivery. > > > > Additionally; and different to Marco's proposed patch: > > > > - recognise that pending_disable effectively duplicates oncpu for > > the case where it is set. As such, change the irq_work handler to > > use ->oncpu to target the event and use pending_* as boolean toggles. > > > > - observe that SIGTRAP targets the ctx->task, so the context switch > > optimization that carries contexts between tasks is invalid. If > > the irq_work were delayed enough to hit after a context switch the > > SIGTRAP would be delivered to the wrong task. > > > > - observe that if the event gets scheduled out > > (rotation/migration/context-switch/...) the irq-work would be > > insufficient to deliver the SIGTRAP when the event gets scheduled > > back in (the irq-work might still be pending on the old CPU). > > > > Therefore have event_sched_out() convert the pending sigtrap into a > > task_work which will deliver the signal at return_to_user. > > > > Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") > > Reported-by: Marco Elver <elver@google.com> > > Debugged-by: Marco Elver <elver@google.com> > > Reviewed-by: Marco Elver <elver@google.com> > Tested-by: Marco Elver <elver@google.com> > > .. 
fuzzing, and lots of concurrent sigtrap_threads with this patch: > > https://lore.kernel.org/all/20221011124534.84907-1-elver@google.com/ > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > My original patch also attributed Dmitry: > > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Debugged-by: Dmitry Vyukov <dvyukov@google.com> > > ... we all melted our brains on this one. :-) > > Would be good to get the fix into one of the upcoming 6.1-rc. Updated and yes, I'm planning on queueing this in perf/urgent the moment -rc1 happens. Thanks!
Thread overview: 27+ messages (newest: 2022-10-11 13:07 UTC)
2022-09-27 12:13 [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse Marco Elver
2022-09-27 12:30 ` Marco Elver
2022-09-27 18:20 ` Peter Zijlstra
2022-09-27 21:45 ` Marco Elver
2022-09-28 10:06 ` Marco Elver
2022-09-28 14:55 ` Marco Elver
2022-10-04 17:09 ` Peter Zijlstra
2022-10-04 17:21 ` Peter Zijlstra
2022-10-04 17:33 ` Marco Elver
2022-10-05  7:37 ` Peter Zijlstra
2022-10-05  7:49 ` Marco Elver
2022-10-05  8:23 ` Peter Zijlstra
2022-10-06 13:33 ` [PATCH] perf: Fix missing SIGTRAPs Peter Zijlstra
2022-10-06 13:59 ` Marco Elver
2022-10-06 16:02 ` Peter Zijlstra
2022-10-07  9:37 ` Marco Elver
2022-10-07 13:09 ` Peter Zijlstra
2022-10-07 13:58 ` Marco Elver
2022-10-07 16:14 ` Marco Elver
2022-10-08  8:41 ` Marco Elver
2022-10-08 12:41 ` Peter Zijlstra
2022-10-10 20:52 ` Marco Elver
2022-10-08 13:51 ` Peter Zijlstra
2022-10-08 14:08 ` Peter Zijlstra
2022-10-11  7:44 ` [PATCH v2] " Peter Zijlstra
2022-10-11 12:58 ` Marco Elver
2022-10-11 13:06 ` Peter Zijlstra