* [PATCH v2] perf/core: Fix missing wakeup when waiting for context reference
@ 2024-04-18 8:03 Haifeng Xu
2024-04-18 10:03 ` Frederic Weisbecker
0 siblings, 1 reply; 3+ messages in thread
From: Haifeng Xu @ 2024-04-18 8:03 UTC (permalink / raw)
To: peterz, mingo, frederic
Cc: acme, mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, linux-perf-users, linux-kernel, Haifeng Xu
In our production environment, we found many hung tasks which are
blocked for more than 18 hours. Their call traces are like this:
[346278.191038] __schedule+0x2d8/0x890
[346278.191046] schedule+0x4e/0xb0
[346278.191049] perf_event_free_task+0x220/0x270
[346278.191056] ? init_wait_var_entry+0x50/0x50
[346278.191060] copy_process+0x663/0x18d0
[346278.191068] kernel_clone+0x9d/0x3d0
[346278.191072] __do_sys_clone+0x5d/0x80
[346278.191076] __x64_sys_clone+0x25/0x30
[346278.191079] do_syscall_64+0x5c/0xc0
[346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
[346278.191086] ? do_syscall_64+0x69/0xc0
[346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
[346278.191092] ? irqentry_exit+0x19/0x30
[346278.191095] ? exc_page_fault+0x89/0x160
[346278.191097] ? asm_exc_page_fault+0x8/0x30
[346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
The task was waiting for the refcount become to 1, but from the vmcore,
we found the refcount has already been 1. It seems that the task didn't
get woken up by perf_event_release_kernel() and got stuck forever. The
below scenario may cause the problem.
Thread A Thread B
... ...
perf_event_free_task perf_event_release_kernel
...
acquire event->child_mutex
...
get_ctx
... release event->child_mutex
acquire ctx->mutex
...
perf_free_event (acquire/release event->child_mutex)
...
release ctx->mutex
wait_var_event
acquire ctx->mutex
acquire event->child_mutex
# move existing events to free_list
release event->child_mutex
release ctx->mutex
put_ctx
... ...
In this case, all events of the ctx have been freed, so we couldn't
find the ctx in free_list and Thread A will miss the wakeup. It's thus
necessary to add a wakeup after dropping the reference.
Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
Changes since v1
- Add the fixed tag.
- Simplify v1's patch. (Frederic)
---
kernel/events/core.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4f0c45ab8d7d..15c35070db6a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5340,6 +5340,7 @@ int perf_event_release_kernel(struct perf_event *event)
again:
mutex_lock(&event->child_mutex);
list_for_each_entry(child, &event->child_list, child_list) {
+ void *var = NULL;
/*
* Cannot change, child events are not migrated, see the
@@ -5380,11 +5381,23 @@ int perf_event_release_kernel(struct perf_event *event)
* this can't be the last reference.
*/
put_event(event);
+ } else {
+ var = &ctx->refcount;
}
mutex_unlock(&event->child_mutex);
mutex_unlock(&ctx->mutex);
put_ctx(ctx);
+
+ if (var) {
+ /*
+ * If perf_event_free_task() has deleted all events from the
+ * ctx while the child_mutex got released above, make sure to
+ * notify about the preceding put_ctx().
+ */
+ smp_mb(); /* pairs with wait_var_event() */
+ wake_up_var(var);
+ }
goto again;
}
mutex_unlock(&event->child_mutex);
--
2.25.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2] perf/core: Fix missing wakeup when waiting for context reference
2024-04-18 8:03 [PATCH v2] perf/core: Fix missing wakeup when waiting for context reference Haifeng Xu
@ 2024-04-18 10:03 ` Frederic Weisbecker
2024-04-18 11:34 ` Haifeng Xu
0 siblings, 1 reply; 3+ messages in thread
From: Frederic Weisbecker @ 2024-04-18 10:03 UTC (permalink / raw)
To: Haifeng Xu
Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
namhyung, irogers, adrian.hunter, linux-perf-users, linux-kernel
On Thu, Apr 18, 2024 at 08:03:56AM +0000, Haifeng Xu wrote:
> In our production environment, we found many hung tasks which are
> blocked for more than 18 hours. Their call traces are like this:
>
> [346278.191038] __schedule+0x2d8/0x890
> [346278.191046] schedule+0x4e/0xb0
> [346278.191049] perf_event_free_task+0x220/0x270
> [346278.191056] ? init_wait_var_entry+0x50/0x50
> [346278.191060] copy_process+0x663/0x18d0
> [346278.191068] kernel_clone+0x9d/0x3d0
> [346278.191072] __do_sys_clone+0x5d/0x80
> [346278.191076] __x64_sys_clone+0x25/0x30
> [346278.191079] do_syscall_64+0x5c/0xc0
> [346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
> [346278.191086] ? do_syscall_64+0x69/0xc0
> [346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
> [346278.191092] ? irqentry_exit+0x19/0x30
> [346278.191095] ? exc_page_fault+0x89/0x160
> [346278.191097] ? asm_exc_page_fault+0x8/0x30
> [346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The task was waiting for the refcount become to 1, but from the vmcore,
> we found the refcount has already been 1. It seems that the task didn't
> get woken up by perf_event_release_kernel() and got stuck forever. The
> below scenario may cause the problem.
>
> Thread A Thread B
> ... ...
> perf_event_free_task perf_event_release_kernel
> ...
> acquire event->child_mutex
> ...
> get_ctx
> ... release event->child_mutex
> acquire ctx->mutex
> ...
> perf_free_event (acquire/release event->child_mutex)
> ...
> release ctx->mutex
> wait_var_event
> acquire ctx->mutex
> acquire event->child_mutex
> # move existing events to free_list
> release event->child_mutex
> release ctx->mutex
> put_ctx
> ... ...
>
> In this case, all events of the ctx have been freed, so we couldn't
> find the ctx in free_list and Thread A will miss the wakeup. It's thus
> necessary to add a wakeup after dropping the reference.
>
> Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")
> Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Hint: always ask before putting someone else's Signed-off-by tag ;-)
And anyway you don't need it here.
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> ---
> Changes since v1
> - Add the fixed tag.
> - Simplify v1's patch. (Frederic)
> ---
> kernel/events/core.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 4f0c45ab8d7d..15c35070db6a 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5340,6 +5340,7 @@ int perf_event_release_kernel(struct perf_event *event)
> again:
> mutex_lock(&event->child_mutex);
> list_for_each_entry(child, &event->child_list, child_list) {
> + void *var = NULL;
>
> /*
> * Cannot change, child events are not migrated, see the
> @@ -5380,11 +5381,23 @@ int perf_event_release_kernel(struct perf_event *event)
> * this can't be the last reference.
> */
> put_event(event);
> + } else {
> + var = &ctx->refcount;
> }
>
> mutex_unlock(&event->child_mutex);
> mutex_unlock(&ctx->mutex);
> put_ctx(ctx);
> +
> + if (var) {
> + /*
> + * If perf_event_free_task() has deleted all events from the
> + * ctx while the child_mutex got released above, make sure to
> + * notify about the preceding put_ctx().
> + */
> + smp_mb(); /* pairs with wait_var_event() */
> + wake_up_var(var);
> + }
> goto again;
> }
> mutex_unlock(&event->child_mutex);
> --
> 2.25.1
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] perf/core: Fix missing wakeup when waiting for context reference
2024-04-18 10:03 ` Frederic Weisbecker
@ 2024-04-18 11:34 ` Haifeng Xu
0 siblings, 0 replies; 3+ messages in thread
From: Haifeng Xu @ 2024-04-18 11:34 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
namhyung, irogers, adrian.hunter, linux-perf-users, linux-kernel
On 2024/4/18 18:03, Frederic Weisbecker wrote:
> On Thu, Apr 18, 2024 at 08:03:56AM +0000, Haifeng Xu wrote:
>> In our production environment, we found many hung tasks which are
>> blocked for more than 18 hours. Their call traces are like this:
>>
>> [346278.191038] __schedule+0x2d8/0x890
>> [346278.191046] schedule+0x4e/0xb0
>> [346278.191049] perf_event_free_task+0x220/0x270
>> [346278.191056] ? init_wait_var_entry+0x50/0x50
>> [346278.191060] copy_process+0x663/0x18d0
>> [346278.191068] kernel_clone+0x9d/0x3d0
>> [346278.191072] __do_sys_clone+0x5d/0x80
>> [346278.191076] __x64_sys_clone+0x25/0x30
>> [346278.191079] do_syscall_64+0x5c/0xc0
>> [346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
>> [346278.191086] ? do_syscall_64+0x69/0xc0
>> [346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
>> [346278.191092] ? irqentry_exit+0x19/0x30
>> [346278.191095] ? exc_page_fault+0x89/0x160
>> [346278.191097] ? asm_exc_page_fault+0x8/0x30
>> [346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>> The task was waiting for the refcount become to 1, but from the vmcore,
>> we found the refcount has already been 1. It seems that the task didn't
>> get woken up by perf_event_release_kernel() and got stuck forever. The
>> below scenario may cause the problem.
>>
>> Thread A Thread B
>> ... ...
>> perf_event_free_task perf_event_release_kernel
>> ...
>> acquire event->child_mutex
>> ...
>> get_ctx
>> ... release event->child_mutex
>> acquire ctx->mutex
>> ...
>> perf_free_event (acquire/release event->child_mutex)
>> ...
>> release ctx->mutex
>> wait_var_event
>> acquire ctx->mutex
>> acquire event->child_mutex
>> # move existing events to free_list
>> release event->child_mutex
>> release ctx->mutex
>> put_ctx
>> ... ...
>>
>> In this case, all events of the ctx have been freed, so we couldn't
>> find the ctx in free_list and Thread A will miss the wakeup. It's thus
>> necessary to add a wakeup after dropping the reference.
>>
>> Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")
>> Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>
> Hint: always ask before putting someone else's Signed-off-by tag ;-)
> And anyway you don't need it here.
Sorry, Frederic. I'll resend this with Reviewed-by tag.
Thanks!
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
>
>> ---
>> Changes since v1
>> - Add the fixed tag.
>> - Simplify v1's patch. (Frederic)
>> ---
>> kernel/events/core.c | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 4f0c45ab8d7d..15c35070db6a 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -5340,6 +5340,7 @@ int perf_event_release_kernel(struct perf_event *event)
>> again:
>> mutex_lock(&event->child_mutex);
>> list_for_each_entry(child, &event->child_list, child_list) {
>> + void *var = NULL;
>>
>> /*
>> * Cannot change, child events are not migrated, see the
>> @@ -5380,11 +5381,23 @@ int perf_event_release_kernel(struct perf_event *event)
>> * this can't be the last reference.
>> */
>> put_event(event);
>> + } else {
>> + var = &ctx->refcount;
>> }
>>
>> mutex_unlock(&event->child_mutex);
>> mutex_unlock(&ctx->mutex);
>> put_ctx(ctx);
>> +
>> + if (var) {
>> + /*
>> + * If perf_event_free_task() has deleted all events from the
>> + * ctx while the child_mutex got released above, make sure to
>> + * notify about the preceding put_ctx().
>> + */
>> + smp_mb(); /* pairs with wait_var_event() */
>> + wake_up_var(var);
>> + }
>> goto again;
>> }
>> mutex_unlock(&event->child_mutex);
>> --
>> 2.25.1
>>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-04-18 11:34 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-18 8:03 [PATCH v2] perf/core: Fix missing wakeup when waiting for context reference Haifeng Xu
2024-04-18 10:03 ` Frederic Weisbecker
2024-04-18 11:34 ` Haifeng Xu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.