From: Mark Rutland <mark.rutland@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, nslusarek@gmx.net,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] perf: Fix sys_perf_event_open() race against self
Date: Mon, 23 May 2022 11:48:43 +0100
Message-ID: <Yotmi2kM3mm+iPPU@FVFF77S0Q05N>
In-Reply-To: <20220520183806.GV2578@worktop.programming.kicks-ass.net>

Hi Peter,

On Fri, May 20, 2022 at 08:38:06PM +0200, Peter Zijlstra wrote:
> 
> Norbert reported that it's possible to race sys_perf_event_open() such
> that the loser ends up in another context from the group leader,
> triggering many WARNs.

I'm hitting the same with my local arm64 Syzkaller instance, atop v5.18-rc6.

> The move_group case checks for races against itself, but the
> !move_group case doesn't, seemingly relying on the previous
> group_leader->ctx == ctx check. However, that check is racy due to not
> holding any locks at that time.
> 
> Therefore, re-check the result after acquiring locks and bail
> if the contexts no longer match.
> 
> Additionally, disambiguate the not_move_group case from the
> move_group-vs-move_group race.
> 
> Fixes: f63a8daa5812 ("perf: Fix event->ctx locking")
> Reported-by: Norbert Slusarek <nslusarek@gmx.net>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/events/core.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -12217,6 +12217,9 @@ SYSCALL_DEFINE5(perf_event_open,
>  		 * Do not allow to attach to a group in a different task
>  		 * or CPU context. If we're moving SW events, we'll fix
>  		 * this up later, so allow that.
> +		 *
> +		 * Racy, not holding group_leader->ctx->mutex, see comment with
> +		 * perf_event_ctx_lock().
>  		 */
>  		if (!move_group && group_leader->ctx != ctx)
>  			goto err_context;

I assume that, since we say we're not holding the mutex, this read is truly
racy and a concurrent write can happen at any time. If that's the case,
shouldn't we be using *_ONCE() to manipulate perf_event::ctx?
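
e.g. something like the below for the read side (an untested sketch, just to
illustrate the shape of it; stores to event->ctx would need a matching
WRITE_ONCE() annotation):

	/*
	 * Untested sketch: annotate the lockless read so the compiler
	 * can't tear or re-load it across the comparison.
	 */
	if (!move_group && READ_ONCE(group_leader->ctx) != ctx)
		goto err_context;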

... or could we remove the racy read entirely and just rely on the later
check with ctx->mutex held? A racing thread can slip past the early check
anyway, so there's no functional need to bail out early, and consistently
using the later test removes some potential for introducing similar races
in future.
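
i.e. (equally untested) drop the early test:

	-	if (!move_group && group_leader->ctx != ctx)
	-		goto err_context;

... leaving the re-check under ctx->mutex from this patch as the sole
gatekeeper:

		mutex_lock(&ctx->mutex);
		if (group_leader && group_leader->ctx != ctx) {
			err = -EINVAL;
			goto err_locked;
		}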

FWIW, with this patch as-is applied atop v5.18-rc6, I no longer see the issue
when testing with the reproducer Syzkaller came up with. Triggering the WARNs
would normally take a few seconds, but the reproducer now survives several
minutes.

For posterity, that reproducer was:

----8<----
r0 = perf_event_open$cgroup(&(0x7f0000000100)={0x1, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_config_ext}, 0xffffffffffffffff, 0x0, 0xffffffffffffffff, 0x0)
r1 = dup(r0)
perf_event_open(&(0x7f00000001c0)={0x0, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_bp={0x0}}, 0xffffffffffffffff, 0x0, r1, 0x0) (async)
perf_event_open$cgroup(&(0x7f0000000100)={0x1, 0x80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, @perf_bp={0x0, 0x5}, 0x0, 0x0, 0x0, 0x0, 0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x200000000000000}, 0xffffffffffffffff, 0x0, r1, 0x0)
---->8----
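
For readers who don't speak syzlang: the '(async)' annotation means syzkaller
issues that call concurrently with the following one, so the essential
ingredients are two concurrent perf_event_open() calls attaching to the same
software group leader, one hardware event (taking the move_group path) and
one software event (taking the !move_group path). A hand-translated, untested
plain-C sketch of the same shape (the dup() and the exact attr values in the
repro are incidental):

	#include <linux/perf_event.h>
	#include <pthread.h>
	#include <string.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int leader_fd;

	static int open_event(__u32 type, __u64 config, int group_fd)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.type = type;
		attr.size = sizeof(attr);
		attr.config = config;
		/* pid == -1, cpu == 0: per-CPU events, as in the repro. */
		return syscall(__NR_perf_event_open, &attr, -1, 0, group_fd, 0);
	}

	static void *open_hw_event(void *unused)
	{
		/* Hardware event in a software group: move_group path. */
		open_event(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS,
			   leader_fd);
		return NULL;
	}

	int main(void)
	{
		pthread_t thread;

		for (;;) {
			leader_fd = open_event(PERF_TYPE_SOFTWARE,
					       PERF_COUNT_SW_CPU_CLOCK, -1);
			pthread_create(&thread, NULL, open_hw_event, NULL);
			/* Software event, racing the above: !move_group path. */
			open_event(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK,
				   leader_fd);
			pthread_join(thread, NULL);
			close(leader_fd);
		}
	}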

Thanks,
Mark.

> @@ -12282,6 +12285,7 @@ SYSCALL_DEFINE5(perf_event_open,
>  			} else {
>  				perf_event_ctx_unlock(group_leader, gctx);
>  				move_group = 0;
> +				goto not_move_group;
>  			}
>  		}
>  
> @@ -12298,7 +12302,17 @@ SYSCALL_DEFINE5(perf_event_open,
>  		}
>  	} else {
>  		mutex_lock(&ctx->mutex);
> +
> +		/*
> +		 * Now that we hold ctx->lock, (re)validate group_leader->ctx == ctx,
> +		 * see the group_leader && !move_group test earlier.
> +		 */
> +		if (group_leader && group_leader->ctx != ctx) {
> +			err = -EINVAL;
> +			goto err_locked;
> +		}
>  	}
> +not_move_group:
>  
>  	if (ctx->task == TASK_TOMBSTONE) {
>  		err = -ESRCH;
> 
