From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1163021AbdAFNOw (ORCPT ); Fri, 6 Jan 2017 08:14:52 -0500
Received: from merlin.infradead.org ([205.233.59.134]:50708 "EHLO
	merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1161620AbdAFNOo (ORCPT ); Fri, 6 Jan 2017 08:14:44 -0500
Date: Fri, 6 Jan 2017 14:14:44 +0100
From: Peter Zijlstra
To: Kees Cook
Cc: linux-kernel@vger.kernel.org, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin,
	John Dias, Min Chong
Subject: Re: [PATCH] perf: protect group_leader from races that cause ctx
Message-ID: <20170106131444.GZ3174@twins.programming.kicks-ass.net>
References: <20170105231429.GA83592@beast>
	<20170106093251.GL3093@worktop>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170106093251.GL3093@worktop>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 06, 2017 at 10:32:51AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 05, 2017 at 03:14:29PM -0800, Kees Cook wrote:
> > From: John Dias
> >
> > When moving a group_leader perf event from a software-context to
> > a hardware-context, there's a race in checking and updating that
> > context. The existing locking solution doesn't work; note that it tries
> > to grab a lock inside the group_leader's context object, which you can
> > only get at by going through a pointer that should be protected from these
> > races. If two threads trigger this operation simultaneously, the refcount
> > of 'perf_event_context' will fall to zero and the object may be freed.
> >
> > To avoid that problem, and to produce a simple solution, we can just
> > use a lock per group_leader to protect all checks on the group_leader's
> > context. The new lock is grabbed and released when no context locks are
> > held.
>
> This Changelog really stinks. I'll go try and reverse engineer the thing
> :-(

So the fundamental problem is a race between two sys_perf_event_open()
calls trying to move the same (software) group, nothing else; the rest
of the text above is misdirection and side effects.
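For reference, from userspace the scenario looks roughly like the sketch
below. This is an untested illustration, not an actual reproducer; the
specific event types are arbitrary choices, and in practice you would have
to loop the whole thing to ever hit the window. One thread creates a
software group leader, then two threads concurrently open a hardware event
with group_fd pointing at that leader, so both opens take the move_group
path for the same group (error handling omitted):

#include <linux/perf_event.h>
#include <pthread.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int group_fd;	/* pre-existing software group leader */

static void *racer(void *arg)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;		/* hardware event, so the	*/
	attr.config = PERF_COUNT_HW_CPU_CYCLES;	/* group has to change ctx	*/

	/* both threads try to move the same software group */
	syscall(__NR_perf_event_open, &attr, 0, -1, group_fd, 0);
	return NULL;
}

int main(void)
{
	struct perf_event_attr attr;
	pthread_t t1, t2;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_DUMMY;

	/* software group leader, lives in the software context */
	group_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);

	pthread_create(&t1, NULL, racer, NULL);
	pthread_create(&t2, NULL, racer, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}

Build with -lpthread; whether a single shot like this ever lands in the
window is entirely down to timing.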
And instead of applying the existing locking rules for this exact
scenario, it invents extra locking :-(

Ok so I came up with the following, compile tested only, since there is
no reproducer and I'm fairly grumpy about having to spend entirely too
much time reconstructing the problem.

---
Subject: perf: Fix concurrent sys_perf_event_open() move_group race

Kees reported a race between two concurrent sys_perf_event_open() calls
where both try to move the same pre-existing software group into a
hardware context.

The problem is exactly that of commit f63a8daa5812 ("perf: Fix
event->ctx locking"), where, while we wait for a ctx->mutex
acquisition, the event->ctx relation can have changed under us.

That very same commit failed to recognise sys_perf_event_open() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.

So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about, and by the time the former acquires the locks, the context it
acquired is stale (and possibly dead).

Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the move_group case. This obviously
means we need to validate state after we acquire the locks.

Cc: Ingo Molnar
Cc: Arnaldo Carvalho de Melo
Cc: Alexander Shishkin
Cc: John Dias
Cc: Min Chong
Fixes: f63a8daa5812 ("perf: Fix event->ctx locking")
Reported-by: Kees Cook
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/events/core.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b47f2f24e36a..c5d62a9f2c97 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9518,6 +9518,37 @@ static int perf_event_set_clock(struct perf_event *event, clockid_t clk_id)
 	return 0;
 }
 
+/*
+ * Variation on perf_event_ctx_lock_nested(), except we take two context
+ * mutexes.
+ */
+static struct perf_event_context *
+__perf_event_ctx_lock_double(struct perf_event *group_leader,
+			     struct perf_event_context *ctx)
+{
+	struct perf_event_context *gctx;
+
+again:
+	rcu_read_lock();
+	gctx = READ_ONCE(group_leader->ctx);
+	if (!atomic_inc_not_zero(&gctx->refcount)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+
+	mutex_lock_double(&gctx->mutex, &ctx->mutex);
+
+	if (group_leader->ctx != gctx) {
+		mutex_unlock(&ctx->mutex);
+		mutex_unlock(&gctx->mutex);
+		put_ctx(gctx);
+		goto again;
+	}
+
+	return gctx;
+}
+
 /**
  * sys_perf_event_open - open a performance event, associate it to a task/cpu
  *
@@ -9761,12 +9792,31 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (move_group) {
-		gctx = group_leader->ctx;
-		mutex_lock_double(&gctx->mutex, &ctx->mutex);
+		gctx = __perf_event_ctx_lock_double(group_leader, ctx);
+
 		if (gctx->task == TASK_TOMBSTONE) {
 			err = -ESRCH;
 			goto err_locked;
 		}
+
+		/*
+		 * Check if we raced against another sys_perf_event_open() call
+		 * moving the software group underneath us.
+		 */
+		if (!(group_leader->group_caps & PERF_EV_CAP_SOFTWARE)) {
+			/*
+			 * If someone moved the group out from under us, check
+			 * if this new event wound up on the same ctx, if so
+			 * it's the regular !move_group case, otherwise fail.
+			 */
+			if (gctx != ctx) {
+				err = -EINVAL;
+				goto err_locked;
+			} else {
+				perf_event_ctx_unlock(group_leader, gctx);
+				move_group = 0;
+			}
+		}
 	} else {
 		mutex_lock(&ctx->mutex);
 	}
@@ -9868,7 +9918,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	perf_unpin_context(ctx);
 
 	if (move_group)
-		mutex_unlock(&gctx->mutex);
+		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
 
 	if (task) {
@@ -9894,7 +9944,7 @@ SYSCALL_DEFINE5(perf_event_open,
 
 err_locked:
 	if (move_group)
-		mutex_unlock(&gctx->mutex);
+		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
 /* err_file: */
 	fput(event_file);