Re: [PATCH] perf: protect group_leader from races that cause ctx

From: Peter Zijlstra <peterz@infradead.org>
To: Kees Cook <keescook@chromium.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	John Dias <joaodias@google.com>, Min Chong <mchong@google.com>
Subject: Re: [PATCH] perf: protect group_leader from races that cause ctx
Date: Fri, 6 Jan 2017 14:14:44 +0100
Message-ID: <20170106131444.GZ3174@twins.programming.kicks-ass.net>
In-Reply-To: <20170106093251.GL3093@worktop>

On Fri, Jan 06, 2017 at 10:32:51AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 05, 2017 at 03:14:29PM -0800, Kees Cook wrote:
> > From: John Dias <joaodias@google.com>
> > 
> > When moving a group_leader perf event from a software-context to
> > a hardware-context, there's a race in checking and updating that
> > context. The existing locking solution doesn't work; note that it tries
> > to grab a lock inside the group_leader's context object, which you can
> > only get at by going through a pointer that should be protected from these
> > races. If two threads trigger this operation simultaneously, the refcount
> > of 'perf_event_context' will fall to zero and the object may be freed.
> > 
> > To avoid that problem, and to produce a simple solution, we can just
> > use a lock per group_leader to protect all checks on the group_leader's
> > context. The new lock is grabbed and released when no context locks are
> > held.
> 
> This Changelog really stinks. I'll go try and reverse engineer the thing
> :-(

So the fundamental problem is a race between two sys_perf_event_open()
calls trying to move the same (software) group, nothing else. The rest of
the text above is misdirection and side effects.

And instead of applying the existing locking rules for this exact
scenario, it invents extra locking :-(
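
For reference, the existing rule (the one perf_event_ctx_lock_nested()
follows) boils down to: read event->ctx, take a reference, take the mutex,
then check that event->ctx still points at what you locked, and retry if it
does not. Below is a minimal userspace sketch of that pattern, using
pthreads and C11 atomics rather than kernel primitives. All names in it
(struct ctx, struct event, event_ctx_lock(), mover(), run_move_race()) are
made up for illustration. It also assumes the contexts are never freed, so
it needs neither RCU nor inc_not_zero, and the mover only takes the old
context's mutex where the kernel takes both.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for perf_event_context. */
struct ctx {
	pthread_mutex_t mutex;
	atomic_int refcount;
};

/* Hypothetical stand-in for perf_event; ->ctx only changes under ctx->mutex. */
struct event {
	struct ctx *_Atomic ctx;
};

struct demo {
	struct ctx sw, hw;
	struct event leader;
	int moves;		/* only written with sw.mutex held */
};

/* Lock whatever context the event is on, revalidating after the lock. */
static struct ctx *event_ctx_lock(struct event *ev)
{
	struct ctx *c;

	for (;;) {
		c = atomic_load(&ev->ctx);
		/* Assumption: contexts outlive the demo, so a plain inc is safe. */
		atomic_fetch_add(&c->refcount, 1);
		pthread_mutex_lock(&c->mutex);
		/* The event may have moved while we slept on the mutex. */
		if (atomic_load(&ev->ctx) == c)
			return c;
		pthread_mutex_unlock(&c->mutex);
		atomic_fetch_sub(&c->refcount, 1);
	}
}

static void event_ctx_unlock(struct ctx *c)
{
	pthread_mutex_unlock(&c->mutex);
	atomic_fetch_sub(&c->refcount, 1);
}

/* Each thread tries to move the leader from the sw to the hw context. */
static void *mover(void *arg)
{
	struct demo *d = arg;
	struct ctx *c = event_ctx_lock(&d->leader);

	/* Validate state after locking: only move if nobody beat us to it. */
	if (c == &d->sw) {
		atomic_store(&d->leader.ctx, &d->hw);
		d->moves++;
	}
	event_ctx_unlock(c);
	return NULL;
}

/* Returns the number of moves performed, or -1 on failure. */
static int run_move_race(void)
{
	struct demo d;
	pthread_t t[2];
	int created = 0, i;

	pthread_mutex_init(&d.sw.mutex, NULL);
	pthread_mutex_init(&d.hw.mutex, NULL);
	atomic_init(&d.sw.refcount, 0);
	atomic_init(&d.hw.refcount, 0);
	atomic_init(&d.leader.ctx, &d.sw);
	d.moves = 0;

	for (i = 0; i < 2; i++) {
		if (pthread_create(&t[i], NULL, mover, &d))
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(t[i], NULL);

	pthread_mutex_destroy(&d.sw.mutex);
	pthread_mutex_destroy(&d.hw.mutex);

	/* Every reference taken must have been dropped again. */
	if (atomic_load(&d.sw.refcount) || atomic_load(&d.hw.refcount))
		return -1;
	return created == 2 ? d.moves : -1;
}
```

Calling run_move_race() from a main() should report exactly one move:
whichever thread loses the race either relocks the new context after
revalidating, or finds the event already there, and in both cases sees
there is nothing left to move.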

Ok, so I came up with the following. It is compile tested only, since there
is no reproducer and I am fairly grumpy at having had to spend entirely too
much time reconstructing the problem.

---
Subject: perf: Fix concurrent sys_perf_event_open() move_context race

Kees reported a race between two concurrent sys_perf_event_open() calls
where both try to move the same pre-existing software group into a
hardware context.

The problem is exactly that of commit f63a8daa5812 ("perf: Fix
event->ctx locking"), where, while we wait for a ctx->mutex acquisition,
the event->ctx relation can have changed under us.

That very same commit failed to recognise sys_perf_event_open() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.

So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. By the time the former sys_perf_event_open() acquires the
locks, the context it has acquired is stale (and possibly dead).

Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the move_group case. This obviously means
we need to validate state after we acquire the locks.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: John Dias <joaodias@google.com>
Cc: Min Chong <mchong@google.com>
Fixes: f63a8daa5812 ("perf: Fix event->ctx locking")
Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index b47f2f24e36a..c5d62a9f2c97 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9518,6 +9518,37 @@ static int perf_event_set_clock(struct perf_event *event, clockid_t clk_id)
 	return 0;
 }
 
+/*
+ * Variation on perf_event_ctx_lock_nested(), except we take two context
+ * mutexes.
+ */
+static struct perf_event_context *
+__perf_event_ctx_lock_double(struct perf_event *group_leader,
+			     struct perf_event_context *ctx)
+{
+	struct perf_event_context *gctx;
+
+again:
+	rcu_read_lock();
+	gctx = READ_ONCE(group_leader->ctx);
+	if (!atomic_inc_not_zero(&gctx->refcount)) {
+		rcu_read_unlock();
+		goto again;
+	}
+	rcu_read_unlock();
+
+	mutex_lock_double(&gctx->mutex, &ctx->mutex);
+
+	if (group_leader->ctx != gctx) {
+		mutex_unlock(&ctx->mutex);
+		mutex_unlock(&gctx->mutex);
+		put_ctx(gctx);
+		goto again;
+	}
+
+	return gctx;
+}
+
 /**
  * sys_perf_event_open - open a performance event, associate it to a task/cpu
  *
@@ -9761,12 +9792,31 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (move_group) {
-		gctx = group_leader->ctx;
-		mutex_lock_double(&gctx->mutex, &ctx->mutex);
+		gctx = __perf_event_ctx_lock_double(group_leader, ctx);
+
 		if (gctx->task == TASK_TOMBSTONE) {
 			err = -ESRCH;
 			goto err_locked;
 		}
+
+		/*
+		 * Check if we raced against another sys_perf_event_open() call
+		 * moving the software group underneath us.
+		 */
+		if (!(group_leader->group_caps & PERF_EV_CAP_SOFTWARE)) {
+			/*
+			 * If someone moved the group out from under us, check
+			 * if this new event wound up on the same ctx, if so
+			 * its the regular !move_group case, otherwise fail.
+			 */
+			if (gctx != ctx) {
+				err = -EINVAL;
+				goto err_locked;
+			} else {
+				perf_event_ctx_unlock(group_leader, gctx);
+				move_group = 0;
+			}
+		}
 	} else {
 		mutex_lock(&ctx->mutex);
 	}
@@ -9868,7 +9918,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	perf_unpin_context(ctx);
 
 	if (move_group)
-		mutex_unlock(&gctx->mutex);
+		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
 
 	if (task) {
@@ -9894,7 +9944,7 @@ SYSCALL_DEFINE5(perf_event_open,
 
 err_locked:
 	if (move_group)
-		mutex_unlock(&gctx->mutex);
+		perf_event_ctx_unlock(group_leader, gctx);
 	mutex_unlock(&ctx->mutex);
 /* err_file: */
 	fput(event_file);