On Wed, 2017-02-15 at 22:41 +0000, Ben Hutchings wrote: > 3.2.85-rc1 review patch.  If anyone has any objections, please let me know. > > ------------------ > > From: Peter Zijlstra > > commit 321027c1fe77f892f4ea07846aeae08cefbbb290 upstream. > > Di Shen reported a race between two concurrent sys_perf_event_open() > calls where both try and move the same pre-existing software group > into a hardware context. > > The problem is exactly that described in commit: > >   f63a8daa5812 ("perf: Fix event->ctx locking") > > ... where, while we wait for a ctx->mutex acquisition, the event->ctx > relation can have changed under us. > > That very same commit failed to recognise sys_perf_event_context() as an > external access vector to the events and thereby didn't apply the > established locking rules correctly. > > So while one sys_perf_event_open() call is stuck waiting on > mutex_lock_double(), the other (which owns said locks) moves the group > about. So by the time the former sys_perf_event_open() acquires the > locks, the context we've acquired is stale (and possibly dead). > > Apply the established locking rules as per perf_event_ctx_lock_nested() > to the mutex_lock_double() for the 'move_group' case. This obviously means > we need to validate state after we acquire the locks. [...] >   /* >    * See perf_event_ctx_lock() for comments on the details >    * of swizzling perf_event::ctx. >    */ > - mutex_lock_double(&gctx->mutex, &ctx->mutex); > - >   perf_remove_from_context(group_leader, false); >   >   /* > @@ -6709,10 +6757,8 @@ SYSCALL_DEFINE5(perf_event_open, >   ++ctx->generation; >   perf_unpin_context(ctx); >   > - if (move_group) { > - mutex_unlock(&gctx->mutex); > - put_ctx(gctx); > - } > + if (move_group) > + perf_event_ctx_unlock(group_leader, gctx); >   mutex_unlock(&ctx->mutex); >   >   event->owner = current; [...] Peter has clarified that the last call to put_ctx(gctx) corresponds to the reference cleared by perf_remove_from_context(group_leader, false) above. So although perf_event_ctx_unlock() also calls put_ctx(gctx), we really do want to drop two references here now and should keep the direct call. I made the same error when backporting to 3.16, and will fix that as well. Ben. -- Ben Hutchings 73.46% of all statistics are made up.