From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Oleg Nesterov <oleg@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Clark Williams <williams@redhat.com>
Subject: Re: safety of *mutex_unlock() (Was: [BUG] signal: sighand unprotected when accessed by /proc)
Date: Tue, 10 Jun 2014 05:56:55 -0700
Message-ID: <20140610125655.GJ4581@linux.vnet.ibm.com>
In-Reply-To: <CA+55aFyO8bb7fS3AaQzfMq7geERQ1JpSdcRZ9Fbs5-tZP0oRXg@mail.gmail.com>

On Mon, Jun 09, 2014 at 11:51:09AM -0700, Linus Torvalds wrote:
> On Mon, Jun 9, 2014 at 11:29 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> >>
> >> And once again, note that the normal mutex is already unsafe (unless I missed
> >> something).
> >
> > Is it unsafe?
> >
> > This thread was started because of a bug we triggered in -rt, which
> > ended up being a change specific to -rt that modified the way slub
> > handled SLAB_DESTROY_BY_RCU. What else was wrong with it?
> 
> There's a different issue with freeing of mutexes, which is not a bug,
> but "by design". Namely that mutexes aren't truly "atomic". They are
> complex data structures, and they have issues that a spinlock does not
> have.
> 
> When unlocking a mutex, the thread doing the unlocking will still
> touch the mutex itself _after_ another thread could already have
> successfully acquired the mutex. This is not a problem in any normal
> use, since all of this is perfectly coherent in general, but it means
> that code sequences like:
> 
>   mutex_lock(mem->mutex);
>   kill_it = !--mem->refcount;
>   mutex_unlock(mem->mutex);
>   if (kill_it)
>      free(mem);
> 
> are fundamentally buggy.
> 
> Note that if you think of mutexes as truly indivisible atomic
> operations, the above is "obviously correct": the last person who got
> the mutex marked it for killing. But the fact is, the *next-to-last*
> mutex acquirer may still actively be in the *non-indivisible*
> mutex_unlock() when the last person frees it, resulting in a
> use-after-free. And yes, we've had this bug, and as far as I know it's
> possible that the RT code *introduces* this bug when it changes
> spinlocks into mutexes. Because we do exactly the above code sequence
> with spinlocks. So just replacing spinlocks with mutexes is a very
> subtly buggy thing to do in general.
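> 
> Spelling out the interleaving (a sketch; the exact internal steps
> depend on the mutex implementation):
> 
>   CPU 0 (next-to-last owner)         CPU 1 (last owner)
>   --------------------------         ------------------
>   mutex_lock(mem->mutex);
>   kill_it = !--mem->refcount;        mutex_lock(mem->mutex); /* blocks */
>   mutex_unlock(mem->mutex):
>     releases the lock word   ----->  acquires lock, returns
>                                      kill_it = !--mem->refcount; /* true */
>                                      mutex_unlock(mem->mutex);
>                                      free(mem);
>     still touching mem->mutex  <---  use-after-free
>     (wait-list, slow path)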
> 
> Another example of this kind of situation is using a mutex as a
> completion event: that's simply buggy. Again, it's because mutexes are
> complex data structures, and you have a very similar use-after-free
> issue. It's why the fundamental data structure for a "struct
> completion" is a spinlock, not a mutex.
> 
> Again, in *theory*, a completion could be just a mutex that starts out
> locked, and then the completer completes it by just unlocking it. The
> person who waits for a completion does so by just asking for a lock.
> Obvious, simple, and trivial, right? Not so, because of the *exact*
> same issue above: the completer (who does an "unlock") may still be
> accessing the completion data structure when the completion waiter has
> successfully gotten the lock. So the standard thing of the completer
> freeing the underlying completion memory (which is often on the stack,
> so "freeing" is just he act of going out of scope of the liveness of
> the completion data structure) would not work if the completion was a
> mutex.
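> 
> As a hypothetical sketch of that broken pattern (made-up names, and
> assume c.done can somehow start out in the locked state):
> 
>   struct my_completion {
>   	struct mutex done;	/* starts out locked */
>   };
> 
>   waiter (stack-allocated c):        completer:
>     start_work(&c);
>     mutex_lock(&c.done);  /* waits */
>                                      mutex_unlock(&c.done);
>     /* returns; c goes out of scope,
>        i.e. is effectively freed */  /* may still be touching
>                                         c.done right here */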
> 
> This is subtle, and it is basically unavoidable. If a mutex (or
> counting semaphore) has a fast-path - and a mutex/semaphore without a
> fast-path is shit - then this issue will exist. Exactly because the
> fast-path will depend on just one part of the whole big mutex
> structure, and the slow-path will have other pieces to it.
> 
> There might be reasonable ways to avoid this issue (having the
> fastpath locking field share memory with the slow-path locking, for
> example), but it's not how our semaphores and mutexes work, and I
> suspect it cannot be the case in general (because it limits you too
> badly in how to implement the mutex). As a result, this is all "by
> design" as opposed to being a bug.

So to safely free a structure containing a mutex, is there some better
approach than the following?

	mutex_lock(mem->mutex);
	kill_it = !--mem->refcount;
	rcu_read_lock();
	mutex_unlock(mem->mutex);
	rcu_read_unlock();
	if (kill_it)
		kfree_rcu(mem, rh); /* rh is the rcu_head field in mem. */
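
Here mem is assumed to look something like the following; only the
rh field is actually named above, the rest is hypothetical:

	struct mem_struct {
		struct mutex *mutex;	/* or an embedded struct mutex,
					   with &mem->mutex above */
		int refcount;		/* protected by ->mutex */
		struct rcu_head rh;	/* for kfree_rcu() */
	};

The intent is that as long as *every* release of mem->mutex is
wrapped in rcu_read_lock()/rcu_read_unlock() like this, the grace
period that kfree_rcu() waits for cannot end while any releaser is
still inside mutex_unlock(), so the post-release accesses hit
still-allocated memory.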

For example, is there some other way to know that all the prior lock
releases have finished their post-release accesses?

							Thanx, Paul

