From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932339AbcKYQTm (ORCPT ); Fri, 25 Nov 2016 11:19:42 -0500 Received: from merlin.infradead.org ([205.233.59.134]:55266 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755392AbcKYQTc (ORCPT ); Fri, 25 Nov 2016 11:19:32 -0500 Date: Fri, 25 Nov 2016 15:09:47 +0100 From: Peter Zijlstra To: Thomas Gleixner Cc: mingo@kernel.org, juri.lelli@arm.com, rostedt@goodmis.org, xlpang@redhat.com, bigeasy@linutronix.de, linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com, jdesfossez@efficios.com, bristot@redhat.com Subject: Re: [RFC][PATCH 4/4] futex: Rewrite FUTEX_UNLOCK_PI Message-ID: <20161125140947.GB3107@twins.programming.kicks-ass.net> References: <20161003091234.879763059@infradead.org> <20161003091847.704255067@infradead.org> <20161007112143.GJ3117@twins.programming.kicks-ass.net> <20161008165540.GI3568@worktop.programming.kicks-ass.net> <20161021122735.GA3117@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Oct 27, 2016 at 10:36:00PM +0200, Thomas Gleixner wrote: > So that waiter which is now spinning on pi_mutex->lock has already set the > waiters bit, which you undo. So you created the following scenario: > > CPU0 CPU1 CPU2 > > TID 0x1001 TID 0x1000 TID 0x1002 > > Acquires futex in user space > futex value = 0x1000; > > futex_lock_pi() > > lock_hb() > set_waiter_bit() > --> futex value = 0x40001000; > > futex_unlock_pi() > lock_hb() > attach_to_pi_owner() > rt_mutex_init_proxy_locked() > queue_me() > unlock_hb() > unlock_hb() > rt_mutex_lock() wake_futex_pi() > lock(pi_mutex->lock); > lock(pi_mutex->lock) new_owner is NULL; > --> futex value = 0; > rt_mutex_futex_unlock(pi_mutex); > unlock(pi_mutex->lock); > acquire_rt_mutex() return to user space > Acquires futex in user space > --> futex value = 0x1002 > fixup_owner() > fixup_pi_state_owner() > uval = 0x1002; > newval = 0x40001001; > cmpxchg(uval, newval) succeeds > --> futex value = 0x40001001 > > Voila. Inconsistent state .... TID 0x1002 is borked now. OK, I think its much worse... Consider: CPU0 (tid=0x1000) CPU1 (tid=0x1001) CPU2 (tid=0x1002) acquire in userspace *uaddr = 0x1000 futex_lock_pi() acq hb->lock attach_to_pi_owner pi_state->refcount == 1 queue_me rel hb->lock futex_lock_pi() acq hb->lock attach_to_pi_state pi_state->refcount == 2 queue_me rel hb->lock futex_unlock_pi() acq pi_mutex->lock top_waiter == (CPU1 | CPU2) wake_futex_pi() new_owner == NULL pi_state->owner = &init_task *uaddr = 0 rt_mutex_futex_unlock() ret-to-user acquire in userspace *uaddr = 0x1000 >>From here there's multiple ways to trainwreck the thing, the way you list above, lets call this A): rt_mutex_lock() acq hb->lock fixup_owner() cmpxchg(uaddr, 0x1000, 0x40001002) *BORKED CPU0* alternatively we can do; B): CPU3 (or CPU0 with a different tid) futex_lock_pi() acq hb->lock attach_to_pi_state() -EINVAL : *uaddr != task_pid_vnr(&init_task) Now, since CPU1 is pinning the hb->waiter relation, the proposed fixup_owner() -EAGAIN change cannot work either, since, A1): rt_mutex_lock() acq hb->lock fixup_owner() : -EAGAIN unqueue_me_pi() put_pi_state pi_state->refcount == 1 rel hb->lock goto retry_private retry_private: acq hb->lock attach_to_pi_state() : -EINVAL Similar things happen with CMP_REQUEUE doing a lookup_pi_state() around this point. I'm sorely tempted to do the 'simple' thing and leak FUTEX_WAITERS, forcing things into the slow path.