From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S941120AbcKXS6S (ORCPT );
	Thu, 24 Nov 2016 13:58:18 -0500
Received: from bombadil.infradead.org ([198.137.202.9]:54492 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S935677AbcKXS6R (ORCPT );
	Thu, 24 Nov 2016 13:58:17 -0500
Date: Thu, 24 Nov 2016 19:58:07 +0100
From: Peter Zijlstra
To: Thomas Gleixner
Cc: mingo@kernel.org, juri.lelli@arm.com, rostedt@goodmis.org,
	xlpang@redhat.com, bigeasy@linutronix.de,
	linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
	jdesfossez@efficios.com, bristot@redhat.com
Subject: Re: [RFC][PATCH 4/4] futex: Rewrite FUTEX_UNLOCK_PI
Message-ID: <20161124185807.GI3092@twins.programming.kicks-ass.net>
References: <20161003091847.704255067@infradead.org>
	<20161007112143.GJ3117@twins.programming.kicks-ass.net>
	<20161008165540.GI3568@worktop.programming.kicks-ass.net>
	<20161021122735.GA3117@twins.programming.kicks-ass.net>
	<20161123192005.GA3107@twins.programming.kicks-ass.net>
	<20161124165241.GF3174@twins.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 24, 2016 at 06:56:53PM +0100, Thomas Gleixner wrote:
> > I'm stumped on REQUEUE_PI.. this relies on attach_to_pi_owner() and
>
> You mean LOCK_PI, right?
>
> > fixup_owner() being in the same function. But this is not the case for
> > requeue. WAIT_REQUEUE has the fixup, as its return path finds it has
> > acquired the outer pi-futex (uaddr2), but the lookup_pi_state() stuff is
> > done by CMP_REQUEUE, which does the actual transfer of the waiters from
> > inner futex (uaddr1) to outer futex (uaddr2).
>
> Correct. WAIT_REQUEUE puts the futex on the inner (uaddr1) and then gets
> moved to the outer. From there it's the same thing as LOCK_PI.
>
> > Maybe I can restructure things a bit, I think CMP_REQUEUE would also
> > know who actually acquired the outer-futex, but I have to think more on
> > this and the brain is pretty fried...
>
> That is irrelevant at requeue time and the owner can change up to the point
> where the waiter is really woken by a UNLOCK_PI.

OK, so clearly I'm confused. So let me try again.

LOCK_PI does both lookup_pi_state() and fixup_owner() in one function.
If fixup_owner() fails with -EAGAIN, we can redo the pi_state lookup.

The requeue stuff, otoh, has one each. WAIT_REQUEUE has fixup_owner(),
CMP_REQUEUE has lookup_pi_state(). Therefore, fixup_owner() failing with
-EAGAIN leaves us dead in the water. There's nothing to go back to in
order to retry.

So far, so 'good', right?

Now, as far as I understand this requeue stuff, we have 2 futexes, an
inner futex and an outer futex. The inner futex is always 'locked' and
serves as a collection pool for waiting threads.

The requeue crap picks one (or more) waiters from the inner futex and
sticks them on the outer futex, which gives them a chance to run.

So WAIT_REQUEUE blocks on the inner futex, but knows that if it ever
gets woken, it will be on the outer futex, and hence needs to
fixup_owner() if the futex and rt_mutex state got out of sync.

CMP_REQUEUE picks the one (or more) waiters of the inner futex and
sticks them on the outer futex.

So far, so 'good'?

The thing I'm not entirely sure about is what happens with the outer
futex: do we first LOCK_PI it before doing CMP_REQUEUE, giving us
waiters, and then UNLOCK_PI to let them rip? Or do we just CMP_REQUEUE
and then let whoever wins finish with UNLOCK_PI?

In any case, I don't think it matters much; either way we can race
between the 'last' UNLOCK_PI and getting rt_mutex waiters and then hit
the &init_task funny state, such that WAIT_REQUEUE waking hits EAGAIN
and we're 'stuck'.

Now, if we always CMP_REQUEUE to a locked outer futex, then we cannot
know, at CMP_REQUEUE time, who will win and cannot fix up.

The only solution I've come up with so far involves that
rt_mutex_proxy_swizzle() muck which you didn't really fancy much.
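
To make the asymmetry described above concrete, here is a toy user-space
model of it. This is not kernel code. ToyFutex, the "stale"/"consistent"
snapshots and the max_tries bound are all hypothetical, and the function
names are only borrowed from this thread. The one thing it tries to
show: when lookup and fixup sit in one function, -EAGAIN can loop back
to the lookup; when they are split across two calls, the waiter side has
nothing to redo.

```python
import errno


class ToyFutex:
    """Hypothetical stand-in for a pi-futex; not a kernel structure."""

    def __init__(self, transient_rounds):
        # How many more lookups will still observe the transient owner state.
        self.transient_rounds = transient_rounds

    def lookup_pi_state(self):
        """Return a snapshot; "stale" models the transient owner window."""
        if self.transient_rounds > 0:
            self.transient_rounds -= 1
            return "stale"
        return "consistent"


def fixup_owner(snapshot):
    """Return 0 on success, -EAGAIN if the snapshot was taken mid-transition."""
    return 0 if snapshot == "consistent" else -errno.EAGAIN


def lock_pi(futex, max_tries=10):
    """Lookup and fixup live in one function, so -EAGAIN can redo the lookup."""
    for _ in range(max_tries):  # max_tries is an assumption of this sketch
        snapshot = futex.lookup_pi_state()
        ret = fixup_owner(snapshot)
        if ret != -errno.EAGAIN:
            return ret
    return -errno.EAGAIN


def cmp_requeue(futex):
    """Requeue side: performs the lookup and hands the snapshot to the waiter."""
    return futex.lookup_pi_state()


def wait_requeue(snapshot):
    """Waiter side: only has fixup_owner(); there is no lookup to go back to."""
    return fixup_owner(snapshot)
```

In this model lock_pi(ToyFutex(2)) retries its way past the transient
window, while wait_requeue(cmp_requeue(ToyFutex(2))) is left holding
-EAGAIN with no way to retry.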