Date: Tue, 7 Mar 2017 18:59:23 +0100
From: Peter Zijlstra
To: Thomas Gleixner
Cc: mingo@kernel.org, juri.lelli@arm.com, rostedt@goodmis.org, xlpang@redhat.com,
	bigeasy@linutronix.de, linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
	jdesfossez@efficios.com, bristot@redhat.com, dvhart@infradead.org
Subject: Re: [PATCH -v5 14/14] futex: futex_unlock_pi() determinism
Message-ID: <20170307175923.GE3312@twins.programming.kicks-ass.net>
References: <20170304092717.762954142@infradead.org> <20170304093559.696873055@infradead.org>

On Tue, Mar 07, 2017 at 03:31:50PM +0100, Thomas Gleixner wrote:
> On Sat, 4 Mar 2017, Peter Zijlstra wrote:
>
> > The problem with returning -EAGAIN when the waiter state mismatches is
> > that it becomes very hard to prove a bounded execution time on the
> > operation. And seeing that this is a RT operation, this is somewhat
> > important.
> >
> > While in practice it will be very unlikely to ever really take more
> > than one or two rounds, proving so becomes rather hard.
>
> Oh no. Assume the following:
>
> T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)
>
> CPU0
>
> T1
>   lock_pi()
>   queue_me()			<- Waiter is visible
>
> preemption
>
> T2
>   unlock_pi()
>     loops with -EAGAIN forever

Ah! indeed.

> > Now that modifying wait_list is done while holding both hb->lock and
> > wait_lock, we can avoid the scenario entirely if we acquire wait_lock
> > while still holding hb->lock. Doing a hand-over, without leaving a
> > hole.
> >
> > Signed-off-by: Peter Zijlstra (Intel)
> > ---
> >  kernel/futex.c | 26 ++++++++++++--------------
> >  1 file changed, 12 insertions(+), 14 deletions(-)
> >
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -1391,16 +1391,11 @@ static int wake_futex_pi(u32 __user *uad
> >  	DEFINE_WAKE_Q(wake_q);
> >  	int ret = 0;
> >
> >  	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
> > +	if (WARN_ON_ONCE(!new_owner)) {
> >  		/*
> > +		 * Should be impossible now... but if weirdness happens,
>
> 'now...' is not very useful 6 months from NOW :)

I'll put in a reference to the comment below, which explains why this
should now be impossible.

> > +		 * returning -EAGAIN is safe and correct.
> >  		 */
> >  		ret = -EAGAIN;
> >  		goto out_unlock;

> > @@ -2770,15 +2765,18 @@ static int futex_unlock_pi(u32 __user *u
> >  		if (pi_state->owner != current)
> >  			goto out_unlock;
> >
> > +		get_pi_state(pi_state);
> >  		/*
> > +		 * Since modifying the wait_list is done while holding both
> > +		 * hb->lock and wait_lock, holding either is sufficient to
> > +		 * observe it.
> >  		 *
> > +		 * By taking wait_lock while still holding hb->lock, we ensure
> > +		 * there is no point where we hold neither; and therefore
> > +		 * wake_futex_pi() must observe a state consistent with what we
> > +		 * observed.
> >  		 */

 ^^ that one.

> > +		raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
> >  		spin_unlock(&hb->lock);
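
For anyone reading along later, a minimal user-space sketch of the hand-over
idea discussed above: the inner lock is acquired before the outer lock is
dropped, so there is no window in which neither lock is held and the unlock
path can never observe a half-updated waiter state and loop on -EAGAIN. All
names below (hb_lock, wait_lock, waiter_queued) are illustrative stand-ins,
not the actual futex code.

	/*
	 * User-space illustration only -- not kernel code.  hb_lock,
	 * wait_lock and waiter_queued stand in for hb->lock,
	 * pi_state->pi_mutex.wait_lock and the rt_mutex wait_list.
	 */
	#include <pthread.h>
	#include <stdbool.h>
	#include <stdio.h>

	static pthread_mutex_t hb_lock   = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER;
	static bool waiter_queued;	/* only ever modified with both locks held */

	/* Waiter side: becomes visible while holding both locks (queue_me()). */
	static void queue_waiter(void)
	{
		pthread_mutex_lock(&hb_lock);
		pthread_mutex_lock(&wait_lock);
		waiter_queued = true;
		pthread_mutex_unlock(&wait_lock);
		pthread_mutex_unlock(&hb_lock);
	}

	/* Unlock side: hand hb_lock over to wait_lock without leaving a hole. */
	static bool unlock_path(void)
	{
		bool seen;

		pthread_mutex_lock(&hb_lock);
		/* ... owner checks done under hb_lock ... */
		pthread_mutex_lock(&wait_lock);	/* taken while hb_lock is still held */
		pthread_mutex_unlock(&hb_lock);	/* no point where we hold neither lock */

		/* Must be consistent with what was observable under hb_lock above. */
		seen = waiter_queued;
		pthread_mutex_unlock(&wait_lock);
		return seen;
	}

	int main(void)
	{
		queue_waiter();
		printf("waiter visible to unlock path: %d\n", unlock_path());
		return 0;
	}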