Date: Tue, 7 Mar 2017 18:59:23 +0100
From: Peter Zijlstra
To: Thomas Gleixner
Cc: mingo@kernel.org, juri.lelli@arm.com, rostedt@goodmis.org, xlpang@redhat.com,
	bigeasy@linutronix.de, linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
	jdesfossez@efficios.com, bristot@redhat.com, dvhart@infradead.org
Subject: Re: [PATCH -v5 14/14] futex: futex_unlock_pi() determinism
Message-ID: <20170307175923.GE3312@twins.programming.kicks-ass.net>
References: <20170304092717.762954142@infradead.org> <20170304093559.696873055@infradead.org>

On Tue, Mar 07, 2017 at 03:31:50PM +0100, Thomas Gleixner wrote:
> On Sat, 4 Mar 2017, Peter Zijlstra wrote:
>
> > The problem with returning -EAGAIN when the waiter state mismatches is
> > that it becomes very hard to prove a bounded execution time on the
> > operation. And seeing that this is a RT operation, this is somewhat
> > important.
> >
> > While in practice it will be very unlikely to ever really take more
> > than one or two rounds, proving so becomes rather hard.
>
> Oh no. Assume the following:
>
> T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)
>
> CPU0
>
> T1
>   lock_pi()
>   queue_me()			<- Waiter is visible
>
> preemption
>
> T2
>   unlock_pi()
>     loops with -EAGAIN forever

Ah! indeed.

> > Now that modifying wait_list is done while holding both hb->lock and
> > wait_lock, we can avoid the scenario entirely if we acquire wait_lock
> > while still holding hb->lock. Doing a hand-over, without leaving a
> > hole.
> >
> > Signed-off-by: Peter Zijlstra (Intel)
> > ---
> >  kernel/futex.c | 26 ++++++++++++--------------
> >  1 file changed, 12 insertions(+), 14 deletions(-)
> >
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -1391,16 +1391,11 @@ static int wake_futex_pi(u32 __user *uad
> >  	DEFINE_WAKE_Q(wake_q);
> >  	int ret = 0;
> >
> >  	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
> > +	if (WARN_ON_ONCE(!new_owner)) {
> >  		/*
> > +		 * Should be impossible now... but if weirdness happens,
>
> 'now...' is not very useful 6 months from NOW :)

I'll put in a reference to the comment below, which explains why this
should now be impossible.

> > +		 * returning -EAGAIN is safe and correct.
> >  		 */
> >  		ret = -EAGAIN;
> >  		goto out_unlock;

> > @@ -2770,15 +2765,18 @@ static int futex_unlock_pi(u32 __user *u
> >  		if (pi_state->owner != current)
> >  			goto out_unlock;
> >
> > +		get_pi_state(pi_state);
> >  		/*
> > +		 * Since modifying the wait_list is done while holding both
> > +		 * hb->lock and wait_lock, holding either is sufficient to
> > +		 * observe it.
> >  		 *
> > +		 * By taking wait_lock while still holding hb->lock, we ensure
> > +		 * there is no point where we hold neither; and therefore
> > +		 * wake_futex_pi() must observe a state consistent with what we
> > +		 * observed.
> >  		 */

 ^^ that one.

> > +		raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
> >  		spin_unlock(&hb->lock);
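
For anyone reading along later, a minimal user-space sketch of the hand-over
idea discussed above: the inner lock is acquired before the outer lock is
dropped, so there is no window in which neither lock is held and the unlock
path can never observe a half-updated waiter state and loop on -EAGAIN. All
names below (hb_lock, wait_lock, waiter_queued) are illustrative stand-ins,
not the actual futex code.

	/*
	 * User-space illustration only -- not kernel code.  hb_lock,
	 * wait_lock and waiter_queued stand in for hb->lock,
	 * pi_state->pi_mutex.wait_lock and the rt_mutex wait_list.
	 */
	#include <pthread.h>
	#include <stdbool.h>
	#include <stdio.h>

	static pthread_mutex_t hb_lock   = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER;
	static bool waiter_queued;	/* only ever modified with both locks held */

	/* Waiter side: becomes visible while holding both locks (queue_me()). */
	static void queue_waiter(void)
	{
		pthread_mutex_lock(&hb_lock);
		pthread_mutex_lock(&wait_lock);
		waiter_queued = true;
		pthread_mutex_unlock(&wait_lock);
		pthread_mutex_unlock(&hb_lock);
	}

	/* Unlock side: hand hb_lock over to wait_lock without leaving a hole. */
	static bool unlock_path(void)
	{
		bool seen;

		pthread_mutex_lock(&hb_lock);
		/* ... owner checks done under hb_lock ... */
		pthread_mutex_lock(&wait_lock);	/* taken while hb_lock is still held */
		pthread_mutex_unlock(&hb_lock);	/* no point where we hold neither lock */

		/* Must be consistent with what was observable under hb_lock above. */
		seen = waiter_queued;
		pthread_mutex_unlock(&wait_lock);
		return seen;
	}

	int main(void)
	{
		queue_waiter();
		printf("waiter visible to unlock path: %d\n", unlock_path());
		return 0;
	}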