From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S965397AbcKXL1D (ORCPT );
	Thu, 24 Nov 2016 06:27:03 -0500
Received: from mail-wm0-f43.google.com ([74.125.82.43]:33876 "EHLO
	mail-wm0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S964851AbcKXL1B (ORCPT );
	Thu, 24 Nov 2016 06:27:01 -0500
Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting
 ww_mutexes
To: Peter Zijlstra , =?UTF-8?Q?Nicolai_H=c3=a4hnle?= ,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	Ingo Molnar , stable@vger.kernel.org, Maarten Lankhorst
References: <1479900325-28358-1-git-send-email-nhaehnle@gmail.com>
	<20161123140336.GU3092@twins.programming.kicks-ass.net>
	<20161123142525.ns2pkyp4bo2sa5z2@phenom.ffwll.local>
From: =?UTF-8?Q?Nicolai_H=c3=a4hnle?=
Message-ID:
Date: Thu, 24 Nov 2016 12:26:57 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
	Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <20161123142525.ns2pkyp4bo2sa5z2@phenom.ffwll.local>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 23.11.2016 15:25, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 03:03:36PM +0100, Peter Zijlstra wrote:
>> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
>>> @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
>>>  	 */
>>>  	mutex_clear_owner(&lock->base);
>>>  #endif
>>> -	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
>>> +	/*
>>> +	 * A previously _not_ waiting task may acquire the lock via the fast
>>> +	 * path during our unlock. In that case, already waiting tasks may have
>>> +	 * to back off to avoid a deadlock. Wake up all waiters so that they
>>> +	 * can check their acquire context stamp against the new owner.
>>> +	 */
>>> +	__mutex_fastpath_unlock(&lock->base.count,
>>> +				__mutex_unlock_slowpath_wakeall);
>>>  }
>>
>> So doing a wake-all has obvious issues with thundering herd etc. Also,
>> with the new mutex, you'd not be able to do hand-off, which would
>> introduce starvation cases.
>>
>> Ideally we'd iterate the blocked list and pick the waiter with the
>> earliest stamp, or we'd maintain the list in stamp order instead of
>> FIFO, for ww_mutex.
>
> Not sure we'll win that much; at least I think we still need to wake up
> everyone with an earlier stamp than that of the task that just released
> the lock. Otherwise there are deadlocks. So it just cuts the wakeups in
> half, on average.
>
> What we could do is a handoff-hint with the timestamp of the earliest
> task we believe should get the lock. Everyone with a later timestamp
> that gets woken then knows that they definitely have a deadlock
> situation and need to back off (thread 2 in the example).
>
> Thread 1 would get woken, and would be able to take the lock, except
> when thread 0 successfully raced it and stole the lock. And anyone else
> racing in with later timestamps would also immediately back off,
> ensuring fairness.

I did consider maintaining stamp order in the waiter list and
originally decided against it because I just wanted a simple,
conservative fix that avoids introducing new regressions. Now that a
different approach is needed for >= 4.9 anyway, mostly due to the
hand-off logic, I'm reconsidering this.

I do believe we can win a bit by keeping the wait list sorted, provided
we also make sure that waiters don't add themselves to the list in the
first place when they can already see that a deadlock situation is
unavoidable. I will probably want to extend struct mutex_waiter with
ww_mutex-specific fields to facilitate this (i.e. a ctx pointer, and
perhaps the stamp as well to reduce pointer-chasing). That should be
fine since it lives on the stack.
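Roughly what I have in mind is the following (untested sketch only, not
against any particular tree; the ww_ctx field and the stamp_after() and
ww_mutex_add_waiter() helpers are invented names for illustration, and
locking/hand-off details are omitted):

struct mutex_waiter {
	struct list_head	list;
	struct task_struct	*task;
#ifdef CONFIG_DEBUG_MUTEXES
	void			*magic;
#endif
	/* New: only set for ww_mutex waiters, NULL otherwise. */
	struct ww_acquire_ctx	*ww_ctx;
};

/* Wrap-safe check whether ctx a was started after ctx b. */
static bool stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
{
	return (long)(a->stamp - b->stamp) > 0;
}

/*
 * Enqueue in stamp order (earliest stamp first) instead of FIFO, and
 * don't enqueue at all when we can already see that we'd lose against
 * the current holder: if the holder's stamp is earlier than ours, we
 * are the one that has to back off, so return -EDEADLK before ever
 * touching the wait list.
 *
 * Caller must hold ww->base.wait_lock.
 */
static int ww_mutex_add_waiter(struct ww_mutex *ww,
			       struct mutex_waiter *waiter,
			       struct ww_acquire_ctx *ctx)
{
	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
	struct mutex_waiter *cur;

	if (hold_ctx && stamp_after(ctx, hold_ctx))
		return -EDEADLK;

	waiter->ww_ctx = ctx;

	/* Insert before the first waiter with a later stamp. */
	list_for_each_entry(cur, &ww->base.wait_list, list) {
		if (cur->ww_ctx && stamp_after(cur->ww_ctx, ctx)) {
			list_add_tail(&waiter->list, &cur->list);
			return 0;
		}
	}

	list_add_tail(&waiter->list, &ww->base.wait_list);
	return 0;
}

With the list kept in stamp order like this, the unlock path could wake
only the waiter at the head of the list instead of the whole list, which
should avoid the stampede.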
In the meantime, I'd appreciate it if patch #1 could be accepted as-is
for stable updates to <= 4.8. It fixes a real (if rare) bug, and the
stampede inefficiency isn't a problem in practice, at least for GPU
applications.

Thanks,
Nicolai

>
> Without thinking it through in detail, this is a PI issue, except that
> we replace boosting with wakeup & back-off. Could we perhaps steal
> something from rt mutexes to make it fair & efficient?
> -Daniel
>