From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756628AbcKWNmx (ORCPT );
	Wed, 23 Nov 2016 08:42:53 -0500
Received: from mblankhorst.nl ([141.105.120.124]:37266 "EHLO mblankhorst.nl"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756602AbcKWNmw (ORCPT );
	Wed, 23 Nov 2016 08:42:52 -0500
X-Greylist: delayed 517 seconds by postgrey-1.27 at vger.kernel.org;
	Wed, 23 Nov 2016 08:42:52 EST
Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes
To: Peter Zijlstra , Nicolai Hähnle , Nicolai Hähnle ,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	Ingo Molnar , stable@vger.kernel.org, Maarten Lankhorst
References: <1479900325-28358-1-git-send-email-nhaehnle@gmail.com>
	<20161123130046.GS3092@twins.programming.kicks-ass.net>
	<20161123130848.q6yw73fjdhttmbqh@phenom.ffwll.local>
	<20161123131142.g53re5uznh6jqtk6@phenom.ffwll.local>
From: Maarten Lankhorst
Message-ID:
Date: Wed, 23 Nov 2016 14:33:54 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <20161123131142.g53re5uznh6jqtk6@phenom.ffwll.local>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 23-11-16 at 14:11, Daniel Vetter wrote:
> On Wed, Nov 23, 2016 at 02:08:48PM +0100, Daniel Vetter wrote:
>> On Wed, Nov 23, 2016 at 02:00:46PM +0100, Peter Zijlstra wrote:
>>> On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
>>>> From: Nicolai Hähnle
>>>>
>>>> Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
>>>> the following example. Acquire context stamps are ordered like the thread
>>>> numbers, i.e. thread #1 should back off when it encounters a mutex locked
>>>> by thread #0 etc.
>>>>
>>>> Thread #0    Thread #1    Thread #2    Thread #3
>>>> ---------    ---------    ---------    ---------
>>>>                                        lock(ww)
>>>>                                        success
>>>>              lock(ww')
>>>>              success
>>>>                           lock(ww)
>>>>              lock(ww)        .
>>>>                 .            .         unlock(ww) part 1
>>>> lock(ww)        .            .            .
>>>> success         .            .            .
>>>>                 .            .         unlock(ww) part 2
>>>>                 .         back off
>>>> lock(ww')       .
>>>>    .            .
>>>> (stuck)      (stuck)
>>>>
>>>> Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
>>>> (without being protected by lock->base.wait_lock), meaning that thread #0
>>>> can acquire ww in the fast path or, much more likely, the medium path
>>>> in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
>>>> won't wake up any of the waiters in ww_mutex_set_context_fastpath.
>>>>
>>>> Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
>>>> thread #2, since waiters are added at the tail. Thread #2 wakes up and
>>>> backs off since it sees ww owned by a context with a lower stamp.
>>>>
>>>> Meanwhile, thread #1 is never woken up, and so it won't back off its lock
>>>> on ww'. So thread #0 gets stuck waiting for ww' to be released.
>>>>
>>>> This patch fixes the deadlock by waking up all waiters in the slow path
>>>> of ww_mutex_unlock.
>>>>
>>>> We have an internal test case for amdgpu which continuously submits
>>>> command streams from tens of threads, where all command streams reference
>>>> hundreds of GPU buffer objects with a lot of overlap in the buffer lists
>>>> between command streams.
>>>> This test reliably caused a deadlock, and while I haven't completely
>>>> confirmed that it is exactly the scenario outlined above, this patch
>>>> does fix the test case.
>>>>
>>>> v2:
>>>> - use wake_q_add
>>>> - add additional explanations
>>>>
>>>> Cc: Peter Zijlstra
>>>> Cc: Ingo Molnar
>>>> Cc: Chris Wilson
>>>> Cc: Maarten Lankhorst
>>>> Cc: dri-devel@lists.freedesktop.org
>>>> Cc: stable@vger.kernel.org
>>>> Reviewed-by: Christian König (v1)
>>>> Signed-off-by: Nicolai Hähnle
>>> Completely and utterly fails to apply; I think this patch is based on
>>> code prior to the mutex rewrite.
>>>
>>> Please rebase on tip/locking/core.
>>>
>>> Also, is this a regression, or has this been a 'feature' of the ww_mutex
>>> code from early on?
>> Sorry forgot to mention that, but I checked. Seems to have been broken
>> since day 1, at least looking at the original code the wake-single-waiter
>> stuff is as old as the mutex code added in 2006.
> More details: For gpu drivers this was originally working, since the
> ww_mutex implementation in ttm did use wake_up_all. So need to add a
>
> Fixes: 5e338405119a ("drm/ttm: convert to the reservation api")

But it's broken in the original kernel ww_mutex implementation, I guess it
doesn't matter much since ttm was the first in-kernel user anyway. :)

~Maarten
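
A rough sketch of what "waking up all waiters in the slow path of
ww_mutex_unlock" can look like is given below. This is not Nicolai's actual
patch: the function name __ww_mutex_unlock_slowpath() is made up for
illustration, the split between "part 1" and "part 2" of the real unlock
path is simplified away, and the helper and field names (WAKE_Q(),
mutex_waiter, lock->base.count) are assumed to follow the pre-4.10,
pre-rewrite mutex code and may not match it exactly. Only wake_q_add() and
wake_up_q() are taken as given from the kernel's wake-queue API.

#include <linux/ww_mutex.h>
#include <linux/sched.h>	/* wake_q_add(), wake_up_q(), WAKE_Q() */

/*
 * Illustration only; not the actual patch.  In the real code "part 1"
 * (setting lock->base.count to 1) happens outside wait_lock, which is
 * precisely what opens the race described in the commit message; here
 * everything is done under wait_lock for simplicity.
 */
static void __ww_mutex_unlock_slowpath(struct ww_mutex *lock)
{
	struct mutex *base = &lock->base;
	struct mutex_waiter *cur;
	unsigned long flags;
	WAKE_Q(wake_q);		/* spelled DEFINE_WAKE_Q() in later kernels */

	spin_lock_irqsave(&base->wait_lock, flags);

	/* mark the mutex free ("part 1" in the example above) */
	atomic_set(&base->count, 1);

	/*
	 * Queue *every* waiter for wakeup instead of only the head of
	 * wait_list ("part 2").  A waiter that has to back off, like
	 * thread #1 which holds ww', now gets a chance to run and
	 * release the lock that thread #0 is blocked on.
	 */
	list_for_each_entry(cur, &base->wait_list, list)
		wake_q_add(&wake_q, cur->task);

	spin_unlock_irqrestore(&base->wait_lock, flags);

	/* issue the actual wakeups after dropping wait_lock */
	wake_up_q(&wake_q);
}

The design point is simply that waking only the first waiter assumes that
waiter will eventually pass the lock on to the next one; with ww_mutex
back-off, a sleeping waiter may instead need to drop other locks it holds,
so every waiter has to be given a chance to observe the new owner's stamp.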