From: Daniel Vetter <daniel@ffwll.ch>
To: "Nicolai Hähnle" <nhaehnle@gmail.com>
Cc: linux-kernel@vger.kernel.org,
	"Nicolai Hähnle" <nicolai.haehnle@amd.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	dri-devel@lists.freedesktop.org, "Ingo Molnar" <mingo@redhat.com>,
	stable@vger.kernel.org,
	"Maarten Lankhorst" <maarten.lankhorst@canonical.com>
Subject: Re: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes
Date: Wed, 23 Nov 2016 13:50:52 +0100	[thread overview]
Message-ID: <20161123125052.ldewdgng7vvupve6@phenom.ffwll.local> (raw)
In-Reply-To: <1479900325-28358-1-git-send-email-nhaehnle@gmail.com>

On Wed, Nov 23, 2016 at 12:25:22PM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
> 
> Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
> the following example. Acquire context stamps are ordered like the thread
> numbers, i.e. thread #1 should back off when it encounters a mutex locked
> by thread #0 etc.
> 
> Thread #0    Thread #1    Thread #2    Thread #3
> ---------    ---------    ---------    ---------
>                                        lock(ww)
>                                        success
>              lock(ww')
>              success
>                           lock(ww)
>              lock(ww)        .
>                 .            .         unlock(ww) part 1
> lock(ww)        .            .            .
> success         .            .            .
>                 .            .         unlock(ww) part 2
>                 .         back off
> lock(ww')       .
>    .            .
> (stuck)      (stuck)
> 
> Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
> (without being protected by lock->base.wait_lock), meaning that thread #0
> can acquire ww in the fast path or, much more likely, the medium path
> in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
> won't wake up any of the waiters in ww_mutex_set_context_fastpath.
> 
> Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww. This is
> thread #2, since waiters are added at the tail. Thread #2 wakes up and
> backs off since it sees ww owned by a context with a lower stamp.
> 
> Meanwhile, thread #1 is never woken up, and so it won't back off its lock
> on ww'. So thread #0 gets stuck waiting for ww' to be released.
> 
> This patch fixes the deadlock by waking up all waiters in the slow path
> of ww_mutex_unlock.
> 
> We have an internal test case for amdgpu which continuously submits
> command streams from tens of threads, where all command streams reference
> hundreds of GPU buffer objects with a lot of overlap in the buffer lists
> between command streams. This test reliably caused a deadlock, and while I
> haven't completely confirmed that it is exactly the scenario outlined
> above, this patch does fix the test case.
> 
> v2:
> - use wake_q_add
> - add additional explanations
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: stable@vger.kernel.org
> Reviewed-by: Christian König <christian.koenig@amd.com> (v1)
> Signed-off-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
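
As a refresher on the API the scenario above exercises, here is a
minimal, hand-rolled sketch of the usual two-lock acquire/backoff dance,
in the spirit of the examples in Documentation/locking/ww-mutex-design.txt
(demo_ww_class and lock_two() are made-up names; this is not code from
the patch):

	#include <linux/kernel.h>
	#include <linux/ww_mutex.h>

	static DEFINE_WW_CLASS(demo_ww_class);

	static void lock_two(struct ww_mutex *a, struct ww_mutex *b)
	{
		struct ww_acquire_ctx ctx;
		int ret;

		ww_acquire_init(&ctx, &demo_ww_class);

		/* first lock of a fresh ctx: we hold nothing yet, so just wait */
		ret = ww_mutex_lock(a, &ctx);

		while ((ret = ww_mutex_lock(b, &ctx)) == -EDEADLK) {
			/*
			 * b is owned by a lower-stamp context: back off.
			 * Drop a so the older context can make progress,
			 * wait for b without the deadlock check, then
			 * retry with the contended lock taken first.
			 * A sleeping waiter only gets here after it has
			 * been woken up and has re-checked the stamp --
			 * the step thread #1 above never reaches.
			 */
			ww_mutex_unlock(a);
			ww_mutex_lock_slow(b, &ctx);
			swap(a, b);
		}
		ww_acquire_done(&ctx);

		/* ... critical section using both objects ... */

		ww_mutex_unlock(a);
		ww_mutex_unlock(b);
		ww_acquire_fini(&ctx);
	}

The whole scheme relies on contended waiters actually being woken up so
they can re-run the stamp check and release their other locks.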

Yeah, when the owning ctx changes we need to wake up all waiters, to make
sure we catch all (new) deadlock scenarios. I tried poking at your
example, and I think it's solid and can't be minimized any further. I'm
not too familiar with the mutex.c code itself, but the changes look
reasonable. With that caveat:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Cheers, Daniel

> ---
>  kernel/locking/mutex.c | 33 +++++++++++++++++++++++++++++----
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index a70b90d..7fbf9b4 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -409,6 +409,9 @@ static bool mutex_optimistic_spin(struct mutex *lock,
>  __visible __used noinline
>  void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
>  
> +static __used noinline
> +void __sched __mutex_unlock_slowpath_wakeall(atomic_t *lock_count);
> +
>  /**
>   * mutex_unlock - release the mutex
>   * @lock: the mutex to be released
> @@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
>  	 */
>  	mutex_clear_owner(&lock->base);
>  #endif
> -	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
> +	/*
> +	 * A previously _not_ waiting task may acquire the lock via the fast
> +	 * path during our unlock. In that case, already waiting tasks may have
> +	 * to back off to avoid a deadlock. Wake up all waiters so that they
> +	 * can check their acquire context stamp against the new owner.
> +	 */
> +	__mutex_fastpath_unlock(&lock->base.count,
> +				__mutex_unlock_slowpath_wakeall);
>  }
>  EXPORT_SYMBOL(ww_mutex_unlock);
>  
> @@ -716,7 +726,7 @@ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
>   * Release the lock, slowpath:
>   */
>  static inline void
> -__mutex_unlock_common_slowpath(struct mutex *lock, int nested)
> +__mutex_unlock_common_slowpath(struct mutex *lock, int nested, int wake_all)
>  {
>  	unsigned long flags;
>  	WAKE_Q(wake_q);
> @@ -740,7 +750,14 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
>  	mutex_release(&lock->dep_map, nested, _RET_IP_);
>  	debug_mutex_unlock(lock);
>  
> -	if (!list_empty(&lock->wait_list)) {
> +	if (wake_all) {
> +		struct mutex_waiter *waiter;
> +
> +		list_for_each_entry(waiter, &lock->wait_list, list) {
> +			debug_mutex_wake_waiter(lock, waiter);
> +			wake_q_add(&wake_q, waiter->task);
> +		}
> +	} else if (!list_empty(&lock->wait_list)) {
>  		/* get the first entry from the wait-list: */
>  		struct mutex_waiter *waiter =
>  				list_entry(lock->wait_list.next,
> @@ -762,7 +779,15 @@ __mutex_unlock_slowpath(atomic_t *lock_count)
>  {
>  	struct mutex *lock = container_of(lock_count, struct mutex, count);
>  
> -	__mutex_unlock_common_slowpath(lock, 1);
> +	__mutex_unlock_common_slowpath(lock, 1, 0);
> +}
> +
> +static void
> +__mutex_unlock_slowpath_wakeall(atomic_t *lock_count)
> +{
> +	struct mutex *lock = container_of(lock_count, struct mutex, count);
> +
> +	__mutex_unlock_common_slowpath(lock, 1, 1);
>  }
>  
>  #ifndef CONFIG_DEBUG_LOCK_ALLOC
> -- 
> 2.7.4
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
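
For context on the wake-all hunk above: wake_q_add() only queues a task
(and takes a reference on it) while the wait_lock is still held; the
actual wakeups are issued later by wake_up_q(), after the spinlock has
been dropped. A stand-alone sketch of that batching pattern (struct
demo_waiter and wake_all() are made-up names, not kernel code):

	#include <linux/list.h>
	#include <linux/sched.h>
	#include <linux/spinlock.h>

	struct demo_waiter {
		struct list_head list;
		struct task_struct *task;
	};

	static void wake_all(spinlock_t *lock, struct list_head *waiters)
	{
		WAKE_Q(wake_q);
		struct demo_waiter *w;

		spin_lock(lock);
		list_for_each_entry(w, waiters, list)
			wake_q_add(&wake_q, w->task);	/* queue only, no wakeup yet */
		spin_unlock(lock);

		/* the real wake_up_process() calls, lock already dropped */
		wake_up_q(&wake_q);
	}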

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
