linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes
@ 2016-11-28 12:20 Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context Nicolai Hähnle
                   ` (12 more replies)
  0 siblings, 13 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel

It turns out that the deadlock that I found last week was already implicitly
fixed during the lock->owner redesign, by checking the WAITERS bit in the
w/w lock fast path. However, since I had already started looking into
sorting the wait list, here goes.
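
As a quick recap of the caller-side pattern this builds on (an illustrative
sketch only, using made-up objects a and b that each embed a ww_mutex named
lock, and a class my_ww_class), the stamps and the back-off below come from
the usual acquire-context retry loop:

	struct ww_acquire_ctx ctx;
	int ret;

	ww_acquire_init(&ctx, &my_ww_class);

	ret = ww_mutex_lock(&a->lock, &ctx); /* acquired == 0, no -EDEADLK yet */
	if (!ret)
		ret = ww_mutex_lock(&b->lock, &ctx);
	if (ret == -EDEADLK) {
		/*
		 * A context with an earlier stamp holds b while we already
		 * hold a, so we back off: drop our lock, wait for b, retry.
		 */
		ww_mutex_unlock(&a->lock);
		ww_mutex_lock_slow(&b->lock, &ctx);
		ret = ww_mutex_lock(&a->lock, &ctx);
		/* (may have to back off yet again; error handling omitted) */
	}
	ww_acquire_done(&ctx);
	/* ... use the resources ... */
	ww_mutex_unlock(&a->lock);
	ww_mutex_unlock(&b->lock);
	ww_acquire_fini(&ctx);

-EDEADLK is only ever returned to a context that already holds other locks
(ctx->acquired > 0), which is the "has to check for back off" state referred
to in point 2 below.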

The basic idea is to make sure that:

1. All waiters that have a ww_ctx appear in stamp order in the wait list.
Waiters without a ww_ctx are still supported and appear in FIFO order as
before.

2. At most one of the waiters can be in a state where it has to check for
back off (i.e., ww_ctx->acquired > 0). Technically, there are short time
windows in which more than one such waiter can be on the list, but all
but the first one are running. This happens when a new waiter with
ww_ctx->acquired > 0 adds itself at the front of the list and wakes up the
previous head of the list, and of course multiple such chained cases can
be in flight simultaneously.

Then we only ever have to wake up one task at a time. This is _not_ always
the head of the wait list, since there may be waiters without a context. But
among waiters with a context, we only ever have to wake the first one.

To achieve all this, the series adds a new field to mutex_waiter which is
only used for the w/w lock case. As a consequence, calling mutex_lock
directly on w/w locks is now definitely incorrect. That was likely the
intention previously anyway, but grepping through the source I did find one
place that had slipped through.

I've included timings taken from a contention-heavy stress test with some of
the patches. The stress test performs actual GPU operations which take a
good chunk of the wall time, but even so, the series still manages to
improve the wall time quite a bit.

Cheers,
Nicolai


* [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-28 12:42   ` Maarten Lankhorst
  2016-11-28 12:58   ` Christian König
  2016-11-28 12:20 ` [PATCH 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop Nicolai Hähnle
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 drivers/gpu/drm/vgem/vgem_fence.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c
index 488909a..e1d516f 100644
--- a/drivers/gpu/drm/vgem/vgem_fence.c
+++ b/drivers/gpu/drm/vgem/vgem_fence.c
@@ -191,12 +191,12 @@ int vgem_fence_attach_ioctl(struct drm_device *dev,
 
 	/* Expose the fence via the dma-buf */
 	ret = 0;
-	mutex_lock(&resv->lock.base);
+	ww_mutex_lock(&resv->lock.base, NULL);
 	if (arg->flags & VGEM_FENCE_WRITE)
 		reservation_object_add_excl_fence(resv, fence);
 	else if ((ret = reservation_object_reserve_shared(resv)) == 0)
 		reservation_object_add_shared_fence(resv, fence);
-	mutex_unlock(&resv->lock.base);
+	ww_mutex_unlock(&resv->lock.base);
 
 	/* Record the fence in our idr for later signaling */
 	if (ret == 0) {
-- 
2.7.4


* [PATCH 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 03/11] locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after Nicolai Hähnle
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

In the following scenario, thread #1 should back off its attempt to lock
ww1 and unlock ww2 (assuming the acquire context stamps are ordered
accordingly).

    Thread #0               Thread #1
    ---------               ---------
                            successfully lock ww2
    set ww1->base.owner
                            attempt to lock ww1
                            confirm ww1->ctx == NULL
                            enter mutex_spin_on_owner
    set ww1->ctx

What was likely to happen previously is:

    attempt to lock ww2
    refuse to spin because
      ww2->ctx != NULL
    schedule()
                            detect thread #0 is off CPU
                            stop optimistic spin
                            return -EDEADLK
                            unlock ww2
                            wakeup thread #0
    lock ww2

Now, we are more likely to see:

                            detect ww1->ctx != NULL
                            stop optimistic spin
                            return -EDEADLK
                            unlock ww2
    successfully lock ww2

... because thread #1 will stop its optimistic spin as soon as possible.

The whole scenario is quite unlikely, since it requires thread #1 to get
between thread #0 setting the owner and setting the ctx. But since we're
idling here anyway, the additional check is basically free.

Found by inspection.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 kernel/locking/mutex.c | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 9b34961..0afa998 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -350,7 +350,8 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
  * access and not reliable.
  */
 static noinline
-bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
+			 bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
 {
 	bool ret = true;
 
@@ -373,6 +374,28 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 			break;
 		}
 
+		if (use_ww_ctx && ww_ctx->acquired > 0) {
+			struct ww_mutex *ww;
+
+			ww = container_of(lock, struct ww_mutex, base);
+
+			/*
+			 * If ww->ctx is set the contents are undefined, only
+			 * by acquiring wait_lock there is a guarantee that
+			 * they are not invalid when reading.
+			 *
+			 * As such, when deadlock detection needs to be
+			 * performed the optimistic spinning cannot be done.
+			 *
+			 * Check this in every inner iteration because we may
+			 * be racing against another thread's ww_mutex_lock.
+			 */
+			if (READ_ONCE(ww->ctx)) {
+				ret = false;
+				break;
+			}
+		}
+
 		cpu_relax();
 	}
 	rcu_read_unlock();
@@ -460,22 +483,6 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 	for (;;) {
 		struct task_struct *owner;
 
-		if (use_ww_ctx && ww_ctx->acquired > 0) {
-			struct ww_mutex *ww;
-
-			ww = container_of(lock, struct ww_mutex, base);
-			/*
-			 * If ww->ctx is set the contents are undefined, only
-			 * by acquiring wait_lock there is a guarantee that
-			 * they are not invalid when reading.
-			 *
-			 * As such, when deadlock detection needs to be
-			 * performed the optimistic spinning cannot be done.
-			 */
-			if (READ_ONCE(ww->ctx))
-				goto fail_unlock;
-		}
-
 		/*
 		 * If there's an owner, wait for it to either
 		 * release the lock or go to sleep.
@@ -487,7 +494,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 				break;
 			}
 
-			if (!mutex_spin_on_owner(lock, owner))
+			if (!mutex_spin_on_owner(lock, owner, use_ww_ctx,
+						 ww_ctx))
 				goto fail_unlock;
 		}
 
-- 
2.7.4


* [PATCH 03/11] locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context Nicolai Hähnle
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

The function will be re-used in subsequent patches.
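
For reference, the comparison being extracted is wrap-around safe over the
unsigned stamp counter; with purely illustrative values:

	/*
	 * __ww_mutex_stamp_after(a, b): "a's stamp is younger than b's"
	 *
	 *   a->stamp = 5, b->stamp = 3         -> 5 - 3 = 2             -> after
	 *   a->stamp = 3, b->stamp = 5         -> 3 - 5 = ULONG_MAX - 1 -> not after
	 *   a->stamp = 2, b->stamp = ULONG_MAX -> 2 - ULONG_MAX = 3     -> after (wrapped)
	 *
	 * If two stamps ever compare equal, the context pointers break the
	 * tie so that the ordering stays total.
	 */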

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 kernel/locking/mutex.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0afa998..200629a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -277,6 +277,13 @@ static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
 	ww_ctx->acquired++;
 }
 
+static inline bool __sched
+__ww_mutex_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
+{
+	return a->stamp - b->stamp <= LONG_MAX &&
+	       (a->stamp != b->stamp || a > b);
+}
+
 /*
  * After acquiring lock with fastpath or when we lost out in contested
  * slowpath, set ctx and wake up any waiters so they can recheck.
@@ -610,8 +617,7 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
 	if (!hold_ctx)
 		return 0;
 
-	if (ctx->stamp - hold_ctx->stamp <= LONG_MAX &&
-	    (ctx->stamp != hold_ctx->stamp || ctx > hold_ctx)) {
+	if (__ww_mutex_stamp_after(ctx, hold_ctx)) {
 #ifdef CONFIG_DEBUG_MUTEXES
 		DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
 		ctx->contending_lock = ww;
-- 
2.7.4


* [PATCH 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (2 preceding siblings ...)
  2016-11-28 12:20 ` [PATCH 03/11] locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 05/11] locking/ww_mutex: Add waiters in stamp order Nicolai Hähnle
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

We will add a new field to struct mutex_waiter.  This field must be
initialized for all waiters if any waiter uses the use_ww_ctx path.

So there is a trade-off: Keep ww_mutex locking without a context on the
faster non-use_ww_ctx path, at the cost of adding the initialization to all
mutex locks (including non-ww_mutexes), or avoid the additional cost for
non-ww_mutex locks, at the cost of adding additional checks to the
use_ww_ctx path.

We take the latter choice.  It may be worth eliminating the users of
ww_mutex_lock(lock, NULL), but there are a lot of them.

Move the declaration of ww in __mutex_lock_common closer to its uses
because gcc otherwise (incorrectly) believes ww may be used without
initialization.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 include/linux/ww_mutex.h | 11 ++---------
 kernel/locking/mutex.c   | 37 +++++++++++++++++++++++++------------
 2 files changed, 27 insertions(+), 21 deletions(-)

diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 2bb5deb..a5960e5 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -222,11 +222,7 @@ extern int __must_check __ww_mutex_lock_interruptible(struct ww_mutex *lock,
  */
 static inline int ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
-	if (ctx)
-		return __ww_mutex_lock(lock, ctx);
-
-	mutex_lock(&lock->base);
-	return 0;
+	return __ww_mutex_lock(lock, ctx);
 }
 
 /**
@@ -262,10 +258,7 @@ static inline int ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ct
 static inline int __must_check ww_mutex_lock_interruptible(struct ww_mutex *lock,
 							   struct ww_acquire_ctx *ctx)
 {
-	if (ctx)
-		return __ww_mutex_lock_interruptible(lock, ctx);
-	else
-		return mutex_lock_interruptible(&lock->base);
+	return __ww_mutex_lock_interruptible(lock, ctx);
 }
 
 /**
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 200629a..585627f 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -381,7 +381,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
 			break;
 		}
 
-		if (use_ww_ctx && ww_ctx->acquired > 0) {
+		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
 			struct ww_mutex *ww;
 
 			ww = container_of(lock, struct ww_mutex, base);
@@ -640,10 +640,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	struct mutex_waiter waiter;
 	unsigned long flags;
 	bool first = false;
-	struct ww_mutex *ww;
 	int ret;
 
-	if (use_ww_ctx) {
+	if (use_ww_ctx && ww_ctx) {
+		struct ww_mutex *ww;
+
 		ww = container_of(lock, struct ww_mutex, base);
 		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
 			return -EALREADY;
@@ -656,8 +657,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	    mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, false)) {
 		/* got the lock, yay! */
 		lock_acquired(&lock->dep_map, ip);
-		if (use_ww_ctx)
+		if (use_ww_ctx && ww_ctx) {
+			struct ww_mutex *ww;
+
+			ww = container_of(lock, struct ww_mutex, base);
 			ww_mutex_set_context_fastpath(ww, ww_ctx);
+		}
 		preempt_enable();
 		return 0;
 	}
@@ -702,7 +707,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			goto err;
 		}
 
-		if (use_ww_ctx && ww_ctx->acquired > 0) {
+		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
 			ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
 			if (ret)
 				goto err;
@@ -742,8 +747,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
 
-	if (use_ww_ctx)
+	if (use_ww_ctx && ww_ctx) {
+		struct ww_mutex *ww;
+
+		ww = container_of(lock, struct ww_mutex, base);
 		ww_mutex_set_context_slowpath(ww, ww_ctx);
+	}
 
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	preempt_enable();
@@ -830,8 +839,9 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 
 	might_sleep();
 	ret =  __mutex_lock_common(&lock->base, TASK_UNINTERRUPTIBLE,
-				   0, &ctx->dep_map, _RET_IP_, ctx, 1);
-	if (!ret && ctx->acquired > 1)
+				   0, ctx ? &ctx->dep_map : NULL, _RET_IP_,
+				   ctx, 1);
+	if (!ret && ctx && ctx->acquired > 1)
 		return ww_mutex_deadlock_injection(lock, ctx);
 
 	return ret;
@@ -845,9 +855,10 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 
 	might_sleep();
 	ret = __mutex_lock_common(&lock->base, TASK_INTERRUPTIBLE,
-				  0, &ctx->dep_map, _RET_IP_, ctx, 1);
+				  0, ctx ? &ctx->dep_map : NULL, _RET_IP_,
+				  ctx, 1);
 
-	if (!ret && ctx->acquired > 1)
+	if (!ret && ctx && ctx->acquired > 1)
 		return ww_mutex_deadlock_injection(lock, ctx);
 
 	return ret;
@@ -1034,7 +1045,8 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	might_sleep();
 
 	if (__mutex_trylock_fast(&lock->base)) {
-		ww_mutex_set_context_fastpath(lock, ctx);
+		if (ctx)
+			ww_mutex_set_context_fastpath(lock, ctx);
 		return 0;
 	}
 
@@ -1048,7 +1060,8 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	might_sleep();
 
 	if (__mutex_trylock_fast(&lock->base)) {
-		ww_mutex_set_context_fastpath(lock, ctx);
+		if (ctx)
+			ww_mutex_set_context_fastpath(lock, ctx);
 		return 0;
 	}
 
-- 
2.7.4


* [PATCH 05/11] locking/ww_mutex: Add waiters in stamp order
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (3 preceding siblings ...)
  2016-11-28 12:20 ` [PATCH 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-30 14:10   ` Chris Wilson
  2016-11-28 12:20 ` [PATCH 06/11] locking/ww_mutex: Notify waiters that have to back off while adding tasks to wait list Nicolai Hähnle
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

Add regular waiters in stamp order. Keep adding waiters that have no
context in FIFO order and take care not to starve them.

While adding our task as a waiter, back off if we detect that there is a
waiter with a lower stamp in front of us.

Make sure to call lock_contended even when we back off early.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 include/linux/mutex.h  |  3 ++
 kernel/locking/mutex.c | 76 +++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 72 insertions(+), 7 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index b97870f..118a3b6 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -20,6 +20,8 @@
 #include <linux/osq_lock.h>
 #include <linux/debug_locks.h>
 
+struct ww_acquire_ctx;
+
 /*
  * Simple, straightforward mutexes with strict semantics:
  *
@@ -75,6 +77,7 @@ static inline struct task_struct *__mutex_owner(struct mutex *lock)
 struct mutex_waiter {
 	struct list_head	list;
 	struct task_struct	*task;
+	struct ww_acquire_ctx	*ww_ctx;
 #ifdef CONFIG_DEBUG_MUTEXES
 	void			*magic;
 #endif
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 585627f..01dcae7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -628,6 +628,40 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
 	return 0;
 }
 
+static inline int __sched
+__ww_mutex_add_waiter(struct mutex_waiter *waiter,
+		      struct mutex *lock,
+		      struct ww_acquire_ctx *ww_ctx)
+{
+	if (ww_ctx) {
+		struct mutex_waiter *cur;
+
+		/*
+		 * Add the waiter before the first waiter with a higher stamp.
+		 * Waiters without a context are skipped to avoid starving
+		 * them.
+		 */
+		list_for_each_entry(cur, &lock->wait_list, list) {
+			if (!cur->ww_ctx)
+				continue;
+
+			if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
+				/* Back off immediately if necessary. */
+				if (ww_ctx->acquired > 0)
+					return -EDEADLK;
+
+				continue;
+			}
+
+			list_add_tail(&waiter->list, &cur->list);
+			return 0;
+		}
+	}
+
+	list_add_tail(&waiter->list, &lock->wait_list);
+	return 0;
+}
+
 /*
  * Lock a mutex (possibly interruptible), slowpath:
  */
@@ -677,15 +711,25 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	debug_mutex_lock_common(lock, &waiter);
 	debug_mutex_add_waiter(lock, &waiter, task);
 
-	/* add waiting tasks to the end of the waitqueue (FIFO): */
-	list_add_tail(&waiter.list, &lock->wait_list);
+	lock_contended(&lock->dep_map, ip);
+
+	if (!use_ww_ctx) {
+		/* add waiting tasks to the end of the waitqueue (FIFO): */
+		list_add_tail(&waiter.list, &lock->wait_list);
+	} else {
+		/* Add in stamp order, waking up waiters that must back off. */
+		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
+		if (ret)
+			goto err_early_backoff;
+
+		waiter.ww_ctx = ww_ctx;
+	}
+
 	waiter.task = task;
 
 	if (__mutex_waiter_is_first(lock, &waiter))
 		__mutex_set_flag(lock, MUTEX_FLAG_WAITERS);
 
-	lock_contended(&lock->dep_map, ip);
-
 	set_task_state(task, state);
 	for (;;) {
 		/*
@@ -693,8 +737,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * mutex_unlock() handing the lock off to us, do a trylock
 		 * before testing the error conditions to make sure we pick up
 		 * the handoff.
+		 *
+		 * For w/w locks, we always need to do this even if we're not
+		 * currently the first waiter, because we may have been the
+		 * first waiter during the unlock.
 		 */
-		if (__mutex_trylock(lock, first))
+		if (__mutex_trylock(lock, use_ww_ctx || first))
 			goto acquired;
 
 		/*
@@ -716,7 +764,20 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
 
-		if (!first && __mutex_waiter_is_first(lock, &waiter)) {
+		if (use_ww_ctx && ww_ctx) {
+			/*
+			 * Always re-check whether we're in first position. We
+			 * don't want to spin if another task with a lower
+			 * stamp has taken our position.
+			 *
+			 * We also may have to set the handoff flag again, if
+			 * our position at the head was temporarily taken away.
+			 */
+			first = __mutex_waiter_is_first(lock, &waiter);
+
+			if (first)
+				__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
+		} else if (!first && __mutex_waiter_is_first(lock, &waiter)) {
 			first = true;
 			__mutex_set_flag(lock, MUTEX_FLAG_HANDOFF);
 		}
@@ -728,7 +789,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * or we must see its unlock and acquire.
 		 */
 		if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
-		     __mutex_trylock(lock, first))
+		     __mutex_trylock(lock, use_ww_ctx || first))
 			break;
 
 		spin_lock_mutex(&lock->wait_lock, flags);
@@ -761,6 +822,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 err:
 	__set_task_state(task, TASK_RUNNING);
 	mutex_remove_waiter(lock, &waiter, task);
+err_early_backoff:
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, 1, ip);
-- 
2.7.4


* [PATCH 06/11] locking/ww_mutex: Notify waiters that have to back off while adding tasks to wait list
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (4 preceding siblings ...)
  2016-11-28 12:20 ` [PATCH 05/11] locking/ww_mutex: Add waiters in stamp order Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 07/11] locking/ww_mutex: Wake at most one waiter for back off when acquiring the lock Nicolai Hähnle
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

While adding our task as a waiter, detect if another task should back off
because of us.

With this patch, we establish the invariant that the wait list contains
at most one (sleeping) waiter with ww_ctx->acquired > 0, and this waiter
will be the first waiter with a context.

Since only waiters with ww_ctx->acquired > 0 have to back off, this allows
us to be much more economical with wakeups.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 kernel/locking/mutex.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 01dcae7..d310703e 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -609,23 +609,34 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 EXPORT_SYMBOL(ww_mutex_unlock);
 
 static inline int __sched
-__ww_mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
+__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
+			    struct ww_acquire_ctx *ctx)
 {
 	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
+	struct mutex_waiter *cur;
 
-	if (!hold_ctx)
-		return 0;
+	if (hold_ctx && __ww_mutex_stamp_after(ctx, hold_ctx))
+		goto deadlock;
 
-	if (__ww_mutex_stamp_after(ctx, hold_ctx)) {
-#ifdef CONFIG_DEBUG_MUTEXES
-		DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
-		ctx->contending_lock = ww;
-#endif
-		return -EDEADLK;
+	/*
+	 * If there is a waiter in front of us that has a context, then its
+	 * stamp is earlier than ours and we must back off.
+	 */
+	cur = waiter;
+	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
+		if (cur->ww_ctx)
+			goto deadlock;
 	}
 
 	return 0;
+
+deadlock:
+#ifdef CONFIG_DEBUG_MUTEXES
+	DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
+	ctx->contending_lock = ww;
+#endif
+	return -EDEADLK;
 }
 
 static inline int __sched
@@ -654,6 +665,16 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 			}
 
 			list_add_tail(&waiter->list, &cur->list);
+
+			/*
+			 * Wake up the waiter so that it gets a chance to back
+			 * off.
+			 */
+			if (cur->ww_ctx->acquired > 0) {
+				debug_mutex_wake_waiter(lock, cur);
+				wake_up_process(cur->task);
+			}
+
 			return 0;
 		}
 	}
@@ -756,7 +777,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		}
 
 		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
-			ret = __ww_mutex_lock_check_stamp(lock, ww_ctx);
+			ret = __ww_mutex_lock_check_stamp(lock, &waiter,
+							  ww_ctx);
 			if (ret)
 				goto err;
 		}
-- 
2.7.4


* [PATCH 07/11] locking/ww_mutex: Wake at most one waiter for back off when acquiring the lock
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (5 preceding siblings ...)
  2016-11-28 12:20 ` [PATCH 06/11] locking/ww_mutex: Notify waiters that have to back off while adding tasks to wait list Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 08/11] locking/ww_mutex: Yield to other waiters from optimistic spin Nicolai Hähnle
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

The wait list is sorted by stamp order, and the only waiting task that may
have to back off is the first waiter with a context.

The regular slow path does not have to wake any other tasks at all, since
all other waiters that would have to back off were either woken up when
the waiter was added to the list, or detected the condition before they
added themselves.

Median timings taken of a contention-heavy GPU workload:

Without this series:
real    0m59.900s
user    0m7.516s
sys     2m16.076s

With changes up to and including this patch:
real    0m52.946s
user    0m7.272s
sys     1m55.964s

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 kernel/locking/mutex.c | 58 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index d310703e..0f6f1da 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -285,6 +285,35 @@ __ww_mutex_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 }
 
 /*
+ * Wake up any waiters that may have to back off when the lock is held by the
+ * given context.
+ *
+ * Due to the invariants on the wait list, this can only affect the first
+ * waiter with a context.
+ *
+ * Must be called with wait_lock held. The current task must not be on the
+ * wait list.
+ */
+static void __sched
+__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+{
+	struct mutex_waiter *cur;
+
+	list_for_each_entry(cur, &lock->wait_list, list) {
+		if (!cur->ww_ctx)
+			continue;
+
+		if (cur->ww_ctx->acquired > 0 &&
+		    __ww_mutex_stamp_after(cur->ww_ctx, ww_ctx)) {
+			debug_mutex_wake_waiter(lock, cur);
+			wake_up_process(cur->task);
+		}
+
+		break;
+	}
+}
+
+/*
  * After acquiring lock with fastpath or when we lost out in contested
  * slowpath, set ctx and wake up any waiters so they can recheck.
  */
@@ -293,7 +322,6 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
 			       struct ww_acquire_ctx *ctx)
 {
 	unsigned long flags;
-	struct mutex_waiter *cur;
 
 	ww_mutex_lock_acquired(lock, ctx);
 
@@ -319,16 +347,15 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock,
 	 * so they can see the new lock->ctx.
 	 */
 	spin_lock_mutex(&lock->base.wait_lock, flags);
-	list_for_each_entry(cur, &lock->base.wait_list, list) {
-		debug_mutex_wake_waiter(&lock->base, cur);
-		wake_up_process(cur->task);
-	}
+	__ww_mutex_wakeup_for_backoff(&lock->base, ctx);
 	spin_unlock_mutex(&lock->base.wait_lock, flags);
 }
 
 /*
- * After acquiring lock in the slowpath set ctx and wake up any
- * waiters so they can recheck.
+ * After acquiring lock in the slowpath set ctx.
+ *
+ * Unlike for the fast path, the caller ensures that waiters are woken up where
+ * necessary.
  *
  * Callers must hold the mutex wait_lock.
  */
@@ -336,19 +363,8 @@ static __always_inline void
 ww_mutex_set_context_slowpath(struct ww_mutex *lock,
 			      struct ww_acquire_ctx *ctx)
 {
-	struct mutex_waiter *cur;
-
 	ww_mutex_lock_acquired(lock, ctx);
 	lock->ctx = ctx;
-
-	/*
-	 * Give any possible sleeping processes the chance to wake up,
-	 * so they can recheck if they have to back off.
-	 */
-	list_for_each_entry(cur, &lock->base.wait_list, list) {
-		debug_mutex_wake_waiter(&lock->base, cur);
-		wake_up_process(cur->task);
-	}
 }
 
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
@@ -726,8 +742,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	/*
 	 * After waiting to acquire the wait_lock, try again.
 	 */
-	if (__mutex_trylock(lock, false))
+	if (__mutex_trylock(lock, false)) {
+		if (use_ww_ctx && ww_ctx)
+			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+
 		goto skip_wait;
+	}
 
 	debug_mutex_lock_common(lock, &waiter);
 	debug_mutex_add_waiter(lock, &waiter, task);
-- 
2.7.4


* [PATCH 08/11] locking/ww_mutex: Yield to other waiters from optimistic spin
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (6 preceding siblings ...)
  2016-11-28 12:20 ` [PATCH 07/11] locking/ww_mutex: Wake at most one waiter for back off when acquiring the lock Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 09/11] locking/mutex: Initialize mutex_waiter::ww_ctx with poison when debugging Nicolai Hähnle
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

Lock stealing is less beneficial for w/w mutexes since we may just end up
backing off if we stole from a thread with an earlier acquire stamp that
already holds another w/w mutex that we also need. So don't spin
optimistically unless we are sure that there is no other waiter that might
cause us to back off.

Median timings taken of a contention-heavy GPU workload:

Before:
real    0m52.946s
user    0m7.272s
sys     1m55.964s

After:
real    0m53.086s
user    0m7.360s
sys     1m46.204s

This particular workload still spends 20%-25% of CPU in mutex_spin_on_owner
according to perf, but my attempts to further reduce this spinning based on
various heuristics all lead to an increase in measured wall time despite
the decrease in sys time.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 kernel/locking/mutex.c | 48 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0f6f1da..1e4da21 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -374,7 +374,8 @@ ww_mutex_set_context_slowpath(struct ww_mutex *lock,
  */
 static noinline
 bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
-			 bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx)
+			 bool use_ww_ctx, struct ww_acquire_ctx *ww_ctx,
+			 struct mutex_waiter *waiter)
 {
 	bool ret = true;
 
@@ -397,7 +398,7 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
 			break;
 		}
 
-		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
+		if (use_ww_ctx && ww_ctx) {
 			struct ww_mutex *ww;
 
 			ww = container_of(lock, struct ww_mutex, base);
@@ -413,7 +414,30 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
 			 * Check this in every inner iteration because we may
 			 * be racing against another thread's ww_mutex_lock.
 			 */
-			if (READ_ONCE(ww->ctx)) {
+			if (ww_ctx->acquired > 0 && READ_ONCE(ww->ctx)) {
+				ret = false;
+				break;
+			}
+
+			/*
+			 * If we aren't on the wait list yet, cancel the spin
+			 * if there are waiters. We want  to avoid stealing the
+			 * lock from a waiter with an earlier stamp, since the
+			 * other thread may already own a lock that we also
+			 * need.
+			 */
+			if (!waiter &&
+			    (atomic_long_read(&lock->owner) &
+			     MUTEX_FLAG_WAITERS)) {
+				ret = false;
+				break;
+			}
+
+			/*
+			 * Similarly, stop spinning if we are no longer the
+			 * first waiter.
+			 */
+			if (waiter && !__mutex_waiter_is_first(lock, waiter)) {
 				ret = false;
 				break;
 			}
@@ -479,7 +503,8 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
  */
 static bool mutex_optimistic_spin(struct mutex *lock,
 				  struct ww_acquire_ctx *ww_ctx,
-				  const bool use_ww_ctx, const bool waiter)
+				  const bool use_ww_ctx,
+				  struct mutex_waiter *waiter)
 {
 	struct task_struct *task = current;
 
@@ -518,12 +543,12 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 			}
 
 			if (!mutex_spin_on_owner(lock, owner, use_ww_ctx,
-						 ww_ctx))
+						 ww_ctx, waiter))
 				goto fail_unlock;
 		}
 
 		/* Try to acquire the mutex if it is unlocked. */
-		if (__mutex_trylock(lock, waiter))
+		if (__mutex_trylock(lock, waiter != NULL))
 			break;
 
 		/*
@@ -565,7 +590,8 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 #else
 static bool mutex_optimistic_spin(struct mutex *lock,
 				  struct ww_acquire_ctx *ww_ctx,
-				  const bool use_ww_ctx, const bool waiter)
+				  const bool use_ww_ctx,
+				  struct mutex_waiter *waiter)
 {
 	return false;
 }
@@ -725,7 +751,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
 
 	if (__mutex_trylock(lock, false) ||
-	    mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, false)) {
+	    mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, NULL)) {
 		/* got the lock, yay! */
 		lock_acquired(&lock->dep_map, ip);
 		if (use_ww_ctx && ww_ctx) {
@@ -830,8 +856,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		 * state back to RUNNING and fall through the next schedule(),
 		 * or we must see its unlock and acquire.
 		 */
-		if ((first && mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true)) ||
-		     __mutex_trylock(lock, use_ww_ctx || first))
+		if ((first &&
+		     mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx,
+					   &waiter)) ||
+		    __mutex_trylock(lock, use_ww_ctx || first))
 			break;
 
 		spin_lock_mutex(&lock->wait_lock, flags);
-- 
2.7.4


* [PATCH 09/11] locking/mutex: Initialize mutex_waiter::ww_ctx with poison when debugging
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (7 preceding siblings ...)
  2016-11-28 12:20 ` [PATCH 08/11] locking/ww_mutex: Yield to other waiters from optimistic spin Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 10/11] Documentation/locking/ww_mutex: Update the design document Nicolai Hähnle
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

Help catch cases where mutex_lock is used directly on w/w mutexes, which
otherwise result in the w/w tasks reading uninitialized data.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 include/linux/poison.h | 1 +
 kernel/locking/mutex.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/include/linux/poison.h b/include/linux/poison.h
index 51334ed..a395403 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -80,6 +80,7 @@
 /********** kernel/mutexes **********/
 #define MUTEX_DEBUG_INIT	0x11
 #define MUTEX_DEBUG_FREE	0x22
+#define MUTEX_POISON_WW_CTX	((void *) 0x500 + POISON_POINTER_DELTA)
 
 /********** lib/flex_array.c **********/
 #define FLEX_ARRAY_FREE	0x6c	/* for use-after-free poisoning */
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 1e4da21..ee84007 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -783,6 +783,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (!use_ww_ctx) {
 		/* add waiting tasks to the end of the waitqueue (FIFO): */
 		list_add_tail(&waiter.list, &lock->wait_list);
+
+#ifdef CONFIG_DEBUG_MUTEXES
+		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
+#endif
 	} else {
 		/* Add in stamp order, waking up waiters that must back off. */
 		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
-- 
2.7.4


* [PATCH 10/11] Documentation/locking/ww_mutex: Update the design document
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (8 preceding siblings ...)
  2016-11-28 12:20 ` [PATCH 09/11] locking/mutex: Initialize mutex_waiter::ww_ctx with poison when debugging Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-28 12:20 ` [PATCH 11/11] [rfc] locking/ww_mutex: Always spin optimistically for the first waiter Nicolai Hähnle
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

Document the invariants we maintain for the wait list of ww_mutexes.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 Documentation/locking/ww-mutex-design.txt | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt
index 8a112dc..34c3a1b 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -309,11 +309,15 @@ Design:
   normal mutex locks, which are far more common. As such there is only a small
   increase in code size if wait/wound mutexes are not used.
 
+  We maintain the following invariants for the wait list:
+  (1) Waiters with an acquire context are sorted by stamp order; waiters
+      without an acquire context are interspersed in FIFO order.
+  (2) Among waiters with contexts, only the first one can have other locks
+      acquired already (ctx->acquired > 0). Note that this waiter may come
+      after other waiters without contexts in the list.
+
   In general, not much contention is expected. The locks are typically used to
-  serialize access to resources for devices. The only way to make wakeups
-  smarter would be at the cost of adding a field to struct mutex_waiter. This
-  would add overhead to all cases where normal mutexes are used, and
-  ww_mutexes are generally less performance sensitive.
+  serialize access to resources for devices.
 
 Lockdep:
   Special care has been taken to warn for as many cases of api abuse
-- 
2.7.4


* [PATCH 11/11] [rfc] locking/ww_mutex: Always spin optimistically for the first waiter
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (9 preceding siblings ...)
  2016-11-28 12:20 ` [PATCH 10/11] Documentation/locking/ww_mutex: Update the design document Nicolai Hähnle
@ 2016-11-28 12:20 ` Nicolai Hähnle
  2016-11-30  0:35 ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Chris Wilson
  2016-11-30  9:40 ` [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Chris Wilson
  12 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-28 12:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, Chris Wilson, dri-devel

From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>

Check the current owner's context once against our stamp. If our stamp is
lower, we continue to spin optimistically instead of backing off.

This is correct with respect to deadlock detection because while the
(owner, ww_ctx) pair may re-appear if the owner task manages to unlock
and re-acquire the lock while we're spinning, the context could only have
been re-initialized with an even higher stamp. We also still detect when
we have to back off for other waiters that join the list while we're
spinning.

But taking the wait_lock in mutex_spin_on_owner feels iffy, even if it is
done only once.

Median timings taken of a contention-heavy GPU workload:

Before:
real    0m53.086s
user    0m7.360s
sys     1m46.204s

After:
real    0m52.577s
user    0m7.544s
sys     1m49.200s

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
---
 kernel/locking/mutex.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index ee84007..e7d5fac 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -378,6 +378,28 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
 			 struct mutex_waiter *waiter)
 {
 	bool ret = true;
+	struct ww_acquire_ctx *owner_ww_ctx = NULL;
+
+	if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
+		struct ww_mutex *ww;
+		unsigned long flags;
+
+		ww = container_of(lock, struct ww_mutex, base);
+
+		/*
+		 * Check the stamp of the current owner once. This allows us
+		 * to spin optimistically in the case where the current owner
+		 * has a higher stamp than us.
+		 */
+		spin_lock_mutex(&lock->wait_lock, flags);
+		owner_ww_ctx = ww->ctx;
+		if (owner_ww_ctx &&
+		    __ww_mutex_stamp_after(ww_ctx, owner_ww_ctx)) {
+			spin_unlock_mutex(&lock->wait_lock, flags);
+			return false;
+		}
+		spin_unlock_mutex(&lock->wait_lock, flags);
+	}
 
 	rcu_read_lock();
 	while (__mutex_owner(lock) == owner) {
@@ -414,9 +436,16 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner,
 			 * Check this in every inner iteration because we may
 			 * be racing against another thread's ww_mutex_lock.
 			 */
-			if (ww_ctx->acquired > 0 && READ_ONCE(ww->ctx)) {
-				ret = false;
-				break;
+			if (ww_ctx->acquired > 0) {
+				struct ww_acquire_ctx *current_ctx;
+
+				current_ctx = READ_ONCE(ww->ctx);
+
+				if (current_ctx &&
+				    current_ctx != owner_ww_ctx) {
+					ret = false;
+					break;
+				}
 			}
 
 			/*
-- 
2.7.4


* Re: [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context
  2016-11-28 12:20 ` [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context Nicolai Hähnle
@ 2016-11-28 12:42   ` Maarten Lankhorst
  2016-11-28 12:50     ` Peter Zijlstra
  2016-11-28 12:58   ` Christian König
  1 sibling, 1 reply; 28+ messages in thread
From: Maarten Lankhorst @ 2016-11-28 12:42 UTC (permalink / raw)
  To: Nicolai Hähnle, linux-kernel
  Cc: Nicolai Hähnle, Peter Zijlstra, Ingo Molnar, Daniel Vetter,
	Chris Wilson, dri-devel

Op 28-11-16 om 13:20 schreef Nicolai Hähnle:
> From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Maarten Lankhorst <dev@mblankhorst.nl>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
> ---
>  drivers/gpu/drm/vgem/vgem_fence.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c
> index 488909a..e1d516f 100644
> --- a/drivers/gpu/drm/vgem/vgem_fence.c
> +++ b/drivers/gpu/drm/vgem/vgem_fence.c
> @@ -191,12 +191,12 @@ int vgem_fence_attach_ioctl(struct drm_device *dev,
>  
>  	/* Expose the fence via the dma-buf */
>  	ret = 0;
> -	mutex_lock(&resv->lock.base);
> +	ww_mutex_lock(&resv->lock.base, NULL);
Yuck, can we rename base to __NEVER_TOUCH_DIRECTLY_OUTSIDE_LOCKING_CORE?
It's harder to get them confused like that; even with a NULL context it's illegal to call mutex_lock/unlock directly.

~Maarten


* Re: [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context
  2016-11-28 12:42   ` Maarten Lankhorst
@ 2016-11-28 12:50     ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-11-28 12:50 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Nicolai Hähnle, linux-kernel, Nicolai Hähnle,
	Ingo Molnar, Daniel Vetter, Chris Wilson, dri-devel

On Mon, Nov 28, 2016 at 01:42:26PM +0100, Maarten Lankhorst wrote:

> > +	ww_mutex_lock(&resv->lock.base, NULL);
> Yuck, can we rename base to __NEVER_TOUCH_DIRECTLY_OUTSIDE_LOCKING_CORE?
> It's harder to get them confused like that, even with a null context it's illegal to call mutex_lock/unlock directly.

I think there's a __private sparse annotation that could be used.
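
Something like the below, I suppose (rough sketch only, reusing the existing
__private/ACCESS_PRIVATE helpers from <linux/compiler.h>):

	struct ww_mutex {
		struct mutex __private base;
		struct ww_acquire_ctx *ctx;
		/* ... */
	};

with the locking core itself going through ACCESS_PRIVATE(lock, base), so
that sparse flags any direct &resv->lock.base poking from drivers.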


* Re: [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context
  2016-11-28 12:20 ` [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context Nicolai Hähnle
  2016-11-28 12:42   ` Maarten Lankhorst
@ 2016-11-28 12:58   ` Christian König
  1 sibling, 0 replies; 28+ messages in thread
From: Christian König @ 2016-11-28 12:58 UTC (permalink / raw)
  To: Nicolai Hähnle, linux-kernel
  Cc: Maarten Lankhorst, Nicolai Hähnle, Peter Zijlstra,
	dri-devel, Ingo Molnar

Am 28.11.2016 um 13:20 schrieb Nicolai Hähnle:
> From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Maarten Lankhorst <dev@mblankhorst.nl>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: dri-devel@lists.freedesktop.org
> Signed-off-by: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
> ---
>   drivers/gpu/drm/vgem/vgem_fence.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c
> index 488909a..e1d516f 100644
> --- a/drivers/gpu/drm/vgem/vgem_fence.c
> +++ b/drivers/gpu/drm/vgem/vgem_fence.c
> @@ -191,12 +191,12 @@ int vgem_fence_attach_ioctl(struct drm_device *dev,
>   
>   	/* Expose the fence via the dma-buf */
>   	ret = 0;
> -	mutex_lock(&resv->lock.base);
> +	ww_mutex_lock(&resv->lock.base, NULL);

Accessing the base member here is probably superfluous now and will most 
likely raise a warning.
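
I.e. presumably this should now simply be

	ww_mutex_lock(&resv->lock, NULL);
	/* ... */
	ww_mutex_unlock(&resv->lock);

without going through lock.base at all.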

Christian.

>   	if (arg->flags & VGEM_FENCE_WRITE)
>   		reservation_object_add_excl_fence(resv, fence);
>   	else if ((ret = reservation_object_reserve_shared(resv)) == 0)
>   		reservation_object_add_shared_fence(resv, fence);
> -	mutex_unlock(&resv->lock.base);
> +	ww_mutex_unlock(&resv->lock.base);
>   
>   	/* Record the fence in our idr for later signaling */
>   	if (ret == 0) {


* [PATCH 1/4] locking: Begin kselftests for ww_mutex
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (10 preceding siblings ...)
  2016-11-28 12:20 ` [PATCH 11/11] [rfc] locking/ww_mutex: Always spin optimistically for the first waiter Nicolai Hähnle
@ 2016-11-30  0:35 ` Chris Wilson
  2016-11-30  0:35   ` [PATCH 2/4] locking: Add kselftests for ww_mutex AA deadlock detection Chris Wilson
                     ` (4 more replies)
  2016-11-30  9:40 ` [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Chris Wilson
  12 siblings, 5 replies; 28+ messages in thread
From: Chris Wilson @ 2016-11-30  0:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris Wilson, Peter Zijlstra, Maarten Lankhorst, Nicolai Hähnle

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Nicolai Hähnle <nhaehnle@gmail.com>
---
 kernel/locking/Makefile        |   1 +
 kernel/locking/test-ww_mutex.c | 137 +++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug              |  10 +++
 3 files changed, 148 insertions(+)
 create mode 100644 kernel/locking/test-ww_mutex.c

diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 6f88e352cd4f..760158d9d98d 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -28,3 +28,4 @@ obj-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
 obj-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem-xadd.o
 obj-$(CONFIG_QUEUED_RWLOCKS) += qrwlock.o
 obj-$(CONFIG_LOCK_TORTURE_TEST) += locktorture.o
+obj-$(CONFIG_WW_MUTEX_SELFTEST) += test-ww_mutex.o
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
new file mode 100644
index 000000000000..e94b807e06c2
--- /dev/null
+++ b/kernel/locking/test-ww_mutex.c
@@ -0,0 +1,137 @@
+/*
+ * Module-based API test facility for ww_mutexes
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, you can access it online at
+ * http://www.gnu.org/licenses/gpl-2.0.html.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kthread.h>
+#include <linux/ww_mutex.h>
+#include <linux/completion.h>
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Intel Corporation");
+
+static DEFINE_WW_CLASS(ww_class);
+
+struct test_mutex {
+	struct work_struct work;
+	struct ww_mutex mutex;
+	struct completion ready, go, done;
+	unsigned flags;
+#define TEST_AB_SPIN BIT(0)
+#define TEST_AB_TRY BIT(1)
+};
+
+static void test_mutex_work(struct work_struct *work)
+{
+	struct test_mutex *mtx = container_of(work, typeof(*mtx), work);
+
+	complete(&mtx->ready);
+	wait_for_completion(&mtx->go);
+
+	if (mtx->flags & TEST_AB_TRY) {
+		while (!ww_mutex_trylock(&mtx->mutex))
+			cpu_relax();
+	} else {
+		ww_mutex_lock(&mtx->mutex, NULL);
+	}
+	complete(&mtx->done);
+	ww_mutex_unlock(&mtx->mutex);
+}
+
+static int __test_mutex(unsigned flags)
+{
+	struct test_mutex mtx;
+	int ret;
+
+	ww_mutex_init(&mtx.mutex, &ww_class);
+	INIT_WORK_ONSTACK(&mtx.work, test_mutex_work);
+	init_completion(&mtx.ready);
+	init_completion(&mtx.go);
+	init_completion(&mtx.done);
+	mtx.flags = flags;
+
+	schedule_work(&mtx.work);
+
+	wait_for_completion(&mtx.ready);
+	ww_mutex_lock(&mtx.mutex, NULL);
+	complete(&mtx.go);
+	if (flags & TEST_AB_SPIN) {
+		unsigned long timeout = jiffies + HZ;
+
+		ret = 0;
+		do {
+			if (completion_done(&mtx.done)) {
+				ret = -EINVAL;
+				break;
+			}
+			cpu_relax();
+		} while (time_before(jiffies, timeout));
+	} else {
+		ret = wait_for_completion_timeout(&mtx.done, HZ);
+	}
+	ww_mutex_unlock(&mtx.mutex);
+
+	if (ret) {
+		pr_err("%s(flags=%x): mutual exclusion failure\n",
+		       __func__, flags);
+		ret = -EINVAL;
+	}
+
+	flush_work(&mtx.work);
+	destroy_work_on_stack(&mtx.work);
+	return ret;
+}
+
+static int test_mutex(void)
+{
+	unsigned int modes[] = {
+		0,
+		TEST_AB_SPIN,
+		TEST_AB_TRY,
+		TEST_AB_SPIN | TEST_AB_TRY,
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(modes); i++) {
+		int ret;
+
+		ret = __test_mutex(modes[i]);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int __init test_ww_mutex_init(void)
+{
+	int ret;
+
+	ret = test_mutex();
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static void __exit test_ww_mutex_exit(void)
+{
+}
+
+module_init(test_ww_mutex_init);
+module_exit(test_ww_mutex_exit);
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a6c8db1d62f6..5ffe7fd34c50 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1161,6 +1161,16 @@ config LOCK_TORTURE_TEST
 	  Say M if you want these torture tests to build as a module.
 	  Say N if you are unsure.
 
+config WW_MUTEX_SELFTEST
+	tristate "Wait/wound mutex selftests"
+	select DEBUG_WW_MUTEX_SLOWPATH
+	help
+	  This option provides a kernel module that runs tests on the
+	  struct ww_mutex locking API.
+
+	  Say M if you want these self tests to build as a module.
+	  Say N if you are unsure.
+
 endmenu # lock debugging
 
 config TRACE_IRQFLAGS
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/4] locking: Add kselftests for ww_mutex AA deadlock detection
  2016-11-30  0:35 ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Chris Wilson
@ 2016-11-30  0:35   ` Chris Wilson
  2016-11-30  0:35   ` [PATCH 3/4] locking: Add kselftests for ww_mutex ABBA " Chris Wilson
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2016-11-30  0:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris Wilson, Peter Zijlstra, Maarten Lankhorst, Nicolai Hähnle

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Nicolai Hähnle <nhaehnle@gmail.com>
---
 kernel/locking/test-ww_mutex.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index e94b807e06c2..02a4bacf8aac 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -118,6 +118,41 @@ static int test_mutex(void)
 	return 0;
 }
 
+static int test_aa(void)
+{
+	struct ww_mutex mutex;
+	struct ww_acquire_ctx ctx;
+	int ret;
+
+	ww_mutex_init(&mutex, &ww_class);
+	ww_acquire_init(&ctx, &ww_class);
+
+	ww_mutex_lock(&mutex, &ctx);
+
+	if (ww_mutex_trylock(&mutex))  {
+		pr_err("%s: trylocked itself!\n", __func__);
+		ww_mutex_unlock(&mutex);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = ww_mutex_lock(&mutex, &ctx);
+	if (ret != -EALREADY) {
+		pr_err("%s: missed deadlock for recursing, ret=%d\n",
+		       __func__, ret);
+		if (!ret)
+			ww_mutex_unlock(&mutex);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = 0;
+out:
+	ww_mutex_unlock(&mutex);
+	ww_acquire_fini(&ctx);
+	return ret;
+}
+
 static int __init test_ww_mutex_init(void)
 {
 	int ret;
@@ -126,6 +161,10 @@ static int __init test_ww_mutex_init(void)
 	if (ret)
 		return ret;
 
+	ret = test_aa();
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/4] locking: Add kselftests for ww_mutex ABBA deadlock detection
  2016-11-30  0:35 ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Chris Wilson
  2016-11-30  0:35   ` [PATCH 2/4] locking: Add kselftests for ww_mutex AA deadlock detection Chris Wilson
@ 2016-11-30  0:35   ` Chris Wilson
  2016-11-30  0:35   ` [PATCH 4/4] locking: Add kselftests for ww_mutex stress Chris Wilson
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2016-11-30  0:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris Wilson, Peter Zijlstra, Maarten Lankhorst, Nicolai Hähnle

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Nicolai Hähnle <nhaehnle@gmail.com>
---
 kernel/locking/test-ww_mutex.c | 75 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 02a4bacf8aac..63a5031de138 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -153,6 +153,77 @@ static int test_aa(void)
 	return ret;
 }
 
+struct test_abba {
+	struct work_struct work;
+	struct ww_mutex a_mutex;
+	struct ww_mutex b_mutex;
+	struct completion a_ready;
+	struct completion b_ready;
+	struct completion done;
+	int ret;
+};
+
+static void test_abba_work(struct work_struct *work)
+{
+	struct test_abba *abba = container_of(work, typeof(*abba), work);
+	struct ww_acquire_ctx ctx;
+
+	ww_acquire_init(&ctx, &ww_class);
+	ww_mutex_lock(&abba->b_mutex, &ctx);
+
+	complete(&abba->b_ready);
+	wait_for_completion(&abba->a_ready);
+	abba->ret = ww_mutex_lock(&abba->a_mutex, &ctx);
+	if (!abba->ret)
+		ww_mutex_unlock(&abba->a_mutex);
+
+	ww_mutex_unlock(&abba->b_mutex);
+	ww_acquire_fini(&ctx);
+
+	complete(&abba->done);
+}
+
+static int test_abba(void)
+{
+	struct test_abba abba;
+	struct ww_acquire_ctx ctx;
+	int ret;
+
+	ww_mutex_init(&abba.a_mutex, &ww_class);
+	ww_mutex_init(&abba.b_mutex, &ww_class);
+	INIT_WORK_ONSTACK(&abba.work, test_abba_work);
+	init_completion(&abba.a_ready);
+	init_completion(&abba.b_ready);
+	init_completion(&abba.done);
+
+	schedule_work(&abba.work);
+
+	ww_acquire_init(&ctx, &ww_class);
+	ww_mutex_lock(&abba.a_mutex, &ctx);
+	complete(&abba.a_ready);
+	wait_for_completion(&abba.b_ready);
+	ret = ww_mutex_lock(&abba.b_mutex, &ctx);
+	if (!ret)
+		ww_mutex_unlock(&abba.b_mutex);
+	ww_mutex_unlock(&abba.a_mutex);
+
+	wait_for_completion(&abba.done);
+
+	if (ret != -EDEADLK && abba.ret != -EDEADLK) {
+		pr_err("%s: missed ABBA deadlock, A ret=%d, B ret=%d\n",
+		       __func__, ret, abba.ret);
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = 0;
+out:
+	ww_acquire_fini(&ctx);
+	flush_work(&abba.work);
+	destroy_work_on_stack(&abba.work);
+	return ret;
+}
+
 static int __init test_ww_mutex_init(void)
 {
 	int ret;
@@ -165,6 +236,10 @@ static int __init test_ww_mutex_init(void)
 	if (ret)
 		return ret;
 
+	ret = test_abba();
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/4] locking: Add kselftests for ww_mutex stress
  2016-11-30  0:35 ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Chris Wilson
  2016-11-30  0:35   ` [PATCH 2/4] locking: Add kselftests for ww_mutex AA deadlock detection Chris Wilson
  2016-11-30  0:35   ` [PATCH 3/4] locking: Add kselftests for ww_mutex ABBA " Chris Wilson
@ 2016-11-30  0:35   ` Chris Wilson
  2016-11-30 12:29     ` Maarten Lankhorst
  2016-11-30  8:00   ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Nicolai Hähnle
  2016-11-30 19:08   ` kbuild test robot
  4 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2016-11-30  0:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris Wilson, Peter Zijlstra, Maarten Lankhorst, Nicolai Hähnle

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Nicolai Hähnle <nhaehnle@gmail.com>
---
 kernel/locking/test-ww_mutex.c | 134 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)

diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 63a5031de138..c367014f62dc 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -21,6 +21,9 @@
 #include <linux/kthread.h>
 #include <linux/ww_mutex.h>
 #include <linux/completion.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/delay.h>
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Intel Corporation");
@@ -224,6 +227,129 @@ static int test_abba(void)
 	return ret;
 }
 
+struct stress {
+	struct work_struct work;
+	struct ww_mutex *locks;
+	int nlocks;
+};
+
+static int *get_random_order(int count)
+{
+	int *order;
+	int n, r, tmp;
+
+	order = kmalloc_array(count, sizeof(*order), GFP_TEMPORARY);
+	if (!order)
+		return order;
+
+	for (n = 0; n < count; n++)
+		order[n] = n;
+
+	for (n = count - 1; n > 1; n--) {
+		r = get_random_int() % (n + 1);
+		if (r != n) {
+			tmp = order[n];
+			order[n] = order[r];
+			order[r] = tmp;
+		}
+	}
+
+	return order;
+}
+
+static void stress_work(struct work_struct *work)
+{
+	struct stress *stress = container_of(work, typeof(*stress), work);
+	const int nlocks = stress->nlocks;
+	struct ww_mutex *locks = stress->locks;
+	struct ww_acquire_ctx ctx;
+	int contended = -1;
+	int *order;
+	int n, ret;
+
+	order = get_random_order(nlocks);
+	if (!order)
+		return;
+
+	ww_acquire_init(&ctx, &ww_class);
+
+retry:
+	ret = 0;
+	for (n = 0; n < nlocks; n++) {
+		if (n == contended)
+			continue;
+
+		ret = ww_mutex_lock(&locks[order[n]], &ctx);
+		if (ret < 0)
+			break;
+	}
+	if (!ret)
+		usleep_range(1000, 2000); /* dummy load */
+
+	if (contended > n)
+		ww_mutex_unlock(&locks[order[contended]]);
+	contended = n;
+	while (n--)
+		ww_mutex_unlock(&locks[order[n]]);
+
+	if (ret == -EDEADLK) {
+		ww_mutex_lock_slow(&locks[order[contended]], &ctx);
+		goto retry;
+	}
+
+	if (ret)
+		pr_err_once("ww_mutex stress test failed with %d\n", ret);
+
+	ww_acquire_fini(&ctx);
+
+	kfree(order);
+	kfree(stress);
+}
+
+static int stress(int nlocks, int count)
+{
+	struct ww_mutex *locks;
+	struct workqueue_struct *wq;
+	int ret = -ENOMEM;
+	int n;
+
+	wq = alloc_workqueue("test-ww_mutex", WQ_UNBOUND, 0);
+	if (!wq)
+		return -ENOMEM;
+
+	locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
+	if (!locks)
+		goto err;
+
+	for (n = 0; n < nlocks; n++)
+		ww_mutex_init(&locks[n], &ww_class);
+
+	for (n = 0; n < count; n++) {
+		struct stress *stress;
+
+		stress = kmalloc(sizeof(*stress), GFP_KERNEL);
+		if (!stress)
+			break;
+
+		INIT_WORK(&stress->work, stress_work);
+		stress->locks = locks;
+		stress->nlocks = nlocks;
+
+		queue_work(wq, &stress->work);
+	}
+
+	flush_workqueue(wq);
+
+	for (n = 0; n < nlocks; n++)
+		ww_mutex_destroy(&locks[n]);
+	kfree(locks);
+
+	ret = 0;
+err:
+	destroy_workqueue(wq);
+	return ret;
+}
+
 static int __init test_ww_mutex_init(void)
 {
 	int ret;
@@ -240,6 +366,14 @@ static int __init test_ww_mutex_init(void)
 	if (ret)
 		return ret;
 
+	ret = stress(16, 1024);
+	if (ret)
+		return ret;
+
+	ret = stress(4096, 1024);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] locking: Begin kselftests for ww_mutex
  2016-11-30  0:35 ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Chris Wilson
                     ` (2 preceding siblings ...)
  2016-11-30  0:35   ` [PATCH 4/4] locking: Add kselftests for ww_mutex stress Chris Wilson
@ 2016-11-30  8:00   ` Nicolai Hähnle
  2016-11-30 19:08   ` kbuild test robot
  4 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-30  8:00 UTC (permalink / raw)
  To: Chris Wilson, linux-kernel; +Cc: Peter Zijlstra, Maarten Lankhorst

On 30.11.2016 01:35, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Maarten Lankhorst <dev@mblankhorst.nl>
> Cc: Nicolai Hähnle <nhaehnle@gmail.com>
> ---
>  kernel/locking/Makefile        |   1 +
>  kernel/locking/test-ww_mutex.c | 137 +++++++++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug              |  10 +++
>  3 files changed, 148 insertions(+)
>  create mode 100644 kernel/locking/test-ww_mutex.c
>
> diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
> index 6f88e352cd4f..760158d9d98d 100644
> --- a/kernel/locking/Makefile
> +++ b/kernel/locking/Makefile
> @@ -28,3 +28,4 @@ obj-$(CONFIG_RWSEM_GENERIC_SPINLOCK) += rwsem-spinlock.o
>  obj-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem-xadd.o
>  obj-$(CONFIG_QUEUED_RWLOCKS) += qrwlock.o
>  obj-$(CONFIG_LOCK_TORTURE_TEST) += locktorture.o
> +obj-$(CONFIG_WW_MUTEX_SELFTEST) += test-ww_mutex.o
> diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
> new file mode 100644
> index 000000000000..e94b807e06c2
> --- /dev/null
> +++ b/kernel/locking/test-ww_mutex.c
> @@ -0,0 +1,137 @@
> +/*
> + * Module-based API test facility for ww_mutexes
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, you can access it online at
> + * http://www.gnu.org/licenses/gpl-2.0.html.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/kthread.h>
> +#include <linux/ww_mutex.h>
> +#include <linux/completion.h>
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Intel Corporation");
> +
> +static DEFINE_WW_CLASS(ww_class);
> +
> +struct test_mutex {
> +	struct work_struct work;
> +	struct ww_mutex mutex;
> +	struct completion ready, go, done;
> +	unsigned flags;
> +#define TEST_AB_SPIN BIT(0)
> +#define TEST_AB_TRY BIT(1)
> +};


Is it common to put #defines inside structs like that? It looks odd to 
me. Apart from that, patches 1-4 all make sense to me.
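
For comparison, this is roughly the layout I would have expected -- purely
illustrative, just moving the flag macros out of the struct, all names taken
from the patch:

#define TEST_AB_SPIN	BIT(0)
#define TEST_AB_TRY	BIT(1)

struct test_mutex {
	struct work_struct work;
	struct ww_mutex mutex;
	struct completion ready, go, done;
	unsigned flags;
};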

Thanks,
Nicolai

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes
  2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
                   ` (11 preceding siblings ...)
  2016-11-30  0:35 ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Chris Wilson
@ 2016-11-30  9:40 ` Chris Wilson
  2016-11-30 11:52   ` Nicolai Hähnle
  12 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2016-11-30  9:40 UTC (permalink / raw)
  To: Nicolai Hähnle; +Cc: linux-kernel

On Mon, Nov 28, 2016 at 01:20:01PM +0100, Nicolai Hähnle wrote:
> I've included timings taken from a contention-heavy stress test to some of
> the patches. The stress test performs actual GPU operations which take a
> good chunk of the wall time, but even so, the series still manages to
> improve the wall time quite a bit.

In looking at your contention scenarios, what was the average/max list
size? Just wondering if it makes sense to use an rbtree + first_waiter
instead of a sorted list from the start.
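
To make the question concrete, a rough sketch of what I have in mind --
hypothetical only, nothing in this series defines these; `ww_node' would be
an assumed extra rb_node in struct mutex_waiter:

struct ww_waitqueue {
	struct rb_root	 waiters;	/* ww waiters, keyed by ww_ctx->stamp */
	struct rb_node	*oldest;	/* cached leftmost, i.e. lowest stamp */
};

static inline struct mutex_waiter *ww_first_waiter(struct ww_waitqueue *wq)
{
	return wq->oldest ? rb_entry(wq->oldest, struct mutex_waiter, ww_node)
			  : NULL;
}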
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes
  2016-11-30  9:40 ` [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Chris Wilson
@ 2016-11-30 11:52   ` Nicolai Hähnle
  2016-11-30 12:20     ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-30 11:52 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-kernel

On 30.11.2016 10:40, Chris Wilson wrote:
> On Mon, Nov 28, 2016 at 01:20:01PM +0100, Nicolai Hähnle wrote:
>> I've included timings taken from a contention-heavy stress test to some of
>> the patches. The stress test performs actual GPU operations which take a
>> good chunk of the wall time, but even so, the series still manages to
>> improve the wall time quite a bit.
>
> In looking at your contention scenarios, what was the average/max list
> size? Just wondering if it makes sense to use an rbtree + first_waiter
> instead of a sorted list from the start.

I haven't measured this with the new series; previously, while I was 
debugging the deadlock on older kernels, I occasionally saw wait lists 
of up to ~20 tasks; spit-balling the average over all the deadlock cases,
I'd say it was not more than ~5. The average _without_ deadlocks should
be lower, if anything.

I saw that your test cases go quite a bit higher, but even the rather 
extreme load I was testing with -- which is not quite a load from an 
actual application, though it is related to one -- has 40 threads and so 
a theoretical maximum of 40.

Nicolai

> -Chris
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes
  2016-11-30 11:52   ` Nicolai Hähnle
@ 2016-11-30 12:20     ` Chris Wilson
  2016-11-30 13:40       ` Nicolai Hähnle
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2016-11-30 12:20 UTC (permalink / raw)
  To: Nicolai Hähnle; +Cc: linux-kernel

On Wed, Nov 30, 2016 at 12:52:28PM +0100, Nicolai Hähnle wrote:
> On 30.11.2016 10:40, Chris Wilson wrote:
> >On Mon, Nov 28, 2016 at 01:20:01PM +0100, Nicolai Hähnle wrote:
> >>I've included timings taken from a contention-heavy stress test to some of
> >>the patches. The stress test performs actual GPU operations which take a
> >>good chunk of the wall time, but even so, the series still manages to
> >>improve the wall time quite a bit.
> >
> >In looking at your contention scenarios, what was the average/max list
> >size? Just wondering if it makes sense to use an rbtree + first_waiter
> >instead of a sorted list from the start.
> 
> I haven't measured this with the new series; previously, while I was
> debugging the deadlock on older kernels, I occasionally saw wait
> lists of up to ~20 tasks; spit-balling the average over all the
> deadlock cases, I'd say it was not more than ~5. The average _without_
> deadlocks should be lower, if anything.

Right, I wasn't expecting the list to be large, certainly no larger than
the typical number of cores. That's on the borderline of where a more
complex tree starts to pay off.
 
> I saw that your test cases go quite a bit higher, but even the
> rather extreme load I was testing with -- which is not quite a load
> from an actual application, though it is related to one -- has 40
> threads and so a theoretical maximum of 40.

The stress loads were just values plucked out of nowhere to try and have
a reasonable stab at hitting the deadlock. Certainly if we were to wrap
that up in a microbenchmark we would want to have wider coverage (so the
graph against contention is more useful).
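
Something along these lines is what I mean by wider coverage -- illustrative
only, reusing the stress() helper from the kselftest patch and sweeping the
lock count instead of the two fixed points:

static int __init stress_sweep(void)
{
	/* Sweep the number of locks so failures/timings can be plotted
	 * against contention; 1024 queued workers per step as in the patch. */
	static const int nlocks[] = { 2, 4, 8, 16, 64, 256, 1024, 4096 };
	int i, ret;

	for (i = 0; i < ARRAY_SIZE(nlocks); i++) {
		ret = stress(nlocks[i], 1024);
		if (ret)
			return ret;
	}
	return 0;
}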

Do you have a branch I can pull the patches for (or what did you use as
the base)?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] locking: Add kselftests for ww_mutex stress
  2016-11-30  0:35   ` [PATCH 4/4] locking: Add kselftests for ww_mutex stress Chris Wilson
@ 2016-11-30 12:29     ` Maarten Lankhorst
  2016-11-30 12:52       ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Maarten Lankhorst @ 2016-11-30 12:29 UTC (permalink / raw)
  To: Chris Wilson, linux-kernel; +Cc: Peter Zijlstra, Nicolai Hähnle

On 30-11-16 at 01:35, Chris Wilson wrote:
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Maarten Lankhorst <dev@mblankhorst.nl>
> Cc: Nicolai Hähnle <nhaehnle@gmail.com>
> ---
>  kernel/locking/test-ww_mutex.c | 134 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 134 insertions(+)
>
> diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
> index 63a5031de138..c367014f62dc 100644
> --- a/kernel/locking/test-ww_mutex.c
> +++ b/kernel/locking/test-ww_mutex.c
> @@ -21,6 +21,9 @@
>  #include <linux/kthread.h>
>  #include <linux/ww_mutex.h>
>  #include <linux/completion.h>
> +#include <linux/random.h>
> +#include <linux/slab.h>
> +#include <linux/delay.h>
>  
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Intel Corporation");
> @@ -224,6 +227,129 @@ static int test_abba(void)
>  	return ret;
>  }
>  
> +struct stress {
> +	struct work_struct work;
> +	struct ww_mutex *locks;
> +	int nlocks;
> +};
> +
> +static int *get_random_order(int count)
> +{
> +	int *order;
> +	int n, r, tmp;
> +
> +	order = kmalloc_array(count, sizeof(*order), GFP_TEMPORARY);
> +	if (!order)
> +		return order;
> +
> +	for (n = 0; n < count; n++)
> +		order[n] = n;
> +
> +	for (n = count - 1; n > 1; n--) {
> +		r = get_random_int() % (n + 1);
> +		if (r != n) {
> +			tmp = order[n];
> +			order[n] = order[r];
> +			order[r] = tmp;
> +		}
> +	}
> +
> +	return order;
> +}
> +
> +static void stress_work(struct work_struct *work)
> +{
> +	struct stress *stress = container_of(work, typeof(*stress), work);
> +	const int nlocks = stress->nlocks;
> +	struct ww_mutex *locks = stress->locks;
> +	struct ww_acquire_ctx ctx;
> +	int contended = -1;
> +	int *order;
> +	int n, ret;
> +
> +	order = get_random_order(nlocks);
> +	if (!order)
> +		return;
> +
> +	ww_acquire_init(&ctx, &ww_class);
> +
> +retry:
> +	ret = 0;
> +	for (n = 0; n < nlocks; n++) {
> +		if (n == contended)
> +			continue;
> +
> +		ret = ww_mutex_lock(&locks[order[n]], &ctx);
> +		if (ret < 0)
> +			break;
> +	}
What's wrong with attempting to lock the contended lock here?
Who knows, this might find some more bugs than the functional tests already do.
> +	if (!ret)
> +		usleep_range(1000, 2000); /* dummy load */
> +
> +	if (contended > n)
> +		ww_mutex_unlock(&locks[order[contended]]);
> +	contended = n;
> +	while (n--)
> +		ww_mutex_unlock(&locks[order[n]]);
> +
> +	if (ret == -EDEADLK) {
> +		ww_mutex_lock_slow(&locks[order[contended]], &ctx);
> +		goto retry;
> +	}
> +
> +	if (ret)
> +		pr_err_once("ww_mutex stress test failed with %d\n", ret);
> +
> +	ww_acquire_fini(&ctx);
> +
> +	kfree(order);
> +	kfree(stress);
> +}
> +
> +static int stress(int nlocks, int count)
> +{
> +	struct ww_mutex *locks;
> +	struct workqueue_struct *wq;
> +	int ret = -ENOMEM;
> +	int n;
> +
> +	wq = alloc_workqueue("test-ww_mutex", WQ_UNBOUND, 0);
> +	if (!wq)
> +		return -ENOMEM;
> +
> +	locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
> +	if (!locks)
> +		goto err;
> +
> +	for (n = 0; n < nlocks; n++)
> +		ww_mutex_init(&locks[n], &ww_class);
> +
> +	for (n = 0; n < count; n++) {
> +		struct stress *stress;
> +
> +		stress = kmalloc(sizeof(*stress), GFP_KERNEL);
> +		if (!stress)
> +			break;
> +
> +		INIT_WORK(&stress->work, stress_work);
> +		stress->locks = locks;
> +		stress->nlocks = nlocks;
> +
> +		queue_work(wq, &stress->work);
> +	}
> +
> +	flush_workqueue(wq);
> +
> +	for (n = 0; n < nlocks; n++)
> +		ww_mutex_destroy(&locks[n]);
> +	kfree(locks);
> +
> +	ret = 0;
> +err:
> +	destroy_workqueue(wq);
> +	return ret;
> +}
> +
>  static int __init test_ww_mutex_init(void)
>  {
>  	int ret;
> @@ -240,6 +366,14 @@ static int __init test_ww_mutex_init(void)
>  	if (ret)
>  		return ret;
>  
> +	ret = stress(16, 1024);
> +	if (ret)
> +		return ret;
> +
> +	ret = stress(4096, 1024);
> +	if (ret)
> +		return ret;
> +
>  	return 0;
>  }
>  

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] locking: Add kselftests for ww_mutex stress
  2016-11-30 12:29     ` Maarten Lankhorst
@ 2016-11-30 12:52       ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2016-11-30 12:52 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: linux-kernel, Peter Zijlstra, Nicolai Hähnle

On Wed, Nov 30, 2016 at 01:29:39PM +0100, Maarten Lankhorst wrote:
> > +static void stress_work(struct work_struct *work)
> > +{
> > +	struct stress *stress = container_of(work, typeof(*stress), work);
> > +	const int nlocks = stress->nlocks;
> > +	struct ww_mutex *locks = stress->locks;
> > +	struct ww_acquire_ctx ctx;
> > +	int contended = -1;
> > +	int *order;
> > +	int n, ret;
> > +
> > +	order = get_random_order(nlocks);
> > +	if (!order)
> > +		return;
> > +
> > +	ww_acquire_init(&ctx, &ww_class);
> > +
> > +retry:
> > +	ret = 0;
> > +	for (n = 0; n < nlocks; n++) {
> > +		if (n == contended)
> > +			continue;
> > +
> > +		ret = ww_mutex_lock(&locks[order[n]], &ctx);
> > +		if (ret < 0)
> > +			break;
> > +	}
> What's wrong with attempting to lock the contended lock here?
> Who knows, this might find some more bugs than the functional tests already do.

I was trying to follow the guide: lock in order, and on contention back
off by unlocking everything, slow-lock the contended lock, then take
everything else.
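
Condensed, that acquire/backoff part looks roughly like this -- a sketch
with an illustrative name (lock_all_in_order), error reporting and the
dummy load trimmed:

static int lock_all_in_order(struct ww_mutex **locks, int count,
			     struct ww_acquire_ctx *ctx)
{
	int contended = -1;
	int n, ret;

retry:
	ret = 0;
	for (n = 0; n < count; n++) {
		if (n == contended)	/* already held via the slow path */
			continue;
		ret = ww_mutex_lock(locks[n], ctx);
		if (ret < 0)
			break;
	}
	if (ret != -EDEADLK)
		return ret;	/* 0 with all locks held, or a real error */

	/* Back off: drop everything we hold, including a lock taken via
	 * the slow path on an earlier pass. */
	if (contended > n)
		ww_mutex_unlock(locks[contended]);
	contended = n;
	while (n--)
		ww_mutex_unlock(locks[n]);

	/* Sleep until the contended lock is free, take it, then retry. */
	ww_mutex_lock_slow(locks[contended], ctx);
	goto retry;
}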

I now have a second worker that follows the reordering method as well,
plus a test that a slow-lock taken after the ABBA deadlock detection
resolves the locking order.

If you have a sketch of something else to try, I'll add it.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes
  2016-11-30 12:20     ` Chris Wilson
@ 2016-11-30 13:40       ` Nicolai Hähnle
  0 siblings, 0 replies; 28+ messages in thread
From: Nicolai Hähnle @ 2016-11-30 13:40 UTC (permalink / raw)
  To: Chris Wilson; +Cc: linux-kernel

On 30.11.2016 13:20, Chris Wilson wrote:
> On Wed, Nov 30, 2016 at 12:52:28PM +0100, Nicolai Hähnle wrote:
>> On 30.11.2016 10:40, Chris Wilson wrote:
>>> On Mon, Nov 28, 2016 at 01:20:01PM +0100, Nicolai Hähnle wrote:
>>>> I've included timings taken from a contention-heavy stress test to some of
>>>> the patches. The stress test performs actual GPU operations which take a
>>>> good chunk of the wall time, but even so, the series still manages to
>>>> improve the wall time quite a bit.
>>>
>>> In looking at your contention scenarios, what was the average/max list
>>> size? Just wondering if it makes sense to use an rbtree + first_waiter
>>> instead of a sorted list from the start.
>>
>> I haven't measured this with the new series; previously, while I was
>> debugging the deadlock on older kernels, I occasionally saw wait
>> lists of up to ~20 tasks; spit-balling the average over all the
>> deadlock cases, I'd say it was not more than ~5. The average _without_
>> deadlocks should be lower, if anything.
>
> Right, I wasn't expecting the list to be large, certainly no larger than
> the typical number of cores. That's on the borderline of where a more
> complex tree starts to pay off.
>
>> I saw that your test cases go quite a bit higher, but even the
>> rather extreme load I was testing with -- which is not quite a load
>> from an actual application, though it is related to one -- has 40
>> threads and so a theoretical maximum of 40.
>
> The stress loads were just values plucked out of nowhere to try and have
> a reasonable stab at hitting the deadlock. Certainly if we were to wrap
> that up in a microbenchmark we would want to have wider coverage (so the
> graph against contention is more useful).
>
> Do you have a branch I can pull the patches for (or what did you use as
> the base)?

See git://people.freedesktop.org/~nh/linux mutex or 
https://cgit.freedesktop.org/~nh/linux/log/?h=mutex.

It's based on tip/core/locking + agd5f's drm-next, the latter only 
because I needed it for the test application.

Nicolai

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 05/11] locking/ww_mutex: Add waiters in stamp order
  2016-11-28 12:20 ` [PATCH 05/11] locking/ww_mutex: Add waiters in stamp order Nicolai Hähnle
@ 2016-11-30 14:10   ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2016-11-30 14:10 UTC (permalink / raw)
  To: Nicolai Hähnle
  Cc: linux-kernel, Nicolai Hähnle, Peter Zijlstra, Ingo Molnar,
	Maarten Lankhorst, Daniel Vetter, dri-devel

On Mon, Nov 28, 2016 at 01:20:06PM +0100, Nicolai Hähnle wrote:
> From: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
> 
> Add regular waiters in stamp order. Keep adding waiters that have no
> context in FIFO order and take care not to starve them.
> 
> While adding our task as a waiter, back off if we detect that there is a
> waiter with a lower stamp in front of us.
> 
> Make sure to call lock_contended even when we back off early.

I'm hitting
[   86.202749] WARNING: CPU: 1 PID: 813 at ./include/linux/ww_mutex.h:292 stress_inorder_work+0x436/0x4b5 [test_ww_mutex]
[   86.202885] DEBUG_LOCKS_WARN_ON(!ctx->contending_lock)

which if I understand correctly is due to

> +static inline int __sched
> +__ww_mutex_add_waiter(struct mutex_waiter *waiter,
> +		      struct mutex *lock,
> +		      struct ww_acquire_ctx *ww_ctx)
> +{
> +	if (ww_ctx) {
> +		struct mutex_waiter *cur;
> +
> +		/*
> +		 * Add the waiter before the first waiter with a higher stamp.
> +		 * Waiters without a context are skipped to avoid starving
> +		 * them.
> +		 */
> +		list_for_each_entry(cur, &lock->wait_list, list) {
> +			if (!cur->ww_ctx)
> +				continue;
> +
> +			if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
> +				/* Back off immediately if necessary. */
> +				if (ww_ctx->acquired > 0)
> +					return -EDEADLK;

not setting ww_ctx->contending_lock here.

> +
> +				continue;
> +			}
> +
> +			list_add_tail(&waiter->list, &cur->list);
> +			return 0;
> +		}
> +	}
> +
> +	list_add_tail(&waiter->list, &lock->wait_list);
> +	return 0;
> +}
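
For reference, a sketch of one way that early-backoff path could record the
contended lock so the debug check in ww_mutex_lock_slow() is satisfied --
this assumes CONFIG_DEBUG_MUTEXES and the existing
ww_acquire_ctx::contending_lock field; the actual fix may well look
different:

		if (__ww_mutex_stamp_after(ww_ctx, cur->ww_ctx)) {
			/* Back off immediately if necessary. */
			if (ww_ctx->acquired > 0) {
#ifdef CONFIG_DEBUG_MUTEXES
				struct ww_mutex *ww;

				ww = container_of(lock, struct ww_mutex, base);
				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
				ww_ctx->contending_lock = ww;
#endif
				return -EDEADLK;
			}

			continue;
		}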

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] locking: Begin kselftests for ww_mutex
  2016-11-30  0:35 ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Chris Wilson
                     ` (3 preceding siblings ...)
  2016-11-30  8:00   ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Nicolai Hähnle
@ 2016-11-30 19:08   ` kbuild test robot
  4 siblings, 0 replies; 28+ messages in thread
From: kbuild test robot @ 2016-11-30 19:08 UTC (permalink / raw)
  To: Chris Wilson
  Cc: kbuild-all, linux-kernel, Chris Wilson, Peter Zijlstra,
	Maarten Lankhorst, Nicolai Hähnle

[-- Attachment #1: Type: text/plain, Size: 1068 bytes --]

Hi Chris,

[auto build test WARNING on tip/locking/core]
[also build test WARNING on v4.9-rc7 next-20161130]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Chris-Wilson/locking-Begin-kselftests-for-ww_mutex/20161130-221058
config: alpha-allmodconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=alpha 

All warnings (new ones prefixed by >>):

warning: (WW_MUTEX_SELFTEST) selects DEBUG_WW_MUTEX_SLOWPATH which has unmet direct dependencies (DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 47636 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-11-30 19:09 UTC | newest]

Thread overview: 28+ messages
2016-11-28 12:20 [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Nicolai Hähnle
2016-11-28 12:20 ` [PATCH 01/11] drm/vgem: Use ww_mutex_(un)lock even with a NULL context Nicolai Hähnle
2016-11-28 12:42   ` Maarten Lankhorst
2016-11-28 12:50     ` Peter Zijlstra
2016-11-28 12:58   ` Christian König
2016-11-28 12:20 ` [PATCH 02/11] locking/ww_mutex: Re-check ww->ctx in the inner optimistic spin loop Nicolai Hähnle
2016-11-28 12:20 ` [PATCH 03/11] locking/ww_mutex: Extract stamp comparison to __ww_mutex_stamp_after Nicolai Hähnle
2016-11-28 12:20 ` [PATCH 04/11] locking/ww_mutex: Set use_ww_ctx even when locking without a context Nicolai Hähnle
2016-11-28 12:20 ` [PATCH 05/11] locking/ww_mutex: Add waiters in stamp order Nicolai Hähnle
2016-11-30 14:10   ` Chris Wilson
2016-11-28 12:20 ` [PATCH 06/11] locking/ww_mutex: Notify waiters that have to back off while adding tasks to wait list Nicolai Hähnle
2016-11-28 12:20 ` [PATCH 07/11] locking/ww_mutex: Wake at most one waiter for back off when acquiring the lock Nicolai Hähnle
2016-11-28 12:20 ` [PATCH 08/11] locking/ww_mutex: Yield to other waiters from optimistic spin Nicolai Hähnle
2016-11-28 12:20 ` [PATCH 09/11] locking/mutex: Initialize mutex_waiter::ww_ctx with poison when debugging Nicolai Hähnle
2016-11-28 12:20 ` [PATCH 10/11] Documentation/locking/ww_mutex: Update the design document Nicolai Hähnle
2016-11-28 12:20 ` [PATCH 11/11] [rfc] locking/ww_mutex: Always spin optimistically for the first waiter Nicolai Hähnle
2016-11-30  0:35 ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Chris Wilson
2016-11-30  0:35   ` [PATCH 2/4] locking: Add kselftests for ww_mutex AA deadlock detection Chris Wilson
2016-11-30  0:35   ` [PATCH 3/4] locking: Add kselftests for ww_mutex ABBA " Chris Wilson
2016-11-30  0:35   ` [PATCH 4/4] locking: Add kselftests for ww_mutex stress Chris Wilson
2016-11-30 12:29     ` Maarten Lankhorst
2016-11-30 12:52       ` Chris Wilson
2016-11-30  8:00   ` [PATCH 1/4] locking: Begin kselftests for ww_mutex Nicolai Hähnle
2016-11-30 19:08   ` kbuild test robot
2016-11-30  9:40 ` [PATCH 00/11] locking/ww_mutex: Keep sorted wait list to avoid stampedes Chris Wilson
2016-11-30 11:52   ` Nicolai Hähnle
2016-11-30 12:20     ` Chris Wilson
2016-11-30 13:40       ` Nicolai Hähnle
