* [PATCH 0/2] locking,drm: Fix ww mutex naming / algorithm inconsistency
@ 2018-06-13  7:47 ` Thomas Hellstrom
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13  7:47 UTC (permalink / raw)
  To: dri-devel, linux-kernel

This is a small fallout from work to allow batching WW mutex locks and
unlocks.

Our Wound-Wait mutexes actually don't use the Wound-Wait algorithm but
the Wait-Die algorithm. One could perhaps rename those mutexes tree-wide to
"Wait-Die mutexes" or "Deadlock Avoidance mutexes". Another approach, suggested
here, is to also implement the Wound-Wait algorithm as a per-WW-class
choice, since it has advantages in some cases. See for example

http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html
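The contrast between the two algorithms boils down to the decision each one
makes when a transaction hits an already held lock. A userspace sketch, not
kernel code; the enum and function names are illustrative only, and a lower
stamp means an older transaction:

```c
#include <stdbool.h>

/*
 * Illustrative sketch of the decision each algorithm makes when
 * transaction "req" finds a ww mutex held by transaction "holder".
 * A lower stamp means an older transaction.
 */
enum ww_action { WW_WAIT, WW_BACK_OFF, WW_WOUND_HOLDER };

/* Wait-Die: an older requester waits; a younger requester dies (backs off). */
static enum ww_action wait_die(unsigned long req, unsigned long holder)
{
	return req < holder ? WW_WAIT : WW_BACK_OFF;
}

/*
 * Wound-Wait: an older requester wounds (preempts) the younger holder;
 * a younger requester waits for the older holder.
 */
static enum ww_action wound_wait(unsigned long req, unsigned long holder)
{
	return req < holder ? WW_WOUND_HOLDER : WW_WAIT;
}
```

Note that in both algorithms it is always the younger transaction that backs
off; the difference is whether it does so voluntarily on contention (Wait-Die)
or because the older transaction preempts it (Wound-Wait).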

Now Wound-Wait is a preemptive algorithm, and the preemption is implemented
using a lazy scheme: if a wounded transaction is about to go to sleep on
a contended WW mutex, we return -EDEADLK. That is sufficient for deadlock
prevention. Since with WW mutexes we also require the aborted transaction to
sleep waiting to lock the WW mutex it was aborted on, this choice also provides
a suitable WW mutex to sleep on. If we were instead to return -EDEADLK on the
first WW mutex lock after the transaction was wounded, whether or not that WW
mutex was contended, the transaction might frequently be restarted without a
wait, which is far from optimal. Note also that with the lazy preemption
scheme, contrary to Wait-Die, there will be no rollbacks on contention for
locks held by a transaction that has completed its locking sequence.
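The lazy scheme reduces to a small predicate: the wounded status only matters
at the point where the transaction would sleep on a contended lock. A
simplified userspace model; the struct and function names are illustrative,
not the kernel's:

```c
#include <stdbool.h>
#include <errno.h>

/*
 * Simplified model of the lazy preemption check: the wounded status is
 * only acted upon when the transaction is about to sleep on a contended
 * mutex, the only point where a deadlock is actually possible. The
 * caller clears the flag when it restarts with no locks held.
 */
struct acquire_ctx {
	bool wounded;
};

static int check_wounded(const struct acquire_ctx *ctx, bool contended)
{
	if (contended && ctx->wounded)
		return -EDEADLK;	/* back off, then sleep on this lock */
	return 0;			/* uncontended or not wounded: proceed */
}
```

In particular, a wounded transaction that never hits contention again simply
runs to completion, which is what makes the preemption "lazy".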

The modeset locks are then changed from Wait-Die to Wound-Wait, since the
typical locking pattern of those locks matches the criterion for a substantial
reduction in the number of rollbacks very well. For reservation objects,
the benefit is less clear at this point, and they continue to use Wait-Die.

^ permalink raw reply	[flat|nested] 43+ messages in thread


* [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-13  7:47 ` [PATCH 0/2] locking, drm: " Thomas Hellstrom
@ 2018-06-13  7:47   ` Thomas Hellstrom
  -1 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13  7:47 UTC (permalink / raw)
  To: dri-devel, linux-kernel
  Cc: Thomas Hellstrom, Peter Zijlstra, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Also implement Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die, a preemptive algorithm and is known to generate
fewer backoffs. Testing reveals that this is true if the number of
simultaneous contending transactions is small. As the number of
simultaneous contending threads increases, Wound-Wait becomes inferior
to Wait-Die in terms of elapsed time, possibly due to the larger number
of locks held by sleeping transactions.

Update documentation and callers.

Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
tag patch-18-06-04

Each thread runs 100000 batches; each batch locks and then unlocks 800 ww
mutexes randomly chosen out of 100000. Four-core Intel x86_64:

Algorithm    #threads       Rollbacks  time
Wound-Wait   4              ~100       ~17s.
Wait-Die     4              ~150000    ~19s.
Wound-Wait   16             ~360000    ~109s.
Wait-Die     16             ~450000    ~82s.
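Each benchmark batch follows the usual ww mutex acquire pattern: lock every
object in the batch and, on a backoff (-EDEADLK), release everything acquired
so far and retry. A userspace mock of that loop; lock_obj(), unlock_obj(),
lock_batch() and the fail_once knob are all illustrative stand-ins, not the
test suite's actual code:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Userspace mock of a benchmark batch: try to lock every object; on
 * -EDEADLK unlock everything acquired so far and restart. fail_once
 * makes the first attempt back off, so exactly one rollback occurs
 * before the batch succeeds.
 */
struct obj { int locked; };

static int fail_once;

static int lock_obj(struct obj *o)
{
	if (fail_once) {
		fail_once = 0;
		return -EDEADLK;	/* pretend we were wounded / must die */
	}
	o->locked = 1;
	return 0;
}

static void unlock_obj(struct obj *o)
{
	o->locked = 0;
}

/* Returns the number of rollbacks needed to lock all n objects. */
static int lock_batch(struct obj *objs, size_t n)
{
	int rollbacks = 0;
	size_t i;

retry:
	for (i = 0; i < n; i++) {
		if (lock_obj(&objs[i]) == -EDEADLK) {
			while (i--)
				unlock_obj(&objs[i]);	/* back off */
			rollbacks++;
			goto retry;
		}
	}
	return rollbacks;
}
```

The rollback counts in the table above are counts of exactly this kind of
back-off-and-retry event; the algorithms differ only in which transaction is
forced to take it.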

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 Documentation/locking/ww-mutex-design.txt | 57 ++++++++++++++----
 drivers/dma-buf/reservation.c             |  2 +-
 drivers/gpu/drm/drm_modeset_lock.c        |  2 +-
 include/linux/ww_mutex.h                  | 19 ++++--
 kernel/locking/locktorture.c              |  2 +-
 kernel/locking/mutex.c                    | 98 ++++++++++++++++++++++++++++---
 kernel/locking/test-ww_mutex.c            |  2 +-
 lib/locking-selftest.c                    |  2 +-
 8 files changed, 152 insertions(+), 32 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt
index 34c3a1b50b9a..29c85623b551 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -1,4 +1,4 @@
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
 ======================================
 
 Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the
 younger task) unlocks all of the buffers that it has already locked, and then
 tries again.
 
-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction,
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+preempts the transaction holding the lock, requiring it to back off. It
+Wounds the other transaction.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm which requires a reliable way to preempt another transaction.
 
 Concepts
 --------
@@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is important the a task
 trying to acquire locks doesn't grab a new reservation id, but keeps the one it
 acquired when starting the lock acquisition. This ticket is stored in the
 acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context represents a
+transaction.
 
 W/w class: In contrast to normal mutexes the lock class needs to be explicit for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.
 
 Furthermore there are three different class of w/w lock acquire functions:
 
@@ -90,10 +105,15 @@ provided.
 Usage
 -----
 
+The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die
+argument to DEFINE_WW_CLASS(). As a rough rule of thumb, use Wound-Wait if you
+typically expect the number of simultaneous competing transactions to be small,
+and the rollback cost can be substantial.
+
 Three different ways to acquire locks within the same w/w class. Common
 definitions for methods #1 and #2:
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 struct obj {
 	struct ww_mutex lock;
@@ -243,7 +263,7 @@ struct obj {
 	struct list_head locked_list;
 };
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 void __unlock_objs(struct list_head *list)
 {
@@ -312,12 +332,23 @@ Design:
   We maintain the following invariants for the wait list:
   (1) Waiters with an acquire context are sorted by stamp order; waiters
       without an acquire context are interspersed in FIFO order.
-  (2) Among waiters with contexts, only the first one can have other locks
-      acquired already (ctx->acquired > 0). Note that this waiter may come
-      after other waiters without contexts in the list.
+  (2) For Wait-Die, among waiters with contexts, only the first one can have
+      other locks acquired already (ctx->acquired > 0). Note that this waiter
+      may come after other waiters without contexts in the list.
+
+  The Wound-Wait preemption is implemented with a lazy-preemption scheme:
+  The wounded status of the transaction is checked only when there is
+  contention for a new lock and hence a true chance of deadlock. In that
+  situation, if the transaction is wounded, it backs off, clears the
+  wounded status and retries. A great benefit of implementing preemption in
+  this way is that the wounded transaction can identify a contending lock to
+  wait for before restarting the transaction. Just blindly restarting the
+  transaction would likely make the transaction end up in a situation where
+  it would have to back off again.
 
   In general, not much contention is expected. The locks are typically used to
-  serialize access to resources for devices.
+  serialize access to resources for devices, and optimization focus should
+  therefore be directed towards the uncontended cases.
 
 Lockdep:
   Special care has been taken to warn for as many cases of api abuse
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..039571b9fea1 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -46,7 +46,7 @@
  * write-side updates.
  */
 
-DEFINE_WW_CLASS(reservation_ww_class);
+DEFINE_WW_CLASS(reservation_ww_class, true);
 EXPORT_SYMBOL(reservation_ww_class);
 
 struct lock_class_key reservation_seqcount_class;
diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
index 8a5100685875..f22a7ef41de1 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -70,7 +70,7 @@
  * lists and lookup data structures.
  */
 
-static DEFINE_WW_CLASS(crtc_ww_class);
+static DEFINE_WW_CLASS(crtc_ww_class, true);
 
 /**
  * drm_modeset_lock_all - take all modeset locks
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 39fda195bf78..6278077f288b 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -8,6 +8,8 @@
  *
  * Wound/wait implementation:
  *  Copyright (C) 2013 Canonical Ltd.
+ * Choice of algorithm:
+ *  Copyright (C) 2018 VMware Inc.
  *
  * This file contains the main data structure and API definitions.
  */
@@ -23,15 +25,17 @@ struct ww_class {
 	struct lock_class_key mutex_key;
 	const char *acquire_name;
 	const char *mutex_name;
+	bool is_wait_die;
 };
 
 struct ww_acquire_ctx {
 	struct task_struct *task;
 	unsigned long stamp;
 	unsigned acquired;
+	bool wounded;
+	struct ww_class *ww_class;
 #ifdef CONFIG_DEBUG_MUTEXES
 	unsigned done_acquire;
-	struct ww_class *ww_class;
 	struct ww_mutex *contending_lock;
 #endif
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -58,17 +62,19 @@ struct ww_mutex {
 # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
 #endif
 
-#define __WW_CLASS_INITIALIZER(ww_class) \
+#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die)	    \
 		{ .stamp = ATOMIC_LONG_INIT(0) \
 		, .acquire_name = #ww_class "_acquire" \
-		, .mutex_name = #ww_class "_mutex" }
+		, .mutex_name = #ww_class "_mutex" \
+		, .is_wait_die = _is_wait_die }
 
 #define __WW_MUTEX_INITIALIZER(lockname, class) \
 		{ .base =  __MUTEX_INITIALIZER(lockname.base) \
 		__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
 
-#define DEFINE_WW_CLASS(classname) \
-	struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
+#define DEFINE_WW_CLASS(classname, _is_wait_die)			\
+	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, \
+							   _is_wait_die)
 
 #define DEFINE_WW_MUTEX(mutexname, ww_class) \
 	struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
@@ -123,8 +129,9 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
 	ctx->task = current;
 	ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
 	ctx->acquired = 0;
-#ifdef CONFIG_DEBUG_MUTEXES
 	ctx->ww_class = ww_class;
+	ctx->wounded = false;
+#ifdef CONFIG_DEBUG_MUTEXES
 	ctx->done_acquire = 0;
 	ctx->contending_lock = NULL;
 #endif
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 6850ffd69125..778ed026382f 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -365,7 +365,7 @@ static struct lock_torture_ops mutex_lock_ops = {
 };
 
 #include <linux/ww_mutex.h>
-static DEFINE_WW_CLASS(torture_ww_class);
+static DEFINE_WW_CLASS(torture_ww_class, true);
 static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class);
 static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class);
 static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2048359f33d2..b449a012c6f9 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -290,12 +290,47 @@ __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 	       (a->stamp != b->stamp || a > b);
 }
 
+/*
+ * Wound the lock holder transaction if it's younger than the contending
+ * transaction, and there is a possibility of a deadlock.
+ * Also, if the lock holder transaction isn't the current transaction,
+ * make sure it's woken up in case it's sleeping on another ww mutex.
+ */
+static bool __ww_mutex_wound(struct mutex *lock,
+			     struct ww_acquire_ctx *ww_ctx,
+			     struct ww_acquire_ctx *hold_ctx)
+{
+	struct task_struct *owner =
+		__owner_task(atomic_long_read(&lock->owner));
+
+	lockdep_assert_held(&lock->wait_lock);
+
+	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
+	    ww_ctx->acquired > 0) {
+		WRITE_ONCE(hold_ctx->wounded, true);
+		if (owner != current) {
+			/*
+			 * wake_up_process() inserts a write memory barrier to
+			 * make sure owner sees it is wounded before
+			 * TASK_RUNNING in case it's sleeping on another
+			 * ww_mutex. Note that owner points to a valid
+			 * task_struct as long as we hold the wait_lock.
+			 */
+			wake_up_process(owner);
+		}
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Wake up any waiters that may have to back off when the lock is held by the
  * given context.
  *
  * Due to the invariants on the wait list, this can only affect the first
- * waiter with a context.
+ * waiter with a context, unless the Wound-Wait algorithm is used, in which
+ * case subsequent waiters with a context may also wound the lock holder.
  *
  * The current task must not be on the wait list.
  */
@@ -303,6 +338,7 @@ static void __sched
 __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 {
 	struct mutex_waiter *cur;
+	bool is_wait_die = ww_ctx->ww_class->is_wait_die;
 
 	lockdep_assert_held(&lock->wait_lock);
 
@@ -310,13 +346,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 		if (!cur->ww_ctx)
 			continue;
 
-		if (cur->ww_ctx->acquired > 0 &&
+		if (is_wait_die && cur->ww_ctx->acquired > 0 &&
 		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
 			debug_mutex_wake_waiter(lock, cur);
 			wake_up_process(cur->task);
 		}
 
-		break;
+		if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+			break;
 	}
 }
 
@@ -338,12 +375,17 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * and keep spinning, or it will acquire wait_lock, add itself
 	 * to waiter list and sleep.
 	 */
-	smp_mb(); /* ^^^ */
+	smp_mb(); /* See comments above and below. */
 
 	/*
-	 * Check if lock is contended, if not there is nobody to wake up
+	 * Check if lock is contended, if not there is nobody to wake up.
+	 * Checking MUTEX_FLAG_WAITERS is not enough here, since we need to
+	 * order against the lock->ctx check in __ww_mutex_wound called from
+	 * __ww_mutex_add_waiter. We can use list_empty without taking the
+	 * wait_lock, given the memory barrier above and the list_empty
+	 * documentation.
 	 */
-	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
+	if (likely(list_empty(&lock->base.wait_list)))
 		return;
 
 	/*
@@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
 	struct mutex_waiter *cur;
 
+	/*
+	 * If we miss a wounded == true here, we will have a pending
+	 * TASK_RUNNING and pick it up on the next schedule fall-through.
+	 */
+	if (!ctx->ww_class->is_wait_die) {
+		if (READ_ONCE(ctx->wounded))
+			goto deadlock;
+		else
+			return 0;
+	}
+
 	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
 		goto deadlock;
 
@@ -683,12 +736,15 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 {
 	struct mutex_waiter *cur;
 	struct list_head *pos;
+	bool is_wait_die;
 
 	if (!ww_ctx) {
 		list_add_tail(&waiter->list, &lock->wait_list);
 		return 0;
 	}
 
+	is_wait_die = ww_ctx->ww_class->is_wait_die;
+
 	/*
 	 * Add the waiter before the first waiter with a higher stamp.
 	 * Waiters without a context are skipped to avoid starving
@@ -701,7 +757,7 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 
 		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
 			/* Back off immediately if necessary. */
-			if (ww_ctx->acquired > 0) {
+			if (is_wait_die && ww_ctx->acquired > 0) {
 #ifdef CONFIG_DEBUG_MUTEXES
 				struct ww_mutex *ww;
 
@@ -721,13 +777,26 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		 * Wake up the waiter so that it gets a chance to back
 		 * off.
 		 */
-		if (cur->ww_ctx->acquired > 0) {
+		if (is_wait_die && cur->ww_ctx->acquired > 0) {
 			debug_mutex_wake_waiter(lock, cur);
 			wake_up_process(cur->task);
 		}
 	}
 
 	list_add_tail(&waiter->list, pos);
+	if (!is_wait_die) {
+		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
+
+		/*
+		 * Make sure a racing lock taker sees a non-empty waiting list
+		 * before we read ww->ctx, so that if we miss ww->ctx, the
+		 * racing lock taker will call __ww_mutex_wakeup_for_backoff()
+		 * and wound itself.
+		 */
+		smp_mb();
+		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
+	}
+
 	return 0;
 }
 
@@ -750,6 +819,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (use_ww_ctx && ww_ctx) {
 		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
 			return -EALREADY;
+
+		/*
+		 * Reset the wounded flag after a backoff.
+		 * No other process can race and wound us here since they
+		 * can't have a valid owner pointer at this time
+		 * can't have a valid owner pointer at this time.
+		if (ww_ctx->acquired == 0)
+			ww_ctx->wounded = false;
 	}
 
 	preempt_disable();
@@ -858,6 +935,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 acquired:
 	__set_current_state(TASK_RUNNING);
 
+	/* We stole the lock. Need to check wounded status. */
+	if (use_ww_ctx && ww_ctx && !ww_ctx->ww_class->is_wait_die &&
+	    !__mutex_waiter_is_first(lock, &waiter))
+		__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+
 	mutex_remove_waiter(lock, &waiter, current);
 	if (likely(list_empty(&lock->wait_list)))
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 0e4cd64ad2c0..c7fc112d691d 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -26,7 +26,7 @@
 #include <linux/slab.h>
 #include <linux/ww_mutex.h>
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, true);
 struct workqueue_struct *wq;
 
 struct test_mutex {
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index b5c1293ce147..e52065f2acbf 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -29,7 +29,7 @@
  */
 static unsigned int debug_locks_verbose;
 
-static DEFINE_WW_CLASS(ww_lockdep);
+static DEFINE_WW_CLASS(ww_lockdep, true);
 
 static int __init setup_debug_locks_verbose(char *str)
 {
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread

 	ctx->ww_class = ww_class;
+	ctx->wounded = false;
+#ifdef CONFIG_DEBUG_MUTEXES
 	ctx->done_acquire = 0;
 	ctx->contending_lock = NULL;
 #endif
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 6850ffd69125..778ed026382f 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -365,7 +365,7 @@ static struct lock_torture_ops mutex_lock_ops = {
 };
 
 #include <linux/ww_mutex.h>
-static DEFINE_WW_CLASS(torture_ww_class);
+static DEFINE_WW_CLASS(torture_ww_class, true);
 static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class);
 static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class);
 static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2048359f33d2..b449a012c6f9 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -290,12 +290,47 @@ __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 	       (a->stamp != b->stamp || a > b);
 }
 
+/*
+ * Wound the lock holder transaction if it's younger than the contending
+ * transaction, and there is a possibility of a deadlock.
+ * Also if the lock holder transaction isn't the current transaction,
+ * Make sure it's woken up in case it's sleeping on another ww mutex.
+ */
+static bool __ww_mutex_wound(struct mutex *lock,
+			     struct ww_acquire_ctx *ww_ctx,
+			     struct ww_acquire_ctx *hold_ctx)
+{
+	struct task_struct *owner =
+		__owner_task(atomic_long_read(&lock->owner));
+
+	lockdep_assert_held(&lock->wait_lock);
+
+	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
+	    ww_ctx->acquired > 0) {
+		WRITE_ONCE(hold_ctx->wounded, true);
+		if (owner != current) {
+			/*
+			 * wake_up_process() inserts a write memory barrier to
+			 * make sure owner sees it is wounded before
+			 * TASK_RUNNING in case it's sleeping on another
+			 * ww_mutex. Note that owner points to a valid
+			 * task_struct as long as we hold the wait_lock.
+			 */
+			wake_up_process(owner);
+		}
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Wake up any waiters that may have to back off when the lock is held by the
  * given context.
  *
  * Due to the invariants on the wait list, this can only affect the first
- * waiter with a context.
+ * waiter with a context, unless the Wound-Wait algorithm is used where
+ * also subsequent waiters with a context main wound the lock holder.
  *
  * The current task must not be on the wait list.
  */
@@ -303,6 +338,7 @@ static void __sched
 __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 {
 	struct mutex_waiter *cur;
+	bool is_wait_die = ww_ctx->ww_class->is_wait_die;
 
 	lockdep_assert_held(&lock->wait_lock);
 
@@ -310,13 +346,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 		if (!cur->ww_ctx)
 			continue;
 
-		if (cur->ww_ctx->acquired > 0 &&
+		if (is_wait_die && cur->ww_ctx->acquired > 0 &&
 		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
 			debug_mutex_wake_waiter(lock, cur);
 			wake_up_process(cur->task);
 		}
 
-		break;
+		if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+			break;
 	}
 }
 
@@ -338,12 +375,17 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * and keep spinning, or it will acquire wait_lock, add itself
 	 * to waiter list and sleep.
 	 */
-	smp_mb(); /* ^^^ */
+	smp_mb(); /* See comments above and below. */
 
 	/*
-	 * Check if lock is contended, if not there is nobody to wake up
+	 * Check if lock is contended, if not there is nobody to wake up.
+	 * Checking MUTEX_FLAG_WAITERS is not enough here, since we need to
+	 * order against the lock->ctx check in __ww_mutex_wound called from
+	 * __ww_mutex_add_waiter. We can use list_empty without taking the
+	 * wait_lock, given the memory barrier above and the list_empty
+	 * documentation.
 	 */
-	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
+	if (likely(list_empty(&lock->base.wait_list)))
 		return;
 
 	/*
@@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
 	struct mutex_waiter *cur;
 
+	/*
+	 * If we miss a wounded == true here, we will have a pending
+	 * TASK_RUNNING and pick it up on the next schedule fall-through.
+	 */
+	if (!ctx->ww_class->is_wait_die) {
+		if (READ_ONCE(ctx->wounded))
+			goto deadlock;
+		else
+			return 0;
+	}
+
 	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
 		goto deadlock;
 
@@ -683,12 +736,15 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 {
 	struct mutex_waiter *cur;
 	struct list_head *pos;
+	bool is_wait_die;
 
 	if (!ww_ctx) {
 		list_add_tail(&waiter->list, &lock->wait_list);
 		return 0;
 	}
 
+	is_wait_die = ww_ctx->ww_class->is_wait_die;
+
 	/*
 	 * Add the waiter before the first waiter with a higher stamp.
 	 * Waiters without a context are skipped to avoid starving
@@ -701,7 +757,7 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 
 		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
 			/* Back off immediately if necessary. */
-			if (ww_ctx->acquired > 0) {
+			if (is_wait_die && ww_ctx->acquired > 0) {
 #ifdef CONFIG_DEBUG_MUTEXES
 				struct ww_mutex *ww;
 
@@ -721,13 +777,26 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		 * Wake up the waiter so that it gets a chance to back
 		 * off.
 		 */
-		if (cur->ww_ctx->acquired > 0) {
+		if (is_wait_die && cur->ww_ctx->acquired > 0) {
 			debug_mutex_wake_waiter(lock, cur);
 			wake_up_process(cur->task);
 		}
 	}
 
 	list_add_tail(&waiter->list, pos);
+	if (!is_wait_die) {
+		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
+
+		/*
+		 * Make sure a racing lock taker sees a non-empty waiting list
+		 * before we read ww->ctx, so that if we miss ww->ctx, the
+		 * racing lock taker will call __ww_mutex_wake_up_for_backoff()
+		 * and wound itself.
+		 */
+		smp_mb();
+		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
+	}
+
 	return 0;
 }
 
@@ -750,6 +819,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (use_ww_ctx && ww_ctx) {
 		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
 			return -EALREADY;
+
+		/*
+		 * Reset the wounded flag after a backoff.
+		 * No other process can race and wound us here since they
+		 * can't have a valid owner pointer at this time
+		 */
+		if (ww_ctx->acquired == 0)
+			ww_ctx->wounded = false;
 	}
 
 	preempt_disable();
@@ -858,6 +935,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 acquired:
 	__set_current_state(TASK_RUNNING);
 
+	/* We stole the lock. Need to check wounded status. */
+	if (use_ww_ctx && ww_ctx && !ww_ctx->ww_class->is_wait_die &&
+	    !__mutex_waiter_is_first(lock, &waiter))
+		__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+
 	mutex_remove_waiter(lock, &waiter, current);
 	if (likely(list_empty(&lock->wait_list)))
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 0e4cd64ad2c0..c7fc112d691d 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -26,7 +26,7 @@
 #include <linux/slab.h>
 #include <linux/ww_mutex.h>
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, true);
 struct workqueue_struct *wq;
 
 struct test_mutex {
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index b5c1293ce147..e52065f2acbf 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -29,7 +29,7 @@
  */
 static unsigned int debug_locks_verbose;
 
-static DEFINE_WW_CLASS(ww_lockdep);
+static DEFINE_WW_CLASS(ww_lockdep, true);
 
 static int __init setup_debug_locks_verbose(char *str)
 {
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
@ 2018-06-13  7:47   ` Thomas Hellstrom
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13  7:47 UTC (permalink / raw)
  To: dri-devel, linux-kernel
  Cc: Kate Stewart, Thomas Hellstrom, Davidlohr Bueso, Jonathan Corbet,
	Peter Zijlstra, linux-doc, Josh Triplett, linaro-mm-sig,
	David Airlie, Greg Kroah-Hartman, Ingo Molnar,
	Philippe Ombredanne, Thomas Gleixner, Paul E. McKenney,
	linux-media

The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Also implement Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die, a preemptive algorithm and is known to generate
fewer backoffs. Testing reveals that this is true if the number of
simultaneous contending transactions is small. As the number of
simultaneous contending threads increases, Wound-Wait becomes inferior
to Wait-Die in terms of elapsed time, possibly due to the larger number
of locks held by sleeping transactions.

Update documentation and callers.

Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
tag patch-18-06-04

Each thread runs 100000 batches of lock / unlock 800 ww mutexes randomly
chosen out of 100000. Four core Intel x86_64:

Algorithm    #threads       Rollbacks  time
Wound-Wait   4              ~100       ~17s.
Wait-Die     4              ~150000    ~19s.
Wound-Wait   16             ~360000    ~109s.
Wait-Die     16             ~450000    ~82s.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 Documentation/locking/ww-mutex-design.txt | 57 ++++++++++++++----
 drivers/dma-buf/reservation.c             |  2 +-
 drivers/gpu/drm/drm_modeset_lock.c        |  2 +-
 include/linux/ww_mutex.h                  | 19 ++++--
 kernel/locking/locktorture.c              |  2 +-
 kernel/locking/mutex.c                    | 98 ++++++++++++++++++++++++++++---
 kernel/locking/test-ww_mutex.c            |  2 +-
 lib/locking-selftest.c                    |  2 +-
 8 files changed, 152 insertions(+), 32 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt
index 34c3a1b50b9a..29c85623b551 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -1,4 +1,4 @@
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
 ======================================
 
 Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the
 younger task) unlocks all of the buffers that it has already locked, and then
 tries again.
 
-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction,
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+preempts the transaction holding the lock, requiring it to back off. It
+wounds the other transaction.
+If the transaction holding the lock is older, the locking transaction waits for
+the other transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm which requires a reliable way to preempt another transaction.
 
 Concepts
 --------
@@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is important the a task
 trying to acquire locks doesn't grab a new reservation id, but keeps the one it
 acquired when starting the lock acquisition. This ticket is stored in the
 acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context represents a
+transaction.
 
 W/w class: In contrast to normal mutexes the lock class needs to be explicit for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.
 
 Furthermore there are three different class of w/w lock acquire functions:
 
@@ -90,10 +105,15 @@ provided.
 Usage
 -----
 
+The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die
+argument to DEFINE_WW_CLASS(). As a rough rule of thumb, use Wound-Wait if you
+typically expect the number of simultaneous competing transactions to be small,
+and the rollback cost can be substantial.
+
 Three different ways to acquire locks within the same w/w class. Common
 definitions for methods #1 and #2:
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 struct obj {
 	struct ww_mutex lock;
@@ -243,7 +263,7 @@ struct obj {
 	struct list_head locked_list;
 };
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 void __unlock_objs(struct list_head *list)
 {
@@ -312,12 +332,23 @@ Design:
   We maintain the following invariants for the wait list:
   (1) Waiters with an acquire context are sorted by stamp order; waiters
       without an acquire context are interspersed in FIFO order.
-  (2) Among waiters with contexts, only the first one can have other locks
-      acquired already (ctx->acquired > 0). Note that this waiter may come
-      after other waiters without contexts in the list.
+  (2) For Wait-Die, among waiters with contexts, only the first one can have
+      other locks acquired already (ctx->acquired > 0). Note that this waiter
+      may come after other waiters without contexts in the list.
+
+  The Wound-Wait preemption is implemented with a lazy-preemption scheme:
+  The wounded status of the transaction is checked only when there is
+  contention for a new lock and hence a true chance of deadlock. In that
+  situation, if the transaction is wounded, it backs off, clears the
+  wounded status and retries. A great benefit of implementing preemption in
+  this way is that the wounded transaction can identify a contending lock to
+  wait for before restarting the transaction. Just blindly restarting the
+  transaction would likely make the transaction end up in a situation where
+  it would have to back off again.
 
   In general, not much contention is expected. The locks are typically used to
-  serialize access to resources for devices.
+  serialize access to resources for devices, and optimization focus should
+  therefore be directed towards the uncontended cases.
 
 Lockdep:
   Special care has been taken to warn for as many cases of api abuse
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 314eb1071cce..039571b9fea1 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -46,7 +46,7 @@
  * write-side updates.
  */
 
-DEFINE_WW_CLASS(reservation_ww_class);
+DEFINE_WW_CLASS(reservation_ww_class, true);
 EXPORT_SYMBOL(reservation_ww_class);
 
 struct lock_class_key reservation_seqcount_class;
diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
index 8a5100685875..f22a7ef41de1 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -70,7 +70,7 @@
  * lists and lookup data structures.
  */
 
-static DEFINE_WW_CLASS(crtc_ww_class);
+static DEFINE_WW_CLASS(crtc_ww_class, true);
 
 /**
  * drm_modeset_lock_all - take all modeset locks
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 39fda195bf78..6278077f288b 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -8,6 +8,8 @@
  *
  * Wound/wait implementation:
  *  Copyright (C) 2013 Canonical Ltd.
+ * Choice of algorithm:
+ *  Copyright (C) 2018 VMware Inc.
  *
  * This file contains the main data structure and API definitions.
  */
@@ -23,15 +25,17 @@ struct ww_class {
 	struct lock_class_key mutex_key;
 	const char *acquire_name;
 	const char *mutex_name;
+	bool is_wait_die;
 };
 
 struct ww_acquire_ctx {
 	struct task_struct *task;
 	unsigned long stamp;
 	unsigned acquired;
+	bool wounded;
+	struct ww_class *ww_class;
 #ifdef CONFIG_DEBUG_MUTEXES
 	unsigned done_acquire;
-	struct ww_class *ww_class;
 	struct ww_mutex *contending_lock;
 #endif
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
@@ -58,17 +62,19 @@ struct ww_mutex {
 # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
 #endif
 
-#define __WW_CLASS_INITIALIZER(ww_class) \
+#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die)	    \
 		{ .stamp = ATOMIC_LONG_INIT(0) \
 		, .acquire_name = #ww_class "_acquire" \
-		, .mutex_name = #ww_class "_mutex" }
+		, .mutex_name = #ww_class "_mutex" \
+		, .is_wait_die = _is_wait_die }
 
 #define __WW_MUTEX_INITIALIZER(lockname, class) \
 		{ .base =  __MUTEX_INITIALIZER(lockname.base) \
 		__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
 
-#define DEFINE_WW_CLASS(classname) \
-	struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
+#define DEFINE_WW_CLASS(classname, _is_wait_die)			\
+	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, \
+							   _is_wait_die)
 
 #define DEFINE_WW_MUTEX(mutexname, ww_class) \
 	struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
@@ -123,8 +129,9 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
 	ctx->task = current;
 	ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
 	ctx->acquired = 0;
-#ifdef CONFIG_DEBUG_MUTEXES
 	ctx->ww_class = ww_class;
+	ctx->wounded = false;
+#ifdef CONFIG_DEBUG_MUTEXES
 	ctx->done_acquire = 0;
 	ctx->contending_lock = NULL;
 #endif
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 6850ffd69125..778ed026382f 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -365,7 +365,7 @@ static struct lock_torture_ops mutex_lock_ops = {
 };
 
 #include <linux/ww_mutex.h>
-static DEFINE_WW_CLASS(torture_ww_class);
+static DEFINE_WW_CLASS(torture_ww_class, true);
 static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class);
 static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class);
 static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2048359f33d2..b449a012c6f9 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -290,12 +290,47 @@ __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 	       (a->stamp != b->stamp || a > b);
 }
 
+/*
+ * Wound the lock holder transaction if it's younger than the contending
+ * transaction, and there is a possibility of a deadlock.
+ * Also, if the lock holder transaction isn't the current transaction,
+ * make sure it's woken up in case it's sleeping on another ww mutex.
+ */
+static bool __ww_mutex_wound(struct mutex *lock,
+			     struct ww_acquire_ctx *ww_ctx,
+			     struct ww_acquire_ctx *hold_ctx)
+{
+	struct task_struct *owner =
+		__owner_task(atomic_long_read(&lock->owner));
+
+	lockdep_assert_held(&lock->wait_lock);
+
+	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
+	    ww_ctx->acquired > 0) {
+		WRITE_ONCE(hold_ctx->wounded, true);
+		if (owner != current) {
+			/*
+			 * wake_up_process() inserts a write memory barrier to
+			 * make sure owner sees it is wounded before
+			 * TASK_RUNNING in case it's sleeping on another
+			 * ww_mutex. Note that owner points to a valid
+			 * task_struct as long as we hold the wait_lock.
+			 */
+			wake_up_process(owner);
+		}
+		return true;
+	}
+
+	return false;
+}
+
 /*
  * Wake up any waiters that may have to back off when the lock is held by the
  * given context.
  *
  * Due to the invariants on the wait list, this can only affect the first
- * waiter with a context.
+ * waiter with a context, unless the Wound-Wait algorithm is used, in which
+ * case also subsequent waiters with a context may wound the lock holder.
  *
  * The current task must not be on the wait list.
  */
@@ -303,6 +338,7 @@ static void __sched
 __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 {
 	struct mutex_waiter *cur;
+	bool is_wait_die = ww_ctx->ww_class->is_wait_die;
 
 	lockdep_assert_held(&lock->wait_lock);
 
@@ -310,13 +346,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 		if (!cur->ww_ctx)
 			continue;
 
-		if (cur->ww_ctx->acquired > 0 &&
+		if (is_wait_die && cur->ww_ctx->acquired > 0 &&
 		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
 			debug_mutex_wake_waiter(lock, cur);
 			wake_up_process(cur->task);
 		}
 
-		break;
+		if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+			break;
 	}
 }
 
@@ -338,12 +375,17 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * and keep spinning, or it will acquire wait_lock, add itself
 	 * to waiter list and sleep.
 	 */
-	smp_mb(); /* ^^^ */
+	smp_mb(); /* See comments above and below. */
 
 	/*
-	 * Check if lock is contended, if not there is nobody to wake up
+	 * Check if lock is contended, if not there is nobody to wake up.
+	 * Checking MUTEX_FLAG_WAITERS is not enough here, since we need to
+	 * order against the lock->ctx check in __ww_mutex_wound called from
+	 * __ww_mutex_add_waiter. We can use list_empty without taking the
+	 * wait_lock, given the memory barrier above and the list_empty
+	 * documentation.
 	 */
-	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
+	if (likely(list_empty(&lock->base.wait_list)))
 		return;
 
 	/*
@@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
 	struct mutex_waiter *cur;
 
+	/*
+	 * If we miss a wounded == true here, we will have a pending
+	 * TASK_RUNNING and pick it up on the next schedule fall-through.
+	 */
+	if (!ctx->ww_class->is_wait_die) {
+		if (READ_ONCE(ctx->wounded))
+			goto deadlock;
+		else
+			return 0;
+	}
+
 	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
 		goto deadlock;
 
@@ -683,12 +736,15 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 {
 	struct mutex_waiter *cur;
 	struct list_head *pos;
+	bool is_wait_die;
 
 	if (!ww_ctx) {
 		list_add_tail(&waiter->list, &lock->wait_list);
 		return 0;
 	}
 
+	is_wait_die = ww_ctx->ww_class->is_wait_die;
+
 	/*
 	 * Add the waiter before the first waiter with a higher stamp.
 	 * Waiters without a context are skipped to avoid starving
@@ -701,7 +757,7 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 
 		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
 			/* Back off immediately if necessary. */
-			if (ww_ctx->acquired > 0) {
+			if (is_wait_die && ww_ctx->acquired > 0) {
 #ifdef CONFIG_DEBUG_MUTEXES
 				struct ww_mutex *ww;
 
@@ -721,13 +777,26 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		 * Wake up the waiter so that it gets a chance to back
 		 * off.
 		 */
-		if (cur->ww_ctx->acquired > 0) {
+		if (is_wait_die && cur->ww_ctx->acquired > 0) {
 			debug_mutex_wake_waiter(lock, cur);
 			wake_up_process(cur->task);
 		}
 	}
 
 	list_add_tail(&waiter->list, pos);
+	if (!is_wait_die) {
+		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
+
+		/*
+		 * Make sure a racing lock taker sees a non-empty waiting list
+		 * before we read ww->ctx, so that if we miss ww->ctx, the
+		 * racing lock taker will call __ww_mutex_wakeup_for_backoff()
+		 * and wound itself.
+		 */
+		smp_mb();
+		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
+	}
+
 	return 0;
 }
 
@@ -750,6 +819,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (use_ww_ctx && ww_ctx) {
 		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
 			return -EALREADY;
+
+		/*
+		 * Reset the wounded flag after a backoff.
+		 * No other process can race and wound us here since they
+		 * can't have a valid owner pointer at this time.
+		 */
+		if (ww_ctx->acquired == 0)
+			ww_ctx->wounded = false;
 	}
 
 	preempt_disable();
@@ -858,6 +935,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 acquired:
 	__set_current_state(TASK_RUNNING);
 
+	/* We stole the lock. Need to check wounded status. */
+	if (use_ww_ctx && ww_ctx && !ww_ctx->ww_class->is_wait_die &&
+	    !__mutex_waiter_is_first(lock, &waiter))
+		__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+
 	mutex_remove_waiter(lock, &waiter, current);
 	if (likely(list_empty(&lock->wait_list)))
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 0e4cd64ad2c0..c7fc112d691d 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -26,7 +26,7 @@
 #include <linux/slab.h>
 #include <linux/ww_mutex.h>
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, true);
 struct workqueue_struct *wq;
 
 struct test_mutex {
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index b5c1293ce147..e52065f2acbf 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -29,7 +29,7 @@
  */
 static unsigned int debug_locks_verbose;
 
-static DEFINE_WW_CLASS(ww_lockdep);
+static DEFINE_WW_CLASS(ww_lockdep, true);
 
 static int __init setup_debug_locks_verbose(char *str)
 {
-- 
2.14.3

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 2/2] drm: Change deadlock-avoidance algorithm for the modeset locks.
  2018-06-13  7:47 ` [PATCH 0/2] locking, drm: " Thomas Hellstrom
@ 2018-06-13  7:47   ` Thomas Hellstrom
  -1 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13  7:47 UTC (permalink / raw)
  To: dri-devel, linux-kernel; +Cc: Thomas Hellstrom

For modeset locks we don't expect a high number of contending
transactions, so change the algorithm from Wait-Die to Wound-Wait.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/drm_modeset_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c
index f22a7ef41de1..294997765a2c 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -70,7 +70,7 @@
  * lists and lookup data structures.
  */
 
-static DEFINE_WW_CLASS(crtc_ww_class, true);
+static DEFINE_WW_CLASS(crtc_ww_class, false);
 
 /**
  * drm_modeset_lock_all - take all modeset locks
-- 
2.14.3


^ permalink raw reply related	[flat|nested] 43+ messages in thread


* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-13  7:47   ` Thomas Hellstrom
  (?)
@ 2018-06-13  7:54     ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 43+ messages in thread
From: Greg Kroah-Hartman @ 2018-06-13  7:54 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: dri-devel, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Jonathan Corbet, Gustavo Padovan, Maarten Lankhorst, Sean Paul,
	David Airlie, Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne, linux-doc,
	linux-media, linaro-mm-sig

On Wed, Jun 13, 2018 at 09:47:44AM +0200, Thomas Hellstrom wrote:
>  -----
>  
> +The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die
> +argument to DEFINE_WW_CLASS(). As a rough rule of thumb, use Wound-Wait iff you
> +typically expect the number of simultaneous competing transactions to be small,
> +and the rollback cost can be substantial.
> +
>  Three different ways to acquire locks within the same w/w class. Common
>  definitions for methods #1 and #2:
>  
> -static DEFINE_WW_CLASS(ww_class);
> +static DEFINE_WW_CLASS(ww_class, false);

Minor nit on the api here.  Having a "flag" is a royal pain.  You have
to go and look up exactly what that "true/false" means every time you
run across it in code to figure out what it means.  Don't do that if at
all possible.

Make a new api:
	DEFINE_WW_CLASS_DIE(ww_class);
instead, which wraps the boolean internally to switch between the
different types.  That way the api is "self-documenting" and we all know
what is going on without having to dig through a header file.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-13  7:54     ` Greg Kroah-Hartman
  (?)
@ 2018-06-13  8:34       ` Thomas Hellstrom
  -1 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13  8:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: dri-devel, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Jonathan Corbet, Gustavo Padovan, Maarten Lankhorst, Sean Paul,
	David Airlie, Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne, linux-doc,
	linux-media, linaro-mm-sig

On 06/13/2018 09:54 AM, Greg Kroah-Hartman wrote:
> On Wed, Jun 13, 2018 at 09:47:44AM +0200, Thomas Hellstrom wrote:
>>   -----
>>   
>> +The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die
>> +argument to DEFINE_WW_CLASS(). As a rough rule of thumb, use Wound-Wait iff you
>> +typically expect the number of simultaneous competing transactions to be small,
>> +and the rollback cost can be substantial.
>> +
>>   Three different ways to acquire locks within the same w/w class. Common
>>   definitions for methods #1 and #2:
>>   
>> -static DEFINE_WW_CLASS(ww_class);
>> +static DEFINE_WW_CLASS(ww_class, false);
> Minor nit on the api here.  Having a "flag" is a royal pain.  You have
> to go and look up exactly what that "true/false" means every time you
> run across it in code to figure out what it means.  Don't do that if at
> all possible.
>
> Make a new api:
> 	DEFINE_WW_CLASS_DIE(ww_class);
> instead that then wraps that boolean internally to switch between the
> different types.  That way the api is "self-documenting" and we all know
> what is going on without having to dig through a header file.
>
> thanks,
>
> greg k-h

Good point. I'll update in a v2.

Thanks,

Thomas



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-13  7:47   ` Thomas Hellstrom
  (?)
@ 2018-06-13  9:50     ` Peter Zijlstra
  -1 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2018-06-13  9:50 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig


/me wonders what's up with partial Cc's today..

On Wed, Jun 13, 2018 at 09:47:44AM +0200, Thomas Hellstrom wrote:
> The current Wound-Wait mutex algorithm is actually not Wound-Wait but
> Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
> is, contrary to Wait-Die a preemptive algorithm and is known to generate
> fewer backoffs. Testing reveals that this is true if the
> number of simultaneous contending transactions is small.
> As the number of simultaneous contending threads increases, Wound-Wait
> becomes inferior to Wait-Die in terms of elapsed time.
> Possibly due to the larger number of held locks of sleeping transactions.
> 
> Update documentation and callers.
> 
> Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
> tag patch-18-06-04
> 
> Each thread runs 100000 batches of lock / unlock 800 ww mutexes randomly
> chosen out of 100000. Four core Intel x86_64:
> 
> Algorithm    #threads       Rollbacks  time
> Wound-Wait   4              ~100       ~17s.
> Wait-Die     4              ~150000    ~19s.
> Wound-Wait   16             ~360000    ~109s.
> Wait-Die     16             ~450000    ~82s.

> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
> index 39fda195bf78..6278077f288b 100644
> --- a/include/linux/ww_mutex.h
> +++ b/include/linux/ww_mutex.h
> @@ -8,6 +8,8 @@
>   *
>   * Wound/wait implementation:
>   *  Copyright (C) 2013 Canonical Ltd.
> + * Choice of algorithm:
> + *  Copyright (C) 2018 VMware Inc.
>   *
>   * This file contains the main data structure and API definitions.
>   */
> @@ -23,15 +25,17 @@ struct ww_class {
>  	struct lock_class_key mutex_key;
>  	const char *acquire_name;
>  	const char *mutex_name;
> +	bool is_wait_die;
>  };

No _Bool in composites please.

>  struct ww_acquire_ctx {
>  	struct task_struct *task;
>  	unsigned long stamp;
>  	unsigned acquired;
> +	bool wounded;

Again.

> +	struct ww_class *ww_class;
>  #ifdef CONFIG_DEBUG_MUTEXES
>  	unsigned done_acquire;
> -	struct ww_class *ww_class;
>  	struct ww_mutex *contending_lock;
>  #endif
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC

> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 2048359f33d2..b449a012c6f9 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -290,12 +290,47 @@ __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
>  	       (a->stamp != b->stamp || a > b);
>  }
>  
> +/*
> + * Wound the lock holder transaction if it's younger than the contending
> + * transaction, and there is a possibility of a deadlock.
> + * Also if the lock holder transaction isn't the current transaction,

Comma followed by a capital?

> + * Make sure it's woken up in case it's sleeping on another ww mutex.

> + */
> +static bool __ww_mutex_wound(struct mutex *lock,
> +			     struct ww_acquire_ctx *ww_ctx,
> +			     struct ww_acquire_ctx *hold_ctx)
> +{
> +	struct task_struct *owner =
> +		__owner_task(atomic_long_read(&lock->owner));

Did you just spell __mutex_owner() wrong?

> +
> +	lockdep_assert_held(&lock->wait_lock);
> +
> +	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
> +	    ww_ctx->acquired > 0) {
> +		WRITE_ONCE(hold_ctx->wounded, true);
> +		if (owner != current) {
> +			/*
> +			 * wake_up_process() inserts a write memory barrier to

It does no such thing. But yes, it does ensure the wakee sees all prior
stores IFF the wakeup happened.

> +			 * make sure owner sees it is wounded before
> +			 * TASK_RUNNING in case it's sleeping on another
> +			 * ww_mutex. Note that owner points to a valid
> +			 * task_struct as long as we hold the wait_lock.
> +			 */

What exactly are you trying to say here ?

I'm thinking this is the pairing barrier to the smp_mb() below, with
your list_empty() thing? Might make sense to write a single coherent
comment and refer to the other location.

> +			wake_up_process(owner);
> +		}
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  /*
>   * Wake up any waiters that may have to back off when the lock is held by the
>   * given context.
>   *
>   * Due to the invariants on the wait list, this can only affect the first
> - * waiter with a context.
> + * waiter with a context, unless the Wound-Wait algorithm is used where
> + * also subsequent waiters with a context may wound the lock holder.
>   *
>   * The current task must not be on the wait list.
>   */
> @@ -303,6 +338,7 @@ static void __sched
>  __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>  {
>  	struct mutex_waiter *cur;
> +	bool is_wait_die = ww_ctx->ww_class->is_wait_die;
>  
>  	lockdep_assert_held(&lock->wait_lock);
>  
> @@ -310,13 +346,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>  		if (!cur->ww_ctx)
>  			continue;
>  
> -		if (cur->ww_ctx->acquired > 0 &&
> +		if (is_wait_die && cur->ww_ctx->acquired > 0 &&
>  		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
>  			debug_mutex_wake_waiter(lock, cur);
>  			wake_up_process(cur->task);
>  		}
>  
> -		break;
> +		if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
> +			break;
>  	}
>  }
>  
> @@ -338,12 +375,17 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>  	 * and keep spinning, or it will acquire wait_lock, add itself
>  	 * to waiter list and sleep.
>  	 */
> -	smp_mb(); /* ^^^ */
> +	smp_mb(); /* See comments above and below. */
>  
>  	/*
> -	 * Check if lock is contended, if not there is nobody to wake up
> +	 * Check if lock is contended, if not there is nobody to wake up.
> +	 * Checking MUTEX_FLAG_WAITERS is not enough here, 

That seems like a superfluous thing to say. It makes sense in the
context of this patch because we change the FLAG check into a list
check, but the resulting comment/code looks odd.

>							   since we need to
> +	 * order against the lock->ctx check in __ww_mutex_wound called from
> +	 * __ww_mutex_add_waiter. We can use list_empty without taking the
> +	 * wait_lock, given the memory barrier above and the list_empty
> +	 * documentation.

I don't trust documentation. Please reason about implementation.

>  	 */
> -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
> +	if (likely(list_empty(&lock->base.wait_list)))
>  		return;
>  
>  	/*
> @@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
>  	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>  	struct mutex_waiter *cur;
>  
> +	/*
> +	 * If we miss a wounded == true here, we will have a pending

Explain how we can miss that.

> +	 * TASK_RUNNING and pick it up on the next schedule fall-through.
> +	 */
> +	if (!ctx->ww_class->is_wait_die) {
> +		if (READ_ONCE(ctx->wounded))
> +			goto deadlock;
> +		else
> +			return 0;
> +	}
> +
>  	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
>  		goto deadlock;
>  
> @@ -683,12 +736,15 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>  {
>  	struct mutex_waiter *cur;
>  	struct list_head *pos;
> +	bool is_wait_die;
>  
>  	if (!ww_ctx) {
>  		list_add_tail(&waiter->list, &lock->wait_list);
>  		return 0;
>  	}
>  
> +	is_wait_die = ww_ctx->ww_class->is_wait_die;
> +
>  	/*
>  	 * Add the waiter before the first waiter with a higher stamp.
>  	 * Waiters without a context are skipped to avoid starving
> @@ -701,7 +757,7 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>  
>  		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
>  			/* Back off immediately if necessary. */
> -			if (ww_ctx->acquired > 0) {
> +			if (is_wait_die && ww_ctx->acquired > 0) {
>  #ifdef CONFIG_DEBUG_MUTEXES
>  				struct ww_mutex *ww;
>  
> @@ -721,13 +777,26 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>  		 * Wake up the waiter so that it gets a chance to back
>  		 * off.
>  		 */
> -		if (cur->ww_ctx->acquired > 0) {
> +		if (is_wait_die && cur->ww_ctx->acquired > 0) {
>  			debug_mutex_wake_waiter(lock, cur);
>  			wake_up_process(cur->task);
>  		}
>  	}
>  
>  	list_add_tail(&waiter->list, pos);
> +	if (!is_wait_die) {
> +		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
> +
> +		/*
> +		 * Make sure a racing lock taker sees a non-empty waiting list
> +		 * before we read ww->ctx, so that if we miss ww->ctx, the
> +		 * racing lock taker will call __ww_mutex_wake_up_for_backoff()
> +		 * and wound itself.
> +		 */
> +		smp_mb();
> +		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
> +	}
> +
>  	return 0;
>  }
>  
> @@ -750,6 +819,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	if (use_ww_ctx && ww_ctx) {
>  		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
>  			return -EALREADY;
> +
> +		/*
> +		 * Reset the wounded flag after a backoff.
> +		 * No other process can race and wound us here since they
> +		 * can't have a valid owner pointer at this time
> +		 */
> +		if (ww_ctx->acquired == 0)
> +			ww_ctx->wounded = false;
>  	}
>  
>  	preempt_disable();
> @@ -858,6 +935,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  acquired:
>  	__set_current_state(TASK_RUNNING);
>  
> +	/* We stole the lock. Need to check wounded status. */
> +	if (use_ww_ctx && ww_ctx && !ww_ctx->ww_class->is_wait_die &&
> +	    !__mutex_waiter_is_first(lock, &waiter))
> +		__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
> +
>  	mutex_remove_waiter(lock, &waiter, current);
>  	if (likely(list_empty(&lock->wait_list)))
>  		__mutex_clear_flag(lock, MUTEX_FLAGS);

I can't say I'm a fan. I'm already cursing the ww_mutex stuff every time
I have to look at it, and you just made it worse spaghetti.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
@ 2018-06-13  9:50     ` Peter Zijlstra
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2018-06-13  9:50 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Kate Stewart, Davidlohr Bueso, Jonathan Corbet, David Airlie,
	linux-doc, linux-kernel, dri-devel, Josh Triplett, linaro-mm-sig,
	Greg Kroah-Hartman, Ingo Molnar, Philippe Ombredanne,
	Thomas Gleixner, Paul E. McKenney, linux-media


/me wonders what's up with partial Cc's today..

On Wed, Jun 13, 2018 at 09:47:44AM +0200, Thomas Hellstrom wrote:
> The current Wound-Wait mutex algorithm is actually not Wound-Wait but
> Wait-Die. Also implement Wound-Wait as a per-ww-class choice. Wound-Wait
> is, contrary to Wait-Die, a preemptive algorithm and is known to generate
> fewer backoffs. Testing reveals that this is true if the
> number of simultaneous contending transactions is small.
> As the number of simultaneous contending threads increases, Wound-Wait
> becomes inferior to Wait-Die in terms of elapsed time,
> possibly due to the larger number of locks held by sleeping transactions.
> 
> Update documentation and callers.
> 
> Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
> tag patch-18-06-04
> 
> Each thread runs 100000 batches of lock / unlock 800 ww mutexes randomly
> chosen out of 100000. Four core Intel x86_64:
> 
> Algorithm    #threads       Rollbacks  time
> Wound-Wait   4              ~100       ~17s.
> Wait-Die     4              ~150000    ~19s.
> Wound-Wait   16             ~360000    ~109s.
> Wait-Die     16             ~450000    ~82s.
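The difference the table measures comes down to how each algorithm resolves a stamp conflict. As a rough, hypothetical model (the struct and function names here are illustrative, not the kernel's; stamps grow monotonically, so a smaller stamp means an older transaction):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the two deadlock-avoidance rules. */
struct tx {
	unsigned long stamp;	/* acquisition "age": smaller == older */
	int acquired;		/* number of ww mutexes currently held */
};

/* Wait-Die: a younger contender that already holds locks dies
 * (backs off) instead of waiting on an older lock holder. */
static bool wait_die_backoff(const struct tx *contender,
			     const struct tx *holder)
{
	return contender->acquired > 0 && contender->stamp > holder->stamp;
}

/* Wound-Wait: an older contender wounds a younger lock holder; the
 * holder backs off lazily, the next time it sleeps on a contended
 * ww mutex while still in its acquisition phase. */
static bool wound_wait_wound(const struct tx *contender,
			     const struct tx *holder)
{
	return contender->acquired > 0 && contender->stamp < holder->stamp;
}
```

Note that under Wound-Wait only transactions still acquiring locks can be forced to roll back, which is consistent with the low rollback count the description reports at low contention.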

> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
> index 39fda195bf78..6278077f288b 100644
> --- a/include/linux/ww_mutex.h
> +++ b/include/linux/ww_mutex.h
> @@ -8,6 +8,8 @@
>   *
>   * Wound/wait implementation:
>   *  Copyright (C) 2013 Canonical Ltd.
> + * Choice of algorithm:
> + *  Copyright (C) 2018 VMware Inc.
>   *
>   * This file contains the main data structure and API definitions.
>   */
> @@ -23,15 +25,17 @@ struct ww_class {
>  	struct lock_class_key mutex_key;
>  	const char *acquire_name;
>  	const char *mutex_name;
> +	bool is_wait_die;
>  };

No _Bool in composites please.

>  struct ww_acquire_ctx {
>  	struct task_struct *task;
>  	unsigned long stamp;
>  	unsigned acquired;
> +	bool wounded;

Again.

> +	struct ww_class *ww_class;
>  #ifdef CONFIG_DEBUG_MUTEXES
>  	unsigned done_acquire;
> -	struct ww_class *ww_class;
>  	struct ww_mutex *contending_lock;
>  #endif
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC

> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 2048359f33d2..b449a012c6f9 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -290,12 +290,47 @@ __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
>  	       (a->stamp != b->stamp || a > b);
>  }
>  
> +/*
> + * Wound the lock holder transaction if it's younger than the contending
> + * transaction, and there is a possibility of a deadlock.
> + * Also if the lock holder transaction isn't the current transaction,

Comma followed by a capital?

> + * Make sure it's woken up in case it's sleeping on another ww mutex.

> + */
> +static bool __ww_mutex_wound(struct mutex *lock,
> +			     struct ww_acquire_ctx *ww_ctx,
> +			     struct ww_acquire_ctx *hold_ctx)
> +{
> +	struct task_struct *owner =
> +		__owner_task(atomic_long_read(&lock->owner));

Did you just spell __mutex_owner() wrong?

> +
> +	lockdep_assert_held(&lock->wait_lock);
> +
> +	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
> +	    ww_ctx->acquired > 0) {
> +		WRITE_ONCE(hold_ctx->wounded, true);
> +		if (owner != current) {
> +			/*
> +			 * wake_up_process() inserts a write memory barrier to

It does no such thing. But yes, it does ensure the wakee sees all prior
stores IFF the wakeup happened.

> +			 * make sure owner sees it is wounded before
> +			 * TASK_RUNNING in case it's sleeping on another
> +			 * ww_mutex. Note that owner points to a valid
> +			 * task_struct as long as we hold the wait_lock.
> +			 */

What exactly are you trying to say here ?

I'm thinking this is the pairing barrier to the smp_mb() below, with
your list_empty() thing? Might make sense to write a single coherent
comment and refer to the other location.

> +			wake_up_process(owner);
> +		}
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  /*
>   * Wake up any waiters that may have to back off when the lock is held by the
>   * given context.
>   *
>   * Due to the invariants on the wait list, this can only affect the first
> - * waiter with a context.
> + * waiter with a context, unless the Wound-Wait algorithm is used where
> + * also subsequent waiters with a context may wound the lock holder.
>   *
>   * The current task must not be on the wait list.
>   */
> @@ -303,6 +338,7 @@ static void __sched
>  __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>  {
>  	struct mutex_waiter *cur;
> +	bool is_wait_die = ww_ctx->ww_class->is_wait_die;
>  
>  	lockdep_assert_held(&lock->wait_lock);
>  
> @@ -310,13 +346,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>  		if (!cur->ww_ctx)
>  			continue;
>  
> -		if (cur->ww_ctx->acquired > 0 &&
> +		if (is_wait_die && cur->ww_ctx->acquired > 0 &&
>  		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
>  			debug_mutex_wake_waiter(lock, cur);
>  			wake_up_process(cur->task);
>  		}
>  
> -		break;
> +		if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
> +			break;
>  	}
>  }
>  
> @@ -338,12 +375,17 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>  	 * and keep spinning, or it will acquire wait_lock, add itself
>  	 * to waiter list and sleep.
>  	 */
> -	smp_mb(); /* ^^^ */
> +	smp_mb(); /* See comments above and below. */
>  
>  	/*
> -	 * Check if lock is contended, if not there is nobody to wake up
> +	 * Check if lock is contended, if not there is nobody to wake up.
> +	 * Checking MUTEX_FLAG_WAITERS is not enough here, 

That seems like a superfluous thing to say. It makes sense in the
context of this patch because we change the FLAG check into a list
check, but the resulting comment/code looks odd.

>							   since we need to
> +	 * order against the lock->ctx check in __ww_mutex_wound called from
> +	 * __ww_mutex_add_waiter. We can use list_empty without taking the
> +	 * wait_lock, given the memory barrier above and the list_empty
> +	 * documentation.

I don't trust documentation. Please reason about implementation.

>  	 */
> -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
> +	if (likely(list_empty(&lock->base.wait_list)))
>  		return;
>  
>  	/*
> @@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
>  	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>  	struct mutex_waiter *cur;
>  
> +	/*
> +	 * If we miss a wounded == true here, we will have a pending

Explain how we can miss that.

> +	 * TASK_RUNNING and pick it up on the next schedule fall-through.
> +	 */
> +	if (!ctx->ww_class->is_wait_die) {
> +		if (READ_ONCE(ctx->wounded))
> +			goto deadlock;
> +		else
> +			return 0;
> +	}
> +
>  	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
>  		goto deadlock;
>  
> @@ -683,12 +736,15 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>  {
>  	struct mutex_waiter *cur;
>  	struct list_head *pos;
> +	bool is_wait_die;
>  
>  	if (!ww_ctx) {
>  		list_add_tail(&waiter->list, &lock->wait_list);
>  		return 0;
>  	}
>  
> +	is_wait_die = ww_ctx->ww_class->is_wait_die;
> +
>  	/*
>  	 * Add the waiter before the first waiter with a higher stamp.
>  	 * Waiters without a context are skipped to avoid starving
> @@ -701,7 +757,7 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>  
>  		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
>  			/* Back off immediately if necessary. */
> -			if (ww_ctx->acquired > 0) {
> +			if (is_wait_die && ww_ctx->acquired > 0) {
>  #ifdef CONFIG_DEBUG_MUTEXES
>  				struct ww_mutex *ww;
>  
> @@ -721,13 +777,26 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>  		 * Wake up the waiter so that it gets a chance to back
>  		 * off.
>  		 */
> -		if (cur->ww_ctx->acquired > 0) {
> +		if (is_wait_die && cur->ww_ctx->acquired > 0) {
>  			debug_mutex_wake_waiter(lock, cur);
>  			wake_up_process(cur->task);
>  		}
>  	}
>  
>  	list_add_tail(&waiter->list, pos);
> +	if (!is_wait_die) {
> +		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
> +
> +		/*
> +		 * Make sure a racing lock taker sees a non-empty waiting list
> +		 * before we read ww->ctx, so that if we miss ww->ctx, the
> +		 * racing lock taker will call __ww_mutex_wakeup_for_backoff()
> +		 * and wound itself.
> +		 */
> +		smp_mb();
> +		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
> +	}
> +
>  	return 0;
>  }
>  
> @@ -750,6 +819,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	if (use_ww_ctx && ww_ctx) {
>  		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
>  			return -EALREADY;
> +
> +		/*
> +		 * Reset the wounded flag after a backoff.
> +		 * No other process can race and wound us here since they
> +		 * can't have a valid owner pointer at this time
> +		 */
> +		if (ww_ctx->acquired == 0)
> +			ww_ctx->wounded = false;
>  	}
>  
>  	preempt_disable();
> @@ -858,6 +935,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  acquired:
>  	__set_current_state(TASK_RUNNING);
>  
> +	/* We stole the lock. Need to check wounded status. */
> +	if (use_ww_ctx && ww_ctx && !ww_ctx->ww_class->is_wait_die &&
> +	    !__mutex_waiter_is_first(lock, &waiter))
> +		__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
> +
>  	mutex_remove_waiter(lock, &waiter, current);
>  	if (likely(list_empty(&lock->wait_list)))
>  		__mutex_clear_flag(lock, MUTEX_FLAGS);

I can't say I'm a fan. I'm already cursing the ww_mutex stuff every time
I have to look at it, and you just made it worse spaghetti.


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-13  9:50     ` Peter Zijlstra
  (?)
@ 2018-06-13 10:40       ` Thomas Hellstrom
  -1 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13 10:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On 06/13/2018 11:50 AM, Peter Zijlstra wrote:
>
>> +
>> +	lockdep_assert_held(&lock->wait_lock);
>> +
>> +	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
>> +	    ww_ctx->acquired > 0) {
>> +		WRITE_ONCE(hold_ctx->wounded, true);
>> +		if (owner != current) {
>> +			/*
>> +			 * wake_up_process() inserts a write memory barrier to
> It does no such thing. But yes, it does ensure the wakee sees all prior
> stores IFF the wakeup happened.
>
>> +			 * make sure owner sees it is wounded before
>> +			 * TASK_RUNNING in case it's sleeping on another
>> +			 * ww_mutex. Note that owner points to a valid
>> +			 * task_struct as long as we hold the wait_lock.
>> +			 */
> What exactly are you trying to say here ?
>
> I'm thinking this is the pairing barrier to the smp_mb() below, with
> your list_empty() thing? Might make sense to write a single coherent
> comment and refer to the other location.

So what I'm trying to say here is that wake_up_process() ensures that
the owner, if in !TASK_RUNNING, sees the write to hold_ctx->wounded
before the transition to TASK_RUNNING. This was how I interpreted "woken
up" in the wake_up_process() documentation.
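That ordering claim can be modelled in userspace with pthreads (a hypothetical analogue, not the kernel's wait/wake machinery): the waker publishes the wounded flag before issuing the wakeup, and the sleeper re-checks the flag every time it wakes, so a wakeup delivered after the store guarantees the store is observed:

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool wounded;	/* stands in for hold_ctx->wounded */

/* Analogue of __ww_mutex_wound(): store the flag, then wake the owner. */
static void wound_and_wake(void)
{
	pthread_mutex_lock(&m);
	wounded = true;			/* write ... */
	pthread_cond_signal(&cv);	/* ... then wake: the wakee sees it */
	pthread_mutex_unlock(&m);
}

/* Analogue of the owner sleeping on another ww mutex: on every wakeup
 * it re-checks the flag before sleeping again, so a wound posted while
 * it slept is never lost. */
static bool wait_until_wounded(void)
{
	pthread_mutex_lock(&m);
	while (!wounded)
		pthread_cond_wait(&cv, &m);
	pthread_mutex_unlock(&m);
	return true;
}
```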

>
>> +			wake_up_process(owner);
>> +		}
>> +		return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>   /*
>>    * Wake up any waiters that may have to back off when the lock is held by the
>>    * given context.
>>    *
>>    * Due to the invariants on the wait list, this can only affect the first
>> - * waiter with a context.
>> + * waiter with a context, unless the Wound-Wait algorithm is used where
>> + * also subsequent waiters with a context may wound the lock holder.
>>    *
>>    * The current task must not be on the wait list.
>>    */
>> @@ -303,6 +338,7 @@ static void __sched
>>   __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>>   {
>>   	struct mutex_waiter *cur;
>> +	bool is_wait_die = ww_ctx->ww_class->is_wait_die;
>>   
>>   	lockdep_assert_held(&lock->wait_lock);
>>   
>> @@ -310,13 +346,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>>   		if (!cur->ww_ctx)
>>   			continue;
>>   
>> -		if (cur->ww_ctx->acquired > 0 &&
>> +		if (is_wait_die && cur->ww_ctx->acquired > 0 &&
>>   		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
>>   			debug_mutex_wake_waiter(lock, cur);
>>   			wake_up_process(cur->task);
>>   		}
>>   
>> -		break;
>> +		if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
>> +			break;
>>   	}
>>   }
>>   
>> @@ -338,12 +375,17 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>>   	 * and keep spinning, or it will acquire wait_lock, add itself
>>   	 * to waiter list and sleep.
>>   	 */
>> -	smp_mb(); /* ^^^ */
>> +	smp_mb(); /* See comments above and below. */
>>   
>>   	/*
>> -	 * Check if lock is contended, if not there is nobody to wake up
>> +	 * Check if lock is contended, if not there is nobody to wake up.
>> +	 * Checking MUTEX_FLAG_WAITERS is not enough here,
> That seems like a superfluous thing to say. It makes sense in the
> context of this patch because we change the FLAG check into a list
> check, but the resulting comment/code looks odd.
>
>> 							   since we need to
>> +	 * order against the lock->ctx check in __ww_mutex_wound called from
>> +	 * __ww_mutex_add_waiter. We can use list_empty without taking the
>> +	 * wait_lock, given the memory barrier above and the list_empty
>> +	 * documentation.
> I don't trust documentation. Please reason about implementation.

Will do.

>>   	 */
>> -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
>> +	if (likely(list_empty(&lock->base.wait_list)))
>>   		return;
>>   
>>   	/*
>> @@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
>>   	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>>   	struct mutex_waiter *cur;
>>   
>> +	/*
>> +	 * If we miss a wounded == true here, we will have a pending
> Explain how we can miss that.

This is actually the pairing location of the wake_up_process() comment / 
code discussed above. Here we should have !TASK_RUNNING, and let's say 
ctx->wounded is set by another process immediately after we've read it 
(we "miss" it). At that point there must be a pending wake_up_process()
for us, and we'll pick up the set value of wounded on the next iteration
after returning from schedule().
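The "miss" case described above is the classic prepare-to-wait pattern, which can be modelled with a saved wakeup token (a hypothetical sketch; C11 atomics stand in for set_current_state()/schedule(), and only the non-blocking path is modelled):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool wounded;	/* analogue of ctx->wounded */
static atomic_bool wake_pending;	/* analogue of a saved TASK_RUNNING */

/* Waker: publish the flag first, then post the wakeup. */
static void wound_and_post_wakeup(void)
{
	atomic_store(&wounded, true);
	atomic_store(&wake_pending, true);
}

/* Sleeper: if the flag check races with the store above, the pending
 * wakeup makes the "sleep" fall through, and the next loop iteration
 * observes the flag. Returns the iteration that saw the wound. */
static int wait_loop(void)
{
	for (int it = 1; ; it++) {
		if (atomic_load(&wounded))
			return it;	/* the -EDEADLK path in the real code */
		/* schedule(): consumes a pending wakeup instead of blocking;
		 * a real implementation would sleep here when none is pending. */
		if (!atomic_exchange(&wake_pending, false))
			return -1;	/* would block; not modelled */
	}
}
```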

>
>> +	 * TASK_RUNNING and pick it up on the next schedule fall-through.
>> +	 */
>> +	if (!ctx->ww_class->is_wait_die) {
>> +		if (READ_ONCE(ctx->wounded))
>> +			goto deadlock;
>> +		else
>> +			return 0;
>> +	}
>> +
>>   	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
>>   		goto deadlock;
>>   
>> @@ -683,12 +736,15 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>>   {
>>   	struct mutex_waiter *cur;
>>   	struct list_head *pos;
>> +	bool is_wait_die;
>>   
>>   	if (!ww_ctx) {
>>   		list_add_tail(&waiter->list, &lock->wait_list);
>>   		return 0;
>>   	}
>>   
>> +	is_wait_die = ww_ctx->ww_class->is_wait_die;
>> +
>>   	/*
>>   	 * Add the waiter before the first waiter with a higher stamp.
>>   	 * Waiters without a context are skipped to avoid starving
>> @@ -701,7 +757,7 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>>   
>>   		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
>>   			/* Back off immediately if necessary. */
>> -			if (ww_ctx->acquired > 0) {
>> +			if (is_wait_die && ww_ctx->acquired > 0) {
>>   #ifdef CONFIG_DEBUG_MUTEXES
>>   				struct ww_mutex *ww;
>>   
>> @@ -721,13 +777,26 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>>   		 * Wake up the waiter so that it gets a chance to back
>>   		 * off.
>>   		 */
>> -		if (cur->ww_ctx->acquired > 0) {
>> +		if (is_wait_die && cur->ww_ctx->acquired > 0) {
>>   			debug_mutex_wake_waiter(lock, cur);
>>   			wake_up_process(cur->task);
>>   		}
>>   	}
>>   
>>   	list_add_tail(&waiter->list, pos);
>> +	if (!is_wait_die) {
>> +		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
>> +
>> +		/*
>> +		 * Make sure a racing lock taker sees a non-empty waiting list
>> +		 * before we read ww->ctx, so that if we miss ww->ctx, the
>> +		 * racing lock taker will call __ww_mutex_wakeup_for_backoff()
>> +		 * and wound itself.
>> +		 */
>> +		smp_mb();
>> +		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
>> +	}
>> +
>>   	return 0;
>>   }
>>   
>> @@ -750,6 +819,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>   	if (use_ww_ctx && ww_ctx) {
>>   		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
>>   			return -EALREADY;
>> +
>> +		/*
>> +		 * Reset the wounded flag after a backoff.
>> +		 * No other process can race and wound us here since they
>> +		 * can't have a valid owner pointer at this time
>> +		 */
>> +		if (ww_ctx->acquired == 0)
>> +			ww_ctx->wounded = false;
>>   	}
>>   
>>   	preempt_disable();
>> @@ -858,6 +935,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>   acquired:
>>   	__set_current_state(TASK_RUNNING);
>>   
>> +	/* We stole the lock. Need to check wounded status. */
>> +	if (use_ww_ctx && ww_ctx && !ww_ctx->ww_class->is_wait_die &&
>> +	    !__mutex_waiter_is_first(lock, &waiter))
>> +		__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
>> +
>>   	mutex_remove_waiter(lock, &waiter, current);
>>   	if (likely(list_empty(&lock->wait_list)))
>>   		__mutex_clear_flag(lock, MUTEX_FLAGS);
> I can't say I'm a fan. I'm already cursing the ww_mutex stuff every time
> I have to look at it, and you just made it worse spaghetti.
>
>

Thanks for the review.

Well, I can't speak for the current ww implementation, except that I didn't
think it was too hard to understand for a first-time reader.

Admittedly the Wound-Wait path makes it worse, since it's a preemptive
algorithm and we need to touch other processes' acquire contexts and
worry about ordering.

So, assuming your review comments are fixed up, is that a solid NAK, or
do you have a suggestion that would make you more comfortable with the
code? Like splitting out the ww stuff into a separate file?

/Thomas



^ permalink raw reply	[flat|nested] 43+ messages in thread


* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
@ 2018-06-13 10:40       ` Thomas Hellstrom
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13 10:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kate Stewart, Davidlohr Bueso, Jonathan Corbet, David Airlie,
	linux-doc, linux-kernel, dri-devel, Josh Triplett, linaro-mm-sig,
	Greg Kroah-Hartman, Ingo Molnar, Philippe Ombredanne,
	Thomas Gleixner, Paul E. McKenney, linux-media

On 06/13/2018 11:50 AM, Peter Zijlstra wrote:
>
>> +
>> +	lockdep_assert_held(&lock->wait_lock);
>> +
>> +	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
>> +	    ww_ctx->acquired > 0) {
>> +		WRITE_ONCE(hold_ctx->wounded, true);
>> +		if (owner != current) {
>> +			/*
>> +			 * wake_up_process() inserts a write memory barrier to
> It does no such thing. But yes, it does ensure the wakee sees all prior
> stores IFF the wakeup happened.
>
>> +			 * make sure owner sees it is wounded before
>> +			 * TASK_RUNNING in case it's sleeping on another
>> +			 * ww_mutex. Note that owner points to a valid
>> +			 * task_struct as long as we hold the wait_lock.
>> +			 */
> What exactly are you trying to say here ?
>
> I'm thinking this is the pairing barrier to the smp_mb() below, with
> your list_empty() thing? Might make sense to write a single coherent
> comment and refer to the other location.

So what I'm trying to say here is that wake_up_process() ensures that
the owner, if in !TASK_RUNNING, sees the write to hold_ctx->wounded
before the transition to TASK_RUNNING. This was how I interpreted "woken
up" in the wake_up_process() documentation.

>
>> +			wake_up_process(owner);
>> +		}
>> +		return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>   /*
>>    * Wake up any waiters that may have to back off when the lock is held by the
>>    * given context.
>>    *
>>    * Due to the invariants on the wait list, this can only affect the first
>> - * waiter with a context.
>> + * waiter with a context, unless the Wound-Wait algorithm is used where
>> + * also subsequent waiters with a context may wound the lock holder.
>>    *
>>    * The current task must not be on the wait list.
>>    */
>> @@ -303,6 +338,7 @@ static void __sched
>>   __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>>   {
>>   	struct mutex_waiter *cur;
>> +	bool is_wait_die = ww_ctx->ww_class->is_wait_die;
>>   
>>   	lockdep_assert_held(&lock->wait_lock);
>>   
>> @@ -310,13 +346,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>>   		if (!cur->ww_ctx)
>>   			continue;
>>   
>> -		if (cur->ww_ctx->acquired > 0 &&
>> +		if (is_wait_die && cur->ww_ctx->acquired > 0 &&
>>   		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
>>   			debug_mutex_wake_waiter(lock, cur);
>>   			wake_up_process(cur->task);
>>   		}
>>   
>> -		break;
>> +		if (is_wait_die || __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
>> +			break;
>>   	}
>>   }
>>   
>> @@ -338,12 +375,17 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>>   	 * and keep spinning, or it will acquire wait_lock, add itself
>>   	 * to waiter list and sleep.
>>   	 */
>> -	smp_mb(); /* ^^^ */
>> +	smp_mb(); /* See comments above and below. */
>>   
>>   	/*
>> -	 * Check if lock is contended, if not there is nobody to wake up
>> +	 * Check if lock is contended, if not there is nobody to wake up.
>> +	 * Checking MUTEX_FLAG_WAITERS is not enough here,
> That seems like a superfluous thing to say. It makes sense in the
> context of this patch because we change the FLAG check into a list
> check, but the resulting comment/code looks odd.
>
>> 							   since we need to
>> +	 * order against the lock->ctx check in __ww_mutex_wound called from
>> +	 * __ww_mutex_add_waiter. We can use list_empty without taking the
>> +	 * wait_lock, given the memory barrier above and the list_empty
>> +	 * documentation.
> I don't trust documentation. Please reason about implementation.

Will do.

>>   	 */
>> -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
>> +	if (likely(list_empty(&lock->base.wait_list)))
>>   		return;
>>   
>>   	/*
>> @@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
>>   	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>>   	struct mutex_waiter *cur;
>>   
>> +	/*
>> +	 * If we miss a wounded == true here, we will have a pending
> Explain how we can miss that.

This is actually the pairing location of the wake_up_process() comment / 
code discussed above. Here we should have !TASK_RUNNING, and let's say 
ctx->wounded is set by another process immediately after we've read it 
(we "miss" it). At that point there must be a pending wake-up-process() 
for us and we'll pick up the set value of wounded on the next iteration 
after returning from schedule().

>
>> +	 * TASK_RUNNING and pick it up on the next schedule fall-through.
>> +	 */
>> +	if (!ctx->ww_class->is_wait_die) {
>> +		if (READ_ONCE(ctx->wounded))
>> +			goto deadlock;
>> +		else
>> +			return 0;
>> +	}
>> +
>>   	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
>>   		goto deadlock;
>>   
>> @@ -683,12 +736,15 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>>   {
>>   	struct mutex_waiter *cur;
>>   	struct list_head *pos;
>> +	bool is_wait_die;
>>   
>>   	if (!ww_ctx) {
>>   		list_add_tail(&waiter->list, &lock->wait_list);
>>   		return 0;
>>   	}
>>   
>> +	is_wait_die = ww_ctx->ww_class->is_wait_die;
>> +
>>   	/*
>>   	 * Add the waiter before the first waiter with a higher stamp.
>>   	 * Waiters without a context are skipped to avoid starving
>> @@ -701,7 +757,7 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>>   
>>   		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
>>   			/* Back off immediately if necessary. */
>> -			if (ww_ctx->acquired > 0) {
>> +			if (is_wait_die && ww_ctx->acquired > 0) {
>>   #ifdef CONFIG_DEBUG_MUTEXES
>>   				struct ww_mutex *ww;
>>   
>> @@ -721,13 +777,26 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>>   		 * Wake up the waiter so that it gets a chance to back
>>   		 * off.
>>   		 */
>> -		if (cur->ww_ctx->acquired > 0) {
>> +		if (is_wait_die && cur->ww_ctx->acquired > 0) {
>>   			debug_mutex_wake_waiter(lock, cur);
>>   			wake_up_process(cur->task);
>>   		}
>>   	}
>>   
>>   	list_add_tail(&waiter->list, pos);
>> +	if (!is_wait_die) {
>> +		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
>> +
>> +		/*
>> +		 * Make sure a racing lock taker sees a non-empty waiting list
>> +		 * before we read ww->ctx, so that if we miss ww->ctx, the
>> +		 * racing lock taker will call __ww_mutex_wake_up_for_backoff()
>> +		 * and wound itself.
>> +		 */
>> +		smp_mb();
>> +		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
>> +	}
>> +
>>   	return 0;
>>   }
>>   
>> @@ -750,6 +819,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>   	if (use_ww_ctx && ww_ctx) {
>>   		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
>>   			return -EALREADY;
>> +
>> +		/*
>> +		 * Reset the wounded flag after a backoff.
>> +		 * No other process can race and wound us here since they
>> +		 * can't have a valid owner pointer at this time
>> +		 */
>> +		if (ww_ctx->acquired == 0)
>> +			ww_ctx->wounded = false;
>>   	}
>>   
>>   	preempt_disable();
>> @@ -858,6 +935,11 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>   acquired:
>>   	__set_current_state(TASK_RUNNING);
>>   
>> +	/* We stole the lock. Need to check wounded status. */
>> +	if (use_ww_ctx && ww_ctx && !ww_ctx->ww_class->is_wait_die &&
>> +	    !__mutex_waiter_is_first(lock, &waiter))
>> +		__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
>> +
>>   	mutex_remove_waiter(lock, &waiter, current);
>>   	if (likely(list_empty(&lock->wait_list)))
>>   		__mutex_clear_flag(lock, MUTEX_FLAGS);
> I can't say I'm a fan. I'm already cursing the ww_mutex stuff every time
> I have to look at it, and you just made it worse spaghetti.
>
>

Thanks for the review.

Well, I can't speak for the current ww implementation, except that I 
didn't think it was too hard to understand for a first-time reader.

Admittedly the Wound-Wait path makes it worse, since it's a preemptive 
algorithm and we need to touch other processes' acquire contexts and 
worry about ordering.

So, assuming your review comments are fixed up, is that a solid NAK, or 
do you have any suggestion that would make you more comfortable with the 
code? Like splitting out the ww stuff to a separate file?

/Thomas


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-13 10:40       ` Thomas Hellstrom
  (?)
@ 2018-06-13 13:10         ` Peter Zijlstra
  -1 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2018-06-13 13:10 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On Wed, Jun 13, 2018 at 12:40:29PM +0200, Thomas Hellstrom wrote:
> On 06/13/2018 11:50 AM, Peter Zijlstra wrote:
> > 
> > > +
> > > +	lockdep_assert_held(&lock->wait_lock);
> > > +
> > > +	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
> > > +	    ww_ctx->acquired > 0) {
> > > +		WRITE_ONCE(hold_ctx->wounded, true);
> > > +		if (owner != current) {
> > > +			/*
> > > +			 * wake_up_process() inserts a write memory barrier to
> > It does no such thing. But yes, it does ensure the wakee sees all prior
> > stores IFF the wakeup happened.
> > 
> > > +			 * make sure owner sees it is wounded before
> > > +			 * TASK_RUNNING in case it's sleeping on another
> > > +			 * ww_mutex. Note that owner points to a valid
> > > +			 * task_struct as long as we hold the wait_lock.
> > > +			 */
> > What exactly are you trying to say here ?
> > 
> > I'm thinking this is the pairing barrier to the smp_mb() below, with
> > your list_empty() thing? Might make sense to write a single coherent
> > comment and refer to the other location.
> 
> So what I'm trying to say here is that wake_up_process() ensures that the
> owner, if in !TASK_RUNNING, sees the write to hold_ctx->wounded before the
> transition to TASK_RUNNING. This was how I interpreted "woken up" in the
> wake up process documentation.

There is documentation!? :-) Aaah, you mean that kerneldoc comment with
wake_up_process() ? Yeah, that needs fixing. /me puts on endless todo
list.

Anyway, wakeup providing that ordering isn't something that needs a
comment of that size; and I think the only comment here is that we care
about the ordering and a reference to the site(s) that pairs with it.

Maybe something like:

	/*
	 * __ww_mutex_lock_check_stamp() will observe our wounded store.
	 */

> > > -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
> > > +	if (likely(list_empty(&lock->base.wait_list)))
> > >   		return;
> > >   	/*
> > > @@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
> > >   	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
> > >   	struct mutex_waiter *cur;
> > > +	/*
> > > +	 * If we miss a wounded == true here, we will have a pending
> > Explain how we can miss that.
> 
> This is actually the pairing location of the wake_up_process() comment /
> code discussed above. Here we should have !TASK_RUNNING, and let's say
> ctx->wounded is set by another process immediately after we've read it (we
> "miss" it). At that point there must be a pending wake-up-process() for us
> and we'll pick up the set value of wounded on the next iteration after
> returning from schedule().

Right, so that's when the above wakeup isn't the one waking us.


> > I can't say I'm a fan. I'm already cursing the ww_mutex stuff every time
> > I have to look at it, and you just made it worse spaghetti.

> Well, I can't speak for the current ww implementation except I didn't think
> it was too hard to understand for a first time reader.
> 
> Admittedly the Wound-Wait path makes it worse since it's a preemptive
> algorithm and we need to touch other processes' acquire contexts and worry
> about ordering.
> 
> So, assuming your review comments are fixed up, is that a solid NAK or do
> you have any suggestion that would make you more comfortable with the code?
> like splitting out ww-stuff to a separate file?

Nah, not a NAK, but we should look at what can be done to improve the code.
Maybe add a few more comments that explain why. Part of the problem with
ww_mutex is always that I forget exactly how they work, and mutex.c
doesn't have many useful comments (most of those are in ww_mutex.h
and I always forget to look there).

Also; I'm not at all sure about the exact difference between what we
have and what you propose. I did read the documentation part (I really
should not have to) but it just doesn't jibe.

I suspect you're using preemption entirely different from what we
usually call a preemption.



Also, __ww_ctx_stamp_after() is crap; did we want to write:

	return (signed long)(a->stamp - b->stamp) > 0;

or something?



^ permalink raw reply	[flat|nested] 43+ messages in thread


* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-13 13:10         ` Peter Zijlstra
  (?)
@ 2018-06-13 14:05           ` Thomas Hellstrom
  -1 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On 06/13/2018 03:10 PM, Peter Zijlstra wrote:
> On Wed, Jun 13, 2018 at 12:40:29PM +0200, Thomas Hellstrom wrote:
>> On 06/13/2018 11:50 AM, Peter Zijlstra wrote:
>>>> +
>>>> +	lockdep_assert_held(&lock->wait_lock);
>>>> +
>>>> +	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
>>>> +	    ww_ctx->acquired > 0) {
>>>> +		WRITE_ONCE(hold_ctx->wounded, true);
>>>> +		if (owner != current) {
>>>> +			/*
>>>> +			 * wake_up_process() inserts a write memory barrier to
>>> It does no such thing. But yes, it does ensure the wakee sees all prior
>>> stores IFF the wakeup happened.
>>>
>>>> +			 * make sure owner sees it is wounded before
>>>> +			 * TASK_RUNNING in case it's sleeping on another
>>>> +			 * ww_mutex. Note that owner points to a valid
>>>> +			 * task_struct as long as we hold the wait_lock.
>>>> +			 */
>>> What exactly are you trying to say here ?
>>>
>>> I'm thinking this is the pairing barrier to the smp_mb() below, with
>>> your list_empty() thing? Might make sense to write a single coherent
>>> comment and refer to the other location.
>> So what I'm trying to say here is that wake_up_process() ensures that the
>> owner, if in !TASK_RUNNING, sees the write to hold_ctx->wounded before the
>> transition to TASK_RUNNING. This was how I interpreted "woken up" in the
>> wake up process documentation.
> There is documentation!? :-) Aaah, you mean that kerneldoc comment with
> wake_up_process() ? Yeah, that needs fixing. /me puts on endless todo
> list.
>
> Anyway, wakeup providing that ordering isn't something that needs a
> comment of that size; and I think the only comment here is that we care
> about the ordering and a reference to the site(s) that pairs with it.
>
> Maybe something like:
>
> 	/*
> 	 * __ww_mutex_lock_check_stamp() will observe our wounded store.
> 	 */

Yes.

Actually, I just found the set_current_state() kerneldoc, which explains 
the built-in barrier pairing with wake_up_xxx. Perhaps I should mention 
that as well. It also looks like the use of WRITE_ONCE() and READ_ONCE() 
can be dropped.

>>>> -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
>>>> +	if (likely(list_empty(&lock->base.wait_list)))
>>>>    		return;
>>>>    	/*
>>>> @@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
>>>>    	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>>>>    	struct mutex_waiter *cur;
>>>> +	/*
>>>> +	 * If we miss a wounded == true here, we will have a pending
>>> Explain how we can miss that.
>> This is actually the pairing location of the wake_up_process() comment /
>> code discussed above. Here we should have !TASK_RUNNING, and let's say
>> ctx->wounded is set by another process immediately after we've read it (we
>> "miss" it). At that point there must be a pending wake-up-process() for us
>> and we'll pick up the set value of wounded on the next iteration after
>> returning from schedule().
> Right, so that's when the above wakeup isn't the one waking us.
>
>
>>> I can't say I'm a fan. I'm already cursing the ww_mutex stuff every time
>>> I have to look at it, and you just made it worse spaghetti.
>> Well, I can't speak for the current ww implementation except I didn't think
>> it was too hard to understand for a first time reader.
>>
>> Admittedly the Wound-Wait path makes it worse since it's a preemptive
>> algorithm and we need to touch other processes' acquire contexts and worry
>> about ordering.
>>
>> So, assuming your review comments are fixed up, is that a solid NAK or do
>> you have any suggestion that would make you more comfortable with the code?
>> like splitting out ww-stuff to a separate file?
> Nah, not a NAK, but we should look at what can be done to improve the code.
> Maybe add a few more comments that explain why. Part of the problem with
> ww_mutex is always that I forget exactly how they work, and mutex.c
> doesn't have many useful comments (most of those are in ww_mutex.h
> and I always forget to look there).

Understood.

>
> Also; I'm not at all sure about the exact difference between what we
> have and what you propose. I did read the documentation part (I really
> should not have to) but it just doesn't jibe.
>
> I suspect you're using preemption entirely different from what we
> usually call a preemption.

I think this perhaps requires a good understanding of the difference 
between the algorithms in question before looking at the implementation. 
I put a short explanation, and some URLs to CS websites describing the 
two algorithms and their pros and cons, in the patch series' introductory 
message. I'll forward that.

In short, with Wait-Die (before the patch) it's the process _taking_ the 
contended lock that backs off if necessary. No preemption required. With 
Wound-Wait, it's the process _holding_ the contended lock that gets 
wounded (preempted), and it needs to back off at its own discretion but 
no later than when it's going to sleep on another ww mutex. That point 
is where we intercept the preemption request. We're preempting the 
transaction rather than the process.


>
>
> Also, __ww_ctx_stamp_after() is crap; did we want to write:
>
> 	return (signed long)(a->stamp - b->stamp) > 0;
>
> or something?
>
>
Hmm. Yes, it does look odd. The code above should do the trick.

/Thomas



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
@ 2018-06-13 14:05           ` Thomas Hellstrom
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On 06/13/2018 03:10 PM, Peter Zijlstra wrote:
> On Wed, Jun 13, 2018 at 12:40:29PM +0200, Thomas Hellstrom wrote:
>> On 06/13/2018 11:50 AM, Peter Zijlstra wrote:
>>>> +
>>>> +	lockdep_assert_held(&lock->wait_lock);
>>>> +
>>>> +	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
>>>> +	    ww_ctx->acquired > 0) {
>>>> +		WRITE_ONCE(hold_ctx->wounded, true);
>>>> +		if (owner != current) {
>>>> +			/*
>>>> +			 * wake_up_process() inserts a write memory barrier to
>>> It does no such thing. But yes, it does ensure the wakee sees all prior
>>> stores IFF the wakeup happened.
>>>
>>>> +			 * make sure owner sees it is wounded before
>>>> +			 * TASK_RUNNING in case it's sleeping on another
>>>> +			 * ww_mutex. Note that owner points to a valid
>>>> +			 * task_struct as long as we hold the wait_lock.
>>>> +			 */
>>> What exactly are you trying to say here ?
>>>
>>> I'm thinking this is the pairing barrier to the smp_mb() below, with
>>> your list_empty() thing? Might make sense to write a single coherent
>>> comment and refer to the other location.
>> So what I'm trying to say here is that wake_up_process() ensures that the
>> owner, if in !TASK_RUNNING, sees the write to hold_ctx->wounded before the
>> transition to TASK_RUNNING. This was how I interpreted "woken up" in the
>> wake up process documentation.
> There is documentation!? :-) Aaah, you mean that kerneldoc comment with
> wake_up_process() ? Yeah, that needs fixing. /me puts on endless todo
> list.
>
> Anyway, wakeup providing that ordering isn't something that needs a
> comment of that size; and I think the only comment here is that we care
> about the ordering and a reference to the site(s) that pairs with it.
>
> Maybe something like:
>
> 	/*
> 	 * __ww_mutex_lock_check_stamp() will observe our wounded store.
> 	 */

Yes.

Actually, I just found the set_current_state() kerneldoc which explains 
the built-in barrier pairing with wake_up_xxx. Perhaps I also should 
mention that as well. Looks like the use WRITE_ONCE() and READ_ONCE() 
can be dropped as well.

>>>> -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
>>>> +	if (likely(list_empty(&lock->base.wait_list)))
>>>>    		return;
>>>>    	/*
>>>> @@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
>>>>    	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>>>>    	struct mutex_waiter *cur;
>>>> +	/*
>>>> +	 * If we miss a wounded == true here, we will have a pending
>>> Explain how we can miss that.
>> This is actually the pairing location of the wake_up_process() comment /
>> code discussed above. Here we should have !TASK_RUNNING, and let's say
>> ctx->wounded is set by another process immediately after we've read it (we
>> "miss" it). At that point there must be a pending wake-up-process() for us
>> and we'll pick up the set value of wounded on the next iteration after
>> returning from schedule().
> Right, so that's when the above wakeup isn't the one waking us.
>
>
>>> I can't say I'm a fan. I'm already cursing the ww_mutex stuff every time
>>> I have to look at it, and you just made it worse spagethi.
>> Well, I can't speak for the current ww implementation except I didn't think
>> it was too hard to understand for a first time reader.
>>
>> Admittedly the Wound-Wait path makes it worse since it's a preemptive
>> algorithm and we need to touch other processes a acquire contexts and worry
>> about ordering.
>>
>> So, assuming your review comments are fixed up, is that a solid NAK or do
>> you have any suggestion that would make you more comfortable with the code?
>> like splitting out ww-stuff to a separate file?
> Nah, not a NAK, but we should look at whan can be done to improve code.
> Maybe add a few more comments that explain why. Part of the problem with
> ww_mutex is always that I forget exactly how they work and mutex.c
> doesn't have much useful comments in (most of those are in ww_mutex.h
> and I always forget to look there).

Understood.

>
> Also; I'm not at all sure about the exact difference between what we
> have and what you propose. I did read the documentation part (I really
> should not have to) but it just doesn't jive.
>
> I suspect you're using preemption entirely different from what we
> usually call a preemption.

I think that perhaps requires a good understanding of the difference of 
the algorithms in question before looking at the implementation. I put a 
short explanation and some URLs to CS websites describing the two 
algorithms and their pros and cons in the patch series introductory 
message. I'll forward that.

In short, with Wait-Die (before the patch) it's the process _taking_ the 
contended lock that backs off if necessary. No preemption required. With 
Wound-Wait, it's the process _holding_ the contended lock that gets 
wounded (preempted), and it needs to back off at its own discretion but 
no later than when it's going to sleep on another ww mutex. That point 
is where we intercept the preemption request. We're preempting the 
transaction rather than the process.


>
>
> Also, __ww_ctx_stamp_after() is crap; did we want to write:
>
> 	return (signed long)(a->stamp - b->stamp) > 0;
>
> or something?
>
>
Hmm. Yes it indeed looks odd. Seems like the above code should do the trick.

/Thomas


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
@ 2018-06-13 14:05           ` Thomas Hellstrom
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-13 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kate Stewart, Davidlohr Bueso, Jonathan Corbet, David Airlie,
	linux-doc, linux-kernel, dri-devel, Josh Triplett, linaro-mm-sig,
	Greg Kroah-Hartman, Ingo Molnar, Philippe Ombredanne,
	Thomas Gleixner, Paul E. McKenney, linux-media

On 06/13/2018 03:10 PM, Peter Zijlstra wrote:
> On Wed, Jun 13, 2018 at 12:40:29PM +0200, Thomas Hellstrom wrote:
>> On 06/13/2018 11:50 AM, Peter Zijlstra wrote:
>>>> +
>>>> +	lockdep_assert_held(&lock->wait_lock);
>>>> +
>>>> +	if (owner && hold_ctx && __ww_ctx_stamp_after(hold_ctx, ww_ctx) &&
>>>> +	    ww_ctx->acquired > 0) {
>>>> +		WRITE_ONCE(hold_ctx->wounded, true);
>>>> +		if (owner != current) {
>>>> +			/*
>>>> +			 * wake_up_process() inserts a write memory barrier to
>>> It does no such thing. But yes, it does ensure the wakee sees all prior
>>> stores IFF the wakeup happened.
>>>
>>>> +			 * make sure owner sees it is wounded before
>>>> +			 * TASK_RUNNING in case it's sleeping on another
>>>> +			 * ww_mutex. Note that owner points to a valid
>>>> +			 * task_struct as long as we hold the wait_lock.
>>>> +			 */
>>> What exactly are you trying to say here ?
>>>
>>> I'm thinking this is the pairing barrier to the smp_mb() below, with
>>> your list_empty() thing? Might make sense to write a single coherent
>>> comment and refer to the other location.
>> So what I'm trying to say here is that wake_up_process() ensures that the
>> owner, if in !TASK_RUNNING, sees the write to hold_ctx->wounded before the
>> transition to TASK_RUNNING. This was how I interpreted "woken up" in the
>> wake up process documentation.
> There is documentation!? :-) Aaah, you mean that kerneldoc comment with
> wake_up_process() ? Yeah, that needs fixing. /me puts on endless todo
> list.
>
> Anyway, wakeup providing that ordering isn't something that needs a
> comment of that size; and I think the only comment here is that we care
> about the ordering and a reference to the site(s) that pairs with it.
>
> Maybe something like:
>
> 	/*
> 	 * __ww_mutex_lock_check_stamp() will observe our wounded store.
> 	 */

Yes.

Actually, I just found the set_current_state() kerneldoc, which explains 
the built-in barrier pairing with wake_up_xxx. Perhaps I should mention 
that as well. Looks like the use of WRITE_ONCE() and READ_ONCE() can be 
dropped too.

>>>> -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
>>>> +	if (likely(list_empty(&lock->base.wait_list)))
>>>>    		return;
>>>>    	/*
>>>> @@ -653,6 +695,17 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
>>>>    	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>>>>    	struct mutex_waiter *cur;
>>>> +	/*
>>>> +	 * If we miss a wounded == true here, we will have a pending
>>> Explain how we can miss that.
>> This is actually the pairing location of the wake_up_process() comment /
>> code discussed above. Here we should have !TASK_RUNNING, and let's say
>> ctx->wounded is set by another process immediately after we've read it (we
>> "miss" it). At that point there must be a pending wake-up-process() for us
>> and we'll pick up the set value of wounded on the next iteration after
>> returning from schedule().
> Right, so that's when the above wakeup isn't the one waking us.
>
>
>>> I can't say I'm a fan. I'm already cursing the ww_mutex stuff every time
>>> I have to look at it, and you just made it worse spaghetti.
>> Well, I can't speak for the current ww implementation except I didn't think
>> it was too hard to understand for a first time reader.
>>
>> Admittedly the Wound-Wait path makes it worse since it's a preemptive
>> algorithm and we need to touch other processes' acquire contexts and worry
>> about ordering.
>>
>> So, assuming your review comments are fixed up, is that a solid NAK or do
>> you have any suggestion that would make you more comfortable with the code?
>> like splitting out ww-stuff to a separate file?
> Nah, not a NAK, but we should look at whan can be done to improve code.
> Maybe add a few more comments that explain why. Part of the problem with
> ww_mutex is always that I forget exactly how they work and mutex.c
> doesn't have much useful comments in (most of those are in ww_mutex.h
> and I always forget to look there).

Understood.

>
> Also; I'm not at all sure about the exact difference between what we
> have and what you propose. I did read the documentation part (I really
> should not have to) but it just doesn't jive.
>
> I suspect you're using preemption entirely different from what we
> usually call a preemption.

I think that perhaps requires a good understanding of the difference 
between the algorithms in question before looking at the implementation. I put a 
short explanation and some URLs to CS websites describing the two 
algorithms and their pros and cons in the patch series introductory 
message. I'll forward that.

In short, with Wait-Die (before the patch) it's the process _taking_ the 
contended lock that backs off if necessary. No preemption required. With 
Wound-Wait, it's the process _holding_ the contended lock that gets 
wounded (preempted), and it needs to back off at its own discretion but 
no later than when it's going to sleep on another ww mutex. That point 
is where we intercept the preemption request. We're preempting the 
transaction rather than the process.


>
>
> Also, __ww_ctx_stamp_after() is crap; did we want to write:
>
> 	return (signed long)(a->stamp - b->stamp) > 0;
>
> or something?
>
>
Hmm. Yes, it indeed looks odd. Seems like the above code should do the trick.

/Thomas


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-13 14:05           ` Thomas Hellstrom
  (?)
@ 2018-06-14 10:51             ` Peter Zijlstra
  -1 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2018-06-14 10:51 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On Wed, Jun 13, 2018 at 04:05:43PM +0200, Thomas Hellstrom wrote:
> In short, with Wait-Die (before the patch) it's the process _taking_ the
> contended lock that backs off if necessary. No preemption required. With
> Wound-Wait, it's the process _holding_ the contended lock that gets wounded
> (preempted), and it needs to back off at its own discretion but no later
> than when it's going to sleep on another ww mutex. That point is where we
> intercept the preemption request. We're preempting the transaction rather
> than the process.

This:

  Wait-die:
    The newer transactions are killed when:
      It (= the newer transaction) makes a request for a lock being held
      by an older transaction

  Wound-wait:
    The newer transactions are killed when:
      An older transaction makes a request for a lock being held by the
      newer transactions

Would make for an excellent comment somewhere. No talking about
preemption, although I think I know what you mean with it, that is not
how preemption is normally used.

In scheduling speak preemption is when we pick a runnable (but !running)
task to run instead of the current running task.  In this case however,
our T2 is blocked on a lock acquisition (one owned by our T1) and T1 is
the only runnable task. Only when T1's progress is inhibited by T2 (T1
wants a lock held by T2) do we wound/wake T2.

In any case, I had a little look at the current ww_mutex code and ended
up with the below patch that hopefully clarifies things a little.

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index f44f658ae629..a20c04619b2a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -244,6 +244,10 @@ void __sched mutex_lock(struct mutex *lock)
 EXPORT_SYMBOL(mutex_lock);
 #endif
 
+/*
+ * Associate the ww_mutex @ww with the context @ww_ctx under which we acquired
+ * it.
+ */
 static __always_inline void
 ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 {
@@ -282,26 +286,36 @@ ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 	DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
 #endif
 	ww_ctx->acquired++;
+	lock->ctx = ctx;
 }
 
+/*
+ * Determine if context @a is 'after' context @b. IOW, @a should be wounded in
+ * favour of @b.
+ */
 static inline bool __sched
 __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 {
-	return a->stamp - b->stamp <= LONG_MAX &&
-	       (a->stamp != b->stamp || a > b);
+
+	return (signed long)(a->stamp - b->stamp) > 0;
 }
 
 /*
- * Wake up any waiters that may have to back off when the lock is held by the
- * given context.
+ * We just acquired @lock under @ww_ctx, if there are later contexts waiting
+ * behind us on the wait-list, wake them up so they can wound themselves.
  *
- * Due to the invariants on the wait list, this can only affect the first
- * waiter with a context.
+ * See __ww_mutex_add_waiter() for the list-order construction; basically the
+ * list is ordered by stamp smallest (oldest) first, so if there is a later
+ * (younger) stamp on the list behind us, wake it so it can wound itself.
+ *
+ * Because __ww_mutex_add_waiter() and __ww_mutex_check_stamp() wake any
+ * but the earliest context, this can only affect the first waiter (with a
+ * context).
  *
  * The current task must not be on the wait list.
  */
 static void __sched
-__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_wakeup_for_wound(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 {
 	struct mutex_waiter *cur;
 
@@ -322,16 +336,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 }
 
 /*
- * After acquiring lock with fastpath or when we lost out in contested
- * slowpath, set ctx and wake up any waiters so they can recheck.
+ * After acquiring lock with fastpath, where we do not hold wait_lock, set ctx
+ * and wake up any waiters so they can recheck.
  */
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
 	ww_mutex_lock_acquired(lock, ctx);
 
-	lock->ctx = ctx;
-
 	/*
 	 * The lock->ctx update should be visible on all cores before
 	 * the atomic read is done, otherwise contended waiters might be
@@ -352,25 +364,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * so they can see the new lock->ctx.
 	 */
 	spin_lock(&lock->base.wait_lock);
-	__ww_mutex_wakeup_for_backoff(&lock->base, ctx);
+	__ww_mutex_wakeup_for_wound(&lock->base, ctx);
 	spin_unlock(&lock->base.wait_lock);
 }
 
-/*
- * After acquiring lock in the slowpath set ctx.
- *
- * Unlike for the fast path, the caller ensures that waiters are woken up where
- * necessary.
- *
- * Callers must hold the mutex wait_lock.
- */
-static __always_inline void
-ww_mutex_set_context_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
-{
-	ww_mutex_lock_acquired(lock, ctx);
-	lock->ctx = ctx;
-}
-
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 
 static inline
@@ -646,20 +643,30 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
+/*
+ * Check the wound condition for the current lock acquire.  If we're trying to
+ * acquire a lock already held by an older context, wound ourselves.
+ *
+ * Since __ww_mutex_add_waiter() orders the wait-list on stamp, we only have to
+ * look at waiters before us in the wait-list.
+ */
 static inline int __sched
-__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
+__ww_mutex_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
 			    struct ww_acquire_ctx *ctx)
 {
 	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
 	struct mutex_waiter *cur;
 
+	if (ctx->acquired == 0)
+		return 0;
+
 	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
 		goto deadlock;
 
 	/*
 	 * If there is a waiter in front of us that has a context, then its
-	 * stamp is earlier than ours and we must back off.
+	 * stamp is earlier than ours and we must wound ourself.
 	 */
 	cur = waiter;
 	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
@@ -677,6 +684,14 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
 	return -EDEADLK;
 }
 
+/*
+ * Add @waiter to the wait-list, keep the wait-list ordered by stamp, smallest
+ * first. Such that older contexts are preferred to acquire the lock over
+ * younger contexts.
+ *
+ * Furthermore, wound ourself immediately when possible (there are older
+ * contexts already waiting) to avoid unnecessary waiting.
+ */
 static inline int __sched
 __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		      struct mutex *lock,
@@ -700,8 +715,12 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		if (!cur->ww_ctx)
 			continue;
 
+		/*
+		 * If we find an older context waiting, there is no point in
+		 * queueing behind it, as we'd have to wound ourselves the
+		 * moment it would acquire the lock.
+		 */
 		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
-			/* Back off immediately if necessary. */
 			if (ww_ctx->acquired > 0) {
 #ifdef CONFIG_DEBUG_MUTEXES
 				struct ww_mutex *ww;
@@ -719,8 +738,9 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		pos = &cur->list;
 
 		/*
-		 * Wake up the waiter so that it gets a chance to back
-		 * off.
+		 * When we enqueued an older context, wake all younger
+		 * contexts such that they can wound themselves, see
+		 * __ww_mutex_check_stamp().
 		 */
 		if (cur->ww_ctx->acquired > 0) {
 			debug_mutex_wake_waiter(lock, cur);
@@ -772,7 +792,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 */
 	if (__mutex_trylock(lock)) {
 		if (use_ww_ctx && ww_ctx)
-			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+			__ww_mutex_wakeup_for_wound(lock, ww_ctx);
 
 		goto skip_wait;
 	}
@@ -790,10 +810,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
 #endif
 	} else {
-		/* Add in stamp order, waking up waiters that must back off. */
+		/* Add in stamp order, waking up waiters that must wound themselves. */
 		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
 		if (ret)
-			goto err_early_backoff;
+			goto err_early_wound;
 
 		waiter.ww_ctx = ww_ctx;
 	}
@@ -824,8 +844,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			goto err;
 		}
 
-		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
-			ret = __ww_mutex_lock_check_stamp(lock, &waiter, ww_ctx);
+		if (use_ww_ctx && ww_ctx) {
+			ret = __ww_mutex_check_stamp(lock, &waiter, ww_ctx);
 			if (ret)
 				goto err;
 		}
@@ -870,7 +890,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_acquired(&lock->dep_map, ip);
 
 	if (use_ww_ctx && ww_ctx)
-		ww_mutex_set_context_slowpath(ww, ww_ctx);
+		ww_mutex_lock_acquired(ww, ww_ctx);
 
 	spin_unlock(&lock->wait_lock);
 	preempt_enable();
@@ -879,7 +899,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 err:
 	__set_current_state(TASK_RUNNING);
 	mutex_remove_waiter(lock, &waiter, current);
-err_early_backoff:
+err_early_wound:
 	spin_unlock(&lock->wait_lock);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, 1, ip);

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
@ 2018-06-14 10:51             ` Peter Zijlstra
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2018-06-14 10:51 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On Wed, Jun 13, 2018 at 04:05:43PM +0200, Thomas Hellstrom wrote:
> In short, with Wait-Die (before the patch) it's the process _taking_ the
> contended lock that backs off if necessary. No preemption required. With
> Wound-Wait, it's the process _holding_ the contended lock that gets wounded
> (preempted), and it needs to back off at its own discretion but no later
> than when it's going to sleep on another ww mutex. That point is where we
> intercept the preemption request. We're preempting the transaction rather
> than the process.

This:

  Wait-die:
    The newer transactions are killed when:
      It (= the newer transaction) makes a reqeust for a lock being held
      by an older transactions

  Wound-wait:
    The newer transactions are killed when:
      An older transaction makes a request for a lock being held by the
      newer transactions

Would make for an excellent comment somewhere. No talking about
preemption, although I think I know what you mean with it, that is not
how preemption is normally used.

In scheduling speak preemption is when we pick a runnable (but !running)
task to run instead of the current running task.  In this case however,
our T2 is blocked on a lock acquisition (one owned by our T1) and T1 is
the only runnable task. Only when T1's progress is inhibited by T2 (T1
wants a lock held by T2) do we wound/wake T2.

In any case, I had a little look at the current ww_mutex code and ended
up with the below patch that hopefully clarifies things a little.

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index f44f658ae629..a20c04619b2a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -244,6 +244,10 @@ void __sched mutex_lock(struct mutex *lock)
 EXPORT_SYMBOL(mutex_lock);
 #endif
 
+/*
+ * Associate the ww_mutex @ww with the context @ww_ctx under which we acquired
+ * it.
+ */
 static __always_inline void
 ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 {
@@ -282,26 +286,36 @@ ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 	DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
 #endif
 	ww_ctx->acquired++;
+	lock->ctx = ctx;
 }
 
+/*
+ * Determine if context @a is 'after' context @b. IOW, @a should be wounded in
+ * favour of @b.
+ */
 static inline bool __sched
 __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 {
-	return a->stamp - b->stamp <= LONG_MAX &&
-	       (a->stamp != b->stamp || a > b);
+
+	return (signed long)(a->stamp - b->stamp) > 0;
 }
 
 /*
- * Wake up any waiters that may have to back off when the lock is held by the
- * given context.
+ * We just acquired @lock under @ww_ctx, if there are later contexts waiting
+ * behind us on the wait-list, wake them up so they can wound themselves.
  *
- * Due to the invariants on the wait list, this can only affect the first
- * waiter with a context.
+ * See __ww_mutex_add_waiter() for the list-order construction; basically the
+ * list is ordered by stamp smallest (oldest) first, so if there is a later
+ * (younger) stamp on the list behind us, wake it so it can wound itself.
+ *
+ * Because __ww_mutex_add_waiter() and __ww_mutex_check_stamp() wake any
+ * but the earliest context, this can only affect the first waiter (with a
+ * context).
  *
  * The current task must not be on the wait list.
  */
 static void __sched
-__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_wakeup_for_wound(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 {
 	struct mutex_waiter *cur;
 
@@ -322,16 +336,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 }
 
 /*
- * After acquiring lock with fastpath or when we lost out in contested
- * slowpath, set ctx and wake up any waiters so they can recheck.
+ * After acquiring lock with fastpath, where we do not hold wait_lock, set ctx
+ * and wake up any waiters so they can recheck.
  */
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
 	ww_mutex_lock_acquired(lock, ctx);
 
-	lock->ctx = ctx;
-
 	/*
 	 * The lock->ctx update should be visible on all cores before
 	 * the atomic read is done, otherwise contended waiters might be
@@ -352,25 +364,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * so they can see the new lock->ctx.
 	 */
 	spin_lock(&lock->base.wait_lock);
-	__ww_mutex_wakeup_for_backoff(&lock->base, ctx);
+	__ww_mutex_wakeup_for_wound(&lock->base, ctx);
 	spin_unlock(&lock->base.wait_lock);
 }
 
-/*
- * After acquiring lock in the slowpath set ctx.
- *
- * Unlike for the fast path, the caller ensures that waiters are woken up where
- * necessary.
- *
- * Callers must hold the mutex wait_lock.
- */
-static __always_inline void
-ww_mutex_set_context_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
-{
-	ww_mutex_lock_acquired(lock, ctx);
-	lock->ctx = ctx;
-}
-
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 
 static inline
@@ -646,20 +643,30 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
+/*
+ * Check the wound condition for the current lock acquire.  If we're trying to
+ * acquire a lock already held by an older context, wound ourselves.
+ *
+ * Since __ww_mutex_add_waiter() orders the wait-list on stamp, we only have to
+ * look at waiters before us in the wait-list.
+ */
 static inline int __sched
-__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
+__ww_mutex_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
 			    struct ww_acquire_ctx *ctx)
 {
 	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
 	struct mutex_waiter *cur;
 
+	if (ctx->acquired == 0)
+		return 0;
+
 	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
 		goto deadlock;
 
 	/*
 	 * If there is a waiter in front of us that has a context, then its
-	 * stamp is earlier than ours and we must back off.
+	 * stamp is earlier than ours and we must wound ourself.
 	 */
 	cur = waiter;
 	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
@@ -677,6 +684,14 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
 	return -EDEADLK;
 }
 
+/*
+ * Add @waiter to the wait-list, keep the wait-list ordered by stamp, smallest
+ * first. Such that older contexts are preferred to acquire the lock over
+ * younger contexts.
+ *
+ * Furthermore, wound ourself immediately when possible (there are older
+ * contexts already waiting) to avoid unnecessary waiting.
+ */
 static inline int __sched
 __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		      struct mutex *lock,
@@ -700,8 +715,12 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		if (!cur->ww_ctx)
 			continue;
 
+		/*
+		 * If we find an older context waiting, there is no point in
+		 * queueing behind it, as we'd have to wound ourselves the
+		 * moment it would acquire the lock.
+		 */
 		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
-			/* Back off immediately if necessary. */
 			if (ww_ctx->acquired > 0) {
 #ifdef CONFIG_DEBUG_MUTEXES
 				struct ww_mutex *ww;
@@ -719,8 +738,9 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		pos = &cur->list;
 
 		/*
-		 * Wake up the waiter so that it gets a chance to back
-		 * off.
+		 * When we enqueued an older context, wake all younger
+		 * contexts such that they can wound themselves, see
+		 * __ww_mutex_check_stamp().
 		 */
 		if (cur->ww_ctx->acquired > 0) {
 			debug_mutex_wake_waiter(lock, cur);
@@ -772,7 +792,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 */
 	if (__mutex_trylock(lock)) {
 		if (use_ww_ctx && ww_ctx)
-			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+			__ww_mutex_wakeup_for_wound(lock, ww_ctx);
 
 		goto skip_wait;
 	}
@@ -790,10 +810,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
 #endif
 	} else {
-		/* Add in stamp order, waking up waiters that must back off. */
+		/* Add in stamp order, waking up waiters that must wound themselves. */
 		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
 		if (ret)
-			goto err_early_backoff;
+			goto err_early_wound;
 
 		waiter.ww_ctx = ww_ctx;
 	}
@@ -824,8 +844,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			goto err;
 		}
 
-		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
-			ret = __ww_mutex_lock_check_stamp(lock, &waiter, ww_ctx);
+		if (use_ww_ctx && ww_ctx) {
+			ret = __ww_mutex_check_stamp(lock, &waiter, ww_ctx);
 			if (ret)
 				goto err;
 		}
@@ -870,7 +890,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_acquired(&lock->dep_map, ip);
 
 	if (use_ww_ctx && ww_ctx)
-		ww_mutex_set_context_slowpath(ww, ww_ctx);
+		ww_mutex_lock_acquired(ww, ww_ctx);
 
 	spin_unlock(&lock->wait_lock);
 	preempt_enable();
@@ -879,7 +899,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 err:
 	__set_current_state(TASK_RUNNING);
 	mutex_remove_waiter(lock, &waiter, current);
-err_early_backoff:
+err_early_wound:
 	spin_unlock(&lock->wait_lock);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, 1, ip);
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
@ 2018-06-14 10:51             ` Peter Zijlstra
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2018-06-14 10:51 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Kate Stewart, Davidlohr Bueso, Jonathan Corbet, David Airlie,
	linux-doc, linux-kernel, dri-devel, Josh Triplett, linaro-mm-sig,
	Greg Kroah-Hartman, Ingo Molnar, Philippe Ombredanne,
	Thomas Gleixner, Paul E. McKenney, linux-media

On Wed, Jun 13, 2018 at 04:05:43PM +0200, Thomas Hellstrom wrote:
> In short, with Wait-Die (before the patch) it's the process _taking_ the
> contended lock that backs off if necessary. No preemption required. With
> Wound-Wait, it's the process _holding_ the contended lock that gets wounded
> (preempted), and it needs to back off at its own discretion but no later
> than when it's going to sleep on another ww mutex. That point is where we
> intercept the preemption request. We're preempting the transaction rather
> than the process.

This:

  Wait-die:
    The newer transactions are killed when:
      It (= the newer transaction) makes a reqeust for a lock being held
      by an older transactions

  Wound-wait:
    The newer transactions are killed when:
      An older transaction makes a request for a lock being held by the
      newer transactions

Would make for an excellent comment somewhere. No talking about
preemption, although I think I know what you mean with it, that is not
how preemption is normally used.

In scheduling speak preemption is when we pick a runnable (but !running)
task to run instead of the current running task.  In this case however,
our T2 is blocked on a lock acquisition (one owned by our T1) and T1 is
the only runnable task. Only when T1's progress is inhibited by T2 (T1
wants a lock held by T2) do we wound/wake T2.

In any case, I had a little look at the current ww_mutex code and ended
up with the below patch that hopefully clarifies things a little.

---
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index f44f658ae629..a20c04619b2a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -244,6 +244,10 @@ void __sched mutex_lock(struct mutex *lock)
 EXPORT_SYMBOL(mutex_lock);
 #endif
 
+/*
+ * Associate the ww_mutex @ww with the context @ww_ctx under which we acquired
+ * it.
+ */
 static __always_inline void
 ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 {
@@ -282,26 +286,36 @@ ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 	DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
 #endif
 	ww_ctx->acquired++;
+	lock->ctx = ctx;
 }
 
+/*
+ * Determine if context @a is 'after' context @b. IOW, @a should be wounded in
+ * favour of @b.
+ */
 static inline bool __sched
 __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 {
-	return a->stamp - b->stamp <= LONG_MAX &&
-	       (a->stamp != b->stamp || a > b);
+
+	return (signed long)(a->stamp - b->stamp) > 0;
 }
 
 /*
- * Wake up any waiters that may have to back off when the lock is held by the
- * given context.
+ * We just acquired @lock under @ww_ctx, if there are later contexts waiting
+ * behind us on the wait-list, wake them up so they can wound themselves.
  *
- * Due to the invariants on the wait list, this can only affect the first
- * waiter with a context.
+ * See __ww_mutex_add_waiter() for the list-order construction; basically the
+ * list is ordered by stamp smallest (oldest) first, so if there is a later
+ * (younger) stamp on the list behind us, wake it so it can wound itself.
+ *
+ * Because __ww_mutex_add_waiter() and __ww_mutex_check_stamp() wake any
+ * but the earliest context, this can only affect the first waiter (with a
+ * context).
  *
  * The current task must not be on the wait list.
  */
 static void __sched
-__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_wakeup_for_wound(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 {
 	struct mutex_waiter *cur;
 
@@ -322,16 +336,14 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 }
 
 /*
- * After acquiring lock with fastpath or when we lost out in contested
- * slowpath, set ctx and wake up any waiters so they can recheck.
+ * After acquiring lock with fastpath, where we do not hold wait_lock, set ctx
+ * and wake up any waiters so they can recheck.
  */
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
 	ww_mutex_lock_acquired(lock, ctx);
 
-	lock->ctx = ctx;
-
 	/*
 	 * The lock->ctx update should be visible on all cores before
 	 * the atomic read is done, otherwise contended waiters might be
@@ -352,25 +364,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 	 * so they can see the new lock->ctx.
 	 */
 	spin_lock(&lock->base.wait_lock);
-	__ww_mutex_wakeup_for_backoff(&lock->base, ctx);
+	__ww_mutex_wakeup_for_wound(&lock->base, ctx);
 	spin_unlock(&lock->base.wait_lock);
 }
 
-/*
- * After acquiring lock in the slowpath set ctx.
- *
- * Unlike for the fast path, the caller ensures that waiters are woken up where
- * necessary.
- *
- * Callers must hold the mutex wait_lock.
- */
-static __always_inline void
-ww_mutex_set_context_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
-{
-	ww_mutex_lock_acquired(lock, ctx);
-	lock->ctx = ctx;
-}
-
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 
 static inline
@@ -646,20 +643,30 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
+/*
+ * Check the wound condition for the current lock acquire.  If we're trying to
+ * acquire a lock already held by an older context, wound ourselves.
+ *
+ * Since __ww_mutex_add_waiter() orders the wait-list on stamp, we only have to
+ * look at waiters before us in the wait-list.
+ */
 static inline int __sched
-__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
+__ww_mutex_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
 			    struct ww_acquire_ctx *ctx)
 {
 	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
 	struct mutex_waiter *cur;
 
+	if (ctx->acquired == 0)
+		return 0;
+
 	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
 		goto deadlock;
 
 	/*
 	 * If there is a waiter in front of us that has a context, then its
-	 * stamp is earlier than ours and we must back off.
+	 * stamp is earlier than ours and we must wound ourself.
 	 */
 	cur = waiter;
 	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
@@ -677,6 +684,14 @@ __ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
 	return -EDEADLK;
 }
 
+/*
+ * Add @waiter to the wait-list, keeping the wait-list ordered by stamp,
+ * smallest first, so that older contexts are preferred over younger
+ * contexts when acquiring the lock.
+ *
+ * Furthermore, wound ourselves immediately when possible (there are older
+ * contexts already waiting) to avoid unnecessary waiting.
+ */
 static inline int __sched
 __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		      struct mutex *lock,
@@ -700,8 +715,12 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		if (!cur->ww_ctx)
 			continue;
 
+		/*
+		 * If we find an older context waiting, there is no point in
+		 * queueing behind it, as we'd have to wound ourselves the
+		 * moment it would acquire the lock.
+		 */
 		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
-			/* Back off immediately if necessary. */
 			if (ww_ctx->acquired > 0) {
 #ifdef CONFIG_DEBUG_MUTEXES
 				struct ww_mutex *ww;
@@ -719,8 +738,9 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		pos = &cur->list;
 
 		/*
-		 * Wake up the waiter so that it gets a chance to back
-		 * off.
+		 * When we enqueue an older context, wake all younger
+		 * contexts so that they can wound themselves; see
+		 * __ww_mutex_check_stamp().
 		 */
 		if (cur->ww_ctx->acquired > 0) {
 			debug_mutex_wake_waiter(lock, cur);
@@ -772,7 +792,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 */
 	if (__mutex_trylock(lock)) {
 		if (use_ww_ctx && ww_ctx)
-			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+			__ww_mutex_wakeup_for_wound(lock, ww_ctx);
 
 		goto skip_wait;
 	}
@@ -790,10 +810,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
 #endif
 	} else {
-		/* Add in stamp order, waking up waiters that must back off. */
+		/* Add in stamp order, waking up waiters that must wound themselves. */
 		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
 		if (ret)
-			goto err_early_backoff;
+			goto err_early_wound;
 
 		waiter.ww_ctx = ww_ctx;
 	}
@@ -824,8 +844,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			goto err;
 		}
 
-		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
-			ret = __ww_mutex_lock_check_stamp(lock, &waiter, ww_ctx);
+		if (use_ww_ctx && ww_ctx) {
+			ret = __ww_mutex_check_stamp(lock, &waiter, ww_ctx);
 			if (ret)
 				goto err;
 		}
@@ -870,7 +890,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_acquired(&lock->dep_map, ip);
 
 	if (use_ww_ctx && ww_ctx)
-		ww_mutex_set_context_slowpath(ww, ww_ctx);
+		ww_mutex_lock_acquired(ww, ww_ctx);
 
 	spin_unlock(&lock->wait_lock);
 	preempt_enable();
@@ -879,7 +899,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 err:
 	__set_current_state(TASK_RUNNING);
 	mutex_remove_waiter(lock, &waiter, current);
-err_early_backoff:
+err_early_wound:
 	spin_unlock(&lock->wait_lock);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, 1, ip);
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-14 10:51             ` Peter Zijlstra
  (?)
@ 2018-06-14 11:48               ` Thomas Hellstrom
  -1 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-14 11:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On 06/14/2018 12:51 PM, Peter Zijlstra wrote:
> On Wed, Jun 13, 2018 at 04:05:43PM +0200, Thomas Hellstrom wrote:
>> In short, with Wait-Die (before the patch) it's the process _taking_ the
>> contended lock that backs off if necessary. No preemption required. With
>> Wound-Wait, it's the process _holding_ the contended lock that gets wounded
>> (preempted), and it needs to back off at its own discretion but no later
>> than when it's going to sleep on another ww mutex. That point is where we
>> intercept the preemption request. We're preempting the transaction rather
>> than the process.
> This:
>
>    Wait-die:
>      The newer transactions are killed when:
>        It (= the newer transaction) makes a request for a lock being held
>        by an older transaction
>
>    Wound-wait:
>      The newer transactions are killed when:
>        An older transaction makes a request for a lock being held by the
>        newer transactions
>
> Would make for an excellent comment somewhere. No talking about
> preemption, although I think I know what you mean with it, that is not
> how preemption is normally used.

Ok. I'll incorporate something along this line. Unfortunately that last
statement is not fully true. It should read
"The newer transactions are wounded when:", not "killed when:".

The literature makes a distinction between "killed" and "wounded". In 
our context, "Killed" is when a transaction actually receives an 
-EDEADLK and needs to back off. "Wounded" is when someone (typically 
another transaction) requests a transaction to kill itself. A wound will 
often, but not always, lead to a kill. If the wounded transaction has
finished its locking sequence, or has the opportunity to grab
uncontended ww mutexes or steal contended (non-handoff) ww mutexes to
finish its transaction, it will do so and never kill itself.



>
> In scheduling speak preemption is when we pick a runnable (but !running)
> task to run instead of the current running task.  In this case however,
> our T2 is blocked on a lock acquisition (one owned by our T1) and T1 is
> the only runnable task. Only when T1's progress is inhibited by T2 (T1
> wants a lock held by T2) do we wound/wake T2.

Indeed. The preemption spoken about in the Wound-Wait literature means
that a transaction preempts another transaction when it wounds it. In
distributed computing, my understanding is that the preempted transaction
is aborted instantly and restarted after a random delay. Of course, we
have no means of mapping wounding to process preemption in the Linux
kernel, which is why I referred to it as "lazy preemption". In process
analogy, "wounded" would roughly correspond to (need_resched() == true),
and returning -EDEADLK would correspond to voluntary preemption.



>
> In any case, I had a little look at the current ww_mutex code and ended
> up with the below patch that hopefully clarifies things a little.
>
> ---
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index f44f658ae629..a20c04619b2a 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -244,6 +244,10 @@ void __sched mutex_lock(struct mutex *lock)
>   EXPORT_SYMBOL(mutex_lock);
>   #endif
>   
> +/*
> + * Associate the ww_mutex @ww with the context @ww_ctx under which we acquired
> + * it.
> + */

IMO use of "acquire_context" or "context" is a little unfortunate when 
the literature uses "transaction",
but otherwise fine.


>   static __always_inline void
>   ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
>   {
> @@ -282,26 +286,36 @@ ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
>   	DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
>   #endif
>   	ww_ctx->acquired++;
> +	lock->ctx = ctx;
>   }
>   
> +/*
> + * Determine if context @a is 'after' context @b. IOW, @a should be wounded in
> + * favour of @b.
> + */

So "wounded" should never really be used with Wait-Die. Perhaps:
"Determine whether context @a represents a younger transaction than
context @b"?

>   static inline bool __sched
>   __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
>   {
> -	return a->stamp - b->stamp <= LONG_MAX &&
> -	       (a->stamp != b->stamp || a > b);
> +
> +	return (signed long)(a->stamp - b->stamp) > 0;
>   }
>   
>   /*
> - * Wake up any waiters that may have to back off when the lock is held by the
> - * given context.
> + * We just acquired @lock under @ww_ctx, if there are later contexts waiting
> + * behind us on the wait-list, wake them up so they can wound themselves.

Actually for Wait-Die, "back off" or "die" is the correct terminology.

>    *
> - * Due to the invariants on the wait list, this can only affect the first
> - * waiter with a context.
> + * See __ww_mutex_add_waiter() for the list-order construction; basically the
> + * list is ordered by stamp smallest (oldest) first, so if there is a later
> + * (younger) stamp on the list behind us, wake it so it can wound itself.
> + *
> + * Because __ww_mutex_add_waiter() and __ww_mutex_check_stamp() wake any
> + * but the earliest context, this can only affect the first waiter (with a
> + * context).

The wait list invariants are stated in
Documentation/locking/ww-mutex-design.txt.
Perhaps we could copy those into the code to make the comment more
understandable:
"  We maintain the following invariants for the wait list:
   (1) Waiters with an acquire context are sorted by stamp order; waiters
       without an acquire context are interspersed in FIFO order.
   (2) For Wait-Die, among waiters with contexts, only the first one can
       have other locks acquired already (ctx->acquired > 0). Note that
       this waiter may come after other waiters without contexts in the
       list."

>    *
>    * The current task must not be on the wait list.
>    */
>   static void __sched
> -__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> +__ww_mutex_wakeup_for_wound(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)

Again, "wound" is unsuitable for Wait-Die. + numerous additional places.

Thanks,
Thomas





^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-14 11:48               ` Thomas Hellstrom
  (?)
@ 2018-06-14 14:42                 ` Peter Zijlstra
  -1 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2018-06-14 14:42 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On Thu, Jun 14, 2018 at 01:48:39PM +0200, Thomas Hellstrom wrote:
> The literature makes a distinction between "killed" and "wounded". In our
> context, "Killed" is when a transaction actually receives an -EDEADLK and
> needs to back off. "Wounded" is when someone (typically another transaction)
> requests a transaction to kill itself. A wound will often, but not always,
> lead to a kill. If the wounded transaction has finished its locking
> sequence, or has the opportunity to grab uncontended ww mutexes or steal
> contended (non-handoff) ww mutexes to finish its transaction it will do so
> and never kill itself.

Hopefully I got it all right this time; I folded your patch in and
mucked around with it a bit, but haven't done anything except compile
it.

I left the context/transaction thing because well, that's what we called
the thing.


diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 39fda195bf78..50ef5a10cfa0 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -8,6 +8,8 @@
  *
  * Wound/wait implementation:
  *  Copyright (C) 2013 Canonical Ltd.
+ * Choice of algorithm:
+ *  Copyright (C) 2018 VMware Inc.
  *
  * This file contains the main data structure and API definitions.
  */
@@ -23,14 +25,17 @@ struct ww_class {
 	struct lock_class_key mutex_key;
 	const char *acquire_name;
 	const char *mutex_name;
+	unsigned int is_wait_die;
 };
 
 struct ww_acquire_ctx {
 	struct task_struct *task;
 	unsigned long stamp;
-	unsigned acquired;
+	unsigned int acquired;
+	unsigned short wounded;
+	unsigned short is_wait_die;
 #ifdef CONFIG_DEBUG_MUTEXES
-	unsigned done_acquire;
+	unsigned int done_acquire;
 	struct ww_class *ww_class;
 	struct ww_mutex *contending_lock;
 #endif
@@ -38,8 +43,8 @@ struct ww_acquire_ctx {
 	struct lockdep_map dep_map;
 #endif
 #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
-	unsigned deadlock_inject_interval;
-	unsigned deadlock_inject_countdown;
+	unsigned int deadlock_inject_interval;
+	unsigned int deadlock_inject_countdown;
 #endif
 };
 
@@ -58,17 +63,21 @@ struct ww_mutex {
 # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
 #endif
 
-#define __WW_CLASS_INITIALIZER(ww_class) \
+#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die)	    \
 		{ .stamp = ATOMIC_LONG_INIT(0) \
 		, .acquire_name = #ww_class "_acquire" \
-		, .mutex_name = #ww_class "_mutex" }
+		, .mutex_name = #ww_class "_mutex" \
+		, .is_wait_die = _is_wait_die }
 
 #define __WW_MUTEX_INITIALIZER(lockname, class) \
 		{ .base =  __MUTEX_INITIALIZER(lockname.base) \
 		__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
 
+#define DEFINE_WD_CLASS(classname) \
+	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1)
+
 #define DEFINE_WW_CLASS(classname) \
-	struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
+	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0)
 
 #define DEFINE_WW_MUTEX(mutexname, ww_class) \
 	struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
@@ -123,6 +132,8 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
 	ctx->task = current;
 	ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
 	ctx->acquired = 0;
+	ctx->wounded = false;
+	ctx->is_wait_die = ww_class->is_wait_die;
 #ifdef CONFIG_DEBUG_MUTEXES
 	ctx->ww_class = ww_class;
 	ctx->done_acquire = 0;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index f44f658ae629..9e244af4647d 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -244,6 +244,22 @@ void __sched mutex_lock(struct mutex *lock)
 EXPORT_SYMBOL(mutex_lock);
 #endif
 
+/*
+ * Wait-Die:
+ *   The newer transactions are killed when:
+ *     It (the new transaction) makes a request for a lock being held
+ *     by an older transaction.
+ *
+ * Wound-Wait:
+ *   The newer transactions are wounded when:
+ *     An older transaction makes a request for a lock being held by
+ *     the newer transaction.
+ */
+
+/*
+ * Associate the ww_mutex @ww with the context @ww_ctx under which we acquired
+ * it.
+ */
 static __always_inline void
 ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 {
@@ -282,26 +298,96 @@ ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 	DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
 #endif
 	ww_ctx->acquired++;
+	ww->ctx = ww_ctx;
 }
 
+/*
+ * Determine if context @a is 'after' context @b. IOW, @a should be wounded in
+ * favour of @b.
+ */
 static inline bool __sched
 __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 {
-	return a->stamp - b->stamp <= LONG_MAX &&
-	       (a->stamp != b->stamp || a > b);
+
+	return (signed long)(a->stamp - b->stamp) > 0;
 }
 
 /*
- * Wake up any waiters that may have to back off when the lock is held by the
- * given context.
+ * Wait-Die; wake a younger waiter context (when locks held) such that it can die.
  *
- * Due to the invariants on the wait list, this can only affect the first
- * waiter with a context.
+ * Among waiters with context, only the first one can have other locks acquired
+ * already (ctx->acquired > 0), because __ww_mutex_add_waiter() and
+ * __ww_mutex_check_wound() wake any but the earliest context.
+ */
+static bool __ww_mutex_die(struct mutex *lock, struct mutex_waiter *waiter,
+		           struct ww_acquire_ctx *ww_ctx)
+{
+	if (!ww_ctx->is_wait_die)
+		return false;
+
+	if (waiter->ww_ctx->acquired > 0 &&
+			__ww_ctx_stamp_after(waiter->ww_ctx, ww_ctx)) {
+		debug_mutex_wake_waiter(lock, waiter);
+		wake_up_process(waiter->task);
+	}
+
+	return true;
+}
+
+/*
+ * Wound-Wait; wound a younger @hold_ctx (if it has locks held).
+ *
+ * XXX more; explain why we too only need to wake the first.
+ */
+static bool __ww_mutex_wound(struct mutex *lock,
+			     struct ww_acquire_ctx *ww_ctx,
+			     struct ww_acquire_ctx *hold_ctx)
+{
+	struct task_struct *owner = __mutex_owner(lock);
+
+	lockdep_assert_held(&lock->wait_lock);
+
+	/*
+	 * Possible through __ww_mutex_add_waiter() when we race with
+	 * ww_mutex_set_context_fastpath(). In that case we'll get here again
+	 * through __ww_mutex_check_waiters().
+	 */
+	if (!hold_ctx)
+		return false;
+
+	/*
+	 * Can have !owner because of __mutex_unlock_slowpath(), but if owner,
+	 * it cannot go away because we'll have FLAG_WAITERS set and hold
+	 * wait_lock.
+	 */
+	if (!owner)
+		return false;
+
+	if (ww_ctx->acquired > 0 && __ww_ctx_stamp_after(hold_ctx, ww_ctx)) {
+		hold_ctx->wounded = 1;
+		if (owner != current)
+			wake_up_process(owner);
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * We just acquired @lock under @ww_ctx, if there are later contexts waiting
+ * behind us on the wait-list, check if they need wounding/killing.
+ *
+ * See __ww_mutex_add_waiter() for the list-order construction; basically the
+ * list is ordered by stamp, smallest (oldest) first.
+ *
+ * This relies on never mixing wait-die/wound-wait on the same wait-list, which is
+ * currently ensured by that being a ww_class property.
  *
  * The current task must not be on the wait list.
  */
 static void __sched
-__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_check_waiters(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 {
 	struct mutex_waiter *cur;
 
@@ -311,66 +397,50 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 		if (!cur->ww_ctx)
 			continue;
 
-		if (cur->ww_ctx->acquired > 0 &&
-		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
-			debug_mutex_wake_waiter(lock, cur);
-			wake_up_process(cur->task);
-		}
-
-		break;
+		if (__ww_mutex_die(lock, cur, ww_ctx) ||
+		    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+			break;
 	}
 }
 
 /*
- * After acquiring lock with fastpath or when we lost out in contested
- * slowpath, set ctx and wake up any waiters so they can recheck.
+ * After acquiring lock with fastpath, where we do not hold wait_lock, set ctx
+ * and wake up any waiters so they can recheck.
  */
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
 	ww_mutex_lock_acquired(lock, ctx);
 
-	lock->ctx = ctx;
-
 	/*
 	 * The lock->ctx update should be visible on all cores before
-	 * the atomic read is done, otherwise contended waiters might be
+	 * the list_empty check is done, otherwise contended waiters might be
 	 * missed. The contended waiters will either see ww_ctx == NULL
 	 * and keep spinning, or it will acquire wait_lock, add itself
 	 * to waiter list and sleep.
 	 */
-	smp_mb(); /* ^^^ */
+	smp_mb(); /* See comments above and below. */
 
 	/*
-	 * Check if lock is contended, if not there is nobody to wake up
+	 * [W] ww->ctx = ctx	[W] list_add_tail()
+	 *     MB		    MB
+	 * [R] list_empty()	[R] ww->ctx
+	 *
+	 * The memory barrier above pairs with the memory barrier in
+	 * __ww_mutex_add_waiter() and makes sure we either observe ww->ctx
+	 * and/or !empty list.
 	 */
-	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
+	if (likely(list_empty(&lock->base.wait_list)))
 		return;
 
 	/*
-	 * Uh oh, we raced in fastpath, wake up everyone in this case,
-	 * so they can see the new lock->ctx.
+	 * Uh oh, we raced in fastpath, check if any of the waiters need wounding.
 	 */
 	spin_lock(&lock->base.wait_lock);
-	__ww_mutex_wakeup_for_backoff(&lock->base, ctx);
+	__ww_mutex_check_waiters(&lock->base, ctx);
 	spin_unlock(&lock->base.wait_lock);
 }
 
-/*
- * After acquiring lock in the slowpath set ctx.
- *
- * Unlike for the fast path, the caller ensures that waiters are woken up where
- * necessary.
- *
- * Callers must hold the mutex wait_lock.
- */
-static __always_inline void
-ww_mutex_set_context_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
-{
-	ww_mutex_lock_acquired(lock, ctx);
-	lock->ctx = ctx;
-}
-
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 
 static inline
@@ -646,37 +716,83 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
+
+static __always_inline int __sched
+__ww_mutex_kill(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+{
+	if (ww_ctx->acquired > 0) {
+#ifdef CONFIG_DEBUG_MUTEXES
+		struct ww_mutex *ww;
+
+		ww = container_of(lock, struct ww_mutex, base);
+		DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
+		ww_ctx->contending_lock = ww;
+#endif
+		return -EDEADLK;
+	}
+
+	return 0;
+}
+
+
+/*
+ * Check the wound condition for the current lock acquire.
+ *
+ * Wound-Wait: If we're wounded, kill ourselves.
+ *
+ * Wait-Die: If we're trying to acquire a lock already held by an older
+ *           context, kill ourselves.
+ *
+ * Since __ww_mutex_add_waiter() orders the wait-list on stamp, we only have to
+ * look at waiters before us in the wait-list.
+ */
 static inline int __sched
-__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
+__ww_mutex_check_wound(struct mutex *lock, struct mutex_waiter *waiter,
 			    struct ww_acquire_ctx *ctx)
 {
 	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
 	struct mutex_waiter *cur;
 
+	if (ctx->acquired == 0)
+		return 0;
+
+	if (!ctx->is_wait_die) {
+		if (ctx->wounded)
+			return __ww_mutex_kill(lock, ctx);
+
+		return 0;
+	}
+
 	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
-		goto deadlock;
+		return __ww_mutex_kill(lock, ctx);
 
 	/*
 	 * If there is a waiter in front of us that has a context, then its
-	 * stamp is earlier than ours and we must back off.
+	 * stamp is earlier than ours and we must kill ourselves.
 	 */
 	cur = waiter;
 	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
-		if (cur->ww_ctx)
-			goto deadlock;
+		if (!cur->ww_ctx)
+			continue;
+
+		return __ww_mutex_kill(lock, ctx);
 	}
 
 	return 0;
-
-deadlock:
-#ifdef CONFIG_DEBUG_MUTEXES
-	DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
-	ctx->contending_lock = ww;
-#endif
-	return -EDEADLK;
 }
 
+/*
+ * Add @waiter to the wait-list, keeping the wait-list ordered by stamp,
+ * smallest first, such that older contexts are preferred when acquiring the
+ * lock over younger contexts.
+ *
+ * Waiters without context are interspersed in FIFO order.
+ *
+ * Furthermore, for Wait-Die, kill ourselves immediately when possible (there
+ * are older contexts already waiting) to avoid unnecessary waiting; for
+ * Wound-Wait, ensure we wound the owning context when it is younger.
+ */
 static inline int __sched
 __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		      struct mutex *lock,
@@ -684,16 +800,21 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 {
 	struct mutex_waiter *cur;
 	struct list_head *pos;
+	bool is_wait_die;
 
 	if (!ww_ctx) {
 		list_add_tail(&waiter->list, &lock->wait_list);
 		return 0;
 	}
 
+	is_wait_die = ww_ctx->is_wait_die;
+
 	/*
 	 * Add the waiter before the first waiter with a higher stamp.
 	 * Waiters without a context are skipped to avoid starving
-	 * them.
+	 * them. Wait-Die waiters may back off here. Wound-Wait waiters
+	 * never back off here, but they are sorted in stamp order and
+	 * may wound the lock holder.
 	 */
 	pos = &lock->wait_list;
 	list_for_each_entry_reverse(cur, &lock->wait_list, list) {
@@ -701,16 +822,16 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 			continue;
 
 		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
-			/* Back off immediately if necessary. */
-			if (ww_ctx->acquired > 0) {
-#ifdef CONFIG_DEBUG_MUTEXES
-				struct ww_mutex *ww;
-
-				ww = container_of(lock, struct ww_mutex, base);
-				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
-				ww_ctx->contending_lock = ww;
-#endif
-				return -EDEADLK;
+			/*
+			 * Wait-Die: if we find an older context waiting, there
+			 * is no point in queueing behind it, as we'd have to
+			 * wound ourselves the moment it would acquire the
+			 * lock.
+			 */
+			if (is_wait_die) {
+				int ret = __ww_mutex_kill(lock, ww_ctx);
+				if (ret)
+					return ret;
 			}
 
 			break;
@@ -718,17 +839,29 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 
 		pos = &cur->list;
 
+		/* Wait-Die: ensure younger waiters die. */
+		__ww_mutex_die(lock, cur, ww_ctx);
+	}
+
+	list_add_tail(&waiter->list, pos);
+
+	/*
+	 * Wound-Wait: if we're blocking on a mutex owned by a younger context,
+	 * wound it so that we might proceed.
+	 */
+	if (!is_wait_die) {
+		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
+
 		/*
-		 * Wake up the waiter so that it gets a chance to back
-		 * off.
+		 * See ww_mutex_set_context_fastpath(). Orders the
+		 * list_add_tail() vs the ww->ctx load, such that either we
+		 * or the fastpath will wound @ww->ctx.
 		 */
-		if (cur->ww_ctx->acquired > 0) {
-			debug_mutex_wake_waiter(lock, cur);
-			wake_up_process(cur->task);
-		}
+		smp_mb();
+
+		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
 	}
 
-	list_add_tail(&waiter->list, pos);
 	return 0;
 }
 
@@ -751,6 +884,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (use_ww_ctx && ww_ctx) {
 		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
 			return -EALREADY;
+
+		/*
+		 * Reset the wounded flag after a kill.  No other process can
+		 * race and wound us here since they can't have a valid owner
+		 * pointer at this time.
+		 */
+		if (ww_ctx->acquired == 0)
+			ww_ctx->wounded = 0;
 	}
 
 	preempt_disable();
@@ -772,7 +913,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 */
 	if (__mutex_trylock(lock)) {
 		if (use_ww_ctx && ww_ctx)
-			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+			__ww_mutex_check_waiters(lock, ww_ctx);
 
 		goto skip_wait;
 	}
@@ -790,10 +931,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
 #endif
 	} else {
-		/* Add in stamp order, waking up waiters that must back off. */
+		/* Add in stamp order, waking up waiters that must kill themselves. */
 		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
 		if (ret)
-			goto err_early_backoff;
+			goto err_early_kill;
 
 		waiter.ww_ctx = ww_ctx;
 	}
@@ -824,8 +965,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			goto err;
 		}
 
-		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
-			ret = __ww_mutex_lock_check_stamp(lock, &waiter, ww_ctx);
+		if (use_ww_ctx && ww_ctx) {
+			ret = __ww_mutex_check_wound(lock, &waiter, ww_ctx);
 			if (ret)
 				goto err;
 		}
@@ -859,6 +1000,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 acquired:
 	__set_current_state(TASK_RUNNING);
 
+	if (use_ww_ctx && ww_ctx) {
+		/*
+		 * Wound-Wait; we stole the lock (!first_waiter), check the
+		 * waiters. This, together with XXX, ensures __ww_mutex_wound()
+		 * only needs to check the first waiter (with context).
+		 */
+		if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter))
+			__ww_mutex_check_waiters(lock, ww_ctx);
+	}
+
 	mutex_remove_waiter(lock, &waiter, current);
 	if (likely(list_empty(&lock->wait_list)))
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
@@ -870,7 +1021,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_acquired(&lock->dep_map, ip);
 
 	if (use_ww_ctx && ww_ctx)
-		ww_mutex_set_context_slowpath(ww, ww_ctx);
+		ww_mutex_lock_acquired(ww, ww_ctx);
 
 	spin_unlock(&lock->wait_lock);
 	preempt_enable();
@@ -879,7 +1030,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 err:
 	__set_current_state(TASK_RUNNING);
 	mutex_remove_waiter(lock, &waiter, current);
-err_early_backoff:
+err_early_kill:
 	spin_unlock(&lock->wait_lock);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, 1, ip);

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
@ 2018-06-14 14:42                 ` Peter Zijlstra
  0 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2018-06-14 14:42 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On Thu, Jun 14, 2018 at 01:48:39PM +0200, Thomas Hellstrom wrote:
> The literature makes a distinction between "killed" and "wounded". In our
> context, "Killed" is when a transaction actually receives an -EDEADLK and
> needs to back off. "Wounded" is when someone (typically another transaction)
> requests a transaction to kill itself. A wound will often, but not always,
> lead to a kill. If the wounded transaction has finished its locking
> sequence, or has the opportunity to grab uncontended ww mutexes or steal
> contended (non-handoff) ww mutexes to finish its transaction it will do so
> and never kill itself.

Hopefully I got it all right this time; I folded your patch in and
mucked around with it a bit, but haven't done anything except compile
it.

I left the context/transaction thing because well, that's what we called
the thing.


diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 39fda195bf78..50ef5a10cfa0 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -8,6 +8,8 @@
  *
  * Wound/wait implementation:
  *  Copyright (C) 2013 Canonical Ltd.
+ * Choice of algorithm:
+ *  Copyright (C) 2018 VMware Inc.
  *
  * This file contains the main data structure and API definitions.
  */
@@ -23,14 +25,17 @@ struct ww_class {
 	struct lock_class_key mutex_key;
 	const char *acquire_name;
 	const char *mutex_name;
+	unsigned int is_wait_die;
 };
 
 struct ww_acquire_ctx {
 	struct task_struct *task;
 	unsigned long stamp;
-	unsigned acquired;
+	unsigned int acquired;
+	unsigned short wounded;
+	unsigned short is_wait_die;
 #ifdef CONFIG_DEBUG_MUTEXES
-	unsigned done_acquire;
+	unsigned int done_acquire;
 	struct ww_class *ww_class;
 	struct ww_mutex *contending_lock;
 #endif
@@ -38,8 +43,8 @@ struct ww_acquire_ctx {
 	struct lockdep_map dep_map;
 #endif
 #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
-	unsigned deadlock_inject_interval;
-	unsigned deadlock_inject_countdown;
+	unsigned int deadlock_inject_interval;
+	unsigned int deadlock_inject_countdown;
 #endif
 };
 
@@ -58,17 +63,21 @@ struct ww_mutex {
 # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
 #endif
 
-#define __WW_CLASS_INITIALIZER(ww_class) \
+#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die)	    \
 		{ .stamp = ATOMIC_LONG_INIT(0) \
 		, .acquire_name = #ww_class "_acquire" \
-		, .mutex_name = #ww_class "_mutex" }
+		, .mutex_name = #ww_class "_mutex" \
+		, .is_wait_die = _is_wait_die }
 
 #define __WW_MUTEX_INITIALIZER(lockname, class) \
 		{ .base =  __MUTEX_INITIALIZER(lockname.base) \
 		__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
 
+#define DEFINE_WD_CLASS(classname) \
+	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1)
+
 #define DEFINE_WW_CLASS(classname) \
-	struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
+	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0)
 
 #define DEFINE_WW_MUTEX(mutexname, ww_class) \
 	struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
@@ -123,6 +132,8 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
 	ctx->task = current;
 	ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
 	ctx->acquired = 0;
+	ctx->wounded = false;
+	ctx->is_wait_die = ww_class->is_wait_die;
 #ifdef CONFIG_DEBUG_MUTEXES
 	ctx->ww_class = ww_class;
 	ctx->done_acquire = 0;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index f44f658ae629..9e244af4647d 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -244,6 +244,22 @@ void __sched mutex_lock(struct mutex *lock)
 EXPORT_SYMBOL(mutex_lock);
 #endif
 
+/*
+ * Wait-Die:
+ *   A newer transaction is killed when:
+ *     it (the newer transaction) makes a request for a lock being held
+ *     by an older transaction.
+ *
+ * Wound-Wait:
+ *   A newer transaction is wounded when:
+ *     an older transaction makes a request for a lock being held by
+ *     the newer transaction.
+ */
+
+/*
+ * Associate the ww_mutex @ww with the context @ww_ctx under which we acquired
+ * it.
+ */
 static __always_inline void
 ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 {
@@ -282,26 +298,96 @@ ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
 	DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
 #endif
 	ww_ctx->acquired++;
+	ww->ctx = ww_ctx;
 }
 
+/*
+ * Determine if context @a is 'after' context @b. IOW, @a should be wounded in
+ * favour of @b.
+ */
 static inline bool __sched
 __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
 {
-	return a->stamp - b->stamp <= LONG_MAX &&
-	       (a->stamp != b->stamp || a > b);
+
+	return (signed long)(a->stamp - b->stamp) > 0;
 }
 
 /*
- * Wake up any waiters that may have to back off when the lock is held by the
- * given context.
+ * Wait-Die; wake a younger waiter context (when locks held) such that it can die.
  *
- * Due to the invariants on the wait list, this can only affect the first
- * waiter with a context.
+ * Among waiters with context, only the first one can have other locks acquired
+ * already (ctx->acquired > 0), because __ww_mutex_add_waiter() and
+ * __ww_mutex_check_wound() wake any but the earliest context.
+ */
+static bool __ww_mutex_die(struct mutex *lock, struct mutex_waiter *waiter,
+		           struct ww_acquire_ctx *ww_ctx)
+{
+	if (!ww_ctx->is_wait_die)
+		return false;
+
+	if (waiter->ww_ctx->acquired > 0 &&
+			__ww_ctx_stamp_after(waiter->ww_ctx, ww_ctx)) {
+		debug_mutex_wake_waiter(lock, waiter);
+		wake_up_process(waiter->task);
+	}
+
+	return true;
+}
+
+/*
+ * Wound-Wait; wound a younger @hold_ctx (if it has locks held).
+ *
+ * XXX more; explain why we too only need to wake the first.
+ */
+static bool __ww_mutex_wound(struct mutex *lock,
+			     struct ww_acquire_ctx *ww_ctx,
+			     struct ww_acquire_ctx *hold_ctx)
+{
+	struct task_struct *owner = __mutex_owner(lock);
+
+	lockdep_assert_held(&lock->wait_lock);
+
+	/*
+	 * Possible through __ww_mutex_add_waiter() when we race with
+	 * ww_mutex_set_context_fastpath(). In that case we'll get here again
+	 * through __ww_mutex_check_waiters().
+	 */
+	if (!hold_ctx)
+		return false;
+
+	/*
+	 * Can have !owner because of __mutex_unlock_slowpath(), but if owner,
+	 * it cannot go away because we'll have FLAG_WAITERS set and hold
+	 * wait_lock.
+	 */
+	if (!owner)
+		return false;
+
+	if (ww_ctx->acquired > 0 && __ww_ctx_stamp_after(hold_ctx, ww_ctx)) {
+		hold_ctx->wounded = 1;
+		if (owner != current)
+			wake_up_process(owner);
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * We just acquired @lock under @ww_ctx; if there are later contexts waiting
+ * behind us on the wait-list, check if they need wounding/killing.
+ *
+ * See __ww_mutex_add_waiter() for the list-order construction; basically the
+ * list is ordered by stamp, smallest (oldest) first.
+ *
+ * This relies on never mixing Wait-Die and Wound-Wait on the same wait-list,
+ * which is currently ensured by the algorithm being a ww_class property.
  *
  * The current task must not be on the wait list.
  */
 static void __sched
-__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_check_waiters(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 {
 	struct mutex_waiter *cur;
 
@@ -311,66 +397,50 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 		if (!cur->ww_ctx)
 			continue;
 
-		if (cur->ww_ctx->acquired > 0 &&
-		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
-			debug_mutex_wake_waiter(lock, cur);
-			wake_up_process(cur->task);
-		}
-
-		break;
+		if (__ww_mutex_die(lock, cur, ww_ctx) ||
+		    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+			break;
 	}
 }
 
 /*
- * After acquiring lock with fastpath or when we lost out in contested
- * slowpath, set ctx and wake up any waiters so they can recheck.
+ * After acquiring lock with fastpath, where we do not hold wait_lock, set ctx
+ * and wake up any waiters so they can recheck.
  */
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
 	ww_mutex_lock_acquired(lock, ctx);
 
-	lock->ctx = ctx;
-
 	/*
 	 * The lock->ctx update should be visible on all cores before
-	 * the atomic read is done, otherwise contended waiters might be
+	 * the list_empty check is done, otherwise contended waiters might be
 	 * missed. The contended waiters will either see ww_ctx == NULL
 	 * and keep spinning, or it will acquire wait_lock, add itself
 	 * to waiter list and sleep.
 	 */
-	smp_mb(); /* ^^^ */
+	smp_mb(); /* See comments above and below. */
 
 	/*
-	 * Check if lock is contended, if not there is nobody to wake up
+	 * [W] ww->ctx = ctx	[W] list_add_tail()
+	 *     MB		    MB
+	 * [R] list_empty()	[R] ww->ctx
+	 *
+	 * The memory barrier above pairs with the memory barrier in
+	 * __ww_mutex_add_waiter() and makes sure we either observe ww->ctx
+	 * and/or !empty list.
 	 */
-	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
+	if (likely(list_empty(&lock->base.wait_list)))
 		return;
 
 	/*
-	 * Uh oh, we raced in fastpath, wake up everyone in this case,
-	 * so they can see the new lock->ctx.
+	 * Uh oh, we raced in fastpath, check if any of the waiters need wounding.
 	 */
 	spin_lock(&lock->base.wait_lock);
-	__ww_mutex_wakeup_for_backoff(&lock->base, ctx);
+	__ww_mutex_check_waiters(&lock->base, ctx);
 	spin_unlock(&lock->base.wait_lock);
 }
 
-/*
- * After acquiring lock in the slowpath set ctx.
- *
- * Unlike for the fast path, the caller ensures that waiters are woken up where
- * necessary.
- *
- * Callers must hold the mutex wait_lock.
- */
-static __always_inline void
-ww_mutex_set_context_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
-{
-	ww_mutex_lock_acquired(lock, ctx);
-	lock->ctx = ctx;
-}
-
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 
 static inline
@@ -646,37 +716,83 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
+
+static __always_inline int __sched
+__ww_mutex_kill(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+{
+	if (ww_ctx->acquired > 0) {
+#ifdef CONFIG_DEBUG_MUTEXES
+		struct ww_mutex *ww;
+
+		ww = container_of(lock, struct ww_mutex, base);
+		DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
+		ww_ctx->contending_lock = ww;
+#endif
+		return -EDEADLK;
+	}
+
+	return 0;
+}
+
+
+/*
+ * Check the wound condition for the current lock acquire.
+ *
+ * Wound-Wait: If we're wounded, kill ourselves.
+ *
+ * Wait-Die: If we're trying to acquire a lock already held by an older
+ *           context, kill ourselves.
+ *
+ * Since __ww_mutex_add_waiter() orders the wait-list on stamp, we only have to
+ * look at waiters before us in the wait-list.
+ */
 static inline int __sched
-__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
+__ww_mutex_check_wound(struct mutex *lock, struct mutex_waiter *waiter,
 			    struct ww_acquire_ctx *ctx)
 {
 	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
 	struct mutex_waiter *cur;
 
+	if (ctx->acquired == 0)
+		return 0;
+
+	if (!ctx->is_wait_die) {
+		if (ctx->wounded)
+			return __ww_mutex_kill(lock, ctx);
+
+		return 0;
+	}
+
 	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
-		goto deadlock;
+		return __ww_mutex_kill(lock, ctx);
 
 	/*
 	 * If there is a waiter in front of us that has a context, then its
-	 * stamp is earlier than ours and we must back off.
+	 * stamp is earlier than ours and we must kill ourselves.
 	 */
 	cur = waiter;
 	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
-		if (cur->ww_ctx)
-			goto deadlock;
+		if (!cur->ww_ctx)
+			continue;
+
+		return __ww_mutex_kill(lock, ctx);
 	}
 
 	return 0;
-
-deadlock:
-#ifdef CONFIG_DEBUG_MUTEXES
-	DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
-	ctx->contending_lock = ww;
-#endif
-	return -EDEADLK;
 }
 
+/*
+ * Add @waiter to the wait-list, keeping the wait-list ordered by stamp,
+ * smallest first, such that older contexts are preferred when acquiring the
+ * lock over younger contexts.
+ *
+ * Waiters without context are interspersed in FIFO order.
+ *
+ * Furthermore, for Wait-Die, kill ourselves immediately when possible (there
+ * are older contexts already waiting) to avoid unnecessary waiting; for
+ * Wound-Wait, ensure we wound the owning context when it is younger.
+ */
 static inline int __sched
 __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		      struct mutex *lock,
@@ -684,16 +800,21 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 {
 	struct mutex_waiter *cur;
 	struct list_head *pos;
+	bool is_wait_die;
 
 	if (!ww_ctx) {
 		list_add_tail(&waiter->list, &lock->wait_list);
 		return 0;
 	}
 
+	is_wait_die = ww_ctx->is_wait_die;
+
 	/*
 	 * Add the waiter before the first waiter with a higher stamp.
 	 * Waiters without a context are skipped to avoid starving
-	 * them.
+	 * them. Wait-Die waiters may back off here. Wound-Wait waiters
+	 * never back off here, but they are sorted in stamp order and
+	 * may wound the lock holder.
 	 */
 	pos = &lock->wait_list;
 	list_for_each_entry_reverse(cur, &lock->wait_list, list) {
@@ -701,16 +822,16 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 			continue;
 
 		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
-			/* Back off immediately if necessary. */
-			if (ww_ctx->acquired > 0) {
-#ifdef CONFIG_DEBUG_MUTEXES
-				struct ww_mutex *ww;
-
-				ww = container_of(lock, struct ww_mutex, base);
-				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
-				ww_ctx->contending_lock = ww;
-#endif
-				return -EDEADLK;
+			/*
+			 * Wait-Die: if we find an older context waiting, there
+			 * is no point in queueing behind it, as we'd have to
+			 * wound ourselves the moment it would acquire the
+			 * lock.
+			 */
+			if (is_wait_die) {
+				int ret = __ww_mutex_kill(lock, ww_ctx);
+				if (ret)
+					return ret;
 			}
 
 			break;
@@ -718,17 +839,29 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 
 		pos = &cur->list;
 
+		/* Wait-Die: ensure younger waiters die. */
+		__ww_mutex_die(lock, cur, ww_ctx);
+	}
+
+	list_add_tail(&waiter->list, pos);
+
+	/*
+	 * Wound-Wait: if we're blocking on a mutex owned by a younger context,
+	 * wound it so that we might proceed.
+	 */
+	if (!is_wait_die) {
+		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
+
 		/*
-		 * Wake up the waiter so that it gets a chance to back
-		 * off.
+		 * See ww_mutex_set_context_fastpath(). Orders the
+		 * list_add_tail() vs the ww->ctx load, such that either we
+		 * or the fastpath will wound @ww->ctx.
 		 */
-		if (cur->ww_ctx->acquired > 0) {
-			debug_mutex_wake_waiter(lock, cur);
-			wake_up_process(cur->task);
-		}
+		smp_mb();
+
+		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
 	}
 
-	list_add_tail(&waiter->list, pos);
 	return 0;
 }
 
@@ -751,6 +884,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (use_ww_ctx && ww_ctx) {
 		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
 			return -EALREADY;
+
+		/*
+		 * Reset the wounded flag after a kill.  No other process can
+		 * race and wound us here since they can't have a valid owner
+		 * pointer at this time.
+		 */
+		if (ww_ctx->acquired == 0)
+			ww_ctx->wounded = 0;
 	}
 
 	preempt_disable();
@@ -772,7 +913,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 */
 	if (__mutex_trylock(lock)) {
 		if (use_ww_ctx && ww_ctx)
-			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+			__ww_mutex_check_waiters(lock, ww_ctx);
 
 		goto skip_wait;
 	}
@@ -790,10 +931,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
 #endif
 	} else {
-		/* Add in stamp order, waking up waiters that must back off. */
+		/* Add in stamp order, waking up waiters that must kill themselves. */
 		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
 		if (ret)
-			goto err_early_backoff;
+			goto err_early_kill;
 
 		waiter.ww_ctx = ww_ctx;
 	}
@@ -824,8 +965,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			goto err;
 		}
 
-		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
-			ret = __ww_mutex_lock_check_stamp(lock, &waiter, ww_ctx);
+		if (use_ww_ctx && ww_ctx) {
+			ret = __ww_mutex_check_wound(lock, &waiter, ww_ctx);
 			if (ret)
 				goto err;
 		}
@@ -859,6 +1000,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 acquired:
 	__set_current_state(TASK_RUNNING);
 
+	if (use_ww_ctx && ww_ctx) {
+		/*
+		 * Wound-Wait; we stole the lock (!first_waiter), check the
+		 * waiters. This, together with XXX, ensures __ww_mutex_wound()
+		 * only needs to check the first waiter (with context).
+		 */
+		if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter))
+			__ww_mutex_check_waiters(lock, ww_ctx);
+	}
+
 	mutex_remove_waiter(lock, &waiter, current);
 	if (likely(list_empty(&lock->wait_list)))
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
@@ -870,7 +1021,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_acquired(&lock->dep_map, ip);
 
 	if (use_ww_ctx && ww_ctx)
-		ww_mutex_set_context_slowpath(ww, ww_ctx);
+		ww_mutex_lock_acquired(ww, ww_ctx);
 
 	spin_unlock(&lock->wait_lock);
 	preempt_enable();
@@ -879,7 +1030,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 err:
 	__set_current_state(TASK_RUNNING);
 	mutex_remove_waiter(lock, &waiter, current);
-err_early_backoff:
+err_early_kill:
 	spin_unlock(&lock->wait_lock);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, 1, ip);
--

^ permalink raw reply related	[flat|nested] 43+ messages in thread

+ * __ww_mutex_check_wound() wake any but the earliest context.
+ */
+static bool __ww_mutex_die(struct mutex *lock, struct mutex_waiter *waiter,
+		           struct ww_acquire_ctx *ww_ctx)
+{
+	if (!ww_ctx->is_wait_die)
+		return false;
+
+	if (waiter->ww_ctx->acquired > 0 &&
+			__ww_ctx_stamp_after(waiter->ww_ctx, ww_ctx)) {
+		debug_mutex_wake_waiter(lock, waiter);
+		wake_up_process(waiter->task);
+	}
+
+	return true;
+}
+
+/*
+ * Wound-Wait; wound a younger @hold_ctx (if it has locks held).
+ *
+ * XXX more; explain why we too only need to wake the first.
+ */
+static bool __ww_mutex_wound(struct mutex *lock,
+			     struct ww_acquire_ctx *ww_ctx,
+			     struct ww_acquire_ctx *hold_ctx)
+{
+	struct task_struct *owner = __mutex_owner(lock);
+
+	lockdep_assert_held(&lock->wait_lock);
+
+	/*
+	 * Possible through __ww_mutex_add_waiter() when we race with
+	 * ww_mutex_set_context_fastpath(). In that case we'll get here again
+	 * through __ww_mutex_check_waiters().
+	 */
+	if (!hold_ctx)
+		return false;
+
+	/*
+	 * Can have !owner because of __mutex_unlock_slowpath(), but if owner,
+	 * it cannot go away because we'll have FLAG_WAITERS set and hold
+	 * wait_lock.
+	 */
+	if (!owner)
+		return false;
+
+	if (ww_ctx->acquired > 0 && __ww_ctx_stamp_after(hold_ctx, ww_ctx)) {
+		hold_ctx->wounded = 1;
+		if (owner != current)
+			wake_up_process(owner);
+
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * We just acquired @lock under @ww_ctx, if there are later contexts waiting
+ * behind us on the wait-list, check if they need wounding/killing.
+ *
+ * See __ww_mutex_add_waiter() for the list-order construction; basically the
+ * list is ordered by stamp, smallest (oldest) first.
+ *
+ * This relies on never mixing wait-die/wound-wait on the same wait-list, which
+ * is currently ensured by the algorithm choice being a ww_class property.
  *
  * The current task must not be on the wait list.
  */
 static void __sched
-__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_check_waiters(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 {
 	struct mutex_waiter *cur;
 
@@ -311,66 +397,50 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
 		if (!cur->ww_ctx)
 			continue;
 
-		if (cur->ww_ctx->acquired > 0 &&
-		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
-			debug_mutex_wake_waiter(lock, cur);
-			wake_up_process(cur->task);
-		}
-
-		break;
+		if (__ww_mutex_die(lock, cur, ww_ctx) ||
+		    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+			break;
 	}
 }
 
 /*
- * After acquiring lock with fastpath or when we lost out in contested
- * slowpath, set ctx and wake up any waiters so they can recheck.
+ * After acquiring lock with fastpath, where we do not hold wait_lock, set ctx
+ * and wake up any waiters so they can recheck.
  */
 static __always_inline void
 ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
 {
 	ww_mutex_lock_acquired(lock, ctx);
 
-	lock->ctx = ctx;
-
 	/*
 	 * The lock->ctx update should be visible on all cores before
-	 * the atomic read is done, otherwise contended waiters might be
+	 * the list_empty check is done, otherwise contended waiters might be
 	 * missed. The contended waiters will either see ww_ctx == NULL
 	 * and keep spinning, or it will acquire wait_lock, add itself
 	 * to waiter list and sleep.
 	 */
-	smp_mb(); /* ^^^ */
+	smp_mb(); /* See comments above and below. */
 
 	/*
-	 * Check if lock is contended, if not there is nobody to wake up
+	 * [W] ww->ctx = ctx	[W] list_add_tail()
+	 *     MB		    MB
+	 * [R] list_empty()	[R] ww->ctx
+	 *
+	 * The memory barrier above pairs with the memory barrier in
+	 * __ww_mutex_add_waiter() and makes sure we either observe ww->ctx
+	 * and/or !empty list.
 	 */
-	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
+	if (likely(list_empty(&lock->base.wait_list)))
 		return;
 
 	/*
-	 * Uh oh, we raced in fastpath, wake up everyone in this case,
-	 * so they can see the new lock->ctx.
+	 * Uh oh, we raced in fastpath, check if any of the waiters need to die or be wounded.
 	 */
 	spin_lock(&lock->base.wait_lock);
-	__ww_mutex_wakeup_for_backoff(&lock->base, ctx);
+	__ww_mutex_check_waiters(&lock->base, ctx);
 	spin_unlock(&lock->base.wait_lock);
 }
 
-/*
- * After acquiring lock in the slowpath set ctx.
- *
- * Unlike for the fast path, the caller ensures that waiters are woken up where
- * necessary.
- *
- * Callers must hold the mutex wait_lock.
- */
-static __always_inline void
-ww_mutex_set_context_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
-{
-	ww_mutex_lock_acquired(lock, ctx);
-	lock->ctx = ctx;
-}
-
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 
 static inline
@@ -646,37 +716,83 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
+
+static __always_inline int __sched
+__ww_mutex_kill(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
+{
+	if (ww_ctx->acquired > 0) {
+#ifdef CONFIG_DEBUG_MUTEXES
+		struct ww_mutex *ww;
+
+		ww = container_of(lock, struct ww_mutex, base);
+		DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
+		ww_ctx->contending_lock = ww;
+#endif
+		return -EDEADLK;
+	}
+
+	return 0;
+}
+
+
+/*
+ * Check the wound condition for the current lock acquire.
+ *
+ * Wound-Wait: If we're wounded, kill ourselves.
+ *
+ * Wait-Die: If we're trying to acquire a lock already held by an older
+ *           context, kill ourselves.
+ *
+ * Since __ww_mutex_add_waiter() orders the wait-list on stamp, we only have to
+ * look at waiters before us in the wait-list.
+ */
 static inline int __sched
-__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
+__ww_mutex_check_wound(struct mutex *lock, struct mutex_waiter *waiter,
 			    struct ww_acquire_ctx *ctx)
 {
 	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
 	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
 	struct mutex_waiter *cur;
 
+	if (ctx->acquired == 0)
+		return 0;
+
+	if (!ctx->is_wait_die) {
+		if (ctx->wounded)
+			return __ww_mutex_kill(lock, ctx);
+
+		return 0;
+	}
+
 	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
-		goto deadlock;
+		return __ww_mutex_kill(lock, ctx);
 
 	/*
 	 * If there is a waiter in front of us that has a context, then its
-	 * stamp is earlier than ours and we must back off.
+	 * stamp is earlier than ours and we must kill ourselves.
 	 */
 	cur = waiter;
 	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
-		if (cur->ww_ctx)
-			goto deadlock;
+		if (!cur->ww_ctx)
+			continue;
+
+		return __ww_mutex_kill(lock, ctx);
 	}
 
 	return 0;
-
-deadlock:
-#ifdef CONFIG_DEBUG_MUTEXES
-	DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
-	ctx->contending_lock = ww;
-#endif
-	return -EDEADLK;
 }
 
+/*
+ * Add @waiter to the wait-list, keep the wait-list ordered by stamp, smallest
+ * first. Such that older contexts are preferred to acquire the lock over
+ * younger contexts.
+ *
+ * Waiters without context are interspersed in FIFO order.
+ *
+ * Furthermore, for Wait-Die kill ourself immediately when possible (there are
+ * older contexts already waiting) to avoid unnecessary waiting and for
+ * Wound-Wait ensure we wound the owning context when it is younger.
+ */
 static inline int __sched
 __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 		      struct mutex *lock,
@@ -684,16 +800,21 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 {
 	struct mutex_waiter *cur;
 	struct list_head *pos;
+	bool is_wait_die;
 
 	if (!ww_ctx) {
 		list_add_tail(&waiter->list, &lock->wait_list);
 		return 0;
 	}
 
+	is_wait_die = ww_ctx->is_wait_die;
+
 	/*
 	 * Add the waiter before the first waiter with a higher stamp.
 	 * Waiters without a context are skipped to avoid starving
-	 * them.
+	 * them. Wait-Die waiters may back off here. Wound-Wait waiters
+	 * never back off here, but they are sorted in stamp order and
+	 * may wound the lock holder.
 	 */
 	pos = &lock->wait_list;
 	list_for_each_entry_reverse(cur, &lock->wait_list, list) {
@@ -701,16 +822,16 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 			continue;
 
 		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
-			/* Back off immediately if necessary. */
-			if (ww_ctx->acquired > 0) {
-#ifdef CONFIG_DEBUG_MUTEXES
-				struct ww_mutex *ww;
-
-				ww = container_of(lock, struct ww_mutex, base);
-				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
-				ww_ctx->contending_lock = ww;
-#endif
-				return -EDEADLK;
+			/*
+			 * Wait-Die: if we find an older context waiting, there
+			 * is no point in queueing behind it, as we'd have to
+			 * die the moment it would acquire the
+			 * lock.
+			 */
+			if (is_wait_die) {
+				int ret = __ww_mutex_kill(lock, ww_ctx);
+				if (ret)
+					return ret;
 			}
 
 			break;
@@ -718,17 +839,29 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
 
 		pos = &cur->list;
 
+		/* Wait-Die: ensure younger waiters die. */
+		__ww_mutex_die(lock, cur, ww_ctx);
+	}
+
+	list_add_tail(&waiter->list, pos);
+
+	/*
+	 * Wound-Wait: if we're blocking on a mutex owned by a younger context,
+	 * wound that such that we might proceed.
+	 */
+	if (!is_wait_die) {
+		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
+
 		/*
-		 * Wake up the waiter so that it gets a chance to back
-		 * off.
+		 * See ww_mutex_set_context_fastpath(). Orders the
+		 * list_add_tail() vs the ww->ctx load, such that either we
+		 * or the fastpath will wound @ww->ctx.
 		 */
-		if (cur->ww_ctx->acquired > 0) {
-			debug_mutex_wake_waiter(lock, cur);
-			wake_up_process(cur->task);
-		}
+		smp_mb();
+
+		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
 	}
 
-	list_add_tail(&waiter->list, pos);
 	return 0;
 }
 
@@ -751,6 +884,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	if (use_ww_ctx && ww_ctx) {
 		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
 			return -EALREADY;
+
+		/*
+		 * Reset the wounded flag after a kill.  No other process can
+		 * race and wound us here since they can't have a valid owner
+		 * pointer at this time.
+		 */
+		if (ww_ctx->acquired == 0)
+			ww_ctx->wounded = 0;
 	}
 
 	preempt_disable();
@@ -772,7 +913,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	 */
 	if (__mutex_trylock(lock)) {
 		if (use_ww_ctx && ww_ctx)
-			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
+			__ww_mutex_check_waiters(lock, ww_ctx);
 
 		goto skip_wait;
 	}
@@ -790,10 +931,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
 #endif
 	} else {
-		/* Add in stamp order, waking up waiters that must back off. */
+		/* Add in stamp order, waking up waiters that must kill themselves. */
 		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
 		if (ret)
-			goto err_early_backoff;
+			goto err_early_kill;
 
 		waiter.ww_ctx = ww_ctx;
 	}
@@ -824,8 +965,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 			goto err;
 		}
 
-		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
-			ret = __ww_mutex_lock_check_stamp(lock, &waiter, ww_ctx);
+		if (use_ww_ctx && ww_ctx) {
+			ret = __ww_mutex_check_wound(lock, &waiter, ww_ctx);
 			if (ret)
 				goto err;
 		}
@@ -859,6 +1000,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 acquired:
 	__set_current_state(TASK_RUNNING);
 
+	if (use_ww_ctx && ww_ctx) {
+		/*
+		 * Wound-Wait; we stole the lock (!first_waiter), check the
+		 * waiters. This, together with XXX, ensures __ww_mutex_wound()
+		 * only needs to check the first waiter (with context).
+		 */
+		if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter))
+			__ww_mutex_check_waiters(lock, ww_ctx);
+	}
+
 	mutex_remove_waiter(lock, &waiter, current);
 	if (likely(list_empty(&lock->wait_list)))
 		__mutex_clear_flag(lock, MUTEX_FLAGS);
@@ -870,7 +1021,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_acquired(&lock->dep_map, ip);
 
 	if (use_ww_ctx && ww_ctx)
-		ww_mutex_set_context_slowpath(ww, ww_ctx);
+		ww_mutex_lock_acquired(ww, ww_ctx);
 
 	spin_unlock(&lock->wait_lock);
 	preempt_enable();
@@ -879,7 +1030,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 err:
 	__set_current_state(TASK_RUNNING);
 	mutex_remove_waiter(lock, &waiter, current);
-err_early_backoff:
+err_early_kill:
 	spin_unlock(&lock->wait_lock);
 	debug_mutex_free_waiter(&waiter);
 	mutex_release(&lock->dep_map, 1, ip);
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-14 14:42                 ` Peter Zijlstra
  (?)
@ 2018-06-14 16:43                   ` Thomas Hellstrom
  -1 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-14 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On 06/14/2018 04:42 PM, Peter Zijlstra wrote:
> On Thu, Jun 14, 2018 at 01:48:39PM +0200, Thomas Hellstrom wrote:
>> The literature makes a distinction between "killed" and "wounded". In our
>> context, "Killed" is when a transaction actually receives an -EDEADLK and
>> needs to back off. "Wounded" is when someone (typically another transaction)
>> requests a transaction to kill itself. A wound will often, but not always,
>> lead to a kill. If the wounded transaction has finished its locking
>> sequence, or has the opportunity to grab uncontended ww mutexes or steal
>> contended (non-handoff) ww mutexes to finish its transaction it will do so
>> and never kill itself.
> Hopefully I got it all right this time; I folded your patch in and
> mucked around with it a bit, but haven't done anything except compile
> it.
>
> I left the context/transaction thing because well, that's what we called
> the thing.

Overall, I think this looks fine. I'll just fix up the FLAG_WAITERS 
setting and affected comments and do some torture testing on it.

Are you OK with adding the new feature and the cleanup in the same patch?

Thomas



>
>
> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
> index 39fda195bf78..50ef5a10cfa0 100644
> --- a/include/linux/ww_mutex.h
> +++ b/include/linux/ww_mutex.h
> @@ -8,6 +8,8 @@
>    *
>    * Wound/wait implementation:
>    *  Copyright (C) 2013 Canonical Ltd.
> + * Choice of algorithm:
> + *  Copyright (C) 2018 WMWare Inc.
>    *
>    * This file contains the main data structure and API definitions.
>    */
> @@ -23,14 +25,17 @@ struct ww_class {
>   	struct lock_class_key mutex_key;
>   	const char *acquire_name;
>   	const char *mutex_name;
> +	unsigned int is_wait_die;
>   };
>   
>   struct ww_acquire_ctx {
>   	struct task_struct *task;
>   	unsigned long stamp;
> -	unsigned acquired;
> +	unsigned int acquired;
> +	unsigned short wounded;
> +	unsigned short is_wait_die;
>   #ifdef CONFIG_DEBUG_MUTEXES
> -	unsigned done_acquire;
> +	unsigned int done_acquire;
>   	struct ww_class *ww_class;
>   	struct ww_mutex *contending_lock;
>   #endif
> @@ -38,8 +43,8 @@ struct ww_acquire_ctx {
>   	struct lockdep_map dep_map;
>   #endif
>   #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> -	unsigned deadlock_inject_interval;
> -	unsigned deadlock_inject_countdown;
> +	unsigned int deadlock_inject_interval;
> +	unsigned int deadlock_inject_countdown;
>   #endif
>   };
>   
> @@ -58,17 +63,21 @@ struct ww_mutex {
>   # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
>   #endif
>   
> -#define __WW_CLASS_INITIALIZER(ww_class) \
> +#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die)	    \
>   		{ .stamp = ATOMIC_LONG_INIT(0) \
>   		, .acquire_name = #ww_class "_acquire" \
> -		, .mutex_name = #ww_class "_mutex" }
> +		, .mutex_name = #ww_class "_mutex" \
> +		, .is_wait_die = _is_wait_die }
>   
>   #define __WW_MUTEX_INITIALIZER(lockname, class) \
>   		{ .base =  __MUTEX_INITIALIZER(lockname.base) \
>   		__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
>   
> +#define DEFINE_WD_CLASS(classname) \
> +	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1)
> +
>   #define DEFINE_WW_CLASS(classname) \
> -	struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
> +	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0)
>   
>   #define DEFINE_WW_MUTEX(mutexname, ww_class) \
>   	struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
> @@ -123,6 +132,8 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
>   	ctx->task = current;
>   	ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
>   	ctx->acquired = 0;
> +	ctx->wounded = false;
> +	ctx->is_wait_die = ww_class->is_wait_die;
>   #ifdef CONFIG_DEBUG_MUTEXES
>   	ctx->ww_class = ww_class;
>   	ctx->done_acquire = 0;
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index f44f658ae629..9e244af4647d 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -244,6 +244,22 @@ void __sched mutex_lock(struct mutex *lock)
>   EXPORT_SYMBOL(mutex_lock);
>   #endif
>   
> +/*
> + * Wait-Die:
> + *   The newer transactions are killed when:
> + *     It (the new transaction) makes a request for a lock being held
> + *     by an older transaction.
> + *
> + * Wound-Wait:
> + *   The newer transactions are wounded when:
> + *     An older transaction makes a request for a lock being held by
> + *     the newer transaction.
> + */
> +
> +/*
> + * Associate the ww_mutex @ww with the context @ww_ctx under which we acquired
> + * it.
> + */
>   static __always_inline void
>   ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
>   {
> @@ -282,26 +298,96 @@ ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
>   	DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
>   #endif
>   	ww_ctx->acquired++;
> +	ww->ctx = ww_ctx;
>   }
>   
> +/*
> + * Determine if context @a is 'after' context @b. IOW, @a should be wounded in
> + * favour of @b.
> + */
>   static inline bool __sched
>   __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
>   {
> -	return a->stamp - b->stamp <= LONG_MAX &&
> -	       (a->stamp != b->stamp || a > b);
> +
> +	return (signed long)(a->stamp - b->stamp) > 0;
>   }
>   
>   /*
> - * Wake up any waiters that may have to back off when the lock is held by the
> - * given context.
> + * Wait-Die; wake a younger waiter context (when locks held) such that it can die.
>    *
> - * Due to the invariants on the wait list, this can only affect the first
> - * waiter with a context.
> + * Among waiters with context, only the first one can have other locks acquired
> + * already (ctx->acquired > 0), because __ww_mutex_add_waiter() and
> + * __ww_mutex_check_wound() wake any but the earliest context.
> + */
> +static bool __ww_mutex_die(struct mutex *lock, struct mutex_waiter *waiter,
> +		           struct ww_acquire_ctx *ww_ctx)
> +{
> +	if (!ww_ctx->is_wait_die)
> +		return false;
> +
> +	if (waiter->ww_ctx->acquired > 0 &&
> +			__ww_ctx_stamp_after(waiter->ww_ctx, ww_ctx)) {
> +		debug_mutex_wake_waiter(lock, waiter);
> +		wake_up_process(waiter->task);
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * Wound-Wait; wound a younger @hold_ctx (if it has locks held).
> + *
> + * XXX more; explain why we too only need to wake the first.
> + */
> +static bool __ww_mutex_wound(struct mutex *lock,
> +			     struct ww_acquire_ctx *ww_ctx,
> +			     struct ww_acquire_ctx *hold_ctx)
> +{
> +	struct task_struct *owner = __mutex_owner(lock);
> +
> +	lockdep_assert_held(&lock->wait_lock);
> +
> +	/*
> +	 * Possible through __ww_mutex_add_waiter() when we race with
> +	 * ww_mutex_set_context_fastpath(). In that case we'll get here again
> +	 * through __ww_mutex_check_waiters().
> +	 */
> +	if (!hold_ctx)
> +		return false;
> +
> +	/*
> +	 * Can have !owner because of __mutex_unlock_slowpath(), but if owner,
> +	 * it cannot go away because we'll have FLAG_WAITERS set and hold
> +	 * wait_lock.
> +	 */
> +	if (!owner)
> +		return false;
> +
> +	if (ww_ctx->acquired > 0 && __ww_ctx_stamp_after(hold_ctx, ww_ctx)) {
> +		hold_ctx->wounded = 1;
> +		if (owner != current)
> +			wake_up_process(owner);
> +
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * We just acquired @lock under @ww_ctx, if there are later contexts waiting
> + * behind us on the wait-list, check if they need wounding/killing.
> + *
> + * See __ww_mutex_add_waiter() for the list-order construction; basically the
> + * list is ordered by stamp, smallest (oldest) first.
> + *
> + * This relies on never mixing wait-die/wound-wait on the same wait-list; which is
> + * currently ensured by that being a ww_class property.
>    *
>    * The current task must not be on the wait list.
>    */
>   static void __sched
> -__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> +__ww_mutex_check_waiters(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>   {
>   	struct mutex_waiter *cur;
>   
> @@ -311,66 +397,50 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>   		if (!cur->ww_ctx)
>   			continue;
>   
> -		if (cur->ww_ctx->acquired > 0 &&
> -		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
> -			debug_mutex_wake_waiter(lock, cur);
> -			wake_up_process(cur->task);
> -		}
> -
> -		break;
> +		if (__ww_mutex_die(lock, cur, ww_ctx) ||
> +		    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
> +			break;
>   	}
>   }
>   
>   /*
> - * After acquiring lock with fastpath or when we lost out in contested
> - * slowpath, set ctx and wake up any waiters so they can recheck.
> + * After acquiring lock with fastpath, where we do not hold wait_lock, set ctx
> + * and wake up any waiters so they can recheck.
>    */
>   static __always_inline void
>   ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>   {
>   	ww_mutex_lock_acquired(lock, ctx);
>   
> -	lock->ctx = ctx;
> -
>   	/*
>   	 * The lock->ctx update should be visible on all cores before
> -	 * the atomic read is done, otherwise contended waiters might be
> +	 * the list_empty check is done, otherwise contended waiters might be
>   	 * missed. The contended waiters will either see ww_ctx == NULL
>   	 * and keep spinning, or it will acquire wait_lock, add itself
>   	 * to waiter list and sleep.
>   	 */
> -	smp_mb(); /* ^^^ */
> +	smp_mb(); /* See comments above and below. */
>   
>   	/*
> -	 * Check if lock is contended, if not there is nobody to wake up
> +	 * [W] ww->ctx = ctx	[W] list_add_tail()
> +	 *     MB		    MB
> +	 * [R] list_empty()	[R] ww->ctx
> +	 *
> +	 * The memory barrier above pairs with the memory barrier in
> +	 * __ww_mutex_add_waiter() and makes sure we either observe ww->ctx
> +	 * and/or !empty list.
>   	 */
> -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
> +	if (likely(list_empty(&lock->base.wait_list)))
>   		return;
>   
>   	/*
> -	 * Uh oh, we raced in fastpath, wake up everyone in this case,
> -	 * so they can see the new lock->ctx.
> +	 * Uh oh, we raced in fastpath, check if any of the waiters need wounding.
>   	 */
>   	spin_lock(&lock->base.wait_lock);
> -	__ww_mutex_wakeup_for_backoff(&lock->base, ctx);
> +	__ww_mutex_check_waiters(&lock->base, ctx);
>   	spin_unlock(&lock->base.wait_lock);
>   }
>   
> -/*
> - * After acquiring lock in the slowpath set ctx.
> - *
> - * Unlike for the fast path, the caller ensures that waiters are woken up where
> - * necessary.
> - *
> - * Callers must hold the mutex wait_lock.
> - */
> -static __always_inline void
> -ww_mutex_set_context_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
> -{
> -	ww_mutex_lock_acquired(lock, ctx);
> -	lock->ctx = ctx;
> -}
> -
>   #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
>   
>   static inline
> @@ -646,37 +716,83 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
>   }
>   EXPORT_SYMBOL(ww_mutex_unlock);
>   
> +
> +static __always_inline int __sched
> +__ww_mutex_kill(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> +{
> +	if (ww_ctx->acquired > 0) {
> +#ifdef CONFIG_DEBUG_MUTEXES
> +		struct ww_mutex *ww;
> +
> +		ww = container_of(lock, struct ww_mutex, base);
> +		DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
> +		ww_ctx->contending_lock = ww;
> +#endif
> +		return -EDEADLK;
> +	}
> +
> +	return 0;
> +}
> +
> +
> +/*
> + * Check the wound condition for the current lock acquire.
> + *
> + * Wound-Wait: If we're wounded, kill ourself.
> + *
> + * Wait-Die: If we're trying to acquire a lock already held by an older
> + *           context, kill ourselves.
> + *
> + * Since __ww_mutex_add_waiter() orders the wait-list on stamp, we only have to
> + * look at waiters before us in the wait-list.
> + */
>   static inline int __sched
> -__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
> +__ww_mutex_check_wound(struct mutex *lock, struct mutex_waiter *waiter,
>   			    struct ww_acquire_ctx *ctx)
>   {
>   	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
>   	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>   	struct mutex_waiter *cur;
>   
> +	if (ctx->acquired == 0)
> +		return 0;
> +
> +	if (!ctx->is_wait_die) {
> +		if (ctx->wounded)
> +			return __ww_mutex_kill(lock, ctx);
> +
> +		return 0;
> +	}
> +
>   	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
> -		goto deadlock;
> +		return __ww_mutex_kill(lock, ctx);
>   
>   	/*
>   	 * If there is a waiter in front of us that has a context, then its
> -	 * stamp is earlier than ours and we must back off.
> +	 * stamp is earlier than ours and we must wound ourself.
>   	 */
>   	cur = waiter;
>   	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
> -		if (cur->ww_ctx)
> -			goto deadlock;
> +		if (!cur->ww_ctx)
> +			continue;
> +
> +		return __ww_mutex_kill(lock, ctx);
>   	}
>   
>   	return 0;
> -
> -deadlock:
> -#ifdef CONFIG_DEBUG_MUTEXES
> -	DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
> -	ctx->contending_lock = ww;
> -#endif
> -	return -EDEADLK;
>   }
>   
> +/*
> + * Add @waiter to the wait-list, keep the wait-list ordered by stamp, smallest
> + * first. Such that older contexts are preferred to acquire the lock over
> + * younger contexts.
> + *
> + * Waiters without context are interspersed in FIFO order.
> + *
> + * Furthermore, for Wait-Die kill ourself immediately when possible (there are
> + * older contexts already waiting) to avoid unnecessary waiting and for
> + * Wound-Wait ensure we wound the owning context when it is younger.
> + */
>   static inline int __sched
>   __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   		      struct mutex *lock,
> @@ -684,16 +800,21 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   {
>   	struct mutex_waiter *cur;
>   	struct list_head *pos;
> +	bool is_wait_die;
>   
>   	if (!ww_ctx) {
>   		list_add_tail(&waiter->list, &lock->wait_list);
>   		return 0;
>   	}
>   
> +	is_wait_die = ww_ctx->is_wait_die;
> +
>   	/*
>   	 * Add the waiter before the first waiter with a higher stamp.
>   	 * Waiters without a context are skipped to avoid starving
> -	 * them.
> +	 * them. Wait-Die waiters may back off here. Wound-Wait waiters
> +	 * never back off here, but they are sorted in stamp order and
> +	 * may wound the lock holder.
>   	 */
>   	pos = &lock->wait_list;
>   	list_for_each_entry_reverse(cur, &lock->wait_list, list) {
> @@ -701,16 +822,16 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   			continue;
>   
>   		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
> -			/* Back off immediately if necessary. */
> -			if (ww_ctx->acquired > 0) {
> -#ifdef CONFIG_DEBUG_MUTEXES
> -				struct ww_mutex *ww;
> -
> -				ww = container_of(lock, struct ww_mutex, base);
> -				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
> -				ww_ctx->contending_lock = ww;
> -#endif
> -				return -EDEADLK;
> +			/*
> +			 * Wait-Die: if we find an older context waiting, there
> +			 * is no point in queueing behind it, as we'd have to
> +			 * wound ourselves the moment it would acquire the
> +			 * lock.
> +			 */
> +			if (is_wait_die) {
> +				int ret = __ww_mutex_kill(lock, ww_ctx);
> +				if (ret)
> +					return ret;
>   			}
>   
>   			break;
> @@ -718,17 +839,29 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   
>   		pos = &cur->list;
>   
> +		/* Wait-Die: ensure younger waiters die. */
> +		__ww_mutex_die(lock, cur, ww_ctx);
> +	}
> +
> +	list_add_tail(&waiter->list, pos);
> +
> +	/*
> +	 * Wound-Wait: if we're blocking on a mutex owned by a younger context,
> +	 * wound that such that we might proceed.
> +	 */
> +	if (!is_wait_die) {
> +		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
> +
>   		/*
> -		 * Wake up the waiter so that it gets a chance to back
> -		 * off.
> +		 * See ww_mutex_set_context_fastpath(). Orders the
> +		 * list_add_tail() vs the ww->ctx load, such that either we
> +		 * or the fastpath will wound @ww->ctx.
>   		 */
> -		if (cur->ww_ctx->acquired > 0) {
> -			debug_mutex_wake_waiter(lock, cur);
> -			wake_up_process(cur->task);
> -		}
> +		smp_mb();
> +
> +		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
>   	}
>   
> -	list_add_tail(&waiter->list, pos);
>   	return 0;
>   }
>   
> @@ -751,6 +884,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	if (use_ww_ctx && ww_ctx) {
>   		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
>   			return -EALREADY;
> +
> +		/*
> +		 * Reset the wounded flag after a kill.  No other process can
> +		 * race and wound us here since they can't have a valid owner
> +		 * pointer at this time.
> +		 */
> +		if (ww_ctx->acquired == 0)
> +			ww_ctx->wounded = 0;
>   	}
>   
>   	preempt_disable();
> @@ -772,7 +913,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	 */
>   	if (__mutex_trylock(lock)) {
>   		if (use_ww_ctx && ww_ctx)
> -			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
> +			__ww_mutex_check_waiters(lock, ww_ctx);
>   
>   		goto skip_wait;
>   	}
> @@ -790,10 +931,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
>   #endif
>   	} else {
> -		/* Add in stamp order, waking up waiters that must back off. */
> +		/* Add in stamp order, waking up waiters that must kill themselves. */
>   		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
>   		if (ret)
> -			goto err_early_backoff;
> +			goto err_early_kill;
>   
>   		waiter.ww_ctx = ww_ctx;
>   	}
> @@ -824,8 +965,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   			goto err;
>   		}
>   
> -		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
> -			ret = __ww_mutex_lock_check_stamp(lock, &waiter, ww_ctx);
> +		if (use_ww_ctx && ww_ctx) {
> +			ret = __ww_mutex_check_wound(lock, &waiter, ww_ctx);
>   			if (ret)
>   				goto err;
>   		}
> @@ -859,6 +1000,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   acquired:
>   	__set_current_state(TASK_RUNNING);
>   
> +	if (use_ww_ctx && ww_ctx) {
> +		/*
> +		 * Wound-Wait; we stole the lock (!first_waiter), check the
> +		 * waiters. This, together with XXX, ensures __ww_mutex_wound()
> +		 * only needs to check the first waiter (with context).
> +		 */
> +		if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter))
> +			__ww_mutex_check_waiters(lock, ww_ctx);
> +	}
> +
>   	mutex_remove_waiter(lock, &waiter, current);
>   	if (likely(list_empty(&lock->wait_list)))
>   		__mutex_clear_flag(lock, MUTEX_FLAGS);
> @@ -870,7 +1021,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	lock_acquired(&lock->dep_map, ip);
>   
>   	if (use_ww_ctx && ww_ctx)
> -		ww_mutex_set_context_slowpath(ww, ww_ctx);
> +		ww_mutex_lock_acquired(ww, ww_ctx);
>   
>   	spin_unlock(&lock->wait_lock);
>   	preempt_enable();
> @@ -879,7 +1030,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   err:
>   	__set_current_state(TASK_RUNNING);
>   	mutex_remove_waiter(lock, &waiter, current);
> -err_early_backoff:
> +err_early_kill:
>   	spin_unlock(&lock->wait_lock);
>   	debug_mutex_free_waiter(&waiter);
>   	mutex_release(&lock->dep_map, 1, ip);



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
@ 2018-06-14 16:43                   ` Thomas Hellstrom
  0 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-14 16:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On 06/14/2018 04:42 PM, Peter Zijlstra wrote:
> On Thu, Jun 14, 2018 at 01:48:39PM +0200, Thomas Hellstrom wrote:
>> The literature makes a distinction between "killed" and "wounded". In our
>> context, "Killed" is when a transaction actually receives an -EDEADLK and
>> needs to back off. "Wounded" is when someone (typically another transaction)
>> requests a transaction to kill itself. A wound will often, but not always,
>> lead to a kill. If the wounded transaction has finished its locking
>> sequence, or has the opportunity to grab uncontended ww mutexes or steal
>> contended (non-handoff) ww mutexes to finish its transaction it will do so
>> and never kill itself.
> Hopefully I got it all right this time; I folded your patch in and
> mucked around with it a bit, but haven't done anything except compile
> it.
>
> I left the context/transaction thing because well, that's what we called
> the thing.

Overall, I think this looks fine. I'll just fix up the FLAG_WAITERS 
setting and affected comments and do some torture testing on it.

Are you OK with adding the new feature and the cleanup in the same patch?

Thomas



>
>
> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
> index 39fda195bf78..50ef5a10cfa0 100644
> --- a/include/linux/ww_mutex.h
> +++ b/include/linux/ww_mutex.h
> @@ -8,6 +8,8 @@
>    *
>    * Wound/wait implementation:
>    *  Copyright (C) 2013 Canonical Ltd.
> + * Choice of algorithm:
> + *  Copyright (C) 2018 VMware Inc.
>    *
>    * This file contains the main data structure and API definitions.
>    */
> @@ -23,14 +25,17 @@ struct ww_class {
>   	struct lock_class_key mutex_key;
>   	const char *acquire_name;
>   	const char *mutex_name;
> +	unsigned int is_wait_die;
>   };
>   
>   struct ww_acquire_ctx {
>   	struct task_struct *task;
>   	unsigned long stamp;
> -	unsigned acquired;
> +	unsigned int acquired;
> +	unsigned short wounded;
> +	unsigned short is_wait_die;
>   #ifdef CONFIG_DEBUG_MUTEXES
> -	unsigned done_acquire;
> +	unsigned int done_acquire;
>   	struct ww_class *ww_class;
>   	struct ww_mutex *contending_lock;
>   #endif
> @@ -38,8 +43,8 @@ struct ww_acquire_ctx {
>   	struct lockdep_map dep_map;
>   #endif
>   #ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
> -	unsigned deadlock_inject_interval;
> -	unsigned deadlock_inject_countdown;
> +	unsigned int deadlock_inject_interval;
> +	unsigned int deadlock_inject_countdown;
>   #endif
>   };
>   
> @@ -58,17 +63,21 @@ struct ww_mutex {
>   # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
>   #endif
>   
> -#define __WW_CLASS_INITIALIZER(ww_class) \
> +#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die)	    \
>   		{ .stamp = ATOMIC_LONG_INIT(0) \
>   		, .acquire_name = #ww_class "_acquire" \
> -		, .mutex_name = #ww_class "_mutex" }
> +		, .mutex_name = #ww_class "_mutex" \
> +		, .is_wait_die = _is_wait_die }
>   
>   #define __WW_MUTEX_INITIALIZER(lockname, class) \
>   		{ .base =  __MUTEX_INITIALIZER(lockname.base) \
>   		__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
>   
> +#define DEFINE_WD_CLASS(classname) \
> +	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 1)
> +
>   #define DEFINE_WW_CLASS(classname) \
> -	struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
> +	struct ww_class classname = __WW_CLASS_INITIALIZER(classname, 0)
>   
>   #define DEFINE_WW_MUTEX(mutexname, ww_class) \
>   	struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
> @@ -123,6 +132,8 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
>   	ctx->task = current;
>   	ctx->stamp = atomic_long_inc_return_relaxed(&ww_class->stamp);
>   	ctx->acquired = 0;
> +	ctx->wounded = false;
> +	ctx->is_wait_die = ww_class->is_wait_die;
>   #ifdef CONFIG_DEBUG_MUTEXES
>   	ctx->ww_class = ww_class;
>   	ctx->done_acquire = 0;
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index f44f658ae629..9e244af4647d 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -244,6 +244,22 @@ void __sched mutex_lock(struct mutex *lock)
>   EXPORT_SYMBOL(mutex_lock);
>   #endif
>   
> +/*
> + * Wait-Die:
> + *   The newer transactions are killed when:
> + *     It (the new transaction) makes a request for a lock being held
> + *     by an older transaction.
> + *
> + * Wound-Wait:
> + *   The newer transactions are wounded when:
> + *     An older transaction makes a request for a lock being held by
> + *     the newer transaction.
> + */
> +
> +/*
> + * Associate the ww_mutex @ww with the context @ww_ctx under which we acquired
> + * it.
> + */
>   static __always_inline void
>   ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
>   {
> @@ -282,26 +298,96 @@ ww_mutex_lock_acquired(struct ww_mutex *ww, struct ww_acquire_ctx *ww_ctx)
>   	DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
>   #endif
>   	ww_ctx->acquired++;
> +	ww->ctx = ww_ctx;
>   }
>   
> +/*
> + * Determine if context @a is 'after' context @b. IOW, @a should be wounded in
> + * favour of @b.
> + */
>   static inline bool __sched
>   __ww_ctx_stamp_after(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
>   {
> -	return a->stamp - b->stamp <= LONG_MAX &&
> -	       (a->stamp != b->stamp || a > b);
> +
> +	return (signed long)(a->stamp - b->stamp) > 0;
>   }
>   
>   /*
> - * Wake up any waiters that may have to back off when the lock is held by the
> - * given context.
> + * Wait-Die; wake a younger waiter context (when locks held) such that it can die.
>    *
> - * Due to the invariants on the wait list, this can only affect the first
> - * waiter with a context.
> + * Among waiters with context, only the first one can have other locks acquired
> + * already (ctx->acquired > 0), because __ww_mutex_add_waiter() and
> + * __ww_mutex_check_wound() wake any but the earliest context.
> + */
> +static bool __ww_mutex_die(struct mutex *lock, struct mutex_waiter *waiter,
> +		           struct ww_acquire_ctx *ww_ctx)
> +{
> +	if (!ww_ctx->is_wait_die)
> +		return false;
> +
> +	if (waiter->ww_ctx->acquired > 0 &&
> +			__ww_ctx_stamp_after(waiter->ww_ctx, ww_ctx)) {
> +		debug_mutex_wake_waiter(lock, waiter);
> +		wake_up_process(waiter->task);
> +	}
> +
> +	return true;
> +}
> +
> +/*
> + * Wound-Wait; wound a younger @hold_ctx (if it has locks held).
> + *
> + * XXX more; explain why we too only need to wake the first.
> + */
> +static bool __ww_mutex_wound(struct mutex *lock,
> +			     struct ww_acquire_ctx *ww_ctx,
> +			     struct ww_acquire_ctx *hold_ctx)
> +{
> +	struct task_struct *owner = __mutex_owner(lock);
> +
> +	lockdep_assert_held(&lock->wait_lock);
> +
> +	/*
> +	 * Possible through __ww_mutex_add_waiter() when we race with
> +	 * ww_mutex_set_context_fastpath(). In that case we'll get here again
> +	 * through __ww_mutex_check_waiters().
> +	 */
> +	if (!hold_ctx)
> +		return false;
> +
> +	/*
> +	 * Can have !owner because of __mutex_unlock_slowpath(), but if owner,
> +	 * it cannot go away because we'll have FLAG_WAITERS set and hold
> +	 * wait_lock.
> +	 */
> +	if (!owner)
> +		return false;
> +
> +	if (ww_ctx->acquired > 0 && __ww_ctx_stamp_after(hold_ctx, ww_ctx)) {
> +		hold_ctx->wounded = 1;
> +		if (owner != current)
> +			wake_up_process(owner);
> +
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +/*
> + * We just acquired @lock under @ww_ctx, if there are later contexts waiting
> + * behind us on the wait-list, check if they need wounding/killing.
> + *
> + * See __ww_mutex_add_waiter() for the list-order construction; basically the
> + * list is ordered by stamp, smallest (oldest) first.
> + *
> + * This relies on never mixing wait-die/wound-wait on the same wait-list; which is
> + * currently ensured by that being a ww_class property.
>    *
>    * The current task must not be on the wait list.
>    */
>   static void __sched
> -__ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> +__ww_mutex_check_waiters(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>   {
>   	struct mutex_waiter *cur;
>   
> @@ -311,66 +397,50 @@ __ww_mutex_wakeup_for_backoff(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
>   		if (!cur->ww_ctx)
>   			continue;
>   
> -		if (cur->ww_ctx->acquired > 0 &&
> -		    __ww_ctx_stamp_after(cur->ww_ctx, ww_ctx)) {
> -			debug_mutex_wake_waiter(lock, cur);
> -			wake_up_process(cur->task);
> -		}
> -
> -		break;
> +		if (__ww_mutex_die(lock, cur, ww_ctx) ||
> +		    __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
> +			break;
>   	}
>   }
>   
>   /*
> - * After acquiring lock with fastpath or when we lost out in contested
> - * slowpath, set ctx and wake up any waiters so they can recheck.
> + * After acquiring lock with fastpath, where we do not hold wait_lock, set ctx
> + * and wake up any waiters so they can recheck.
>    */
>   static __always_inline void
>   ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
>   {
>   	ww_mutex_lock_acquired(lock, ctx);
>   
> -	lock->ctx = ctx;
> -
>   	/*
>   	 * The lock->ctx update should be visible on all cores before
> -	 * the atomic read is done, otherwise contended waiters might be
> +	 * the list_empty check is done, otherwise contended waiters might be
>   	 * missed. The contended waiters will either see ww_ctx == NULL
>   	 * and keep spinning, or it will acquire wait_lock, add itself
>   	 * to waiter list and sleep.
>   	 */
> -	smp_mb(); /* ^^^ */
> +	smp_mb(); /* See comments above and below. */
>   
>   	/*
> -	 * Check if lock is contended, if not there is nobody to wake up
> +	 * [W] ww->ctx = ctx	[W] list_add_tail()
> +	 *     MB		    MB
> +	 * [R] list_empty()	[R] ww->ctx
> +	 *
> +	 * The memory barrier above pairs with the memory barrier in
> +	 * __ww_mutex_add_waiter() and makes sure we either observe ww->ctx
> +	 * and/or !empty list.
>   	 */
> -	if (likely(!(atomic_long_read(&lock->base.owner) & MUTEX_FLAG_WAITERS)))
> +	if (likely(list_empty(&lock->base.wait_list)))
>   		return;
>   
>   	/*
> -	 * Uh oh, we raced in fastpath, wake up everyone in this case,
> -	 * so they can see the new lock->ctx.
> +	 * Uh oh, we raced in fastpath, check if any of the waiters need wounding.
>   	 */
>   	spin_lock(&lock->base.wait_lock);
> -	__ww_mutex_wakeup_for_backoff(&lock->base, ctx);
> +	__ww_mutex_check_waiters(&lock->base, ctx);
>   	spin_unlock(&lock->base.wait_lock);
>   }
>   
> -/*
> - * After acquiring lock in the slowpath set ctx.
> - *
> - * Unlike for the fast path, the caller ensures that waiters are woken up where
> - * necessary.
> - *
> - * Callers must hold the mutex wait_lock.
> - */
> -static __always_inline void
> -ww_mutex_set_context_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
> -{
> -	ww_mutex_lock_acquired(lock, ctx);
> -	lock->ctx = ctx;
> -}
> -
>   #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
>   
>   static inline
> @@ -646,37 +716,83 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
>   }
>   EXPORT_SYMBOL(ww_mutex_unlock);
>   
> +
> +static __always_inline int __sched
> +__ww_mutex_kill(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> +{
> +	if (ww_ctx->acquired > 0) {
> +#ifdef CONFIG_DEBUG_MUTEXES
> +		struct ww_mutex *ww;
> +
> +		ww = container_of(lock, struct ww_mutex, base);
> +		DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
> +		ww_ctx->contending_lock = ww;
> +#endif
> +		return -EDEADLK;
> +	}
> +
> +	return 0;
> +}
> +
> +
> +/*
> + * Check the wound condition for the current lock acquire.
> + *
> + * Wound-Wait: If we're wounded, kill ourselves.
> + *
> + * Wait-Die: If we're trying to acquire a lock already held by an older
> + *           context, kill ourselves.
> + *
> + * Since __ww_mutex_add_waiter() orders the wait-list on stamp, we only have to
> + * look at waiters before us in the wait-list.
> + */
>   static inline int __sched
> -__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
> +__ww_mutex_check_wound(struct mutex *lock, struct mutex_waiter *waiter,
>   			    struct ww_acquire_ctx *ctx)
>   {
>   	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
>   	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>   	struct mutex_waiter *cur;
>   
> +	if (ctx->acquired == 0)
> +		return 0;
> +
> +	if (!ctx->is_wait_die) {
> +		if (ctx->wounded)
> +			return __ww_mutex_kill(lock, ctx);
> +
> +		return 0;
> +	}
> +
>   	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
> -		goto deadlock;
> +		return __ww_mutex_kill(lock, ctx);
>   
>   	/*
>   	 * If there is a waiter in front of us that has a context, then its
> -	 * stamp is earlier than ours and we must back off.
> +	 * stamp is earlier than ours and we must kill ourselves.
>   	 */
>   	cur = waiter;
>   	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
> -		if (cur->ww_ctx)
> -			goto deadlock;
> +		if (!cur->ww_ctx)
> +			continue;
> +
> +		return __ww_mutex_kill(lock, ctx);
>   	}
>   
>   	return 0;
> -
> -deadlock:
> -#ifdef CONFIG_DEBUG_MUTEXES
> -	DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
> -	ctx->contending_lock = ww;
> -#endif
> -	return -EDEADLK;
>   }
>   
> +/*
> + * Add @waiter to the wait-list, keep the wait-list ordered by stamp, smallest
> + * first. Such that older contexts are preferred to acquire the lock over
> + * younger contexts.
> + *
> + * Waiters without context are interspersed in FIFO order.
> + *
> + * Furthermore, for Wait-Die kill ourselves immediately when possible (there are
> + * older contexts already waiting) to avoid unnecessary waiting and for
> + * Wound-Wait ensure we wound the owning context when it is younger.
> + */
>   static inline int __sched
>   __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   		      struct mutex *lock,
> @@ -684,16 +800,21 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   {
>   	struct mutex_waiter *cur;
>   	struct list_head *pos;
> +	bool is_wait_die;
>   
>   	if (!ww_ctx) {
>   		list_add_tail(&waiter->list, &lock->wait_list);
>   		return 0;
>   	}
>   
> +	is_wait_die = ww_ctx->is_wait_die;
> +
>   	/*
>   	 * Add the waiter before the first waiter with a higher stamp.
>   	 * Waiters without a context are skipped to avoid starving
> -	 * them.
> +	 * them. Wait-Die waiters may back off here. Wound-Wait waiters
> +	 * never back off here, but they are sorted in stamp order and
> +	 * may wound the lock holder.
>   	 */
>   	pos = &lock->wait_list;
>   	list_for_each_entry_reverse(cur, &lock->wait_list, list) {
> @@ -701,16 +822,16 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   			continue;
>   
>   		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
> -			/* Back off immediately if necessary. */
> -			if (ww_ctx->acquired > 0) {
> -#ifdef CONFIG_DEBUG_MUTEXES
> -				struct ww_mutex *ww;
> -
> -				ww = container_of(lock, struct ww_mutex, base);
> -				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
> -				ww_ctx->contending_lock = ww;
> -#endif
> -				return -EDEADLK;
> +			/*
> +			 * Wait-Die: if we find an older context waiting, there
> +			 * is no point in queueing behind it, as we'd have to
> +			 * wound ourselves the moment it would acquire the
> +			 * lock.
> +			 */
> +			if (is_wait_die) {
> +				int ret = __ww_mutex_kill(lock, ww_ctx);
> +				if (ret)
> +					return ret;
>   			}
>   
>   			break;
> @@ -718,17 +839,29 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   
>   		pos = &cur->list;
>   
> +		/* Wait-Die: ensure younger waiters die. */
> +		__ww_mutex_die(lock, cur, ww_ctx);
> +	}
> +
> +	list_add_tail(&waiter->list, pos);
> +
> +	/*
> +	 * Wound-Wait: if we're blocking on a mutex owned by a younger context,
> +	 * wound it such that we might proceed.
> +	 */
> +	if (!is_wait_die) {
> +		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
> +
>   		/*
> -		 * Wake up the waiter so that it gets a chance to back
> -		 * off.
> +		 * See ww_mutex_set_context_fastpath(). Orders the
> +		 * list_add_tail() vs the ww->ctx load, such that either we
> +		 * or the fastpath will wound @ww->ctx.
>   		 */
> -		if (cur->ww_ctx->acquired > 0) {
> -			debug_mutex_wake_waiter(lock, cur);
> -			wake_up_process(cur->task);
> -		}
> +		smp_mb();
> +
> +		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
>   	}
>   
> -	list_add_tail(&waiter->list, pos);
>   	return 0;
>   }
>   
> @@ -751,6 +884,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	if (use_ww_ctx && ww_ctx) {
>   		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
>   			return -EALREADY;
> +
> +		/*
> +		 * Reset the wounded flag after a kill.  No other process can
> +		 * race and wound us here since they can't have a valid owner
> +		 * pointer at this time.
> +		 */
> +		if (ww_ctx->acquired == 0)
> +			ww_ctx->wounded = 0;
>   	}
>   
>   	preempt_disable();
> @@ -772,7 +913,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	 */
>   	if (__mutex_trylock(lock)) {
>   		if (use_ww_ctx && ww_ctx)
> -			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
> +			__ww_mutex_check_waiters(lock, ww_ctx);
>   
>   		goto skip_wait;
>   	}
> @@ -790,10 +931,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
>   #endif
>   	} else {
> -		/* Add in stamp order, waking up waiters that must back off. */
> +		/* Add in stamp order, waking up waiters that must kill themselves. */
>   		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
>   		if (ret)
> -			goto err_early_backoff;
> +			goto err_early_kill;
>   
>   		waiter.ww_ctx = ww_ctx;
>   	}
> @@ -824,8 +965,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   			goto err;
>   		}
>   
> -		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
> -			ret = __ww_mutex_lock_check_stamp(lock, &waiter, ww_ctx);
> +		if (use_ww_ctx && ww_ctx) {
> +			ret = __ww_mutex_check_wound(lock, &waiter, ww_ctx);
>   			if (ret)
>   				goto err;
>   		}
> @@ -859,6 +1000,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   acquired:
>   	__set_current_state(TASK_RUNNING);
>   
> +	if (use_ww_ctx && ww_ctx) {
> +		/*
> +		 * Wound-Wait; we stole the lock (!first_waiter), check the
> +		 * waiters. This, together with XXX, ensures __ww_mutex_wound()
> +		 * only needs to check the first waiter (with context).
> +		 */
> +		if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter))
> +			__ww_mutex_check_waiters(lock, ww_ctx);
> +	}
> +
>   	mutex_remove_waiter(lock, &waiter, current);
>   	if (likely(list_empty(&lock->wait_list)))
>   		__mutex_clear_flag(lock, MUTEX_FLAGS);
> @@ -870,7 +1021,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	lock_acquired(&lock->dep_map, ip);
>   
>   	if (use_ww_ctx && ww_ctx)
> -		ww_mutex_set_context_slowpath(ww, ww_ctx);
> +		ww_mutex_lock_acquired(ww, ww_ctx);
>   
>   	spin_unlock(&lock->wait_lock);
>   	preempt_enable();
> @@ -879,7 +1030,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   err:
>   	__set_current_state(TASK_RUNNING);
>   	mutex_remove_waiter(lock, &waiter, current);
> -err_early_backoff:
> +err_early_kill:
>   	spin_unlock(&lock->wait_lock);
>   	debug_mutex_free_waiter(&waiter);
>   	mutex_release(&lock->dep_map, 1, ip);


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 43+ messages in thread

> -static __always_inline void
> -ww_mutex_set_context_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
> -{
> -	ww_mutex_lock_acquired(lock, ctx);
> -	lock->ctx = ctx;
> -}
> -
>   #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
>   
>   static inline
> @@ -646,37 +716,83 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
>   }
>   EXPORT_SYMBOL(ww_mutex_unlock);
>   
> +
> +static __always_inline int __sched
> +__ww_mutex_kill(struct mutex *lock, struct ww_acquire_ctx *ww_ctx)
> +{
> +	if (ww_ctx->acquired > 0) {
> +#ifdef CONFIG_DEBUG_MUTEXES
> +		struct ww_mutex *ww;
> +
> +		ww = container_of(lock, struct ww_mutex, base);
> +		DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
> +		ww_ctx->contending_lock = ww;
> +#endif
> +		return -EDEADLK;
> +	}
> +
> +	return 0;
> +}
> +
> +
> +/*
> + * Check the wound condition for the current lock acquire.
> + *
> + * Wound-Wait: If we're wounded, kill ourselves.
> + *
> + * Wait-Die: If we're trying to acquire a lock already held by an older
> + *           context, kill ourselves.
> + *
> + * Since __ww_mutex_add_waiter() orders the wait-list on stamp, we only have to
> + * look at waiters before us in the wait-list.
> + */
>   static inline int __sched
> -__ww_mutex_lock_check_stamp(struct mutex *lock, struct mutex_waiter *waiter,
> +__ww_mutex_check_wound(struct mutex *lock, struct mutex_waiter *waiter,
>   			    struct ww_acquire_ctx *ctx)
>   {
>   	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
>   	struct ww_acquire_ctx *hold_ctx = READ_ONCE(ww->ctx);
>   	struct mutex_waiter *cur;
>   
> +	if (ctx->acquired == 0)
> +		return 0;
> +
> +	if (!ctx->is_wait_die) {
> +		if (ctx->wounded)
> +			return __ww_mutex_kill(lock, ctx);
> +
> +		return 0;
> +	}
> +
>   	if (hold_ctx && __ww_ctx_stamp_after(ctx, hold_ctx))
> -		goto deadlock;
> +		return __ww_mutex_kill(lock, ctx);
>   
>   	/*
>   	 * If there is a waiter in front of us that has a context, then its
> -	 * stamp is earlier than ours and we must back off.
> +	 * stamp is earlier than ours and we must die.
>   	 */
>   	cur = waiter;
>   	list_for_each_entry_continue_reverse(cur, &lock->wait_list, list) {
> -		if (cur->ww_ctx)
> -			goto deadlock;
> +		if (!cur->ww_ctx)
> +			continue;
> +
> +		return __ww_mutex_kill(lock, ctx);
>   	}
>   
>   	return 0;
> -
> -deadlock:
> -#ifdef CONFIG_DEBUG_MUTEXES
> -	DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
> -	ctx->contending_lock = ww;
> -#endif
> -	return -EDEADLK;
>   }
>   
> +/*
> + * Add @waiter to the wait-list, keep the wait-list ordered by stamp, smallest
> + * first. Such that older contexts are preferred to acquire the lock over
> + * younger contexts.
> + *
> + * Waiters without context are interspersed in FIFO order.
> + *
> + * Furthermore, for Wait-Die, kill ourselves immediately when possible (there
> + * are older contexts already waiting) to avoid unnecessary waiting, and for
> + * Wound-Wait, ensure we wound the owning context when it is younger.
> + */
>   static inline int __sched
>   __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   		      struct mutex *lock,
> @@ -684,16 +800,21 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   {
>   	struct mutex_waiter *cur;
>   	struct list_head *pos;
> +	bool is_wait_die;
>   
>   	if (!ww_ctx) {
>   		list_add_tail(&waiter->list, &lock->wait_list);
>   		return 0;
>   	}
>   
> +	is_wait_die = ww_ctx->is_wait_die;
> +
>   	/*
>   	 * Add the waiter before the first waiter with a higher stamp.
>   	 * Waiters without a context are skipped to avoid starving
> -	 * them.
> +	 * them. Wait-Die waiters may back off here. Wound-Wait waiters
> +	 * never back off here, but they are sorted in stamp order and
> +	 * may wound the lock holder.
>   	 */
>   	pos = &lock->wait_list;
>   	list_for_each_entry_reverse(cur, &lock->wait_list, list) {
> @@ -701,16 +822,16 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   			continue;
>   
>   		if (__ww_ctx_stamp_after(ww_ctx, cur->ww_ctx)) {
> -			/* Back off immediately if necessary. */
> -			if (ww_ctx->acquired > 0) {
> -#ifdef CONFIG_DEBUG_MUTEXES
> -				struct ww_mutex *ww;
> -
> -				ww = container_of(lock, struct ww_mutex, base);
> -				DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
> -				ww_ctx->contending_lock = ww;
> -#endif
> -				return -EDEADLK;
> +			/*
> +			 * Wait-Die: if we find an older context waiting, there
> +			 * is no point in queueing behind it, as we'd have to
> +			 * die the moment it would acquire the lock.
> +			 */
> +			if (is_wait_die) {
> +				int ret = __ww_mutex_kill(lock, ww_ctx);
> +				if (ret)
> +					return ret;
>   			}
>   
>   			break;
> @@ -718,17 +839,29 @@ __ww_mutex_add_waiter(struct mutex_waiter *waiter,
>   
>   		pos = &cur->list;
>   
> +		/* Wait-Die: ensure younger waiters die. */
> +		__ww_mutex_die(lock, cur, ww_ctx);
> +	}
> +
> +	list_add_tail(&waiter->list, pos);
> +
> +	/*
> +	 * Wound-Wait: if we're blocking on a mutex owned by a younger context,
> +	 * wound it so that we might proceed.
> +	 */
> +	if (!is_wait_die) {
> +		struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
> +
>   		/*
> -		 * Wake up the waiter so that it gets a chance to back
> -		 * off.
> +		 * See ww_mutex_set_context_fastpath(). Orders the
> +		 * list_add_tail() vs the ww->ctx load, such that either we
> +		 * or the fastpath will wound @ww->ctx.
>   		 */
> -		if (cur->ww_ctx->acquired > 0) {
> -			debug_mutex_wake_waiter(lock, cur);
> -			wake_up_process(cur->task);
> -		}
> +		smp_mb();
> +
> +		__ww_mutex_wound(lock, ww_ctx, ww->ctx);
>   	}
>   
> -	list_add_tail(&waiter->list, pos);
>   	return 0;
>   }
>   
> @@ -751,6 +884,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	if (use_ww_ctx && ww_ctx) {
>   		if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
>   			return -EALREADY;
> +
> +		/*
> +		 * Reset the wounded flag after a kill.  No other process can
> +		 * race and wound us here since they can't have a valid owner
> +		 * pointer at this time.
> +		 */
> +		if (ww_ctx->acquired == 0)
> +			ww_ctx->wounded = 0;
>   	}
>   
>   	preempt_disable();
> @@ -772,7 +913,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	 */
>   	if (__mutex_trylock(lock)) {
>   		if (use_ww_ctx && ww_ctx)
> -			__ww_mutex_wakeup_for_backoff(lock, ww_ctx);
> +			__ww_mutex_check_waiters(lock, ww_ctx);
>   
>   		goto skip_wait;
>   	}
> @@ -790,10 +931,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		waiter.ww_ctx = MUTEX_POISON_WW_CTX;
>   #endif
>   	} else {
> -		/* Add in stamp order, waking up waiters that must back off. */
> +		/* Add in stamp order, waking up waiters that must die. */
>   		ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
>   		if (ret)
> -			goto err_early_backoff;
> +			goto err_early_kill;
>   
>   		waiter.ww_ctx = ww_ctx;
>   	}
> @@ -824,8 +965,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   			goto err;
>   		}
>   
> -		if (use_ww_ctx && ww_ctx && ww_ctx->acquired > 0) {
> -			ret = __ww_mutex_lock_check_stamp(lock, &waiter, ww_ctx);
> +		if (use_ww_ctx && ww_ctx) {
> +			ret = __ww_mutex_check_wound(lock, &waiter, ww_ctx);
>   			if (ret)
>   				goto err;
>   		}
> @@ -859,6 +1000,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   acquired:
>   	__set_current_state(TASK_RUNNING);
>   
> +	if (use_ww_ctx && ww_ctx) {
> +		/*
> +		 * Wound-Wait; we stole the lock (!first_waiter), check the
> +		 * waiters. This, together with XXX, ensures __ww_mutex_wound()
> +		 * only needs to check the first waiter (with context).
> +		 */
> +		if (!ww_ctx->is_wait_die && !__mutex_waiter_is_first(lock, &waiter))
> +			__ww_mutex_check_waiters(lock, ww_ctx);
> +	}
> +
>   	mutex_remove_waiter(lock, &waiter, current);
>   	if (likely(list_empty(&lock->wait_list)))
>   		__mutex_clear_flag(lock, MUTEX_FLAGS);
> @@ -870,7 +1021,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	lock_acquired(&lock->dep_map, ip);
>   
>   	if (use_ww_ctx && ww_ctx)
> -		ww_mutex_set_context_slowpath(ww, ww_ctx);
> +		ww_mutex_lock_acquired(ww, ww_ctx);
>   
>   	spin_unlock(&lock->wait_lock);
>   	preempt_enable();
> @@ -879,7 +1030,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   err:
>   	__set_current_state(TASK_RUNNING);
>   	mutex_remove_waiter(lock, &waiter, current);
> -err_early_backoff:
> +err_early_kill:
>   	spin_unlock(&lock->wait_lock);
>   	debug_mutex_free_waiter(&waiter);
>   	mutex_release(&lock->dep_map, 1, ip);
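[Editor's note: for readers skimming the patch, the two deadlock-avoidance policies it implements can be sketched in isolation. The following is a deliberately simplified, hypothetical illustration — the `struct ctx` and `must_back_off()` names are invented for this sketch and are not the kernel's types — showing when a contending transaction backs off (-EDEADLK in the real code) under each algorithm, including Wound-Wait's lazy preemption via the `wounded` flag.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified acquire-context: only the fields needed to
 * illustrate the two deadlock-avoidance policies. */
struct ctx {
	unsigned long stamp;   /* lower stamp == older transaction */
	int acquired;          /* number of ww mutexes already held */
	bool is_wait_die;
	bool wounded;          /* Wound-Wait: set lazily by an older wounder */
};

/* Older means smaller stamp, mirroring __ww_ctx_stamp_after(). */
static bool stamp_after(const struct ctx *a, const struct ctx *b)
{
	return a->stamp > b->stamp;
}

/*
 * Decide what happens when @waiter contends a lock held by @holder.
 * Returns true if @waiter must back off and restart its transaction.
 *
 * Wait-Die: a younger waiter dies instead of waiting on an older holder.
 * Wound-Wait: an older waiter wounds a younger holder; the wounded side
 * only backs off lazily, the next time it would block.
 */
static bool must_back_off(struct ctx *waiter, struct ctx *holder)
{
	if (waiter->acquired == 0)
		return false;	/* nothing held yet: restarting gains nothing */

	if (waiter->is_wait_die)
		return stamp_after(waiter, holder);	/* younger dies */

	/* Wound-Wait: an older waiter wounds the younger holder ... */
	if (stamp_after(holder, waiter))
		holder->wounded = true;

	/* ... and the waiter itself backs off only if already wounded. */
	return waiter->wounded;
}
```

Note how, under Wound-Wait, the older contender never backs off: it marks the younger holder wounded and waits, and the younger context rolls back only at its next contended acquire — the lazy scheme described in the cover letter.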


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-14 16:43                   ` Thomas Hellstrom
@ 2018-06-14 18:51                     ` Peter Zijlstra
  -1 siblings, 0 replies; 43+ messages in thread
From: Peter Zijlstra @ 2018-06-14 18:51 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On Thu, Jun 14, 2018 at 06:43:40PM +0200, Thomas Hellstrom wrote:
> Overall, I think this looks fine. I'll just fix up the FLAG_WAITERS setting
> and affected comments and do some torture testing on it.

Thanks!

> Are you OK with adding the new feature and the cleanup in the same patch?

I suppose so, trying to untangle that will be a bit of a pain. But if
you feel so inclined I'm not going to stop you :-)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
  2018-06-14 18:51                     ` Peter Zijlstra
  (?)
@ 2018-06-15 12:07                       ` Thomas Hellstrom
  -1 siblings, 0 replies; 43+ messages in thread
From: Thomas Hellstrom @ 2018-06-15 12:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: dri-devel, linux-kernel, Ingo Molnar, Jonathan Corbet,
	Gustavo Padovan, Maarten Lankhorst, Sean Paul, David Airlie,
	Davidlohr Bueso, Paul E. McKenney, Josh Triplett,
	Thomas Gleixner, Kate Stewart, Philippe Ombredanne,
	Greg Kroah-Hartman, linux-doc, linux-media, linaro-mm-sig

On 06/14/2018 08:51 PM, Peter Zijlstra wrote:
> On Thu, Jun 14, 2018 at 06:43:40PM +0200, Thomas Hellstrom wrote:
>> Overall, I think this looks fine. I'll just fix up the FLAG_WAITERS setting
>> and affected comments and do some torture testing on it.
> Thanks!
>
>> Are you OK with adding the new feature and the cleanup in the same patch?
> I suppose so, trying to untangle that will be a bit of a pain. But if
> you feel so inclined I'm not going to stop you :-)

OK, I did some untangling. Sending out the resulting two patches. There 
are very minor changes in comments and naming, mostly trying to avoid 
"wound" where we really mean "die".

The only functional change is that I've moved the waiter-wounds-owner 
path to *after* we actually set FLAG_WAITERS, so that the owner pointer 
is guaranteed to remain valid while we hold the spinlock. This also 
means we can replace an smp_mb() with smp_mb__after_atomic().

Sending the patches as separate emails. Please let me know if you're OK 
with them and also the author / co-author info, and if so, I'll send out 
the full series again.

Thanks,

/Thomas
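
[Editor's note: the ordering constraint described above — publish the waiters bit with an atomic RMW first, then load the owner — can be sketched with C11 atomics. This is a hypothetical miniature, not the kernel's lock-word layout or API: `lock_word`, `FLAG_WAITERS`, and the function name are invented for illustration, and the seq_cst fence stands in for smp_mb__after_atomic().]

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical miniature of a mutex lock word: low bit = "waiters
 * present", remaining bits = owner (here just an integer id). */
#define FLAG_WAITERS 1UL

static atomic_ulong lock_word;

/* Waiter side: set FLAG_WAITERS with an atomic RMW *before* loading the
 * owner. The fence after the RMW (smp_mb__after_atomic() in the patch)
 * orders the flag store ahead of the owner load, so an unlocking owner
 * must observe the flag and take the slow path, where it synchronizes on
 * wait_lock, before its task can go away under us. */
static unsigned long set_waiters_then_read_owner(void)
{
	atomic_fetch_or_explicit(&lock_word, FLAG_WAITERS,
				 memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* smp_mb__after_atomic() */
	return atomic_load_explicit(&lock_word,
				    memory_order_relaxed) & ~FLAG_WAITERS;
}
```

The single-threaded behavior is trivial; the point of the pattern is the cross-CPU pairing with the unlock path's flag check, which the fence placement makes possible.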



^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2018-06-15 12:07 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-13  7:47 [PATCH 0/2] locking,drm: Fix ww mutex naming / algorithm inconsistency Thomas Hellstrom
2018-06-13  7:47 ` [PATCH 0/2] locking, drm: " Thomas Hellstrom
2018-06-13  7:47 ` [PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes Thomas Hellstrom
2018-06-13  7:47   ` Thomas Hellstrom
2018-06-13  7:47   ` Thomas Hellstrom
2018-06-13  7:54   ` Greg Kroah-Hartman
2018-06-13  7:54     ` Greg Kroah-Hartman
2018-06-13  7:54     ` Greg Kroah-Hartman
2018-06-13  8:34     ` Thomas Hellstrom
2018-06-13  8:34       ` Thomas Hellstrom
2018-06-13  8:34       ` Thomas Hellstrom
2018-06-13  9:50   ` Peter Zijlstra
2018-06-13  9:50     ` Peter Zijlstra
2018-06-13  9:50     ` Peter Zijlstra
2018-06-13 10:40     ` Thomas Hellstrom
2018-06-13 10:40       ` Thomas Hellstrom
2018-06-13 10:40       ` Thomas Hellstrom
2018-06-13 13:10       ` Peter Zijlstra
2018-06-13 13:10         ` Peter Zijlstra
2018-06-13 13:10         ` Peter Zijlstra
2018-06-13 14:05         ` Thomas Hellstrom
2018-06-13 14:05           ` Thomas Hellstrom
2018-06-13 14:05           ` Thomas Hellstrom
2018-06-14 10:51           ` Peter Zijlstra
2018-06-14 10:51             ` Peter Zijlstra
2018-06-14 10:51             ` Peter Zijlstra
2018-06-14 11:48             ` Thomas Hellstrom
2018-06-14 11:48               ` Thomas Hellstrom
2018-06-14 11:48               ` Thomas Hellstrom
2018-06-14 14:42               ` Peter Zijlstra
2018-06-14 14:42                 ` Peter Zijlstra
2018-06-14 14:42                 ` Peter Zijlstra
2018-06-14 16:43                 ` Thomas Hellstrom
2018-06-14 16:43                   ` Thomas Hellstrom
2018-06-14 16:43                   ` Thomas Hellstrom
2018-06-14 18:51                   ` Peter Zijlstra
2018-06-14 18:51                     ` Peter Zijlstra
2018-06-14 18:51                     ` Peter Zijlstra
2018-06-15 12:07                     ` Thomas Hellstrom
2018-06-15 12:07                       ` Thomas Hellstrom
2018-06-15 12:07                       ` Thomas Hellstrom
2018-06-13  7:47 ` [PATCH 2/2] drm: Change deadlock-avoidance algorithm for the modeset locks Thomas Hellstrom
2018-06-13  7:47   ` Thomas Hellstrom
