* [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2
From: Waiman Long @ 2019-04-05 19:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

 v2:
  - Move the negative reader count checking patch (patch 12 -> 10)
    forward to before the merge-owner-into-count patch, as suggested
    by Linus, and expand its comment.
  - Change the reader-owned rwsem spinning from count-based to
    time-based to have better control of the max time allowed.

This is part 2 of a 3-part (0/1/2) series to rearchitect the internal
operation of rwsem.

part 0: merged into tip
part 1: https://lore.kernel.org/lkml/20190404174320.22416-1-longman@redhat.com/

This patchset revamps the current rwsem-xadd implementation to make
it saner and easier to work with. It also implements the following 3
new features:

 1) Waiter lock handoff
 2) Reader optimistic spinning
 3) Store write-lock owner in the atomic count (x86-64 only)

Waiter lock handoff is similar to the mechanism currently in the mutex
code. This ensures that lock starvation won't happen.

Reader optimistic spinning enables readers to acquire the lock more
quickly.  So workloads that use a mix of readers and writers should
see an increase in performance as long as the reader critical sections
are short.

Finally, storing the write-lock owner into the count will allow
optimistic spinners to get to the lock holder's task structure more
quickly and eliminate the timing gap where the write lock is acquired
but the owner isn't known yet. This is important for RT tasks where
spinning on a lock with an unknown owner is not allowed.

Because multiple readers can share the same lock, there is a natural
preference for readers when measuring in terms of locking throughput,
as more readers are likely to get into the locking fast path than
writers. With waiter lock handoff, the writers will not be starved.

On an 8-socket 120-core 240-thread IvyBridge-EX system with 120 reader
and 120 writer locking threads, the min/mean/max locking operations done
in a 5-second testing window before the patchset were:

  120 readers, Iterations Min/Mean/Max = 399/400/401
  120 writers, Iterations Min/Mean/Max = 400/33,389/211,359

After the patchset, they became:

  120 readers, Iterations Min/Mean/Max = 584/10,266/26,609
  120 writers, Iterations Min/Mean/Max = 22,080/29,016/38,728

So it was much fairer to readers. With fewer locking threads, readers
were still preferred over writers.

Patch 1 implements a new rwsem locking scheme similar to what qrwlock
is currently doing. The write lock is acquired by atomic_cmpxchg()
while the read lock is still acquired by atomic_add().

Patch 2 implements lock handoff to prevent lock starvation.

Patch 3 removes the rwsem_wake() wakeup optimization as it doesn't
work with lock handoff.

Patch 4 makes rwsem_spin_on_owner() return the owner state.

Patch 5 disallows RT tasks from spinning on a rwsem with an unknown
owner.

Patch 6 makes reader wakeup wake up almost all the readers in the wait
queue instead of just those at the front.

Patch 7 enables readers to spin on a writer-owned rwsem.

Patch 8 enables a writer to spin on a reader-owned rwsem for at most
25us.
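
A rough sketch of the idea (the reader-owned test below is a made-up
helper for illustration; the real implementation is in the patch):

	/* Writer spinning on a reader-owned rwsem, capped at 25us */
	u64 deadline = sched_clock() + 25 * NSEC_PER_USEC;

	while (is_rwsem_reader_owned(sem)) {	/* hypothetical helper */
		if (sched_clock() > deadline)
			break;		/* time is up, go to sleep */
		cpu_relax();
	}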

Patch 9 adds some new rwsem owner access helper functions.

Patch 10 handles the case of too many readers by reserving the sign
bit to designate that a reader lock attempt will fail and the locking
reader will be put to sleep. This will ensure that we will not overflow
the reader count.
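
A minimal sketch of the idea behind the sign bit check (illustrative
only; the actual code and comment are in the patch):

	long count = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
						   &sem->count);
	if (unlikely(count < 0)) {
		/*
		 * Sign bit set: too many readers. Back out the reader
		 * bias and go to sleep in the slowpath instead.
		 */
	}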

Patch 11 merges the write-lock owner task pointer into the count.
Only a 64-bit count has enough space to provide a reasonable number of
bits for the reader count. This is for x86-64 only for the time being.

Patch 12 eliminates redundant computation of the merged owner-count.

With a locking microbenchmark running on a 5.1-based kernel, the total
locking rates (in kops/s) on an 8-socket IvyBridge-EX system with equal
numbers of readers and writers (mixed) before and after this patchset
were:

   # of Threads   Before Patch      After Patch
   ------------   ------------      -----------
        2            1,179             9,436
        4            1,505             8,268
        8              721             7,041
       16              575             7,652
       32               70             2,189
       64               39               534

Waiman Long (12):
  locking/rwsem: Implement a new locking scheme
  locking/rwsem: Implement lock handoff to prevent lock starvation
  locking/rwsem: Remove rwsem_wake() wakeup optimization
  locking/rwsem: Make rwsem_spin_on_owner() return owner state
  locking/rwsem: Ensure an RT task will not spin on reader
  locking/rwsem: Wake up almost all readers in wait queue
  locking/rwsem: Enable readers spinning on writer
  locking/rwsem: Enable time-based spinning on reader-owned rwsem
  locking/rwsem: Add more rwsem owner access helpers
  locking/rwsem: Guard against making count negative
  locking/rwsem: Merge owner into count on x86-64
  locking/rwsem: Remove redundant computation of writer lock word

 kernel/locking/lock_events_list.h |   4 +
 kernel/locking/rwsem-xadd.c       | 635 +++++++++++++++++++-----------
 kernel/locking/rwsem.c            |   3 +-
 kernel/locking/rwsem.h            | 290 +++++++++++---
 4 files changed, 647 insertions(+), 285 deletions(-)

-- 
2.18.1


* [PATCH-tip v2 01/12] locking/rwsem: Implement a new locking scheme
From: Waiman Long @ 2019-04-05 19:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

The current way of using various reader, writer and waiting biases
in the rwsem code is confusing and hard to understand. I have to
reread the rwsem count guide in the rwsem-xadd.c file from time to
time to remind myself how this whole thing works. It also makes the
rwsem code harder to optimize.

To make rwsem more sane, a new locking scheme similar to the one in
qrwlock is now being used.  The atomic long count has the following
bit definitions:

  Bit  0   - writer locked bit
  Bit  1   - waiters present bit
  Bits 2-7 - reserved for future extension
  Bits 8-X - reader count (24/56 bits)

The cmpxchg instruction is now used to acquire the write lock. The
read lock is still acquired with the xadd instruction, so there is no
change here. This scheme will allow up to 16M (24-bit count) or 64P
(56-bit count) active readers, which should be more than enough. We
can always use some more reserved bits if necessary.

With that change, we can deterministically know if a rwsem has been
write-locked. Looking at the count alone, however, one cannot determine
for certain if a rwsem is owned by readers or not as the readers that
set the reader count bits may be in the process of backing out. So we
still need the reader-owned bit in the owner field to be sure.
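
For illustration, the new count layout can be modeled in plain
userspace C (a sketch only, reusing the macro names this patch adds
to kernel/locking/rwsem.h):

	#include <stdio.h>

	#define RWSEM_WRITER_LOCKED	(1UL << 0)
	#define RWSEM_FLAG_WAITERS	(1UL << 1)
	#define RWSEM_READER_SHIFT	8
	#define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)

	int main(void)
	{
		/* 3 active readers, waiters queued, no writer */
		unsigned long count = 3 * RWSEM_READER_BIAS + RWSEM_FLAG_WAITERS;

		printf("readers=%lu waiters=%d wlocked=%d\n",
		       count >> RWSEM_READER_SHIFT,
		       !!(count & RWSEM_FLAG_WAITERS),
		       !!(count & RWSEM_WRITER_LOCKED));
		/* prints: readers=3 waiters=1 wlocked=0 */
		return 0;
	}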

With a locking microbenchmark running on a 5.1-based kernel, the total
locking rates (in kops/s) of the benchmark on an 8-socket 120-core
IvyBridge-EX system before and after the patch were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        30,659   31,341   31,055   31,283
        2         8,909   16,457    9,884   17,659
        4         9,028   15,823    8,933   20,233
        8         8,410   14,212    7,230   17,140
       16         8,217   25,240    7,479   24,607

The locking rates of the benchmark on a Power8 system were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        12,963   13,647   13,275   13,601
        2         7,570   11,569    7,902   10,829
        4         5,232    5,516    5,466    5,435
        8         5,233    3,386    5,467    3,168

The locking rates of the benchmark on a 2-socket ARM64 system were
as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        21,495   21,046   21,524   21,074
        2         5,293   10,502    5,333   10,504
        4         5,325   11,463    5,358   11,631
        8         5,391   11,712    5,470   11,680

The performance is roughly the same before and after the patch. There
are run-to-run variations in performance, and runs with higher
variance usually have higher throughput.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 147 ++++++++++++------------------------
 kernel/locking/rwsem.h      |  74 +++++++++---------
 2 files changed, 86 insertions(+), 135 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 6b3ee9948bf1..adab6477be51 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -9,6 +9,8 @@
  *
  * Optimistic spinning by Tim Chen <tim.c.chen@intel.com>
  * and Davidlohr Bueso <davidlohr@hp.com>. Based on mutexes.
+ *
+ * Rwsem count bit fields re-definition by Waiman Long <longman@redhat.com>.
  */
 #include <linux/rwsem.h>
 #include <linux/init.h>
@@ -22,52 +24,20 @@
 #include "rwsem.h"
 
 /*
- * Guide to the rw_semaphore's count field for common values.
- * (32-bit case illustrated, similar for 64-bit)
- *
- * 0x0000000X	(1) X readers active or attempting lock, no writer waiting
- *		    X = #active_readers + #readers attempting to lock
- *		    (X*ACTIVE_BIAS)
- *
- * 0x00000000	rwsem is unlocked, and no one is waiting for the lock or
- *		attempting to read lock or write lock.
- *
- * 0xffff000X	(1) X readers active or attempting lock, with waiters for lock
- *		    X = #active readers + # readers attempting lock
- *		    (X*ACTIVE_BIAS + WAITING_BIAS)
- *		(2) 1 writer attempting lock, no waiters for lock
- *		    X-1 = #active readers + #readers attempting lock
- *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- *		(3) 1 writer active, no waiters for lock
- *		    X-1 = #active readers + #readers attempting lock
- *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- *
- * 0xffff0001	(1) 1 reader active or attempting lock, waiters for lock
- *		    (WAITING_BIAS + ACTIVE_BIAS)
- *		(2) 1 writer active or attempting lock, no waiters for lock
- *		    (ACTIVE_WRITE_BIAS)
+ * Guide to the rw_semaphore's count field.
  *
- * 0xffff0000	(1) There are writers or readers queued but none active
- *		    or in the process of attempting lock.
- *		    (WAITING_BIAS)
- *		Note: writer can attempt to steal lock for this count by adding
- *		ACTIVE_WRITE_BIAS in cmpxchg and checking the old count
+ * When the RWSEM_WRITER_LOCKED bit in count is set, the lock is owned
+ * by a writer.
  *
- * 0xfffe0001	(1) 1 writer active, or attempting lock. Waiters on queue.
- *		    (ACTIVE_WRITE_BIAS + WAITING_BIAS)
- *
- * Note: Readers attempt to lock by adding ACTIVE_BIAS in down_read and checking
- *	 the count becomes more than 0 for successful lock acquisition,
- *	 i.e. the case where there are only readers or nobody has lock.
- *	 (1st and 2nd case above).
- *
- *	 Writers attempt to lock by adding ACTIVE_WRITE_BIAS in down_write and
- *	 checking the count becomes ACTIVE_WRITE_BIAS for successful lock
- *	 acquisition (i.e. nobody else has lock or attempts lock).  If
- *	 unsuccessful, in rwsem_down_write_failed, we'll check to see if there
- *	 are only waiters but none active (5th case above), and attempt to
- *	 steal the lock.
+ * The lock is owned by readers when
+ * (1) the RWSEM_WRITER_LOCKED bit isn't set in count,
+ * (2) some of the reader bits are set in count, and
+ * (3) the owner field has RWSEM_READ_OWNED bit set.
  *
+ * Having some reader bits set is not enough to guarantee a reader-owned
+ * lock as the readers may be in the process of backing out from the count
+ * and a writer has just released the lock. So another writer may steal
+ * the lock immediately after that.
  */
 
 /*
@@ -113,9 +83,8 @@ enum rwsem_wake_type {
 
 /*
  * handle the lock release when processes blocked on it that can now run
- * - if we come here from up_xxxx(), then:
- *   - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
- *   - the 'waiting part' of count (&0xffff0000) is -ve (and will still be so)
+ * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
+ *   have been set.
  * - there must be someone on the queue
  * - the wait_lock must be held by the caller
  * - tasks are marked for wakeup, the caller must later invoke wake_up_q()
@@ -159,22 +128,11 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	 * so we can bail out early if a writer stole the lock.
 	 */
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
-		adjustment = RWSEM_ACTIVE_READ_BIAS;
- try_reader_grant:
+		adjustment = RWSEM_READER_BIAS;
 		oldcount = atomic_long_fetch_add(adjustment, &sem->count);
-		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/*
-			 * If the count is still less than RWSEM_WAITING_BIAS
-			 * after removing the adjustment, it is assumed that
-			 * a writer has stolen the lock. We have to undo our
-			 * reader grant.
-			 */
-			if (atomic_long_add_return(-adjustment, &sem->count) <
-			    RWSEM_WAITING_BIAS)
-				return;
-
-			/* Last active locker left. Retry waking readers. */
-			goto try_reader_grant;
+		if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
+			atomic_long_sub(adjustment, &sem->count);
+			return;
 		}
 		/*
 		 * Set it to reader-owned to give spinners an early
@@ -214,11 +172,11 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		wake_q_add_safe(wake_q, tsk);
 	}
 
-	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+	adjustment = woken * RWSEM_READER_BIAS - adjustment;
 	lockevent_cond_inc(rwsem_wake_reader, woken);
 	if (list_empty(&sem->wait_list)) {
 		/* hit end of list above */
-		adjustment -= RWSEM_WAITING_BIAS;
+		adjustment -= RWSEM_FLAG_WAITERS;
 	}
 
 	if (adjustment)
@@ -232,22 +190,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
  */
 static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
 {
-	/*
-	 * Avoid trying to acquire write lock if count isn't RWSEM_WAITING_BIAS.
-	 */
-	if (count != RWSEM_WAITING_BIAS)
+	long new;
+
+	if (RWSEM_COUNT_LOCKED(count))
 		return false;
 
-	/*
-	 * Acquire the lock by trying to set it to ACTIVE_WRITE_BIAS. If there
-	 * are other tasks on the wait list, we need to add on WAITING_BIAS.
-	 */
-	count = list_is_singular(&sem->wait_list) ?
-			RWSEM_ACTIVE_WRITE_BIAS :
-			RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS;
+	new = count + RWSEM_WRITER_LOCKED -
+	     (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0);
 
-	if (atomic_long_cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS, count)
-							== RWSEM_WAITING_BIAS) {
+	if (atomic_long_try_cmpxchg_acquire(&sem->count, &count, new)) {
 		rwsem_set_owner(sem);
 		return true;
 	}
@@ -263,9 +214,9 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 {
 	long count = atomic_long_read(&sem->count);
 
-	while (!count || count == RWSEM_WAITING_BIAS) {
+	while (!RWSEM_COUNT_LOCKED(count)) {
 		if (atomic_long_try_cmpxchg_acquire(&sem->count, &count,
-					count + RWSEM_ACTIVE_WRITE_BIAS)) {
+					count + RWSEM_WRITER_LOCKED)) {
 			rwsem_set_owner(sem);
 			lockevent_inc(rwsem_opt_wlock);
 			return true;
@@ -422,7 +373,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore __sched *
 __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 {
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	long count, adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
@@ -434,16 +385,16 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 		/*
 		 * In case the wait queue is empty and the lock isn't owned
 		 * by a writer, this reader can exit the slowpath and return
-		 * immediately as its RWSEM_ACTIVE_READ_BIAS has already
-		 * been set in the count.
+		 * immediately as its RWSEM_READER_BIAS has already been
+		 * set in the count.
 		 */
-		if (atomic_long_read(&sem->count) >= 0) {
+		if (!(atomic_long_read(&sem->count) & RWSEM_WRITER_MASK)) {
 			raw_spin_unlock_irq(&sem->wait_lock);
 			rwsem_set_reader_owned(sem);
 			lockevent_inc(rwsem_rlock_fast);
 			return sem;
 		}
-		adjustment += RWSEM_WAITING_BIAS;
+		adjustment += RWSEM_FLAG_WAITERS;
 	}
 	list_add_tail(&waiter.list, &sem->wait_list);
 
@@ -456,9 +407,8 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 	 * If there are no writers and we are first in the queue,
 	 * wake our own waiter to join the existing active readers !
 	 */
-	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+	if (!RWSEM_COUNT_LOCKED(count) ||
+	   (!(count & RWSEM_WRITER_MASK) && (adjustment & RWSEM_FLAG_WAITERS)))
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
 	raw_spin_unlock_irq(&sem->wait_lock);
@@ -486,7 +436,7 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 out_nolock:
 	list_del(&waiter.list);
 	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+		atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	lockevent_inc(rwsem_rlock_fail);
@@ -519,9 +469,6 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
 
-	/* undo write bias from down_write operation, stop active locking */
-	count = atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS, &sem->count);
-
 	/* do optimistic spinning and steal lock if possible */
 	if (rwsem_optimistic_spin(sem))
 		return sem;
@@ -541,16 +488,18 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 
 	list_add_tail(&waiter.list, &sem->wait_list);
 
-	/* we're now waiting on the lock, but no longer actively locking */
+	/* we're now waiting on the lock */
 	if (waiting) {
 		count = atomic_long_read(&sem->count);
 
 		/*
 		 * If there were already threads queued before us and there are
-		 * no active writers, the lock must be read owned; so we try to
-		 * wake any read locks that were queued ahead of us.
+		 * no active writers and some readers, the lock must be read
+		 * owned; so we try to wake any read locks that were queued ahead
+		 * of us.
 		 */
-		if (count > RWSEM_WAITING_BIAS) {
+		if (!(count & RWSEM_WRITER_MASK) &&
+		     (count & RWSEM_READER_MASK)) {
 			__rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q);
 			/*
 			 * The wakeup is normally called _after_ the wait_lock
@@ -567,8 +516,9 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 			wake_q_init(&wake_q);
 		}
 
-	} else
-		count = atomic_long_add_return(RWSEM_WAITING_BIAS, &sem->count);
+	} else {
+		count = atomic_long_add_return(RWSEM_FLAG_WAITERS, &sem->count);
+	}
 
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
@@ -585,7 +535,8 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 			schedule();
 			lockevent_inc(rwsem_sleep_writer);
 			set_current_state(state);
-		} while ((count = atomic_long_read(&sem->count)) & RWSEM_ACTIVE_MASK);
+			count = atomic_long_read(&sem->count);
+		} while (RWSEM_COUNT_LOCKED(count));
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
@@ -601,7 +552,7 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	raw_spin_lock_irq(&sem->wait_lock);
 	list_del(&waiter.list);
 	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+		atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
 	else
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 	raw_spin_unlock_irq(&sem->wait_lock);
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 37db17890e36..a49ebce1b4ab 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -41,24 +41,26 @@
 #endif
 
 /*
- * R/W semaphores originally for PPC using the stuff in lib/rwsem.c.
- * Adapted largely from include/asm-i386/rwsem.h
- * by Paul Mackerras <paulus@samba.org>.
+ * The definition of the atomic counter in the semaphore:
+ *
+ * Bit  0   - writer locked bit
+ * Bit  1   - waiters present bit
+ * Bits 2-7 - reserved
+ * Bits 8-X - 24-bit (32-bit) or 56-bit reader count
+ *
+ * atomic_long_fetch_add() is used to obtain the reader lock, whereas
+ * atomic_long_cmpxchg() will be used to obtain the writer lock.
  */
+#define RWSEM_WRITER_LOCKED	(1UL << 0)
+#define RWSEM_FLAG_WAITERS	(1UL << 1)
+#define RWSEM_READER_SHIFT	8
+#define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)
+#define RWSEM_READER_MASK	(~(RWSEM_READER_BIAS - 1))
+#define RWSEM_WRITER_MASK	RWSEM_WRITER_LOCKED
+#define RWSEM_LOCK_MASK		(RWSEM_WRITER_MASK|RWSEM_READER_MASK)
+#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS)
 
-/*
- * the semaphore definition
- */
-#ifdef CONFIG_64BIT
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#define RWSEM_COUNT_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
@@ -173,7 +175,8 @@ extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) {
+	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
+			&sem->count) & RWSEM_READ_FAILED_MASK)) {
 		rwsem_down_read_failed(sem);
 		DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner &
 					RWSEM_READER_OWNED), sem);
@@ -184,7 +187,8 @@ static inline void __down_read(struct rw_semaphore *sem)
 
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0)) {
+	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
+			&sem->count) & RWSEM_READ_FAILED_MASK)) {
 		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
 			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner &
@@ -205,11 +209,11 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	lockevent_inc(rwsem_rtrylock);
 	do {
 		if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp,
-					tmp + RWSEM_ACTIVE_READ_BIAS)) {
+					tmp + RWSEM_READER_BIAS)) {
 			rwsem_set_reader_owned(sem);
 			return 1;
 		}
-	} while (tmp >= 0);
+	} while (!(tmp & RWSEM_READ_FAILED_MASK));
 	return 0;
 }
 
@@ -218,22 +222,16 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
  */
 static inline void __down_write(struct rw_semaphore *sem)
 {
-	long tmp;
-
-	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
-					     &sem->count);
-	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+	if (unlikely(atomic_long_cmpxchg_acquire(&sem->count, 0,
+						 RWSEM_WRITER_LOCKED)))
 		rwsem_down_write_failed(sem);
 	rwsem_set_owner(sem);
 }
 
 static inline int __down_write_killable(struct rw_semaphore *sem)
 {
-	long tmp;
-
-	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
-					     &sem->count);
-	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
+	if (unlikely(atomic_long_cmpxchg_acquire(&sem->count, 0,
+						 RWSEM_WRITER_LOCKED)))
 		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
 			return -EINTR;
 	rwsem_set_owner(sem);
@@ -246,7 +244,7 @@ static inline int __down_write_trylock(struct rw_semaphore *sem)
 
 	lockevent_inc(rwsem_wtrylock);
 	tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
+					  RWSEM_WRITER_LOCKED);
 	if (tmp == RWSEM_UNLOCKED_VALUE) {
 		rwsem_set_owner(sem);
 		return true;
@@ -264,8 +262,9 @@ static inline void __up_read(struct rw_semaphore *sem)
 	DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED),
 				sem);
 	rwsem_clear_reader_owned(sem);
-	tmp = atomic_long_dec_return_release(&sem->count);
-	if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
+	tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count);
+	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS))
+			== RWSEM_FLAG_WAITERS))
 		rwsem_wake(sem);
 }
 
@@ -276,8 +275,8 @@ static inline void __up_write(struct rw_semaphore *sem)
 {
 	DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem);
 	rwsem_clear_owner(sem);
-	if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
-						    &sem->count) < 0))
+	if (unlikely(atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED,
+			&sem->count) & RWSEM_FLAG_WAITERS))
 		rwsem_wake(sem);
 }
 
@@ -296,8 +295,9 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	 * write side. As such, rely on RELEASE semantics.
 	 */
 	DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem);
-	tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count);
+	tmp = atomic_long_fetch_add_release(
+		-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS, &sem->count);
 	rwsem_set_reader_owned(sem);
-	if (tmp < 0)
+	if (tmp & RWSEM_FLAG_WAITERS)
 		rwsem_downgrade_wake(sem);
 }
-- 
2.18.1


* [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
From: Waiman Long @ 2019-04-05 19:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

Because of writer lock stealing, it is possible that a constant
stream of incoming writers will cause a waiting writer or reader to
wait indefinitely, leading to lock starvation.

The mutex code has a lock handoff mechanism to prevent lock starvation.
This patch implements a similar lock handoff mechanism to disable
lock stealing and force lock handoff to the first waiter in the queue
after at least a 5ms waiting period. The waiting period is used so
that lock stealing isn't discouraged so much that performance suffers.
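
For reference, the 5ms period is rounded up to whole jiffies by the
RWSEM_WAIT_TIMEOUT macro added in this patch:

	#define RWSEM_WAIT_TIMEOUT	((HZ - 1)/200 + 1)

	HZ = 100  :  99/200 + 1 = 1 jiffy   (10ms)
	HZ = 250  : 249/200 + 1 = 2 jiffies (8ms)
	HZ = 1000 : 999/200 + 1 = 5 jiffies (5ms)

so the actual waiting period is always at least 5ms.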

A rwsem microbenchmark was run for 5 seconds on an 8-socket 120-core
240-thread IvyBridge-EX system with a v5.1-based kernel and 240
write-lock threads with a 5us sleep critical section.

Before the patch, the min/mean/max numbers of locking operations for
the locking threads were 1/2,433/320,955. After the patch, the figures
became 891/1,807/3,174. It can be seen that the rwsem became much
fairer, though there was a drop of about 26% in the mean number of
locking operations done, the tradeoff for having better fairness.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lock_events_list.h |   2 +
 kernel/locking/rwsem-xadd.c       | 154 ++++++++++++++++++++++--------
 kernel/locking/rwsem.h            |  23 +++--
 3 files changed, 134 insertions(+), 45 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index ad7668cfc9da..29e5c52197fa 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -61,7 +61,9 @@ LOCK_EVENT(rwsem_opt_fail)	/* # of failed opt-spinnings		*/
 LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
 LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions	*/
+LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs		*/
 LOCK_EVENT(rwsem_rtrylock)	/* # of read trylock calls		*/
 LOCK_EVENT(rwsem_wlock)		/* # of write locks acquired		*/
 LOCK_EVENT(rwsem_wlock_fail)	/* # of failed write lock acquisitions	*/
+LOCK_EVENT(rwsem_wlock_handoff)	/* # of write lock handoffs		*/
 LOCK_EVENT(rwsem_wtrylock)	/* # of write trylock calls		*/
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index adab6477be51..58b3a64e6f2c 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -73,6 +73,7 @@ struct rwsem_waiter {
 	struct list_head list;
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
+	unsigned long timeout;
 };
 
 enum rwsem_wake_type {
@@ -81,6 +82,12 @@ enum rwsem_wake_type {
 	RWSEM_WAKE_READ_OWNED	/* Waker thread holds the read lock */
 };
 
+/*
+ * The minimum waiting time (5ms) in the wait queue before initiating the
+ * handoff protocol.
+ */
+#define RWSEM_WAIT_TIMEOUT	((HZ - 1)/200 + 1)
+
 /*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
@@ -131,6 +138,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		adjustment = RWSEM_READER_BIAS;
 		oldcount = atomic_long_fetch_add(adjustment, &sem->count);
 		if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
+			/*
+			 * Initiate handoff to reader, if applicable.
+			 */
+			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
+			    time_after(jiffies, waiter->timeout)) {
+				adjustment -= RWSEM_FLAG_HANDOFF;
+				lockevent_inc(rwsem_rlock_handoff);
+			}
+
 			atomic_long_sub(adjustment, &sem->count);
 			return;
 		}
@@ -179,6 +195,12 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		adjustment -= RWSEM_FLAG_WAITERS;
 	}
 
+	/*
+	 * Clear the handoff flag
+	 */
+	if (woken && RWSEM_COUNT_HANDOFF(atomic_long_read(&sem->count)))
+		adjustment -= RWSEM_FLAG_HANDOFF;
+
 	if (adjustment)
 		atomic_long_add(adjustment, &sem->count);
 }
@@ -188,15 +210,19 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
  */
-static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
+static inline bool
+rwsem_try_write_lock(long count, struct rw_semaphore *sem, bool first)
 {
 	long new;
 
 	if (RWSEM_COUNT_LOCKED(count))
 		return false;
 
-	new = count + RWSEM_WRITER_LOCKED -
-	     (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0);
+	if (!first && RWSEM_COUNT_HANDOFF(count))
+		return false;
+
+	new = (count & ~RWSEM_FLAG_HANDOFF) + RWSEM_WRITER_LOCKED -
+	      (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0);
 
 	if (atomic_long_try_cmpxchg_acquire(&sem->count, &count, new)) {
 		rwsem_set_owner(sem);
@@ -214,7 +240,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 {
 	long count = atomic_long_read(&sem->count);
 
-	while (!RWSEM_COUNT_LOCKED(count)) {
+	while (!RWSEM_COUNT_LOCKED_OR_HANDOFF(count)) {
 		if (atomic_long_try_cmpxchg_acquire(&sem->count, &count,
 					count + RWSEM_WRITER_LOCKED)) {
 			rwsem_set_owner(sem);
@@ -367,6 +393,16 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 }
 #endif
 
+/*
+ * This is safe to be called without holding the wait_lock.
+ */
+static inline bool
+rwsem_waiter_is_first(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
+{
+	return list_first_entry(&sem->wait_list, struct rwsem_waiter, list)
+			== waiter;
+}
+
 /*
  * Wait for the read lock to be granted
  */
@@ -379,16 +415,18 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
+	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list)) {
 		/*
 		 * In case the wait queue is empty and the lock isn't owned
-		 * by a writer, this reader can exit the slowpath and return
-		 * immediately as its RWSEM_READER_BIAS has already been
-		 * set in the count.
+		 * by a writer or has the handoff bit set, this reader can
+		 * exit the slowpath and return immediately as its
+		 * RWSEM_READER_BIAS has already been set in the count.
 		 */
-		if (!(atomic_long_read(&sem->count) & RWSEM_WRITER_MASK)) {
+		if (!(atomic_long_read(&sem->count) &
+		     (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))) {
 			raw_spin_unlock_irq(&sem->wait_lock);
 			rwsem_set_reader_owned(sem);
 			lockevent_inc(rwsem_rlock_fast);
@@ -436,7 +474,8 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 out_nolock:
 	list_del(&waiter.list);
 	if (list_empty(&sem->wait_list))
-		atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
+		atomic_long_andnot(RWSEM_FLAG_WAITERS|RWSEM_FLAG_HANDOFF,
+				   &sem->count);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	lockevent_inc(rwsem_rlock_fail);
@@ -464,7 +503,7 @@ static inline struct rw_semaphore *
 __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 {
 	long count;
-	bool waiting = true; /* any queued threads before us */
+	int first;	/* First one in queue (>1 if handoff set) */
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
@@ -479,56 +518,63 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	 */
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
+	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 
 	/* account for this before adding a new element to the list */
-	if (list_empty(&sem->wait_list))
-		waiting = false;
+	first = list_empty(&sem->wait_list);
 
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock */
-	if (waiting) {
+	if (!first) {
 		count = atomic_long_read(&sem->count);
 
 		/*
-		 * If there were already threads queued before us and there are
-		 * no active writers and some readers, the lock must be read
-		 * owned; so we try to wake any read locks that were queued ahead
-		 * of us.
+		 * If there were already threads queued before us and:
+		 *  1) there are no active locks, wake the front
+		 *     queued process(es) as the handoff bit might be set.
+		 *  2) there are no active writers and some readers, the lock
+		 *     must be read owned; so we try to wake any read lock
+		 *     waiters that were queued ahead of us.
 		 */
-		if (!(count & RWSEM_WRITER_MASK) &&
-		     (count & RWSEM_READER_MASK)) {
+		if (!RWSEM_COUNT_LOCKED(count))
+			__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+		else if (!(count & RWSEM_WRITER_MASK) &&
+			  (count & RWSEM_READER_MASK))
 			__rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q);
-			/*
-			 * The wakeup is normally called _after_ the wait_lock
-			 * is released, but given that we are proactively waking
-			 * readers we can deal with the wake_q overhead as it is
-			 * similar to releasing and taking the wait_lock again
-			 * for attempting rwsem_try_write_lock().
-			 */
-			wake_up_q(&wake_q);
+		else
+			goto wait;
 
-			/*
-			 * Reinitialize wake_q after use.
-			 */
-			wake_q_init(&wake_q);
-		}
+		/*
+		 * The wakeup is normally called _after_ the wait_lock
+		 * is released, but given that we are proactively waking
+		 * readers we can deal with the wake_q overhead as it is
+		 * similar to releasing and taking the wait_lock again
+		 * for attempting rwsem_try_write_lock().
+		 */
+		wake_up_q(&wake_q);
 
+		/*
+		 * Reinitialize wake_q after use.
+		 */
+		wake_q_init(&wake_q);
 	} else {
 		count = atomic_long_add_return(RWSEM_FLAG_WAITERS, &sem->count);
 	}
 
+wait:
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
 	while (true) {
-		if (rwsem_try_write_lock(count, sem))
+		if (rwsem_try_write_lock(count, sem, first))
 			break;
+
 		raw_spin_unlock_irq(&sem->wait_lock);
 
 		/* Block until there are no active lockers. */
-		do {
+		for (;;) {
 			if (signal_pending_state(state, current))
 				goto out_nolock;
 
@@ -536,7 +582,31 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 			lockevent_inc(rwsem_sleep_writer);
 			set_current_state(state);
 			count = atomic_long_read(&sem->count);
-		} while (RWSEM_COUNT_LOCKED(count));
+
+			if (!first)
+				first = rwsem_waiter_is_first(sem, &waiter);
+
+			if (!RWSEM_COUNT_LOCKED(count))
+				break;
+
+			if (first && !RWSEM_COUNT_HANDOFF(count) &&
+			    time_after(jiffies, waiter.timeout)) {
+				atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
+				first++;
+
+				/*
+				 * Make sure the handoff bit is seen by
+				 * others before proceeding.
+				 */
+				smp_mb__after_atomic();
+				lockevent_inc(rwsem_wlock_handoff);
+				/*
+				 * Do a trylock after setting the handoff
+				 * flag to avoid missed wakeup.
+				 */
+				break;
+			}
+		}
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
@@ -551,6 +621,12 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
 	list_del(&waiter.list);
+	/*
+	 * If handoff bit is set by this waiter, make sure that the clearing
+	 * of the handoff bit is seen by all before proceeding.
+	 */
+	if (unlikely(first > 1))
+		atomic_long_add_return(-RWSEM_FLAG_HANDOFF, &sem->count);
 	if (list_empty(&sem->wait_list))
 		atomic_long_andnot(RWSEM_FLAG_WAITERS, &sem->count);
 	else
@@ -581,7 +657,7 @@ EXPORT_SYMBOL(rwsem_down_write_failed_killable);
  * - up_read/up_write has decremented the active part of count if we come here
  */
 __visible
-struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
+struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, long count)
 {
 	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
@@ -614,7 +690,9 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	smp_rmb();
 
 	/*
-	 * If a spinner is present, it is not necessary to do the wakeup.
+	 * If a spinner is present and the handoff flag isn't set, it is
+	 * not necessary to do the wakeup.
+	 *
 	 * Try to do wakeup only if the trylock succeeds to minimize
 	 * spinlock contention which may introduce too much delay in the
 	 * unlock operation.
@@ -633,7 +711,7 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	 * rwsem_has_spinner() is true, it will guarantee at least one
 	 * trylock attempt on the rwsem later on.
 	 */
-	if (rwsem_has_spinner(sem)) {
+	if (rwsem_has_spinner(sem) && !RWSEM_COUNT_HANDOFF(count)) {
 		/*
 		 * The smp_rmb() here is to make sure that the spinner
 		 * state is consulted before reading the wait_lock.
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index a49ebce1b4ab..2befb5ab1181 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -45,7 +45,8 @@
  *
  * Bit  0   - writer locked bit
  * Bit  1   - waiters present bit
- * Bits 2-7 - reserved
+ * Bit  2   - lock handoff bit
+ * Bits 3-7 - reserved
  * Bits 8-X - 24-bit (32-bit) or 56-bit reader count
  *
  * atomic_long_fetch_add() is used to obtain reader lock, whereas
@@ -53,14 +54,20 @@
  */
 #define RWSEM_WRITER_LOCKED	(1UL << 0)
 #define RWSEM_FLAG_WAITERS	(1UL << 1)
+#define RWSEM_FLAG_HANDOFF	(1UL << 2)
+
 #define RWSEM_READER_SHIFT	8
 #define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)
 #define RWSEM_READER_MASK	(~(RWSEM_READER_BIAS - 1))
 #define RWSEM_WRITER_MASK	RWSEM_WRITER_LOCKED
 #define RWSEM_LOCK_MASK		(RWSEM_WRITER_MASK|RWSEM_READER_MASK)
-#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS)
+#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
+				 RWSEM_FLAG_HANDOFF)
 
 #define RWSEM_COUNT_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
+#define RWSEM_COUNT_HANDOFF(c)	((c) & RWSEM_FLAG_HANDOFF)
+#define RWSEM_COUNT_LOCKED_OR_HANDOFF(c)	\
+	((c) & (RWSEM_LOCK_MASK|RWSEM_FLAG_HANDOFF))
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
@@ -167,7 +174,7 @@ extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, long count);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
 /*
@@ -265,7 +272,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 	tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count);
 	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS))
 			== RWSEM_FLAG_WAITERS))
-		rwsem_wake(sem);
+		rwsem_wake(sem, tmp);
 }
 
 /*
@@ -273,11 +280,13 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
+	long tmp;
+
 	DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem);
 	rwsem_clear_owner(sem);
-	if (unlikely(atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED,
-			&sem->count) & RWSEM_FLAG_WAITERS))
-		rwsem_wake(sem);
+	tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
+	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
+		rwsem_wake(sem, tmp);
 }
 
 /*
-- 
2.18.1


* [PATCH-tip v2 03/12] locking/rwsem: Remove rwsem_wake() wakeup optimization
From: Waiman Long @ 2019-04-05 19:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

With commit 59aabfc7e959 ("locking/rwsem: Reduce spinlock contention
in wakeup after up_read()/up_write()"), rwsem_wake() forgoes doing
a wakeup if the wait_lock cannot be directly acquired and an optimistic
spinning locker is present.  This can help performance by avoiding
spinning on the wait_lock when it is contended.

With the later commit 133e89ef5ef3 ("locking/rwsem: Enable lockless
waiter wakeup(s)"), the performance advantage of the above optimization
diminishes as the average wait_lock hold time becomes much shorter.

With rwsem lock handoff, we can no longer rely on the fact that the
presence of an optimistic spinning locker will ensure that the lock
will be acquired by a task soon. This can lead to missed wakeups and
application hangs. So commit 59aabfc7e959 ("locking/rwsem: Reduce
spinlock contention in wakeup after up_read()/up_write()") has to
be reverted.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 74 -------------------------------------
 1 file changed, 74 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 58b3a64e6f2c..4f036bda9063 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -372,25 +372,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	lockevent_cond_inc(rwsem_opt_fail, !taken);
 	return taken;
 }
-
-/*
- * Return true if the rwsem has active spinner
- */
-static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
-{
-	return osq_is_locked(&sem->osq);
-}
-
 #else
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
 	return false;
 }
-
-static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
-{
-	return false;
-}
 #endif
 
 /*
@@ -662,67 +648,7 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, long count)
 	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
 
-	/*
-	* __rwsem_down_write_failed_common(sem)
-	*   rwsem_optimistic_spin(sem)
-	*     osq_unlock(sem->osq)
-	*   ...
-	*   atomic_long_add_return(&sem->count)
-	*
-	*      - VS -
-	*
-	*              __up_write()
-	*                if (atomic_long_sub_return_release(&sem->count) < 0)
-	*                  rwsem_wake(sem)
-	*                    osq_is_locked(&sem->osq)
-	*
-	* And __up_write() must observe !osq_is_locked() when it observes the
-	* atomic_long_add_return() in order to not miss a wakeup.
-	*
-	* This boils down to:
-	*
-	* [S.rel] X = 1                [RmW] r0 = (Y += 0)
-	*         MB                         RMB
-	* [RmW]   Y += 1               [L]   r1 = X
-	*
-	* exists (r0=1 /\ r1=0)
-	*/
-	smp_rmb();
-
-	/*
-	 * If a spinner is present and the handoff flag isn't set, it is
-	 * not necessary to do the wakeup.
-	 *
-	 * Try to do wakeup only if the trylock succeeds to minimize
-	 * spinlock contention which may introduce too much delay in the
-	 * unlock operation.
-	 *
-	 *    spinning writer		up_write/up_read caller
-	 *    ---------------		-----------------------
-	 * [S]   osq_unlock()		[L]   osq
-	 *	 MB			      RMB
-	 * [RmW] rwsem_try_write_lock() [RmW] spin_trylock(wait_lock)
-	 *
-	 * Here, it is important to make sure that there won't be a missed
-	 * wakeup while the rwsem is free and the only spinning writer goes
-	 * to sleep without taking the rwsem. Even when the spinning writer
-	 * is just going to break out of the waiting loop, it will still do
-	 * a trylock in rwsem_down_write_failed() before sleeping. IOW, if
-	 * rwsem_has_spinner() is true, it will guarantee at least one
-	 * trylock attempt on the rwsem later on.
-	 */
-	if (rwsem_has_spinner(sem) && !RWSEM_COUNT_HANDOFF(count)) {
-		/*
-		 * The smp_rmb() here is to make sure that the spinner
-		 * state is consulted before reading the wait_lock.
-		 */
-		smp_rmb();
-		if (!raw_spin_trylock_irqsave(&sem->wait_lock, flags))
-			return sem;
-		goto locked;
-	}
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
-locked:
 
 	if (!list_empty(&sem->wait_list))
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
-- 
2.18.1


* [PATCH-tip v2 04/12] locking/rwsem: Make rwsem_spin_on_owner() return owner state
From: Waiman Long @ 2019-04-05 19:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

This patch modifies rwsem_spin_on_owner() to return four possible
values to better reflect the state of the lock holder, which enables
us to make a better decision about what to do next.

In the special case that there is no active lock and the handoff bit
is set, optimistic spinning has to be stopped.
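
Since the states are distinct bits, a caller can check for a whole set
of states with a single mask. The optimistic spinning loop in this
patch, for example, keeps spinning as long as the owner is NULL or a
running writer:

	#define OWNER_SPINNABLE		(OWNER_NULL | OWNER_WRITER)

	while (rwsem_spin_on_owner(sem) & OWNER_SPINNABLE) {
		/* try to acquire the lock ... */
	}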

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 40 ++++++++++++++++++++++++++++++-------
 kernel/locking/rwsem.h      |  5 +++++
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 4f036bda9063..35891c53338b 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -281,14 +281,30 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 }
 
 /*
- * Return true only if we can still spin on the owner field of the rwsem.
+ * Return the following 4 values depending on the lock owner state.
+ *   OWNER_NULL  : owner is currently NULL
+ *   OWNER_WRITER: when owner changes and is a writer
+ *   OWNER_READER: when owner changes and the new owner may be a reader.
+ *   OWNER_NONSPINNABLE:
+ *		   when optimistic spinning has to stop because either the
+ *		   owner stops running, is unknown, or its timeslice has
+ *		   been used up.
  */
-static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
+enum owner_state {
+	OWNER_NULL		= 1 << 0,
+	OWNER_WRITER		= 1 << 1,
+	OWNER_READER		= 1 << 2,
+	OWNER_NONSPINNABLE	= 1 << 3,
+};
+#define OWNER_SPINNABLE		(OWNER_NULL | OWNER_WRITER)
+
+static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner = READ_ONCE(sem->owner);
+	long count;
 
 	if (!is_rwsem_owner_spinnable(owner))
-		return false;
+		return OWNER_NONSPINNABLE;
 
 	rcu_read_lock();
 	while (owner && (READ_ONCE(sem->owner) == owner)) {
@@ -306,7 +322,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 		 */
 		if (need_resched() || !owner_on_cpu(owner)) {
 			rcu_read_unlock();
-			return false;
+			return OWNER_NONSPINNABLE;
 		}
 
 		cpu_relax();
@@ -315,9 +331,19 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 
 	/*
 	 * If there is a new owner or the owner is not set, we continue
-	 * spinning.
+	 * spinning except when there are no active locks and the handoff bit
+	 * is set. In this case, we have to stop spinning.
 	 */
-	return is_rwsem_owner_spinnable(READ_ONCE(sem->owner));
+	owner = READ_ONCE(sem->owner);
+	if (!is_rwsem_owner_spinnable(owner))
+		return OWNER_NONSPINNABLE;
+	if (owner && !is_rwsem_owner_reader(owner))
+		return OWNER_WRITER;
+
+	count = atomic_long_read(&sem->count);
+	if (RWSEM_COUNT_HANDOFF(count) && !RWSEM_COUNT_LOCKED(count))
+		return OWNER_NONSPINNABLE;
+	return !owner ? OWNER_NULL : OWNER_READER;
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
@@ -340,7 +366,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	 *  2) readers own the lock as we can't determine if they are
 	 *     actively running or not.
 	 */
-	while (rwsem_spin_on_owner(sem)) {
+	while (rwsem_spin_on_owner(sem) & OWNER_SPINNABLE) {
 		/*
 		 * Try to acquire the lock
 		 */
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 2befb5ab1181..3ba235491dc1 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -119,6 +119,11 @@ static inline bool is_rwsem_owner_spinnable(struct task_struct *owner)
 	return !((unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED);
 }
 
+static inline bool is_rwsem_owner_reader(struct task_struct *owner)
+{
+	return (unsigned long)owner & RWSEM_READER_OWNED;
+}
+
 /*
  * Return true if rwsem is owned by an anonymous writer or readers.
  */
-- 
2.18.1


* [PATCH-tip v2 05/12] locking/rwsem: Ensure an RT task will not spin on reader
From: Waiman Long @ 2019-04-05 19:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

An RT task can do optimistic spinning only if the lock holder is
actually running. If the state of the lock holder isn't known, there
is a possibility that the high priority of the RT task may block the
forward progress of the lock holder if they happen to reside on the
same CPU. This will lead to deadlock. So we have to make sure that
an RT task will not spin on a reader-owned rwsem.

When the owner is temporarily set to NULL, it is trickier to decide
if an RT task should stop spinning, as this may be a transient state
where another writer has just stolen the lock, causing the RT task's
trylock attempt to fail. So one more retry is allowed to make sure
that the lock is really not spinnable by an RT task.

When testing on an 8-socket IvyBridge-EX system, the one additional
retry seems to improve locking performance of RT write locking threads
under heavy contention. The table below shows the locking rates (in
kops/s) with various write locking threads before and after the patch.

    Locking threads     Pre-patch     Post-patch
    ---------------     ---------     -----------
            4             2,753          2,608
            8             2,529          2,520
           16             1,727          1,918
           32             1,263          1,956
           64               889          1,343

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 36 +++++++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 35891c53338b..11d7eb61799a 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -349,6 +349,8 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
 	bool taken = false;
+	bool is_rt_task = rt_task(current);
+	int prev_owner_state = OWNER_NULL;
 
 	preempt_disable();
 
@@ -366,7 +368,12 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	 *  2) readers own the lock as we can't determine if they are
 	 *     actively running or not.
 	 */
-	while (rwsem_spin_on_owner(sem) & OWNER_SPINNABLE) {
+	for (;;) {
+		enum owner_state owner_state = rwsem_spin_on_owner(sem);
+
+		if (!(owner_state & OWNER_SPINNABLE))
+			break;
+
 		/*
 		 * Try to acquire the lock
 		 */
@@ -376,13 +383,28 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		}
 
 		/*
-		 * When there's no owner, we might have preempted between the
-		 * owner acquiring the lock and setting the owner field. If
-		 * we're an RT task that will live-lock because we won't let
-		 * the owner complete.
+		 * An RT task cannot do optimistic spinning if it cannot
+		 * be sure the lock holder is running or live-lock may
+		 * happen if the current task and the lock holder happen
+		 * to run on the same CPU.
+		 *
+		 * When there's no owner or the lock is reader-owned, an RT
+		 * task will stop spinning if the owner state was not a
+		 * writer at the previous iteration of the loop. This allows
+		 * the RT task to recheck if the task that stole the lock is
+		 * a spinnable writer. If so, it can keep on spinning.
+		 *
+		 * If the owner is a writer, the need_resched() check is
+		 * done inside rwsem_spin_on_owner(). If the owner is not
+		 * a writer, need_resched() check needs to be done here.
 		 */
-		if (!sem->owner && (need_resched() || rt_task(current)))
-			break;
+		if (owner_state != OWNER_WRITER) {
+			if (need_resched())
+				break;
+			if (is_rt_task && (prev_owner_state != OWNER_WRITER))
+				break;
+		}
+		prev_owner_state = owner_state;
 
 		/*
 		 * The cpu_relax() call is a compiler barrier which forces
-- 
2.18.1


* [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue
From: Waiman Long @ 2019-04-05 19:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

When the front of the wait queue is a reader, the other readers
immediately following it will also be woken up at the same
time. However, if there is a writer in between, the readers
behind that writer will not be woken up.

Because of optimistic spinning, the lock acquisition order is not FIFO
anyway. The lock handoff mechanism will ensure that lock starvation
will not happen.

Assuming that the lock hold times of the other readers still in the
queue will be about the same as those of the readers being woken up,
there is really not much additional cost other than the extra
latency due to the wakeup of additional tasks by the waker. Therefore,
when the first waiter is a reader, all the readers in the queue, up
to a maximum of 256, are woken up to improve reader throughput.
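
In outline, the capped wakeup scan in __rwsem_mark_wake() looks like the
following simplified sketch (the count adjustment and debug checks in
the actual diff below are elided):

   woken = 0;
   list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
           if (waiter->type == RWSEM_WAITING_FOR_WRITE)
                   continue;       /* skip writers, keep scanning readers */

           wake_q_add_safe(wake_q, waiter->task);
           if (++woken >= MAX_READERS_WAKEUP)
                   break;          /* cap the work done per wakeup call */
   }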

With a locking microbenchmark running on 5.1 based kernel, the total
locking rates (in kops/s) on an 8-socket IvyBridge-EX system with
equal numbers of readers and writers before and after this patch were
as follows:

   # of Threads  Pre-Patch   Post-patch
   ------------  ---------   ----------
        4          1,641        1,674
        8            731        1,062
       16            564          924
       32             78          300
       64             38          195
      240             50          149

There is no performance gain at low contention levels. At high contention
levels, however, this patch gives a pretty decent performance boost.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 11d7eb61799a..51858554ff0e 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -88,6 +88,13 @@ enum rwsem_wake_type {
  */
 #define RWSEM_WAIT_TIMEOUT	((HZ - 1)/200 + 1)
 
+/*
+ * We limit the maximum number of readers that can be woken up in a
+ * wake-up call to avoid penalizing the waking thread for spending
+ * too much time doing it.
+ */
+#define MAX_READERS_WAKEUP	0x100
+
 /*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
@@ -158,16 +165,16 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	}
 
 	/*
-	 * Grant an infinite number of read locks to the readers at the front
-	 * of the queue. We know that woken will be at least 1 as we accounted
-	 * for above. Note we increment the 'active part' of the count by the
+	 * Grant up to MAX_READERS_WAKEUP read locks to all the readers in the
+	 * queue. We know that woken will be at least 1 as we accounted for
+	 * above. Note we increment the 'active part' of the count by the
 	 * number of readers before waking any processes up.
 	 */
 	list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
 		struct task_struct *tsk;
 
 		if (waiter->type == RWSEM_WAITING_FOR_WRITE)
-			break;
+			continue;
 
 		woken++;
 		tsk = waiter->task;
@@ -186,6 +193,12 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		 * after setting the reader waiter to nil.
 		 */
 		wake_q_add_safe(wake_q, tsk);
+
+		/*
+		 * Limit # of readers that can be woken up per wakeup call.
+		 */
+		if (woken >= MAX_READERS_WAKEUP)
+			break;
 	}
 
 	adjustment = woken * RWSEM_READER_BIAS - adjustment;
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH-tip v2 07/12] locking/rwsem: Enable readers spinning on writer
  2019-04-05 19:21 [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
                   ` (5 preceding siblings ...)
  2019-04-05 19:21 ` [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue Waiman Long
@ 2019-04-05 19:21 ` Waiman Long
  2019-04-05 19:21 ` [PATCH-tip v2 08/12] locking/rwsem: Enable time-based spinning on reader-owned rwsem Waiman Long
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-05 19:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

This patch enables readers to optimistically spin on a
rwsem when it is owned by a writer instead of going to sleep
directly.  The rwsem_can_spin_on_owner() function is extracted
out of rwsem_optimistic_spin() and is called directly by
__rwsem_down_read_failed_common() and __rwsem_down_write_failed_common().
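
In rough outline, both slowpaths now start with the same shape (a
simplified sketch of the restructuring, not the complete diff):

   /* read slowpath (the reader bias is backed out before spinning) */
   if (rwsem_can_spin_on_owner(sem) && rwsem_optimistic_spin(sem, false))
           return sem;     /* lock acquired by spinning, skip wait queue */

   /* write slowpath */
   if (rwsem_can_spin_on_owner(sem) && rwsem_optimistic_spin(sem, true))
           return sem;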

With a locking microbenchmark running on 5.1 based kernel, the total
locking rates (in kops/s) on an 8-socket IvyBridge-EX system with equal
numbers of readers and writers before and after the patch were as
follows:

   # of Threads  Pre-patch    Post-patch
   ------------  ---------    ----------
        4          1,674        1,684
        8          1,062        1,074
       16            924          900
       32            300          458
       64            195          208
      128            164          168
      240            149          143

The performance change wasn't significant in this case, but this change
is required by a follow-on patch.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lock_events_list.h |  1 +
 kernel/locking/rwsem-xadd.c       | 88 ++++++++++++++++++++++++++-----
 kernel/locking/rwsem.h            |  3 ++
 3 files changed, 80 insertions(+), 12 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 29e5c52197fa..333ed5fda333 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -56,6 +56,7 @@ LOCK_EVENT(rwsem_sleep_reader)	/* # of reader sleeps			*/
 LOCK_EVENT(rwsem_sleep_writer)	/* # of writer sleeps			*/
 LOCK_EVENT(rwsem_wake_reader)	/* # of reader wakeups			*/
 LOCK_EVENT(rwsem_wake_writer)	/* # of writer wakeups			*/
+LOCK_EVENT(rwsem_opt_rlock)	/* # of read locks opt-spin acquired	*/
 LOCK_EVENT(rwsem_opt_wlock)	/* # of write locks opt-spin acquired	*/
 LOCK_EVENT(rwsem_opt_fail)	/* # of failed opt-spinnings		*/
 LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 51858554ff0e..1d61c6a5717f 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -246,6 +246,30 @@ rwsem_try_write_lock(long count, struct rw_semaphore *sem, bool first)
 }
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * Try to acquire the read lock before the reader is put on the wait queue.
+ * Lock acquisition isn't allowed if the rwsem is write-locked or a writer
+ * handoff is ongoing.
+ */
+static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
+{
+	long count = atomic_long_read(&sem->count);
+
+	if (RWSEM_COUNT_WLOCKED_OR_HANDOFF(count))
+		return false;
+
+	count = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
+	if (!RWSEM_COUNT_WLOCKED_OR_HANDOFF(count)) {
+		rwsem_set_reader_owned(sem);
+		lockevent_inc(rwsem_opt_rlock);
+		return true;
+	}
+
+	/* Back out the change */
+	atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
+	return false;
+}
+
 /*
  * Try to acquire write lock before the writer has been put on wait queue.
  */
@@ -280,9 +304,12 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
 	BUILD_BUG_ON(!rwsem_has_anonymous_owner(RWSEM_OWNER_UNKNOWN));
 
-	if (need_resched())
+	if (need_resched()) {
+		lockevent_inc(rwsem_opt_fail);
 		return false;
+	}
 
+	preempt_disable();
 	rcu_read_lock();
 	owner = READ_ONCE(sem->owner);
 	if (owner) {
@@ -290,6 +317,9 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 		      owner_on_cpu(owner);
 	}
 	rcu_read_unlock();
+	preempt_enable();
+
+	lockevent_cond_inc(rwsem_opt_fail, !ret);
 	return ret;
 }
 
@@ -359,7 +389,7 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 	return !owner ? OWNER_NULL : OWNER_READER;
 }
 
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 {
 	bool taken = false;
 	bool is_rt_task = rt_task(current);
@@ -368,9 +398,6 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	preempt_disable();
 
 	/* sem->wait_lock should not be held when doing optimistic spinning */
-	if (!rwsem_can_spin_on_owner(sem))
-		goto done;
-
 	if (!osq_lock(&sem->osq))
 		goto done;
 
@@ -390,10 +417,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		/*
 		 * Try to acquire the lock
 		 */
-		if (rwsem_try_write_lock_unqueued(sem)) {
-			taken = true;
+		taken = wlock ? rwsem_try_write_lock_unqueued(sem)
+			      : rwsem_try_read_lock_unqueued(sem);
+
+		if (taken)
 			break;
-		}
 
 		/*
 		 * An RT task cannot do optimistic spinning if it cannot
@@ -434,7 +462,12 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	return taken;
 }
 #else
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+{
+	return false;
+}
+
+static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 {
 	return false;
 }
@@ -460,6 +493,33 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
+	if (!rwsem_can_spin_on_owner(sem))
+		goto queue;
+
+	/*
+	 * Undo read bias from down_read() and do optimistic spinning.
+	 */
+	atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
+	adjustment = 0;
+	if (rwsem_optimistic_spin(sem, false)) {
+		unsigned long flags;
+
+		/*
+		 * Opportunistically wake up other readers in the wait queue.
+		 * It has another chance of wakeup at unlock time.
+		 */
+		if ((atomic_long_read(&sem->count) & RWSEM_FLAG_WAITERS) &&
+		    raw_spin_trylock_irqsave(&sem->wait_lock, flags)) {
+			if (!list_empty(&sem->wait_list))
+				__rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED,
+						  &wake_q);
+			raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
+			wake_up_q(&wake_q);
+		}
+		return sem;
+	}
+
+queue:
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
@@ -472,7 +532,7 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 		 * exit the slowpath and return immediately as its
 		 * RWSEM_READER_BIAS has already been set in the count.
 		 */
-		if (!(atomic_long_read(&sem->count) &
+		if (adjustment && !(atomic_long_read(&sem->count) &
 		     (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))) {
 			raw_spin_unlock_irq(&sem->wait_lock);
 			rwsem_set_reader_owned(sem);
@@ -484,7 +544,10 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
+	if (adjustment)
+		count = atomic_long_add_return(adjustment, &sem->count);
+	else
+		count = atomic_long_read(&sem->count);
 
 	/*
 	 * If there are no active locks, wake the front queued process(es).
@@ -556,7 +619,8 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
-	if (rwsem_optimistic_spin(sem))
+	if (rwsem_can_spin_on_owner(sem) &&
+	    rwsem_optimistic_spin(sem, true))
 		return sem;
 
 	/*
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 3ba235491dc1..0a8657fe4bc9 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -65,9 +65,12 @@
 				 RWSEM_FLAG_HANDOFF)
 
 #define RWSEM_COUNT_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
+#define RWSEM_COUNT_WLOCKED(c)	((c) & RWSEM_WRITER_MASK)
 #define RWSEM_COUNT_HANDOFF(c)	((c) & RWSEM_FLAG_HANDOFF)
 #define RWSEM_COUNT_LOCKED_OR_HANDOFF(c)	\
 	((c) & (RWSEM_LOCK_MASK|RWSEM_FLAG_HANDOFF))
+#define RWSEM_COUNT_WLOCKED_OR_HANDOFF(c)	\
+	((c) & (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH-tip v2 08/12] locking/rwsem: Enable time-based spinning on reader-owned rwsem
  2019-04-05 19:21 [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
                   ` (6 preceding siblings ...)
  2019-04-05 19:21 ` [PATCH-tip v2 07/12] locking/rwsem: Enable readers spinning on writer Waiman Long
@ 2019-04-05 19:21 ` Waiman Long
  2019-04-05 19:21 ` [PATCH-tip v2 09/12] locking/rwsem: Add more rwsem owner access helpers Waiman Long
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-05 19:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

When the rwsem is owned by readers, writers stop optimistic spinning
simply because there is no easy way to figure out if all the readers
are actively running or not. However, there are scenarios where
the readers are unlikely to sleep and optimistic spinning can help
performance.

This patch provides a simple mechanism for spinning on a reader-owned
rwsem by a writer. It is a time threshold based spinning where the
allowable spinning time can vary from 10us to 25us depending on the
condition of the rwsem.

When the time threshold is exceeded, a bit will be set in the owner field
to indicate that no more optimistic spinning will be allowed on this
rwsem until it becomes writer-owned again. For fairness, not even
readers are then allowed to acquire the reader-locked rwsem by
optimistic spinning.
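
As a worked example of the formula used below (see the comment above
rwsem_rspin_threshold() in the diff, which caps the reader count at 30):

   RWSEM_FLAG_WAITERS set, 20 readers:  10us + 20/2 us = 20us
   RWSEM_FLAG_WAITERS set, 40 readers:  10us + 30/2 us = 25us (capped)
   RWSEM_FLAG_WAITERS not set:          25us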

The time taken for each iteration of the reader-owned rwsem spinning
loop varies. Below are sample minimum elapsed times for 16 iterations
of the loop.

      System                 Time for 16 Iterations
      ------                 ----------------------
  1-socket Skylake                  ~800ns
  4-socket Broadwell                ~300ns
  2-socket ThunderX2 (arm64)        ~250ns

When the lock cacheline is contended, we can see up to almost a 10X
increase in elapsed time.  So 25us corresponds to at most about 500,
1300 and 1600 iterations on each of the above systems, respectively.
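
(As a quick check for the 1-socket Skylake case: ~800ns / 16 iterations
is about 50ns per iteration, and 25us / 50ns = 500 iterations.)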

With a locking microbenchmark running on 5.1 based kernel, the total
locking rates (in kops/s) on a 8-socket IvyBridge-EX system with
equal numbers of readers and writers before and after this patch were
as follows:

   # of Threads  Pre-patch    Post-patch
   ------------  ---------    ----------
        2          1,759        6,684
        4          1,684        6,738
        8          1,074        7,222
       16            900        7,163
       32            458        7,316
       64            208          520
      128            168          425
      240            143          474

This patch gives a big boost in performance for mixed reader/writer
workloads.

With 32 locking threads, the rwsem lock event data were:

rwsem_opt_fail=79850
rwsem_opt_nospin=5069
rwsem_opt_rlock=597484
rwsem_opt_wlock=957339
rwsem_sleep_reader=57782
rwsem_sleep_writer=55663

With 64 locking threads, the data looked like:

rwsem_opt_fail=346723
rwsem_opt_nospin=6293
rwsem_opt_rlock=1127119
rwsem_opt_wlock=1400628
rwsem_sleep_reader=308201
rwsem_sleep_writer=72281

So a lot more threads acquired the lock in the slowpath and more threads
went to sleep.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lock_events_list.h |  1 +
 kernel/locking/rwsem-xadd.c       | 76 +++++++++++++++++++++++++++++--
 kernel/locking/rwsem.h            | 45 ++++++++++++++----
 3 files changed, 107 insertions(+), 15 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 333ed5fda333..f3550aa5866a 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -59,6 +59,7 @@ LOCK_EVENT(rwsem_wake_writer)	/* # of writer wakeups			*/
 LOCK_EVENT(rwsem_opt_rlock)	/* # of read locks opt-spin acquired	*/
 LOCK_EVENT(rwsem_opt_wlock)	/* # of write locks opt-spin acquired	*/
 LOCK_EVENT(rwsem_opt_fail)	/* # of failed opt-spinnings		*/
+LOCK_EVENT(rwsem_opt_nospin)	/* # of disabled reader opt-spinnings	*/
 LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
 LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions	*/
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 1d61c6a5717f..bc3fd14cf354 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -19,6 +19,7 @@
 #include <linux/sched/rt.h>
 #include <linux/sched/wake_q.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/clock.h>
 #include <linux/osq_lock.h>
 
 #include "rwsem.h"
@@ -314,7 +315,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	owner = READ_ONCE(sem->owner);
 	if (owner) {
 		ret = is_rwsem_owner_spinnable(owner) &&
-		      owner_on_cpu(owner);
+		     (is_rwsem_owner_reader(owner) || owner_on_cpu(owner));
 	}
 	rcu_read_unlock();
 	preempt_enable();
@@ -339,7 +340,7 @@ enum owner_state {
 	OWNER_READER		= 1 << 2,
 	OWNER_NONSPINNABLE	= 1 << 3,
 };
-#define OWNER_SPINNABLE		(OWNER_NULL | OWNER_WRITER)
+#define OWNER_SPINNABLE		(OWNER_NULL | OWNER_WRITER | OWNER_READER)
 
 static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
@@ -350,7 +351,8 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 		return OWNER_NONSPINNABLE;
 
 	rcu_read_lock();
-	while (owner && (READ_ONCE(sem->owner) == owner)) {
+	while (owner && !is_rwsem_owner_reader(owner)
+		     && (READ_ONCE(sem->owner) == owner)) {
 		/*
 		 * Ensure we emit the owner->on_cpu, dereference _after_
 		 * checking sem->owner still matches owner, if that fails,
@@ -389,11 +391,47 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 	return !owner ? OWNER_NULL : OWNER_READER;
 }
 
+/*
+ * Calculate reader-owned rwsem spinning threshold for writer
+ *
+ * It is assumed that the more readers own the rwsem, the longer it will
+ * take for them to wind down and free the rwsem. So the formula to
+ * determine the actual spinning time limit is:
+ *
+ * 1) RWSEM_FLAG_WAITERS set
+ *    Spinning threshold = (10 + nr_readers/2)us
+ *
+ * 2) RWSEM_FLAG_WAITERS not set
+ *    Spinning threshold = 25us
+ *
+ * In the first case, when RWSEM_FLAG_WAITERS is set, no new reader can
+ * become a rwsem owner, so the spinning threshold scales with the
+ * number of readers currently owning the lock. This is subject to a
+ * maximum value of 25us.
+ *
+ * In the second case, with RWSEM_FLAG_WAITERS off, new readers can
+ * still join and become owners. So assume the worst case and spin
+ * for at most 25us.
+ */
+static inline u64 rwsem_rspin_threshold(struct rw_semaphore *sem)
+{
+	long count = atomic_long_read(&sem->count);
+	int reader_cnt = count >> RWSEM_READER_SHIFT;
+
+	if (reader_cnt > 30)
+		reader_cnt = 30;
+	return sched_clock() + ((count & RWSEM_FLAG_WAITERS)
+		? 10 * NSEC_PER_USEC + reader_cnt * NSEC_PER_USEC/2
+		: 25 * NSEC_PER_USEC);
+}
+
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 {
 	bool taken = false;
 	bool is_rt_task = rt_task(current);
 	int prev_owner_state = OWNER_NULL;
+	int loop = 0;
+	u64 rspin_threshold = 0;
 
 	preempt_disable();
 
@@ -405,8 +443,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 	 * Optimistically spin on the owner field and attempt to acquire the
 	 * lock whenever the owner changes. Spinning will be stopped when:
 	 *  1) the owning writer isn't running; or
-	 *  2) readers own the lock as we can't determine if they are
-	 *     actively running or not.
+	 *  2) readers own the lock and the spin time limit is exceeded.
 	 */
 	for (;;) {
 		enum owner_state owner_state = rwsem_spin_on_owner(sem);
@@ -423,6 +460,35 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 		if (taken)
 			break;
 
+		/*
+		 * Time-based reader-owned rwsem optimistic spinning
+		 */
+		if (wlock && (owner_state == OWNER_READER)) {
+			/*
+			 * Initialize rspin_threshold when the owner
+			 * state changes from non-reader to reader.
+			 */
+			if (prev_owner_state != OWNER_READER) {
+				if (!is_rwsem_spinnable(sem))
+					break;
+				rspin_threshold = rwsem_rspin_threshold(sem);
+				loop = 0;
+			}
+
+			/*
+			 * Check time threshold every 16 iterations to
+			 * avoid calling sched_clock() too frequently.
+			 * This will make the actual spinning time a
+			 * bit more than that specified in the threshold.
+			 */
+			else if (!(++loop & 0xf) &&
+				 (sched_clock() > rspin_threshold)) {
+				rwsem_set_nonspinnable(sem);
+				lockevent_inc(rwsem_opt_nospin);
+				break;
+			}
+		}
+
 		/*
 		 * An RT task cannot do optimistic spinning if it cannot
 		 * be sure the lock holder is running or live-lock may
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 0a8657fe4bc9..1398799b7547 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -5,18 +5,20 @@
  *  - RWSEM_READER_OWNED (bit 0): The rwsem is owned by readers
  *  - RWSEM_ANONYMOUSLY_OWNED (bit 1): The rwsem is anonymously owned,
  *    i.e. the owner(s) cannot be readily determined. It can be reader
- *    owned or the owning writer is indeterminate.
+ *    owned or the owning writer is indeterminate. Optimistic spinning
+ *    should be disabled if this flag is set.
  *
  * When a writer acquires a rwsem, it puts its task_struct pointer
- * into the owner field. It is cleared after an unlock.
+ * into the owner field or the count itself (64-bit only). It should
+ * be cleared after an unlock.
  *
 * When a reader acquires a rwsem, it will also put its task_struct
- * pointer into the owner field with both the RWSEM_READER_OWNED and
- * RWSEM_ANONYMOUSLY_OWNED bits set. On unlock, the owner field will
- * largely be left untouched. So for a free or reader-owned rwsem,
- * the owner value may contain information about the last reader that
- * acquires the rwsem. The anonymous bit is set because that particular
- * reader may or may not still own the lock.
+ * pointer into the owner field with the RWSEM_READER_OWNED bit set.
+ * On unlock, the owner field will largely be left untouched. So
+ * for a free or reader-owned rwsem, the owner value may contain
+ * information about the last reader that acquired the rwsem. The
+ * anonymous bit may also be set to permanently disable optimistic
+ * spinning on a reader-owned rwsem until a writer comes along.
  *
  * That information may be helpful in debugging cases where the system
  * seems to hang on a reader owned rwsem especially if only one reader
@@ -101,8 +103,7 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
 					    struct task_struct *owner)
 {
-	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED
-						 | RWSEM_ANONYMOUSLY_OWNED;
+	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED;
 
 	WRITE_ONCE(sem->owner, (struct task_struct *)val);
 }
@@ -127,6 +128,14 @@ static inline bool is_rwsem_owner_reader(struct task_struct *owner)
 	return (unsigned long)owner & RWSEM_READER_OWNED;
 }
 
+/*
+ * Return true if the rwsem is spinnable.
+ */
+static inline bool is_rwsem_spinnable(struct rw_semaphore *sem)
+{
+	return is_rwsem_owner_spinnable(READ_ONCE(sem->owner));
+}
+
 /*
  * Return true if rwsem is owned by an anonymous writer or readers.
  */
@@ -185,6 +194,22 @@ extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, long count);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
+/*
+ * Set the RWSEM_ANONYMOUSLY_OWNED flag if the RWSEM_READER_OWNED flag
+ * remains set. Otherwise, the operation will be aborted.
+ */
+static inline void rwsem_set_nonspinnable(struct rw_semaphore *sem)
+{
+	long owner = (long)READ_ONCE(sem->owner);
+
+	while (is_rwsem_owner_reader((struct task_struct *)owner)) {
+		if (!is_rwsem_owner_spinnable((struct task_struct *)owner))
+			break;
+		owner = cmpxchg((long *)&sem->owner, owner,
+				owner | RWSEM_ANONYMOUSLY_OWNED);
+	}
+}
+
 /*
  * lock for reading
  */
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH-tip v2 09/12] locking/rwsem: Add more rwsem owner access helpers
  2019-04-05 19:21 [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
                   ` (7 preceding siblings ...)
  2019-04-05 19:21 ` [PATCH-tip v2 08/12] locking/rwsem: Enable time-based spinning on reader-owned rwsem Waiman Long
@ 2019-04-05 19:21 ` Waiman Long
  2019-04-05 19:21 ` [PATCH-tip v2 10/12] locking/rwsem: Guard against making count negative Waiman Long
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-05 19:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

Before combining the owner and count fields, add two new helpers for
accessing the owner value in the rwsem.

 1) struct task_struct *rwsem_get_owner(struct rw_semaphore *sem)
 2) bool is_rwsem_reader_owned(struct rw_semaphore *sem)
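
These helpers centralize checks that were previously open-coded. For
example, the debug assertions in the diff below all become:

   DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);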

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 15 ++++++++++-----
 kernel/locking/rwsem.c      |  3 +--
 kernel/locking/rwsem.h      | 32 ++++++++++++++++++++++++++------
 3 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index bc3fd14cf354..a9756e3e3a73 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -312,7 +312,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
 	preempt_disable();
 	rcu_read_lock();
-	owner = READ_ONCE(sem->owner);
+	owner = rwsem_get_owner(sem);
 	if (owner) {
 		ret = is_rwsem_owner_spinnable(owner) &&
 		     (is_rwsem_owner_reader(owner) || owner_on_cpu(owner));
@@ -344,15 +344,21 @@ enum owner_state {
 
 static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
-	struct task_struct *owner = READ_ONCE(sem->owner);
+	struct task_struct *owner = rwsem_get_owner(sem);
 	long count;
 
 	if (!is_rwsem_owner_spinnable(owner))
 		return OWNER_NONSPINNABLE;
 
 	rcu_read_lock();
-	while (owner && !is_rwsem_owner_reader(owner)
-		     && (READ_ONCE(sem->owner) == owner)) {
+	while (owner && !is_rwsem_owner_reader(owner)) {
+		struct task_struct *new_owner = rwsem_get_owner(sem);
+
+		if (new_owner != owner) {
+			owner = new_owner;
+			break;	/* The owner has changed */
+		}
+
 		/*
 		 * Ensure we emit the owner->on_cpu, dereference _after_
 		 * checking sem->owner still matches owner, if that fails,
@@ -379,7 +385,6 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 	 * spinning except when there are no active locks and the handoff bit
 	 * is set. In this case, we have to stop spinning.
 	 */
-	owner = READ_ONCE(sem->owner);
 	if (!is_rwsem_owner_spinnable(owner))
 		return OWNER_NONSPINNABLE;
 	if (owner && !is_rwsem_owner_reader(owner))
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index ccbf18f560ff..38d24676e01c 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -198,8 +198,7 @@ EXPORT_SYMBOL(down_write_killable_nested);
 
 void up_read_non_owner(struct rw_semaphore *sem)
 {
-	DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED),
-				sem);
+	DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	__up_read(sem);
 }
 
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 1398799b7547..bbb8a266dd50 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -92,6 +92,11 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 	WRITE_ONCE(sem->owner, NULL);
 }
 
+static inline struct task_struct *rwsem_get_owner(struct rw_semaphore *sem)
+{
+	return READ_ONCE(sem->owner);
+}
+
 /*
  * The task_struct pointer of the last owning reader will be left in
  * the owner field.
@@ -136,6 +141,23 @@ static inline bool is_rwsem_spinnable(struct rw_semaphore *sem)
 	return is_rwsem_owner_spinnable(READ_ONCE(sem->owner));
 }
 
+/*
+ * Return true if the rwsem is owned by a reader.
+ */
+static inline bool is_rwsem_reader_owned(struct rw_semaphore *sem)
+{
+#ifdef CONFIG_DEBUG_RWSEMS
+	/*
+	 * Check the count to see if it is write-locked.
+	 */
+	long count = atomic_long_read(&sem->count);
+
+	if (count & RWSEM_WRITER_MASK)
+		return false;
+#endif
+	return (unsigned long)sem->owner & RWSEM_READER_OWNED;
+}
+
 /*
  * Return true if rwsem is owned by an anonymous writer or readers.
  */
@@ -156,6 +178,7 @@ static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem)
 {
 	unsigned long val = (unsigned long)current | RWSEM_READER_OWNED
 						   | RWSEM_ANONYMOUSLY_OWNED;
+
 	if (READ_ONCE(sem->owner) == (struct task_struct *)val)
 		cmpxchg_relaxed((unsigned long *)&sem->owner, val,
 				RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED);
@@ -218,8 +241,7 @@ static inline void __down_read(struct rw_semaphore *sem)
 	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
 			&sem->count) & RWSEM_READ_FAILED_MASK)) {
 		rwsem_down_read_failed(sem);
-		DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner &
-					RWSEM_READER_OWNED), sem);
+		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
 		rwsem_set_reader_owned(sem);
 	}
@@ -231,8 +253,7 @@ static inline int __down_read_killable(struct rw_semaphore *sem)
 			&sem->count) & RWSEM_READ_FAILED_MASK)) {
 		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
 			return -EINTR;
-		DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner &
-					RWSEM_READER_OWNED), sem);
+		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
 		rwsem_set_reader_owned(sem);
 	}
@@ -299,8 +320,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 {
 	long tmp;
 
-	DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED),
-				sem);
+	DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	rwsem_clear_reader_owned(sem);
 	tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count);
 	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS))
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH-tip v2 10/12] locking/rwsem: Guard against making count negative
  2019-04-05 19:21 [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
                   ` (8 preceding siblings ...)
  2019-04-05 19:21 ` [PATCH-tip v2 09/12] locking/rwsem: Add more rwsem owner access helpers Waiman Long
@ 2019-04-05 19:21 ` Waiman Long
  2019-04-05 19:21 ` [PATCH-tip v2 11/12] locking/rwsem: Merge owner into count on x86-64 Waiman Long
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-05 19:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

The upper bits of the count field are used as the reader count. When
a sufficient number of active readers are present, the most significant
bit will be set and the count becomes negative. If the number of active
readers keeps piling up, we may eventually overflow the reader count.
This is not likely to happen unless the number of bits reserved for
the reader count is reduced because those bits are needed for other
purposes.

To prevent this count overflow from happening, the most significant bit
is now treated as a guard bit (RWSEM_FLAG_READFAIL). Read-lock attempts
will now fail for both the fast and optimistic spinning paths whenever
this bit is set. So all those extra readers will be put to sleep in
the wait queue. Wakeup will not happen until the reader count reaches 0.
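
To make the guard concrete: with the default RWSEM_READER_SHIFT of 8,
each reader adds RWSEM_READER_BIAS = 1 << 8 to the count, so on a 32-bit
kernel the sign bit (bit 31) is reached once about 2^23 readers are
active. After this patch, the down_read() fastpath roughly does the
following (a simplified sketch; the real code below tests the full
RWSEM_READ_FAILED_MASK):

   long count = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
                                              &sem->count);

   if (unlikely(count & RWSEM_FLAG_READFAIL))   /* guard bit was set */
           rwsem_down_read_failed(sem, count);  /* sleep in wait queue */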

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 38 +++++++++++++++++++-----
 kernel/locking/rwsem.h      | 59 ++++++++++++++++++++++++++-----------
 2 files changed, 73 insertions(+), 24 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a9756e3e3a73..d1aee97ee97d 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -92,7 +92,8 @@ enum rwsem_wake_type {
 /*
  * We limit the maximum number of readers that can be woken up in a
  * wake-up call to avoid penalizing the waking thread for spending
- * too much time doing it.
+ * too much time doing it as well as the unlikely possibility of
+ * overflowing the reader count.
  */
 #define MAX_READERS_WAKEUP	0x100
 
@@ -558,12 +559,35 @@ rwsem_waiter_is_first(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
  * Wait for the read lock to be granted
  */
 static inline struct rw_semaphore __sched *
-__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
+__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state, long count)
 {
-	long count, adjustment = -RWSEM_READER_BIAS;
+	long adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
+	if (unlikely(count < 0)) {
+		/*
+		 * The sign bit has been set, meaning that too many
+		 * active readers are present. We need to decrement the
+		 * reader count & enter the wait queue immediately to
+		 * avoid overflowing the reader count.
+		 *
+		 * As preemption is not disabled, there is a remote
+		 * possibility that preemption can happen in the narrow
+		 * timing window between incrementing and decrementing
+		 * the reader count, and the task is put to sleep for a
+		 * considerable amount of time. If a sufficient number
+		 * of such unfortunate sequences of events happen, we
+		 * may still overflow the reader count. It is extremely
+		 * unlikely, though. If this is a concern, we should
+		 * consider disabling preemption during this timing
+		 * window to make sure that it cannot happen.
+		 */
+		atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
+		adjustment = 0;
+		goto queue;
+	}
+
 	if (!rwsem_can_spin_on_owner(sem))
 		goto queue;
 
@@ -664,16 +688,16 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 }
 
 __visible struct rw_semaphore * __sched
-rwsem_down_read_failed(struct rw_semaphore *sem)
+rwsem_down_read_failed(struct rw_semaphore *sem, long cnt)
 {
-	return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE);
+	return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE, cnt);
 }
 EXPORT_SYMBOL(rwsem_down_read_failed);
 
 __visible struct rw_semaphore * __sched
-rwsem_down_read_failed_killable(struct rw_semaphore *sem)
+rwsem_down_read_failed_killable(struct rw_semaphore *sem, long cnt)
 {
-	return __rwsem_down_read_failed_common(sem, TASK_KILLABLE);
+	return __rwsem_down_read_failed_common(sem, TASK_KILLABLE, cnt);
 }
 EXPORT_SYMBOL(rwsem_down_read_failed_killable);
 
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index bbb8a266dd50..52946fc0144e 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -43,13 +43,28 @@
 #endif
 
 /*
- * The definition of the atomic counter in the semaphore:
+ * On 64-bit architectures, the bit definitions of the count are:
  *
- * Bit  0   - writer locked bit
- * Bit  1   - waiters present bit
- * Bit  2   - lock handoff bit
- * Bits 3-7 - reserved
- * Bits 8-X - 24-bit (32-bit) or 56-bit reader count
+ * Bit  0    - writer locked bit
+ * Bit  1    - waiters present bit
+ * Bit  2    - lock handoff bit
+ * Bits 3-7  - reserved
+ * Bits 8-62 - 55-bit reader count
+ * Bit  63   - read fail bit
+ *
+ * On 32-bit architectures, the bit definitions of the count are:
+ *
+ * Bit  0    - writer locked bit
+ * Bit  1    - waiters present bit
+ * Bit  2    - lock handoff bit
+ * Bits 3-7  - reserved
+ * Bits 8-30 - 23-bit reader count
+ * Bit  31   - read fail bit
+ *
+ * It is not likely that the most significant bit (read fail bit) will ever
+ * be set. This guard bit is still checked anyway in the down_read() fastpath
+ * just in case we need to use up more of the reader bits for other
+ * purposes in the future.
  *
  * atomic_long_fetch_add() is used to obtain reader lock, whereas
  * atomic_long_cmpxchg() will be used to obtain writer lock.
@@ -57,6 +72,7 @@
 #define RWSEM_WRITER_LOCKED	(1UL << 0)
 #define RWSEM_FLAG_WAITERS	(1UL << 1)
 #define RWSEM_FLAG_HANDOFF	(1UL << 2)
+#define RWSEM_FLAG_READFAIL	(1UL << (BITS_PER_LONG - 1))
 
 #define RWSEM_READER_SHIFT	8
 #define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)
@@ -64,7 +80,7 @@
 #define RWSEM_WRITER_MASK	RWSEM_WRITER_LOCKED
 #define RWSEM_LOCK_MASK		(RWSEM_WRITER_MASK|RWSEM_READER_MASK)
 #define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
-				 RWSEM_FLAG_HANDOFF)
+				 RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
 
 #define RWSEM_COUNT_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
 #define RWSEM_COUNT_WLOCKED(c)	((c) & RWSEM_WRITER_MASK)
@@ -210,10 +226,15 @@ static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem)
 }
 #endif
 
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
+extern struct rw_semaphore *
+rwsem_down_read_failed(struct rw_semaphore *sem, long count);
+extern struct rw_semaphore *
+rwsem_down_read_failed_killable(struct rw_semaphore *sem, long count);
+extern struct rw_semaphore *
+rwsem_down_write_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *
+rwsem_down_write_failed_killable(struct rw_semaphore *sem);
+
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, long count);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
@@ -238,9 +259,11 @@ static inline void rwsem_set_nonspinnable(struct rw_semaphore *sem)
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
-			&sem->count) & RWSEM_READ_FAILED_MASK)) {
-		rwsem_down_read_failed(sem);
+	long count = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
+						   &sem->count);
+
+	if (unlikely(count & RWSEM_READ_FAILED_MASK)) {
+		rwsem_down_read_failed(sem, count);
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
 		rwsem_set_reader_owned(sem);
@@ -249,9 +272,11 @@ static inline void __down_read(struct rw_semaphore *sem)
 
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
-			&sem->count) & RWSEM_READ_FAILED_MASK)) {
-		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
+	long count = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
+						   &sem->count);
+
+	if (unlikely(count & RWSEM_READ_FAILED_MASK)) {
+		if (IS_ERR(rwsem_down_read_failed_killable(sem, count)))
 			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH-tip v2 11/12] locking/rwsem: Merge owner into count on x86-64
  2019-04-05 19:21 [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
                   ` (9 preceding siblings ...)
  2019-04-05 19:21 ` [PATCH-tip v2 10/12] locking/rwsem: Guard against making count negative Waiman Long
@ 2019-04-05 19:21 ` Waiman Long
  2019-04-05 19:21 ` [PATCH-tip v2 12/12] locking/rwsem: Remove redundant computation of writer lock word Waiman Long
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-05 19:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

With separate count and owner fields, there are timing windows where the
two values are inconsistent. That can cause problems when trying to figure
out the exact state of the rwsem. For instance, a RT task will stop
optimistic spinning if the lock is acquired by a writer but the owner
field isn't set yet. That can be solved by combining the count and
owner together in a single atomic value.

On 32-bit architectures, there aren't enough bits to hold both.
64-bit architectures, however, can have enough bits to do that. For
x86-64, the physical address can use up to 52 bits. That is 4PB of
memory. That leaves 12 bits available for other use. The task structure
pointer is aligned to the L1 cache size. That means another 6 bits
(64-byte cacheline) will be available. Reserving 2 bits for status
flags, we will have 16 bits for the reader count and the read fail bit.
That supports up to (32k-1) readers.

The owner value will still be duplicated in the owner field as that
will ease debugging when looking at a core dump.

This change is currently enabled for x86-64 only. Other 64-bit
architectures may be enabled in the future if the need arises.
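
The compression itself is a simple shift and offset, as in this sketch
mirroring rwsem_owner_count()/rwsem_count_owner() in the diff below.
With L1_CACHE_SHIFT = 6, the 52-bit physical offset is shifted right
by 4, keeping 2 zero low-order bits for the status flags, so the
compressed pointer occupies bits 2-47 of the count:

   /* compress: task_struct pointer -> bits 2-47 of the count */
   compressed = ((unsigned long)owner - PAGE_OFFSET) >> (L1_CACHE_SHIFT - 2);

   /* decompress: bits 2-47 of the count -> task_struct pointer */
   owner = (struct task_struct *)
           ((compressed << (L1_CACHE_SHIFT - 2)) + PAGE_OFFSET);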

With a locking microbenchmark running on 5.1 based kernel, the total
locking rates (in kops/s) on a 8-socket IvyBridge-EX system with
writer-only locking threads and then equal numbers of readers and writers
(mixed) before patch and after this and subsequent related patches were
as follows:

                  Before Patch      After Patch
   # of Threads  wlock    mixed    wlock    mixed
   ------------  -----    -----    -----    -----
        1        30,422   31,034   30,323   30,379
        2         6,427    6,684    7,804    9,436
        4         6,742    6,738    7,568    8,268
        8         7,092    7,222    5,679    7,041
       16         6,882    7,163    6,848    7,652
       32         7,458    7,316    7,975    2,189
       64         7,906      520    8,269      534
      128         1,680      425    8,047      448

In the single thread case, the complex write-locking operation does
introduce a little bit of overhead (about 0.3%). For the contended cases,
except for some anomalies in the data, there is no evidence that this
change will adversely impact performance.

When running the same microbenchmark with RT locking threads instead,
we got the following results:

                  Before Patch      After Patch
   # of Threads  wlock    mixed    wlock    mixed
   ------------  -----    -----    -----    -----
        2         4,065    3,642    4,756    5,062
        4         2,254    1,907    3,460    2,496
        8         2,386      964    3,012    1,964
       16         2,095    1,596    3,083    1,862
       32         2,388      530    3,717      359
       64         1,424      322    4,060      401
      128         1,642      510    4,488      628

It is obvious that RT tasks can benefit pretty significantly with this set
of patches.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c |  11 ++--
 kernel/locking/rwsem.h      | 101 +++++++++++++++++++++++++++++++++---
 2 files changed, 103 insertions(+), 9 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index d1aee97ee97d..ebdf21fcd75e 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -27,11 +27,11 @@
 /*
  * Guide to the rw_semaphore's count field.
  *
- * When the RWSEM_WRITER_LOCKED bit in count is set, the lock is owned
- * by a writer.
+ * When any of the RWSEM_WRITER_MASK bits in count is set, the lock is
+ * owned by a writer.
  *
  * The lock is owned by readers when
- * (1) the RWSEM_WRITER_LOCKED isn't set in count,
+ * (1) none of the RWSEM_WRITER_MASK bits is set in count,
  * (2) some of the reader bits are set in count, and
  * (3) the owner field has RWSEM_READ_OWNED bit set.
  *
@@ -47,6 +47,11 @@
 void __init_rwsem(struct rw_semaphore *sem, const char *name,
 		  struct lock_class_key *key)
 {
+	/*
+	 * We should support at least (4k-1) concurrent readers
+	 */
+	BUILD_BUG_ON(sizeof(long) * 8 - RWSEM_READER_SHIFT < 12);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	/*
 	 * Make sure we are not reinitializing a held semaphore:
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 52946fc0144e..1153500694ed 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -43,7 +43,41 @@
 #endif
 
 /*
- * On 64-bit architectures, the bit definitions of the count are:
+ * Enable the merging of owner into count for x86-64 only.
+ */
+#ifdef CONFIG_X86_64
+#define RWSEM_MERGE_OWNER_TO_COUNT
+#endif
+
+/*
+ * With separate count and owner, there are timing windows where the two
+ * values are inconsistent. That can cause problems when trying to figure
+ * out the exact state of the rwsem. That can be solved by combining
+ * the count and owner together in a single atomic value.
+ *
+ * On 64-bit architectures, the owner task structure pointer can be
+ * compressed and combined with reader count and other status flags.
+ * A simple compression method is to map the virtual address back to
+ * the physical address by subtracting PAGE_OFFSET. On 32-bit
+ * architectures, the long integer value just isn't big enough for
+ * combining owner and count. So they remain separate.
+ *
+ * For x86-64, the physical address can use up to 52 bits. That is 4PB
+ * of memory. That leaves 12 bits available for other use. The task
+ * structure pointer is also aligned to the L1 cache size. That means
+ * another 6 bits (64-byte cacheline) will be available. Reserving
+ * 2 bits for status flags, we will have 16 bits for the reader count
+ * and read fail bit. That supports up to (32k-1) active readers.
+ *
+ * On x86-64, the bit definitions of the count are:
+ *
+ * Bit   0    - waiters present bit
+ * Bit   1    - lock handoff bit
+ * Bits  2-47 - compressed task structure pointer
+ * Bits 48-62 - 15-bit reader counts
+ * Bit  63    - read fail bit
+ *
+ * On other 64-bit architectures, the bit definitions are:
  *
  * Bit  0    - writer locked bit
  * Bit  1    - waiters present bit
@@ -69,15 +103,30 @@
  * atomic_long_fetch_add() is used to obtain reader lock, whereas
  * atomic_long_cmpxchg() will be used to obtain writer lock.
  */
-#define RWSEM_WRITER_LOCKED	(1UL << 0)
-#define RWSEM_FLAG_WAITERS	(1UL << 1)
-#define RWSEM_FLAG_HANDOFF	(1UL << 2)
+#define RWSEM_FLAG_WAITERS	(1UL << 0)
+#define RWSEM_FLAG_HANDOFF	(1UL << 1)
 #define RWSEM_FLAG_READFAIL	(1UL << (BITS_PER_LONG - 1))
 
+
+#ifdef RWSEM_MERGE_OWNER_TO_COUNT
+
+#ifdef __PHYSICAL_MASK_SHIFT
+#define RWSEM_PA_MASK_SHIFT	__PHYSICAL_MASK_SHIFT
+#else
+#define RWSEM_PA_MASK_SHIFT	52
+#endif
+#define RWSEM_READER_SHIFT	(RWSEM_PA_MASK_SHIFT - L1_CACHE_SHIFT + 2)
+#define RWSEM_WRITER_MASK	((1UL << RWSEM_READER_SHIFT) - 4)
+#define RWSEM_WRITER_LOCKED	rwsem_owner_count(current)
+
+#else /* RWSEM_MERGE_OWNER_TO_COUNT */
 #define RWSEM_READER_SHIFT	8
+#define RWSEM_WRITER_MASK	(1UL << 7)
+#define RWSEM_WRITER_LOCKED	RWSEM_WRITER_MASK
+#endif /* RWSEM_MERGE_OWNER_TO_COUNT */
+
 #define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)
 #define RWSEM_READER_MASK	(~(RWSEM_READER_BIAS - 1))
-#define RWSEM_WRITER_MASK	RWSEM_WRITER_LOCKED
 #define RWSEM_LOCK_MASK		(RWSEM_WRITER_MASK|RWSEM_READER_MASK)
 #define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
 				 RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
@@ -90,6 +139,22 @@
 #define RWSEM_COUNT_WLOCKED_OR_HANDOFF(c)	\
 	((c) & (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))
 
+/*
+ * Task structure pointer compression (64-bit only):
+ * (owner - PAGE_OFFSET) >> (L1_CACHE_SHIFT - 2)
+ */
+static inline unsigned long rwsem_owner_count(struct task_struct *owner)
+{
+	return ((unsigned long)owner - PAGE_OFFSET) >> (L1_CACHE_SHIFT - 2);
+}
+
+static inline unsigned long rwsem_count_owner(long count)
+{
+	unsigned long writer = (unsigned long)count & RWSEM_WRITER_MASK;
+
+	return writer ? (writer << (L1_CACHE_SHIFT - 2)) + PAGE_OFFSET : 0;
+}
+
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
  * All writes to owner are protected by WRITE_ONCE() to make sure that
@@ -97,7 +162,12 @@
  * the owner value concurrently without lock. Read from owner, however,
  * may not need READ_ONCE() as long as the pointer value is only used
  * for comparison and isn't being dereferenced.
+ *
+ * On 32-bit architectures, the owner and count are separate. On 64-bit
+ * architectures, however, the writer task structure pointer is written
+ * to the count as well in addition to the owner field.
  */
+
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
 	WRITE_ONCE(sem->owner, current);
@@ -108,10 +178,26 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 	WRITE_ONCE(sem->owner, NULL);
 }
 
+#ifdef RWSEM_MERGE_OWNER_TO_COUNT
+/*
+ * Get the owner value from count to have early access to the task structure.
+ * The owner from sem->count should include the RWSEM_ANONYMOUSLY_OWNED bit
+ * from sem->owner.
+ */
+static inline struct task_struct *rwsem_get_owner(struct rw_semaphore *sem)
+{
+	unsigned long cowner = rwsem_count_owner(atomic_long_read(&sem->count));
+	unsigned long sowner = (unsigned long)READ_ONCE(sem->owner);
+
+	return (struct task_struct *) (cowner
+		? cowner | (sowner & RWSEM_ANONYMOUSLY_OWNED) : sowner);
+}
+#else /* !RWSEM_MERGE_OWNER_TO_COUNT */
 static inline struct task_struct *rwsem_get_owner(struct rw_semaphore *sem)
 {
 	return READ_ONCE(sem->owner);
 }
+#endif /* RWSEM_MERGE_OWNER_TO_COUNT */
 
 /*
  * The task_struct pointer of the last owning reader will be left in
@@ -312,6 +398,9 @@ static inline void __down_write(struct rw_semaphore *sem)
 						 RWSEM_WRITER_LOCKED)))
 		rwsem_down_write_failed(sem);
 	rwsem_set_owner(sem);
+#ifdef RWSEM_MERGE_OWNER_TO_COUNT
+	DEBUG_RWSEMS_WARN_ON(sem->owner != rwsem_get_owner(sem), sem);
+#endif
 }
 
 static inline int __down_write_killable(struct rw_semaphore *sem)
@@ -362,7 +451,7 @@ static inline void __up_write(struct rw_semaphore *sem)
 
 	DEBUG_RWSEMS_WARN_ON(sem->owner != current, sem);
 	rwsem_clear_owner(sem);
-	tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
+	tmp = atomic_long_fetch_and_release(~RWSEM_WRITER_MASK, &sem->count);
 	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
 		rwsem_wake(sem, tmp);
 }
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH-tip v2 12/12] locking/rwsem: Remove redundant computation of writer lock word
  2019-04-05 19:21 [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
                   ` (10 preceding siblings ...)
  2019-04-05 19:21 ` [PATCH-tip v2 11/12] locking/rwsem: Merge owner into count on x86-64 Waiman Long
@ 2019-04-05 19:21 ` Waiman Long
  2019-04-05 23:27 ` [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Linus Torvalds
  2019-04-10 10:00 ` Ingo Molnar
  13 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-05 19:21 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner
  Cc: linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen,
	Waiman Long

On 64-bit architectures, each rwsem writer will have its unique lock
word for acquiring the lock. Right now, the writer code recomputes the
lock word every time it tries to acquire the lock. This is a waste of
time. The lock word is now cached and reused when it is needed.

On 32-bit architectures, the extra constant argument to
rwsem_try_write_lock() and rwsem_try_write_lock_unqueued() should be
optimized out by the compiler.
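
In outline, the write slowpath now computes the lock word once and
passes it down, as in this simplified sketch of the diff below (on
x86-64, RWSEM_WRITER_LOCKED expands to rwsem_owner_count(current);
elsewhere it is a compile-time constant):

   const long wlock = RWSEM_WRITER_LOCKED;

   if (rwsem_can_spin_on_owner(sem) && rwsem_optimistic_spin(sem, wlock))
           return sem;
   ...
   if (rwsem_try_write_lock(count, sem, wlock, first))
           break;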

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index ebdf21fcd75e..24ce7e2d4716 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -230,8 +230,8 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
  */
-static inline bool
-rwsem_try_write_lock(long count, struct rw_semaphore *sem, bool first)
+static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem,
+					const long wlock, bool first)
 {
 	long new;
 
@@ -241,7 +241,7 @@ rwsem_try_write_lock(long count, struct rw_semaphore *sem, bool first)
 	if (!first && RWSEM_COUNT_HANDOFF(count))
 		return false;
 
-	new = (count & ~RWSEM_FLAG_HANDOFF) + RWSEM_WRITER_LOCKED -
+	new = (count & ~RWSEM_FLAG_HANDOFF) + wlock -
 	      (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0);
 
 	if (atomic_long_try_cmpxchg_acquire(&sem->count, &count, new)) {
@@ -280,13 +280,14 @@ static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
 /*
  * Try to acquire write lock before the writer has been put on wait queue.
  */
-static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
+static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem,
+						 const long wlock)
 {
 	long count = atomic_long_read(&sem->count);
 
 	while (!RWSEM_COUNT_LOCKED_OR_HANDOFF(count)) {
 		if (atomic_long_try_cmpxchg_acquire(&sem->count, &count,
-					count + RWSEM_WRITER_LOCKED)) {
+						    count + wlock)) {
 			rwsem_set_owner(sem);
 			lockevent_inc(rwsem_opt_wlock);
 			return true;
@@ -436,7 +437,7 @@ static inline u64 rwsem_rspin_threshold(struct rw_semaphore *sem)
 		: 25 * NSEC_PER_USEC);
 }
 
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock)
 {
 	bool taken = false;
 	bool is_rt_task = rt_task(current);
@@ -465,7 +466,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 		/*
 		 * Try to acquire the lock
 		 */
-		taken = wlock ? rwsem_try_write_lock_unqueued(sem)
+		taken = wlock ? rwsem_try_write_lock_unqueued(sem, wlock)
 			      : rwsem_try_read_lock_unqueued(sem);
 
 		if (taken)
@@ -544,7 +545,8 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	return false;
 }
 
-static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
+static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+					 const long wlock)
 {
 	return false;
 }
@@ -601,7 +603,7 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state, long count)
 	 */
 	atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
 	adjustment = 0;
-	if (rwsem_optimistic_spin(sem, false)) {
+	if (rwsem_optimistic_spin(sem, 0)) {
 		unsigned long flags;
 
 		/*
@@ -717,10 +719,11 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
+	const long wlock = RWSEM_WRITER_LOCKED;
 
 	/* do optimistic spinning and steal lock if possible */
 	if (rwsem_can_spin_on_owner(sem) &&
-	    rwsem_optimistic_spin(sem, true))
+	    rwsem_optimistic_spin(sem, wlock))
 		return sem;
 
 	/*
@@ -779,7 +782,7 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
 	while (true) {
-		if (rwsem_try_write_lock(count, sem, first))
+		if (rwsem_try_write_lock(count, sem, wlock, first))
 			break;
 
 		raw_spin_unlock_irq(&sem->wait_lock);
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2
  2019-04-05 19:21 [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
                   ` (11 preceding siblings ...)
  2019-04-05 19:21 ` [PATCH-tip v2 12/12] locking/rwsem: Remove redundant computation of writer lock word Waiman Long
@ 2019-04-05 23:27 ` Linus Torvalds
  2019-04-10 10:00 ` Ingo Molnar
  13 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2019-04-05 23:27 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	Linux List Kernel Mailing, the arch/x86 maintainers,
	Davidlohr Bueso, Tim Chen

On Fri, Apr 5, 2019 at 9:22 AM Waiman Long <longman@redhat.com> wrote:
>
>  v2:
>   - Move the negative reader count checking patch (patch 12->10)
>     forward to before the merge owner to count patch as suggested by
>     Linus & expand the comment.

All looks good to me now.

I didn't apply and _test_ any of the patches, but they look sane and
the numbers you quote are compelling.

So ack as far as I'm concerned,

                   Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2
  2019-04-05 19:21 [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
                   ` (12 preceding siblings ...)
  2019-04-05 23:27 ` [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Linus Torvalds
@ 2019-04-10 10:00 ` Ingo Molnar
  2019-04-10 12:38   ` Waiman Long
  13 siblings, 1 reply; 30+ messages in thread
From: Ingo Molnar @ 2019-04-10 10:00 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen


* Waiman Long <longman@redhat.com> wrote:

>    # of Threads   Before Patch      After Patch
>    ------------   ------------      -----------
>         2            1,179             9,436
>         4            1,505             8,268
>         8              721             7,041
>        16              575             7,652
>        32               70             2,189
>        64               39               534
> 
> Waiman Long (12):
>   locking/rwsem: Implement a new locking scheme
>   locking/rwsem: Implement lock handoff to prevent lock starvation
>   locking/rwsem: Remove rwsem_wake() wakeup optimization
>   locking/rwsem: Make rwsem_spin_on_owner() return owner state
>   locking/rwsem: Ensure an RT task will not spin on reader
>   locking/rwsem: Wake up almost all readers in wait queue
>   locking/rwsem: Enable readers spinning on writer
>   locking/rwsem: Enable time-based spinning on reader-owned rwsem
>   locking/rwsem: Add more rwsem owner access helpers
>   locking/rwsem: Guard against making count negative
>   locking/rwsem: Merge owner into count on x86-64
>   locking/rwsem: Remove redundant computation of writer lock word
> 
>  kernel/locking/lock_events_list.h |   4 +
>  kernel/locking/rwsem-xadd.c       | 635 +++++++++++++++++++-----------
>  kernel/locking/rwsem.c            |   3 +-
>  kernel/locking/rwsem.h            | 290 +++++++++++---
>  4 files changed, 647 insertions(+), 285 deletions(-)

Looks very nice!

With both sets of patches applied they pass all x86 tests, but there are 
cross-build failures on powerpc32, sparc32 and UML:

   kernel/locking/rwsem.h:333:34: error: 'struct rw_semaphore' has no member named 'owner'

Could you please send a delta patch against tip:WIP.locking/core, which 
is the work-in-progress tree? I'll backmerge the delta patch (and any 
other updates you might have meanwhile) to the one that introduces the 
breakage.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2
  2019-04-10 10:00 ` Ingo Molnar
@ 2019-04-10 12:38   ` Waiman Long
  0 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-10 12:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	linux-kernel, x86, Davidlohr Bueso, Linus Torvalds, Tim Chen

On 04/10/2019 06:00 AM, Ingo Molnar wrote:
> * Waiman Long <longman@redhat.com> wrote:
>
>>    # of Threads   Before Patch      After Patch
>>    ------------   ------------      -----------
>>         2            1,179             9,436
>>         4            1,505             8,268
>>         8              721             7,041
>>        16              575             7,652
>>        32               70             2,189
>>        64               39               534
>>
>> Waiman Long (12):
>>   locking/rwsem: Implement a new locking scheme
>>   locking/rwsem: Implement lock handoff to prevent lock starvation
>>   locking/rwsem: Remove rwsem_wake() wakeup optimization
>>   locking/rwsem: Make rwsem_spin_on_owner() return owner state
>>   locking/rwsem: Ensure an RT task will not spin on reader
>>   locking/rwsem: Wake up almost all readers in wait queue
>>   locking/rwsem: Enable readers spinning on writer
>>   locking/rwsem: Enable time-based spinning on reader-owned rwsem
>>   locking/rwsem: Add more rwsem owner access helpers
>>   locking/rwsem: Guard against making count negative
>>   locking/rwsem: Merge owner into count on x86-64
>>   locking/rwsem: Remove redundant computation of writer lock word
>>
>>  kernel/locking/lock_events_list.h |   4 +
>>  kernel/locking/rwsem-xadd.c       | 635 +++++++++++++++++++-----------
>>  kernel/locking/rwsem.c            |   3 +-
>>  kernel/locking/rwsem.h            | 290 +++++++++++---
>>  4 files changed, 647 insertions(+), 285 deletions(-)
> Looks very nice!
>
> With both sets of patches applied they pass all x86 tests, but there are 
> cross-build failures on powerpc32, sparc32 and UML:
>
>    kernel/locking/rwsem.h:333:34: error: 'struct rw_semaphore' has no member named 'owner'
>
> Could you please send a delta patch against tip:WIP.locking/core, which 
> is the work-in-progress tree? I'll backmerge the delta patch (and any 
> other updates you might have meanwhile) to the one that introduces the 
> breakage.
>
> Thanks,
>
> 	Ingo

Sure. I will look into that and fix the failures.
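
For context: struct rw_semaphore only has an ->owner field when
CONFIG_RWSEM_SPIN_ON_OWNER is set, which those architectures presumably
lack, so the new helpers in rwsem.h need the same guard. A sketch of the
shape of the fix (illustrative only, not the actual delta patch):

	#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
	static inline void rwsem_set_owner(struct rw_semaphore *sem)
	{
		WRITE_ONCE(sem->owner, current);
	}
	#else
	/* no ->owner field to maintain without spin-on-owner support */
	static inline void rwsem_set_owner(struct rw_semaphore *sem) { }
	#endif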

Thanks,
Longman


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
  2019-04-05 19:21 ` [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
@ 2019-04-10 15:07   ` Peter Zijlstra
  2019-04-10 15:28     ` Waiman Long
  2019-04-10 15:10   ` Peter Zijlstra
  2019-04-10 18:44   ` Peter Zijlstra
  2 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-04-10 15:07 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, x86,
	Davidlohr Bueso, Linus Torvalds, Tim Chen

On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
> Because of writer lock stealing, it is possible that a constant
> stream of incoming writers will cause a waiting writer or reader to
> wait indefinitely, leading to lock starvation.
> 
> The mutex code has a lock handoff mechanism to prevent lock starvation.
> This patch implements a similar lock handoff mechanism to disable
> lock stealing and force lock handoff to the first waiter in the queue
> after at least a 5ms waiting period. The waiting period is used to
> avoid discouraging lock stealing so much that performance suffers.

So the mutex code doesn't have that timeout; it forces the handoff if the
top waiter fails to acquire.

I don't think the above sufficiently justifies the additional complexity.
What were the numbers with the simple scheme vs. this one, etc.?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
  2019-04-05 19:21 ` [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
  2019-04-10 15:07   ` Peter Zijlstra
@ 2019-04-10 15:10   ` Peter Zijlstra
  2019-04-10 15:29     ` Waiman Long
  2019-04-10 18:44   ` Peter Zijlstra
  2 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-04-10 15:10 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, x86,
	Davidlohr Bueso, Linus Torvalds, Tim Chen

On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
> +#define RWSEM_WAIT_TIMEOUT	((HZ - 1)/200 + 1)

Given the choices in HZ, the above seems fairly 'optimistic'.
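
For reference, evaluating the macro for common HZ choices (a quick
back-of-the-envelope check, not from the patch):

	/* RWSEM_WAIT_TIMEOUT = ((HZ - 1)/200 + 1) jiffies, integer division */
	HZ =  100:   99/200 + 1 = 1 jiffy   = 10  ms
	HZ =  250:  249/200 + 1 = 2 jiffies =  8  ms
	HZ =  300:  299/200 + 1 = 2 jiffies = ~6.7 ms
	HZ = 1000:  999/200 + 1 = 5 jiffies =  5  ms

The nominal 5ms only holds at HZ=1000; at HZ=100 the effective timeout
doubles to 10ms.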

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
  2019-04-10 15:07   ` Peter Zijlstra
@ 2019-04-10 15:28     ` Waiman Long
  0 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-10 15:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, x86,
	Davidlohr Bueso, Linus Torvalds, Tim Chen

On 04/10/2019 11:07 AM, Peter Zijlstra wrote:
> On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
>> Because of writer lock stealing, it is possible that a constant
>> stream of incoming writers will cause a waiting writer or reader to
>> wait indefinitely, leading to lock starvation.
>>
>> The mutex code has a lock handoff mechanism to prevent lock starvation.
>> This patch implements a similar lock handoff mechanism to disable
>> lock stealing and force lock handoff to the first waiter in the queue
>> after at least a 5ms waiting period. The waiting period is used to
>> avoid discouraging lock stealing so much that performance suffers.
> So the mutex code doesn't have that timeout; it forces the handoff if the
> top waiter fails to acquire.
>
> I don't think the above sufficiently justifies the additional complexity.
> What were the numbers with the simple scheme vs. this one, etc.?

When the handoff bit is set, it stops the lock from being acquired by
anyone else until the front waiter is woken up, scheduled, and takes the
lock. Doing that too frequently will impede throughput when the rwsem is
highly contended. I can run some performance tests to show the difference
it makes.

I have also been thinking about having the front waiter set the handoff
bit and then spin on the owner so that it can acquire the lock immediately
after it is freed. That will be a follow-up patch.
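
A rough sketch of that follow-up idea (hypothetical, not code from this
series; RWSEM_COUNT_LOCKED() and wlock are assumed to be the macros and
lock word used elsewhere in the patches, and a real version would bound
the spinning and still dequeue the waiter):

	/*
	 * The front waiter has already set RWSEM_FLAG_HANDOFF, so nobody
	 * else can steal the lock; spin until the current owner releases
	 * it, then grab the lock and consume the handoff bit.
	 */
	long count = atomic_long_read(&sem->count);

	for (;;) {
		if (!RWSEM_COUNT_LOCKED(count)) {
			long new = (count | wlock) & ~RWSEM_FLAG_HANDOFF;

			if (atomic_long_try_cmpxchg_acquire(&sem->count,
							    &count, new))
				break;	/* lock acquired */
			continue;	/* count reloaded by failed cmpxchg */
		}
		cpu_relax();
		count = atomic_long_read(&sem->count);
	}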

Cheers,
Longman



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
  2019-04-10 15:10   ` Peter Zijlstra
@ 2019-04-10 15:29     ` Waiman Long
  0 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-10 15:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, x86,
	Davidlohr Bueso, Linus Torvalds, Tim Chen

On 04/10/2019 11:10 AM, Peter Zijlstra wrote:
> On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
>> +#define RWSEM_WAIT_TIMEOUT	((HZ - 1)/200 + 1)
> Given the choices in HZ, the above seems fairly 'optimistic'.

I can tighten it up and make it less "optimistic" :-)

Cheers,
Longman


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue
  2019-04-05 19:21 ` [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue Waiman Long
@ 2019-04-10 16:50   ` Davidlohr Bueso
  2019-04-10 17:08     ` Waiman Long
  0 siblings, 1 reply; 30+ messages in thread
From: Davidlohr Bueso @ 2019-04-10 16:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	linux-kernel, x86, Linus Torvalds, Tim Chen

On Fri, 05 Apr 2019, Waiman Long wrote:

>When the front of the wait queue is a reader, other readers
>immediately following the first reader will also be woken up at the
>same time. However, if there is a writer in between, those readers
>behind the writer will not be woken up.
>
>Because of optimistic spinning, the lock acquisition order is not FIFO
>anyway. The lock handoff mechanism will ensure that lock starvation
>will not happen.
>
>Assuming that the lock hold times of the other readers still in the
>queue will be about the same as the readers that are being woken up,
>there is really not much additional cost other than the extra latency
>of waking more tasks. Therefore, when the first waiter is a reader, up
>to a maximum of 256 readers in the queue are woken up to improve reader
>throughput.

Before, we stopped waking readers when a writer was encountered but
would otherwise wake up _all_ readers. I don't understand why you want
to limit this to MAX_READERS_WAKEUP; otherwise I agree it's nice to
skip the writer and continue waking readers in the queue (with the
handoff guarantees obviously).

Would it not be better to apply the MAX_READERS_WAKEUP limit only when
a writer is found?
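
Something along those lines might look like this (a hypothetical sketch
of the suggestion; waiter fields and helpers as used elsewhere in the
series):

	/*
	 * Wake readers past any writers; apply the cap only once a
	 * writer has actually been skipped.
	 */
	struct rwsem_waiter *waiter, *tmp;
	bool writer_seen = false;
	int woken = 0;

	list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
		if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
			writer_seen = true;
			continue;
		}
		if (writer_seen && woken >= MAX_READERS_WAKEUP)
			break;
		wake_q_add(wake_q, waiter->task);
		woken++;
	}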

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue
  2019-04-10 16:50   ` Davidlohr Bueso
@ 2019-04-10 17:08     ` Waiman Long
  2019-04-10 17:22       ` Davidlohr Bueso
  0 siblings, 1 reply; 30+ messages in thread
From: Waiman Long @ 2019-04-10 17:08 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	linux-kernel, x86, Linus Torvalds, Tim Chen

On 04/10/2019 12:50 PM, Davidlohr Bueso wrote:
> On Fri, 05 Apr 2019, Waiman Long wrote:
>
>> When the front of the wait queue is a reader, other readers
>> immediately following the first reader will also be woken up at the
>> same time. However, if there is a writer in between, those readers
>> behind the writer will not be woken up.
>>
>> Because of optimistic spinning, the lock acquisition order is not FIFO
>> anyway. The lock handoff mechanism will ensure that lock starvation
>> will not happen.
>>
>> Assuming that the lock hold times of the other readers still in the
>> queue will be about the same as the readers that are being woken up,
>> there is really not much additional cost other than the extra latency
>> of waking more tasks. Therefore, when the first waiter is a reader, up
>> to a maximum of 256 readers in the queue are woken up to improve reader
>> throughput.
>
> Before, we stopped waking readers when a writer was encountered but
> would otherwise wake up _all_ readers. I don't understand why you want
> to limit this to MAX_READERS_WAKEUP; otherwise I agree it's nice to
> skip the writer and continue waking readers in the queue (with the
> handoff guarantees obviously).
>
> Would it not be better to apply the MAX_READERS_WAKEUP limit only when
> a writer is found?
>
> Thanks,
> Davidlohr

There are 2 major reasons why there is a limit.

1) It would be unfair to the waking task to spend so much of its own
CPU time waking up a large number of readers.
2) I want to avoid the extreme case where there are more than 32k readers
in the wait queue, which would make the count overflow.
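
(A quick sanity check on that figure, assuming the reader count occupies
a 16-bit signed field under the merged-owner layout -- the exact bit
split is defined later in the series:)

	2^15 = 32,768 readers -> sign bit of the field set -> count
	appears negative, which is what patch 10 guards against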

Cheers,
Longman


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue
  2019-04-10 17:08     ` Waiman Long
@ 2019-04-10 17:22       ` Davidlohr Bueso
  2019-04-10 17:31         ` Davidlohr Bueso
  2019-04-10 17:53         ` Waiman Long
  0 siblings, 2 replies; 30+ messages in thread
From: Davidlohr Bueso @ 2019-04-10 17:22 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	linux-kernel, x86, Linus Torvalds, Tim Chen

On Wed, 10 Apr 2019, Waiman Long wrote:
>There are 2 major reasons why there is a limit.
>
>1) It would be unfair to the waking task to spend so much of its own
>CPU time waking up a large number of readers.

This has never been a problem before.

>2) I want to avoid the extreme case where there are more than 32k readers
>in the wait queue, which would make the count overflow.

But in this case the readers are already on the queue.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue
  2019-04-10 17:22       ` Davidlohr Bueso
@ 2019-04-10 17:31         ` Davidlohr Bueso
  2019-04-10 17:54           ` Waiman Long
  2019-04-10 17:53         ` Waiman Long
  1 sibling, 1 reply; 30+ messages in thread
From: Davidlohr Bueso @ 2019-04-10 17:31 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	linux-kernel, x86, Linus Torvalds, Tim Chen

On Wed, 10 Apr 2019, Bueso wrote:

>On Wed, 10 Apr 2019, Waiman Long wrote:
>>2) I want to avoid the extreme case where there are more than 32k readers
>>in the wait queue, which would make the count overflow.
>
>But in this case the readers are already on the queue.

Never mind this.

But the limit could still be enlarged just to prevent overflows in the case
where there are only readers.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue
  2019-04-10 17:22       ` Davidlohr Bueso
  2019-04-10 17:31         ` Davidlohr Bueso
@ 2019-04-10 17:53         ` Waiman Long
  1 sibling, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-10 17:53 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	linux-kernel, x86, Linus Torvalds, Tim Chen

On 04/10/2019 01:22 PM, Davidlohr Bueso wrote:
> On Wed, 10 Apr 2019, Waiman Long wrote:
>> There are 2 major reasons why there is a limit.
>>
>> 1) It would be unfair to the waking task to spend so much of its own
>> CPU time waking up a large number of readers.
>
> This has never been a problem before.
>

In real workloads, you seldom have more than a couple dozen readers
waiting in the queue.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue
  2019-04-10 17:31         ` Davidlohr Bueso
@ 2019-04-10 17:54           ` Waiman Long
  0 siblings, 0 replies; 30+ messages in thread
From: Waiman Long @ 2019-04-10 17:54 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	linux-kernel, x86, Linus Torvalds, Tim Chen

On 04/10/2019 01:31 PM, Davidlohr Bueso wrote:
> On Wed, 10 Apr 2019, Bueso wrote:
>
>> On Wed, 10 Apr 2019, Waiman Long wrote:
>>> 2) I want to avoid the extreme case where there are more than 32k
>>> readers in the wait queue, which would make the count overflow.
>>
>> But in this case the readers are already on the queue.
>
> Never mind this.
>
> But the limit could still be enlarged just to prevent overflows in the
> case
> where there are only readers.

The limit is kind of arbitrary. We can certainly increase it if 256
seems too low.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 03/12] locking/rwsem: Remove rwsem_wake() wakeup optimization
  2019-04-05 19:21 ` [PATCH-tip v2 03/12] locking/rwsem: Remove rwsem_wake() wakeup optimization Waiman Long
@ 2019-04-10 18:38   ` Davidlohr Bueso
  0 siblings, 0 replies; 30+ messages in thread
From: Davidlohr Bueso @ 2019-04-10 18:38 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Thomas Gleixner,
	linux-kernel, x86, Linus Torvalds, Tim Chen

On Fri, 05 Apr 2019, Waiman Long wrote:

>With commit 59aabfc7e959 ("locking/rwsem: Reduce spinlock contention
>in wakeup after up_read()/up_write()"), rwsem_wake() forgoes doing
>a wakeup if the wait_lock cannot be directly acquired and an optimistic
>spinning locker is present.  This can help performance by avoiding
>spinning on the wait_lock when it is contended.
>
>With the later commit 133e89ef5ef3 ("locking/rwsem: Enable lockless
>waiter wakeup(s)"), the performance advantage of the above optimization
>diminishes as the average wait_lock hold time becomes much shorter.
>
>By supporting rwsem lock handoff, we can no longer rely on the fact
>that the presence of an optimistic spinning locker will ensure that the
>lock will be acquired by a task soon. This can lead to missed wakeups
>and application hangs. So commit 59aabfc7e959 ("locking/rwsem:
>Reduce spinlock contention in wakeup after up_read()/up_write()")
>will have to be reverted.
>
>Signed-off-by: Waiman Long <longman@redhat.com>
>---
> kernel/locking/rwsem-xadd.c | 74 -------------------------------------

I very much like removing this code. It was rather crazy.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
  2019-04-05 19:21 ` [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
  2019-04-10 15:07   ` Peter Zijlstra
  2019-04-10 15:10   ` Peter Zijlstra
@ 2019-04-10 18:44   ` Peter Zijlstra
  2019-04-11  2:25     ` Waiman Long
  2 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2019-04-10 18:44 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, x86,
	Davidlohr Bueso, Linus Torvalds, Tim Chen

On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
> Because of writer lock stealing, it is possible that a constant
> stream of incoming writers will cause a waiting writer or reader to
> wait indefinitely, leading to lock starvation.
> 
> The mutex code has a lock handoff mechanism to prevent lock starvation.
> This patch implements a similar lock handoff mechanism to disable
> lock stealing and force lock handoff to the first waiter in the queue
> after at least a 5ms waiting period. The waiting period is used to
> avoid discouraging lock stealing so much that performance suffers.

I would say the handoff is not at all similar to the mutex code. It is
in fact radically different.

> @@ -131,6 +138,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>  		adjustment = RWSEM_READER_BIAS;
>  		oldcount = atomic_long_fetch_add(adjustment, &sem->count);
>  		if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
> +			/*
> +			 * Initiate handoff to reader, if applicable.
> +			 */
> +			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
> +			    time_after(jiffies, waiter->timeout)) {
> +				adjustment -= RWSEM_FLAG_HANDOFF;
> +				lockevent_inc(rwsem_rlock_handoff);
> +			}
> +
>  			atomic_long_sub(adjustment, &sem->count);
>  			return;
>  		}

That confuses the heck out of me...

The above seems to rely on __rwsem_mark_wake() being fully serialized
(and it is, by ->wait_lock, but that isn't spelled out anywhere) such
that we don't get a double increment of FLAG_HANDOFF.

So there is NO __rwsem_mark_wake() vs __rwsem_mark_wake() race like:

  CPU0					CPU1

  oldcount = atomic_long_fetch_add(adjustment, &sem->count)

					oldcount = atomic_long_fetch_add(adjustment, &sem->count)

  if (!(oldcount & HANDOFF))
    adjustment -= HANDOFF;

					if (!(oldcount & HANDOFF))
					  adjustment -= HANDOFF;
  atomic_long_sub(adjustment)
					atomic_long_sub(adjustment)


*whoops* double negative decrement of HANDOFF (aka double increment).


However there is another site that fiddles with the HANDOFF bit, namely
__rwsem_down_write_failed_common(), and that does:

+                               atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);

_OUTSIDE_ of ->wait_lock, which would yield:

  CPU0					CPU1

  oldcount = atomic_long_fetch_add(adjustment, &sem->count)

					atomic_long_or(HANDOFF)

  if (!(oldcount & HANDOFF))
    adjustment -= HANDOFF;

  atomic_long_sub(adjustment)

*whoops*, incremented HANDOFF on HANDOFF.


And there's not a comment in sight that would elucidate if this is
possible or not.


Also:

+                               atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
+                               first++;
+
+                               /*
+                                * Make sure the handoff bit is seen by
+                                * others before proceeding.
+                                */
+                               smp_mb__after_atomic();

That comment is utter nonsense. smp_mb() doesn't (and cannot) 'make
visible'. There needs to be order between two memops on both sides.
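
(For contrast, the classic store-buffering pattern is the kind of case
smp_mb() does order, precisely because each barrier sits between two
memops:)

	CPU0				CPU1

	WRITE_ONCE(x, 1);		WRITE_ONCE(y, 1);
	smp_mb();			smp_mb();
	r0 = READ_ONCE(y);		r1 = READ_ONCE(x);

	/* forbidden outcome: r0 == 0 && r1 == 0 */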


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
  2019-04-10 18:44   ` Peter Zijlstra
@ 2019-04-11  2:25     ` Waiman Long
  2019-04-11  7:16       ` Peter Zijlstra
  0 siblings, 1 reply; 30+ messages in thread
From: Waiman Long @ 2019-04-11  2:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, x86,
	Davidlohr Bueso, Linus Torvalds, Tim Chen

On 04/10/2019 02:44 PM, Peter Zijlstra wrote:
> On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
>> Because of writer lock stealing, it is possible that a constant
>> stream of incoming writers will cause a waiting writer or reader to
>> wait indefinitely, leading to lock starvation.
>>
>> The mutex code has a lock handoff mechanism to prevent lock starvation.
>> This patch implements a similar lock handoff mechanism to disable
>> lock stealing and force lock handoff to the first waiter in the queue
>> after at least a 5ms waiting period. The waiting period is used to
>> avoid discouraging lock stealing so much that performance suffers.
> I would say the handoff is not at all similar to the mutex code. It is
> in fact radically different.
>

I mean they are similar in concept. Of course, the implementations are
quite different.

>> @@ -131,6 +138,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>>  		adjustment = RWSEM_READER_BIAS;
>>  		oldcount = atomic_long_fetch_add(adjustment, &sem->count);
>>  		if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
>> +			/*
>> +			 * Initiate handoff to reader, if applicable.
>> +			 */
>> +			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
>> +			    time_after(jiffies, waiter->timeout)) {
>> +				adjustment -= RWSEM_FLAG_HANDOFF;
>> +				lockevent_inc(rwsem_rlock_handoff);
>> +			}
>> +
>>  			atomic_long_sub(adjustment, &sem->count);
>>  			return;
>>  		}
> That confuses the heck out of me...
>
> The above seems to rely on __rwsem_mark_wake() being fully serialized
> (and it is, by ->wait_lock, but that isn't spelled out anywhere) such
> that we don't get a double increment of FLAG_HANDOFF.
>
> So there is NO __rwsem_mark_wake() vs __rwsem_mark_wake() race like:
>
>   CPU0					CPU1
>
>   oldcount = atomic_long_fetch_add(adjustment, &sem->count)
>
> 					oldcount = atomic_long_fetch_add(adjustment, &sem->count)
>
>   if (!(oldcount & HANDOFF))
>     adjustment -= HANDOFF;
>
> 					if (!(oldcount & HANDOFF))
> 					  adjustment -= HANDOFF;
>   atomic_long_sub(adjustment)
> 					atomic_long_sub(adjustment)
>
>
> *whoops* double negative decrement of HANDOFF (aka double increment).

Yes, __rwsem_mark_wake() is always called with wait_lock held. I can add
a lockdep_assert() statement to clarify this point.
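
Something like the following at the top of __rwsem_mark_wake(),
presumably (a one-line sketch):

	lockdep_assert_held(&sem->wait_lock);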

>
> However there is another site that fiddles with the HANDOFF bit, namely
> __rwsem_down_write_failed_common(), and that does:
>
> +                               atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
>
> _OUTSIDE_ of ->wait_lock, which would yield:
>
>   CPU0					CPU1
>
>   oldcount = atomic_long_fetch_add(adjustment, &sem->count)
>
> 					atomic_long_or(HANDOFF)
>
>   if (!(oldcount & HANDOFF))
>     adjustment -= HANDOFF;
>
>   atomic_long_sub(adjustment)
>
> *whoops*, incremented HANDOFF on HANDOFF.
>
>
> And there's not a comment in sight that would elucidate if this is
> possible or not.
>

A writer can only set the handoff bit if it is the first waiter in the
queue. If it is the first waiter, a racing __rwsem_mark_wake() will see
that the first waiter is a writer and so won't go into the reader path.
I know I sometimes don't spell out all the conditions that may look
obvious to me but not to others. I will elaborate more in comments.
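
For instance, a comment along these lines (a sketch of what such an
elaboration could say):

	/*
	 * Only the first waiter in the queue can set RWSEM_FLAG_HANDOFF.
	 * When that waiter is a writer, a racing __rwsem_mark_wake()
	 * sees a writer at the head of the queue and never takes the
	 * reader path, so the two HANDOFF updates cannot race.
	 */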

> Also:
>
> +                               atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
> +                               first++;
> +
> +                               /*
> +                                * Make sure the handoff bit is seen by
> +                                * others before proceeding.
> +                                */
> +                               smp_mb__after_atomic();
>
> That comment is utter nonsense. smp_mb() doesn't (and cannot) 'make
> visible'. There needs to be order between two memops on both sides.
>
I kind of added that for safety. I will take some time to rethink
whether it is really necessary.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation
  2019-04-11  2:25     ` Waiman Long
@ 2019-04-11  7:16       ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2019-04-11  7:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Thomas Gleixner, linux-kernel, x86,
	Davidlohr Bueso, Linus Torvalds, Tim Chen

On Wed, Apr 10, 2019 at 10:25:16PM -0400, Waiman Long wrote:
> On 04/10/2019 02:44 PM, Peter Zijlstra wrote:

> > However there is another site that fiddles with the HANDOFF bit, namely
> > __rwsem_down_write_failed_common(), and that does:
> >
> > +                               atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
> >
> > _OUTSIDE_ of ->wait_lock, which would yield:
> >
> >   CPU0					CPU1
> >
> >   oldcount = atomic_long_fetch_add(adjustment, &sem->count)
> >
> > 					atomic_long_or(HANDOFF)
> >
> >   if (!(oldcount & HANDOFF))
> >     adjustment -= HANDOFF;
> >
> >   atomic_long_sub(adjustment)
> >
> > *whoops*, incremented HANDOFF on HANDOFF.
> >
> >
> > And there's not a comment in sight that would elucidate if this is
> > possible or not.
> >
> 
> A writer can only set the handoff bit if it is the first waiter in the
> queue. If it is the first waiter, a racing __rwsem_mark_wake() will see
> that the first waiter is a writer and so won't go into the reader path.
> I know I sometimes don't spell out all the conditions that may look
> obvious to me but not to others. I will elaborate more in comments.

Aah, indeed.



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-04-11  7:16 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-05 19:21 [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 01/12] locking/rwsem: Implement a new locking scheme Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
2019-04-10 15:07   ` Peter Zijlstra
2019-04-10 15:28     ` Waiman Long
2019-04-10 15:10   ` Peter Zijlstra
2019-04-10 15:29     ` Waiman Long
2019-04-10 18:44   ` Peter Zijlstra
2019-04-11  2:25     ` Waiman Long
2019-04-11  7:16       ` Peter Zijlstra
2019-04-05 19:21 ` [PATCH-tip v2 03/12] locking/rwsem: Remove rwsem_wake() wakeup optimization Waiman Long
2019-04-10 18:38   ` Davidlohr Bueso
2019-04-05 19:21 ` [PATCH-tip v2 04/12] locking/rwsem: Make rwsem_spin_on_owner() return owner state Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 05/12] locking/rwsem: Ensure an RT task will not spin on reader Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 06/12] locking/rwsem: Wake up almost all readers in wait queue Waiman Long
2019-04-10 16:50   ` Davidlohr Bueso
2019-04-10 17:08     ` Waiman Long
2019-04-10 17:22       ` Davidlohr Bueso
2019-04-10 17:31         ` Davidlohr Bueso
2019-04-10 17:54           ` Waiman Long
2019-04-10 17:53         ` Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 07/12] locking/rwsem: Enable readers spinning on writer Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 08/12] locking/rwsem: Enable time-based spinning on reader-owned rwsem Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 09/12] locking/rwsem: Add more rwsem owner access helpers Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 10/12] locking/rwsem: Guard against making count negative Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 11/12] locking/rwsem: Merge owner into count on x86-64 Waiman Long
2019-04-05 19:21 ` [PATCH-tip v2 12/12] locking/rwsem: Remove redundant computation of writer lock word Waiman Long
2019-04-05 23:27 ` [PATCH-tip v2 00/12] locking/rwsem: Rwsem rearchitecture part 2 Linus Torvalds
2019-04-10 10:00 ` Ingo Molnar
2019-04-10 12:38   ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).