* [RFC PATCH-tip/locking/core v3 00/10]  locking/rwsem: Enable reader optimistic spinning
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

v2->v3:
 - Used smp_acquire__after_ctrl_dep() to provide acquire barrier.
 - Added the following new patches:
   1) make rwsem_spin_on_owner() return a tri-state value.
   2) reactivate reader spinning when there is a large number of
      successful writer-on-writer spins.
   3) move all the rwsem macros in arch-specific rwsem.h files
      into a common asm-generic/rwsem_types.h file.
   4) add a boot parameter to specify the reader spinning threshold.
 - Updated some of the patches as suggested by PeterZ and adjusted
   some of the reader spinning parameters.

v1->v2:
 - Fixed a 0day build error.
 - Added a new patch 1 to make osq_lock() provide a proper acquire
   memory barrier.
 - Replaced the explicit enabling of reader spinning with an autotuning
   mechanism that disables reader spinning for those rwsems that may
   not benefit from it.
 - Removed the last xfs patch as it is no longer necessary.

This patchset enables more aggressive optimistic spinning on both
readers and writers waiting on a writer-owned or reader-owned
lock. Spinning on a writer is done by looking at the on_cpu flag of
the lock owner. Spinning on readers, on the other hand, is count-based
as there is no easy way to figure out if all the readers are
running. The spinner will stop spinning once the count goes to
0. Because of that, spinning on readers may hurt performance in some
cases.

An autotuning mechanism is used to determine if an rwsem can benefit
from reader optimistic spinning. It will maintain reader spinning as
long as at least two-thirds of the spins are successful.
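
The bookkeeping behind that ratio, condensed from patch 4 (a sketch of
the rule, not the exact kernel code):

  /*
   * Return the new rspin_enabled value after one writer-on-reader
   * spin attempt: +1 on success (capped), -2 on failure (floored
   * at 0, which disables reader spinning).
   */
  static int rspin_update(int rspin_enabled, int taken)
  {
	if (taken)
		return rspin_enabled < RWSEM_RSPIN_ENABLED_MAX
		     ? rspin_enabled + 1 : rspin_enabled;
	return rspin_enabled > 2 ? rspin_enabled - 2 : 0;
  }

The count stays above zero only while successful spins outnumber
failed ones by roughly two to one, i.e. while at least two-thirds of
the spins succeed.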

Patch 1 updates the osq_lock() function to make it a proper acquire
memory barrier.

Patch 2 reduces the length of the blocking window after a read locking
attempt where writer lock stealing is disabled because of the active
read lock. It can improve rwsem performance for contended locks. It is
independent of the rest of the patchset.

Patch 3 modifies rwsem_spin_on_owner() to return a tri-state value
that will be used in later patches.

Patch 4 puts in place the autotuning mechanism to check if reader
optimistic spinning should be used or not.

Patch 5 moves down the rwsem_down_read_failed() function for later
patches.

Patch 6 moves the macro definitions in various arch-specific rwsem.h
header files into a common asm-generic/rwsem_types.h file.

Patch 7 changes RWSEM_WAITING_BIAS to simplify the reader trylock code.

Patch 8 enables readers to do optimistic spinning.

Patch 9 allows reactivation of reader spinning when a lot of
writer-on-writer spins are successful.

Patch 10 adds a new boot parameter to change the reader spinning
threshold, which can be system-specific.

Waiman Long (10):
  locking/osq: Make lock/unlock proper acquire/release barrier
  locking/rwsem: Stop active read lock ASAP
  locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value
  locking/rwsem: Enable count-based spinning on reader
  locking/rwsem: move down rwsem_down_read_failed function
  locking/rwsem: Move common rwsem macros to asm-generic/rwsem_types.h
  locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation
  locking/rwsem: Enable spinning readers
  locking/rwsem: Enable reactivation of reader spinning
  locking/rwsem: Add a boot parameter to reader spinning threshold

 Documentation/kernel-parameters.txt |    3 +
 arch/alpha/include/asm/rwsem.h      |   11 +-
 arch/ia64/include/asm/rwsem.h       |    9 +-
 arch/s390/include/asm/rwsem.h       |    9 +-
 arch/x86/include/asm/rwsem.h        |   22 +---
 include/asm-generic/rwsem.h         |   20 +--
 include/asm-generic/rwsem_types.h   |   28 ++++
 include/linux/rwsem.h               |   23 +++-
 kernel/locking/osq_lock.c           |    7 +-
 kernel/locking/rwsem-xadd.c         |  296 ++++++++++++++++++++++++++---------
 10 files changed, 296 insertions(+), 132 deletions(-)
 create mode 100644 include/asm-generic/rwsem_types.h

* [RFC PATCH-tip/locking/core v3 01/10] locking/osq: Make lock/unlock proper acquire/release barrier
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

The osq_lock() and osq_unlock() functions may not provide the
necessary acquire and release barriers in some cases. This patch makes
sure that the proper barriers are provided when osq_lock() is
successful or when osq_unlock() is called.

The change on the unlock side is more for documentation purposes than
actually needed.
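
For illustration, here is the same acquire/release pairing expressed
with C11 atomics (a stand-alone analogy with made-up variable names,
not the kernel primitives). Without the acquire load in the consumer
pairing with the release store in the producer, the read of "data"
could observe a stale value -- which is exactly the ordering that
osq_lock()/osq_unlock() were not guaranteeing:

  #include <assert.h>
  #include <pthread.h>
  #include <stdatomic.h>

  static _Atomic int locked;
  static int data;

  static void *producer(void *arg)	/* plays the role of osq_unlock() */
  {
	(void)arg;
	data = 42;		/* store inside the critical section */
	atomic_store_explicit(&locked, 1, memory_order_release);
	return NULL;
  }

  static void *consumer(void *arg)	/* plays the role of osq_lock() */
  {
	(void)arg;
	while (!atomic_load_explicit(&locked, memory_order_acquire))
		;		/* spin until the lock is handed over */
	assert(data == 42);	/* guaranteed only by the pairing above */
	return NULL;
  }

  int main(void)
  {
	pthread_t p, c;

	pthread_create(&c, NULL, consumer, NULL);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return 0;
  }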

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 kernel/locking/osq_lock.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..d957b90 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -124,6 +124,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 
 		cpu_relax_lowlatency();
 	}
+	/*
+	 * Add an acquire memory barrier for pairing with the release barrier
+	 * in unlock.
+	 */
+	smp_acquire__after_ctrl_dep();
 	return true;
 
 unqueue:
@@ -198,7 +203,7 @@ void osq_unlock(struct optimistic_spin_queue *lock)
 	 * Second most likely case.
 	 */
 	node = this_cpu_ptr(&osq_node);
-	next = xchg(&node->next, NULL);
+	next = xchg_release(&node->next, NULL);
 	if (next) {
 		WRITE_ONCE(next->locked, 1);
 		return;
-- 
1.7.1

* [RFC PATCH-tip/locking/core v3 02/10] locking/rwsem: Stop active read lock ASAP
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

Currently, when down_read() fails, the active read locking isn't undone
until the rwsem_down_read_failed() function grabs the wait_lock. If the
wait_lock is contended, it may take a while to get the lock. During
that period, writer lock stealing will be disabled because of the
active read lock.

This patch will release the active read lock ASAP so that writer lock
stealing can happen sooner. The only downside is when the reader is
the first one in the wait queue as it has to issue another atomic
operation to update the count.

On a 4-socket Haswell machine running a 4.7-rc1 tip-based kernel,
fio tests with multithreaded randrw and randwrite workloads on the
same file on an XFS partition on top of an NVDIMM with DAX were run.
The aggregated bandwidths before and after the patch were as follows:

  Test      BW before patch     BW after patch  % change
  ----      ---------------     --------------  --------
  randrw        1210 MB/s          1352 MB/s      +12%
  randwrite     1622 MB/s          1710 MB/s      +5.4%

The write-only microbenchmark also showed improvement because some
read locking was done by the XFS code.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 kernel/locking/rwsem-xadd.c |   19 ++++++++++++++-----
 1 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 2031281..29027c6 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -230,11 +230,18 @@ __rwsem_mark_wake(struct rw_semaphore *sem,
 __visible
 struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 {
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	long count, adjustment = 0;
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk = current;
 	WAKE_Q(wake_q);
 
+	/*
+	 * Undo read bias from down_read operation, stop active locking.
+	 * Doing that after taking the wait_lock may block writer lock
+	 * stealing for too long.
+	 */
+	atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
+
 	/* set up my own style of waitqueue */
 	waiter.task = tsk;
 	waiter.type = RWSEM_WAITING_FOR_READ;
@@ -244,8 +251,11 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 		adjustment += RWSEM_WAITING_BIAS;
 	list_add_tail(&waiter.list, &sem->wait_list);
 
-	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
+	/* we're now waiting on the lock */
+	if (adjustment)
+		count = atomic_long_add_return(adjustment, &sem->count);
+	else
+		count = atomic_long_read(&sem->count);
 
 	/* If there are no active locks, wake the front queued process(es).
 	 *
@@ -253,8 +263,7 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 	 * wake our own waiter to join the existing active readers !
 	 */
 	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+	    (count > RWSEM_WAITING_BIAS && adjustment))
 		sem = __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
 	raw_spin_unlock_irq(&sem->wait_lock);
-- 
1.7.1

* [RFC PATCH-tip/locking/core v3 03/10] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

This patch modifies rwsem_spin_on_owner() to return a tri-state value
to better reflect the state of the lock holder, which enables us to
make a better decision about what to do next.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 kernel/locking/rwsem-xadd.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 29027c6..198b732 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -360,9 +360,13 @@ done:
 }
 
 /*
- * Return true only if we can still spin on the owner field of the rwsem.
+ * Return the following three values depending on the lock owner state.
+ *   1	when owner has changed and no reader is detected yet.
+ *   0	when owner has changed and/or owner is a reader.
+ *  -1	when optimistic spinning has to stop because either the owner stops
+ *	running or its timeslice has been used up.
  */
-static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
+static noinline int rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner = READ_ONCE(sem->owner);
 
@@ -382,7 +386,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 		/* abort spinning when need_resched or owner is not running */
 		if (!owner->on_cpu || need_resched()) {
 			rcu_read_unlock();
-			return false;
+			return -1;
 		}
 
 		cpu_relax_lowlatency();
@@ -393,7 +397,7 @@ out:
 	 * If there is a new owner or the owner is not set, we continue
 	 * spinning.
 	 */
-	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
+	return rwsem_owner_is_reader(READ_ONCE(sem->owner)) ? 0 : 1;
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
@@ -416,7 +420,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	 *  2) readers own the lock as we can't determine if they are
 	 *     actively running or not.
 	 */
-	while (rwsem_spin_on_owner(sem)) {
+	while (rwsem_spin_on_owner(sem) > 0) {
 		/*
 		 * Try to acquire the lock
 		 */
-- 
1.7.1

* [RFC PATCH-tip/locking/core v3 04/10] locking/rwsem: Enable count-based spinning on reader
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

When the rwsem is owned by readers, writers stop optimistic spinning
simply because there is no easy way to figure out if all the readers
are actively running or not. However, there are scenarios where
the readers are unlikely to sleep and optimistic spinning can help
performance.

This patch provides an autotuning mechanism to find out if a rwsem
can benefit from count-based reader optimistic spinning. A count
(rspin_enabled) in the rwsem data structure is used to track if
optimistic spinning should be enabled. Reader spinning is enabled
by default. Each successful spin (with lock acquisition) will
increment the count by 1 and each unsuccessful spin will decrement
it by 2. When the count reaches 0, reader spinning is disabled.
Modification of that count is protected by the osq lock. Therefore,
reader spinning will be maintained as long as at least two-thirds of
the spins are successful.

Both the spinning threshold and the default value for rspin_enabled
can be overridden by the architecture-specific rwsem.h header files.
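
Since both macros are guarded by #ifndef, an architecture could
override them from its own rwsem.h, e.g. (hypothetical values, for
illustration only):

  /* arch/foo/include/asm/rwsem.h */
  #define RWSEM_RSPIN_ENABLED_DEFAULT	100	/* start more optimistic */
  #define RWSEM_RSPIN_THRESHOLD		(1 << 14)  /* spin longer on readers */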

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 include/linux/rwsem.h       |   19 +++++++++++-
 kernel/locking/rwsem-xadd.c |   66 ++++++++++++++++++++++++++++++++++++++----
 2 files changed, 77 insertions(+), 8 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index dd1d142..8978f87 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -32,6 +32,8 @@ struct rw_semaphore {
 	raw_spinlock_t wait_lock;
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* spinner MCS lock */
+	int rspin_enabled;	/* protected by osq lock */
+
 	/*
 	 * Write owner. Used as a speculative check to see
 	 * if the owner is running on the cpu.
@@ -69,8 +71,23 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 # define __RWSEM_DEP_MAP_INIT(lockname)
 #endif
 
+/*
+ * Each successful reader spin will increment the rspin_enabled by 1.
+ * Each unsuccessful spin, on the other hand, will decrement it by 2.
+ * Reader spinning will be permanently disabled when it reaches 0.
+ */
+#ifndef RWSEM_RSPIN_ENABLED_DEFAULT
+# define RWSEM_RSPIN_ENABLED_DEFAULT	40
+#endif
+#define RWSEM_RSPIN_ENABLED_MAX		1024
+
+#ifndef RWSEM_RSPIN_THRESHOLD
+# define RWSEM_RSPIN_THRESHOLD	(1 << 12)
+#endif
+
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
-#define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL
+#define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL, \
+		.rspin_enabled = RWSEM_RSPIN_ENABLED_DEFAULT
 #else
 #define __RWSEM_OPT_INIT(lockname)
 #endif
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 198b732..ce68b54 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -85,6 +85,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 	INIT_LIST_HEAD(&sem->wait_list);
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	sem->owner = NULL;
+	sem->rspin_enabled = RWSEM_RSPIN_ENABLED_DEFAULT;
 	osq_lock_init(&sem->osq);
 #endif
 }
@@ -347,9 +348,10 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	owner = READ_ONCE(sem->owner);
 	if (!rwsem_owner_is_writer(owner)) {
 		/*
-		 * Don't spin if the rwsem is readers owned.
+		 * Don't spin if the rwsem is reader owned and
+		 * reader spinning isn't enabled.
 		 */
-		ret = !rwsem_owner_is_reader(owner);
+		ret = !rwsem_owner_is_reader(owner) || sem->rspin_enabled;
 		goto done;
 	}
 
@@ -403,6 +405,8 @@ out:
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
 	bool taken = false;
+	int owner_state;	/* Lock owner state */
+	int rspin_cnt;		/* Count for reader spinning */
 
 	preempt_disable();
 
@@ -413,14 +417,16 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	if (!osq_lock(&sem->osq))
 		goto done;
 
+	rspin_cnt = sem->rspin_enabled ? RWSEM_RSPIN_THRESHOLD : 0;
+
 	/*
 	 * Optimistically spin on the owner field and attempt to acquire the
 	 * lock whenever the owner changes. Spinning will be stopped when:
-	 *  1) the owning writer isn't running; or
-	 *  2) readers own the lock as we can't determine if they are
-	 *     actively running or not.
+	 *  1) the owning writer isn't running,
+	 *  2) readers own the lock and reader spinning count has reached 0; or
+	 *  3) its timeslice has been used up.
 	 */
-	while (rwsem_spin_on_owner(sem) > 0) {
+	while ((owner_state = rwsem_spin_on_owner(sem)) >= 0) {
 		/*
 		 * Try to acquire the lock
 		 */
@@ -430,12 +436,24 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		}
 
 		/*
+		 * We only decrement the rspin_cnt when the lock is owned
+		 * by readers (owner_state == 0), in which case
+		 * rwsem_spin_on_owner() will essentially be a no-op
+		 * and we will be spinning in this main loop.
+		 */
+		if (owner_state == 0) {
+			if (!rspin_cnt)
+				break;
+			rspin_cnt--;
+		}
+
+		/*
 		 * When there's no owner, we might have preempted between the
 		 * owner acquiring the lock and setting the owner field. If
 		 * we're an RT task that will live-lock because we won't let
 		 * the owner complete.
 		 */
-		if (!sem->owner && (need_resched() || rt_task(current)))
+		if (!sem->owner && rt_task(current))
 			break;
 
 		/*
@@ -446,6 +464,28 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 */
 		cpu_relax_lowlatency();
 	}
+	/*
+	 * Check the success or failure of writer spinning on reader so as
+	 * to adjust the rspin_enabled count accordingly.
+	 */
+	if (rwsem_owner_is_reader(sem->owner)) {
+		/*
+		 * Update rspin_enabled for reader spinning.
+		 *
+		 * Right now, we need more than 2/3 successful spins to
+		 * maintain reader spinning. We will disable it if we don't
+		 * have enough successful spins. The decrement amount is kind
+		 * of arbitrary and can be adjusted if necessary.
+		 */
+		if (taken && (sem->rspin_enabled < RWSEM_RSPIN_ENABLED_MAX)) {
+			sem->rspin_enabled++;
+		} else if (!taken) {
+			if  (sem->rspin_enabled > 2)
+				sem->rspin_enabled -= 2;
+			else
+				sem->rspin_enabled = 0;
+		}
+	}
 	osq_unlock(&sem->osq);
 done:
 	preempt_enable();
@@ -460,6 +500,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	return osq_is_locked(&sem->osq);
 }
 
+/*
+ * Return true if reader optimistic spinning is enabled
+ */
+static inline bool reader_spinning_enabled(struct rw_semaphore *sem)
+{
+	return sem->rspin_enabled;
+}
 #else
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
@@ -470,6 +517,11 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 {
 	return false;
 }
+
+static inline bool reader_spinning_enabled(struct rw_semaphore *sem)
+{
+	return false;
+}
 #endif
 
 /*
-- 
1.7.1

* [RFC PATCH-tip/locking/core v3 05/10] locking/rwsem: move down rwsem_down_read_failed function
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

Move the rwsem_down_read_failed() function down below the optimistic
spinning section in preparation for enabling optimistic spinning for
readers. This is because rwsem_down_read_failed() will call
rwsem_optimistic_spin() in a later patch.

The code itself is unchanged.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 kernel/locking/rwsem-xadd.c |  116 +++++++++++++++++++++---------------------
 1 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index ce68b54..5fd689e 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -226,64 +226,6 @@ __rwsem_mark_wake(struct rw_semaphore *sem,
 }
 
 /*
- * Wait for the read lock to be granted
- */
-__visible
-struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
-{
-	long count, adjustment = 0;
-	struct rwsem_waiter waiter;
-	struct task_struct *tsk = current;
-	WAKE_Q(wake_q);
-
-	/*
-	 * Undo read bias from down_read operation, stop active locking.
-	 * Doing that after taking the wait_lock may block writer lock
-	 * stealing for too long.
-	 */
-	atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
-
-	/* set up my own style of waitqueue */
-	waiter.task = tsk;
-	waiter.type = RWSEM_WAITING_FOR_READ;
-
-	raw_spin_lock_irq(&sem->wait_lock);
-	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
-	list_add_tail(&waiter.list, &sem->wait_list);
-
-	/* we're now waiting on the lock */
-	if (adjustment)
-		count = atomic_long_add_return(adjustment, &sem->count);
-	else
-		count = atomic_long_read(&sem->count);
-
-	/* If there are no active locks, wake the front queued process(es).
-	 *
-	 * If there are no writers and we are first in the queue,
-	 * wake our own waiter to join the existing active readers !
-	 */
-	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS && adjustment))
-		sem = __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
-
-	raw_spin_unlock_irq(&sem->wait_lock);
-	wake_up_q(&wake_q);
-
-	/* wait to be given the lock */
-	while (true) {
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-		if (!waiter.task)
-			break;
-		schedule();
-	}
-
-	__set_task_state(tsk, TASK_RUNNING);
-	return sem;
-}
-EXPORT_SYMBOL(rwsem_down_read_failed);
-
-/*
  * This function must be called with the sem->wait_lock held to prevent
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
@@ -525,6 +467,64 @@ static inline bool reader_spinning_enabled(struct rw_semaphore *sem)
 #endif
 
 /*
+ * Wait for the read lock to be granted
+ */
+__visible
+struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
+{
+	long count, adjustment = 0;
+	struct rwsem_waiter waiter;
+	struct task_struct *tsk = current;
+	WAKE_Q(wake_q);
+
+	/*
+	 * Undo read bias from down_read operation, stop active locking.
+	 * Doing that after taking the wait_lock may block writer lock
+	 * stealing for too long.
+	 */
+	atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
+
+	/* set up my own style of waitqueue */
+	waiter.task = tsk;
+	waiter.type = RWSEM_WAITING_FOR_READ;
+
+	raw_spin_lock_irq(&sem->wait_lock);
+	if (list_empty(&sem->wait_list))
+		adjustment += RWSEM_WAITING_BIAS;
+	list_add_tail(&waiter.list, &sem->wait_list);
+
+	/* we're now waiting on the lock */
+	if (adjustment)
+		count = atomic_long_add_return(adjustment, &sem->count);
+	else
+		count = atomic_long_read(&sem->count);
+
+	/* If there are no active locks, wake the front queued process(es).
+	 *
+	 * If there are no writers and we are first in the queue,
+	 * wake our own waiter to join the existing active readers !
+	 */
+	if (count == RWSEM_WAITING_BIAS ||
+	    (count > RWSEM_WAITING_BIAS && adjustment))
+		sem = __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+
+	raw_spin_unlock_irq(&sem->wait_lock);
+	wake_up_q(&wake_q);
+
+	/* wait to be given the lock */
+	while (true) {
+		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		if (!waiter.task)
+			break;
+		schedule();
+	}
+
+	__set_task_state(tsk, TASK_RUNNING);
+	return sem;
+}
+EXPORT_SYMBOL(rwsem_down_read_failed);
+
+/*
  * Wait until we successfully acquire the write lock
  */
 static inline struct rw_semaphore *
-- 
1.7.1

* [RFC PATCH-tip/locking/core v3 06/10] locking/rwsem: Move common rwsem macros to asm-generic/rwsem_types.h
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

Almost all the macro definitions in the various architecture specific
rwsem.h header files are essentially the same. This patch moves all
of them into a common header asm-generic/rwsem_types.h to eliminate
the duplication.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 arch/alpha/include/asm/rwsem.h    |    8 +-------
 arch/ia64/include/asm/rwsem.h     |    7 ++-----
 arch/s390/include/asm/rwsem.h     |    7 +------
 arch/x86/include/asm/rwsem.h      |   19 +------------------
 include/asm-generic/rwsem.h       |   16 +---------------
 include/asm-generic/rwsem_types.h |   26 ++++++++++++++++++++++++++
 6 files changed, 32 insertions(+), 51 deletions(-)
 create mode 100644 include/asm-generic/rwsem_types.h

diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
index 77873d0..f99e39a 100644
--- a/arch/alpha/include/asm/rwsem.h
+++ b/arch/alpha/include/asm/rwsem.h
@@ -13,13 +13,7 @@
 #ifdef __KERNEL__
 
 #include <linux/compiler.h>
-
-#define RWSEM_UNLOCKED_VALUE		0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS		0x0000000000000001L
-#define RWSEM_ACTIVE_MASK		0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS		(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#include <asm-generic/rwsem_types.h>
 
 static inline void __down_read(struct rw_semaphore *sem)
 {
diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
index 8fa98dd..21a9066 100644
--- a/arch/ia64/include/asm/rwsem.h
+++ b/arch/ia64/include/asm/rwsem.h
@@ -26,13 +26,10 @@
 #endif
 
 #include <asm/intrinsics.h>
+#include <asm-generic/rwsem_types.h>
 
+#undef  RWSEM_UNLOCKED_VALUE
 #define RWSEM_UNLOCKED_VALUE		__IA64_UL_CONST(0x0000000000000000)
-#define RWSEM_ACTIVE_BIAS		(1L)
-#define RWSEM_ACTIVE_MASK		(0xffffffffL)
-#define RWSEM_WAITING_BIAS		(-0x100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
 
 /*
  * lock for reading
diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 597e7e9..13dedc8 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -39,12 +39,7 @@
 #error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
 #endif
 
-#define RWSEM_UNLOCKED_VALUE	0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS	0x0000000000000001L
-#define RWSEM_ACTIVE_MASK	0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS	(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS	RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#include <asm-generic/rwsem_types.h>
 
 /*
  * lock for reading
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index 089ced4..c6d155e 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -38,24 +38,7 @@
 
 #ifdef __KERNEL__
 #include <asm/asm.h>
-
-/*
- * The bias values and the counter type limits the number of
- * potential readers/writers to 32767 for 32 bits and 2147483647
- * for 64 bits.
- */
-
-#ifdef CONFIG_X86_64
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#include <asm-generic/rwsem_types.h>
 
 /*
  * lock for reading
diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index 5be122e..3cb8d98 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -12,21 +12,7 @@
  * Adapted largely from include/asm-i386/rwsem.h
  * by Paul Mackerras <paulus@samba.org>.
  */
-
-/*
- * the semaphore definition
- */
-#ifdef CONFIG_64BIT
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#include <asm-generic/rwsem_types.h>
 
 /*
  * lock for reading
diff --git a/include/asm-generic/rwsem_types.h b/include/asm-generic/rwsem_types.h
new file mode 100644
index 0000000..093ef6a
--- /dev/null
+++ b/include/asm-generic/rwsem_types.h
@@ -0,0 +1,26 @@
+#ifndef _ASM_GENERIC_RWSEM_TYPES_H
+#define _ASM_GENERIC_RWSEM_TYPES_H
+
+#ifdef __KERNEL__
+
+/*
+ * the semaphore definition
+ *
+ * The bias values and the counter type limits the number of
+ * potential readers/writers to 32767 for 32 bits and 2147483647
+ * for 64 bits.
+ */
+#ifdef CONFIG_64BIT
+# define RWSEM_ACTIVE_MASK		0xffffffffL
+#else
+# define RWSEM_ACTIVE_MASK		0x0000ffffL
+#endif
+
+#define RWSEM_UNLOCKED_VALUE		0x00000000L
+#define RWSEM_ACTIVE_BIAS		0x00000001L
+#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
+#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
+#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+
+#endif	/* __KERNEL__ */
+#endif	/* _ASM_GENERIC_RWSEM_TYPES_H */
-- 
1.7.1

* [RFC PATCH-tip/locking/core v3 07/10] locking/rwsem: Change RWSEM_WAITING_BIAS for better disambiguation
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

When the count value is between RWSEM_WAITING_BIAS and 0, there are
two possibilities: either a writer is present and there are no
waiters, or there are waiters and readers. There is no easy way to
know which is true unless the wait_lock is taken.

This patch changes RWSEM_WAITING_BIAS from 0xffff0000 (32-bit) or
0xffffffff00000000 (64-bit) to 0xc0000000 (32-bit) or
0xc000000000000000 (64-bit). By doing so, we will be able to determine
if writers are present by looking at the count value alone, without
taking the wait_lock.

This patch has the effect of halving the maximum number of writers
that can attempt to take the write lock simultaneously. However,
even the reduced maximum of about 16k (32-bit) or 1G (64-bit) should
be more than enough for the foreseeable future.

With that change, the following identity is now no longer true:

  RWSEM_ACTIVE_WRITE_BIAS = RWSEM_WAITING_BIAS + RWSEM_ACTIVE_READ_BIAS
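
As a sanity check of the new arithmetic, the writer-present test can
be exercised stand-alone (assumed 64-bit/LP64 values mirroring the
patch; count_has_writer() is the helper that patch 8 adds on top of
this change):

  #include <stdio.h>

  #define RWSEM_ACTIVE_MASK		0xffffffffL
  #define RWSEM_WAITING_BIAS		(-(1L << 62))
  #define RWSEM_ACTIVE_BIAS		0x00000001L
  #define RWSEM_ACTIVE_READ_BIAS	RWSEM_ACTIVE_BIAS
  #define RWSEM_ACTIVE_WRITE_BIAS	(-RWSEM_ACTIVE_MASK)

  static int count_has_writer(long count)
  {
	return (count < RWSEM_WAITING_BIAS) || ((count < 0) &&
	       (count > RWSEM_WAITING_BIAS - RWSEM_ACTIVE_WRITE_BIAS));
  }

  int main(void)
  {
	/* 1 writer, no waiters: prints 1 */
	printf("%d\n", count_has_writer(RWSEM_ACTIVE_WRITE_BIAS));
	/* 2 readers plus waiters, no writer: prints 0 */
	printf("%d\n", count_has_writer(RWSEM_WAITING_BIAS +
					2 * RWSEM_ACTIVE_READ_BIAS));
	/* 1 writer plus waiters: prints 1 */
	printf("%d\n", count_has_writer(RWSEM_WAITING_BIAS +
					RWSEM_ACTIVE_WRITE_BIAS));
	return 0;
  }

With the old WAITING_BIAS, the first two cases fell into the same
count range and could not be told apart without taking the wait_lock.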

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 arch/alpha/include/asm/rwsem.h    |    3 ++-
 arch/ia64/include/asm/rwsem.h     |    2 +-
 arch/s390/include/asm/rwsem.h     |    2 +-
 arch/x86/include/asm/rwsem.h      |    3 ++-
 include/asm-generic/rwsem.h       |    4 ++--
 include/asm-generic/rwsem_types.h |   10 ++++++----
 kernel/locking/rwsem-xadd.c       |   32 ++++++++++++++++++++++++--------
 7 files changed, 38 insertions(+), 18 deletions(-)

diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
index f99e39a..dc236a5 100644
--- a/arch/alpha/include/asm/rwsem.h
+++ b/arch/alpha/include/asm/rwsem.h
@@ -179,7 +179,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	"2:	br	1b\n"
 	".previous"
 	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (-RWSEM_WAITING_BIAS), "m" (sem->count) : "memory");
+	:"Ir" (-RWSEM_ACTIVE_WRITE_BIAS + RWSEM_ACTIVE_READ_BIAS),
+	 "m" (sem->count) : "memory");
 #endif
 	if (unlikely(oldcount < 0))
 		rwsem_downgrade_wake(sem);
diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
index 21a9066..ecea341 100644
--- a/arch/ia64/include/asm/rwsem.h
+++ b/arch/ia64/include/asm/rwsem.h
@@ -141,7 +141,7 @@ __downgrade_write (struct rw_semaphore *sem)
 
 	do {
 		old = atomic_long_read(&sem->count);
-		new = old - RWSEM_WAITING_BIAS;
+		new = old - RWSEM_ACTIVE_WRITE_BIAS + RWSEM_ACTIVE_READ_BIAS;
 	} while (atomic_long_cmpxchg_release(&sem->count, old, new) != old);
 
 	if (old < 0)
diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
index 13dedc8..e675a64 100644
--- a/arch/s390/include/asm/rwsem.h
+++ b/arch/s390/include/asm/rwsem.h
@@ -188,7 +188,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 {
 	signed long old, new, tmp;
 
-	tmp = -RWSEM_WAITING_BIAS;
+	tmp = -RWSEM_ACTIVE_WRITE_BIAS + RWSEM_ACTIVE_READ_BIAS;
 	asm volatile(
 		"	lg	%0,%2\n"
 		"0:	lgr	%1,%0\n"
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index c6d155e..ea36832 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -192,7 +192,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 		     "1:\n\t"
 		     "# ending __downgrade_write\n"
 		     : "+m" (sem->count)
-		     : "a" (sem), "er" (-RWSEM_WAITING_BIAS)
+		     : "a" (sem), "er" (-RWSEM_ACTIVE_WRITE_BIAS +
+					 RWSEM_ACTIVE_READ_BIAS)
 		     : "memory", "cc");
 }
 
diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
index 3cb8d98..962e75b 100644
--- a/include/asm-generic/rwsem.h
+++ b/include/asm-generic/rwsem.h
@@ -106,8 +106,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	 * read-locked region is ok to be re-ordered into the
 	 * write side. As such, rely on RELEASE semantics.
 	 */
-	tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS,
-				     (atomic_long_t *)&sem->count);
+	tmp = atomic_long_add_return_release(-RWSEM_ACTIVE_WRITE_BIAS +
+			RWSEM_ACTIVE_READ_BIAS, (atomic_long_t *)&sem->count);
 	if (tmp < 0)
 		rwsem_downgrade_wake(sem);
 }
diff --git a/include/asm-generic/rwsem_types.h b/include/asm-generic/rwsem_types.h
index 093ef6a..6d55d25 100644
--- a/include/asm-generic/rwsem_types.h
+++ b/include/asm-generic/rwsem_types.h
@@ -7,20 +7,22 @@
  * the semaphore definition
  *
  * The bias values and the counter type limits the number of
- * potential readers/writers to 32767 for 32 bits and 2147483647
- * for 64 bits.
+ * potential writers to 16383 for 32 bits and 1073741823 for 64 bits.
+ * The combined readers and writers can go up to 65534 for 32-bits and
+ * 4294967294 for 64-bits.
  */
 #ifdef CONFIG_64BIT
 # define RWSEM_ACTIVE_MASK		0xffffffffL
+# define RWSEM_WAITING_BIAS		(-(1L << 62))
 #else
 # define RWSEM_ACTIVE_MASK		0x0000ffffL
+# define RWSEM_WAITING_BIAS		(-(1L << 30))
 #endif
 
 #define RWSEM_UNLOCKED_VALUE		0x00000000L
 #define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
 #define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
+#define RWSEM_ACTIVE_WRITE_BIAS		(-RWSEM_ACTIVE_MASK)
 
 #endif	/* __KERNEL__ */
 #endif	/* _ASM_GENERIC_RWSEM_TYPES_H */
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 5fd689e..3330c0a 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -29,28 +29,30 @@
  * 0x00000000	rwsem is unlocked, and no one is waiting for the lock or
  *		attempting to read lock or write lock.
  *
- * 0xffff000X	(1) X readers active or attempting lock, with waiters for lock
+ * 0xc000000X	(1) X readers active or attempting lock, with waiters for lock
  *		    X = #active readers + # readers attempting lock
  *		    (X*ACTIVE_BIAS + WAITING_BIAS)
- *		(2) 1 writer attempting lock, no waiters for lock
+ *
+ * 0xffff000X	(1) 1 writer attempting lock, no waiters for lock
  *		    X-1 = #active readers + #readers attempting lock
  *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- *		(3) 1 writer active, no waiters for lock
+ *		(2) 1 writer active, no waiters for lock
  *		    X-1 = #active readers + #readers attempting lock
  *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
  *
- * 0xffff0001	(1) 1 reader active or attempting lock, waiters for lock
+ * 0xc0000001	(1) 1 reader active or attempting lock, waiters for lock
  *		    (WAITING_BIAS + ACTIVE_BIAS)
- *		(2) 1 writer active or attempting lock, no waiters for lock
+ *
+ * 0xffff0001	(1) 1 writer active or attempting lock, no waiters for lock
  *		    (ACTIVE_WRITE_BIAS)
  *
- * 0xffff0000	(1) There are writers or readers queued but none active
+ * 0xc0000000	(1) There are writers or readers queued but none active
  *		    or in the process of attempting lock.
  *		    (WAITING_BIAS)
  *		Note: writer can attempt to steal lock for this count by adding
  *		ACTIVE_WRITE_BIAS in cmpxchg and checking the old count
  *
- * 0xfffe0001	(1) 1 writer active, or attempting lock. Waiters on queue.
+ * 0xbfff0001	(1) 1 writer active, or attempting lock. Waiters on queue.
  *		    (ACTIVE_WRITE_BIAS + WAITING_BIAS)
  *
  * Note: Readers attempt to lock by adding ACTIVE_BIAS in down_read and checking
@@ -62,9 +64,23 @@
  *	 checking the count becomes ACTIVE_WRITE_BIAS for successful lock
  *	 acquisition (i.e. nobody else has lock or attempts lock).  If
  *	 unsuccessful, in rwsem_down_write_failed, we'll check to see if there
- *	 are only waiters but none active (5th case above), and attempt to
+ *	 are only waiters but none active (7th case above), and attempt to
  *	 steal the lock.
  *
+ *	 We can infer the reader/writer/waiter state of the lock by looking
+ *	 at the count value:
+ *	 (1) count > 0
+ *	     Only readers are present.
+ *	 (2) WAITING_BIAS - ACTIVE_WRITE_BIAS < count < 0
+ *	     Have writers, maybe readers, but no waiter
+ *	 (3) WAITING_BIAS < count <= WAITING_BIAS - ACTIVE_WRITE_BIAS
+ *	     Have readers and waiters, but no writer
+ *	 (4) count < WAITING_BIAS
+ *	     Have writers and waiters, maybe readers
+ *
+ *	 IOW, writers are present when
+ *	 (1) count < WAITING_BIAS, or
+ *	 (2) WAITING_BIAS - ACTIVE_WRITE_BIAS < count < 0
  */
 
 /*
-- 
1.7.1

* [RFC PATCH-tip/locking/core v3 08/10] locking/rwsem: Enable spinning readers
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

This patch enables readers to optimistically spin when reader
spinning is enabled for the rwsem (a non-zero rspin_enabled count).
Reader spinning should only stay enabled when the lock owners of the
rwsem are unlikely to go to sleep; otherwise it may make performance
worse in some cases.

On a 4-socket Haswell machine running a 4.7-rc1 tip-based kernel,
fio tests with multithreaded randrw and randwrite workloads on the
same file on an XFS partition on top of an NVDIMM with DAX were run.
The aggregated bandwidths before and after the reader optimistic
spinning patchset were as follows:

  Test      BW before patch     BW after patch  % change
  ----      ---------------     --------------  --------
  randrw        1352 MB/s          2164 MB/s      +60%
  randwrite     1710 MB/s          2550 MB/s      +49%

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 kernel/locking/rwsem-xadd.c |   48 ++++++++++++++++++++++++++++++++++++------
 1 files changed, 41 insertions(+), 7 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 3330c0a..42c8dda 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -83,6 +83,12 @@
  *	 (2) WAITING_BIAS - ACTIVE_WRITE_BIAS < count < 0
  */
 
+static inline bool count_has_writer(long count)
+{
+	return (count < RWSEM_WAITING_BIAS) || ((count < 0) &&
+	       (count > RWSEM_WAITING_BIAS - RWSEM_ACTIVE_WRITE_BIAS));
+}
+
 /*
  * Initialize an rwsem:
  */
@@ -294,6 +300,25 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 	}
 }
 
+/*
+ * Try to acquire read lock before the reader is put on wait queue
+ */
+static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
+{
+	long count = atomic_long_read(&sem->count);
+
+	if (count_has_writer(count))
+		return false;
+	count = atomic_long_add_return_acquire(RWSEM_ACTIVE_READ_BIAS,
+					       &sem->count);
+	if (!count_has_writer(count))
+		return true;
+
+	/* Back out the change */
+	atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
+	return false;
+}
+
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner;
@@ -360,7 +385,8 @@ out:
 	return rwsem_owner_is_reader(READ_ONCE(sem->owner)) ? 0 : 1;
 }
 
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+				  enum rwsem_waiter_type type)
 {
 	bool taken = false;
 	int owner_state;	/* Lock owner state */
@@ -388,10 +414,11 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		/*
 		 * Try to acquire the lock
 		 */
-		if (rwsem_try_write_lock_unqueued(sem)) {
-			taken = true;
+		taken = (type == RWSEM_WAITING_FOR_WRITE)
+		      ? rwsem_try_write_lock_unqueued(sem)
+		      : rwsem_try_read_lock_unqueued(sem);
+		if (taken)
 			break;
-		}
 
 		/*
 		 * We only decrement the rspin_cnt when the lock is owned
 		 * by readers (owner_state == 0), in which case
 	 * Check the success or failure of writer spinning on reader so as
 	 * to adjust the rspin_enabled count accordingly.
 	 */
-	if (rwsem_owner_is_reader(sem->owner)) {
+	if ((type == RWSEM_WAITING_FOR_WRITE) &&
+	    rwsem_owner_is_reader(sem->owner)) {
 		/*
 		 * Update rspin_enabled for reader spinning.
 		 *
@@ -466,7 +494,8 @@ static inline bool reader_spinning_enabled(struct rw_semaphore *sem)
 	return sem->rspin_enabled;
 }
 #else
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
+				  enum rwsem_waiter_type type)
 {
 	return false;
 }
@@ -500,6 +529,11 @@ struct rw_semaphore __sched * rwsem_down_read_failed(struct rw_semaphore *sem)
 	 */
 	atomic_long_add(-RWSEM_ACTIVE_READ_BIAS, &sem->count);
 
+	/* do optimistic spinning and steal lock if possible */
+	if (reader_spinning_enabled(sem) &&
+	    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
+		return sem;
+
 	/* set up my own style of waitqueue */
 	waiter.task = tsk;
 	waiter.type = RWSEM_WAITING_FOR_READ;
@@ -556,7 +590,7 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 	count = atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS, &sem->count);
 
 	/* do optimistic spinning and steal lock if possible */
-	if (rwsem_optimistic_spin(sem))
+	if (rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
 		return sem;
 
 	/*
-- 
1.7.1

* [RFC PATCH-tip/locking/core v3 09/10] locking/rwsem: Enable reactivation of reader spinning
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

Reader optimistic spinning will be disabled once the rspin_enabled
count reaches 0. After that, it cannot be re-enabled. This may cause
an eligible rwsem to be locked out of reader spinning because of a
series of unfortunate events.

This patch looks at the regular writer-on-writer spinning history. If
there are sufficiently more successful spin attempts than failed ones,
it will try to reactivate reader spinning. For example, since wspin_cnt
is knocked down to -30 when reader spinning is disabled, it takes 40
more successful writer-on-writer spins than failed ones before
rspin_enabled is bumped back up to give reader spinning another chance.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 include/linux/rwsem.h       |   12 ++++++++----
 kernel/locking/rwsem-xadd.c |   27 +++++++++++++++++++++++++--
 2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 8978f87..98284b4 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -32,7 +32,11 @@ struct rw_semaphore {
 	raw_spinlock_t wait_lock;
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* spinner MCS lock */
-	int rspin_enabled;	/* protected by osq lock */
+	/*
+	 * Reader optimistic spinning fields protected by osq lock
+	 */
+	uint16_t rspin_enabled;
+	int16_t  wspin_cnt;
 
 	/*
 	 * Write owner. Used as a speculative check to see
@@ -74,10 +78,10 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 /*
  * Each successful reader spin will increment the rspin_enabled by 1.
  * Each unsuccessful spin, on the other hand, will decrement it by 2.
- * Reader spinning will be permanently disabled when it reaches 0.
+ * Reader spinning will be disabled when it reaches 0.
  */
 #ifndef RWSEM_RSPIN_ENABLED_DEFAULT
-# define RWSEM_RSPIN_ENABLED_DEFAULT	40
+# define RWSEM_RSPIN_ENABLED_DEFAULT	30
 #endif
 #define RWSEM_RSPIN_ENABLED_MAX		1024
 
@@ -87,7 +91,7 @@ static inline int rwsem_is_locked(struct rw_semaphore *sem)
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 #define __RWSEM_OPT_INIT(lockname) , .osq = OSQ_LOCK_UNLOCKED, .owner = NULL, \
-		.rspin_enabled = RWSEM_RSPIN_ENABLED_DEFAULT
+		.rspin_enabled = RWSEM_RSPIN_ENABLED_DEFAULT, .wspin_cnt = 0
 #else
 #define __RWSEM_OPT_INIT(lockname)
 #endif
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 42c8dda..c6b6105 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -108,6 +108,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	sem->owner = NULL;
 	sem->rspin_enabled = RWSEM_RSPIN_ENABLED_DEFAULT;
+	sem->wspin_cnt = 0;
 	osq_lock_init(&sem->osq);
 #endif
 }
@@ -466,10 +467,32 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 		if (taken && (sem->rspin_enabled < RWSEM_RSPIN_ENABLED_MAX)) {
 			sem->rspin_enabled++;
 		} else if (!taken) {
-			if  (sem->rspin_enabled > 2)
+			if  (sem->rspin_enabled > 2) {
 				sem->rspin_enabled -= 2;
-			else
+			} else if (sem->rspin_enabled) {
 				sem->rspin_enabled = 0;
+				/*
+				 * Reset wspin_cnt so that reader spinning
+				 * won't get re-enabled too soon.
+				 */
+				if (sem->wspin_cnt > -30)
+					sem->wspin_cnt = -30;
+			}
+		}
+	} else if (type == RWSEM_WAITING_FOR_WRITE) {
+		/*
+		 * Every 10 more successful writer-on-writer spins than
+		 * failed ones will increment rspin_enabled to encourage
+		 * more writer-on-reader spinning attempts.
+		 */
+		if (taken) {
+			if ((++sem->wspin_cnt >= 10) &&
+			    (sem->rspin_enabled < RWSEM_RSPIN_ENABLED_MAX)) {
+				sem->wspin_cnt = 0;
+				sem->rspin_enabled++;
+			}
+		} else if (sem->wspin_cnt > -100) {
+			sem->wspin_cnt--;
 		}
 	}
 	osq_unlock(&sem->osq);
-- 
1.7.1

* [RFC PATCH-tip/locking/core v3 10/10] locking/rwsem: Add a boot parameter to reader spinning threshold
From: Waiman Long @ 2016-06-17 15:41 UTC
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, linux-doc, Davidlohr Bueso, Jason Low, Dave Chinner,
	Jonathan Corbet, Scott J Norton, Douglas Hatch, Waiman Long

The default reader spinning threshold is currently set to 4096.
However, the right reader spinning threshold may vary from one system
to another and among the different architectures. This patch adds a
new kernel boot parameter to modify the threshold value. This enables
better tailoring to the needs of different systems as well as easier
testing.
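
For example, the threshold could be doubled from its default of 4096
by adding the following to the kernel command line (illustrative
value only):

  rwsem_rspin_threshold=8192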

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 Documentation/kernel-parameters.txt |    3 +++
 kernel/locking/rwsem-xadd.c         |   14 +++++++++++++-
 2 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 82b42c9..3bee995 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3645,6 +3645,9 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
 	rw		[KNL] Mount root device read-write on boot
 
+	rwsem_rspin_threshold=
+			[KNL] Set rw semaphore reader spinning threshold
+
 	S		[KNL] Run init in single mode
 
 	s390_iommu=	[HW,S390]
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index c6b6105..6360180 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -280,6 +280,18 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
+ * Reader spinning threshold
+ */
+static int __read_mostly rspin_threshold = RWSEM_RSPIN_THRESHOLD;
+
+static int __init set_rspin_threshold(char *str)
+{
+	get_option(&str, &rspin_threshold);
+	return 0;
+}
+early_param("rwsem_rspin_threshold", set_rspin_threshold);
+
+/*
  * Try to acquire write lock before the writer has been put on wait queue.
  */
 static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
@@ -402,7 +414,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem,
 	if (!osq_lock(&sem->osq))
 		goto done;
 
-	rspin_cnt = sem->rspin_enabled ? RWSEM_RSPIN_THRESHOLD : 0;
+	rspin_cnt = sem->rspin_enabled ? rspin_threshold : 0;
 
 	/*
 	 * Optimistically spin on the owner field and attempt to acquire the
-- 
1.7.1
