linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] locking/rwsem: Rework reader optimistic spinning
@ 2020-11-18  3:04 Waiman Long
  2020-11-18  3:04 ` [PATCH 1/5] locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath() Waiman Long
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Waiman Long @ 2020-11-18  3:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Davidlohr Bueso, Phil Auld, Waiman Long

A recent report of SAP certification failure caused by increased system
time due to rwsem reader optimistic spinning led me to reexamine the
code to see the pro and cons of doing it. This led me to discover a
potential lock starvation scenario as explained in patch 2. That patch
does reduce reader spinning to avoid this potential problem. Patches
3 and 4 are further optimizations of the current code.

Then there is the issue of reader fragmentation that can potentially
reduce performance in some heavy contention cases. Two different approaches
are attempted:
 1) further reduce reader optimistic spinning
 2) disable reader spinning

See the performance shown in patch 5.

This patch series adopts the second approach by dropping reader spinning
for now. We can discuss if this is the right move or we should try the
alternative or just don't do anything further.

Waiman Long (5):
  locking/rwsem: Pass the current atomic count to
    rwsem_down_read_slowpath()
  locking/rwsem: Prevent potential lock starvation
  locking/rwsem: Enable reader optimistic lock stealing
  locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED
  locking/rwsem: Remove reader optimistic spinning

 kernel/locking/lock_events_list.h |   6 +-
 kernel/locking/rwsem.c            | 277 ++++++++----------------------
 2 files changed, 73 insertions(+), 210 deletions(-)

-- 
2.18.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/5] locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath()
  2020-11-18  3:04 [PATCH 0/5] locking/rwsem: Rework reader optimistic spinning Waiman Long
@ 2020-11-18  3:04 ` Waiman Long
  2020-11-18  3:04 ` [PATCH 2/5] locking/rwsem: Prevent potential lock starvation Waiman Long
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2020-11-18  3:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Davidlohr Bueso, Phil Auld, Waiman Long

The atomic count value right after reader count increment can be useful
to determine the rwsem state at trylock time. So the count value is
passed down to rwsem_down_read_slowpath() to be used when appropriate.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index f11b9bd3431d..12761e02ab9b 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -270,12 +270,12 @@ static inline void rwsem_set_nonspinnable(struct rw_semaphore *sem)
 					  owner | RWSEM_NONSPINNABLE));
 }
 
-static inline bool rwsem_read_trylock(struct rw_semaphore *sem)
+static inline long rwsem_read_trylock(struct rw_semaphore *sem)
 {
 	long cnt = atomic_long_add_return_acquire(RWSEM_READER_BIAS, &sem->count);
 	if (WARN_ON_ONCE(cnt < 0))
 		rwsem_set_nonspinnable(sem);
-	return !(cnt & RWSEM_READ_FAILED_MASK);
+	return cnt;
 }
 
 /*
@@ -989,9 +989,9 @@ rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
  * Wait for the read lock to be granted
  */
 static struct rw_semaphore __sched *
-rwsem_down_read_slowpath(struct rw_semaphore *sem, int state)
+rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 {
-	long count, adjustment = -RWSEM_READER_BIAS;
+	long adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 	bool wake = false;
@@ -1337,8 +1337,10 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (!rwsem_read_trylock(sem)) {
-		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
+	long count = rwsem_read_trylock(sem);
+
+	if (count & RWSEM_READ_FAILED_MASK) {
+		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE, count);
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
 		rwsem_set_reader_owned(sem);
@@ -1347,8 +1349,10 @@ static inline void __down_read(struct rw_semaphore *sem)
 
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
-	if (!rwsem_read_trylock(sem)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
+	long count = rwsem_read_trylock(sem);
+
+	if (count & RWSEM_READ_FAILED_MASK) {
+		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE, count)))
 			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
-- 
2.18.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 2/5] locking/rwsem: Prevent potential lock starvation
  2020-11-18  3:04 [PATCH 0/5] locking/rwsem: Rework reader optimistic spinning Waiman Long
  2020-11-18  3:04 ` [PATCH 1/5] locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath() Waiman Long
@ 2020-11-18  3:04 ` Waiman Long
  2020-11-20 14:44   ` Peter Zijlstra
  2020-11-18  3:04 ` [PATCH 3/5] locking/rwsem: Enable reader optimistic lock stealing Waiman Long
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2020-11-18  3:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Davidlohr Bueso, Phil Auld, Waiman Long

The lock handoff bit is added in commit 4f23dbc1e657 ("locking/rwsem:
Implement lock handoff to prevent lock starvation") to avoid lock
starvation. However, allowing readers to do optimistic spinning does
introduce an unlikely scenario where lock starvation can happen.

The lock handoff bit may only be set when a waiter is being woken up.
In the case of reader unlock, wakeup happens only when the reader count
reaches 0. If there is a continuous stream of incoming readers acquiring
read lock via optimistic spinning, it is possible that the reader count
may never reach 0 and so the handoff bit will never be asserted.

One way to prevent this scenario from happening is to disallow optimistic
spinning if the rwsem is currently owned by readers. If the previous
or current owner is a writer, optimistic spinning will be allowed.

If the previous owner is a reader but the reader count has reached 0
before, a wakeup should have been issued. It is also OK to do optimistic
spinning in this case.

This patch may have some impact on reader performance as it reduces
reader optimistic spinning especially if the lock critical sections
are short the number of contending readers are small.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 12761e02ab9b..ee374ae061c3 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -991,16 +991,27 @@ rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
 static struct rw_semaphore __sched *
 rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 {
-	long adjustment = -RWSEM_READER_BIAS;
+	long owner, adjustment = -RWSEM_READER_BIAS;
+	long rcnt = (count >> RWSEM_READER_SHIFT);	/* Reader count */
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 	bool wake = false;
 
+	/*
+	 * To prevent a constant stream of readers from starving a sleeping
+	 * waiter, don't attempt optimistic spinning if the lock is currently
+	 * owned by readers.
+	 */
+	owner = atomic_long_read(&sem->owner);
+	if ((owner & RWSEM_READER_OWNED) && (rcnt > 1) &&
+	   !(count & RWSEM_WRITER_LOCKED))
+		goto queue;
+
 	/*
 	 * Save the current read-owner of rwsem, if available, and the
 	 * reader nonspinnable bit.
 	 */
-	waiter.last_rowner = atomic_long_read(&sem->owner);
+	waiter.last_rowner = owner;
 	if (!(waiter.last_rowner & RWSEM_READER_OWNED))
 		waiter.last_rowner &= RWSEM_RD_NONSPINNABLE;
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 3/5] locking/rwsem: Enable reader optimistic lock stealing
  2020-11-18  3:04 [PATCH 0/5] locking/rwsem: Rework reader optimistic spinning Waiman Long
  2020-11-18  3:04 ` [PATCH 1/5] locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath() Waiman Long
  2020-11-18  3:04 ` [PATCH 2/5] locking/rwsem: Prevent potential lock starvation Waiman Long
@ 2020-11-18  3:04 ` Waiman Long
  2020-11-20 14:36   ` Peter Zijlstra
  2020-12-08  3:53   ` Davidlohr Bueso
  2020-11-18  3:04 ` [PATCH 4/5] locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED Waiman Long
  2020-11-18  3:04 ` [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning Waiman Long
  4 siblings, 2 replies; 23+ messages in thread
From: Waiman Long @ 2020-11-18  3:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Davidlohr Bueso, Phil Auld, Waiman Long

If the optimistic spinning queue is empty and the rwsem does not have
the handoff or write-lock bits set, it is actually not necessary to
call rwsem_optimistic_spin() to spin on it. Instead, it can steal the
lock directly as its reader bias is in the count already.  If it is
the first reader in this state, it will try to wake up other readers
in the wait queue.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lock_events_list.h |  1 +
 kernel/locking/rwsem.c            | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 239039d0ce21..270a0d351932 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -63,6 +63,7 @@ LOCK_EVENT(rwsem_opt_nospin)	/* # of disabled optspins		*/
 LOCK_EVENT(rwsem_opt_norspin)	/* # of disabled reader-only optspins	*/
 LOCK_EVENT(rwsem_opt_rlock2)	/* # of opt-acquired 2ndary read locks	*/
 LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
+LOCK_EVENT(rwsem_rlock_steal)	/* # of read locks by lock stealing	*/
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
 LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions	*/
 LOCK_EVENT(rwsem_rlock_handoff)	/* # of read lock handoffs		*/
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index ee374ae061c3..930dd4af3639 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -957,6 +957,12 @@ static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
 	}
 	return false;
 }
+
+static inline bool osq_is_empty(struct rw_semaphore *sem)
+{
+	return !osq_is_locked(&sem->osq);
+}
+
 #else
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem,
 					   unsigned long nonspinnable)
@@ -977,6 +983,10 @@ static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
 	return false;
 }
 
+static inline bool osq_is_empty(sem)
+{
+	return false;
+}
 static inline int
 rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
 {
@@ -1007,6 +1017,22 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 	   !(count & RWSEM_WRITER_LOCKED))
 		goto queue;
 
+	/*
+	 * Reader optimistic lock stealing
+	 *
+	 * We can take the read lock directly without doing
+	 * rwsem_optimistic_spin() if the conditions are right.
+	 * Also wake up other readers if it is the first reader.
+	 */
+	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &&
+	    osq_is_empty(sem)) {
+		rwsem_set_reader_owned(sem);
+		lockevent_inc(rwsem_rlock_steal);
+		if (rcnt == 1)
+			goto wake_readers;
+		return sem;
+	}
+
 	/*
 	 * Save the current read-owner of rwsem, if available, and the
 	 * reader nonspinnable bit.
@@ -1029,6 +1055,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 		 * Wake up other readers in the wait list if the front
 		 * waiter is a reader.
 		 */
+wake_readers:
 		if ((atomic_long_read(&sem->count) & RWSEM_FLAG_WAITERS)) {
 			raw_spin_lock_irq(&sem->wait_lock);
 			if (!list_empty(&sem->wait_list))
-- 
2.18.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 4/5] locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED
  2020-11-18  3:04 [PATCH 0/5] locking/rwsem: Rework reader optimistic spinning Waiman Long
                   ` (2 preceding siblings ...)
  2020-11-18  3:04 ` [PATCH 3/5] locking/rwsem: Enable reader optimistic lock stealing Waiman Long
@ 2020-11-18  3:04 ` Waiman Long
  2020-11-18  4:53   ` Davidlohr Bueso
  2020-11-18  3:04 ` [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning Waiman Long
  4 siblings, 1 reply; 23+ messages in thread
From: Waiman Long @ 2020-11-18  3:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Davidlohr Bueso, Phil Auld, Waiman Long

The rwsem wakeup logic has been modified by commit d3681e269fff
("locking/rwsem: Wake up almost all readers in wait queue") to wake up
all readers in the wait queue if the first waiter is a reader. In the
case of RWSEM_WAKE_READ_OWNED, not all readers can be woken up if the
first waiter happens to be a writer. Complete the logic by waking up
all readers even for this case.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 930dd4af3639..23654e3950b5 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -426,7 +426,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 			lockevent_inc(rwsem_wake_writer);
 		}
 
-		return;
+		/*
+		 * If rwsem has already been owned by reader, wake up other
+		 * readers in the wait queue even if first one is a writer.
+		 */
+		if (wake_type != RWSEM_WAKE_READ_OWNED)
+			return;
 	}
 
 	/*
@@ -1052,8 +1057,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 	if (rwsem_optimistic_spin(sem, false)) {
 		/* rwsem_optimistic_spin() implies ACQUIRE on success */
 		/*
-		 * Wake up other readers in the wait list if the front
-		 * waiter is a reader.
+		 * Wake up other readers in the wait queue.
 		 */
 wake_readers:
 		if ((atomic_long_read(&sem->count) & RWSEM_FLAG_WAITERS)) {
-- 
2.18.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-18  3:04 [PATCH 0/5] locking/rwsem: Rework reader optimistic spinning Waiman Long
                   ` (3 preceding siblings ...)
  2020-11-18  3:04 ` [PATCH 4/5] locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED Waiman Long
@ 2020-11-18  3:04 ` Waiman Long
  2020-11-18  5:35   ` Davidlohr Bueso
  2020-11-20 14:42   ` Peter Zijlstra
  4 siblings, 2 replies; 23+ messages in thread
From: Waiman Long @ 2020-11-18  3:04 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Davidlohr Bueso, Phil Auld, Waiman Long

Reader optimistic spinning is helpful when the reader critical section
is short and there aren't that many readers around. It also improves
the chance that a reader can get the lock as writer optimistic spinning
disproportionally favors writers much more than readers.

Since commit d3681e269fff ("locking/rwsem: Wake up almost all readers
in wait queue"), all the waiting readers are woken up so that they can
all get the read lock and run in parallel. When the number of contending
readers is large, allowing reader optimistic spinning will likely cause
reader fragmentation where multiple smaller groups of readers can get
the read lock in a sequential manner separated by writers. That reduces
reader parallelism.

One possible way to address that drawback is to limit the number of
readers (preferably one) that can do optimistic spinning. These readers
act as representatives of all the waiting readers in the wait queue as
they will wake up all those waiting readers once they get the lock.

Alternatively, as reader optimistic lock stealing has already enhanced
fairness to readers, it may be easier to just remove reader optimistic
spinning and simplifying the optimistic spinning code as a result.

Performance measurements (locking throughput kops/s) using a locking
microbenchmark with 50/50 reader/writer distribution and turbo-boost
disabled was done on a 2-socket Cascade Lake system (48-core 96-thread)
to see the impacts of these changes:

  1) Vanilla     - 5.10-rc3 kernel
  2) Before      - 5.10-rc3 kernel with previous patches in this series
  2) limit-rspin - 5.10-rc3 kernel with limited reader spinning patch
  3) no-rspin    - 5.10-rc3 kernel with reader spinning disabled

  # of threads  CS Load   Vanilla  Before   limit-rspin   no-rspin
  ------------  -------   -------  ------   -----------   --------
       2            1      5,185    5,662      5,214       5,077
       4            1      5,107    4,983      5,188       4,760
       8            1      4,782    4,564      4,720       4,628
      16            1      4,680    4,053      4,567       3,402
      32            1      4,299    1,115      1,118       1,098
      64            1      3,218      983      1,001         957
      96            1      1,938      944        957         930

       2           20      2,008    2,128      2,264       1,665
       4           20      1,390    1,033      1,046       1,101
       8           20      1,472    1,155      1,098       1,213
      16           20      1,332    1,077      1,089       1,122
      32           20        967      914        917         980
      64           20        787      874        891         858
      96           20        730      836        847         844

       2          100        372      356        360         355
       4          100        492      425        434         392
       8          100        533      537        529         538
      16          100        548      572        568         598
      32          100        499      520        527         537
      64          100        466      517        526         512
      96          100        406      497        506         509

The column "CS Load" represents the number of pause instructions issued
in the locking critical section. A CS load of 1 is extremely short and
is not likey in real situations. A load of 20 (moderate) and 100 (long)
are more realistic.

It can be seen that the previous patches in this series have reduced
performance in general except in highly contended cases with moderate
or long critical sections that performance improves a bit. This change
is mostly caused by the "Prevent potential lock starvation" patch that
reduce reader optimistic spinning and hence reduce reader fragmentation.

The patch that further limit reader optimistic spinning doesn't seem to
have too much impact on overall performance as shown in the benchmark
data.

The patch that disables reader optimistic spinning shows reduced
performance at lightly loaded cases, but comparable or slightly better
performance on with heavier contention.

This patch just removes reader optimistic spinning for now. As readers
are not going to do optimistic spinning anymore, we don't need to
consider if the OSQ is empty or not when doing lock stealing.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lock_events_list.h |   5 +-
 kernel/locking/rwsem.c            | 271 +++++-------------------------
 2 files changed, 46 insertions(+), 230 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 270a0d351932..97fb6f3f840a 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -56,12 +56,9 @@ LOCK_EVENT(rwsem_sleep_reader)	/* # of reader sleeps			*/
 LOCK_EVENT(rwsem_sleep_writer)	/* # of writer sleeps			*/
 LOCK_EVENT(rwsem_wake_reader)	/* # of reader wakeups			*/
 LOCK_EVENT(rwsem_wake_writer)	/* # of writer wakeups			*/
-LOCK_EVENT(rwsem_opt_rlock)	/* # of opt-acquired read locks		*/
-LOCK_EVENT(rwsem_opt_wlock)	/* # of opt-acquired write locks	*/
+LOCK_EVENT(rwsem_opt_lock)	/* # of opt-acquired write locks	*/
 LOCK_EVENT(rwsem_opt_fail)	/* # of failed optspins			*/
 LOCK_EVENT(rwsem_opt_nospin)	/* # of disabled optspins		*/
-LOCK_EVENT(rwsem_opt_norspin)	/* # of disabled reader-only optspins	*/
-LOCK_EVENT(rwsem_opt_rlock2)	/* # of opt-acquired 2ndary read locks	*/
 LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
 LOCK_EVENT(rwsem_rlock_steal)	/* # of read locks by lock stealing	*/
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 23654e3950b5..21fea6b4d777 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -31,19 +31,13 @@
 #include "lock_events.h"
 
 /*
- * The least significant 3 bits of the owner value has the following
+ * The least significant 2 bits of the owner value has the following
  * meanings when set.
  *  - Bit 0: RWSEM_READER_OWNED - The rwsem is owned by readers
- *  - Bit 1: RWSEM_RD_NONSPINNABLE - Readers cannot spin on this lock.
- *  - Bit 2: RWSEM_WR_NONSPINNABLE - Writers cannot spin on this lock.
+ *  - Bit 1: RWSEM_NONSPINNABLE - Cannot spin on a reader-owned lock
  *
- * When the rwsem is either owned by an anonymous writer, or it is
- * reader-owned, but a spinning writer has timed out, both nonspinnable
- * bits will be set to disable optimistic spinning by readers and writers.
- * In the later case, the last unlocking reader should then check the
- * writer nonspinnable bit and clear it only to give writers preference
- * to acquire the lock via optimistic spinning, but not readers. Similar
- * action is also done in the reader slowpath.
+ * When the rwsem is reader-owned and a spinning writer has timed out,
+ * the nonspinnable bit will be set to disable optimistic spinning.
 
  * When a writer acquires a rwsem, it puts its task_struct pointer
  * into the owner field. It is cleared after an unlock.
@@ -59,46 +53,14 @@
  * is involved. Ideally we would like to track all the readers that own
  * a rwsem, but the overhead is simply too big.
  *
- * Reader optimistic spinning is helpful when the reader critical section
- * is short and there aren't that many readers around. It makes readers
- * relatively more preferred than writers. When a writer times out spinning
- * on a reader-owned lock and set the nospinnable bits, there are two main
- * reasons for that.
- *
- *  1) The reader critical section is long, perhaps the task sleeps after
- *     acquiring the read lock.
- *  2) There are just too many readers contending the lock causing it to
- *     take a while to service all of them.
- *
- * In the former case, long reader critical section will impede the progress
- * of writers which is usually more important for system performance. In
- * the later case, reader optimistic spinning tends to make the reader
- * groups that contain readers that acquire the lock together smaller
- * leading to more of them. That may hurt performance in some cases. In
- * other words, the setting of nonspinnable bits indicates that reader
- * optimistic spinning may not be helpful for those workloads that cause
- * it.
- *
- * Therefore, any writers that had observed the setting of the writer
- * nonspinnable bit for a given rwsem after they fail to acquire the lock
- * via optimistic spinning will set the reader nonspinnable bit once they
- * acquire the write lock. Similarly, readers that observe the setting
- * of reader nonspinnable bit at slowpath entry will set the reader
- * nonspinnable bits when they acquire the read lock via the wakeup path.
- *
- * Once the reader nonspinnable bit is on, it will only be reset when
- * a writer is able to acquire the rwsem in the fast path or somehow a
- * reader or writer in the slowpath doesn't observe the nonspinable bit.
- *
- * This is to discourage reader optmistic spinning on that particular
- * rwsem and make writers more preferred. This adaptive disabling of reader
- * optimistic spinning will alleviate the negative side effect of this
- * feature.
+ * A fast path reader optimistic lock stealing is supported when the rwsem
+ * is previously owned by a writer and the following conditions are met:
+ *  - OSQ is empty
+ *  - rwsem is not currently writer owned
+ *  - the handoff isn't set.
  */
 #define RWSEM_READER_OWNED	(1UL << 0)
-#define RWSEM_RD_NONSPINNABLE	(1UL << 1)
-#define RWSEM_WR_NONSPINNABLE	(1UL << 2)
-#define RWSEM_NONSPINNABLE	(RWSEM_RD_NONSPINNABLE | RWSEM_WR_NONSPINNABLE)
+#define RWSEM_NONSPINNABLE	(1UL << 1)
 #define RWSEM_OWNER_FLAGS_MASK	(RWSEM_READER_OWNED | RWSEM_NONSPINNABLE)
 
 #ifdef CONFIG_DEBUG_RWSEMS
@@ -203,7 +165,7 @@ static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
 					    struct task_struct *owner)
 {
 	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED |
-		(atomic_long_read(&sem->owner) & RWSEM_RD_NONSPINNABLE);
+		(atomic_long_read(&sem->owner) & RWSEM_NONSPINNABLE);
 
 	atomic_long_set(&sem->owner, val);
 }
@@ -472,10 +434,6 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 		 * the reader is copied over.
 		 */
 		owner = waiter->task;
-		if (waiter->last_rowner & RWSEM_RD_NONSPINNABLE) {
-			owner = (void *)((unsigned long)owner | RWSEM_RD_NONSPINNABLE);
-			lockevent_inc(rwsem_opt_norspin);
-		}
 		__rwsem_set_reader_owned(sem, owner);
 	}
 
@@ -606,30 +564,6 @@ static inline bool rwsem_try_write_lock(struct rw_semaphore *sem,
 }
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
-/*
- * Try to acquire read lock before the reader is put on wait queue.
- * Lock acquisition isn't allowed if the rwsem is locked or a writer handoff
- * is ongoing.
- */
-static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
-{
-	long count = atomic_long_read(&sem->count);
-
-	if (count & (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))
-		return false;
-
-	count = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
-	if (!(count & (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))) {
-		rwsem_set_reader_owned(sem);
-		lockevent_inc(rwsem_opt_rlock);
-		return true;
-	}
-
-	/* Back out the change */
-	atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
-	return false;
-}
-
 /*
  * Try to acquire write lock before the writer has been put on wait queue.
  */
@@ -641,7 +575,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 		if (atomic_long_try_cmpxchg_acquire(&sem->count, &count,
 					count | RWSEM_WRITER_LOCKED)) {
 			rwsem_set_owner(sem);
-			lockevent_inc(rwsem_opt_wlock);
+			lockevent_inc(rwsem_opt_lock);
 			return true;
 		}
 	}
@@ -657,8 +591,7 @@ static inline bool owner_on_cpu(struct task_struct *owner)
 	return owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
 }
 
-static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem,
-					   unsigned long nonspinnable)
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner;
 	unsigned long flags;
@@ -675,7 +608,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem,
 	/*
 	 * Don't check the read-owner as the entry may be stale.
 	 */
-	if ((flags & nonspinnable) ||
+	if ((flags & RWSEM_NONSPINNABLE) ||
 	    (owner && !(flags & RWSEM_READER_OWNED) && !owner_on_cpu(owner)))
 		ret = false;
 	rcu_read_unlock();
@@ -705,9 +638,9 @@ enum owner_state {
 #define OWNER_SPINNABLE		(OWNER_NULL | OWNER_WRITER | OWNER_READER)
 
 static inline enum owner_state
-rwsem_owner_state(struct task_struct *owner, unsigned long flags, unsigned long nonspinnable)
+rwsem_owner_state(struct task_struct *owner, unsigned long flags)
 {
-	if (flags & nonspinnable)
+	if (flags & RWSEM_NONSPINNABLE)
 		return OWNER_NONSPINNABLE;
 
 	if (flags & RWSEM_READER_OWNED)
@@ -717,14 +650,14 @@ rwsem_owner_state(struct task_struct *owner, unsigned long flags, unsigned long
 }
 
 static noinline enum owner_state
-rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
+rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *new, *owner;
 	unsigned long flags, new_flags;
 	enum owner_state state;
 
 	owner = rwsem_owner_flags(sem, &flags);
-	state = rwsem_owner_state(owner, flags, nonspinnable);
+	state = rwsem_owner_state(owner, flags);
 	if (state != OWNER_WRITER)
 		return state;
 
@@ -738,7 +671,7 @@ rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
 		 */
 		new = rwsem_owner_flags(sem, &new_flags);
 		if ((new != owner) || (new_flags != flags)) {
-			state = rwsem_owner_state(new, new_flags, nonspinnable);
+			state = rwsem_owner_state(new, new_flags);
 			break;
 		}
 
@@ -787,14 +720,12 @@ static inline u64 rwsem_rspin_threshold(struct rw_semaphore *sem)
 	return sched_clock() + delta;
 }
 
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
+static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
 	bool taken = false;
 	int prev_owner_state = OWNER_NULL;
 	int loop = 0;
 	u64 rspin_threshold = 0;
-	unsigned long nonspinnable = wlock ? RWSEM_WR_NONSPINNABLE
-					   : RWSEM_RD_NONSPINNABLE;
 
 	preempt_disable();
 
@@ -811,15 +742,14 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 	for (;;) {
 		enum owner_state owner_state;
 
-		owner_state = rwsem_spin_on_owner(sem, nonspinnable);
+		owner_state = rwsem_spin_on_owner(sem);
 		if (!(owner_state & OWNER_SPINNABLE))
 			break;
 
 		/*
 		 * Try to acquire the lock
 		 */
-		taken = wlock ? rwsem_try_write_lock_unqueued(sem)
-			      : rwsem_try_read_lock_unqueued(sem);
+		taken = rwsem_try_write_lock_unqueued(sem);
 
 		if (taken)
 			break;
@@ -827,7 +757,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 		/*
 		 * Time-based reader-owned rwsem optimistic spinning
 		 */
-		if (wlock && (owner_state == OWNER_READER)) {
+		if (owner_state == OWNER_READER) {
 			/*
 			 * Re-initialize rspin_threshold every time when
 			 * the owner state changes from non-reader to reader.
@@ -836,7 +766,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 			 * the beginning of the 2nd reader phase.
 			 */
 			if (prev_owner_state != OWNER_READER) {
-				if (rwsem_test_oflags(sem, nonspinnable))
+				if (rwsem_test_oflags(sem, RWSEM_NONSPINNABLE))
 					break;
 				rspin_threshold = rwsem_rspin_threshold(sem);
 				loop = 0;
@@ -912,60 +842,13 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 }
 
 /*
- * Clear the owner's RWSEM_WR_NONSPINNABLE bit if it is set. This should
+ * Clear the owner's RWSEM_NONSPINNABLE bit if it is set. This should
  * only be called when the reader count reaches 0.
- *
- * This give writers better chance to acquire the rwsem first before
- * readers when the rwsem was being held by readers for a relatively long
- * period of time. Race can happen that an optimistic spinner may have
- * just stolen the rwsem and set the owner, but just clearing the
- * RWSEM_WR_NONSPINNABLE bit will do no harm anyway.
  */
-static inline void clear_wr_nonspinnable(struct rw_semaphore *sem)
+static inline void clear_nonspinnable(struct rw_semaphore *sem)
 {
-	if (rwsem_test_oflags(sem, RWSEM_WR_NONSPINNABLE))
-		atomic_long_andnot(RWSEM_WR_NONSPINNABLE, &sem->owner);
-}
-
-/*
- * This function is called when the reader fails to acquire the lock via
- * optimistic spinning. In this case we will still attempt to do a trylock
- * when comparing the rwsem state right now with the state when entering
- * the slowpath indicates that the reader is still in a valid reader phase.
- * This happens when the following conditions are true:
- *
- * 1) The lock is currently reader owned, and
- * 2) The lock is previously not reader-owned or the last read owner changes.
- *
- * In the former case, we have transitioned from a writer phase to a
- * reader-phase while spinning. In the latter case, it means the reader
- * phase hasn't ended when we entered the optimistic spinning loop. In
- * both cases, the reader is eligible to acquire the lock. This is the
- * secondary path where a read lock is acquired optimistically.
- *
- * The reader non-spinnable bit wasn't set at time of entry or it will
- * not be here at all.
- */
-static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
-					      unsigned long last_rowner)
-{
-	unsigned long owner = atomic_long_read(&sem->owner);
-
-	if (!(owner & RWSEM_READER_OWNED))
-		return false;
-
-	if (((owner ^ last_rowner) & ~RWSEM_OWNER_FLAGS_MASK) &&
-	    rwsem_try_read_lock_unqueued(sem)) {
-		lockevent_inc(rwsem_opt_rlock2);
-		lockevent_add(rwsem_opt_fail, -1);
-		return true;
-	}
-	return false;
-}
-
-static inline bool osq_is_empty(struct rw_semaphore *sem)
-{
-	return !osq_is_locked(&sem->osq);
+	if (rwsem_test_oflags(sem, RWSEM_NONSPINNABLE))
+		atomic_long_andnot(RWSEM_NONSPINNABLE, &sem->owner);
 }
 
 #else
@@ -980,20 +863,10 @@ static inline bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
 	return false;
 }
 
-static inline void clear_wr_nonspinnable(struct rw_semaphore *sem) { }
-
-static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
-					      unsigned long last_rowner)
-{
-	return false;
-}
+static inline void clear_nonspinnable(struct rw_semaphore *sem) { }
 
-static inline bool osq_is_empty(sem)
-{
-	return false;
-}
 static inline int
-rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
+rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
 	return 0;
 }
@@ -1006,7 +879,7 @@ rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
 static struct rw_semaphore __sched *
 rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 {
-	long owner, adjustment = -RWSEM_READER_BIAS;
+	long adjustment = -RWSEM_READER_BIAS;
 	long rcnt = (count >> RWSEM_READER_SHIFT);	/* Reader count */
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
@@ -1014,12 +887,11 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 
 	/*
 	 * To prevent a constant stream of readers from starving a sleeping
-	 * waiter, don't attempt optimistic spinning if the lock is currently
-	 * owned by readers.
+	 * waiter, don't attempt optimistic lock stealing if the lock is
+	 * currently owned by readers.
 	 */
-	owner = atomic_long_read(&sem->owner);
-	if ((owner & RWSEM_READER_OWNED) && (rcnt > 1) &&
-	   !(count & RWSEM_WRITER_LOCKED))
+	if ((atomic_long_read(&sem->owner) & RWSEM_READER_OWNED) &&
+	    (rcnt > 1) && !(count & RWSEM_WRITER_LOCKED))
 		goto queue;
 
 	/*
@@ -1027,40 +899,16 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 	 *
 	 * We can take the read lock directly without doing
 	 * rwsem_optimistic_spin() if the conditions are right.
-	 * Also wake up other readers if it is the first reader.
 	 */
-	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &&
-	    osq_is_empty(sem)) {
+	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF))) {
 		rwsem_set_reader_owned(sem);
 		lockevent_inc(rwsem_rlock_steal);
-		if (rcnt == 1)
-			goto wake_readers;
-		return sem;
-	}
 
-	/*
-	 * Save the current read-owner of rwsem, if available, and the
-	 * reader nonspinnable bit.
-	 */
-	waiter.last_rowner = owner;
-	if (!(waiter.last_rowner & RWSEM_READER_OWNED))
-		waiter.last_rowner &= RWSEM_RD_NONSPINNABLE;
-
-	if (!rwsem_can_spin_on_owner(sem, RWSEM_RD_NONSPINNABLE))
-		goto queue;
-
-	/*
-	 * Undo read bias from down_read() and do optimistic spinning.
-	 */
-	atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
-	adjustment = 0;
-	if (rwsem_optimistic_spin(sem, false)) {
-		/* rwsem_optimistic_spin() implies ACQUIRE on success */
 		/*
-		 * Wake up other readers in the wait queue.
+		 * Wake up other readers in the wait queue if it is
+		 * the first reader.
 		 */
-wake_readers:
-		if ((atomic_long_read(&sem->count) & RWSEM_FLAG_WAITERS)) {
+		if ((rcnt == 1) && (count & RWSEM_FLAG_WAITERS)) {
 			raw_spin_lock_irq(&sem->wait_lock);
 			if (!list_empty(&sem->wait_list))
 				rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED,
@@ -1069,9 +917,6 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 			wake_up_q(&wake_q);
 		}
 		return sem;
-	} else if (rwsem_reader_phase_trylock(sem, waiter.last_rowner)) {
-		/* rwsem_reader_phase_trylock() implies ACQUIRE on success */
-		return sem;
 	}
 
 queue:
@@ -1087,7 +932,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 		 * exit the slowpath and return immediately as its
 		 * RWSEM_READER_BIAS has already been set in the count.
 		 */
-		if (adjustment && !(atomic_long_read(&sem->count) &
+		if (!(atomic_long_read(&sem->count) &
 		     (RWSEM_WRITER_MASK | RWSEM_FLAG_HANDOFF))) {
 			/* Provide lock ACQUIRE */
 			smp_acquire__after_ctrl_dep();
@@ -1101,10 +946,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-	if (adjustment)
-		count = atomic_long_add_return(adjustment, &sem->count);
-	else
-		count = atomic_long_read(&sem->count);
+	count = atomic_long_add_return(adjustment, &sem->count);
 
 	/*
 	 * If there are no active locks, wake the front queued process(es).
@@ -1113,7 +955,7 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 	 * wake our own waiter to join the existing active readers !
 	 */
 	if (!(count & RWSEM_LOCK_MASK)) {
-		clear_wr_nonspinnable(sem);
+		clear_nonspinnable(sem);
 		wake = true;
 	}
 	if (wake || (!(count & RWSEM_WRITER_MASK) &&
@@ -1158,19 +1000,6 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
 	return ERR_PTR(-EINTR);
 }
 
-/*
- * This function is called by the a write lock owner. So the owner value
- * won't get changed by others.
- */
-static inline void rwsem_disable_reader_optspin(struct rw_semaphore *sem,
-						bool disable)
-{
-	if (unlikely(disable)) {
-		atomic_long_or(RWSEM_RD_NONSPINNABLE, &sem->owner);
-		lockevent_inc(rwsem_opt_norspin);
-	}
-}
-
 /*
  * Wait until we successfully acquire the write lock
  */
@@ -1178,26 +1007,17 @@ static struct rw_semaphore *
 rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 {
 	long count;
-	bool disable_rspin;
 	enum writer_wait_state wstate;
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
-	if (rwsem_can_spin_on_owner(sem, RWSEM_WR_NONSPINNABLE) &&
-	    rwsem_optimistic_spin(sem, true)) {
+	if (rwsem_can_spin_on_owner(sem) && rwsem_optimistic_spin(sem)) {
 		/* rwsem_optimistic_spin() implies ACQUIRE on success */
 		return sem;
 	}
 
-	/*
-	 * Disable reader optimistic spinning for this rwsem after
-	 * acquiring the write lock when the setting of the nonspinnable
-	 * bits are observed.
-	 */
-	disable_rspin = atomic_long_read(&sem->owner) & RWSEM_NONSPINNABLE;
-
 	/*
 	 * Optimistic spinning failed, proceed to the slowpath
 	 * and block until we can acquire the sem.
@@ -1266,7 +1086,7 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 		 * without sleeping.
 		 */
 		if (wstate == WRITER_HANDOFF &&
-		    rwsem_spin_on_owner(sem, RWSEM_NONSPINNABLE) == OWNER_NULL)
+		    rwsem_spin_on_owner(sem) == OWNER_NULL)
 			goto trylock_again;
 
 		/* Block until there are no active lockers. */
@@ -1308,7 +1128,6 @@ rwsem_down_write_slowpath(struct rw_semaphore *sem, int state)
 	}
 	__set_current_state(TASK_RUNNING);
 	list_del(&waiter.list);
-	rwsem_disable_reader_optspin(sem, disable_rspin);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	lockevent_inc(rwsem_wlock);
 
@@ -1481,7 +1300,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 	DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);
 	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) ==
 		      RWSEM_FLAG_WAITERS)) {
-		clear_wr_nonspinnable(sem);
+		clear_nonspinnable(sem);
 		rwsem_wake(sem, tmp);
 	}
 }
-- 
2.18.1


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED
  2020-11-18  3:04 ` [PATCH 4/5] locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED Waiman Long
@ 2020-11-18  4:53   ` Davidlohr Bueso
  2020-11-19 18:37     ` Waiman Long
  0 siblings, 1 reply; 23+ messages in thread
From: Davidlohr Bueso @ 2020-11-18  4:53 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

On Tue, 17 Nov 2020, Waiman Long wrote:

>The rwsem wakeup logic has been modified by commit d3681e269fff
>("locking/rwsem: Wake up almost all readers in wait queue") to wake up
>all readers in the wait queue if the first waiter is a reader. In the
>case of RWSEM_WAKE_READ_OWNED, not all readers can be woken up if the
>first waiter happens to be a writer. Complete the logic by waking up
>all readers even for this case.

While rwsems are certainly not fifo, I'm concerned this would give too
much priority to the readers by having the reader owned lock just skip
over the first waiter. And I'd say most users are more concerned about
the writer side. Basically this would affect the phase-fair properties.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-18  3:04 ` [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning Waiman Long
@ 2020-11-18  5:35   ` Davidlohr Bueso
  2020-11-19 18:40     ` Waiman Long
  2020-11-20 14:44     ` Peter Zijlstra
  2020-11-20 14:42   ` Peter Zijlstra
  1 sibling, 2 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2020-11-18  5:35 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

On Tue, 17 Nov 2020, Waiman Long wrote:

>The column "CS Load" represents the number of pause instructions issued
>in the locking critical section. A CS load of 1 is extremely short and
>is not likey in real situations. A load of 20 (moderate) and 100 (long)
>are more realistic.
>
>It can be seen that the previous patches in this series have reduced
>performance in general except in highly contended cases with moderate
>or long critical sections that performance improves a bit. This change
>is mostly caused by the "Prevent potential lock starvation" patch that
>reduce reader optimistic spinning and hence reduce reader fragmentation.
>
>The patch that further limit reader optimistic spinning doesn't seem to
>have too much impact on overall performance as shown in the benchmark
>data.
>
>The patch that disables reader optimistic spinning shows reduced
>performance at lightly loaded cases, but comparable or slightly better
>performance on with heavier contention.

I'm not overly worried about the lightly loaded cases here as the users
(mostly thinking mmap_sem) most likely won't care for real workloads,
not, ie: will-it-scale type things.

So at SUSE we also ran into this very same problem with reader optimistic
spinning and considering the fragmentation went with disabling it, much
like this patch - but without the reader optimistic lock stealing bits
you have. So far nothing has really shown to fall out in our performance
automation. And per your data a single reader spinner does not seem to be
worth the added complexity of keeping reader spinning vs ripping it out.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/5] locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED
  2020-11-18  4:53   ` Davidlohr Bueso
@ 2020-11-19 18:37     ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2020-11-19 18:37 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

On 11/17/20 11:53 PM, Davidlohr Bueso wrote:
> On Tue, 17 Nov 2020, Waiman Long wrote:
>
>> The rwsem wakeup logic has been modified by commit d3681e269fff
>> ("locking/rwsem: Wake up almost all readers in wait queue") to wake up
>> all readers in the wait queue if the first waiter is a reader. In the
>> case of RWSEM_WAKE_READ_OWNED, not all readers can be woken up if the
>> first waiter happens to be a writer. Complete the logic by waking up
>> all readers even for this case.
>
> While rwsems are certainly not fifo, I'm concerned this would give too
> much priority to the readers by having the reader owned lock just skip
> over the first waiter. And I'd say most users are more concerned about
> the writer side. Basically this would affect the phase-fair properties. 

The idea of phase-fair is that when a reader acquires the lock, all the 
current readers are allowed to join. Other readers that come after that 
will not be allowed to join the read phase until the next round. In that 
sense, waking up all readers in the wait queue doesn't violate this 
fact. Patch 2 will guarantee  the later constraint though it has the 
exception that if the reader count reach 0, it will allow reader to 
proceed. I am relying on the handoff mechanism to make sure that there 
will be no lock starvation.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-18  5:35   ` Davidlohr Bueso
@ 2020-11-19 18:40     ` Waiman Long
  2020-11-20 13:11       ` David Laight
  2020-11-20 14:44     ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Waiman Long @ 2020-11-19 18:40 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

On 11/18/20 12:35 AM, Davidlohr Bueso wrote:
> On Tue, 17 Nov 2020, Waiman Long wrote:
>
>> The column "CS Load" represents the number of pause instructions issued
>> in the locking critical section. A CS load of 1 is extremely short and
>> is not likey in real situations. A load of 20 (moderate) and 100 (long)
>> are more realistic.
>>
>> It can be seen that the previous patches in this series have reduced
>> performance in general except in highly contended cases with moderate
>> or long critical sections that performance improves a bit. This change
>> is mostly caused by the "Prevent potential lock starvation" patch that
>> reduce reader optimistic spinning and hence reduce reader fragmentation.
>>
>> The patch that further limit reader optimistic spinning doesn't seem to
>> have too much impact on overall performance as shown in the benchmark
>> data.
>>
>> The patch that disables reader optimistic spinning shows reduced
>> performance at lightly loaded cases, but comparable or slightly better
>> performance on with heavier contention.
>
> I'm not overly worried about the lightly loaded cases here as the users
> (mostly thinking mmap_sem) most likely won't care for real workloads,
> not, ie: will-it-scale type things.
I am not that worry about the lightly loaded cases either. I just state 
the fact that some workloads may see a slightly reduced performance 
because of that.
>
> So at SUSE we also ran into this very same problem with reader optimistic
> spinning and considering the fragmentation went with disabling it, much
> like this patch - but without the reader optimistic lock stealing bits
> you have. So far nothing has really shown to fall out in our performance
> automation. And per your data a single reader spinner does not seem to be
> worth the added complexity of keeping reader spinning vs ripping it out. 

My own testing also show not too much performance difference when 
removing reader spinning except in the lightly loaded cases.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-19 18:40     ` Waiman Long
@ 2020-11-20 13:11       ` David Laight
  2020-11-20 17:04         ` Waiman Long
  2020-11-20 21:38         ` Davidlohr Bueso
  0 siblings, 2 replies; 23+ messages in thread
From: David Laight @ 2020-11-20 13:11 UTC (permalink / raw)
  To: 'Waiman Long', Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

From: Waiman Long
> Sent: 19 November 2020 18:40
...
> My own testing also show not too much performance difference when
> removing reader spinning except in the lightly loaded cases.

I'm confused.

I got massive performance improvements from changing a driver
we have to use mutex instead of the old semaphores (the driver
was written a long time ago).

While these weren't 'rw' the same issue will apply.

The problem was that the semaphore/mutex was typically only held over
a few instructions (eg to add an item to a list).
But with semaphore if you got contention the process always slept.
OTOH mutex spin 'for a while' before sleeping so the code rarely slept.

So I really expect that readers need to spin (for a while) if
a rwsem (etc) is locked for writing.

Clearly you really need a CBU (Crystal Ball Unit) to work out
whether to spin or not.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] locking/rwsem: Enable reader optimistic lock stealing
  2020-11-18  3:04 ` [PATCH 3/5] locking/rwsem: Enable reader optimistic lock stealing Waiman Long
@ 2020-11-20 14:36   ` Peter Zijlstra
  2020-11-20 17:26     ` Waiman Long
  2020-12-08  3:53   ` Davidlohr Bueso
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-11-20 14:36 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso, Phil Auld

On Tue, Nov 17, 2020 at 10:04:27PM -0500, Waiman Long wrote:
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index ee374ae061c3..930dd4af3639 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -957,6 +957,12 @@ static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
>  	}
>  	return false;
>  }
> +
> +static inline bool osq_is_empty(struct rw_semaphore *sem)
> +{
> +	return !osq_is_locked(&sem->osq);
> +}
> +
>  #else
>  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem,
>  					   unsigned long nonspinnable)
> @@ -977,6 +983,10 @@ static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
>  	return false;
>  }
>  
> +static inline bool osq_is_empty(sem)
> +{
> +	return false;
> +}

Hurph, the naming seems to suggest this ought to be in osq_lock.h, but
it really is part of rwsem, it captures the lack of osq member for this
configuration.

How about: rwsem_no_spinners() instead ?

>  static inline int
>  rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
>  {
> @@ -1007,6 +1017,22 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
>  	   !(count & RWSEM_WRITER_LOCKED))
>  		goto queue;
>  
> +	/*
> +	 * Reader optimistic lock stealing
> +	 *
> +	 * We can take the read lock directly without doing
> +	 * rwsem_optimistic_spin() if the conditions are right.
> +	 * Also wake up other readers if it is the first reader.
> +	 */
> +	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &&
> +	    osq_is_empty(sem)) {
> +		rwsem_set_reader_owned(sem);
> +		lockevent_inc(rwsem_rlock_steal);
> +		if (rcnt == 1)
> +			goto wake_readers;
> +		return sem;
> +	}

AFAICT this saves at least 3 atomic ops; how common is this case
(you did add a counter but forgot to mention this).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-18  3:04 ` [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning Waiman Long
  2020-11-18  5:35   ` Davidlohr Bueso
@ 2020-11-20 14:42   ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2020-11-20 14:42 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso, Phil Auld

On Tue, Nov 17, 2020 at 10:04:29PM -0500, Waiman Long wrote:
> -static inline bool osq_is_empty(struct rw_semaphore *sem)
> -{
> -	return !osq_is_locked(&sem->osq);

> -static inline bool osq_is_empty(sem)
> -{
> -	return false;
> -}

Oh, it doesn't live...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-18  5:35   ` Davidlohr Bueso
  2020-11-19 18:40     ` Waiman Long
@ 2020-11-20 14:44     ` Peter Zijlstra
  2020-11-20 22:39       ` Waiman Long
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-11-20 14:44 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

On Tue, Nov 17, 2020 at 09:35:56PM -0800, Davidlohr Bueso wrote:
> On Tue, 17 Nov 2020, Waiman Long wrote:
> 
> > The column "CS Load" represents the number of pause instructions issued
> > in the locking critical section. A CS load of 1 is extremely short and
> > is not likey in real situations. A load of 20 (moderate) and 100 (long)
> > are more realistic.
> > 
> > It can be seen that the previous patches in this series have reduced
> > performance in general except in highly contended cases with moderate
> > or long critical sections that performance improves a bit. This change
> > is mostly caused by the "Prevent potential lock starvation" patch that
> > reduce reader optimistic spinning and hence reduce reader fragmentation.
> > 
> > The patch that further limit reader optimistic spinning doesn't seem to
> > have too much impact on overall performance as shown in the benchmark
> > data.
> > 
> > The patch that disables reader optimistic spinning shows reduced
> > performance at lightly loaded cases, but comparable or slightly better
> > performance on with heavier contention.
> 
> I'm not overly worried about the lightly loaded cases here as the users
> (mostly thinking mmap_sem) most likely won't care for real workloads,
> not, ie: will-it-scale type things.
> 
> So at SUSE we also ran into this very same problem with reader optimistic
> spinning and considering the fragmentation went with disabling it, much
> like this patch - but without the reader optimistic lock stealing bits
> you have. So far nothing has really shown to fall out in our performance
> automation. And per your data a single reader spinner does not seem to be
> worth the added complexity of keeping reader spinning vs ripping it out.

I'm fine with ripping it... It was finnicky to begin with.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/5] locking/rwsem: Prevent potential lock starvation
  2020-11-18  3:04 ` [PATCH 2/5] locking/rwsem: Prevent potential lock starvation Waiman Long
@ 2020-11-20 14:44   ` Peter Zijlstra
  2020-11-20 17:27     ` Waiman Long
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2020-11-20 14:44 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso, Phil Auld

On Tue, Nov 17, 2020 at 10:04:26PM -0500, Waiman Long wrote:
> +	long rcnt = (count >> RWSEM_READER_SHIFT);	/* Reader count */

I'm thinking you can do without that comment, the variable name is clear
enough.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-20 13:11       ` David Laight
@ 2020-11-20 17:04         ` Waiman Long
  2020-11-20 17:37           ` David Laight
  2020-11-20 21:38         ` Davidlohr Bueso
  1 sibling, 1 reply; 23+ messages in thread
From: Waiman Long @ 2020-11-20 17:04 UTC (permalink / raw)
  To: David Laight, Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

On 11/20/20 8:11 AM, David Laight wrote:
> From: Waiman Long
>> Sent: 19 November 2020 18:40
> ...
>> My own testing also show not too much performance difference when
>> removing reader spinning except in the lightly loaded cases.
> I'm confused.
>
> I got massive performance improvements from changing a driver
> we have to use mutex instead of the old semaphores (the driver
> was written a long time ago).
>
> While these weren't 'rw' the same issue will apply.
>
> The problem was that the semaphore/mutex was typically only held over
> a few instructions (eg to add an item to a list).
> But with semaphore if you got contention the process always slept.
> OTOH mutex spin 'for a while' before sleeping so the code rarely slept.
>
> So I really expect that readers need to spin (for a while) if
> a rwsem (etc) is locked for writing.
>
> Clearly you really need a CBU (Crystal Ball Unit) to work out
> whether to spin or not.

That is the hard part. For short critical section and not many readers 
around, making the readers spin will likely improve performance. On the 
other hand, if the critical section is long with many readers, make 
readers sleep and then wake them all up at once can have better 
performance. There is no one-size-fit-all solution here.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] locking/rwsem: Enable reader optimistic lock stealing
  2020-11-20 14:36   ` Peter Zijlstra
@ 2020-11-20 17:26     ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2020-11-20 17:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso, Phil Auld

On 11/20/20 9:36 AM, Peter Zijlstra wrote:
> On Tue, Nov 17, 2020 at 10:04:27PM -0500, Waiman Long wrote:
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index ee374ae061c3..930dd4af3639 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -957,6 +957,12 @@ static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
>>   	}
>>   	return false;
>>   }
>> +
>> +static inline bool osq_is_empty(struct rw_semaphore *sem)
>> +{
>> +	return !osq_is_locked(&sem->osq);
>> +}
>> +
>>   #else
>>   static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem,
>>   					   unsigned long nonspinnable)
>> @@ -977,6 +983,10 @@ static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
>>   	return false;
>>   }
>>   
>> +static inline bool osq_is_empty(sem)
>> +{
>> +	return false;
>> +}
> Hurph, the naming seems to suggest this ought to be in osq_lock.h, but
> it really is part of rwsem, it captures the lack of osq member for this
> configuration.
>
> How about: rwsem_no_spinners() instead ?
Yes, sure. Will make the name change.
>
>>   static inline int
>>   rwsem_spin_on_owner(struct rw_semaphore *sem, unsigned long nonspinnable)
>>   {
>> @@ -1007,6 +1017,22 @@ rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
>>   	   !(count & RWSEM_WRITER_LOCKED))
>>   		goto queue;
>>   
>> +	/*
>> +	 * Reader optimistic lock stealing
>> +	 *
>> +	 * We can take the read lock directly without doing
>> +	 * rwsem_optimistic_spin() if the conditions are right.
>> +	 * Also wake up other readers if it is the first reader.
>> +	 */
>> +	if (!(count & (RWSEM_WRITER_LOCKED | RWSEM_FLAG_HANDOFF)) &&
>> +	    osq_is_empty(sem)) {
>> +		rwsem_set_reader_owned(sem);
>> +		lockevent_inc(rwsem_rlock_steal);
>> +		if (rcnt == 1)
>> +			goto wake_readers;
>> +		return sem;
>> +	}
> AFAICT this saves at least 3 atomic ops; how common is this case
> (you did add a counter but forgot to mention this).
>
Right, I should have mentioned the counter results.

Below is the relevant counter stats for a test system that have been up 
for more than 21 hours:

rwsem_opt_rlock=11792583 (optmistically acquired read lock)
rwsem_rlock=193357272 (slowpath acquired read lock)
rwsem_rlock_steal=44795149 (lock stealing)

So lock stealing represents about 17.9% of the total read lock acquired 
in non-fast path. I ran some microbenchmark test on the system before, 
so it may skew a bit to the high side. Anyway, this is not an 
insignificant amount.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/5] locking/rwsem: Prevent potential lock starvation
  2020-11-20 14:44   ` Peter Zijlstra
@ 2020-11-20 17:27     ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2020-11-20 17:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso, Phil Auld

On 11/20/20 9:44 AM, Peter Zijlstra wrote:
> On Tue, Nov 17, 2020 at 10:04:26PM -0500, Waiman Long wrote:
>> +	long rcnt = (count >> RWSEM_READER_SHIFT);	/* Reader count */
> I'm thinking you can do without that comment, the variable name is clear
> enough.
>
Sure.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-20 17:04         ` Waiman Long
@ 2020-11-20 17:37           ` David Laight
  0 siblings, 0 replies; 23+ messages in thread
From: David Laight @ 2020-11-20 17:37 UTC (permalink / raw)
  To: 'Waiman Long', Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

From: Waiman Long
> Sent: 20 November 2020 17:04
> 
> On 11/20/20 8:11 AM, David Laight wrote:
> > From: Waiman Long
> >> Sent: 19 November 2020 18:40
> > ...
> >> My own testing also show not too much performance difference when
> >> removing reader spinning except in the lightly loaded cases.
> > I'm confused.
> >
> > I got massive performance improvements from changing a driver
> > we have to use mutex instead of the old semaphores (the driver
> > was written a long time ago).
> >
> > While these weren't 'rw' the same issue will apply.
> >
> > The problem was that the semaphore/mutex was typically only held over
> > a few instructions (eg to add an item to a list).
> > But with semaphore if you got contention the process always slept.
> > OTOH mutex spin 'for a while' before sleeping so the code rarely slept.
> >
> > So I really expect that readers need to spin (for a while) if
> > a rwsem (etc) is locked for writing.
> >
> > Clearly you really need a CBU (Crystal Ball Unit) to work out
> > whether to spin or not.
> 
> That is the hard part. For short critical section and not many readers
> around, making the readers spin will likely improve performance. On the
> other hand, if the critical section is long with many readers, make
> readers sleep and then wake them all up at once can have better
> performance. There is no one-size-fit-all solution here.

Do the readers actually all wake up at the same time?
rwsem might be special, but if I cv_broadcast a userspace cv
then only one thread is woken.
Once it runs the next one is woken.
This is horrid if you actually want them all to run:
- It takes ages for the target cpu to come out of a low-power state.
- RT processes don't get scheduled if the cpu they last ran on is
  'busy' in kernel.

I can't see why the number of readers is relevant.
They are more likely to start in 'lockstep' if they spin.
(Which I think is what you say is best).

You may want per-rwsem option of how long to spin.
Although there are probably only 2 useful values - 0 and lots.

Are there rw spinlocks?
They can be much better is the critical sections are short.
Especially if they really are short and RT kernels don't
break everything my making the sleep.

I was fixing some userspace code that does a lot of channels of
audio processing.
You can't afford to take a mutex because an interrupt might
come in while you hold it - stopping all the other threads
obtaining the same mutex.
One thread stopping is fine, but having all the threads stop
leads to processing overrun.
Since you can't disable interrupts in userspace (for a spinlock)
I had to replace locked linked lists with arrays indexed by
atomic counters.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-20 13:11       ` David Laight
  2020-11-20 17:04         ` Waiman Long
@ 2020-11-20 21:38         ` Davidlohr Bueso
  2020-11-21 11:50           ` David Laight
  1 sibling, 1 reply; 23+ messages in thread
From: Davidlohr Bueso @ 2020-11-20 21:38 UTC (permalink / raw)
  To: David Laight
  Cc: 'Waiman Long',
	Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel,
	Phil Auld

On Fri, 20 Nov 2020, David Laight wrote:
>I got massive performance improvements from changing a driver
>we have to use mutex instead of the old semaphores (the driver
>was written a long time ago).
>
>While these weren't 'rw' the same issue will apply.
>
>The problem was that the semaphore/mutex was typically only held over
>a few instructions (eg to add an item to a list).
>But with semaphore if you got contention the process always slept.
>OTOH mutex spin 'for a while' before sleeping so the code rarely slept.

The caveat here is if you are using trylock/unlock from irq, which
is the only reason why regular semaphores are still around today. If
not, indeed a mutex is better.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-20 14:44     ` Peter Zijlstra
@ 2020-11-20 22:39       ` Waiman Long
  0 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2020-11-20 22:39 UTC (permalink / raw)
  To: Peter Zijlstra, Davidlohr Bueso
  Cc: Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

On 11/20/20 9:44 AM, Peter Zijlstra wrote:
> On Tue, Nov 17, 2020 at 09:35:56PM -0800, Davidlohr Bueso wrote:
>> On Tue, 17 Nov 2020, Waiman Long wrote:
>>
>>> The column "CS Load" represents the number of pause instructions issued
>>> in the locking critical section. A CS load of 1 is extremely short and
>>> is not likey in real situations. A load of 20 (moderate) and 100 (long)
>>> are more realistic.
>>>
>>> It can be seen that the previous patches in this series have reduced
>>> performance in general except in highly contended cases with moderate
>>> or long critical sections that performance improves a bit. This change
>>> is mostly caused by the "Prevent potential lock starvation" patch that
>>> reduce reader optimistic spinning and hence reduce reader fragmentation.
>>>
>>> The patch that further limit reader optimistic spinning doesn't seem to
>>> have too much impact on overall performance as shown in the benchmark
>>> data.
>>>
>>> The patch that disables reader optimistic spinning shows reduced
>>> performance at lightly loaded cases, but comparable or slightly better
>>> performance on with heavier contention.
>> I'm not overly worried about the lightly loaded cases here as the users
>> (mostly thinking mmap_sem) most likely won't care for real workloads,
>> not, ie: will-it-scale type things.
>>
>> So at SUSE we also ran into this very same problem with reader optimistic
>> spinning and considering the fragmentation went with disabling it, much
>> like this patch - but without the reader optimistic lock stealing bits
>> you have. So far nothing has really shown to fall out in our performance
>> automation. And per your data a single reader spinner does not seem to be
>> worth the added complexity of keeping reader spinning vs ripping it out.
> I'm fine with ripping it... It was finnicky to begin with.
>
Good to know. I am going to sent out v2 with some update commit logs and 
some !CONFIG_RWSEM_SPIN_ON_OWNER fixes.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning
  2020-11-20 21:38         ` Davidlohr Bueso
@ 2020-11-21 11:50           ` David Laight
  0 siblings, 0 replies; 23+ messages in thread
From: David Laight @ 2020-11-21 11:50 UTC (permalink / raw)
  To: 'Davidlohr Bueso'
  Cc: 'Waiman Long',
	Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel,
	Phil Auld

From: Davidlohr Bueso
> Sent: 20 November 2020 21:38
> 
> On Fri, 20 Nov 2020, David Laight wrote:
> >I got massive performance improvements from changing a driver
> >we have to use mutex instead of the old semaphores (the driver
> >was written a long time ago).
> >
> >While these weren't 'rw' the same issue will apply.
> >
> >The problem was that the semaphore/mutex was typically only held over
> >a few instructions (eg to add an item to a list).
> >But with semaphore if you got contention the process always slept.
> >OTOH mutex spin 'for a while' before sleeping so the code rarely slept.
> 
> The caveat here is if you are using trylock/unlock from irq, which
> is the only reason why regular semaphores are still around today. If
> not, indeed a mutex is better.

Unless you want to timeout the lock request.

Timeouts are particularly useful in code paths that might
run after an 'oops' or other deadlock.
Typically for reporting status information.
You get to choose whether to error the status request or
carry on knowing that the data is unlikely to change.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/5] locking/rwsem: Enable reader optimistic lock stealing
  2020-11-18  3:04 ` [PATCH 3/5] locking/rwsem: Enable reader optimistic lock stealing Waiman Long
  2020-11-20 14:36   ` Peter Zijlstra
@ 2020-12-08  3:53   ` Davidlohr Bueso
  1 sibling, 0 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2020-12-08  3:53 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Phil Auld

On Tue, 17 Nov 2020, Waiman Long wrote:

>If the optimistic spinning queue is empty and the rwsem does not have
>the handoff or write-lock bits set, it is actually not necessary to
>call rwsem_optimistic_spin() to spin on it. Instead, it can steal the
>lock directly as its reader bias is in the count already.  If it is
>the first reader in this state, it will try to wake up other readers
>in the wait queue.
>
>Signed-off-by: Waiman Long <longman@redhat.com>

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-12-08  4:19 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-18  3:04 [PATCH 0/5] locking/rwsem: Rework reader optimistic spinning Waiman Long
2020-11-18  3:04 ` [PATCH 1/5] locking/rwsem: Pass the current atomic count to rwsem_down_read_slowpath() Waiman Long
2020-11-18  3:04 ` [PATCH 2/5] locking/rwsem: Prevent potential lock starvation Waiman Long
2020-11-20 14:44   ` Peter Zijlstra
2020-11-20 17:27     ` Waiman Long
2020-11-18  3:04 ` [PATCH 3/5] locking/rwsem: Enable reader optimistic lock stealing Waiman Long
2020-11-20 14:36   ` Peter Zijlstra
2020-11-20 17:26     ` Waiman Long
2020-12-08  3:53   ` Davidlohr Bueso
2020-11-18  3:04 ` [PATCH 4/5] locking/rwsem: Wake up all waiting readers if RWSEM_WAKE_READ_OWNED Waiman Long
2020-11-18  4:53   ` Davidlohr Bueso
2020-11-19 18:37     ` Waiman Long
2020-11-18  3:04 ` [RFC PATCH 5/5] locking/rwsem: Remove reader optimistic spinning Waiman Long
2020-11-18  5:35   ` Davidlohr Bueso
2020-11-19 18:40     ` Waiman Long
2020-11-20 13:11       ` David Laight
2020-11-20 17:04         ` Waiman Long
2020-11-20 17:37           ` David Laight
2020-11-20 21:38         ` Davidlohr Bueso
2020-11-21 11:50           ` David Laight
2020-11-20 14:44     ` Peter Zijlstra
2020-11-20 22:39       ` Waiman Long
2020-11-20 14:42   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).