* [PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

v5->v6:
 - Reworked the locking algorithm to make it similar to qrwlock.
 - Removed all the architecture specific code & use only generic code.
 - Added waiter lock handoff and time-based reader lock stealing.

v4->v5:
 - Drop the OSQ patch, the need to increase the size of the rwsem
   structure and the autotuning mechanism.
 - Add an intermediate patch to enable readers spinning on writer.
 - Other miscellaneous changes and optimizations.

v3->v4:
 - Rebased to the latest tip tree due to changes to rwsem-xadd.c.
 - Update the OSQ patch to fix race condition.

v2->v3:
 - Used smp_acquire__after_ctrl_dep() to provide acquire barrier.
 - Added the following new patches:
   1) make rwsem_spin_on_owner() return a tristate value.
   2) reactivate reader spinning when there is a large number of
      favorable writer-on-writer spinnings.
   3) move all the rwsem macros in arch-specific rwsem.h files
      into a common asm-generic/rwsem_types.h file.
   4) add a boot parameter to specify the reader spinning threshold.
 - Updated some of the patches as suggested by PeterZ and adjusted
   some of the reader spinning parameters.

v1->v2:
 - Fixed a 0day build error.
 - Added a new patch 1 to make osq_lock() a proper acquire memory
   barrier.
 - Replaced the explicit enabling of reader spinning with an autotuning
   mechanism that disables reader spinning for those rwsems that may
   not benefit from reader spinning.
 - Removed the last xfs patch as it is no longer necessary.

v4: https://lkml.org/lkml/2016/8/18/1039
v5: https://lkml.org/lkml/2017/6/1/841

This patchset revamps the current rwsem-xadd implementation to make
it saner and easier to work with. It also implements the following
three new features:

 1) Waiter lock handoff
 2) Reader optimistic spinning
 3) Time-based reader lock stealing

With these changes, performance on workloads with a mix of readers
and writers will improve substantially. Rwsem will also become
reader-preferring instead of writer-preferring, which is usually good
for performance.

This patchset also uses generic code for all architectures, so all
the architecture-specific assembly code can be removed, easing
maintenance.

Patch 1 moves down the rwsem_down_read_failed() function for later
patches.

Patch 2 reworks the rwsem-xadd locking and unlocking code to use
an algorithm somewhat similar to what qrwlock is doing today. All
the fastpath code is moved to a new kernel/locking/rwsem-xadd.h
header file.

Patch 3 moves all the owner setting code to the fastpath in the
rwsem-xadd.h file as well.

Patch 4 moves the content of kernel/locking/rwsem.h to rwsem-xadd.h
and removes it.

Patch 5 moves rwsem internal functions from include/linux/rwsem.h
to rwsem-xadd.h.

Patch 6 removes all the architecture specific rwsem files.

Patch 7 enables forced lock handoff to the first waiter in the wait
queue when it has waited for too long without acquiring the lock. This
prevents lock starvation and makes rwsem more fair.
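
As a rough userspace illustration of the handoff idea (a sketch with
C11 atomics, not the code in patch 7; the handoff bit and the timeout
value are made-up placeholders, while the writer/waiter bits match the
bit layout introduced in patch 2):

#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define RWSEM_WRITER_LOCKED	0x01u	/* bit 0, as in this series */
#define RWSEM_FLAG_WAITERS	0x02u	/* bit 1, as in this series */
#define RWSEM_FLAG_HANDOFF	0x04u	/* hypothetical: one of the reserved bits */
#define HANDOFF_TIMEOUT_NS	5000000LL /* hypothetical 5ms threshold */

struct sketch_rwsem { _Atomic unsigned int count; };

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Called by the waiter at the head of the wait queue each time it fails
 * to get the lock: after waiting too long, demand a handoff.
 */
static void maybe_set_handoff(struct sketch_rwsem *sem, long long wait_start_ns)
{
	if (now_ns() - wait_start_ns > HANDOFF_TIMEOUT_NS)
		atomic_fetch_or(&sem->count, RWSEM_FLAG_HANDOFF);
}

/*
 * A would-be lock stealer (e.g. an optimistic spinner) backs off while a
 * handoff is pending, so the first waiter cannot be starved forever.
 */
static bool can_steal_lock(struct sketch_rwsem *sem)
{
	return !(atomic_load(&sem->count) & RWSEM_FLAG_HANDOFF);
}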

Patch 8 enables readers to optimistically spin on a writer owned lock.

Patch 9 enables time-based reader lock stealing, thus making rwsem
reader-preferring instead of writer-preferring.
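
A hedged sketch of what time-based reader stealing means, building on
the sketch above (the helper name and the length of the stealing window
are assumptions, not the patch's actual values): a reader arriving while
the lock is reader-owned may join the existing readers directly instead
of queuing, but only for a limited time after the lock became
reader-owned, so writers are not starved indefinitely.

#define RWSEM_READER_BIAS	0x100u		/* bit 8, as in this series */
#define READER_STEAL_WINDOW_NS	1000000LL	/* hypothetical 1ms window */

/*
 * reader_owned_since_ns is recorded when the lock becomes reader-owned.
 * A new reader may steal the lock (skip the wait queue) only while the
 * lock stays writer-free and the stealing window has not expired.
 */
static bool reader_try_steal(struct sketch_rwsem *sem,
			     long long reader_owned_since_ns)
{
	unsigned int c = atomic_load(&sem->count);

	if (c & RWSEM_WRITER_LOCKED)
		return false;
	if (now_ns() - reader_owned_since_ns > READER_STEAL_WINDOW_NS)
		return false;	/* window closed: queue up instead */

	atomic_fetch_add(&sem->count, RWSEM_READER_BIAS);
	return true;
}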

Patch 10 modifies rwsem_spin_on_owner() to return a tri-state value
that can be used in a later patch.

Patch 11 enables writers to optimistically spin on a reader-owned
lock using a fixed iteration count.
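
And a similarly hedged sketch of the count-based writer spin, again
building on the definitions above (the iteration limit is an arbitrary
example, not the patchset's value): because a reader-owned lock has no
single owner task whose run state can be monitored, the writer simply
spins a bounded number of times hoping the readers finish quickly, then
falls back to sleeping in the wait queue.

#define RWSEM_READER_MASK		0xffffff00u	/* bits 8-31, as in this series */
#define WRITER_SPIN_ON_READER_COUNT	512		/* arbitrary example value */

static bool writer_spin_on_readers(struct sketch_rwsem *sem)
{
	for (int i = 0; i < WRITER_SPIN_ON_READER_COUNT; i++) {
		unsigned int c = atomic_load(&sem->count);

		/* Lock released by the readers? Try to grab it. */
		if (!(c & (RWSEM_WRITER_LOCKED | RWSEM_READER_MASK)) &&
		    atomic_compare_exchange_strong(&sem->count, &c,
						   c | RWSEM_WRITER_LOCKED))
			return true;
	}
	return false;	/* give up and queue as a sleeping waiter */
}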

In terms of rwsem performance, a rwsem microbenchmark and a fio
randrw test with an xfs filesystem on a ramdisk were used to verify
the performance changes due to these patches. Both tests were run on
a 2-socket, 40-core Gold 6148 system. The rwsem microbenchmark (1:1
reader/writer ratio) has a short critical section while the fio
randrw test has a long critical section (4k read/write).

The following table shows the performance of the rwsem microbenchmark
and the fio randrw test with different numbers of patches applied on
4.14-based kernels:

  # of Patches	Locking Rate	FIO Bandwidth	FIO Bandwidth
    Applied	 40 threads	 32 threads	 16 threads
  ------------	------------	-------------	-------------
	0	  38.7 kop/s	  706 MB/s	  704 MB/s
	7	  38.6 kop/s	  668 MB/s	  663 MB/s
	8	  38.9 kop/s	  704 MB/s	  701 MB/s
	9	  39.1 kop/s	  702 MB/s	  707 MB/s
       11	3218.0 kop/s	 2594 MB/s	 2614 MB/s

So this patchset improves the mixed read/write rwsem microbenchmark
locking rate by about 83X and the fio randrw bandwidth by about 3.7X.

With separate reader and writer threads (20 each), the following
table shows the per-thread locking rates (min/mean/max) of the rwsem
microbenchmark at various patch levels.

  # of Patches		   Reader 		  Writer
    Applied		Locking Rate		Locking Rate
  ------------		------------		------------
	0	5,155/    5,155/    5,155    5,154/248,852/346,281
	7	5,696/    5,697/    5,698  113,500/215,826/320,872
	8	4,827/    5,047/    5,215    4,826/176,797/284,069
	9     211,276/  509,712/1,134,007    4,894/221,839/246,818
       11     884,513/1,043,989/1,252,533    9,604/ 11,105/ 25,225

It can be seen that rwsem changes from writer-preferring to
reader-preferring.

Waiman Long (11):
  locking/rwsem: relocate rwsem_down_read_failed()
  locking/rwsem: Implement a new locking scheme
  locking/rwsem: Move owner setting code from rwsem.c to rwsem-xadd.h
  locking/rwsem: Remove kernel/locking/rwsem.h
  locking/rwsem: Move rwsem internal function declarations to
    rwsem-xadd.h
  locking/rwsem: Remove arch specific rwsem files
  locking/rwsem: Implement lock handoff to prevent lock starvation
  locking/rwsem: Enable readers spinning on writer
  locking/rwsem: Enable time-based reader lock stealing
  locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value
  locking/rwsem: Enable count-based spinning on reader

 arch/alpha/include/asm/rwsem.h  | 195 --------------
 arch/arm/include/asm/Kbuild     |   1 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/include/asm/rwsem.h   | 154 ------------
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/s390/include/asm/rwsem.h   | 210 ----------------
 arch/sh/include/asm/Kbuild      |   1 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/x86/include/asm/rwsem.h    | 221 ----------------
 arch/x86/lib/Makefile           |   1 -
 arch/x86/lib/rwsem.S            | 144 -----------
 arch/xtensa/include/asm/Kbuild  |   1 -
 include/asm-generic/rwsem.h     | 129 ----------
 include/linux/rwsem.h           |  19 +-
 kernel/locking/percpu-rwsem.c   |   4 +
 kernel/locking/rwsem-xadd.c     | 544 ++++++++++++++++++++++++++--------------
 kernel/locking/rwsem-xadd.h     | 272 ++++++++++++++++++++
 kernel/locking/rwsem.c          |  21 +-
 kernel/locking/rwsem.h          |  68 -----
 20 files changed, 638 insertions(+), 1351 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/s390/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S
 delete mode 100644 include/asm-generic/rwsem.h
 create mode 100644 kernel/locking/rwsem-xadd.h
 delete mode 100644 kernel/locking/rwsem.h

-- 
1.8.3.1


* [PATCH v6 01/11] locking/rwsem: relocate rwsem_down_read_failed()
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The rwsem_down_read_failed*() functions were relocated from above
the optimistic spinning section to below that section. This enables
them to use functions in that section in future patches. There is no
code change.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 150 ++++++++++++++++++++++----------------------
 1 file changed, 75 insertions(+), 75 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 1fefe6d..db5dedf 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -219,81 +219,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 }
 
 /*
- * Wait for the read lock to be granted
- */
-static inline struct rw_semaphore __sched *
-__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
-{
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
-	struct rwsem_waiter waiter;
-	DEFINE_WAKE_Q(wake_q);
-
-	waiter.task = current;
-	waiter.type = RWSEM_WAITING_FOR_READ;
-
-	raw_spin_lock_irq(&sem->wait_lock);
-	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
-	list_add_tail(&waiter.list, &sem->wait_list);
-
-	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
-
-	/*
-	 * If there are no active locks, wake the front queued process(es).
-	 *
-	 * If there are no writers and we are first in the queue,
-	 * wake our own waiter to join the existing active readers !
-	 */
-	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
-		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
-
-	raw_spin_unlock_irq(&sem->wait_lock);
-	wake_up_q(&wake_q);
-
-	/* wait to be given the lock */
-	while (true) {
-		set_current_state(state);
-		if (!waiter.task)
-			break;
-		if (signal_pending_state(state, current)) {
-			raw_spin_lock_irq(&sem->wait_lock);
-			if (waiter.task)
-				goto out_nolock;
-			raw_spin_unlock_irq(&sem->wait_lock);
-			break;
-		}
-		schedule();
-	}
-
-	__set_current_state(TASK_RUNNING);
-	return sem;
-out_nolock:
-	list_del(&waiter.list);
-	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
-	raw_spin_unlock_irq(&sem->wait_lock);
-	__set_current_state(TASK_RUNNING);
-	return ERR_PTR(-EINTR);
-}
-
-__visible struct rw_semaphore * __sched
-rwsem_down_read_failed(struct rw_semaphore *sem)
-{
-	return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE);
-}
-EXPORT_SYMBOL(rwsem_down_read_failed);
-
-__visible struct rw_semaphore * __sched
-rwsem_down_read_failed_killable(struct rw_semaphore *sem)
-{
-	return __rwsem_down_read_failed_common(sem, TASK_KILLABLE);
-}
-EXPORT_SYMBOL(rwsem_down_read_failed_killable);
-
-/*
  * This function must be called with the sem->wait_lock held to prevent
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
@@ -488,6 +413,81 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 #endif
 
 /*
+ * Wait for the read lock to be granted
+ */
+static inline struct rw_semaphore __sched *
+__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
+{
+	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	struct rwsem_waiter waiter;
+	DEFINE_WAKE_Q(wake_q);
+
+	waiter.task = current;
+	waiter.type = RWSEM_WAITING_FOR_READ;
+
+	raw_spin_lock_irq(&sem->wait_lock);
+	if (list_empty(&sem->wait_list))
+		adjustment += RWSEM_WAITING_BIAS;
+	list_add_tail(&waiter.list, &sem->wait_list);
+
+	/* we're now waiting on the lock, but no longer actively locking */
+	count = atomic_long_add_return(adjustment, &sem->count);
+
+	/*
+	 * If there are no active locks, wake the front queued process(es).
+	 *
+	 * If there are no writers and we are first in the queue,
+	 * wake our own waiter to join the existing active readers !
+	 */
+	if (count == RWSEM_WAITING_BIAS ||
+	    (count > RWSEM_WAITING_BIAS &&
+	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+
+	raw_spin_unlock_irq(&sem->wait_lock);
+	wake_up_q(&wake_q);
+
+	/* wait to be given the lock */
+	while (true) {
+		set_current_state(state);
+		if (!waiter.task)
+			break;
+		if (signal_pending_state(state, current)) {
+			raw_spin_lock_irq(&sem->wait_lock);
+			if (waiter.task)
+				goto out_nolock;
+			raw_spin_unlock_irq(&sem->wait_lock);
+			break;
+		}
+		schedule();
+	}
+
+	__set_current_state(TASK_RUNNING);
+	return sem;
+out_nolock:
+	list_del(&waiter.list);
+	if (list_empty(&sem->wait_list))
+		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+	raw_spin_unlock_irq(&sem->wait_lock);
+	__set_current_state(TASK_RUNNING);
+	return ERR_PTR(-EINTR);
+}
+
+__visible struct rw_semaphore * __sched
+rwsem_down_read_failed(struct rw_semaphore *sem)
+{
+	return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE);
+}
+EXPORT_SYMBOL(rwsem_down_read_failed);
+
+__visible struct rw_semaphore * __sched
+rwsem_down_read_failed_killable(struct rw_semaphore *sem)
+{
+	return __rwsem_down_read_failed_common(sem, TASK_KILLABLE);
+}
+EXPORT_SYMBOL(rwsem_down_read_failed_killable);
+
+/*
  * Wait until we successfully acquire the write lock
  */
 static inline struct rw_semaphore *
-- 
1.8.3.1


* [PATCH v6 02/11] locking/rwsem: Implement a new locking scheme
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The current way of using various reader, writer and waiting biases
in the rwsem code is confusing and hard to understand. I have to
reread the rwsem count guide in the rwsem-xadd.c file from time to
time to remind myself how this whole thing works. It also makes the
rwsem code harder to optimize.

To make rwsem more sane, a new locking scheme similar to the one in
qrwlock is now being used.  The count is now a 32-bit atomic value
in all architectures. The current bit definitions are:

  Bit  0    - writer locked bit
  Bit  1    - waiters present bit
  Bits 2-7  - reserved for future extension
  Bits 8-31 - reader count

Now the cmpxchg instruction is used to acquire the write lock. The
read lock is still acquired with the xadd instruction, so there is no
change here. This scheme allows up to 16M active readers, which
should be more than enough. We can always use some of the reserved
bits if necessary.
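
A few example count values, derived from the bit layout above
(illustration only, not part of the patch):

  0x00000000 - unlocked, no waiters
  0x00000001 - one writer holds the lock, no waiters
  0x00000003 - one writer holds the lock, waiter(s) queued
  0x00000300 - three readers hold the lock (3 << 8), no waiters
  0x00000302 - three readers hold the lock, waiter(s) queued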

The same generic locking code will be used for all the architectures
and the architecture specific files will be retired.

This patch also hides the fastpath implementation of rwsem (now in
kernel/locking/rwsem-xadd.h) from the rest of the kernel, as
include/linux/rwsem.h will no longer include it.

With a locking microbenchmark running on a 3.13-based kernel, the
total locking rates (in Mops/s) of the benchmark on a 2-socket,
36-core x86-64 system before and after the patch were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        39.039   33.401   40.432   33.093
        2         9.767   17.250   11.424   18.763
        4         9.069   17.580   10.085   17.372
        8         9.390   15.372   11.733   14.507

The locking rates of the benchmark on a 16-processor Power8 system
were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        15.086   13.738    9.373   13.597
        2         4.864    6.280    5.514    6.309
        4         3.286    4.932    4.153    5.011
        8         2.637    2.248    3.528    2.189

The locking rates of the benchmark on a 32-core Cavium ARM64 system
were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        4.849    3.972    5.194    4.223
        2        3.165    4.628    3.077    4.885
        4        0.742    3.856    0.716    4.136
        8        1.639    2.443    1.330    2.475

For the read lock, locking performance was about the same before and
after the patch. For the write lock, the new code had better contended
performance (2 or more threads) on both x86 and ppc, but it seemed to
slow down a bit on arm64. The uncontended performance, however, suffers
quite a bit on ppc, but not on x86 or arm64. So cmpxchg does have a
noticeably higher cost than xadd on ppc; the elimination of the atomic
count reversal in the slowpath helps the contended performance, though.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/asm-generic/rwsem.h   | 129 -------------------------------------
 include/linux/rwsem.h         |  12 ++--
 kernel/locking/percpu-rwsem.c |   2 +
 kernel/locking/rwsem-xadd.c   | 145 ++++++++++++++----------------------------
 kernel/locking/rwsem-xadd.h   | 117 ++++++++++++++++++++++++++++++++++
 kernel/locking/rwsem.h        |   4 ++
 6 files changed, 174 insertions(+), 235 deletions(-)
 delete mode 100644 include/asm-generic/rwsem.h
 create mode 100644 kernel/locking/rwsem-xadd.h

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
deleted file mode 100644
index 6c6a214..0000000
--- a/include/asm-generic/rwsem.h
+++ /dev/null
@@ -1,129 +0,0 @@
-#ifndef _ASM_GENERIC_RWSEM_H
-#define _ASM_GENERIC_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "Please don't include <asm/rwsem.h> directly, use <linux/rwsem.h> instead."
-#endif
-
-#ifdef __KERNEL__
-
-/*
- * R/W semaphores originally for PPC using the stuff in lib/rwsem.c.
- * Adapted largely from include/asm-i386/rwsem.h
- * by Paul Mackerras <paulus@samba.org>.
- */
-
-/*
- * the semaphore definition
- */
-#ifdef CONFIG_64BIT
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0))
-		rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
-		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp,
-				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
-					     &sem->count);
-	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
-					     &sem->count);
-	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-	return 0;
-}
-
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_dec_return_release(&sem->count);
-	if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
-		rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
-						    &sem->count) < 0))
-		rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	/*
-	 * When downgrading from exclusive to shared ownership,
-	 * anything inside the write-locked region cannot leak
-	 * into the read side. In contrast, anything in the
-	 * read-locked region is ok to be re-ordered into the
-	 * write side. As such, rely on RELEASE semantics.
-	 */
-	tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count);
-	if (tmp < 0)
-		rwsem_downgrade_wake(sem);
-}
-
-#endif	/* __KERNEL__ */
-#endif	/* _ASM_GENERIC_RWSEM_H */
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 0ad7318..d0f59df 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -25,11 +25,10 @@
 #include <linux/rwsem-spinlock.h> /* use a generic implementation */
 #define __RWSEM_INIT_COUNT(name)	.count = RWSEM_UNLOCKED_VALUE
 #else
-/* All arch specific implementations share the same struct */
 struct rw_semaphore {
-	atomic_long_t count;
-	struct list_head wait_list;
+	atomic_t count;
 	raw_spinlock_t wait_lock;
+	struct list_head wait_list;
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* spinner MCS lock */
 	/*
@@ -50,16 +49,15 @@ struct rw_semaphore {
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
-/* Include the arch specific part */
-#include <asm/rwsem.h>
+#define RWSEM_UNLOCKED_VALUE	0
 
 /* In all implementations count != 0 means locked */
 static inline int rwsem_is_locked(struct rw_semaphore *sem)
 {
-	return atomic_long_read(&sem->count) != 0;
+	return atomic_read(&sem->count) != RWSEM_UNLOCKED_VALUE;
 }
 
-#define __RWSEM_INIT_COUNT(name)	.count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE)
+#define __RWSEM_INIT_COUNT(name)	.count = ATOMIC_INIT(RWSEM_UNLOCKED_VALUE)
 #endif
 
 /* Common initializer macros and functions */
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index 883cf1b..f17dad9 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -7,6 +7,8 @@
 #include <linux/sched.h>
 #include <linux/errno.h>
 
+#include "rwsem.h"
+
 int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
 			const char *name, struct lock_class_key *rwsem_key)
 {
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index db5dedf..39dc5be 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -21,52 +21,20 @@
 #include "rwsem.h"
 
 /*
- * Guide to the rw_semaphore's count field for common values.
- * (32-bit case illustrated, similar for 64-bit)
+ * Guide to the rw_semaphore's count field.
  *
- * 0x0000000X	(1) X readers active or attempting lock, no writer waiting
- *		    X = #active_readers + #readers attempting to lock
- *		    (X*ACTIVE_BIAS)
+ * When the RWSEM_WRITER_LOCKED bit in count is set, the lock is owned
+ * by a writer.
  *
- * 0x00000000	rwsem is unlocked, and no one is waiting for the lock or
- *		attempting to read lock or write lock.
- *
- * 0xffff000X	(1) X readers active or attempting lock, with waiters for lock
- *		    X = #active readers + # readers attempting lock
- *		    (X*ACTIVE_BIAS + WAITING_BIAS)
- *		(2) 1 writer attempting lock, no waiters for lock
- *		    X-1 = #active readers + #readers attempting lock
- *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- *		(3) 1 writer active, no waiters for lock
- *		    X-1 = #active readers + #readers attempting lock
- *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- *
- * 0xffff0001	(1) 1 reader active or attempting lock, waiters for lock
- *		    (WAITING_BIAS + ACTIVE_BIAS)
- *		(2) 1 writer active or attempting lock, no waiters for lock
- *		    (ACTIVE_WRITE_BIAS)
- *
- * 0xffff0000	(1) There are writers or readers queued but none active
- *		    or in the process of attempting lock.
- *		    (WAITING_BIAS)
- *		Note: writer can attempt to steal lock for this count by adding
- *		ACTIVE_WRITE_BIAS in cmpxchg and checking the old count
- *
- * 0xfffe0001	(1) 1 writer active, or attempting lock. Waiters on queue.
- *		    (ACTIVE_WRITE_BIAS + WAITING_BIAS)
- *
- * Note: Readers attempt to lock by adding ACTIVE_BIAS in down_read and checking
- *	 the count becomes more than 0 for successful lock acquisition,
- *	 i.e. the case where there are only readers or nobody has lock.
- *	 (1st and 2nd case above).
- *
- *	 Writers attempt to lock by adding ACTIVE_WRITE_BIAS in down_write and
- *	 checking the count becomes ACTIVE_WRITE_BIAS for successful lock
- *	 acquisition (i.e. nobody else has lock or attempts lock).  If
- *	 unsuccessful, in rwsem_down_write_failed, we'll check to see if there
- *	 are only waiters but none active (5th case above), and attempt to
- *	 steal the lock.
+ * The lock is owned by readers when
+ * (1) the RWSEM_WRITER_LOCKED isn't set in count,
+ * (2) some of the reader bits are set in count, and
+ * (3) the owner field is RWSEM_READ_OWNED.
  *
+ * Having some reader bits set is not enough to guarantee a readers owned
+ * lock as the readers may be in the process of backing out from the count
+ * and a writer has just released the lock. So another writer may steal
+ * the lock immediately after that.
  */
 
 /*
@@ -82,7 +50,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
 	lockdep_init_map(&sem->dep_map, name, key, 0);
 #endif
-	atomic_long_set(&sem->count, RWSEM_UNLOCKED_VALUE);
+	atomic_set(&sem->count, RWSEM_UNLOCKED_VALUE);
 	raw_spin_lock_init(&sem->wait_lock);
 	INIT_LIST_HEAD(&sem->wait_list);
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
@@ -128,7 +96,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 			      struct wake_q_head *wake_q)
 {
 	struct rwsem_waiter *waiter, *tmp;
-	long oldcount, woken = 0, adjustment = 0;
+	int oldcount, woken = 0, adjustment = 0;
 
 	/*
 	 * Take a peek at the queue head waiter such that we can determine
@@ -157,22 +125,11 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	 * so we can bail out early if a writer stole the lock.
 	 */
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
-		adjustment = RWSEM_ACTIVE_READ_BIAS;
- try_reader_grant:
-		oldcount = atomic_long_fetch_add(adjustment, &sem->count);
-		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/*
-			 * If the count is still less than RWSEM_WAITING_BIAS
-			 * after removing the adjustment, it is assumed that
-			 * a writer has stolen the lock. We have to undo our
-			 * reader grant.
-			 */
-			if (atomic_long_add_return(-adjustment, &sem->count) <
-			    RWSEM_WAITING_BIAS)
-				return;
-
-			/* Last active locker left. Retry waking readers. */
-			goto try_reader_grant;
+		adjustment = RWSEM_READER_BIAS;
+		oldcount = atomic_fetch_add(adjustment, &sem->count);
+		if (unlikely(oldcount & RWSEM_WRITER_LOCKED)) {
+			atomic_sub(adjustment, &sem->count);
+			return;
 		}
 		/*
 		 * It is not really necessary to set it to reader-owned here,
@@ -208,14 +165,14 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		smp_store_release(&waiter->task, NULL);
 	}
 
-	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+	adjustment = woken * RWSEM_READER_BIAS - adjustment;
 	if (list_empty(&sem->wait_list)) {
 		/* hit end of list above */
-		adjustment -= RWSEM_WAITING_BIAS;
+		adjustment -= RWSEM_FLAG_WAITERS;
 	}
 
 	if (adjustment)
-		atomic_long_add(adjustment, &sem->count);
+		atomic_add(adjustment, &sem->count);
 }
 
 /*
@@ -223,24 +180,17 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
  */
-static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
+static inline bool rwsem_try_write_lock(int count, struct rw_semaphore *sem)
 {
-	/*
-	 * Avoid trying to acquire write lock if count isn't RWSEM_WAITING_BIAS.
-	 */
-	if (count != RWSEM_WAITING_BIAS)
+	int new;
+
+	if (RWSEM_COUNT_IS_LOCKED(count))
 		return false;
 
-	/*
-	 * Acquire the lock by trying to set it to ACTIVE_WRITE_BIAS. If there
-	 * are other tasks on the wait list, we need to add on WAITING_BIAS.
-	 */
-	count = list_is_singular(&sem->wait_list) ?
-			RWSEM_ACTIVE_WRITE_BIAS :
-			RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS;
+	new = count + RWSEM_WRITER_LOCKED -
+	     (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0);
 
-	if (atomic_long_cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS, count)
-							== RWSEM_WAITING_BIAS) {
+	if (atomic_cmpxchg_acquire(&sem->count, count, new) == count) {
 		rwsem_set_owner(sem);
 		return true;
 	}
@@ -254,14 +204,14 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
  */
 static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 {
-	long old, count = atomic_long_read(&sem->count);
+	int old, count = atomic_read(&sem->count);
 
 	while (true) {
-		if (!(count == 0 || count == RWSEM_WAITING_BIAS))
+		if (RWSEM_COUNT_IS_LOCKED(count))
 			return false;
 
-		old = atomic_long_cmpxchg_acquire(&sem->count, count,
-				      count + RWSEM_ACTIVE_WRITE_BIAS);
+		old = atomic_cmpxchg_acquire(&sem->count, count,
+				count + RWSEM_WRITER_LOCKED);
 		if (old == count) {
 			rwsem_set_owner(sem);
 			return true;
@@ -418,7 +368,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore __sched *
 __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 {
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	int count, adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
@@ -427,11 +377,11 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
+		adjustment += RWSEM_FLAG_WAITERS;
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
+	count = atomic_add_return(adjustment, &sem->count);
 
 	/*
 	 * If there are no active locks, wake the front queued process(es).
@@ -439,9 +389,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	 * If there are no writers and we are first in the queue,
 	 * wake our own waiter to join the existing active readers !
 	 */
-	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+	if (!RWSEM_COUNT_IS_LOCKED(count))
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
 	raw_spin_unlock_irq(&sem->wait_lock);
@@ -467,7 +415,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 out_nolock:
 	list_del(&waiter.list);
 	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	return ERR_PTR(-EINTR);
@@ -493,15 +441,12 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore *
 __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 {
-	long count;
+	int count;
 	bool waiting = true; /* any queued threads before us */
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
 
-	/* undo write bias from down_write operation, stop active locking */
-	count = atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS, &sem->count);
-
 	/* do optimistic spinning and steal lock if possible */
 	if (rwsem_optimistic_spin(sem))
 		return sem;
@@ -523,14 +468,14 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 	/* we're now waiting on the lock, but no longer actively locking */
 	if (waiting) {
-		count = atomic_long_read(&sem->count);
+		count = atomic_read(&sem->count);
 
 		/*
 		 * If there were already threads queued before us and there are
 		 * no active writers, the lock must be read owned; so we try to
 		 * wake any read locks that were queued ahead of us.
 		 */
-		if (count > RWSEM_WAITING_BIAS) {
+		if (!(count & RWSEM_WRITER_LOCKED)) {
 			__rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q);
 			/*
 			 * The wakeup is normally called _after_ the wait_lock
@@ -547,8 +492,9 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 			wake_q_init(&wake_q);
 		}
 
-	} else
-		count = atomic_long_add_return(RWSEM_WAITING_BIAS, &sem->count);
+	} else {
+		count = atomic_add_return(RWSEM_FLAG_WAITERS, &sem->count);
+	}
 
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
@@ -564,7 +510,8 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 			schedule();
 			set_current_state(state);
-		} while ((count = atomic_long_read(&sem->count)) & RWSEM_ACTIVE_MASK);
+			count = atomic_read(&sem->count);
+		} while (RWSEM_COUNT_IS_LOCKED(count));
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
@@ -579,7 +526,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	raw_spin_lock_irq(&sem->wait_lock);
 	list_del(&waiter.list);
 	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
 	else
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 	raw_spin_unlock_irq(&sem->wait_lock);
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
new file mode 100644
index 0000000..3820719
--- /dev/null
+++ b/kernel/locking/rwsem-xadd.h
@@ -0,0 +1,117 @@
+#ifndef _ASM_GENERIC_RWSEM_H
+#define _ASM_GENERIC_RWSEM_H
+
+#include <linux/rwsem.h>
+
+/*
+ * The definition of the atomic counter in the semaphore:
+ *
+ * Bit  0    - writer locked bit
+ * Bit  1    - waiters present bit
+ * Bits 2-7  - reserved
+ * Bits 8-31 - 24-bit reader count
+ *
+ * atomic_fetch_add() is used to obtain reader lock, whereas atomic_cmpxchg()
+ * will be used to obtain writer lock.
+ */
+#define RWSEM_WRITER_LOCKED	0X00000001
+#define RWSEM_FLAG_WAITERS	0X00000002
+#define RWSEM_READER_BIAS	0x00000100
+#define RWSEM_READER_SHIFT	8
+#define RWSEM_READER_MASK	(~((1U << RWSEM_READER_SHIFT) - 1))
+#define RWSEM_LOCK_MASK 	(RWSEM_WRITER_LOCKED|RWSEM_READER_MASK)
+#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS)
+
+#define RWSEM_COUNT_IS_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
+
+/*
+ * lock for reading
+ */
+static inline void __down_read(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
+		     & RWSEM_READ_FAILED_MASK))
+		rwsem_down_read_failed(sem);
+}
+
+static inline int __down_read_trylock(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	while (!((tmp = atomic_read(&sem->count)) & RWSEM_READ_FAILED_MASK)) {
+		if (tmp == atomic_cmpxchg_acquire(&sem->count, tmp,
+				   tmp + RWSEM_READER_BIAS)) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * lock for writing
+ */
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_cmpxchg_acquire(&sem->count, 0,
+					    RWSEM_WRITER_LOCKED)))
+		rwsem_down_write_failed(sem);
+}
+
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_cmpxchg_acquire(&sem->count, 0,
+					    RWSEM_WRITER_LOCKED)))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+	return 0;
+}
+
+static inline int __down_write_trylock(struct rw_semaphore *sem)
+{
+	return !atomic_cmpxchg_acquire(&sem->count, 0, RWSEM_WRITER_LOCKED);
+}
+
+/*
+ * unlock after reading
+ */
+static inline void __up_read(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	tmp = atomic_add_return_release(-RWSEM_READER_BIAS, &sem->count);
+	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS))
+			== RWSEM_FLAG_WAITERS))
+		rwsem_wake(sem);
+}
+
+/*
+ * unlock after writing
+ */
+static inline void __up_write(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_fetch_add_release(-RWSEM_WRITER_LOCKED,
+			&sem->count) & RWSEM_FLAG_WAITERS))
+		rwsem_wake(sem);
+}
+
+/*
+ * downgrade write lock to read lock
+ */
+static inline void __downgrade_write(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	/*
+	 * When downgrading from exclusive to shared ownership,
+	 * anything inside the write-locked region cannot leak
+	 * into the read side. In contrast, anything in the
+	 * read-locked region is ok to be re-ordered into the
+	 * write side. As such, rely on RELEASE semantics.
+	 */
+	tmp = atomic_fetch_add_release(-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS,
+					&sem->count);
+	if (tmp & RWSEM_FLAG_WAITERS)
+		rwsem_downgrade_wake(sem);
+}
+
+#endif	/* _ASM_GENERIC_RWSEM_H */
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index a699f40..adcc5af 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -66,3 +66,7 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
 }
 #endif
+
+#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
+#include "rwsem-xadd.h"
+#endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 02/11] locking/rwsem: Implement a new locking scheme
@ 2017-10-11 18:01   ` Waiman Long
  0 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The current way of using various reader, writer and waiting biases
in the rwsem code are confusing and hard to understand. I have to
reread the rwsem count guide in the rwsem-xadd.c file from time to
time to remind myself how this whole thing works. It also makes the
rwsem code harder to be optimized.

To make rwsem more sane, a new locking scheme similar to the one in
qrwlock is now being used.  The count is now a 32-bit atomic value
in all architectures. The current bit definitions are:

  Bit  0    - writer locked bit
  Bit  1    - waiters present bit
  Bits 2-7  - reserved for future extension
  Bits 8-31 - reader count

Now the cmpxchg instruction is used to acquire the write lock. The
read lock is still acquired with xadd instruction, so there is no
change here.  This scheme will allow up to 16M active readers which
should be more than enough. We can always use some more reserved bits
if necessary.

The same generic locking code will be used for all the architectures
and the architecture specific files will be retired.

This patch also hide the fastpath implementation of rwsem (now in
kernel/locking/rwsem-xadd.h) from the other kernel code as
include/linux/rwsem.h will not include it.

With a locking microbenchmark running on 3.13 based kernel, the total
locking rates (in Mops/s) of the benchmark on a 2-socket 36-core
x86-64 system before and after the patch were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        39.039   33.401   40.432   33.093
        2         9.767   17.250   11.424   18.763
        4         9.069   17.580   10.085   17.372
        8         9.390   15.372   11.733   14.507

The locking rates of the benchmark on a 16-processor Power8 system
were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        15.086   13.738    9.373   13.597
        2         4.864    6.280    5.514    6.309
        4         3.286    4.932    4.153    5.011
        8         2.637    2.248    3.528    2.189

The locking rates of the benchmark on a 32-core Cavium ARM64 system
were as follows:

                  Before Patch      After Patch
   # of Threads  wlock    rlock    wlock    rlock
   ------------  -----    -----    -----    -----
        1        4.849    3.972    5.194    4.223
        2        3.165    4.628    3.077    4.885
        4        0.742    3.856    0.716    4.136
        8        1.639    2.443    1.330    2.475

For read lock, locking performance was about the same before and
after the patch. For write lock, the new code had better contended
performance (2 or more threads) for both x86 and ppc, but it seemed to
slow down a bit in arm64. The uncontended performance, however, suffers
quite a bit in ppc, but not in x86 and arm64. So cmpxchg does have a
noticeable higher cost than xadd in ppc, the elimination of the atomic
count reversal in slowpath helps the contended performance, though.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/asm-generic/rwsem.h   | 129 -------------------------------------
 include/linux/rwsem.h         |  12 ++--
 kernel/locking/percpu-rwsem.c |   2 +
 kernel/locking/rwsem-xadd.c   | 145 ++++++++++++++----------------------------
 kernel/locking/rwsem-xadd.h   | 117 ++++++++++++++++++++++++++++++++++
 kernel/locking/rwsem.h        |   4 ++
 6 files changed, 174 insertions(+), 235 deletions(-)
 delete mode 100644 include/asm-generic/rwsem.h
 create mode 100644 kernel/locking/rwsem-xadd.h

diff --git a/include/asm-generic/rwsem.h b/include/asm-generic/rwsem.h
deleted file mode 100644
index 6c6a214..0000000
--- a/include/asm-generic/rwsem.h
+++ /dev/null
@@ -1,129 +0,0 @@
-#ifndef _ASM_GENERIC_RWSEM_H
-#define _ASM_GENERIC_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "Please don't include <asm/rwsem.h> directly, use <linux/rwsem.h> instead."
-#endif
-
-#ifdef __KERNEL__
-
-/*
- * R/W semaphores originally for PPC using the stuff in lib/rwsem.c.
- * Adapted largely from include/asm-i386/rwsem.h
- * by Paul Mackerras <paulus@samba.org>.
- */
-
-/*
- * the semaphore definition
- */
-#ifdef CONFIG_64BIT
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0))
-		rwsem_down_read_failed(sem);
-}
-
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
-		if (tmp = atomic_long_cmpxchg_acquire(&sem->count, tmp,
-				   tmp + RWSEM_ACTIVE_READ_BIAS)) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * lock for writing
- */
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
-					     &sem->count);
-	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_add_return_acquire(RWSEM_ACTIVE_WRITE_BIAS,
-					     &sem->count);
-	if (unlikely(tmp != RWSEM_ACTIVE_WRITE_BIAS))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-	return 0;
-}
-
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_cmpxchg_acquire(&sem->count, RWSEM_UNLOCKED_VALUE,
-		      RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp = RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	tmp = atomic_long_dec_return_release(&sem->count);
-	if (unlikely(tmp < -1 && (tmp & RWSEM_ACTIVE_MASK) == 0))
-		rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	if (unlikely(atomic_long_sub_return_release(RWSEM_ACTIVE_WRITE_BIAS,
-						    &sem->count) < 0))
-		rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	long tmp;
-
-	/*
-	 * When downgrading from exclusive to shared ownership,
-	 * anything inside the write-locked region cannot leak
-	 * into the read side. In contrast, anything in the
-	 * read-locked region is ok to be re-ordered into the
-	 * write side. As such, rely on RELEASE semantics.
-	 */
-	tmp = atomic_long_add_return_release(-RWSEM_WAITING_BIAS, &sem->count);
-	if (tmp < 0)
-		rwsem_downgrade_wake(sem);
-}
-
-#endif	/* __KERNEL__ */
-#endif	/* _ASM_GENERIC_RWSEM_H */
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 0ad7318..d0f59df 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -25,11 +25,10 @@
 #include <linux/rwsem-spinlock.h> /* use a generic implementation */
 #define __RWSEM_INIT_COUNT(name)	.count = RWSEM_UNLOCKED_VALUE
 #else
-/* All arch specific implementations share the same struct */
 struct rw_semaphore {
-	atomic_long_t count;
-	struct list_head wait_list;
+	atomic_t count;
 	raw_spinlock_t wait_lock;
+	struct list_head wait_list;
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* spinner MCS lock */
 	/*
@@ -50,16 +49,15 @@ struct rw_semaphore {
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
-/* Include the arch specific part */
-#include <asm/rwsem.h>
+#define RWSEM_UNLOCKED_VALUE	0
 
 /* In all implementations count != 0 means locked */
 static inline int rwsem_is_locked(struct rw_semaphore *sem)
 {
-	return atomic_long_read(&sem->count) != 0;
+	return atomic_read(&sem->count) != RWSEM_UNLOCKED_VALUE;
 }
 
-#define __RWSEM_INIT_COUNT(name)	.count = ATOMIC_LONG_INIT(RWSEM_UNLOCKED_VALUE)
+#define __RWSEM_INIT_COUNT(name)	.count = ATOMIC_INIT(RWSEM_UNLOCKED_VALUE)
 #endif
 
 /* Common initializer macros and functions */
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index 883cf1b..f17dad9 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -7,6 +7,8 @@
 #include <linux/sched.h>
 #include <linux/errno.h>
 
+#include "rwsem.h"
+
 int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
 			const char *name, struct lock_class_key *rwsem_key)
 {
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index db5dedf..39dc5be 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -21,52 +21,20 @@
 #include "rwsem.h"
 
 /*
- * Guide to the rw_semaphore's count field for common values.
- * (32-bit case illustrated, similar for 64-bit)
+ * Guide to the rw_semaphore's count field.
  *
- * 0x0000000X	(1) X readers active or attempting lock, no writer waiting
- *		    X = #active_readers + #readers attempting to lock
- *		    (X*ACTIVE_BIAS)
+ * When the RWSEM_WRITER_LOCKED bit in count is set, the lock is owned
+ * by a writer.
  *
- * 0x00000000	rwsem is unlocked, and no one is waiting for the lock or
- *		attempting to read lock or write lock.
- *
- * 0xffff000X	(1) X readers active or attempting lock, with waiters for lock
- *		    X = #active readers + # readers attempting lock
- *		    (X*ACTIVE_BIAS + WAITING_BIAS)
- *		(2) 1 writer attempting lock, no waiters for lock
- *		    X-1 = #active readers + #readers attempting lock
- *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- *		(3) 1 writer active, no waiters for lock
- *		    X-1 = #active readers + #readers attempting lock
- *		    ((X-1)*ACTIVE_BIAS + ACTIVE_WRITE_BIAS)
- *
- * 0xffff0001	(1) 1 reader active or attempting lock, waiters for lock
- *		    (WAITING_BIAS + ACTIVE_BIAS)
- *		(2) 1 writer active or attempting lock, no waiters for lock
- *		    (ACTIVE_WRITE_BIAS)
- *
- * 0xffff0000	(1) There are writers or readers queued but none active
- *		    or in the process of attempting lock.
- *		    (WAITING_BIAS)
- *		Note: writer can attempt to steal lock for this count by adding
- *		ACTIVE_WRITE_BIAS in cmpxchg and checking the old count
- *
- * 0xfffe0001	(1) 1 writer active, or attempting lock. Waiters on queue.
- *		    (ACTIVE_WRITE_BIAS + WAITING_BIAS)
- *
- * Note: Readers attempt to lock by adding ACTIVE_BIAS in down_read and checking
- *	 the count becomes more than 0 for successful lock acquisition,
- *	 i.e. the case where there are only readers or nobody has lock.
- *	 (1st and 2nd case above).
- *
- *	 Writers attempt to lock by adding ACTIVE_WRITE_BIAS in down_write and
- *	 checking the count becomes ACTIVE_WRITE_BIAS for successful lock
- *	 acquisition (i.e. nobody else has lock or attempts lock).  If
- *	 unsuccessful, in rwsem_down_write_failed, we'll check to see if there
- *	 are only waiters but none active (5th case above), and attempt to
- *	 steal the lock.
+ * The lock is owned by readers when
+ * (1) the RWSEM_WRITER_LOCKED bit isn't set in count,
+ * (2) some of the reader bits are set in count, and
+ * (3) the owner field is RWSEM_READER_OWNED.
  *
+ * Having some reader bits set is not enough to guarantee a reader-owned
+ * lock, as the readers may be in the process of backing out from the
+ * count while a writer has just released the lock. In that case another
+ * writer may steal the lock immediately afterward.
  */
 
 /*
@@ -82,7 +50,7 @@ void __init_rwsem(struct rw_semaphore *sem, const char *name,
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
 	lockdep_init_map(&sem->dep_map, name, key, 0);
 #endif
-	atomic_long_set(&sem->count, RWSEM_UNLOCKED_VALUE);
+	atomic_set(&sem->count, RWSEM_UNLOCKED_VALUE);
 	raw_spin_lock_init(&sem->wait_lock);
 	INIT_LIST_HEAD(&sem->wait_list);
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
@@ -128,7 +96,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 			      struct wake_q_head *wake_q)
 {
 	struct rwsem_waiter *waiter, *tmp;
-	long oldcount, woken = 0, adjustment = 0;
+	int oldcount, woken = 0, adjustment = 0;
 
 	/*
 	 * Take a peek at the queue head waiter such that we can determine
@@ -157,22 +125,11 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	 * so we can bail out early if a writer stole the lock.
 	 */
 	if (wake_type != RWSEM_WAKE_READ_OWNED) {
-		adjustment = RWSEM_ACTIVE_READ_BIAS;
- try_reader_grant:
-		oldcount = atomic_long_fetch_add(adjustment, &sem->count);
-		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/*
-			 * If the count is still less than RWSEM_WAITING_BIAS
-			 * after removing the adjustment, it is assumed that
-			 * a writer has stolen the lock. We have to undo our
-			 * reader grant.
-			 */
-			if (atomic_long_add_return(-adjustment, &sem->count) <
-			    RWSEM_WAITING_BIAS)
-				return;
-
-			/* Last active locker left. Retry waking readers. */
-			goto try_reader_grant;
+		adjustment = RWSEM_READER_BIAS;
+		oldcount = atomic_fetch_add(adjustment, &sem->count);
+		if (unlikely(oldcount & RWSEM_WRITER_LOCKED)) {
+			atomic_sub(adjustment, &sem->count);
+			return;
 		}
 		/*
 		 * It is not really necessary to set it to reader-owned here,
@@ -208,14 +165,14 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		smp_store_release(&waiter->task, NULL);
 	}
 
-	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+	adjustment = woken * RWSEM_READER_BIAS - adjustment;
 	if (list_empty(&sem->wait_list)) {
 		/* hit end of list above */
-		adjustment -= RWSEM_WAITING_BIAS;
+		adjustment -= RWSEM_FLAG_WAITERS;
 	}
 
 	if (adjustment)
-		atomic_long_add(adjustment, &sem->count);
+		atomic_add(adjustment, &sem->count);
 }
 
 /*
@@ -223,24 +180,17 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
  */
-static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
+static inline bool rwsem_try_write_lock(int count, struct rw_semaphore *sem)
 {
-	/*
-	 * Avoid trying to acquire write lock if count isn't RWSEM_WAITING_BIAS.
-	 */
-	if (count != RWSEM_WAITING_BIAS)
+	int new;
+
+	if (RWSEM_COUNT_IS_LOCKED(count))
 		return false;
 
-	/*
-	 * Acquire the lock by trying to set it to ACTIVE_WRITE_BIAS. If there
-	 * are other tasks on the wait list, we need to add on WAITING_BIAS.
-	 */
-	count = list_is_singular(&sem->wait_list) ?
-			RWSEM_ACTIVE_WRITE_BIAS :
-			RWSEM_ACTIVE_WRITE_BIAS + RWSEM_WAITING_BIAS;
+	new = count + RWSEM_WRITER_LOCKED -
+	     (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0);
 
-	if (atomic_long_cmpxchg_acquire(&sem->count, RWSEM_WAITING_BIAS, count)
-							== RWSEM_WAITING_BIAS) {
+	if (atomic_cmpxchg_acquire(&sem->count, count, new) == count) {
 		rwsem_set_owner(sem);
 		return true;
 	}
@@ -254,14 +204,14 @@ static inline bool rwsem_try_write_lock(long count, struct rw_semaphore *sem)
  */
 static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 {
-	long old, count = atomic_long_read(&sem->count);
+	int old, count = atomic_read(&sem->count);
 
 	while (true) {
-		if (!(count == 0 || count == RWSEM_WAITING_BIAS))
+		if (RWSEM_COUNT_IS_LOCKED(count))
 			return false;
 
-		old = atomic_long_cmpxchg_acquire(&sem->count, count,
-				      count + RWSEM_ACTIVE_WRITE_BIAS);
+		old = atomic_cmpxchg_acquire(&sem->count, count,
+				count + RWSEM_WRITER_LOCKED);
 		if (old == count) {
 			rwsem_set_owner(sem);
 			return true;
@@ -418,7 +368,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore __sched *
 __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 {
-	long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
+	int count, adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
@@ -427,11 +377,11 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list))
-		adjustment += RWSEM_WAITING_BIAS;
+		adjustment += RWSEM_FLAG_WAITERS;
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-	count = atomic_long_add_return(adjustment, &sem->count);
+	count = atomic_add_return(adjustment, &sem->count);
 
 	/*
 	 * If there are no active locks, wake the front queued process(es).
@@ -439,9 +389,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	 * If there are no writers and we are first in the queue,
 	 * wake our own waiter to join the existing active readers !
 	 */
-	if (count == RWSEM_WAITING_BIAS ||
-	    (count > RWSEM_WAITING_BIAS &&
-	     adjustment != -RWSEM_ACTIVE_READ_BIAS))
+	if (!RWSEM_COUNT_IS_LOCKED(count))
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 
 	raw_spin_unlock_irq(&sem->wait_lock);
@@ -467,7 +415,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 out_nolock:
 	list_del(&waiter.list);
 	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	return ERR_PTR(-EINTR);
@@ -493,15 +441,12 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore *
 __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 {
-	long count;
+	int count;
 	bool waiting = true; /* any queued threads before us */
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
 
-	/* undo write bias from down_write operation, stop active locking */
-	count = atomic_long_sub_return(RWSEM_ACTIVE_WRITE_BIAS, &sem->count);
-
 	/* do optimistic spinning and steal lock if possible */
 	if (rwsem_optimistic_spin(sem))
 		return sem;
@@ -523,14 +468,14 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 	/* we're now waiting on the lock, but no longer actively locking */
 	if (waiting) {
-		count = atomic_long_read(&sem->count);
+		count = atomic_read(&sem->count);
 
 		/*
 		 * If there were already threads queued before us and there are
 		 * no active writers, the lock must be read owned; so we try to
 		 * wake any read locks that were queued ahead of us.
 		 */
-		if (count > RWSEM_WAITING_BIAS) {
+		if (!(count & RWSEM_WRITER_LOCKED)) {
 			__rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q);
 			/*
 			 * The wakeup is normally called _after_ the wait_lock
@@ -547,8 +492,9 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 			wake_q_init(&wake_q);
 		}
 
-	} else
-		count = atomic_long_add_return(RWSEM_WAITING_BIAS, &sem->count);
+	} else {
+		count = atomic_add_return(RWSEM_FLAG_WAITERS, &sem->count);
+	}
 
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
@@ -564,7 +510,8 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 			schedule();
 			set_current_state(state);
-		} while ((count = atomic_long_read(&sem->count)) & RWSEM_ACTIVE_MASK);
+			count = atomic_read(&sem->count);
+		} while (RWSEM_COUNT_IS_LOCKED(count));
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
@@ -579,7 +526,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	raw_spin_lock_irq(&sem->wait_lock);
 	list_del(&waiter.list);
 	if (list_empty(&sem->wait_list))
-		atomic_long_add(-RWSEM_WAITING_BIAS, &sem->count);
+		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
 	else
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
 	raw_spin_unlock_irq(&sem->wait_lock);
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
new file mode 100644
index 0000000..3820719
--- /dev/null
+++ b/kernel/locking/rwsem-xadd.h
@@ -0,0 +1,117 @@
+#ifndef _ASM_GENERIC_RWSEM_H
+#define _ASM_GENERIC_RWSEM_H
+
+#include <linux/rwsem.h>
+
+/*
+ * The definition of the atomic counter in the semaphore:
+ *
+ * Bit  0    - writer locked bit
+ * Bit  1    - waiters present bit
+ * Bits 2-7  - reserved
+ * Bits 8-31 - 24-bit reader count
+ *
+ * atomic_fetch_add() is used to obtain the reader lock, whereas
+ * atomic_cmpxchg() is used to obtain the writer lock.
+ */
+#define RWSEM_WRITER_LOCKED	0x00000001
+#define RWSEM_FLAG_WAITERS	0x00000002
+#define RWSEM_READER_BIAS	0x00000100
+#define RWSEM_READER_SHIFT	8
+#define RWSEM_READER_MASK	(~((1U << RWSEM_READER_SHIFT) - 1))
+#define RWSEM_LOCK_MASK 	(RWSEM_WRITER_LOCKED|RWSEM_READER_MASK)
+#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS)
+
+#define RWSEM_COUNT_IS_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
+
+/*
+ * lock for reading
+ */
+static inline void __down_read(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
+		     & RWSEM_READ_FAILED_MASK))
+		rwsem_down_read_failed(sem);
+}
+
+static inline int __down_read_trylock(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	while (!((tmp = atomic_read(&sem->count)) & RWSEM_READ_FAILED_MASK)) {
+		if (tmp == atomic_cmpxchg_acquire(&sem->count, tmp,
+				   tmp + RWSEM_READER_BIAS)) {
+			return 1;
+		}
+	}
+	return 0;
+}
+
+/*
+ * lock for writing
+ */
+static inline void __down_write(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_cmpxchg_acquire(&sem->count, 0,
+					    RWSEM_WRITER_LOCKED)))
+		rwsem_down_write_failed(sem);
+}
+
+static inline int __down_write_killable(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_cmpxchg_acquire(&sem->count, 0,
+					    RWSEM_WRITER_LOCKED)))
+		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
+			return -EINTR;
+	return 0;
+}
+
+static inline int __down_write_trylock(struct rw_semaphore *sem)
+{
+	return !atomic_cmpxchg_acquire(&sem->count, 0, RWSEM_WRITER_LOCKED);
+}
+
+/*
+ * unlock after reading
+ */
+static inline void __up_read(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	tmp = atomic_add_return_release(-RWSEM_READER_BIAS, &sem->count);
+	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS))
+			== RWSEM_FLAG_WAITERS))
+		rwsem_wake(sem);
+}
+
+/*
+ * unlock after writing
+ */
+static inline void __up_write(struct rw_semaphore *sem)
+{
+	if (unlikely(atomic_fetch_add_release(-RWSEM_WRITER_LOCKED,
+			&sem->count) & RWSEM_FLAG_WAITERS))
+		rwsem_wake(sem);
+}
+
+/*
+ * downgrade write lock to read lock
+ */
+static inline void __downgrade_write(struct rw_semaphore *sem)
+{
+	int tmp;
+
+	/*
+	 * When downgrading from exclusive to shared ownership,
+	 * anything inside the write-locked region cannot leak
+	 * into the read side. In contrast, anything in the
+	 * read-locked region is ok to be re-ordered into the
+	 * write side. As such, rely on RELEASE semantics.
+	 */
+	tmp = atomic_fetch_add_release(-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS,
+					&sem->count);
+	if (tmp & RWSEM_FLAG_WAITERS)
+		rwsem_downgrade_wake(sem);
+}
+
+#endif	/* _ASM_GENERIC_RWSEM_H */
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index a699f40..adcc5af 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -66,3 +66,7 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
 }
 #endif
+
+#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
+#include "rwsem-xadd.h"
+#endif
-- 
1.8.3.1
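
For anyone following the new count layout in rwsem-xadd.h above, here is
how a sample count value decodes with the masks defined there. This is
illustrative only and not part of the patch:

	int count = 0x00000302;		/* a sample count value */

	int  readers = count >> RWSEM_READER_SHIFT;	/* 3 readers active */
	bool waiters = count & RWSEM_FLAG_WAITERS;	/* waiters queued   */
	bool wlocked = count & RWSEM_WRITER_LOCKED;	/* no writer        */
	bool locked  = count & RWSEM_LOCK_MASK;		/* locked (readers) */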


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 03/11] locking/rwsem: Move owner setting code from rwsem.c to rwsem-xadd.h
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:01   ` Waiman Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The setting of the owner field is specific to rwsem-xadd; it is not
needed for rwsem-spinlock. This patch moves all the owner setting code
into the fast paths directly within the rwsem-xadd.h file.

rwsem_set_reader_owned() is now called only by the first reader, so
there is no longer any need to read the owner field first to check
whether it has already been set.
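
For reference, the trylock fast path in this patch shows how the first
reader is identified from the count value observed before the reader
bias is added (condensed here as a sketch of the change below):

	int tmp;

	while (!((tmp = atomic_read(&sem->count)) & RWSEM_READ_FAILED_MASK)) {
		if (tmp == atomic_cmpxchg_acquire(&sem->count, tmp,
				   tmp + RWSEM_READER_BIAS)) {
			/*
			 * No reader bits were set in the old count, so
			 * this caller is the first reader.
			 */
			if (!(tmp >> RWSEM_READER_SHIFT))
				rwsem_set_reader_owned(sem);
			return 1;
		}
	}
	return 0;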

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c |  7 +++----
 kernel/locking/rwsem-xadd.h | 19 ++++++++++++++++---
 kernel/locking/rwsem.c      | 17 ++---------------
 kernel/locking/rwsem.h      | 11 ++++-------
 4 files changed, 25 insertions(+), 29 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 39dc5be..30bc163 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -132,11 +132,10 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 			return;
 		}
 		/*
-		 * It is not really necessary to set it to reader-owned here,
-		 * but it gives the spinners an early indication that the
-		 * readers now have the lock.
+		 * Set it to reader-owned for the first reader only.
 		 */
-		rwsem_set_reader_owned(sem);
+		if (!(oldcount >> RWSEM_READER_SHIFT))
+			rwsem_set_reader_owned(sem);
 	}
 
 	/*
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index 3820719..abcb484 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -29,9 +29,12 @@
  */
 static inline void __down_read(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
-		     & RWSEM_READ_FAILED_MASK))
+	int count = atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
+
+	if (unlikely(count & RWSEM_READ_FAILED_MASK))
 		rwsem_down_read_failed(sem);
+	else if ((count >> RWSEM_READER_SHIFT) == 1)
+		rwsem_set_reader_owned(sem);
 }
 
 static inline int __down_read_trylock(struct rw_semaphore *sem)
@@ -41,6 +44,8 @@ static inline int __down_read_trylock(struct rw_semaphore *sem)
 	while (!((tmp = atomic_read(&sem->count)) & RWSEM_READ_FAILED_MASK)) {
 		if (tmp == atomic_cmpxchg_acquire(&sem->count, tmp,
 				   tmp + RWSEM_READER_BIAS)) {
+			if (!(tmp >> RWSEM_READER_SHIFT))
+				rwsem_set_reader_owned(sem);
 			return 1;
 		}
 	}
@@ -55,6 +60,7 @@ static inline void __down_write(struct rw_semaphore *sem)
 	if (unlikely(atomic_cmpxchg_acquire(&sem->count, 0,
 					    RWSEM_WRITER_LOCKED)))
 		rwsem_down_write_failed(sem);
+	rwsem_set_owner(sem);
 }
 
 static inline int __down_write_killable(struct rw_semaphore *sem)
@@ -63,12 +69,17 @@ static inline int __down_write_killable(struct rw_semaphore *sem)
 					    RWSEM_WRITER_LOCKED)))
 		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
 			return -EINTR;
+	rwsem_set_owner(sem);
 	return 0;
 }
 
 static inline int __down_write_trylock(struct rw_semaphore *sem)
 {
-	return !atomic_cmpxchg_acquire(&sem->count, 0, RWSEM_WRITER_LOCKED);
+	bool taken = !atomic_cmpxchg_acquire(&sem->count, 0,
+					     RWSEM_WRITER_LOCKED);
+	if (taken)
+		rwsem_set_owner(sem);
+	return taken;
 }
 
 /*
@@ -89,6 +100,7 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
+	rwsem_clear_owner(sem);
 	if (unlikely(atomic_fetch_add_release(-RWSEM_WRITER_LOCKED,
 			&sem->count) & RWSEM_FLAG_WAITERS))
 		rwsem_wake(sem);
@@ -110,6 +122,7 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	 */
 	tmp = atomic_fetch_add_release(-RWSEM_WRITER_LOCKED+RWSEM_READER_BIAS,
 					&sem->count);
+	rwsem_set_reader_owned(sem);
 	if (tmp & RWSEM_FLAG_WAITERS)
 		rwsem_downgrade_wake(sem);
 }
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 4d48b1c..0a32725 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -23,7 +23,6 @@ void __sched down_read(struct rw_semaphore *sem)
 	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
-	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read);
@@ -35,10 +34,8 @@ int down_read_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_read_trylock(sem);
 
-	if (ret == 1) {
+	if (ret == 1)
 		rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
-		rwsem_set_reader_owned(sem);
-	}
 	return ret;
 }
 
@@ -53,7 +50,6 @@ void __sched down_write(struct rw_semaphore *sem)
 	rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
-	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(down_write);
@@ -71,7 +67,6 @@ int __sched down_write_killable(struct rw_semaphore *sem)
 		return -EINTR;
 	}
 
-	rwsem_set_owner(sem);
 	return 0;
 }
 
@@ -84,10 +79,8 @@ int down_write_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_write_trylock(sem);
 
-	if (ret == 1) {
+	if (ret == 1)
 		rwsem_acquire(&sem->dep_map, 0, 1, _RET_IP_);
-		rwsem_set_owner(sem);
-	}
 
 	return ret;
 }
@@ -113,7 +106,6 @@ void up_write(struct rw_semaphore *sem)
 {
 	rwsem_release(&sem->dep_map, 1, _RET_IP_);
 
-	rwsem_clear_owner(sem);
 	__up_write(sem);
 }
 
@@ -126,7 +118,6 @@ void downgrade_write(struct rw_semaphore *sem)
 {
 	lock_downgrade(&sem->dep_map, _RET_IP_);
 
-	rwsem_set_reader_owned(sem);
 	__downgrade_write(sem);
 }
 
@@ -140,7 +131,6 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
-	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read_nested);
@@ -151,7 +141,6 @@ void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map *nest)
 	rwsem_acquire_nest(&sem->dep_map, 0, 0, nest, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
-	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(_down_write_nest_lock);
@@ -171,7 +160,6 @@ void down_write_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_write_trylock, __down_write);
-	rwsem_set_owner(sem);
 }
 
 EXPORT_SYMBOL(down_write_nested);
@@ -186,7 +174,6 @@ int __sched down_write_killable_nested(struct rw_semaphore *sem, int subclass)
 		return -EINTR;
 	}
 
-	rwsem_set_owner(sem);
 	return 0;
 }
 
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index adcc5af..612109f 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -33,15 +33,12 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 	WRITE_ONCE(sem->owner, NULL);
 }
 
+/*
+ * This should only be called by the first reader that acquires the read lock.
+ */
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
-	/*
-	 * We check the owner value first to make sure that we will only
-	 * do a write to the rwsem cacheline when it is really necessary
-	 * to minimize cacheline contention.
-	 */
-	if (sem->owner != RWSEM_READER_OWNED)
-		WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
+	WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
 }
 
 static inline bool rwsem_owner_is_writer(struct task_struct *owner)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 04/11] locking/rwsem: Remove kernel/locking/rwsem.h
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:01   ` Waiman Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

The content of kernel/locking/rwsem.h is now specific to rwsem-xadd,
so we can just move its content into rwsem-xadd.h and remove the file.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/percpu-rwsem.c |  4 ++-
 kernel/locking/rwsem-xadd.c   |  2 +-
 kernel/locking/rwsem-xadd.h   | 66 +++++++++++++++++++++++++++++++++++++++++
 kernel/locking/rwsem.c        |  4 ++-
 kernel/locking/rwsem.h        | 69 -------------------------------------------
 5 files changed, 73 insertions(+), 72 deletions(-)
 delete mode 100644 kernel/locking/rwsem.h

diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index f17dad9..d06f7c3 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -7,7 +7,9 @@
 #include <linux/sched.h>
 #include <linux/errno.h>
 
-#include "rwsem.h"
+#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
+#include "rwsem-xadd.h"
+#endif
 
 int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
 			const char *name, struct lock_class_key *rwsem_key)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 30bc163..e3ab430 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -18,7 +18,7 @@
 #include <linux/sched/debug.h>
 #include <linux/osq_lock.h>
 
-#include "rwsem.h"
+#include "rwsem-xadd.h"
 
 /*
  * Guide to the rw_semaphore's count field.
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index abcb484..4c19539 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -4,6 +4,72 @@
 #include <linux/rwsem.h>
 
 /*
+ * The owner field of the rw_semaphore structure will be set to
+ * RWSEM_READER_OWNED when a reader grabs the lock. A writer will clear
+ * the owner field when it unlocks. A reader, on the other hand, will
+ * not touch the owner field when it unlocks.
+ *
+ * In essence, the owner field now has the following 3 states:
+ *  1) 0
+ *     - lock is free or the owner hasn't set the field yet
+ *  2) RWSEM_READER_OWNED
+ *     - lock is currently or previously owned by readers (lock is free
+ *       or not set by owner yet)
+ *  3) Other non-zero value
+ *     - a writer owns the lock
+ */
+#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
+
+#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * All writes to owner are protected by WRITE_ONCE() to make sure that
+ * store tearing can't happen as optimistic spinners may read and use
+ * the owner value concurrently without lock. Read from owner, however,
+ * may not need READ_ONCE() as long as the pointer value is only used
+ * for comparison and isn't being dereferenced.
+ */
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+	WRITE_ONCE(sem->owner, current);
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+	WRITE_ONCE(sem->owner, NULL);
+}
+
+/*
+ * This should only be called by the first reader that acquires the read lock.
+ */
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+	WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
+}
+
+static inline bool rwsem_owner_is_writer(struct task_struct *owner)
+{
+	return owner && owner != RWSEM_READER_OWNED;
+}
+
+static inline bool rwsem_owner_is_reader(struct task_struct *owner)
+{
+	return owner == RWSEM_READER_OWNED;
+}
+#else
+static inline void rwsem_set_owner(struct rw_semaphore *sem)
+{
+}
+
+static inline void rwsem_clear_owner(struct rw_semaphore *sem)
+{
+}
+
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+}
+#endif
+
+/*
  * The definition of the atomic counter in the semaphore:
  *
  * Bit  0    - writer locked bit
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 0a32725..2ad3af8 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -12,7 +12,9 @@
 #include <linux/rwsem.h>
 #include <linux/atomic.h>
 
-#include "rwsem.h"
+#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
+#include "rwsem-xadd.h"
+#endif
 
 /*
  * lock for reading
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
deleted file mode 100644
index 612109f..0000000
--- a/kernel/locking/rwsem.h
+++ /dev/null
@@ -1,69 +0,0 @@
-/*
- * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
- * the owner field when it unlocks. A reader, on the other hand, will
- * not touch the owner field when it unlocks.
- *
- * In essence, the owner field now has the following 3 states:
- *  1) 0
- *     - lock is free or the owner hasn't set the field yet
- *  2) RWSEM_READER_OWNED
- *     - lock is currently or previously owned by readers (lock is free
- *       or not set by owner yet)
- *  3) Other non-zero value
- *     - a writer owns the lock
- */
-#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
-
-#ifdef CONFIG_RWSEM_SPIN_ON_OWNER
-/*
- * All writes to owner are protected by WRITE_ONCE() to make sure that
- * store tearing can't happen as optimistic spinners may read and use
- * the owner value concurrently without lock. Read from owner, however,
- * may not need READ_ONCE() as long as the pointer value is only used
- * for comparison and isn't being dereferenced.
- */
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-	WRITE_ONCE(sem->owner, current);
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-	WRITE_ONCE(sem->owner, NULL);
-}
-
-/*
- * This should only be called by the first reader that acquires the read lock.
- */
-static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
-{
-	WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
-}
-
-static inline bool rwsem_owner_is_writer(struct task_struct *owner)
-{
-	return owner && owner != RWSEM_READER_OWNED;
-}
-
-static inline bool rwsem_owner_is_reader(struct task_struct *owner)
-{
-	return owner == RWSEM_READER_OWNED;
-}
-#else
-static inline void rwsem_set_owner(struct rw_semaphore *sem)
-{
-}
-
-static inline void rwsem_clear_owner(struct rw_semaphore *sem)
-{
-}
-
-static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
-{
-}
-#endif
-
-#ifdef CONFIG_RWSEM_XCHGADD_ALGORITHM
-#include "rwsem-xadd.h"
-#endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 05/11] locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:01   ` Waiman Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

We don't need to expose rwsem internal functions which are not supposed
to be called directly from other kernel code.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/rwsem.h       | 7 -------
 kernel/locking/rwsem-xadd.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index d0f59df..32389ee 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -42,13 +42,6 @@ struct rw_semaphore {
 #endif
 };
 
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
-extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
-
 #define RWSEM_UNLOCKED_VALUE	0
 
 /* In all implementations count != 0 means locked */
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index 4c19539..9b30f0c 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -90,6 +90,13 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 
 #define RWSEM_COUNT_IS_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
 
+extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
+extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
+extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
+
 /*
  * lock for reading
  */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 06/11] locking/rwsem: Remove arch specific rwsem files
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:01   ` Waiman Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

As the generic rwsem-xadd code is using the appropriate acquire and
release versions of the atomic operations, the arch specific rwsem.h
files will not be that much faster than the generic code. So we can
remove those arch specific rwsem.h files and stop building asm/rwsem.h
to reduce the maintenance effort.
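
For reference, the ordering that the hand-written asm used to provide is
already carried by the atomic primitives in the generic fast paths. For
example, the unlock-after-writing path in rwsem-xadd.h earlier in this
series relies on the RELEASE semantics of the atomic op itself:

	rwsem_clear_owner(sem);
	if (unlikely(atomic_fetch_add_release(-RWSEM_WRITER_LOCKED,
			&sem->count) & RWSEM_FLAG_WAITERS))
		rwsem_wake(sem);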

Signed-off-by: Waiman Long <longman@redhat.com>
---
 arch/alpha/include/asm/rwsem.h  | 195 -----------------------------------
 arch/arm/include/asm/Kbuild     |   1 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/include/asm/rwsem.h   | 154 ----------------------------
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/s390/include/asm/rwsem.h   | 210 --------------------------------------
 arch/sh/include/asm/Kbuild      |   1 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/x86/include/asm/rwsem.h    | 221 ----------------------------------------
 arch/x86/lib/Makefile           |   1 -
 arch/x86/lib/rwsem.S            | 144 --------------------------
 arch/xtensa/include/asm/Kbuild  |   1 -
 13 files changed, 932 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/s390/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S

diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
deleted file mode 100644
index 77873d0..0000000
--- a/arch/alpha/include/asm/rwsem.h
+++ /dev/null
@@ -1,195 +0,0 @@
-#ifndef _ALPHA_RWSEM_H
-#define _ALPHA_RWSEM_H
-
-/*
- * Written by Ivan Kokshaysky <ink@jurassic.park.msu.ru>, 2001.
- * Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include <linux/compiler.h>
-
-#define RWSEM_UNLOCKED_VALUE		0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS		0x0000000000000001L
-#define RWSEM_ACTIVE_MASK		0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS		(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter += RWSEM_ACTIVE_READ_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"
-	"	addq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	mb\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(oldcount < 0))
-		rwsem_down_read_failed(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	long old, new, res;
-
-	res = atomic_long_read(&sem->count);
-	do {
-		new = res + RWSEM_ACTIVE_READ_BIAS;
-		if (new <= 0)
-			break;
-		old = res;
-		res = atomic_long_cmpxchg(&sem->count, old, new);
-	} while (res != old);
-	return res >= 0 ? 1 : 0;
-}
-
-static inline long ___down_write(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter += RWSEM_ACTIVE_WRITE_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"
-	"	addq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	mb\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
-#endif
-	return oldcount;
-}
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	if (unlikely(___down_write(sem)))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (unlikely(___down_write(sem)))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	long ret = atomic_long_cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-			   RWSEM_ACTIVE_WRITE_BIAS);
-	if (ret == RWSEM_UNLOCKED_VALUE)
-		return 1;
-	return 0;
-}
-
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter -= RWSEM_ACTIVE_READ_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"	mb\n"
-	"1:	ldq_l	%0,%1\n"
-	"	subq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(oldcount < 0))
-		if ((int)oldcount - RWSEM_ACTIVE_READ_BIAS == 0)
-			rwsem_wake(sem);
-}
-
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	long count;
-#ifndef	CONFIG_SMP
-	sem->count.counter -= RWSEM_ACTIVE_WRITE_BIAS;
-	count = sem->count.counter;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"	mb\n"
-	"1:	ldq_l	%0,%1\n"
-	"	subq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	subq	%0,%3,%0\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (count), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(count))
-		if ((int)count == 0)
-			rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter -= RWSEM_WAITING_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"
-	"	addq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	mb\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (-RWSEM_WAITING_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(oldcount < 0))
-		rwsem_downgrade_wake(sem);
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ALPHA_RWSEM_H */
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 721ab5e..58337ef 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -12,7 +12,6 @@ generic-y += mm-arch-hooks.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += seccomp.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 2326e39..38366a6 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -16,7 +16,6 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += msi.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += segment.h
 generic-y += serial.h
 generic-y += set_memory.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 3401368..002eb1f 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -24,7 +24,6 @@ generic-y += mm-arch-hooks.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
deleted file mode 100644
index 8fa98dd..0000000
--- a/arch/ia64/include/asm/rwsem.h
+++ /dev/null
@@ -1,154 +0,0 @@
-/*
- * R/W semaphores for ia64
- *
- * Copyright (C) 2003 Ken Chen <kenneth.w.chen@intel.com>
- * Copyright (C) 2003 Asit Mallick <asit.k.mallick@intel.com>
- * Copyright (C) 2005 Christoph Lameter <cl@linux.com>
- *
- * Based on asm-i386/rwsem.h and other architecture implementation.
- *
- * The MSW of the count is the negated number of active writers and
- * waiting lockers, and the LSW is the total number of active locks.
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffffffff00000001 for
- * the case of an uncontended lock. Readers increment by 1 and see a positive
- * value when uncontended, negative if there are writers (and maybe) readers
- * waiting (in which case it goes to sleep).
- */
-
-#ifndef _ASM_IA64_RWSEM_H
-#define _ASM_IA64_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "Please don't include <asm/rwsem.h> directly, use <linux/rwsem.h> instead."
-#endif
-
-#include <asm/intrinsics.h>
-
-#define RWSEM_UNLOCKED_VALUE		__IA64_UL_CONST(0x0000000000000000)
-#define RWSEM_ACTIVE_BIAS		(1L)
-#define RWSEM_ACTIVE_MASK		(0xffffffffL)
-#define RWSEM_WAITING_BIAS		(-0x100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void
-__down_read (struct rw_semaphore *sem)
-{
-	long result = ia64_fetchadd8_acq((unsigned long *)&sem->count.counter, 1);
-
-	if (result < 0)
-		rwsem_down_read_failed(sem);
-}
-
-/*
- * lock for writing
- */
-static inline long
-___down_write (struct rw_semaphore *sem)
-{
-	long old, new;
-
-	do {
-		old = atomic_long_read(&sem->count);
-		new = old + RWSEM_ACTIVE_WRITE_BIAS;
-	} while (atomic_long_cmpxchg_acquire(&sem->count, old, new) != old);
-
-	return old;
-}
-
-static inline void
-__down_write (struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int
-__down_write_killable (struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-/*
- * unlock after reading
- */
-static inline void
-__up_read (struct rw_semaphore *sem)
-{
-	long result = ia64_fetchadd8_rel((unsigned long *)&sem->count.counter, -1);
-
-	if (result < 0 && (--result & RWSEM_ACTIVE_MASK) == 0)
-		rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void
-__up_write (struct rw_semaphore *sem)
-{
-	long old, new;
-
-	do {
-		old = atomic_long_read(&sem->count);
-		new = old - RWSEM_ACTIVE_WRITE_BIAS;
-	} while (atomic_long_cmpxchg_release(&sem->count, old, new) != old);
-
-	if (new < 0 && (new & RWSEM_ACTIVE_MASK) == 0)
-		rwsem_wake(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int
-__down_read_trylock (struct rw_semaphore *sem)
-{
-	long tmp;
-	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
-		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, tmp+1)) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int
-__down_write_trylock (struct rw_semaphore *sem)
-{
-	long tmp = atomic_long_cmpxchg_acquire(&sem->count,
-			RWSEM_UNLOCKED_VALUE, RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void
-__downgrade_write (struct rw_semaphore *sem)
-{
-	long old, new;
-
-	do {
-		old = atomic_long_read(&sem->count);
-		new = old - RWSEM_WAITING_BIAS;
-	} while (atomic_long_cmpxchg_release(&sem->count, old, new) != old);
-
-	if (old < 0)
-		rwsem_downgrade_wake(sem);
-}
-
-#endif /* _ASM_IA64_RWSEM_H */
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 2542ea1..e25807a 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -6,6 +6,5 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += vtime.h
 generic-y += msi.h
diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
deleted file mode 100644
index 597e7e9..0000000
--- a/arch/s390/include/asm/rwsem.h
+++ /dev/null
@@ -1,210 +0,0 @@
-#ifndef _S390_RWSEM_H
-#define _S390_RWSEM_H
-
-/*
- *  S390 version
- *    Copyright IBM Corp. 2002
- *    Author(s): Martin Schwidefsky (schwidefsky@de.ibm.com)
- *
- *  Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-/*
- *
- * The MSW of the count is the negated number of active writers and waiting
- * lockers, and the LSW is the total number of active locks
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
- * uncontended lock. This can be determined because XADD returns the old value.
- * Readers increment by 1 and see a positive value when uncontended, negative
- * if there are writers (and maybe) readers waiting (in which case it goes to
- * sleep).
- *
- * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
- * be extended to 65534 by manually checking the whole MSW rather than relying
- * on the S flag.
- *
- * The value of ACTIVE_BIAS supports up to 65535 active processes.
- *
- * This should be totally fair - if anything is waiting, a process that wants a
- * lock will go to the back of the queue. When the currently active lock is
- * released, if there's a writer at the front of the queue, then that and only
- * that will be woken up; if there's a bunch of consecutive readers at the
- * front, then they'll all be woken up, but no other readers will be.
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#define RWSEM_UNLOCKED_VALUE	0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS	0x0000000000000001L
-#define RWSEM_ACTIVE_MASK	0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS	(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS	RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	signed long old, new;
-
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	aghi	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "i" (RWSEM_ACTIVE_READ_BIAS)
-		: "cc", "memory");
-	if (old < 0)
-		rwsem_down_read_failed(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	signed long old, new;
-
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	ltgr	%1,%0\n"
-		"	jm	1f\n"
-		"	aghi	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b\n"
-		"1:"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "i" (RWSEM_ACTIVE_READ_BIAS)
-		: "cc", "memory");
-	return old >= 0 ? 1 : 0;
-}
-
-/*
- * lock for writing
- */
-static inline long ___down_write(struct rw_semaphore *sem)
-{
-	signed long old, new, tmp;
-
-	tmp = RWSEM_ACTIVE_WRITE_BIAS;
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	ag	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "m" (tmp)
-		: "cc", "memory");
-
-	return old;
-}
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	signed long old;
-
-	asm volatile(
-		"	lg	%0,%1\n"
-		"0:	ltgr	%0,%0\n"
-		"	jnz	1f\n"
-		"	csg	%0,%3,%1\n"
-		"	jl	0b\n"
-		"1:"
-		: "=&d" (old), "=Q" (sem->count)
-		: "Q" (sem->count), "d" (RWSEM_ACTIVE_WRITE_BIAS)
-		: "cc", "memory");
-	return (old == RWSEM_UNLOCKED_VALUE) ? 1 : 0;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	signed long old, new;
-
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	aghi	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "i" (-RWSEM_ACTIVE_READ_BIAS)
-		: "cc", "memory");
-	if (new < 0)
-		if ((new & RWSEM_ACTIVE_MASK) == 0)
-			rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	signed long old, new, tmp;
-
-	tmp = -RWSEM_ACTIVE_WRITE_BIAS;
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	ag	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "m" (tmp)
-		: "cc", "memory");
-	if (new < 0)
-		if ((new & RWSEM_ACTIVE_MASK) == 0)
-			rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	signed long old, new, tmp;
-
-	tmp = -RWSEM_WAITING_BIAS;
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	ag	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "m" (tmp)
-		: "cc", "memory");
-	if (new > 1)
-		rwsem_downgrade_wake(sem);
-}
-
-#endif /* _S390_RWSEM_H */
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 1a6f9c3..24af7e6 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -13,7 +13,6 @@ generic-y += mm-arch-hooks.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += serial.h
 generic-y += sizes.h
 generic-y += trace_clock.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 80ddc01..2c6d988 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -15,7 +15,6 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += module.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += serial.h
 generic-y += trace_clock.h
 generic-y += word-at-a-time.h
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
deleted file mode 100644
index a8f486e..0000000
--- a/arch/x86/include/asm/rwsem.h
+++ /dev/null
@@ -1,221 +0,0 @@
-/* rwsem.h: R/W semaphores implemented using XADD/CMPXCHG for i486+
- *
- * Written by David Howells (dhowells@redhat.com).
- *
- * Derived from asm-x86/semaphore.h
- *
- *
- * The MSW of the count is the negated number of active writers and waiting
- * lockers, and the LSW is the total number of active locks
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
- * uncontended lock. This can be determined because XADD returns the old value.
- * Readers increment by 1 and see a positive value when uncontended, negative
- * if there are writers (and maybe) readers waiting (in which case it goes to
- * sleep).
- *
- * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
- * be extended to 65534 by manually checking the whole MSW rather than relying
- * on the S flag.
- *
- * The value of ACTIVE_BIAS supports up to 65535 active processes.
- *
- * This should be totally fair - if anything is waiting, a process that wants a
- * lock will go to the back of the queue. When the currently active lock is
- * released, if there's a writer at the front of the queue, then that and only
- * that will be woken up; if there's a bunch of consecutive readers at the
- * front, then they'll all be woken up, but no other readers will be.
- */
-
-#ifndef _ASM_X86_RWSEM_H
-#define _ASM_X86_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-#include <asm/asm.h>
-
-/*
- * The bias values and the counter type limits the number of
- * potential readers/writers to 32767 for 32 bits and 2147483647
- * for 64 bits.
- */
-
-#ifdef CONFIG_X86_64
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	asm volatile("# beginning down_read\n\t"
-		     LOCK_PREFIX _ASM_INC "(%[sem])\n\t"
-		     /* adds 0x00000001 */
-		     "  jns        1f\n"
-		     "  call call_rwsem_down_read_failed\n"
-		     "1:\n\t"
-		     "# ending down_read\n\t"
-		     : "+m" (sem->count)
-		     : [sem] "a" (sem)
-		     : "memory", "cc");
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline bool __down_read_trylock(struct rw_semaphore *sem)
-{
-	long result, tmp;
-	asm volatile("# beginning __down_read_trylock\n\t"
-		     "  mov          %[count],%[result]\n\t"
-		     "1:\n\t"
-		     "  mov          %[result],%[tmp]\n\t"
-		     "  add          %[inc],%[tmp]\n\t"
-		     "  jle	     2f\n\t"
-		     LOCK_PREFIX "  cmpxchg  %[tmp],%[count]\n\t"
-		     "  jnz	     1b\n\t"
-		     "2:\n\t"
-		     "# ending __down_read_trylock\n\t"
-		     : [count] "+m" (sem->count), [result] "=&a" (result),
-		       [tmp] "=&r" (tmp)
-		     : [inc] "i" (RWSEM_ACTIVE_READ_BIAS)
-		     : "memory", "cc");
-	return result >= 0;
-}
-
-/*
- * lock for writing
- */
-#define ____down_write(sem, slow_path)			\
-({							\
-	long tmp;					\
-	struct rw_semaphore* ret;			\
-							\
-	asm volatile("# beginning down_write\n\t"	\
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"	\
-		     /* adds 0xffff0001, returns the old value */ \
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
-		     /* was the active mask 0 before? */\
-		     "  jz        1f\n"			\
-		     "  call " slow_path "\n"		\
-		     "1:\n"				\
-		     "# ending down_write"		\
-		     : "+m" (sem->count), [tmp] "=d" (tmp),	\
-		       "=a" (ret), ASM_CALL_CONSTRAINT	\
-		     : [sem] "a" (sem), "[tmp]" (RWSEM_ACTIVE_WRITE_BIAS) \
-		     : "memory", "cc");			\
-	ret;						\
-})
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	____down_write(sem, "call_rwsem_down_write_failed");
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (IS_ERR(____down_write(sem, "call_rwsem_down_write_failed_killable")))
-		return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline bool __down_write_trylock(struct rw_semaphore *sem)
-{
-	bool result;
-	long tmp0, tmp1;
-	asm volatile("# beginning __down_write_trylock\n\t"
-		     "  mov          %[count],%[tmp0]\n\t"
-		     "1:\n\t"
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t"
-		     /* was the active mask 0 before? */
-		     "  jnz          2f\n\t"
-		     "  mov          %[tmp0],%[tmp1]\n\t"
-		     "  add          %[inc],%[tmp1]\n\t"
-		     LOCK_PREFIX "  cmpxchg  %[tmp1],%[count]\n\t"
-		     "  jnz	     1b\n\t"
-		     "2:\n\t"
-		     CC_SET(e)
-		     "# ending __down_write_trylock\n\t"
-		     : [count] "+m" (sem->count), [tmp0] "=&a" (tmp0),
-		       [tmp1] "=&r" (tmp1), CC_OUT(e) (result)
-		     : [inc] "er" (RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory");
-	return result;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long tmp;
-	asm volatile("# beginning __up_read\n\t"
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
-		     /* subtracts 1, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n"
-		     "# ending __up_read\n"
-		     : "+m" (sem->count), [tmp] "=d" (tmp)
-		     : [sem] "a" (sem), "[tmp]" (-RWSEM_ACTIVE_READ_BIAS)
-		     : "memory", "cc");
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	long tmp;
-	asm volatile("# beginning __up_write\n\t"
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
-		     /* subtracts 0xffff0001, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n\t"
-		     "# ending __up_write\n"
-		     : "+m" (sem->count), [tmp] "=d" (tmp)
-		     : [sem] "a" (sem), "[tmp]" (-RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	asm volatile("# beginning __downgrade_write\n\t"
-		     LOCK_PREFIX _ASM_ADD "%[inc],(%[sem])\n\t"
-		     /*
-		      * transitions 0xZZZZ0001 -> 0xYYYY0001 (i386)
-		      *     0xZZZZZZZZ00000001 -> 0xYYYYYYYY00000001 (x86_64)
-		      */
-		     "  jns       1f\n\t"
-		     "  call call_rwsem_downgrade_wake\n"
-		     "1:\n\t"
-		     "# ending __downgrade_write\n"
-		     : "+m" (sem->count)
-		     : [sem] "a" (sem), [inc] "er" (-RWSEM_WAITING_BIAS)
-		     : "memory", "cc");
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ASM_X86_RWSEM_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a7413..b4e8c8e 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -22,7 +22,6 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
 lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
-lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
deleted file mode 100644
index bf2c607..0000000
--- a/arch/x86/lib/rwsem.S
+++ /dev/null
@@ -1,144 +0,0 @@
-/*
- * x86 semaphore implementation.
- *
- * (C) Copyright 1999 Linus Torvalds
- *
- * Portions Copyright 1999 Red Hat, Inc.
- *
- *	This program is free software; you can redistribute it and/or
- *	modify it under the terms of the GNU General Public License
- *	as published by the Free Software Foundation; either version
- *	2 of the License, or (at your option) any later version.
- *
- * rw semaphores implemented November 1999 by Benjamin LaHaise <bcrl@kvack.org>
- */
-
-#include <linux/linkage.h>
-#include <asm/alternative-asm.h>
-#include <asm/frame.h>
-
-#define __ASM_HALF_REG(reg)	__ASM_SEL(reg, e##reg)
-#define __ASM_HALF_SIZE(inst)	__ASM_SEL(inst##w, inst##l)
-
-#ifdef CONFIG_X86_32
-
-/*
- * The semaphore operations have a special calling sequence that
- * allow us to do a simpler in-line version of them. These routines
- * need to convert that sequence back into the C sequence when
- * there is contention on the semaphore.
- *
- * %eax contains the semaphore pointer on entry. Save the C-clobbered
- * registers (%eax, %edx and %ecx) except %eax which is either a return
- * value or just gets clobbered. Same is true for %edx so make sure GCC
- * reloads it after the slow path, by making it hold a temporary, for
- * example see ____down_write().
- */
-
-#define save_common_regs \
-	pushl %ecx
-
-#define restore_common_regs \
-	popl %ecx
-
-	/* Avoid uglifying the argument copying x86-64 needs to do. */
-	.macro movq src, dst
-	.endm
-
-#else
-
-/*
- * x86-64 rwsem wrappers
- *
- * This interfaces the inline asm code to the slow-path
- * C routines. We need to save the call-clobbered regs
- * that the asm does not mark as clobbered, and move the
- * argument from %rax to %rdi.
- *
- * NOTE! We don't need to save %rax, because the functions
- * will always return the semaphore pointer in %rax (which
- * is also the input argument to these helpers)
- *
- * The following can clobber %rdx because the asm clobbers it:
- *   call_rwsem_down_write_failed
- *   call_rwsem_wake
- * but %rdi, %rsi, %rcx, %r8-r11 always need saving.
- */
-
-#define save_common_regs \
-	pushq %rdi; \
-	pushq %rsi; \
-	pushq %rcx; \
-	pushq %r8;  \
-	pushq %r9;  \
-	pushq %r10; \
-	pushq %r11
-
-#define restore_common_regs \
-	popq %r11; \
-	popq %r10; \
-	popq %r9; \
-	popq %r8; \
-	popq %rcx; \
-	popq %rsi; \
-	popq %rdi
-
-#endif
-
-/* Fix up special calling conventions */
-ENTRY(call_rwsem_down_read_failed)
-	FRAME_BEGIN
-	save_common_regs
-	__ASM_SIZE(push,) %__ASM_REG(dx)
-	movq %rax,%rdi
-	call rwsem_down_read_failed
-	__ASM_SIZE(pop,) %__ASM_REG(dx)
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_read_failed)
-
-ENTRY(call_rwsem_down_write_failed)
-	FRAME_BEGIN
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_down_write_failed
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_write_failed)
-
-ENTRY(call_rwsem_down_write_failed_killable)
-	FRAME_BEGIN
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_down_write_failed_killable
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_write_failed_killable)
-
-ENTRY(call_rwsem_wake)
-	FRAME_BEGIN
-	/* do nothing if still outstanding active readers */
-	__ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx)
-	jnz 1f
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_wake
-	restore_common_regs
-1:	FRAME_END
-	ret
-ENDPROC(call_rwsem_wake)
-
-ENTRY(call_rwsem_downgrade_wake)
-	FRAME_BEGIN
-	save_common_regs
-	__ASM_SIZE(push,) %__ASM_REG(dx)
-	movq %rax,%rdi
-	call rwsem_downgrade_wake
-	__ASM_SIZE(pop,) %__ASM_REG(dx)
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_downgrade_wake)
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index dff7cc3..199523e 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -21,7 +21,6 @@ generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += sections.h
 generic-y += topology.h
 generic-y += trace_clock.h
-- 
1.8.3.1


* [PATCH v6 06/11] locking/rwsem: Remove arch specific rwsem files
@ 2017-10-11 18:01   ` Waiman Long
  0 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

As the generic rwsem-xadd code now uses the appropriate acquire and
release versions of the atomic operations, the arch-specific rwsem.h
files will not be that much faster than the generic code. So we can
remove those arch-specific rwsem.h files and stop building asm/rwsem.h
to reduce the maintenance effort.
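
As an illustration only (none of the code below is in the kernel tree;
the function names and the slowpath stub are made up for this sketch),
the same fastpaths can be written portably with C11 acquire/release
atomics, using the 64-bit count layout of the headers removed below:

/*
 * Illustrative userspace sketch only -- not the kernel implementation.
 * It shows why hand-written per-arch assembly buys little: every
 * fastpath is a single atomic read-modify-write with the right
 * acquire/release ordering.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define RWSEM_UNLOCKED_VALUE	0L
#define RWSEM_ACTIVE_READ_BIAS	1L
#define RWSEM_WAITING_BIAS	(-0x0000000100000000L)
#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + 1L)

struct rw_semaphore {
	atomic_long count;
};

/* Placeholder for the generic slowpath that queues the task and sleeps. */
static inline void rwsem_down_read_slowpath(struct rw_semaphore *sem)
{
	(void)sem;
}

/* Reader fastpath: one fetch-add with acquire ordering. */
static inline void down_read_sketch(struct rw_semaphore *sem)
{
	long old = atomic_fetch_add_explicit(&sem->count,
					     RWSEM_ACTIVE_READ_BIAS,
					     memory_order_acquire);
	if (old < 0)			/* a writer is active or waiting */
		rwsem_down_read_slowpath(sem);
}

/* Writer trylock fastpath: one compare-and-swap with acquire ordering. */
static inline bool down_write_trylock_sketch(struct rw_semaphore *sem)
{
	long expected = RWSEM_UNLOCKED_VALUE;

	return atomic_compare_exchange_strong_explicit(&sem->count, &expected,
			RWSEM_ACTIVE_WRITE_BIAS,
			memory_order_acquire, memory_order_relaxed);
}

/* Reader unlock: one fetch-sub with release ordering to pair with above. */
static inline void up_read_sketch(struct rw_semaphore *sem)
{
	atomic_fetch_sub_explicit(&sem->count, RWSEM_ACTIVE_READ_BIAS,
				  memory_order_release);
	/* the real code wakes waiters when the last active reader leaves */
}

Generic fastpaths of this kind compile to code close enough to the
removed hand-written assembly that the per-arch files are not worth
their maintenance cost.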

Signed-off-by: Waiman Long <longman@redhat.com>
---
 arch/alpha/include/asm/rwsem.h  | 195 -----------------------------------
 arch/arm/include/asm/Kbuild     |   1 -
 arch/arm64/include/asm/Kbuild   |   1 -
 arch/hexagon/include/asm/Kbuild |   1 -
 arch/ia64/include/asm/rwsem.h   | 154 ----------------------------
 arch/powerpc/include/asm/Kbuild |   1 -
 arch/s390/include/asm/rwsem.h   | 210 --------------------------------------
 arch/sh/include/asm/Kbuild      |   1 -
 arch/sparc/include/asm/Kbuild   |   1 -
 arch/x86/include/asm/rwsem.h    | 221 ----------------------------------------
 arch/x86/lib/Makefile           |   1 -
 arch/x86/lib/rwsem.S            | 144 --------------------------
 arch/xtensa/include/asm/Kbuild  |   1 -
 13 files changed, 932 deletions(-)
 delete mode 100644 arch/alpha/include/asm/rwsem.h
 delete mode 100644 arch/ia64/include/asm/rwsem.h
 delete mode 100644 arch/s390/include/asm/rwsem.h
 delete mode 100644 arch/x86/include/asm/rwsem.h
 delete mode 100644 arch/x86/lib/rwsem.S

diff --git a/arch/alpha/include/asm/rwsem.h b/arch/alpha/include/asm/rwsem.h
deleted file mode 100644
index 77873d0..0000000
--- a/arch/alpha/include/asm/rwsem.h
+++ /dev/null
@@ -1,195 +0,0 @@
-#ifndef _ALPHA_RWSEM_H
-#define _ALPHA_RWSEM_H
-
-/*
- * Written by Ivan Kokshaysky <ink@jurassic.park.msu.ru>, 2001.
- * Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-
-#include <linux/compiler.h>
-
-#define RWSEM_UNLOCKED_VALUE		0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS		0x0000000000000001L
-#define RWSEM_ACTIVE_MASK		0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS		(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter += RWSEM_ACTIVE_READ_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"
-	"	addq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	mb\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(oldcount < 0))
-		rwsem_down_read_failed(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	long old, new, res;
-
-	res = atomic_long_read(&sem->count);
-	do {
-		new = res + RWSEM_ACTIVE_READ_BIAS;
-		if (new <= 0)
-			break;
-		old = res;
-		res = atomic_long_cmpxchg(&sem->count, old, new);
-	} while (res != old);
-	return res >= 0 ? 1 : 0;
-}
-
-static inline long ___down_write(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter += RWSEM_ACTIVE_WRITE_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"
-	"	addq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	mb\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
-#endif
-	return oldcount;
-}
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	if (unlikely(___down_write(sem)))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (unlikely(___down_write(sem)))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	long ret = atomic_long_cmpxchg(&sem->count, RWSEM_UNLOCKED_VALUE,
-			   RWSEM_ACTIVE_WRITE_BIAS);
-	if (ret == RWSEM_UNLOCKED_VALUE)
-		return 1;
-	return 0;
-}
-
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter -= RWSEM_ACTIVE_READ_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"	mb\n"
-	"1:	ldq_l	%0,%1\n"
-	"	subq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_READ_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(oldcount < 0))
-		if ((int)oldcount - RWSEM_ACTIVE_READ_BIAS == 0)
-			rwsem_wake(sem);
-}
-
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	long count;
-#ifndef	CONFIG_SMP
-	sem->count.counter -= RWSEM_ACTIVE_WRITE_BIAS;
-	count = sem->count.counter;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"	mb\n"
-	"1:	ldq_l	%0,%1\n"
-	"	subq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	subq	%0,%3,%0\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (count), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (RWSEM_ACTIVE_WRITE_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(count))
-		if ((int)count == 0)
-			rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	long oldcount;
-#ifndef	CONFIG_SMP
-	oldcount = sem->count.counter;
-	sem->count.counter -= RWSEM_WAITING_BIAS;
-#else
-	long temp;
-	__asm__ __volatile__(
-	"1:	ldq_l	%0,%1\n"
-	"	addq	%0,%3,%2\n"
-	"	stq_c	%2,%1\n"
-	"	beq	%2,2f\n"
-	"	mb\n"
-	".subsection 2\n"
-	"2:	br	1b\n"
-	".previous"
-	:"=&r" (oldcount), "=m" (sem->count), "=&r" (temp)
-	:"Ir" (-RWSEM_WAITING_BIAS), "m" (sem->count) : "memory");
-#endif
-	if (unlikely(oldcount < 0))
-		rwsem_downgrade_wake(sem);
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ALPHA_RWSEM_H */
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 721ab5e..58337ef 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -12,7 +12,6 @@ generic-y += mm-arch-hooks.h
 generic-y += msi.h
 generic-y += parport.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += seccomp.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 2326e39..38366a6 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -16,7 +16,6 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += msi.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += segment.h
 generic-y += serial.h
 generic-y += set_memory.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 3401368..002eb1f 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -24,7 +24,6 @@ generic-y += mm-arch-hooks.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/ia64/include/asm/rwsem.h b/arch/ia64/include/asm/rwsem.h
deleted file mode 100644
index 8fa98dd..0000000
--- a/arch/ia64/include/asm/rwsem.h
+++ /dev/null
@@ -1,154 +0,0 @@
-/*
- * R/W semaphores for ia64
- *
- * Copyright (C) 2003 Ken Chen <kenneth.w.chen@intel.com>
- * Copyright (C) 2003 Asit Mallick <asit.k.mallick@intel.com>
- * Copyright (C) 2005 Christoph Lameter <cl@linux.com>
- *
- * Based on asm-i386/rwsem.h and other architecture implementation.
- *
- * The MSW of the count is the negated number of active writers and
- * waiting lockers, and the LSW is the total number of active locks.
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffffffff00000001 for
- * the case of an uncontended lock. Readers increment by 1 and see a positive
- * value when uncontended, negative if there are writers (and maybe) readers
- * waiting (in which case it goes to sleep).
- */
-
-#ifndef _ASM_IA64_RWSEM_H
-#define _ASM_IA64_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "Please don't include <asm/rwsem.h> directly, use <linux/rwsem.h> instead."
-#endif
-
-#include <asm/intrinsics.h>
-
-#define RWSEM_UNLOCKED_VALUE		__IA64_UL_CONST(0x0000000000000000)
-#define RWSEM_ACTIVE_BIAS		(1L)
-#define RWSEM_ACTIVE_MASK		(0xffffffffL)
-#define RWSEM_WAITING_BIAS		(-0x100000000L)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void
-__down_read (struct rw_semaphore *sem)
-{
-	long result = ia64_fetchadd8_acq((unsigned long *)&sem->count.counter, 1);
-
-	if (result < 0)
-		rwsem_down_read_failed(sem);
-}
-
-/*
- * lock for writing
- */
-static inline long
-___down_write (struct rw_semaphore *sem)
-{
-	long old, new;
-
-	do {
-		old = atomic_long_read(&sem->count);
-		new = old + RWSEM_ACTIVE_WRITE_BIAS;
-	} while (atomic_long_cmpxchg_acquire(&sem->count, old, new) != old);
-
-	return old;
-}
-
-static inline void
-__down_write (struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int
-__down_write_killable (struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-/*
- * unlock after reading
- */
-static inline void
-__up_read (struct rw_semaphore *sem)
-{
-	long result = ia64_fetchadd8_rel((unsigned long *)&sem->count.counter, -1);
-
-	if (result < 0 && (--result & RWSEM_ACTIVE_MASK) == 0)
-		rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void
-__up_write (struct rw_semaphore *sem)
-{
-	long old, new;
-
-	do {
-		old = atomic_long_read(&sem->count);
-		new = old - RWSEM_ACTIVE_WRITE_BIAS;
-	} while (atomic_long_cmpxchg_release(&sem->count, old, new) != old);
-
-	if (new < 0 && (new & RWSEM_ACTIVE_MASK) == 0)
-		rwsem_wake(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int
-__down_read_trylock (struct rw_semaphore *sem)
-{
-	long tmp;
-	while ((tmp = atomic_long_read(&sem->count)) >= 0) {
-		if (tmp == atomic_long_cmpxchg_acquire(&sem->count, tmp, tmp+1)) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int
-__down_write_trylock (struct rw_semaphore *sem)
-{
-	long tmp = atomic_long_cmpxchg_acquire(&sem->count,
-			RWSEM_UNLOCKED_VALUE, RWSEM_ACTIVE_WRITE_BIAS);
-	return tmp == RWSEM_UNLOCKED_VALUE;
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void
-__downgrade_write (struct rw_semaphore *sem)
-{
-	long old, new;
-
-	do {
-		old = atomic_long_read(&sem->count);
-		new = old - RWSEM_WAITING_BIAS;
-	} while (atomic_long_cmpxchg_release(&sem->count, old, new) != old);
-
-	if (old < 0)
-		rwsem_downgrade_wake(sem);
-}
-
-#endif /* _ASM_IA64_RWSEM_H */
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 2542ea1..e25807a 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -6,6 +6,5 @@ generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += vtime.h
 generic-y += msi.h
diff --git a/arch/s390/include/asm/rwsem.h b/arch/s390/include/asm/rwsem.h
deleted file mode 100644
index 597e7e9..0000000
--- a/arch/s390/include/asm/rwsem.h
+++ /dev/null
@@ -1,210 +0,0 @@
-#ifndef _S390_RWSEM_H
-#define _S390_RWSEM_H
-
-/*
- *  S390 version
- *    Copyright IBM Corp. 2002
- *    Author(s): Martin Schwidefsky (schwidefsky@de.ibm.com)
- *
- *  Based on asm-alpha/semaphore.h and asm-i386/rwsem.h
- */
-
-/*
- *
- * The MSW of the count is the negated number of active writers and waiting
- * lockers, and the LSW is the total number of active locks
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
- * uncontended lock. This can be determined because XADD returns the old value.
- * Readers increment by 1 and see a positive value when uncontended, negative
- * if there are writers (and maybe) readers waiting (in which case it goes to
- * sleep).
- *
- * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
- * be extended to 65534 by manually checking the whole MSW rather than relying
- * on the S flag.
- *
- * The value of ACTIVE_BIAS supports up to 65535 active processes.
- *
- * This should be totally fair - if anything is waiting, a process that wants a
- * lock will go to the back of the queue. When the currently active lock is
- * released, if there's a writer at the front of the queue, then that and only
- * that will be woken up; if there's a bunch of consecutive readers at the
- * front, then they'll all be woken up, but no other readers will be.
- */
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#define RWSEM_UNLOCKED_VALUE	0x0000000000000000L
-#define RWSEM_ACTIVE_BIAS	0x0000000000000001L
-#define RWSEM_ACTIVE_MASK	0x00000000ffffffffL
-#define RWSEM_WAITING_BIAS	(-0x0000000100000000L)
-#define RWSEM_ACTIVE_READ_BIAS	RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS	(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	signed long old, new;
-
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	aghi	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "i" (RWSEM_ACTIVE_READ_BIAS)
-		: "cc", "memory");
-	if (old < 0)
-		rwsem_down_read_failed(sem);
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline int __down_read_trylock(struct rw_semaphore *sem)
-{
-	signed long old, new;
-
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	ltgr	%1,%0\n"
-		"	jm	1f\n"
-		"	aghi	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b\n"
-		"1:"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "i" (RWSEM_ACTIVE_READ_BIAS)
-		: "cc", "memory");
-	return old >= 0 ? 1 : 0;
-}
-
-/*
- * lock for writing
- */
-static inline long ___down_write(struct rw_semaphore *sem)
-{
-	signed long old, new, tmp;
-
-	tmp = RWSEM_ACTIVE_WRITE_BIAS;
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	ag	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "m" (tmp)
-		: "cc", "memory");
-
-	return old;
-}
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		rwsem_down_write_failed(sem);
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (___down_write(sem))
-		if (IS_ERR(rwsem_down_write_failed_killable(sem)))
-			return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline int __down_write_trylock(struct rw_semaphore *sem)
-{
-	signed long old;
-
-	asm volatile(
-		"	lg	%0,%1\n"
-		"0:	ltgr	%0,%0\n"
-		"	jnz	1f\n"
-		"	csg	%0,%3,%1\n"
-		"	jl	0b\n"
-		"1:"
-		: "=&d" (old), "=Q" (sem->count)
-		: "Q" (sem->count), "d" (RWSEM_ACTIVE_WRITE_BIAS)
-		: "cc", "memory");
-	return (old == RWSEM_UNLOCKED_VALUE) ? 1 : 0;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	signed long old, new;
-
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	aghi	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "i" (-RWSEM_ACTIVE_READ_BIAS)
-		: "cc", "memory");
-	if (new < 0)
-		if ((new & RWSEM_ACTIVE_MASK) == 0)
-			rwsem_wake(sem);
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	signed long old, new, tmp;
-
-	tmp = -RWSEM_ACTIVE_WRITE_BIAS;
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	ag	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "m" (tmp)
-		: "cc", "memory");
-	if (new < 0)
-		if ((new & RWSEM_ACTIVE_MASK) == 0)
-			rwsem_wake(sem);
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	signed long old, new, tmp;
-
-	tmp = -RWSEM_WAITING_BIAS;
-	asm volatile(
-		"	lg	%0,%2\n"
-		"0:	lgr	%1,%0\n"
-		"	ag	%1,%4\n"
-		"	csg	%0,%1,%2\n"
-		"	jl	0b"
-		: "=&d" (old), "=&d" (new), "=Q" (sem->count)
-		: "Q" (sem->count), "m" (tmp)
-		: "cc", "memory");
-	if (new > 1)
-		rwsem_downgrade_wake(sem);
-}
-
-#endif /* _S390_RWSEM_H */
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 1a6f9c3..24af7e6 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -13,7 +13,6 @@ generic-y += mm-arch-hooks.h
 generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += serial.h
 generic-y += sizes.h
 generic-y += trace_clock.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 80ddc01..2c6d988 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -15,7 +15,6 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += module.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += serial.h
 generic-y += trace_clock.h
 generic-y += word-at-a-time.h
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
deleted file mode 100644
index a8f486e..0000000
--- a/arch/x86/include/asm/rwsem.h
+++ /dev/null
@@ -1,221 +0,0 @@
-/* rwsem.h: R/W semaphores implemented using XADD/CMPXCHG for i486+
- *
- * Written by David Howells (dhowells@redhat.com).
- *
- * Derived from asm-x86/semaphore.h
- *
- *
- * The MSW of the count is the negated number of active writers and waiting
- * lockers, and the LSW is the total number of active locks
- *
- * The lock count is initialized to 0 (no active and no waiting lockers).
- *
- * When a writer subtracts WRITE_BIAS, it'll get 0xffff0001 for the case of an
- * uncontended lock. This can be determined because XADD returns the old value.
- * Readers increment by 1 and see a positive value when uncontended, negative
- * if there are writers (and maybe) readers waiting (in which case it goes to
- * sleep).
- *
- * The value of WAITING_BIAS supports up to 32766 waiting processes. This can
- * be extended to 65534 by manually checking the whole MSW rather than relying
- * on the S flag.
- *
- * The value of ACTIVE_BIAS supports up to 65535 active processes.
- *
- * This should be totally fair - if anything is waiting, a process that wants a
- * lock will go to the back of the queue. When the currently active lock is
- * released, if there's a writer at the front of the queue, then that and only
- * that will be woken up; if there's a bunch of consecutive readers at the
- * front, then they'll all be woken up, but no other readers will be.
- */
-
-#ifndef _ASM_X86_RWSEM_H
-#define _ASM_X86_RWSEM_H
-
-#ifndef _LINUX_RWSEM_H
-#error "please don't include asm/rwsem.h directly, use linux/rwsem.h instead"
-#endif
-
-#ifdef __KERNEL__
-#include <asm/asm.h>
-
-/*
- * The bias values and the counter type limits the number of
- * potential readers/writers to 32767 for 32 bits and 2147483647
- * for 64 bits.
- */
-
-#ifdef CONFIG_X86_64
-# define RWSEM_ACTIVE_MASK		0xffffffffL
-#else
-# define RWSEM_ACTIVE_MASK		0x0000ffffL
-#endif
-
-#define RWSEM_UNLOCKED_VALUE		0x00000000L
-#define RWSEM_ACTIVE_BIAS		0x00000001L
-#define RWSEM_WAITING_BIAS		(-RWSEM_ACTIVE_MASK-1)
-#define RWSEM_ACTIVE_READ_BIAS		RWSEM_ACTIVE_BIAS
-#define RWSEM_ACTIVE_WRITE_BIAS		(RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)
-
-/*
- * lock for reading
- */
-static inline void __down_read(struct rw_semaphore *sem)
-{
-	asm volatile("# beginning down_read\n\t"
-		     LOCK_PREFIX _ASM_INC "(%[sem])\n\t"
-		     /* adds 0x00000001 */
-		     "  jns        1f\n"
-		     "  call call_rwsem_down_read_failed\n"
-		     "1:\n\t"
-		     "# ending down_read\n\t"
-		     : "+m" (sem->count)
-		     : [sem] "a" (sem)
-		     : "memory", "cc");
-}
-
-/*
- * trylock for reading -- returns 1 if successful, 0 if contention
- */
-static inline bool __down_read_trylock(struct rw_semaphore *sem)
-{
-	long result, tmp;
-	asm volatile("# beginning __down_read_trylock\n\t"
-		     "  mov          %[count],%[result]\n\t"
-		     "1:\n\t"
-		     "  mov          %[result],%[tmp]\n\t"
-		     "  add          %[inc],%[tmp]\n\t"
-		     "  jle	     2f\n\t"
-		     LOCK_PREFIX "  cmpxchg  %[tmp],%[count]\n\t"
-		     "  jnz	     1b\n\t"
-		     "2:\n\t"
-		     "# ending __down_read_trylock\n\t"
-		     : [count] "+m" (sem->count), [result] "=&a" (result),
-		       [tmp] "=&r" (tmp)
-		     : [inc] "i" (RWSEM_ACTIVE_READ_BIAS)
-		     : "memory", "cc");
-	return result >= 0;
-}
-
-/*
- * lock for writing
- */
-#define ____down_write(sem, slow_path)			\
-({							\
-	long tmp;					\
-	struct rw_semaphore* ret;			\
-							\
-	asm volatile("# beginning down_write\n\t"	\
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"	\
-		     /* adds 0xffff0001, returns the old value */ \
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t" \
-		     /* was the active mask 0 before? */\
-		     "  jz        1f\n"			\
-		     "  call " slow_path "\n"		\
-		     "1:\n"				\
-		     "# ending down_write"		\
-		     : "+m" (sem->count), [tmp] "=d" (tmp),	\
-		       "=a" (ret), ASM_CALL_CONSTRAINT	\
-		     : [sem] "a" (sem), "[tmp]" (RWSEM_ACTIVE_WRITE_BIAS) \
-		     : "memory", "cc");			\
-	ret;						\
-})
-
-static inline void __down_write(struct rw_semaphore *sem)
-{
-	____down_write(sem, "call_rwsem_down_write_failed");
-}
-
-static inline int __down_write_killable(struct rw_semaphore *sem)
-{
-	if (IS_ERR(____down_write(sem, "call_rwsem_down_write_failed_killable")))
-		return -EINTR;
-
-	return 0;
-}
-
-/*
- * trylock for writing -- returns 1 if successful, 0 if contention
- */
-static inline bool __down_write_trylock(struct rw_semaphore *sem)
-{
-	bool result;
-	long tmp0, tmp1;
-	asm volatile("# beginning __down_write_trylock\n\t"
-		     "  mov          %[count],%[tmp0]\n\t"
-		     "1:\n\t"
-		     "  test " __ASM_SEL(%w1,%k1) "," __ASM_SEL(%w1,%k1) "\n\t"
-		     /* was the active mask 0 before? */
-		     "  jnz          2f\n\t"
-		     "  mov          %[tmp0],%[tmp1]\n\t"
-		     "  add          %[inc],%[tmp1]\n\t"
-		     LOCK_PREFIX "  cmpxchg  %[tmp1],%[count]\n\t"
-		     "  jnz	     1b\n\t"
-		     "2:\n\t"
-		     CC_SET(e)
-		     "# ending __down_write_trylock\n\t"
-		     : [count] "+m" (sem->count), [tmp0] "=&a" (tmp0),
-		       [tmp1] "=&r" (tmp1), CC_OUT(e) (result)
-		     : [inc] "er" (RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory");
-	return result;
-}
-
-/*
- * unlock after reading
- */
-static inline void __up_read(struct rw_semaphore *sem)
-{
-	long tmp;
-	asm volatile("# beginning __up_read\n\t"
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
-		     /* subtracts 1, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n"
-		     "# ending __up_read\n"
-		     : "+m" (sem->count), [tmp] "=d" (tmp)
-		     : [sem] "a" (sem), "[tmp]" (-RWSEM_ACTIVE_READ_BIAS)
-		     : "memory", "cc");
-}
-
-/*
- * unlock after writing
- */
-static inline void __up_write(struct rw_semaphore *sem)
-{
-	long tmp;
-	asm volatile("# beginning __up_write\n\t"
-		     LOCK_PREFIX "  xadd      %[tmp],(%[sem])\n\t"
-		     /* subtracts 0xffff0001, returns the old value */
-		     "  jns        1f\n\t"
-		     "  call call_rwsem_wake\n" /* expects old value in %edx */
-		     "1:\n\t"
-		     "# ending __up_write\n"
-		     : "+m" (sem->count), [tmp] "=d" (tmp)
-		     : [sem] "a" (sem), "[tmp]" (-RWSEM_ACTIVE_WRITE_BIAS)
-		     : "memory", "cc");
-}
-
-/*
- * downgrade write lock to read lock
- */
-static inline void __downgrade_write(struct rw_semaphore *sem)
-{
-	asm volatile("# beginning __downgrade_write\n\t"
-		     LOCK_PREFIX _ASM_ADD "%[inc],(%[sem])\n\t"
-		     /*
-		      * transitions 0xZZZZ0001 -> 0xYYYY0001 (i386)
-		      *     0xZZZZZZZZ00000001 -> 0xYYYYYYYY00000001 (x86_64)
-		      */
-		     "  jns       1f\n\t"
-		     "  call call_rwsem_downgrade_wake\n"
-		     "1:\n\t"
-		     "# ending __downgrade_write\n"
-		     : "+m" (sem->count)
-		     : [sem] "a" (sem), [inc] "er" (-RWSEM_WAITING_BIAS)
-		     : "memory", "cc");
-}
-
-#endif /* __KERNEL__ */
-#endif /* _ASM_X86_RWSEM_H */
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a7413..b4e8c8e 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -22,7 +22,6 @@ obj-$(CONFIG_SMP) += msr-smp.o cache-smp.o
 lib-y := delay.o misc.o cmdline.o cpu.o
 lib-y += usercopy_$(BITS).o usercopy.o getuser.o putuser.o
 lib-y += memcpy_$(BITS).o
-lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
 
diff --git a/arch/x86/lib/rwsem.S b/arch/x86/lib/rwsem.S
deleted file mode 100644
index bf2c607..0000000
--- a/arch/x86/lib/rwsem.S
+++ /dev/null
@@ -1,144 +0,0 @@
-/*
- * x86 semaphore implementation.
- *
- * (C) Copyright 1999 Linus Torvalds
- *
- * Portions Copyright 1999 Red Hat, Inc.
- *
- *	This program is free software; you can redistribute it and/or
- *	modify it under the terms of the GNU General Public License
- *	as published by the Free Software Foundation; either version
- *	2 of the License, or (at your option) any later version.
- *
- * rw semaphores implemented November 1999 by Benjamin LaHaise <bcrl@kvack.org>
- */
-
-#include <linux/linkage.h>
-#include <asm/alternative-asm.h>
-#include <asm/frame.h>
-
-#define __ASM_HALF_REG(reg)	__ASM_SEL(reg, e##reg)
-#define __ASM_HALF_SIZE(inst)	__ASM_SEL(inst##w, inst##l)
-
-#ifdef CONFIG_X86_32
-
-/*
- * The semaphore operations have a special calling sequence that
- * allow us to do a simpler in-line version of them. These routines
- * need to convert that sequence back into the C sequence when
- * there is contention on the semaphore.
- *
- * %eax contains the semaphore pointer on entry. Save the C-clobbered
- * registers (%eax, %edx and %ecx) except %eax which is either a return
- * value or just gets clobbered. Same is true for %edx so make sure GCC
- * reloads it after the slow path, by making it hold a temporary, for
- * example see ____down_write().
- */
-
-#define save_common_regs \
-	pushl %ecx
-
-#define restore_common_regs \
-	popl %ecx
-
-	/* Avoid uglifying the argument copying x86-64 needs to do. */
-	.macro movq src, dst
-	.endm
-
-#else
-
-/*
- * x86-64 rwsem wrappers
- *
- * This interfaces the inline asm code to the slow-path
- * C routines. We need to save the call-clobbered regs
- * that the asm does not mark as clobbered, and move the
- * argument from %rax to %rdi.
- *
- * NOTE! We don't need to save %rax, because the functions
- * will always return the semaphore pointer in %rax (which
- * is also the input argument to these helpers)
- *
- * The following can clobber %rdx because the asm clobbers it:
- *   call_rwsem_down_write_failed
- *   call_rwsem_wake
- * but %rdi, %rsi, %rcx, %r8-r11 always need saving.
- */
-
-#define save_common_regs \
-	pushq %rdi; \
-	pushq %rsi; \
-	pushq %rcx; \
-	pushq %r8;  \
-	pushq %r9;  \
-	pushq %r10; \
-	pushq %r11
-
-#define restore_common_regs \
-	popq %r11; \
-	popq %r10; \
-	popq %r9; \
-	popq %r8; \
-	popq %rcx; \
-	popq %rsi; \
-	popq %rdi
-
-#endif
-
-/* Fix up special calling conventions */
-ENTRY(call_rwsem_down_read_failed)
-	FRAME_BEGIN
-	save_common_regs
-	__ASM_SIZE(push,) %__ASM_REG(dx)
-	movq %rax,%rdi
-	call rwsem_down_read_failed
-	__ASM_SIZE(pop,) %__ASM_REG(dx)
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_read_failed)
-
-ENTRY(call_rwsem_down_write_failed)
-	FRAME_BEGIN
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_down_write_failed
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_write_failed)
-
-ENTRY(call_rwsem_down_write_failed_killable)
-	FRAME_BEGIN
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_down_write_failed_killable
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_down_write_failed_killable)
-
-ENTRY(call_rwsem_wake)
-	FRAME_BEGIN
-	/* do nothing if still outstanding active readers */
-	__ASM_HALF_SIZE(dec) %__ASM_HALF_REG(dx)
-	jnz 1f
-	save_common_regs
-	movq %rax,%rdi
-	call rwsem_wake
-	restore_common_regs
-1:	FRAME_END
-	ret
-ENDPROC(call_rwsem_wake)
-
-ENTRY(call_rwsem_downgrade_wake)
-	FRAME_BEGIN
-	save_common_regs
-	__ASM_SIZE(push,) %__ASM_REG(dx)
-	movq %rax,%rdi
-	call rwsem_downgrade_wake
-	__ASM_SIZE(pop,) %__ASM_REG(dx)
-	restore_common_regs
-	FRAME_END
-	ret
-ENDPROC(call_rwsem_downgrade_wake)
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index dff7cc3..199523e 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -21,7 +21,6 @@ generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += rwsem.h
 generic-y += sections.h
 generic-y += topology.h
 generic-y += trace_clock.h
-- 
1.8.3.1



* [PATCH v6 07/11] locking/rwsem: Implement lock handoff to prevent lock starvation
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:01   ` Waiman Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

Because of writer lock stealing, it is possible that a constant
stream of incoming writers will cause a waiting writer or reader to
wait indefinitely, leading to lock starvation.

The mutex code has a lock handoff mechanism to prevent lock starvation.
This patch implements a similar mechanism that disables lock stealing
and forces a handoff to the first waiter in the queue once that waiter
has been waiting for at least 10ms. The waiting period ensures that
lock stealing is not discouraged so much that performance suffers.
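
As a rough sketch of the idea (the LOCKED/HANDOFF bits, the lock word
and the helper names below are invented for illustration; the actual
patch keeps the handoff state in sem->count as RWSEM_FLAG_HANDOFF and
measures the timeout in jiffies):

/*
 * Illustrative userspace sketch only -- not the kernel implementation.
 * Optimistic lock stealing stays allowed until the waiter at the head
 * of the queue has waited ~10ms; it then publishes a handoff flag that
 * makes stealers back off, so the head waiter is guaranteed to go next.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define LOCKED		0x1UL
#define HANDOFF		0x2UL
#define WAIT_TIMEOUT_NS	(10ULL * 1000 * 1000)	/* ~10ms */

static atomic_ulong lock_word;

static unsigned long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (unsigned long long)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
}

/* Acquire the lock if it is free and none of the 'forbidden' bits are set. */
static bool try_acquire(unsigned long forbidden)
{
	unsigned long old = atomic_load(&lock_word);

	while (!(old & (LOCKED | forbidden))) {
		if (atomic_compare_exchange_weak(&lock_word, &old,
						 (old | LOCKED) & ~HANDOFF))
			return true;
	}
	return false;
}

/* Lock stealers must back off while a handoff is pending. */
bool try_steal(void)
{
	return try_acquire(HANDOFF);
}

/*
 * The head waiter may ignore HANDOFF (the handoff is meant for it).
 * After the timeout it publishes HANDOFF so that stealers back off.
 */
void head_waiter_wait(void)
{
	unsigned long long deadline = now_ns() + WAIT_TIMEOUT_NS;

	while (!try_acquire(0)) {
		if (now_ns() > deadline)
			atomic_fetch_or(&lock_word, HANDOFF);
		/* the real code sleeps on the wait queue; spin for brevity */
	}
}

In the diff below, the deadline is waiter->timeout = jiffies +
RWSEM_WAIT_TIMEOUT, the lock-stealing paths refuse to acquire while
RWSEM_FLAG_HANDOFF is set, and the wakeup path sets or clears the flag
on behalf of the first waiter.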

A rwsem microbenchmark was run for 5 seconds on a 2-socket 40-core
80-thread x86-64 system with a 4.14-rc2 based kernel and 60 write_lock
threads with a 1us sleep in the critical section.

For the unpatched kernel, the locking rate was 15,519 kop/s. However,
28 of the threads completed only a single locking operation each
(practically starved), while the thread with the highest locking rate
had done more than 646k of them.

For the patched kernel, the locking rate dropped to 12,590 kop/s. The
number of locking operations done per thread ranged from 14,450 to
22,648. The rwsem became much fairer, with the tradeoff of lower
overall throughput.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 98 +++++++++++++++++++++++++++++++++++++--------
 kernel/locking/rwsem-xadd.h | 22 ++++++----
 2 files changed, 96 insertions(+), 24 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e3ab430..bca412f 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -70,6 +70,7 @@ struct rwsem_waiter {
 	struct list_head list;
 	struct task_struct *task;
 	enum rwsem_waiter_type type;
+	unsigned long timeout;
 };
 
 enum rwsem_wake_type {
@@ -79,6 +80,16 @@ enum rwsem_wake_type {
 };
 
 /*
+ * The minimum waiting time (10ms) in the wait queue before initiating the
+ * handoff protocol.
+ *
+ * The queue head waiter that is aborted (killed) will pass the handoff
+ * bit, if set, to the next waiter, but the bit has to be cleared when
+ * the wait queue becomes empty.
+ */
+#define RWSEM_WAIT_TIMEOUT	((HZ - 1)/100 + 1)
+
+/*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then:
  *   - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
@@ -128,6 +139,13 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		adjustment = RWSEM_READER_BIAS;
 		oldcount = atomic_fetch_add(adjustment, &sem->count);
 		if (unlikely(oldcount & RWSEM_WRITER_LOCKED)) {
+			/*
+			 * Initiate handoff to reader, if applicable.
+			 */
+			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
+			    time_after(jiffies, waiter->timeout))
+				adjustment -= RWSEM_FLAG_HANDOFF;
+
 			atomic_sub(adjustment, &sem->count);
 			return;
 		}
@@ -170,6 +188,12 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		adjustment -= RWSEM_FLAG_WAITERS;
 	}
 
+	/*
+	 * Clear the handoff flag
+	 */
+	if (woken && RWSEM_COUNT_IS_HANDOFF(atomic_read(&sem->count)))
+		adjustment -= RWSEM_FLAG_HANDOFF;
+
 	if (adjustment)
 		atomic_add(adjustment, &sem->count);
 }
@@ -179,15 +203,20 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
  * race conditions between checking the rwsem wait list and setting the
  * sem->count accordingly.
  */
-static inline bool rwsem_try_write_lock(int count, struct rw_semaphore *sem)
+static inline bool
+rwsem_try_write_lock(int count, struct rw_semaphore *sem, bool first)
 {
 	int new;
 
 	if (RWSEM_COUNT_IS_LOCKED(count))
 		return false;
 
+	if (!first && RWSEM_COUNT_IS_HANDOFF(count))
+		return false;
+
 	new = count + RWSEM_WRITER_LOCKED -
-	     (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0);
+	     (list_is_singular(&sem->wait_list) ? RWSEM_FLAG_WAITERS : 0) -
+	     (count & RWSEM_FLAG_HANDOFF);
 
 	if (atomic_cmpxchg_acquire(&sem->count, count, new) == count) {
 		rwsem_set_owner(sem);
@@ -206,7 +235,7 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 	int old, count = atomic_read(&sem->count);
 
 	while (true) {
-		if (RWSEM_COUNT_IS_LOCKED(count))
+		if (RWSEM_COUNT_IS_LOCKED_OR_HANDOFF(count))
 			return false;
 
 		old = atomic_cmpxchg_acquire(&sem->count, count,
@@ -362,6 +391,16 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 #endif
 
 /*
+ * This is safe to be called without holding the wait_lock.
+ */
+static inline bool
+rwsem_waiter_is_first(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
+{
+	return list_first_entry(&sem->wait_list, struct rwsem_waiter, list)
+			== waiter;
+}
+
+/*
  * Wait for the read lock to be granted
  */
 static inline struct rw_semaphore __sched *
@@ -373,6 +412,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
+	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 	if (list_empty(&sem->wait_list))
@@ -413,8 +453,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	return sem;
 out_nolock:
 	list_del(&waiter.list);
-	if (list_empty(&sem->wait_list))
-		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
+	if (list_empty(&sem->wait_list)) {
+		int adjustment = -RWSEM_FLAG_WAITERS;
+
+		if (RWSEM_COUNT_IS_HANDOFF(atomic_read(&sem->count)))
+			adjustment -= RWSEM_FLAG_HANDOFF;
+		atomic_add(adjustment, &sem->count);
+	}
 	raw_spin_unlock_irq(&sem->wait_lock);
 	__set_current_state(TASK_RUNNING);
 	return ERR_PTR(-EINTR);
@@ -441,7 +486,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
 {
 	int count;
-	bool waiting = true; /* any queued threads before us */
+	bool first = false;	/* First one in queue */
 	struct rwsem_waiter waiter;
 	struct rw_semaphore *ret = sem;
 	DEFINE_WAKE_Q(wake_q);
@@ -456,17 +501,18 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	 */
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
+	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
 
 	raw_spin_lock_irq(&sem->wait_lock);
 
 	/* account for this before adding a new element to the list */
 	if (list_empty(&sem->wait_list))
-		waiting = false;
+		first = true;
 
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we're now waiting on the lock, but no longer actively locking */
-	if (waiting) {
+	if (!first) {
 		count = atomic_read(&sem->count);
 
 		/*
@@ -498,19 +544,30 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	/* wait until we successfully acquire the lock */
 	set_current_state(state);
 	while (true) {
-		if (rwsem_try_write_lock(count, sem))
+		if (rwsem_try_write_lock(count, sem, first))
 			break;
+
 		raw_spin_unlock_irq(&sem->wait_lock);
 
 		/* Block until there are no active lockers. */
-		do {
+		for (;;) {
 			if (signal_pending_state(state, current))
 				goto out_nolock;
 
 			schedule();
 			set_current_state(state);
 			count = atomic_read(&sem->count);
-		} while (RWSEM_COUNT_IS_LOCKED(count));
+
+			if (!first)
+				first = rwsem_waiter_is_first(sem, &waiter);
+
+			if (!RWSEM_COUNT_IS_LOCKED(count))
+				break;
+
+			if (first && !RWSEM_COUNT_IS_HANDOFF(count) &&
+			    time_after(jiffies, waiter.timeout))
+				atomic_or(RWSEM_FLAG_HANDOFF, &sem->count);
+		}
 
 		raw_spin_lock_irq(&sem->wait_lock);
 	}
@@ -524,10 +581,15 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	__set_current_state(TASK_RUNNING);
 	raw_spin_lock_irq(&sem->wait_lock);
 	list_del(&waiter.list);
-	if (list_empty(&sem->wait_list))
-		atomic_add(-RWSEM_FLAG_WAITERS, &sem->count);
-	else
+	if (list_empty(&sem->wait_list)) {
+		int adjustment = -RWSEM_FLAG_WAITERS;
+
+		if (RWSEM_COUNT_IS_HANDOFF(atomic_read(&sem->count)))
+			adjustment -= RWSEM_FLAG_HANDOFF;
+		atomic_add(adjustment, &sem->count);
+	} else {
 		__rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+	}
 	raw_spin_unlock_irq(&sem->wait_lock);
 	wake_up_q(&wake_q);
 
@@ -553,7 +615,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
  * - up_read/up_write has decremented the active part of count if we come here
  */
 __visible
-struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
+struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem, int count)
 {
 	unsigned long flags;
 	DEFINE_WAKE_Q(wake_q);
@@ -586,7 +648,9 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	smp_rmb();
 
 	/*
-	 * If a spinner is present, it is not necessary to do the wakeup.
+	 * If a spinner is present and the handoff flag isn't set, it is
+	 * not necessary to do the wakeup.
+	 *
 	 * Try to do wakeup only if the trylock succeeds to minimize
 	 * spinlock contention which may introduce too much delay in the
 	 * unlock operation.
@@ -605,7 +669,7 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
 	 * rwsem_has_spinner() is true, it will guarantee at least one
 	 * trylock attempt on the rwsem later on.
 	 */
-	if (rwsem_has_spinner(sem)) {
+	if (rwsem_has_spinner(sem) && !(count & RWSEM_FLAG_HANDOFF)) {
 		/*
 		 * The smp_rmb() here is to make sure that the spinner
 		 * state is consulted before reading the wait_lock.
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index 9b30f0c..8cb12ed 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -74,7 +74,8 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
  *
  * Bit  0    - writer locked bit
  * Bit  1    - waiters present bit
- * Bits 2-7  - reserved
+ * Bit  2    - lock handoff bit
+ * Bits 3-7  - reserved
  * Bits 8-31 - 24-bit reader count
  *
  * atomic_fetch_add() is used to obtain reader lock, whereas atomic_cmpxchg()
@@ -82,19 +83,24 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
  */
 #define RWSEM_WRITER_LOCKED	0X00000001
 #define RWSEM_FLAG_WAITERS	0X00000002
+#define RWSEM_FLAG_HANDOFF	0X00000004
 #define RWSEM_READER_BIAS	0x00000100
 #define RWSEM_READER_SHIFT	8
 #define RWSEM_READER_MASK	(~((1U << RWSEM_READER_SHIFT) - 1))
 #define RWSEM_LOCK_MASK 	(RWSEM_WRITER_LOCKED|RWSEM_READER_MASK)
-#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS)
+#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS|\
+				 RWSEM_FLAG_HANDOFF)
 
 #define RWSEM_COUNT_IS_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
+#define RWSEM_COUNT_IS_HANDOFF(c)	((c) & RWSEM_FLAG_HANDOFF)
+#define RWSEM_COUNT_IS_LOCKED_OR_HANDOFF(c)	\
+	((c) & (RWSEM_LOCK_MASK|RWSEM_FLAG_HANDOFF))
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *);
+extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *, int count);
 extern struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem);
 
 /*
@@ -165,7 +171,7 @@ static inline void __up_read(struct rw_semaphore *sem)
 	tmp = atomic_add_return_release(-RWSEM_READER_BIAS, &sem->count);
 	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS))
 			== RWSEM_FLAG_WAITERS))
-		rwsem_wake(sem);
+		rwsem_wake(sem, tmp);
 }
 
 /*
@@ -173,10 +179,12 @@ static inline void __up_read(struct rw_semaphore *sem)
  */
 static inline void __up_write(struct rw_semaphore *sem)
 {
+	int tmp;
+
 	rwsem_clear_owner(sem);
-	if (unlikely(atomic_fetch_add_release(-RWSEM_WRITER_LOCKED,
-			&sem->count) & RWSEM_FLAG_WAITERS))
-		rwsem_wake(sem);
+	tmp = atomic_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
+	if (unlikely(tmp & RWSEM_FLAG_WAITERS))
+		rwsem_wake(sem, tmp);
 }
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 08/11] locking/rwsem: Enable readers spinning on writer
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:01   ` Waiman Long
  0 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:01 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

This patch enables readers to optimistically spin on a
rwsem when it is owned by a writer instead of going to sleep
directly.  The rwsem_can_spin_on_owner() function is extracted
out of rwsem_optimistic_spin() and is called directly by
rwsem_down_read_failed() and rwsem_down_write_failed().
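
For illustration, the unqueued trylock that a spinning reader uses can be
modeled in plain C as below (a rough user-space sketch with invented demo_*
names, patterned after the rwsem_try_read_lock_unqueued() added in the diff
that follows): optimistically add a reader bias, then back it out if a
writer or a pending handoff was observed.

/* User-space model of a spinning reader's unqueued trylock -- sketch only. */
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_WRITER_LOCKED	0x001
#define DEMO_FLAG_HANDOFF	0x004
#define DEMO_READER_BIAS	0x100

struct demo_sem {
	atomic_int count;
};

static bool demo_try_read_lock_unqueued(struct demo_sem *sem)
{
	int count = atomic_load(&sem->count);

	/* Cheap pre-check: don't try if a writer or a handoff is pending. */
	if (count & (DEMO_FLAG_HANDOFF | DEMO_WRITER_LOCKED))
		return false;

	/* Optimistically add a reader and re-check what was there. */
	count = atomic_fetch_add(&sem->count, DEMO_READER_BIAS);
	if (!(count & (DEMO_FLAG_HANDOFF | DEMO_WRITER_LOCKED)))
		return true;	/* read lock obtained */

	/* Raced with a writer or a handoff: undo the optimistic add. */
	atomic_fetch_add(&sem->count, -DEMO_READER_BIAS);
	return false;
}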

This patch may actually reduce performance under certain circumstances,
as the readers may no longer be grouped together in the wait queue;
instead of one large reader group we may end up with a number of small
reader groups interleaved with writers. However, this change is needed
by some of the subsequent patches.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 66 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 57 insertions(+), 9 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index bca412f..52305c3 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -228,6 +228,28 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
+ * Try to acquire read lock before the reader is put on wait queue.
+ */
+static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
+{
+	int count = atomic_read(&sem->count);
+
+	if (count & (RWSEM_FLAG_HANDOFF|RWSEM_WRITER_LOCKED))
+		return false;
+
+	count = atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
+	if (!(count & (RWSEM_FLAG_HANDOFF|RWSEM_WRITER_LOCKED))) {
+		if (!(count >> RWSEM_READER_SHIFT))
+			rwsem_set_reader_owned(sem);
+		return true;
+	}
+
+	/* Back out the change */
+	atomic_add(-RWSEM_READER_BIAS, &sem->count);
+	return false;
+}
+
+/*
  * Try to acquire write lock before the writer has been put on wait queue.
  */
 static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
@@ -318,16 +340,14 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
 }
 
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static bool
+rwsem_optimistic_spin(struct rw_semaphore *sem, enum rwsem_waiter_type type)
 {
 	bool taken = false;
 
 	preempt_disable();
 
 	/* sem->wait_lock should not be held when doing optimistic spinning */
-	if (!rwsem_can_spin_on_owner(sem))
-		goto done;
-
 	if (!osq_lock(&sem->osq))
 		goto done;
 
@@ -342,10 +362,12 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		/*
 		 * Try to acquire the lock
 		 */
-		if (rwsem_try_write_lock_unqueued(sem)) {
-			taken = true;
+		taken = (type == RWSEM_WAITING_FOR_WRITE)
+		      ? rwsem_try_write_lock_unqueued(sem)
+		      : rwsem_try_read_lock_unqueued(sem);
+
+		if (taken)
 			break;
-		}
 
 		/*
 		 * When there's no owner, we might have preempted between the
@@ -379,7 +401,13 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 }
 
 #else
-static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+{
+	return false;
+}
+
+static inline bool
+rwsem_optimistic_spin(struct rw_semaphore *sem, enum rwsem_waiter_type type)
 {
 	return false;
 }
@@ -406,10 +434,29 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 static inline struct rw_semaphore __sched *
 __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 {
+	bool can_spin;
 	int count, adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
+	/*
+	 * Undo read bias from down_read operation to stop active locking if:
+	 * 1) Optimistic spinners are present; or
+	 * 2) optimistic spinning is allowed.
+	 */
+	can_spin = rwsem_can_spin_on_owner(sem);
+	if (can_spin || rwsem_has_spinner(sem)) {
+		atomic_add(-RWSEM_READER_BIAS, &sem->count);
+		adjustment = 0;
+
+		/*
+		 * Do optimistic spinning and steal lock if possible.
+		 */
+		if (can_spin &&
+		    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_READ))
+			return sem;
+	}
+
 	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_READ;
 	waiter.timeout = jiffies + RWSEM_WAIT_TIMEOUT;
@@ -492,7 +539,8 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
-	if (rwsem_optimistic_spin(sem))
+	if (rwsem_can_spin_on_owner(sem) &&
+	    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
 		return sem;
 
 	/*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 09/11] locking/rwsem: Enable time-based reader lock stealing
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:02   ` Waiman Long
  0 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

Because of writer lock stealing and optimistic spinning, writers were
preferred over readers in rwsem. Preferring readers, however, is
generally better for performance, but it has to be done in a way that
does not starve writers: a continuous stream of incoming readers
prevents the wakeup call that a waiting writer needs in order to set
the handoff bit and initiate the handoff protocol.

Now the owner field is extended to hold a timestamp put in by the
first reader that acquires the lock. An incoming reader is allowed
to steal the lock if the lock is reader-owned, the handoff bit isn't
set, and the owner timestamp matches the current time.
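
As a rough user-space sketch of this owner-field encoding (the demo_* names
and the millisecond "jiffies" stand-in are invented here; the real layout is
in the rwsem-xadd.h hunk below), the reader-owned bit and the timestamp can
be packed and checked like this:

/* User-space model of the timestamped owner word -- sketch only. */
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define DEMO_READER_OWNED	1UL
#define DEMO_TIMESTAMP_SHIFT	8
#define DEMO_TIMESTAMP_MASK	(~((1UL << DEMO_TIMESTAMP_SHIFT) - 1))

static atomic_ulong demo_owner;

/* Coarse "jiffies": one tick per millisecond of CLOCK_MONOTONIC. */
static unsigned long demo_jiffies(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (unsigned long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* A reader stores the reader-owned bit plus the current tick. */
static void demo_set_reader_owned(void)
{
	atomic_store(&demo_owner, DEMO_READER_OWNED |
		     (demo_jiffies() << DEMO_TIMESTAMP_SHIFT));
}

/* An incoming reader may steal only while the stored tick is still current. */
static bool demo_reader_may_steal(void)
{
	unsigned long owner = atomic_load(&demo_owner);

	return (owner & DEMO_READER_OWNED) &&
	       (owner & DEMO_TIMESTAMP_MASK) ==
	       (demo_jiffies() << DEMO_TIMESTAMP_SHIFT);
}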

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 45 +++++++++++++++++++++++++++-------
 kernel/locking/rwsem-xadd.h | 60 +++++++++++++++++++++++++++++++++++----------
 2 files changed, 83 insertions(+), 22 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 52305c3..38a6c32 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -150,10 +150,11 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 			return;
 		}
 		/*
-		 * Set it to reader-owned for first reader.
+		 * Since the wait queue head is a reader, we can set a
+		 * new timestamp even if it is not the first reader to
+		 * acquire the current lock.
 		 */
-		if (!(oldcount >> RWSEM_READER_SHIFT))
-			rwsem_set_reader_owned(sem);
+		rwsem_set_reader_owned(sem);
 	}
 
 	/*
@@ -400,6 +401,22 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	return osq_is_locked(&sem->osq);
 }
 
+static inline bool
+rwsem_reader_steal_lock(struct rw_semaphore *sem, int count)
+{
+	struct task_struct *owner = READ_ONCE(sem->owner);
+
+	/*
+	 * Reader can steal the lock if:
+	 * 1) the lock is reader-owned;
+	 * 2) the handoff bit isn't set; and
+	 * 3) the time stamp in the owner field matches the current time
+	 *    when it is properly initialized.
+	 */
+	return !(count & (RWSEM_WRITER_LOCKED|RWSEM_FLAG_HANDOFF)) &&
+		(!rwsem_owner_is_reader(owner) ||
+		  rwsem_owner_timestamp_match(owner));
+}
 #else
 static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 {
@@ -416,6 +433,12 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 {
 	return false;
 }
+
+static inline bool
+rwsem_reader_steal_lock(struct rw_semaphore *sem, int count)
+{
+	return false;
+}
 #endif
 
 /*
@@ -432,13 +455,16 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
  * Wait for the read lock to be granted
  */
 static inline struct rw_semaphore __sched *
-__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
+__rwsem_down_read_failed_common(struct rw_semaphore *sem, int count, int state)
 {
 	bool can_spin;
-	int count, adjustment = -RWSEM_READER_BIAS;
+	int adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
+	if (rwsem_reader_steal_lock(sem, count))
+		return sem;
+
 	/*
 	 * Undo read bias from down_read operation to stop active locking if:
 	 * 1) Optimistic spinners are present; or
@@ -513,16 +539,17 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 }
 
 __visible struct rw_semaphore * __sched
-rwsem_down_read_failed(struct rw_semaphore *sem)
+rwsem_down_read_failed(struct rw_semaphore *sem, int count)
 {
-	return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE);
+	return __rwsem_down_read_failed_common(sem, count,
+					       TASK_UNINTERRUPTIBLE);
 }
 EXPORT_SYMBOL(rwsem_down_read_failed);
 
 __visible struct rw_semaphore * __sched
-rwsem_down_read_failed_killable(struct rw_semaphore *sem)
+rwsem_down_read_failed_killable(struct rw_semaphore *sem, int count)
 {
-	return __rwsem_down_read_failed_common(sem, TASK_KILLABLE);
+	return __rwsem_down_read_failed_common(sem, count, TASK_KILLABLE);
 }
 EXPORT_SYMBOL(rwsem_down_read_failed_killable);
 
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index 8cb12ed..bf47d4a 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -4,21 +4,32 @@
 #include <linux/rwsem.h>
 
 /*
- * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
- * the owner field when it unlocks. A reader, on the other hand, will
- * not touch the owner field when it unlocks.
+ * When a reader acquires the lock, the RWSEM_READER_OWNED bit of the owner
+ * field will be set. In addition, a timestamp based on the current jiffies
+ * value will be put into the upper bits of the owner. An incoming reader will
+ * now be allowed to join the read lock iff the current time matches the
+ * timestamp. This will greatly favor the readers which is generally good
+ * for improving throughput. The timestamp check, however, will prevent
+ * a continuous stream of incoming readers from monopolizing the lock
+ * and starving the writers.
+ *
+ * A writer will clear the owner field when it unlocks. A reader, on the
+ * other hand, will not touch the owner field when it unlocks.
  *
  * In essence, the owner field now has the following 3 states:
  *  1) 0
  *     - lock is free or the owner hasn't set the field yet
- *  2) RWSEM_READER_OWNED
+ *  2) (owner & RWSEM_READER_OWNED) == RWSEM_READER_OWNED
  *     - lock is currently or previously owned by readers (lock is free
- *       or not set by owner yet)
+ *       or not set by owner yet). The other bits in the owner field can
+ *       be used for other purposes.
  *  3) Other non-zero value
  *     - a writer owns the lock
  */
-#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
+#define RWSEM_READER_OWNED		(1UL)
+#define RWSEM_READER_TIMESTAMP_SHIFT	8
+#define RWSEM_READER_TIMESTAMP_MASK	\
+	~((1UL << RWSEM_READER_TIMESTAMP_SHIFT) - 1)
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
@@ -43,17 +54,32 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
  */
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
-	WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
+	WRITE_ONCE(sem->owner, (void *)(RWSEM_READER_OWNED |
+		  (jiffies << RWSEM_READER_TIMESTAMP_SHIFT)));
 }
 
 static inline bool rwsem_owner_is_writer(struct task_struct *owner)
 {
-	return owner && owner != RWSEM_READER_OWNED;
+	return owner && !((unsigned long)owner & RWSEM_READER_OWNED);
 }
 
 static inline bool rwsem_owner_is_reader(struct task_struct *owner)
 {
-	return owner == RWSEM_READER_OWNED;
+	return (unsigned long)owner & RWSEM_READER_OWNED;
+}
+
+/*
+ * Return true if the timestamp matches the current time.
+ */
+static inline bool rwsem_owner_timestamp_match(struct task_struct *owner)
+{
+	return ((unsigned long)owner & RWSEM_READER_TIMESTAMP_MASK) ==
+	       (jiffies << RWSEM_READER_TIMESTAMP_SHIFT);
+}
+
+static inline bool rwsem_is_reader_owned(struct rw_semaphore *sem)
+{
+	return rwsem_owner_is_reader(READ_ONCE(sem->owner));
 }
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
@@ -67,6 +93,11 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
 }
+
+static inline bool rwsem_is_reader_owned(struct rw_semaphore *sem)
+{
+	return false;
+}
 #endif
 
 /*
@@ -96,8 +127,11 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 #define RWSEM_COUNT_IS_LOCKED_OR_HANDOFF(c)	\
 	((c) & (RWSEM_LOCK_MASK|RWSEM_FLAG_HANDOFF))
 
-extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
-extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
+extern struct rw_semaphore *
+rwsem_down_read_failed(struct rw_semaphore *sem, int count);
+extern struct rw_semaphore *
+rwsem_down_read_failed_killable(struct rw_semaphore *sem, int count);
+
 extern struct rw_semaphore *rwsem_down_write_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_write_failed_killable(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_wake(struct rw_semaphore *, int count);
@@ -111,7 +145,7 @@ static inline void __down_read(struct rw_semaphore *sem)
 	int count = atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
 
 	if (unlikely(count & RWSEM_READ_FAILED_MASK))
-		rwsem_down_read_failed(sem);
+		rwsem_down_read_failed(sem, count);
 	else if ((count >> RWSEM_READER_SHIFT) == 1)
 		rwsem_set_reader_owned(sem);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 10/11] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:02   ` Waiman Long
  0 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

This patch modifies rwsem_spin_on_owner() to return a tri-state value
that better reflects the state of the lock holder, enabling the caller
to make a better decision about what to do next.
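
A small sketch of how a caller consumes such a tri-state result (the demo_*
names are invented; the actual values are documented in the comment added
below):

/* Sketch of a caller of a tri-state owner-spin helper -- names invented. */
#include <stdbool.h>

enum demo_spin_state {
	DEMO_SPIN_STOP	 = -1,	/* owner not running or timeslice used up */
	DEMO_SPIN_READER =  0,	/* owner changed and/or is now a reader */
	DEMO_SPIN_WRITER =  1,	/* owner changed, no reader seen yet */
};

static bool demo_optimistic_spin(int (*spin_on_owner)(void), bool (*trylock)(void))
{
	/*
	 * As in this patch, spinning continues only while the result is
	 * positive (DEMO_SPIN_WRITER); both DEMO_SPIN_READER and
	 * DEMO_SPIN_STOP end the loop here.  Distinguishing the two is
	 * what a later patch uses to keep spinning, for a bounded count,
	 * on a reader-owned lock.
	 */
	while (spin_on_owner() > 0) {
		if (trylock())
			return true;
	}

	return false;
}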

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 38a6c32..d0f3778 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -301,9 +301,13 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 }
 
 /*
- * Return true only if we can still spin on the owner field of the rwsem.
+ * Return the following three values depending on the lock owner state.
+ *   1	when the owner has changed and no reader is detected yet.
+ *   0	when the owner has changed and/or the owner is a reader.
+ *  -1	when optimistic spinning has to stop because either the owner stops
+ *	running or its timeslice has been used up.
  */
-static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
+static noinline int rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
 	struct task_struct *owner = READ_ONCE(sem->owner);
 
@@ -327,7 +331,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 		if (!owner->on_cpu || need_resched() ||
 				vcpu_is_preempted(task_cpu(owner))) {
 			rcu_read_unlock();
-			return false;
+			return -1;
 		}
 
 		cpu_relax();
@@ -338,7 +342,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 	 * If there is a new owner or the owner is not set, we continue
 	 * spinning.
 	 */
-	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
+	return rwsem_owner_is_reader(READ_ONCE(sem->owner)) ? 0 : 1;
 }
 
 static bool
@@ -359,7 +363,7 @@ static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 	 *  2) readers own the lock as we can't determine if they are
 	 *     actively running or not.
 	 */
-	while (rwsem_spin_on_owner(sem)) {
+	while (rwsem_spin_on_owner(sem) > 0) {
 		/*
 		 * Try to acquire the lock
 		 */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v6 11/11] locking/rwsem: Enable count-based spinning on reader
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:02   ` Waiman Long
  0 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:02 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, x86, linux-alpha, linux-ia64, linux-s390,
	linux-arch, Davidlohr Bueso, Dave Chinner, Waiman Long

When the rwsem is owned by readers, writers stop optimistic spinning
simply because there is no easy way to figure out whether all the
readers are actively running. However, there are scenarios where the
readers are unlikely to sleep and optimistic spinning can help
performance.

This patch provides a simple mechanism for spinning on a reader-owned
rwsem. It is loop-count-threshold based spinning where the count is
reset whenever the rwsem reader count value changes, indicating that
the rwsem is still active. There is another maximum count value that
limits the total number of spins that can happen.

When the loop or max counts reach 0, a bit will be set in the owner
field to indicate that no more optimistic spinning should be done on
this rwsem until it becomes writer owned again.

The spinning threshold and maximum values can be overridden by an
architecture-specific rwsem.h header file, if necessary. The current
default threshold value is 512 iterations.
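
A minimal user-space sketch of the counting scheme (demo_* names are
invented; the thresholds mirror the 512/4096 defaults added in the diff
below): the spin budget is refilled whenever the observed reader count
changes, while a hard cap bounds the total number of spins.

/* User-space model of count-based spinning on a reader-owned lock. */
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_RSPIN_THRESHOLD	(1 << 9)	/* reset when the count changes */
#define DEMO_RSPIN_MAX		(1 << 12)	/* hard cap on total spins */

static bool demo_spin_on_readers(atomic_int *reader_count, bool (*trylock)(void))
{
	int rspin_cnt = DEMO_RSPIN_THRESHOLD;
	int rspin_max = DEMO_RSPIN_MAX;
	int old = atomic_load(reader_count);

	while (rspin_cnt > 0 && rspin_max > 0) {
		int cur;

		if (trylock())
			return true;

		cur = atomic_load(reader_count);
		if (cur != old) {
			/* Readers are still coming and going, so the rwsem
			 * is active: grant another full spin budget. */
			old = cur;
			rspin_cnt = DEMO_RSPIN_THRESHOLD;
		} else {
			rspin_cnt--;
		}
		rspin_max--;
	}

	/* Both budgets exhausted: give up spinning (the real code would also
	 * mark the owner field so later writers skip reader-owned spinning). */
	return false;
}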

On a 2-socket 40-core x86-64 Gold 6148 system, a rwsem microbenchmark
was run with 40 locking threads (one per core), each doing 10s of an
equal number of reader and writer lock/unlock operations alternately
on the same rwsem. The resulting total locking rates on a 4.14 based
kernel were 927 kop/s without the patch and 3,218 kop/s with it, an
increase of about 247%.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem-xadd.c | 97 +++++++++++++++++++++++++++++++++++++++------
 kernel/locking/rwsem-xadd.h | 27 +++++++++++++
 2 files changed, 111 insertions(+), 13 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index d0f3778..62147a9 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -90,6 +90,22 @@ enum rwsem_wake_type {
 #define RWSEM_WAIT_TIMEOUT	((HZ - 1)/100 + 1)
 
 /*
+ * Reader-owned rwsem spinning threshold and maximum value
+ *
+ * These threshold and maximum values can be overridden by architecture
+ * specific values. The loop count will be reset whenever the rwsem count
+ * value changes. The max value constrains the total number of reader-owned
+ * lock spinnings that can happen.
+ */
+#ifdef	ARCH_RWSEM_RSPIN_THRESHOLD
+# define RWSEM_RSPIN_THRESHOLD	ARCH_RWSEM_RSPIN_THRESHOLD
+# define RWSEM_RSPIN_MAX	ARCH_RWSEM_RSPIN_MAX
+#else
+# define RWSEM_RSPIN_THRESHOLD	(1 << 9)
+# define RWSEM_RSPIN_MAX	(1 << 12)
+#endif
+
+/*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then:
  *   - the 'active part' of count (&0x0000ffff) reached 0 (but may have changed)
@@ -230,8 +246,17 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 /*
  * Try to acquire read lock before the reader is put on wait queue.
+ *
+ * To avoid writer starvation, a spinning reader cannot acquire the lock
+ * if the reader count isn't 0 and there is a timestamp mismatch. In this
+ * case, the reader has to stop spinning.
+ *
+ * This will at least allow the reader count to go to 0 and wake up the
+ * first one in the wait queue which can initiate the handoff protocol,
+ * if necessary.
  */
-static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
+static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem,
+						bool *stop_spin)
 {
 	int count = atomic_read(&sem->count);
 
@@ -240,11 +265,21 @@ static inline bool rwsem_try_read_lock_unqueued(struct rw_semaphore *sem)
 
 	count = atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
 	if (!(count & (RWSEM_FLAG_HANDOFF|RWSEM_WRITER_LOCKED))) {
-		if (!(count >> RWSEM_READER_SHIFT))
+		if (!(count >> RWSEM_READER_SHIFT)) {
 			rwsem_set_reader_owned(sem);
+		} else {
+			struct task_struct *owner = READ_ONCE(sem->owner);
+
+			if (rwsem_owner_is_reader(owner) &&
+			   !rwsem_owner_timestamp_match(owner)) {
+				*stop_spin = true;
+				goto backout;
+			}
+		}
 		return true;
 	}
 
+backout:
 	/* Back out the change */
 	atomic_add(-RWSEM_READER_BIAS, &sem->count);
 	return false;
@@ -272,7 +307,8 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
 	}
 }
 
-static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
+static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem,
+					   bool reader)
 {
 	struct task_struct *owner;
 	bool ret = true;
@@ -284,9 +320,9 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	owner = READ_ONCE(sem->owner);
 	if (!rwsem_owner_is_writer(owner)) {
 		/*
-		 * Don't spin if the rwsem is readers owned.
+		 * Don't spin if the rspin disable bit is set for writer.
 		 */
-		ret = !rwsem_owner_is_reader(owner);
+		ret = reader || !rwsem_owner_is_spin_disabled(owner);
 		goto done;
 	}
 
@@ -349,6 +385,11 @@ static noinline int rwsem_spin_on_owner(struct rw_semaphore *sem)
 rwsem_optimistic_spin(struct rw_semaphore *sem, enum rwsem_waiter_type type)
 {
 	bool taken = false;
+	bool stop_spin = false;
+	int owner_state;	/* Lock owner state */
+	int rspin_cnt = RWSEM_RSPIN_THRESHOLD;
+	int rspin_max = RWSEM_RSPIN_MAX;
+	int old_count = 0;
 
 	preempt_disable();
 
@@ -356,25 +397,55 @@ static noinline int rwsem_spin_on_owner(struct rw_semaphore *sem)
 	if (!osq_lock(&sem->osq))
 		goto done;
 
+	if (rwsem_is_spin_disabled(sem))
+		rspin_cnt = 0;
+
 	/*
 	 * Optimistically spin on the owner field and attempt to acquire the
 	 * lock whenever the owner changes. Spinning will be stopped when:
-	 *  1) the owning writer isn't running; or
-	 *  2) readers own the lock as we can't determine if they are
-	 *     actively running or not.
+	 *  1) the owning writer isn't running;
+	 *  2) writer: readers own the lock and spinning count has reached 0;
+	 *  3) reader: timestamp mismatch.
 	 */
-	while (rwsem_spin_on_owner(sem) > 0) {
+	while ((owner_state = rwsem_spin_on_owner(sem)) >= 0) {
 		/*
 		 * Try to acquire the lock
 		 */
 		taken = (type == RWSEM_WAITING_FOR_WRITE)
 		      ? rwsem_try_write_lock_unqueued(sem)
-		      : rwsem_try_read_lock_unqueued(sem);
+		      : rwsem_try_read_lock_unqueued(sem, &stop_spin);
 
-		if (taken)
+		if (taken || stop_spin)
 			break;
 
 		/*
+		 * We only decrement rspin_cnt when the lock is owned
+		 * by readers (owner_state == 0), in which case
+		 * rwsem_spin_on_owner() will essentially be a no-op
+		 * and we will be spinning in this main loop. The spinning
+		 * count will be reset whenever the rwsem count value
+		 * changes.
+		 */
+		if (!owner_state) {
+			int count;
+
+			if (!rspin_cnt || !rspin_max) {
+				if (!rwsem_is_spin_disabled(sem))
+					rwsem_set_spin_disabled(sem);
+				break;
+			}
+
+			count = atomic_read(&sem->count) >> RWSEM_READER_SHIFT;
+			if (count != old_count) {
+				old_count = count;
+				rspin_cnt = RWSEM_RSPIN_THRESHOLD;
+			} else {
+				rspin_cnt--;
+			}
+			rspin_max--;
+		}
+
+		/*
 		 * When there's no owner, we might have preempted between the
 		 * owner acquiring the lock and setting the owner field. If
 		 * we're an RT task that will live-lock because we won't let
@@ -474,7 +545,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	 * 1) Optimistic spinners are present; or
 	 * 2) optimistic spinning is allowed.
 	 */
-	can_spin = rwsem_can_spin_on_owner(sem);
+	can_spin = rwsem_can_spin_on_owner(sem, true);
 	if (can_spin || rwsem_has_spinner(sem)) {
 		atomic_add(-RWSEM_READER_BIAS, &sem->count);
 		adjustment = 0;
@@ -570,7 +641,7 @@ static inline bool rwsem_has_spinner(struct rw_semaphore *sem)
 	DEFINE_WAKE_Q(wake_q);
 
 	/* do optimistic spinning and steal lock if possible */
-	if (rwsem_can_spin_on_owner(sem) &&
+	if (rwsem_can_spin_on_owner(sem, false) &&
 	    rwsem_optimistic_spin(sem, RWSEM_WAITING_FOR_WRITE))
 		return sem;
 
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index bf47d4a..6b60aac 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -27,6 +27,7 @@
  *     - a writer owns the lock
  */
 #define RWSEM_READER_OWNED		(1UL)
+#define RWSEM_SPIN_DISABLED		(2UL)
 #define RWSEM_READER_TIMESTAMP_SHIFT	8
 #define RWSEM_READER_TIMESTAMP_MASK	\
 	~((1UL << RWSEM_READER_TIMESTAMP_SHIFT) - 1)
@@ -81,6 +82,32 @@ static inline bool rwsem_is_reader_owned(struct rw_semaphore *sem)
 {
 	return rwsem_owner_is_reader(READ_ONCE(sem->owner));
 }
+
+static inline bool rwsem_owner_is_spin_disabled(struct task_struct *owner)
+{
+	return (unsigned long)owner & RWSEM_SPIN_DISABLED;
+}
+
+/*
+ * Try to set an optimistic spinning disable bit while it is reader-owned.
+ */
+static inline void rwsem_set_spin_disabled(struct rw_semaphore *sem)
+{
+	struct task_struct *owner = READ_ONCE(sem->owner);
+
+	/*
+	 * Failure in cmpxchg() will be ignored, and the caller is expected
+	 * to retry later.
+	 */
+	if (rwsem_owner_is_reader(owner))
+		cmpxchg(&sem->owner, owner,
+			(void *)((unsigned long)owner|RWSEM_SPIN_DISABLED));
+}
+
+static inline bool rwsem_is_spin_disabled(struct rw_semaphore *sem)
+{
+	return rwsem_owner_is_spin_disabled(READ_ONCE(sem->owner));
+}
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 02/11] locking/rwsem: Implement a new locking scheme
  2017-10-11 18:01   ` Waiman Long
@ 2017-10-11 18:40     ` Peter Zijlstra
  -1 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2017-10-11 18:40 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, linux-kernel, x86, linux-alpha, linux-ia64,
	linux-s390, linux-arch, Davidlohr Bueso, Dave Chinner

On Wed, Oct 11, 2017 at 02:01:53PM -0400, Waiman Long wrote:
> +/*
> + * The definition of the atomic counter in the semaphore:
> + *
> + * Bit  0    - writer locked bit
> + * Bit  1    - waiters present bit
> + * Bits 2-7  - reserved
> + * Bits 8-31 - 24-bit reader count
> + *
> + * atomic_fetch_add() is used to obtain reader lock, whereas atomic_cmpxchg()
> + * will be used to obtain writer lock.
> + */
> +#define RWSEM_WRITER_LOCKED	0X00000001
> +#define RWSEM_FLAG_WAITERS	0X00000002
> +#define RWSEM_READER_BIAS	0x00000100
> +#define RWSEM_READER_SHIFT	8
> +#define RWSEM_READER_MASK	(~((1U << RWSEM_READER_SHIFT) - 1))
> +#define RWSEM_LOCK_MASK 	(RWSEM_WRITER_LOCKED|RWSEM_READER_MASK)
> +#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS)
> +
> +#define RWSEM_COUNT_IS_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
> +
> +/*
> + * lock for reading
> + */
> +static inline void __down_read(struct rw_semaphore *sem)
> +{
> +	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
> +		     & RWSEM_READ_FAILED_MASK))
> +		rwsem_down_read_failed(sem);
> +}

So I implemented rwsem-mutex (also qrwlock based) that puts

  (unsigned long)current | RWSEM_WRITER

in the atomic_long_t rw_semaphore::owner field. The down-side is that
you can't do fetch_add based __down_read, because that would clobber the
pointer. The up-side is that we have a stable owner pointer (which is
what I needed for PI like things).

I've yet to do performance tests -- and I've not done a bunch of the
obvious optimisations either.
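
For illustration, a rough sketch of the write-lock fast path for such a
combined owner/state word might look like the following (the helper
name is made up here, assuming rw_semaphore::owner is an atomic_long_t
as described above):

  #define RWSEM_WRITER	0x1UL

  static inline bool rwsem_mutex_try_write_lock(struct rw_semaphore *sem)
  {
	/*
	 * Lock state and writer identity live in one word, so readers
	 * cannot simply fetch_add a bias into it without corrupting
	 * the task pointer; they need a cmpxchg-style update instead.
	 */
	return atomic_long_cmpxchg(&sem->owner, 0L,
			(unsigned long)current | RWSEM_WRITER) == 0L;
  }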

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 18:48   ` Peter Zijlstra
  -1 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2017-10-11 18:48 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, linux-kernel, x86, linux-alpha, linux-ia64,
	linux-s390, linux-arch, Davidlohr Bueso, Dave Chinner

On Wed, Oct 11, 2017 at 02:01:51PM -0400, Waiman Long wrote:
>   # of Patches		   Reader 		  Writer
>     Applied		Locking Rate		Locking Rate
>   ------------		------------		------------
> 	0	5,155/    5,155/    5,155    5,154/248,852/346,281
> 	7	5,696/    5,697/    5,698  113,500/215,826/320,872
> 	8	4,827/    5,047/    5,215    4,826/176,797/284,069
> 	9     211,276/  509,712/1,134,007    4,894/221,839/246,818
>        11     884,513/1,043,989/1,252,533    9,604/ 11,105/ 25,225
> 
> It can be seen that rwsem changes from writer-preferring to
> reader-preferring.

A bit radically so, you almost starve the writers there.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features
  2017-10-11 18:48   ` Peter Zijlstra
@ 2017-10-11 18:50     ` Waiman Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, x86, linux-alpha, linux-ia64,
	linux-s390, linux-arch, Davidlohr Bueso, Dave Chinner

On 10/11/2017 02:48 PM, Peter Zijlstra wrote:
> On Wed, Oct 11, 2017 at 02:01:51PM -0400, Waiman Long wrote:
>>   # of Patches		   Reader 		  Writer
>>     Applied		Locking Rate		Locking Rate
>>   ------------		------------		------------
>> 	0	5,155/    5,155/    5,155    5,154/248,852/346,281
>> 	7	5,696/    5,697/    5,698  113,500/215,826/320,872
>> 	8	4,827/    5,047/    5,215    4,826/176,797/284,069
>> 	9     211,276/  509,712/1,134,007    4,894/221,839/246,818
>>        11     884,513/1,043,989/1,252,533    9,604/ 11,105/ 25,225
>>
>> It can be seen that rwsem changes from writer-preferring to
>> reader-preferring.
> A bit radically so, you almost starve the writers there.

Yes, almost, but the lock handoff code will make sure that the writers
won't actually get starved. That is why I added aggressive reader lock
stealing after the lock handoff code.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 02/11] locking/rwsem: Implement a new locking scheme
  2017-10-11 18:40     ` Peter Zijlstra
@ 2017-10-11 18:58       ` Waiman Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 18:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, x86, linux-alpha, linux-ia64,
	linux-s390, linux-arch, Davidlohr Bueso, Dave Chinner

On 10/11/2017 02:40 PM, Peter Zijlstra wrote:
> On Wed, Oct 11, 2017 at 02:01:53PM -0400, Waiman Long wrote:
>> +/*
>> + * The definition of the atomic counter in the semaphore:
>> + *
>> + * Bit  0    - writer locked bit
>> + * Bit  1    - waiters present bit
>> + * Bits 2-7  - reserved
>> + * Bits 8-31 - 24-bit reader count
>> + *
>> + * atomic_fetch_add() is used to obtain reader lock, whereas atomic_cmpxchg()
>> + * will be used to obtain writer lock.
>> + */
>> +#define RWSEM_WRITER_LOCKED	0X00000001
>> +#define RWSEM_FLAG_WAITERS	0X00000002
>> +#define RWSEM_READER_BIAS	0x00000100
>> +#define RWSEM_READER_SHIFT	8
>> +#define RWSEM_READER_MASK	(~((1U << RWSEM_READER_SHIFT) - 1))
>> +#define RWSEM_LOCK_MASK 	(RWSEM_WRITER_LOCKED|RWSEM_READER_MASK)
>> +#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS)
>> +
>> +#define RWSEM_COUNT_IS_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
>> +
>> +/*
>> + * lock for reading
>> + */
>> +static inline void __down_read(struct rw_semaphore *sem)
>> +{
>> +	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
>> +		     & RWSEM_READ_FAILED_MASK))
>> +		rwsem_down_read_failed(sem);
>> +}
> So I implemented rwsem-mutex (also qrwlock based) that puts
>
>   (unsigned long)current | RWSEM_WRITER
>
> in the atomic_long_t rw_semaphore::owner field. The down-side is that
> you can't do fetch_add based __down_read, because that would clobber the
> pointer. The up-side is that we have a stable owner pointer (which is
> what I needed for PI like things).

Without fetch_add for readers, this could lead to reduced performance
for reader-heavy workloads.

Are you trying to do a PI version of rwsem? It can work when the lock is
writer-owned, but not when it is reader-owned.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 02/11] locking/rwsem: Implement a new locking scheme
  2017-10-11 18:58       ` Waiman Long
@ 2017-10-11 19:05         ` Waiman Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 19:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, x86, linux-alpha, linux-ia64,
	linux-s390, linux-arch, Davidlohr Bueso, Dave Chinner

On 10/11/2017 02:58 PM, Waiman Long wrote:
> On 10/11/2017 02:40 PM, Peter Zijlstra wrote:
>> On Wed, Oct 11, 2017 at 02:01:53PM -0400, Waiman Long wrote:
>>> +/*
>>> + * The definition of the atomic counter in the semaphore:
>>> + *
>>> + * Bit  0    - writer locked bit
>>> + * Bit  1    - waiters present bit
>>> + * Bits 2-7  - reserved
>>> + * Bits 8-31 - 24-bit reader count
>>> + *
>>> + * atomic_fetch_add() is used to obtain reader lock, whereas atomic_cmpxchg()
>>> + * will be used to obtain writer lock.
>>> + */
>>> +#define RWSEM_WRITER_LOCKED	0X00000001
>>> +#define RWSEM_FLAG_WAITERS	0X00000002
>>> +#define RWSEM_READER_BIAS	0x00000100
>>> +#define RWSEM_READER_SHIFT	8
>>> +#define RWSEM_READER_MASK	(~((1U << RWSEM_READER_SHIFT) - 1))
>>> +#define RWSEM_LOCK_MASK 	(RWSEM_WRITER_LOCKED|RWSEM_READER_MASK)
>>> +#define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_LOCKED|RWSEM_FLAG_WAITERS)
>>> +
>>> +#define RWSEM_COUNT_IS_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
>>> +
>>> +/*
>>> + * lock for reading
>>> + */
>>> +static inline void __down_read(struct rw_semaphore *sem)
>>> +{
>>> +	if (unlikely(atomic_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count)
>>> +		     & RWSEM_READ_FAILED_MASK))
>>> +		rwsem_down_read_failed(sem);
>>> +}
>> So I implemented rwsem-mutex (also qrwlock based) that puts
>>
>>   (unsigned long)current | RWSEM_WRITER
>>
>> in the atomic_long_t rw_semaphore::owner field. The down-side is that
>> you can't do fetch_add based __down_read, because that would clobber the
>> pointer. The up-side is that we have a stable owner pointer (which is
>> what I needed for PI like things).
> Without fetch_add for readers, it could lead to reduced performance for
> reader heavy workloads.
>
> Are you trying to do a PI version of rwsem? It can work when the lock is
> writer owned, but not when it is reader owned.

I have actually been thinking about giving priority to RT or DL tasks by
putting the task at the front of the wait queue and asserting the lock
handoff bit, for example. There are extra reserved bits left that can be
useful for adding these additional features. That will come later, when
I am done with the current patchset.
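
Purely as a hypothetical illustration of that idea (none of this is in
the current series, and the helper name is made up), the waiter enqueue
could look something like:

  static void rwsem_queue_waiter(struct rw_semaphore *sem,
				 struct rwsem_waiter *waiter)
  {
	if (rt_task(current) || dl_task(current)) {
		/* Queue RT/DL tasks at the head and force a handoff */
		list_add(&waiter->list, &sem->wait_list);
		atomic_or(RWSEM_FLAG_HANDOFF, &sem->count);
	} else {
		list_add_tail(&waiter->list, &sem->wait_list);
	}
  }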

Cheers,
Longman

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 02/11] locking/rwsem: Implement a new locking scheme
  2017-10-11 18:58       ` Waiman Long
@ 2017-10-11 19:36         ` Peter Zijlstra
  -1 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2017-10-11 19:36 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, linux-kernel, x86, linux-alpha, linux-ia64,
	linux-s390, linux-arch, Davidlohr Bueso, Dave Chinner

On Wed, Oct 11, 2017 at 02:58:02PM -0400, Waiman Long wrote:
> On 10/11/2017 02:40 PM, Peter Zijlstra wrote:
> > So I implemented rwsem-mutex (also qrwlock based) that puts
> >
> >   (unsigned long)current | RWSEM_WRITER
> >
> > in the atomic_long_t rw_semaphore::owner field. The down-side is that
> > you can't do fetch_add based __down_read, because that would clobber the
> > pointer. The up-side is that we have a stable owner pointer (which is
> > what I needed for PI like things).
> 
> Without fetch_add for readers, it could lead to reduced performance for
> reader heavy workloads.

Yeah I know.. :-)

> Are you trying to do a PI version of rwsem? It can work when the lock is
> writer owned, but not when it is reader owned.

Not classical PI; there's one of those in -rt btw:

  https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/kernel/locking/rwsem-rt.c?h=linux-4.11.y-rt

I'm implementing proxy-execution (or rather, playing with it in a few
spare moments here and there). But yes, it will only be able to boost
write owners. But in order to make that happen I need the lock state and
owner thing in the same field, like mutex.


In any case, I'll try and have a look at these patches.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features
  2017-10-11 18:48   ` Peter Zijlstra
@ 2017-10-11 20:45     ` Dave Chinner
  -1 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2017-10-11 20:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ingo Molnar, linux-kernel, x86, linux-alpha,
	linux-ia64, linux-s390, linux-arch, Davidlohr Bueso

On Wed, Oct 11, 2017 at 08:48:40PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 11, 2017 at 02:01:51PM -0400, Waiman Long wrote:
> >   # of Patches		   Reader 		  Writer
> >     Applied		Locking Rate		Locking Rate
> >   ------------		------------		------------
> > 	0	5,155/    5,155/    5,155    5,154/248,852/346,281
> > 	7	5,696/    5,697/    5,698  113,500/215,826/320,872
> > 	8	4,827/    5,047/    5,215    4,826/176,797/284,069
> > 	9     211,276/  509,712/1,134,007    4,894/221,839/246,818
> >        11     884,513/1,043,989/1,252,533    9,604/ 11,105/ 25,225
> > 
> > It can be seen that rwsem changes from writer-preferring to
> > reader-preferring.
> 
> A bit radically so, you almost starve the writers there.

Which is a bit of a problem for us, because we often use the write
locks as an IO barrier for operations like truncate, fallocate, etc.
i.e. we want it to immediately block readers.

That's going to be a bit of a problem if, for example, we have so
many AIO-based direct IO writers on a file we can't get fallocate to
run in a timely fashion to preallocate the space the writers are
soon going to write into.

Not to mention the AIO-DIO append case where we have multiple
concurrent writers at EOF, and so every so often one of the many IOs
needs to take the write lock extending EOF safely. Blocking that for
10ms waiting for a hand-off is going to make all the people who care
about deterministic IO latency go nuts....

So from my perspective on the IO side, I'd much prefer a write bias.
Indeed, if we go back to the Irix XFS code, all these locks were
defined as "MR_BARRIER" locks, which meant the XFS rwsems were
specifically intended to have writer bias.

I think we can live with a fair r/w bias, but swinging from a
50:1 write bias to a 100:1 read bias is going to change behaviour
dramatically, and in many cases it won't be an improvement...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features
  2017-10-11 18:01 ` Waiman Long
@ 2017-10-11 20:50   ` Dave Chinner
  -1 siblings, 0 replies; 42+ messages in thread
From: Dave Chinner @ 2017-10-11 20:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, x86, linux-alpha,
	linux-ia64, linux-s390, linux-arch, Davidlohr Bueso

On Wed, Oct 11, 2017 at 02:01:51PM -0400, Waiman Long wrote:
> In term of rwsem performance, a rwsem microbenchmark and fio randrw
> test with a xfs filesystem on a ramdisk were used to verify the
> performance changes due to these patches. Both tests were run on a
> 2-socket, 40-core Gold 6148 system. The rwsem microbenchmark (1:1
> reader/writer ratio) has short critical section while the fio randrw
> test has long critical section (4k read/write).
> 
> The following table shows the performance of the rwsem microbenchmark
> and fio radrw test with different number of patches applied on 4.14
> based kernels:
> 
>   # of Patches	Locking Rate	FIO Bandwidth	FIO Bandwidth
>     Applied	 40 threads	 32 threads	 16 threads
>   ------------	------------	-------------	-------------
> 	0	  38.7 kop/s	  706 MB/s	  704 MB/s
> 	7	  38.6 kop/s	  668 MB/s	  663 MB/s
> 	8	  38.9 kop/s	  704 MB/s	  701 MB/s
> 	9	  39.1 kop/s	  702 MB/s	  707 MB/s
>        11	3218.0 kop/s	 2594 MB/s	 2614 MB/s
> 
> So this patchset improves mixed read/write rwsem microbench by 83X
> and randrw fio bandwidth by about 3.7X.

Overall improvement in bandwidth is not necessarily a good thing -
this could simply demonstrate total write bandwidth starvation and
so it's only reporting read bandwidth. It's much more important to
look at the change in read bandwidth vs write bandwidth in the fio
test. i.e. exactly how did the IO balance change as a result of
changing the locking bias?
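
As a reference point, a fio job along the lines below reports read and
write bandwidth separately, which is the breakdown being asked for here
(the parameters and the mount point are purely illustrative, not the
job that was actually run):

  [global]
  ioengine=libaio
  direct=1
  bs=4k
  rw=randrw
  rwmixread=50
  time_based
  runtime=30
  group_reporting

  [xfs-randrw]
  directory=/mnt/xfs-ramdisk
  size=1g
  numjobs=16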

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features
  2017-10-11 20:50   ` Dave Chinner
@ 2017-10-11 20:57     ` Waiman Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Waiman Long @ 2017-10-11 20:57 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, x86, linux-alpha,
	linux-ia64, linux-s390, linux-arch, Davidlohr Bueso

On 10/11/2017 04:50 PM, Dave Chinner wrote:
> On Wed, Oct 11, 2017 at 02:01:51PM -0400, Waiman Long wrote:
>> In term of rwsem performance, a rwsem microbenchmark and fio randrw
>> test with a xfs filesystem on a ramdisk were used to verify the
>> performance changes due to these patches. Both tests were run on a
>> 2-socket, 40-core Gold 6148 system. The rwsem microbenchmark (1:1
>> reader/writer ratio) has short critical section while the fio randrw
>> test has long critical section (4k read/write).
>>
>> The following table shows the performance of the rwsem microbenchmark
>> and fio radrw test with different number of patches applied on 4.14
>> based kernels:
>>
>>   # of Patches	Locking Rate	FIO Bandwidth	FIO Bandwidth
>>     Applied	 40 threads	 32 threads	 16 threads
>>   ------------	------------	-------------	-------------
>> 	0	  38.7 kop/s	  706 MB/s	  704 MB/s
>> 	7	  38.6 kop/s	  668 MB/s	  663 MB/s
>> 	8	  38.9 kop/s	  704 MB/s	  701 MB/s
>> 	9	  39.1 kop/s	  702 MB/s	  707 MB/s
>>        11	3218.0 kop/s	 2594 MB/s	 2614 MB/s
>>
>> So this patchset improves mixed read/write rwsem microbench by 83X
>> and randrw fio bandwidth by about 3.7X.
> Overall improvement in bandwidth is not necessarily a good thing -
> this could simply demonstrate total write bandwidth starvation and
> so it's only reporting read bandwidth. It's much more important to
> look at the change in read bandwidth vs write bandwidth in the fio
> test. i.e. exactly how did the IO balance change as a result of
> changing the locking bias?

Thanks for the input. I can take out the reader lock stealing part. That
will give it a fairer reader/writer bias. It can also be made an option
that can be set when the rwsem is initialized.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2017-10-11 20:57 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-11 18:01 [PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
2017-10-11 18:01 ` [PATCH v6 01/11] locking/rwsem: relocate rwsem_down_read_failed() Waiman Long
2017-10-11 18:01 ` [PATCH v6 02/11] locking/rwsem: Implement a new locking scheme Waiman Long
2017-10-11 18:40   ` Peter Zijlstra
2017-10-11 18:58     ` Waiman Long
2017-10-11 19:05       ` Waiman Long
2017-10-11 19:36       ` Peter Zijlstra
2017-10-11 18:01 ` [PATCH v6 03/11] locking/rwsem: Move owner setting code from rwsem.c to rwsem-xadd.h Waiman Long
2017-10-11 18:01 ` [PATCH v6 04/11] locking/rwsem: Remove kernel/locking/rwsem.h Waiman Long
2017-10-11 18:01 ` [PATCH v6 05/11] locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h Waiman Long
2017-10-11 18:01 ` [PATCH v6 06/11] locking/rwsem: Remove arch specific rwsem files Waiman Long
2017-10-11 18:01 ` [PATCH v6 07/11] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
2017-10-11 18:01 ` [PATCH v6 08/11] locking/rwsem: Enable readers spinning on writer Waiman Long
2017-10-11 18:02 ` [PATCH v6 09/11] locking/rwsem: Enable time-based reader lock stealing Waiman Long
2017-10-11 18:02 ` [PATCH v6 10/11] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value Waiman Long
2017-10-11 18:02 ` [PATCH v6 11/11] locking/rwsem: Enable count-based spinning on reader Waiman Long
2017-10-11 18:48 ` [PATCH v6 00/11] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Peter Zijlstra
2017-10-11 18:50   ` Waiman Long
2017-10-11 20:45   ` Dave Chinner
2017-10-11 20:50 ` Dave Chinner
2017-10-11 20:57   ` Waiman Long
