* [PATCH 0/6] Switch arm64 over to qrwlock
@ 2017-10-05 12:54 ` Will Deacon
  0 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2017-10-05 12:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, Jeremy.Linton, peterz, mingo, longman,
	boqun.feng, paulmck, Will Deacon

Hi all,

This patch series reworks bits of the qrwlock code so that it can be used
to replace the asm rwlocks currently implemented for arm64. The structure
of the series is:

  Patches 1-3	: Work WFE into qrwlock using atomic_cond_read_acquire so
		  we can avoid busy-waiting.

  Patch 4	: Enable qrwlocks for arm64

  Patches 5-6	: Ensure writer slowpath fairness. This has a potential
		  performance impact on the writer unlock path, so I've
		  kept them at the end.

The patches apply on top of my other locking cleanups:

  http://lkml.kernel.org/r/1507055129-12300-1-git-send-email-will.deacon@arm.com

although the conflict with mainline is trivial to resolve without those.
The full stack is also pushed here:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git qrwlock

All comments (particularly related to testing and performance) welcome!

Cheers,

Will

--->8

Will Deacon (6):
  kernel/locking: Use struct qrwlock instead of struct __qrwlock
  locking/atomic: Add atomic_cond_read_acquire
  kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock
  arm64: locking: Move rwlock implementation over to qrwlocks
  kernel/locking: Prevent slowpath writers getting held up by fastpath
  kernel/locking: Remove unused union members from struct qrwlock

 arch/arm64/Kconfig                      |  17 ++++
 arch/arm64/include/asm/Kbuild           |   1 +
 arch/arm64/include/asm/spinlock.h       | 164 +-------------------------------
 arch/arm64/include/asm/spinlock_types.h |   6 +-
 include/asm-generic/atomic-long.h       |   3 +
 include/asm-generic/qrwlock.h           |  14 +--
 include/asm-generic/qrwlock_types.h     |   2 +-
 include/linux/atomic.h                  |   4 +
 kernel/locking/qrwlock.c                |  83 +++-------------
 9 files changed, 43 insertions(+), 251 deletions(-)

-- 
2.1.4

* [PATCH 1/6] kernel/locking: Use struct qrwlock instead of struct __qrwlock
  2017-10-05 12:54 ` Will Deacon
@ 2017-10-05 12:54   ` Will Deacon
  -1 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2017-10-05 12:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, Jeremy.Linton, peterz, mingo, longman,
	boqun.feng, paulmck, Will Deacon

There's no good reason to keep the internal structure of struct qrwlock
hidden from qrwlock.h, particularly as it's actually needed for unlock
and ends up being abstracted independently behind the __qrwlock_write_byte
function.

Stop pretending we can hide this stuff, and move the __qrwlock definition
into struct qrwlock, removing the __qrwlock_write_byte nastiness and using the
same struct definition everywhere instead.
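
[ Editor's note, not part of the patch: as a quick sanity check of the
  layout claim above, a hypothetical helper like the one below would
  confirm that &lock->wmode names the same byte the old
  __qrwlock_write_byte() arithmetic produced on either endianness.
  qrwlock_layout_check() is an illustrative name only. ]

static inline void qrwlock_layout_check(struct qrwlock *lock)
{
	/* Old calculation: offset 0 on little-endian, 3 on big-endian */
	u8 *old = (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);

	/* ->cnts is the first member, so the union starts at offset 0 */
	BUILD_BUG_ON(sizeof(lock->cnts) != 4);
	WARN_ON(old != &lock->wmode);
}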

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/qrwlock.h       | 12 +-----------
 include/asm-generic/qrwlock_types.h | 15 +++++++++++++--
 kernel/locking/qrwlock.c            | 26 ++------------------------
 3 files changed, 16 insertions(+), 37 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 50925327b0a8..02c0a768e6b0 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -129,22 +129,12 @@ static inline void queued_read_unlock(struct qrwlock *lock)
 }
 
 /**
- * __qrwlock_write_byte - retrieve the write byte address of a queue rwlock
- * @lock : Pointer to queue rwlock structure
- * Return: the write byte address of a queue rwlock
- */
-static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
-{
-	return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
-}
-
-/**
  * queued_write_unlock - release write lock of a queue rwlock
  * @lock : Pointer to queue rwlock structure
  */
 static inline void queued_write_unlock(struct qrwlock *lock)
 {
-	smp_store_release(__qrwlock_write_byte(lock), 0);
+	smp_store_release(&lock->wmode, 0);
 }
 
 /*
diff --git a/include/asm-generic/qrwlock_types.h b/include/asm-generic/qrwlock_types.h
index 0abc6b6062fb..507f2dc51bba 100644
--- a/include/asm-generic/qrwlock_types.h
+++ b/include/asm-generic/qrwlock_types.h
@@ -9,12 +9,23 @@
  */
 
 typedef struct qrwlock {
-	atomic_t		cnts;
+	union {
+		atomic_t cnts;
+		struct {
+#ifdef __LITTLE_ENDIAN
+			u8 wmode;	/* Writer mode   */
+			u8 rcnts[3];	/* Reader counts */
+#else
+			u8 rcnts[3];	/* Reader counts */
+			u8 wmode;	/* Writer mode   */
+#endif
+		};
+	};
 	arch_spinlock_t		wait_lock;
 } arch_rwlock_t;
 
 #define	__ARCH_RW_LOCK_UNLOCKED {		\
-	.cnts = ATOMIC_INIT(0),			\
+	{ .cnts = ATOMIC_INIT(0), },		\
 	.wait_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
 }
 
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 2655f26ec882..1af791e37348 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -23,26 +23,6 @@
 #include <linux/spinlock.h>
 #include <asm/qrwlock.h>
 
-/*
- * This internal data structure is used for optimizing access to some of
- * the subfields within the atomic_t cnts.
- */
-struct __qrwlock {
-	union {
-		atomic_t cnts;
-		struct {
-#ifdef __LITTLE_ENDIAN
-			u8 wmode;	/* Writer mode   */
-			u8 rcnts[3];	/* Reader counts */
-#else
-			u8 rcnts[3];	/* Reader counts */
-			u8 wmode;	/* Writer mode   */
-#endif
-		};
-	};
-	arch_spinlock_t	lock;
-};
-
 /**
  * rspin_until_writer_unlock - inc reader count & spin until writer is gone
  * @lock  : Pointer to queue rwlock structure
@@ -125,10 +105,8 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 	 * or wait for a previous writer to go away.
 	 */
 	for (;;) {
-		struct __qrwlock *l = (struct __qrwlock *)lock;
-
-		if (!READ_ONCE(l->wmode) &&
-		   (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0))
+		if (!READ_ONCE(lock->wmode) &&
+		   (cmpxchg_relaxed(&lock->wmode, 0, _QW_WAITING) == 0))
 			break;
 
 		cpu_relax();
-- 
2.1.4

* [PATCH 2/6] locking/atomic: Add atomic_cond_read_acquire
  2017-10-05 12:54 ` Will Deacon
@ 2017-10-05 12:54   ` Will Deacon
  -1 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2017-10-05 12:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, Jeremy.Linton, peterz, mingo, longman,
	boqun.feng, paulmck, Will Deacon

smp_cond_load_acquire provides a way to spin on a variable with acquire
semantics until some conditional expression involving the variable is
satisfied. Architectures such as arm64 can potentially enter a low-power
state, waking up only when the value of the variable changes, which
reduces the system impact of tight polling loops.

This patch makes the same interface available to users of atomic_t,
atomic64_t and atomic_long_t, rather than requiring messy accesses to the
structure internals.
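
[ Editor's note, not part of the patch: atomic_cond_read_acquire() just
  forwards to smp_cond_load_acquire() on the embedded counter, so the
  VAL identifier used in the later patches is the value most recently
  loaded by the macro. The generic fallback is roughly the sketch below;
  architectures such as arm64 override it to wait in a low-power state
  instead of calling cpu_relax(). ]

/* Simplified sketch of the generic smp_cond_load_acquire() fallback */
#define smp_cond_load_acquire(ptr, cond_expr) ({		\
	typeof(ptr) __PTR = (ptr);				\
	typeof(*ptr) VAL;					\
	for (;;) {						\
		VAL = READ_ONCE(*__PTR);			\
		if (cond_expr)					\
			break;					\
		cpu_relax();					\
	}							\
	smp_acquire__after_ctrl_dep();				\
	VAL;							\
})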

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/atomic-long.h | 3 +++
 include/linux/atomic.h            | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/include/asm-generic/atomic-long.h b/include/asm-generic/atomic-long.h
index 288cc9e96395..f2d97b782031 100644
--- a/include/asm-generic/atomic-long.h
+++ b/include/asm-generic/atomic-long.h
@@ -243,4 +243,7 @@ static inline long atomic_long_add_unless(atomic_long_t *l, long a, long u)
 #define atomic_long_inc_not_zero(l) \
 	ATOMIC_LONG_PFX(_inc_not_zero)((ATOMIC_LONG_PFX(_t) *)(l))
 
+#define atomic_long_cond_read_acquire(v, c) \
+	ATOMIC_LONG_PFX(_cond_read_acquire)((ATOMIC_LONG_PFX(_t) *)(v), (c))
+
 #endif  /*  _ASM_GENERIC_ATOMIC_LONG_H  */
diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 40d6bfec0e0d..0aeb2b3f4578 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -653,6 +653,8 @@ static inline int atomic_dec_if_positive(atomic_t *v)
 }
 #endif
 
+#define atomic_cond_read_acquire(v, c)	smp_cond_load_acquire(&(v)->counter, (c))
+
 #ifdef CONFIG_GENERIC_ATOMIC64
 #include <asm-generic/atomic64.h>
 #endif
@@ -1072,6 +1074,8 @@ static inline long long atomic64_fetch_andnot_release(long long i, atomic64_t *v
 }
 #endif
 
+#define atomic64_cond_read_acquire(v, c)	smp_cond_load_acquire(&(v)->counter, (c))
+
 #include <asm-generic/atomic-long.h>
 
 #endif /* _LINUX_ATOMIC_H */
-- 
2.1.4

* [PATCH 3/6] kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock
  2017-10-05 12:54 ` Will Deacon
@ 2017-10-05 12:54   ` Will Deacon
  -1 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2017-10-05 12:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, Jeremy.Linton, peterz, mingo, longman,
	boqun.feng, paulmck, Will Deacon

The qrwlock slowpaths involve spinning when either a prospective reader
is waiting for a concurrent writer to drain, or a prospective writer is
waiting for concurrent readers to drain. In both of these situations,
atomic_cond_read_acquire can be used to avoid busy-waiting and make use
of any backoff functionality provided by the architecture.

This patch replaces the open-coded loops and the rspin_until_writer_unlock
implementation with atomic_cond_read_acquire. The write-mode transition from
zero to _QW_WAITING is left alone, since (a) it doesn't need acquire
semantics and (b) it should be fast.
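
[ Editor's note, not part of the patch: in the reader slowpath hunk below,
  the acquire barrier moves from the increment to the wait itself. The
  before/after forms are intended to be equivalent for ordering purposes: ]

	/* Before: the acquire came from the fetch_add */
	cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
	rspin_until_writer_unlock(lock, cnts);

	/* After: unordered increment, acquire supplied by the wait */
	atomic_add(_QR_BIAS, &lock->cnts);
	atomic_cond_read_acquire(&lock->cnts, (VAL & _QW_WMASK) != _QW_LOCKED);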

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 kernel/locking/qrwlock.c | 47 +++++++++++------------------------------------
 1 file changed, 11 insertions(+), 36 deletions(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 1af791e37348..b7ea4647c74d 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -24,23 +24,6 @@
 #include <asm/qrwlock.h>
 
 /**
- * rspin_until_writer_unlock - inc reader count & spin until writer is gone
- * @lock  : Pointer to queue rwlock structure
- * @writer: Current queue rwlock writer status byte
- *
- * In interrupt context or at the head of the queue, the reader will just
- * increment the reader count & wait until the writer releases the lock.
- */
-static __always_inline void
-rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
-{
-	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
-		cpu_relax();
-		cnts = atomic_read_acquire(&lock->cnts);
-	}
-}
-
-/**
  * queued_read_lock_slowpath - acquire read lock of a queue rwlock
  * @lock: Pointer to queue rwlock structure
  * @cnts: Current qrwlock lock value
@@ -53,13 +36,12 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
 	if (unlikely(in_interrupt())) {
 		/*
 		 * Readers in interrupt context will get the lock immediately
-		 * if the writer is just waiting (not holding the lock yet).
-		 * The rspin_until_writer_unlock() function returns immediately
-		 * in this case. Otherwise, they will spin (with ACQUIRE
-		 * semantics) until the lock is available without waiting in
-		 * the queue.
+		 * if the writer is just waiting (not holding the lock yet),
+		 * so spin with ACQUIRE semantics until the lock is available
+		 * without waiting in the queue.
 		 */
-		rspin_until_writer_unlock(lock, cnts);
+		atomic_cond_read_acquire(&lock->cnts, (VAL & _QW_WMASK)
+					 != _QW_LOCKED);
 		return;
 	}
 	atomic_sub(_QR_BIAS, &lock->cnts);
@@ -68,14 +50,14 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
 	 * Put the reader into the wait queue
 	 */
 	arch_spin_lock(&lock->wait_lock);
+	atomic_add(_QR_BIAS, &lock->cnts);
 
 	/*
 	 * The ACQUIRE semantics of the following spinning code ensure
 	 * that accesses can't leak upwards out of our subsequent critical
 	 * section in the case that the lock is currently held for write.
 	 */
-	cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);
-	rspin_until_writer_unlock(lock, cnts);
+	atomic_cond_read_acquire(&lock->cnts, (VAL & _QW_WMASK) != _QW_LOCKED);
 
 	/*
 	 * Signal the next one in queue to become queue head
@@ -90,8 +72,6 @@ EXPORT_SYMBOL(queued_read_lock_slowpath);
  */
 void queued_write_lock_slowpath(struct qrwlock *lock)
 {
-	u32 cnts;
-
 	/* Put the writer into the wait queue */
 	arch_spin_lock(&lock->wait_lock);
 
@@ -113,15 +93,10 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 	}
 
 	/* When no more readers, set the locked flag */
-	for (;;) {
-		cnts = atomic_read(&lock->cnts);
-		if ((cnts == _QW_WAITING) &&
-		    (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
-					    _QW_LOCKED) == _QW_WAITING))
-			break;
-
-		cpu_relax();
-	}
+	do {
+		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
+	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
+					_QW_LOCKED) != _QW_WAITING);
 unlock:
 	arch_spin_unlock(&lock->wait_lock);
 }
-- 
2.1.4

* [PATCH 4/6] arm64: locking: Move rwlock implementation over to qrwlocks
  2017-10-05 12:54 ` Will Deacon
@ 2017-10-05 12:54   ` Will Deacon
  -1 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2017-10-05 12:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, Jeremy.Linton, peterz, mingo, longman,
	boqun.feng, paulmck, Will Deacon

Now that the qrwlock can make use of WFE, remove our homebrew rwlock
code in favour of the generic queued implementation.
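
[ Editor's note, not part of the patch: the WFE behaviour comes from the
  arm64 override of smp_cond_load_acquire(), which parks the CPU in WFE
  between checks rather than spinning; its approximate shape is sketched
  below. ]

#define smp_cond_load_acquire(ptr, cond_expr)			\
({								\
	typeof(ptr) __PTR = (ptr);				\
	typeof(*ptr) VAL;					\
	for (;;) {						\
		VAL = smp_load_acquire(__PTR);			\
		if (cond_expr)					\
			break;					\
		__cmpwait_relaxed(__PTR, VAL);	/* LDXR + WFE */\
	}							\
	VAL;							\
})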

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Kconfig                      |  17 ++++
 arch/arm64/include/asm/Kbuild           |   1 +
 arch/arm64/include/asm/spinlock.h       | 164 +-------------------------------
 arch/arm64/include/asm/spinlock_types.h |   6 +-
 4 files changed, 20 insertions(+), 168 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..6d32c9b0d4bb 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -22,7 +22,24 @@ config ARM64
 	select ARCH_HAS_STRICT_MODULE_RWX
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAVE_NMI_SAFE_CMPXCHG if ACPI_APEI_SEA
+	select ARCH_INLINE_READ_LOCK if !PREEMPT
+	select ARCH_INLINE_READ_LOCK_BH if !PREEMPT
+	select ARCH_INLINE_READ_LOCK_IRQ if !PREEMPT
+	select ARCH_INLINE_READ_LOCK_IRQSAVE if !PREEMPT
+	select ARCH_INLINE_READ_UNLOCK if !PREEMPT
+	select ARCH_INLINE_READ_UNLOCK_BH if !PREEMPT
+	select ARCH_INLINE_READ_UNLOCK_IRQ if !PREEMPT
+	select ARCH_INLINE_READ_UNLOCK_IRQSAVE if !PREEMPT
+	select ARCH_INLINE_WRITE_LOCK if !PREEMPT
+	select ARCH_INLINE_WRITE_LOCK_BH if !PREEMPT
+	select ARCH_INLINE_WRITE_LOCK_IRQ if !PREEMPT
+	select ARCH_INLINE_WRITE_LOCK_IRQSAVE if !PREEMPT
+	select ARCH_INLINE_WRITE_UNLOCK if !PREEMPT
+	select ARCH_INLINE_WRITE_UNLOCK_BH if !PREEMPT
+	select ARCH_INLINE_WRITE_UNLOCK_IRQ if !PREEMPT
+	select ARCH_INLINE_WRITE_UNLOCK_IRQSAVE if !PREEMPT
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_USE_QUEUED_RWLOCKS
 	select ARCH_SUPPORTS_MEMORY_FAILURE
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_NUMA_BALANCING
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 2326e39d5892..e63d0a8312de 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += msi.h
 generic-y += preempt.h
+generic-y += qrwlock.h
 generic-y += rwsem.h
 generic-y += segment.h
 generic-y += serial.h
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index aa51a38e46e4..fdb827c7832f 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -137,169 +137,7 @@ static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 }
 #define arch_spin_is_contended	arch_spin_is_contended
 
-/*
- * Write lock implementation.
- *
- * Write locks set bit 31. Unlocking, is done by writing 0 since the lock is
- * exclusively held.
- *
- * The memory barriers are implicit with the load-acquire and store-release
- * instructions.
- */
-
-static inline void arch_write_lock(arch_rwlock_t *rw)
-{
-	unsigned int tmp;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	"	sevl\n"
-	"1:	wfe\n"
-	"2:	ldaxr	%w0, %1\n"
-	"	cbnz	%w0, 1b\n"
-	"	stxr	%w0, %w2, %1\n"
-	"	cbnz	%w0, 2b\n"
-	__nops(1),
-	/* LSE atomics */
-	"1:	mov	%w0, wzr\n"
-	"2:	casa	%w0, %w2, %1\n"
-	"	cbz	%w0, 3f\n"
-	"	ldxr	%w0, %1\n"
-	"	cbz	%w0, 2b\n"
-	"	wfe\n"
-	"	b	1b\n"
-	"3:")
-	: "=&r" (tmp), "+Q" (rw->lock)
-	: "r" (0x80000000)
-	: "memory");
-}
-
-static inline int arch_write_trylock(arch_rwlock_t *rw)
-{
-	unsigned int tmp;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	"1:	ldaxr	%w0, %1\n"
-	"	cbnz	%w0, 2f\n"
-	"	stxr	%w0, %w2, %1\n"
-	"	cbnz	%w0, 1b\n"
-	"2:",
-	/* LSE atomics */
-	"	mov	%w0, wzr\n"
-	"	casa	%w0, %w2, %1\n"
-	__nops(2))
-	: "=&r" (tmp), "+Q" (rw->lock)
-	: "r" (0x80000000)
-	: "memory");
-
-	return !tmp;
-}
-
-static inline void arch_write_unlock(arch_rwlock_t *rw)
-{
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	"	stlr	wzr, %0",
-	"	swpl	wzr, wzr, %0")
-	: "=Q" (rw->lock) :: "memory");
-}
-
-/* write_can_lock - would write_trylock() succeed? */
-#define arch_write_can_lock(x)		((x)->lock == 0)
-
-/*
- * Read lock implementation.
- *
- * It exclusively loads the lock value, increments it and stores the new value
- * back if positive and the CPU still exclusively owns the location. If the
- * value is negative, the lock is already held.
- *
- * During unlocking there may be multiple active read locks but no write lock.
- *
- * The memory barriers are implicit with the load-acquire and store-release
- * instructions.
- *
- * Note that in UNDEFINED cases, such as unlocking a lock twice, the LL/SC
- * and LSE implementations may exhibit different behaviour (although this
- * will have no effect on lockdep).
- */
-static inline void arch_read_lock(arch_rwlock_t *rw)
-{
-	unsigned int tmp, tmp2;
-
-	asm volatile(
-	"	sevl\n"
-	ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	"1:	wfe\n"
-	"2:	ldaxr	%w0, %2\n"
-	"	add	%w0, %w0, #1\n"
-	"	tbnz	%w0, #31, 1b\n"
-	"	stxr	%w1, %w0, %2\n"
-	"	cbnz	%w1, 2b\n"
-	__nops(1),
-	/* LSE atomics */
-	"1:	wfe\n"
-	"2:	ldxr	%w0, %2\n"
-	"	adds	%w1, %w0, #1\n"
-	"	tbnz	%w1, #31, 1b\n"
-	"	casa	%w0, %w1, %2\n"
-	"	sbc	%w0, %w1, %w0\n"
-	"	cbnz	%w0, 2b")
-	: "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
-	:
-	: "cc", "memory");
-}
-
-static inline void arch_read_unlock(arch_rwlock_t *rw)
-{
-	unsigned int tmp, tmp2;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	"1:	ldxr	%w0, %2\n"
-	"	sub	%w0, %w0, #1\n"
-	"	stlxr	%w1, %w0, %2\n"
-	"	cbnz	%w1, 1b",
-	/* LSE atomics */
-	"	movn	%w0, #0\n"
-	"	staddl	%w0, %2\n"
-	__nops(2))
-	: "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
-	:
-	: "memory");
-}
-
-static inline int arch_read_trylock(arch_rwlock_t *rw)
-{
-	unsigned int tmp, tmp2;
-
-	asm volatile(ARM64_LSE_ATOMIC_INSN(
-	/* LL/SC */
-	"	mov	%w1, #1\n"
-	"1:	ldaxr	%w0, %2\n"
-	"	add	%w0, %w0, #1\n"
-	"	tbnz	%w0, #31, 2f\n"
-	"	stxr	%w1, %w0, %2\n"
-	"	cbnz	%w1, 1b\n"
-	"2:",
-	/* LSE atomics */
-	"	ldr	%w0, %2\n"
-	"	adds	%w1, %w0, #1\n"
-	"	tbnz	%w1, #31, 1f\n"
-	"	casa	%w0, %w1, %2\n"
-	"	sbc	%w1, %w1, %w0\n"
-	__nops(1)
-	"1:")
-	: "=&r" (tmp), "=&r" (tmp2), "+Q" (rw->lock)
-	:
-	: "cc", "memory");
-
-	return !tmp2;
-}
-
-/* read_can_lock - would read_trylock() succeed? */
-#define arch_read_can_lock(x)		((x)->lock < 0x80000000)
+#include <asm/qrwlock.h>
 
 /* See include/linux/spinlock.h */
 #define smp_mb__after_spinlock()	smp_mb()
diff --git a/arch/arm64/include/asm/spinlock_types.h b/arch/arm64/include/asm/spinlock_types.h
index 55be59a35e3f..6b856012c51b 100644
--- a/arch/arm64/include/asm/spinlock_types.h
+++ b/arch/arm64/include/asm/spinlock_types.h
@@ -36,10 +36,6 @@ typedef struct {
 
 #define __ARCH_SPIN_LOCK_UNLOCKED	{ 0 , 0 }
 
-typedef struct {
-	volatile unsigned int lock;
-} arch_rwlock_t;
-
-#define __ARCH_RW_LOCK_UNLOCKED		{ 0 }
+#include <asm-generic/qrwlock_types.h>
 
 #endif
-- 
2.1.4

* [PATCH 5/6] kernel/locking: Prevent slowpath writers getting held up by fastpath
  2017-10-05 12:54 ` Will Deacon
@ 2017-10-05 12:54   ` Will Deacon
  -1 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2017-10-05 12:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, Jeremy.Linton, peterz, mingo, longman,
	boqun.feng, paulmck, Will Deacon

When a prospective writer takes the qrwlock locking slowpath due to the
lock being held, it attempts to cmpxchg the wmode field from 0 to
_QW_WAITING so that concurrent lockers also take the slowpath and queue
on the spinlock accordingly, allowing the lockers to drain.

Unfortunately, this isn't fair, because a fastpath writer that comes in
after the lock is made available but before the _QW_WAITING flag is set
can effectively jump the queue. If there is a steady stream of prospective
writers, then the waiter will be held off indefinitely.

This patch restores fairness by separating _QW_WAITING and _QW_LOCKED
into two bits in the wmode byte and having the waiter set _QW_WAITING
unconditionally. This then forces the slowpath for concurrent lockers,
but requires that a writer unlock operation perform an
atomic_sub_release instead of a store_release so that the waiting status
is preserved.
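
[ Editor's note, not part of the patch: a worked example with the new
  encoding (_QW_WAITING == 1, _QW_LOCKED == 2) shows why the subtraction
  is needed. ]

	/*
	 * cnts == _QW_LOCKED | _QW_WAITING == 0x3
	 *         (writer holds the lock, another writer is queued)
	 *
	 * write_unlock: atomic_sub_return_release(_QW_LOCKED, &lock->cnts)
	 *               -> cnts == _QW_WAITING == 0x1
	 *
	 * The waiting bit survives, so a fastpath writer still sees a
	 * non-zero word and cannot cmpxchg 0 -> _QW_LOCKED ahead of the
	 * queued waiter. A plain store of 0 to the wmode byte (the old
	 * unlock) would have cleared it.
	 */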

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/qrwlock.h |  4 ++--
 kernel/locking/qrwlock.c      | 20 +++++---------------
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 02c0a768e6b0..8b7edef500e5 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -41,7 +41,7 @@
  *       +----+----+----+----+
  */
 #define	_QW_WAITING	1		/* A writer is waiting	   */
-#define	_QW_LOCKED	0xff		/* A writer holds the lock */
+#define	_QW_LOCKED	2		/* A writer holds the lock */
 #define	_QW_WMASK	0xff		/* Writer mask		   */
 #define	_QR_SHIFT	8		/* Reader count shift	   */
 #define _QR_BIAS	(1U << _QR_SHIFT)
@@ -134,7 +134,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
  */
 static inline void queued_write_unlock(struct qrwlock *lock)
 {
-	smp_store_release(&lock->wmode, 0);
+	(void)atomic_sub_return_release(_QW_LOCKED, &lock->cnts);
 }
 
 /*
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index b7ea4647c74d..e940f2c2b4f2 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -40,8 +40,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
 		 * so spin with ACQUIRE semantics until the lock is available
 		 * without waiting in the queue.
 		 */
-		atomic_cond_read_acquire(&lock->cnts, (VAL & _QW_WMASK)
-					 != _QW_LOCKED);
+		atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
 		return;
 	}
 	atomic_sub(_QR_BIAS, &lock->cnts);
@@ -57,7 +56,7 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
 	 * that accesses can't leak upwards out of our subsequent critical
 	 * section in the case that the lock is currently held for write.
 	 */
-	atomic_cond_read_acquire(&lock->cnts, (VAL & _QW_WMASK) != _QW_LOCKED);
+	atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));
 
 	/*
 	 * Signal the next one in queue to become queue head
@@ -80,19 +79,10 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 	    (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0))
 		goto unlock;
 
-	/*
-	 * Set the waiting flag to notify readers that a writer is pending,
-	 * or wait for a previous writer to go away.
-	 */
-	for (;;) {
-		if (!READ_ONCE(lock->wmode) &&
-		   (cmpxchg_relaxed(&lock->wmode, 0, _QW_WAITING) == 0))
-			break;
-
-		cpu_relax();
-	}
+	/* Set the waiting flag to notify readers that a writer is pending */
+	atomic_add(_QW_WAITING, &lock->cnts);
 
-	/* When no more readers, set the locked flag */
+	/* When no more readers or writers, set the locked flag */
 	do {
 		atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
 	} while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
-- 
2.1.4

* [PATCH 6/6] kernel/locking: Remove unused union members from struct qrwlock
  2017-10-05 12:54 ` Will Deacon
@ 2017-10-05 12:54   ` Will Deacon
  -1 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2017-10-05 12:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-arm-kernel, Jeremy.Linton, peterz, mingo, longman,
	boqun.feng, paulmck, Will Deacon

Now that all atomic operations are performed on the full lock word of
the qrwlock, there's no need to define a union type exposing the reader
and writer subcomponents. Remove them.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/qrwlock_types.h | 15 ++-------------
 1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/include/asm-generic/qrwlock_types.h b/include/asm-generic/qrwlock_types.h
index 507f2dc51bba..7f53be31359c 100644
--- a/include/asm-generic/qrwlock_types.h
+++ b/include/asm-generic/qrwlock_types.h
@@ -9,23 +9,12 @@
  */
 
 typedef struct qrwlock {
-	union {
-		atomic_t cnts;
-		struct {
-#ifdef __LITTLE_ENDIAN
-			u8 wmode;	/* Writer mode   */
-			u8 rcnts[3];	/* Reader counts */
-#else
-			u8 rcnts[3];	/* Reader counts */
-			u8 wmode;	/* Writer mode   */
-#endif
-		};
-	};
+	atomic_t cnts;
 	arch_spinlock_t		wait_lock;
 } arch_rwlock_t;
 
 #define	__ARCH_RW_LOCK_UNLOCKED {		\
-	{ .cnts = ATOMIC_INIT(0), },		\
+	.cnts = ATOMIC_INIT(0),			\
 	.wait_lock = __ARCH_SPIN_LOCK_UNLOCKED,	\
 }
 
-- 
2.1.4

* Re: [PATCH 5/6] kernel/locking: Prevent slowpath writers getting held up by fastpath
  2017-10-05 12:54   ` Will Deacon
@ 2017-10-05 13:56     ` Peter Zijlstra
  -1 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2017-10-05 13:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Jeremy.Linton, mingo, longman,
	boqun.feng, paulmck

On Thu, Oct 05, 2017 at 01:54:56PM +0100, Will Deacon wrote:
> When a prospective writer takes the qrwlock locking slowpath due to the
> lock being held, it attempts to cmpxchg the wmode field from 0 to
> _QW_WAITING so that concurrent lockers also take the slowpath and queue
> on the spinlock accordingly, allowing the lockers to drain.
> 
> Unfortunately, this isn't fair, because a fastpath writer that comes in
> after the lock is made available but before the _QW_WAITING flag is set
> can effectively jump the queue. If there is a steady stream of prospective
> writers, then the waiter will be held off indefinitely.
> 
> This patch restores fairness by separating _QW_WAITING and _QW_LOCKED
> into two bits in the wmode byte and having the waiter set _QW_WAITING
> unconditionally. This then forces the slow-path for concurrent lockers,
> but requires that a writer unlock operation performs an
> atomic_sub_release instead of a store_release so that the waiting status
> is preserved.

> diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
> index 02c0a768e6b0..8b7edef500e5 100644
> --- a/include/asm-generic/qrwlock.h
> +++ b/include/asm-generic/qrwlock.h
> @@ -41,7 +41,7 @@
>   *       +----+----+----+----+
>   */
>  #define	_QW_WAITING	1		/* A writer is waiting	   */
> -#define	_QW_LOCKED	0xff		/* A writer holds the lock */
> +#define	_QW_LOCKED	2		/* A writer holds the lock */
>  #define	_QW_WMASK	0xff		/* Writer mask		   */
>  #define	_QR_SHIFT	8		/* Reader count shift	   */
>  #define _QR_BIAS	(1U << _QR_SHIFT)
> @@ -134,7 +134,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
>   */
>  static inline void queued_write_unlock(struct qrwlock *lock)
>  {
> -	smp_store_release(&lock->wmode, 0);
> +	(void)atomic_sub_return_release(_QW_LOCKED, &lock->cnts);
>  }

That is a fairly painful hit on x86. Changes a regular store into an
"LOCK XADD" +20 cycles right there.

Can't we steal one of the reader bits for waiting?

diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
index 7d026bf27713..5524801a02a5 100644
--- a/include/asm-generic/qrwlock.h
+++ b/include/asm-generic/qrwlock.h
@@ -40,10 +40,10 @@
  *       |      rd      | wr |
  *       +----+----+----+----+
  */
-#define	_QW_WAITING	1		/* A writer is waiting	   */
-#define	_QW_LOCKED	0xff		/* A writer holds the lock */
-#define	_QW_WMASK	0xff		/* Writer mask		   */
-#define	_QR_SHIFT	8		/* Reader count shift	   */
+#define	_QW_WAITING	0x100		/* A writer is waiting	   */
+#define	_QW_LOCKED	0x0ff		/* A writer holds the lock */
+#define	_QW_WMASK	0x1ff		/* Writer mask		   */
+#define	_QR_SHIFT	9		/* Reader count shift	   */
 #define _QR_BIAS	(1U << _QR_SHIFT)
 
 /*
diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 2655f26ec882..5f75caea97e0 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c
@@ -54,7 +54,7 @@ struct __qrwlock {
 static __always_inline void
 rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
 {
-	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
+	while ((cnts & _QW_LOCKED)) {
 		cpu_relax();
 		cnts = atomic_read_acquire(&lock->cnts);
 	}
@@ -120,21 +120,10 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
 	    (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0))
 		goto unlock;
 
-	/*
-	 * Set the waiting flag to notify readers that a writer is pending,
-	 * or wait for a previous writer to go away.
-	 */
-	for (;;) {
-		struct __qrwlock *l = (struct __qrwlock *)lock;
-
-		if (!READ_ONCE(l->wmode) &&
-		   (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0))
-			break;
-
-		cpu_relax();
-	}
+	/* Set the waiting flag to notify readers that a writer is pending */
+	atomic_add(_QW_WAITING, &lock->cnts);
 
-	/* When no more readers, set the locked flag */
+	/* When no more readers or writers, set the locked flag */
 	for (;;) {
 		cnts = atomic_read(&lock->cnts);
 		if ((cnts == _QW_WAITING) &&
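
[ Editor's note: to spell out the trade-off in the proposal above, keeping
  _QW_LOCKED as the whole low byte and stealing bit 8 from the readers for
  _QW_WAITING means the waiting flag no longer lives in the byte that the
  writer unlock clears, so the unlock can stay a plain release byte store
  rather than an atomic subtraction (sketch below, using the wmode naming
  from patch 1). ]

	/*
	 * Proposed layout:
	 *   bits  0-7  : writer byte  (_QW_LOCKED == 0xff when held)
	 *   bit   8    : _QW_WAITING  (0x100)
	 *   bits  9-31 : reader count (_QR_SHIFT == 9)
	 */
	static inline void queued_write_unlock(struct qrwlock *lock)
	{
		smp_store_release(&lock->wmode, 0);	/* leaves bit 8 intact */
	}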

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/6] kernel/locking: Prevent slowpath writers getting held up by fastpath
  2017-10-05 13:56     ` Peter Zijlstra
@ 2017-10-05 14:37       ` Waiman Long
  -1 siblings, 0 replies; 25+ messages in thread
From: Waiman Long @ 2017-10-05 14:37 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Jeremy.Linton, mingo, boqun.feng,
	paulmck

On 10/05/2017 09:56 AM, Peter Zijlstra wrote:
> On Thu, Oct 05, 2017 at 01:54:56PM +0100, Will Deacon wrote:
>> When a prospective writer takes the qrwlock locking slowpath due to the
>> lock being held, it attempts to cmpxchg the wmode field from 0 to
>> _QW_WAITING so that concurrent lockers also take the slowpath and queue
>> on the spinlock accordingly, allowing the lockers to drain.
>>
>> Unfortunately, this isn't fair, because a fastpath writer that comes in
>> after the lock is made available but before the _QW_WAITING flag is set
>> can effectively jump the queue. If there is a steady stream of prospective
>> writers, then the waiter will be held off indefinitely.
>>
>> This patch restores fairness by separating _QW_WAITING and _QW_LOCKED
>> into two bits in the wmode byte and having the waiter set _QW_WAITING
>> unconditionally. This then forces the slow-path for concurrent lockers,
>> but requires that a writer unlock operation performs an
>> atomic_sub_release instead of a store_release so that the waiting status
>> is preserved.
>> diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
>> index 02c0a768e6b0..8b7edef500e5 100644
>> --- a/include/asm-generic/qrwlock.h
>> +++ b/include/asm-generic/qrwlock.h
>> @@ -41,7 +41,7 @@
>>   *       +----+----+----+----+
>>   */
>>  #define	_QW_WAITING	1		/* A writer is waiting	   */
>> -#define	_QW_LOCKED	0xff		/* A writer holds the lock */
>> +#define	_QW_LOCKED	2		/* A writer holds the lock */
>>  #define	_QW_WMASK	0xff		/* Writer mask		   */
>>  #define	_QR_SHIFT	8		/* Reader count shift	   */
>>  #define _QR_BIAS	(1U << _QR_SHIFT)
>> @@ -134,7 +134,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
>>   */
>>  static inline void queued_write_unlock(struct qrwlock *lock)
>>  {
>> -	smp_store_release(&lock->wmode, 0);
>> +	(void)atomic_sub_return_release(_QW_LOCKED, &lock->cnts);
>>  }
> That is a fairly painful hit on x86. Changes a regular store into an
> "LOCK XADD" +20 cycles right there.
>
> Can't we steal one of the reader bits for waiting?
>
> diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
> index 7d026bf27713..5524801a02a5 100644
> --- a/include/asm-generic/qrwlock.h
> +++ b/include/asm-generic/qrwlock.h
> @@ -40,10 +40,10 @@
>   *       |      rd      | wr |
>   *       +----+----+----+----+
>   */
> -#define	_QW_WAITING	1		/* A writer is waiting	   */
> -#define	_QW_LOCKED	0xff		/* A writer holds the lock */
> -#define	_QW_WMASK	0xff		/* Writer mask		   */
> -#define	_QR_SHIFT	8		/* Reader count shift	   */
> +#define	_QW_WAITING	0x100		/* A writer is waiting	   */
> +#define	_QW_LOCKED	0x0ff		/* A writer holds the lock */
> +#define	_QW_WMASK	0x1ff		/* Writer mask		   */
> +#define	_QR_SHIFT	9		/* Reader count shift	   */
>  #define _QR_BIAS	(1U << _QR_SHIFT)
>  
>  /*
> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
> index 2655f26ec882..5f75caea97e0 100644
> --- a/kernel/locking/qrwlock.c
> +++ b/kernel/locking/qrwlock.c
> @@ -54,7 +54,7 @@ struct __qrwlock {
>  static __always_inline void
>  rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
>  {
> -	while ((cnts & _QW_WMASK) == _QW_LOCKED) {
> +	while ((cnts & _QW_LOCKED)) {
>  		cpu_relax();
>  		cnts = atomic_read_acquire(&lock->cnts);
>  	}
> @@ -120,21 +120,10 @@ void queued_write_lock_slowpath(struct qrwlock *lock)
>  	    (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0))
>  		goto unlock;
>  
> -	/*
> -	 * Set the waiting flag to notify readers that a writer is pending,
> -	 * or wait for a previous writer to go away.
> -	 */
> -	for (;;) {
> -		struct __qrwlock *l = (struct __qrwlock *)lock;
> -
> -		if (!READ_ONCE(l->wmode) &&
> -		   (cmpxchg_relaxed(&l->wmode, 0, _QW_WAITING) == 0))
> -			break;
> -
> -		cpu_relax();
> -	}
> +	/* Set the waiting flag to notify readers that a writer is pending */
> +	atomic_add(_QW_WAITING, &lock->cnts);
>  
> -	/* When no more readers, set the locked flag */
> +	/* When no more readers or writers, set the locked flag */
>  	for (;;) {
>  		cnts = atomic_read(&lock->cnts);
>  		if ((cnts == _QW_WAITING) &&

I think this is the better way of dealing with the fairness problem.

Acked-by: Waiman Long <longman@redhat.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/6] kernel/locking: Prevent slowpath writers getting held up by fastpath
  2017-10-05 13:56     ` Peter Zijlstra
@ 2017-10-05 14:42       ` Will Deacon
  -1 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2017-10-05 14:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arm-kernel, Jeremy.Linton, mingo, longman,
	boqun.feng, paulmck

Hi Peter,

Thanks for having a look.

On Thu, Oct 05, 2017 at 03:56:18PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 05, 2017 at 01:54:56PM +0100, Will Deacon wrote:
> > When a prospective writer takes the qrwlock locking slowpath due to the
> > lock being held, it attempts to cmpxchg the wmode field from 0 to
> > _QW_WAITING so that concurrent lockers also take the slowpath and queue
> > on the spinlock accordingly, allowing the lockers to drain.
> > 
> > Unfortunately, this isn't fair, because a fastpath writer that comes in
> > after the lock is made available but before the _QW_WAITING flag is set
> > can effectively jump the queue. If there is a steady stream of prospective
> > writers, then the waiter will be held off indefinitely.
> > 
> > This patch restores fairness by separating _QW_WAITING and _QW_LOCKED
> > into two bits in the wmode byte and having the waiter set _QW_WAITING
> > unconditionally. This then forces the slow-path for concurrent lockers,
> > but requires that a writer unlock operation performs an
> > atomic_sub_release instead of a store_release so that the waiting status
> > is preserved.
> 
> > diff --git a/include/asm-generic/qrwlock.h b/include/asm-generic/qrwlock.h
> > index 02c0a768e6b0..8b7edef500e5 100644
> > --- a/include/asm-generic/qrwlock.h
> > +++ b/include/asm-generic/qrwlock.h
> > @@ -41,7 +41,7 @@
> >   *       +----+----+----+----+
> >   */
> >  #define	_QW_WAITING	1		/* A writer is waiting	   */
> > -#define	_QW_LOCKED	0xff		/* A writer holds the lock */
> > +#define	_QW_LOCKED	2		/* A writer holds the lock */
> >  #define	_QW_WMASK	0xff		/* Writer mask		   */
> >  #define	_QR_SHIFT	8		/* Reader count shift	   */
> >  #define _QR_BIAS	(1U << _QR_SHIFT)
> > @@ -134,7 +134,7 @@ static inline void queued_read_unlock(struct qrwlock *lock)
> >   */
> >  static inline void queued_write_unlock(struct qrwlock *lock)
> >  {
> > -	smp_store_release(&lock->wmode, 0);
> > +	(void)atomic_sub_return_release(_QW_LOCKED, &lock->cnts);
> >  }
> 
> That is a fairly painful hit on x86. Changes a regular store into an
> "LOCK XADD" +20 cycles right there.

Yeah, I mentioned that in the cover letter, which is also why it's at the end
of the series ;) However, it's worth noting that this is the same as the
reader unlock path and, as it stands, there's a real risk of writer
starvation with the current code, which isn't great for a queued lock.
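
To make that concrete, here is a hypothetical userspace sketch (C11
atomics, not kernel code; QW_WAITING and QW_LOCKED mirror the 1 and 2
used by this patch) of why the unlock must subtract _QW_LOCKED rather
than store 0:

#include <stdatomic.h>
#include <stdio.h>

#define QW_WAITING 1u	/* mirrors _QW_WAITING in the patch */
#define QW_LOCKED  2u	/* mirrors _QW_LOCKED in the patch  */

int main(void)
{
	_Atomic unsigned int cnts = QW_LOCKED;	/* fastpath writer owns the lock */

	atomic_fetch_add(&cnts, QW_WAITING);	/* slowpath writer flags itself  */

	/* A store_release of 0 here would also clear QW_WAITING; subtracting
	 * only QW_LOCKED releases the lock but keeps the waiter visible. */
	atomic_fetch_sub_explicit(&cnts, QW_LOCKED, memory_order_release);

	printf("cnts = %u\n", atomic_load(&cnts)); /* 1: waiting bit preserved */
	return 0;
}

The preserved bit is what diverts later fastpath writers into the
slowpath; the cost is the atomic RMW on unlock discussed above.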

> Can't we steal one of the reader bits for waiting?

I considered this at LPC and somehow convinced myself it didn't work, but
actually all it's really doing is making the _QW_LOCKED bit a byte, so it
should work fine.

I'll work that into v2.

Will

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/6] Switch arm64 over to qrwlock
  2017-10-05 12:54 ` Will Deacon
@ 2017-10-05 22:12   ` Jeremy Linton
  -1 siblings, 0 replies; 25+ messages in thread
From: Jeremy Linton @ 2017-10-05 22:12 UTC (permalink / raw)
  To: Will Deacon, linux-kernel
  Cc: linux-arm-kernel, peterz, mingo, longman, boqun.feng, paulmck

Hi,


On 10/05/2017 07:54 AM, Will Deacon wrote:
> Hi all,
> 
> This patch series reworks bits of the qrwlock code that it can be used
> to replace the asm rwlocks currently implemented for arm64. The structure
> of the series is:
> 
>    Patches 1-3	: Work WFE into qrwlock using atomic_cond_read_acquire so
> 		  we can avoid busy-waiting.
> 
>    Patch 4	: Enable qrwlocks for arm64
> 
>    Patch 5-6	: Ensure writer slowpath fairness. This has a potential
> 		  performance impact on the writer unlock path, so I've
> 		  kept them at the end.
> 
> The patches apply on top of my other locking cleanups:
> 
>    http://lkml.kernel.org/r/1507055129-12300-1-git-send-email-will.deacon@arm.com
> 
> although the conflict with mainline is trivial to resolve without those.
> The full stack is also pushed here:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git qrwlock
> 
> All comments (particularly related to testing and performance) welcome!

I haven't done any perf testing, but the machines continue to boot, and 
the stress-ng test which causes task lock problems with the normal arm64 
rwlock now appears to run as expected. So, it's a good start!



> 
> Cheers,
> 
> Will
> 
> --->8
> 
> Will Deacon (6):
>    kernel/locking: Use struct qrwlock instead of struct __qrwlock
>    locking/atomic: Add atomic_cond_read_acquire
>    kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock
>    arm64: locking: Move rwlock implementation over to qrwlocks
>    kernel/locking: Prevent slowpath writers getting held up by fastpath
>    kernel/locking: Remove unused union members from struct qrwlock
> 
>   arch/arm64/Kconfig                      |  17 ++++
>   arch/arm64/include/asm/Kbuild           |   1 +
>   arch/arm64/include/asm/spinlock.h       | 164 +-------------------------------
>   arch/arm64/include/asm/spinlock_types.h |   6 +-
>   include/asm-generic/atomic-long.h       |   3 +
>   include/asm-generic/qrwlock.h           |  14 +--
>   include/asm-generic/qrwlock_types.h     |   2 +-
>   include/linux/atomic.h                  |   4 +
>   kernel/locking/qrwlock.c                |  83 +++-------------
>   9 files changed, 43 insertions(+), 251 deletions(-)
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/6] Switch arm64 over to qrwlock
  2017-10-05 22:12   ` Jeremy Linton
@ 2017-10-06  8:39     ` Will Deacon
  -1 siblings, 0 replies; 25+ messages in thread
From: Will Deacon @ 2017-10-06  8:39 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: linux-kernel, linux-arm-kernel, peterz, mingo, longman,
	boqun.feng, paulmck

Hi Jeremy,

On Thu, Oct 05, 2017 at 05:12:44PM -0500, Jeremy Linton wrote:
> On 10/05/2017 07:54 AM, Will Deacon wrote:
> >This patch series reworks bits of the qrwlock code that it can be used
> >to replace the asm rwlocks currently implemented for arm64. The structure
> >of the series is:
> >
> >   Patches 1-3	: Work WFE into qrwlock using atomic_cond_read_acquire so
> >		  we can avoid busy-waiting.
> >
> >   Patch 4	: Enable qrwlocks for arm64
> >
> >   Patch 5-6	: Ensure writer slowpath fairness. This has a potential
> >		  performance impact on the writer unlock path, so I've
> >		  kept them at the end.
> >
> >The patches apply on top of my other locking cleanups:
> >
> >   http://lkml.kernel.org/r/1507055129-12300-1-git-send-email-will.deacon@arm.com
> >
> >although the conflict with mainline is trivial to resolve without those.
> >The full stack is also pushed here:
> >
> >   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git qrwlock
> >
> >All comments (particularly related to testing and performance) welcome!
> 
> I haven't done any perf testing, but the machines continue to boot, and the
> stress-ng test which causes task lock problems with the normal arm64 rwlock
> now appears to run as expected. So, it's a good start!

Excellent! Mind if I add your tested-by?

Will

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/6] Switch arm64 over to qrwlock
@ 2017-10-10 15:17 Jan Glauber
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Glauber @ 2017-10-10 15:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, Jeremy Linton, Peter Zijlstra,
	Ingo Molnar, longman, boqun.feng, paulmck

2017-10-05 14:54 GMT+02:00 Will Deacon <will.deacon@arm.com>:
>
> Hi all,
>
> This patch series reworks bits of the qrwlock code that it can be used
> to replace the asm rwlocks currently implemented for arm64. The structure
> of the series is:
>
>   Patches 1-3   : Work WFE into qrwlock using atomic_cond_read_acquire so
>                   we can avoid busy-waiting.
>
>   Patch 4       : Enable qrwlocks for arm64
>
>   Patch 5-6     : Ensure writer slowpath fairness. This has a potential
>                   performance impact on the writer unlock path, so I've
>                   kept them at the end.
>
> The patches apply on top of my other locking cleanups:
>
>   http://lkml.kernel.org/r/1507055129-12300-1-git-send-email-will.deacon@arm.com
>
> although the conflict with mainline is trivial to resolve without those.
> The full stack is also pushed here:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git qrwlock
>
> All comments (particularly related to testing and performance) welcome!

Hi Will,

I ran your patches on ThunderX2.

Old RW locks:

insmod ./locktorture.ko nwriters_stress=56 nreaders_stress=224
torture_type="rw_lock_irq" stat_interval=2

[  558.700042] Writes:  Total: 10127  Max/Min: 0/0   Fail: 0
[  558.700054] Reads :  Total: 764714  Max/Min: 0/0   Fail: 0
[  561.797011] Writes:  Total: 11288  Max/Min: 0/0   Fail: 0
[  561.802518] Reads :  Total: 2104452  Max/Min: 0/0   Fail: 0
[  565.844219] Writes:  Total: 11512  Max/Min: 0/0   Fail: 0
[  565.849710] Reads :  Total: 4277492  Max/Min: 0/0   Fail: 0

Queued RW locks:

[  221.491207] Writes:  Total: 57318  Max/Min: 0/0   Fail: 0
[  221.491219] Reads :  Total: 382979  Max/Min: 0/0   Fail: 0
[  223.507065] Writes:  Total: 83490  Max/Min: 0/0   Fail: 0
[  223.512611] Reads :  Total: 684848  Max/Min: 0/0   Fail: 0
[  225.522937] Writes:  Total: 110012  Max/Min: 0/0   Fail: 0
[  225.528511] Reads :  Total: 968826  Max/Min: 0/0   Fail: 0

So readers are still preferred over writers, but the results are _way_
better. Also, with the old implementation the above test hung the machine,
which does not happen with the queued variant.

If you want, you can add:
Tested-by: Jan Glauber <jglauber@cavium.com>

--Jan

> Cheers,
>
> Will
>
> --->8
>
> Will Deacon (6):
>   kernel/locking: Use struct qrwlock instead of struct __qrwlock
>   locking/atomic: Add atomic_cond_read_acquire
>   kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock
>   arm64: locking: Move rwlock implementation over to qrwlocks
>   kernel/locking: Prevent slowpath writers getting held up by fastpath
>   kernel/locking: Remove unused union members from struct qrwlock
>
>  arch/arm64/Kconfig                      |  17 ++++
>  arch/arm64/include/asm/Kbuild           |   1 +
>  arch/arm64/include/asm/spinlock.h       | 164 +-------------------------------
>  arch/arm64/include/asm/spinlock_types.h |   6 +-
>  include/asm-generic/atomic-long.h       |   3 +
>  include/asm-generic/qrwlock.h           |  14 +--
>  include/asm-generic/qrwlock_types.h     |   2 +-
>  include/linux/atomic.h                  |   4 +
>  kernel/locking/qrwlock.c                |  83 +++-------------
>  9 files changed, 43 insertions(+), 251 deletions(-)
>
> --
> 2.1.4
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

Thread overview: 25+ messages
2017-10-05 12:54 [PATCH 0/6] Switch arm64 over to qrwlock Will Deacon
2017-10-05 12:54 ` [PATCH 1/6] kernel/locking: Use struct qrwlock instead of struct __qrwlock Will Deacon
2017-10-05 12:54 ` [PATCH 2/6] locking/atomic: Add atomic_cond_read_acquire Will Deacon
2017-10-05 12:54 ` [PATCH 3/6] kernel/locking: Use atomic_cond_read_acquire when spinning in qrwlock Will Deacon
2017-10-05 12:54 ` [PATCH 4/6] arm64: locking: Move rwlock implementation over to qrwlocks Will Deacon
2017-10-05 12:54 ` [PATCH 5/6] kernel/locking: Prevent slowpath writers getting held up by fastpath Will Deacon
2017-10-05 13:56   ` Peter Zijlstra
2017-10-05 14:37     ` Waiman Long
2017-10-05 14:42     ` Will Deacon
2017-10-05 12:54 ` [PATCH 6/6] kernel/locking: Remove unused union members from struct qrwlock Will Deacon
2017-10-05 22:12 ` [PATCH 0/6] Switch arm64 over to qrwlock Jeremy Linton
2017-10-06  8:39   ` Will Deacon
2017-10-10 15:17 Jan Glauber
