linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field
@ 2016-05-18  1:26 Waiman Long
  2016-05-18  1:26 ` [PATCH v4 1/5] " Waiman Long
                   ` (5 more replies)
  0 siblings, 6 replies; 31+ messages in thread
From: Waiman Long @ 2016-05-18  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Jason Low, Dave Chinner,
	Peter Hurley, Paul E. McKenney, Scott J Norton, Douglas Hatch,
	Waiman Long

 v3->v4:
  - Add a new patch 2 to use WRITE_ONCE() for all rwsem->owner stores
    to prevent store tearing.

 v2->v3:
  - Make minor code changes as suggested by PeterZ & Peter Hurley.
  - Add 2 minor patches (#2 & #3) to improve the rwsem code
  - Add a 4th patch to streamline the rwsem_optimistic_spin() code.

 v1->v2:
  - Add rwsem_is_reader_owned() helper & rename rwsem_reader_owned()
    to rwsem_set_reader_owned().
  - Add more comments to clarify the purpose of some of the code
    changes.

Patch 1 is the main patch of this series.

Patch 2 protects against store tearing of the rwsem->owner field, which
can cause problems when another task reads and dereferences it.

Patch 3 eliminates a redundant wakeup caused by a reader waking up its
own task.

Patch 4 improves the efficiency of the reader wakeup code.

Patch 5 streamlines the rwsem_optimistic_spin() code to make it simpler.

Waiman Long (5):
  locking/rwsem: Add reader-owned state to the owner field
  locking/rwsem: Protect all writes to owner by WRITE_ONCE()
  locking/rwsem: Don't wake up one's own task
  locking/rwsem: Improve reader wakeup code
  locking/rwsem: Streamline the rwsem_optimistic_spin() code

 kernel/locking/rwsem-xadd.c |   75 ++++++++++++++++++++++++------------------
 kernel/locking/rwsem.c      |    8 +++-
 kernel/locking/rwsem.h      |   52 ++++++++++++++++++++++++++++-
 3 files changed, 99 insertions(+), 36 deletions(-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v4 1/5] locking/rwsem: Add reader-owned state to the owner field
  2016-05-18  1:26 [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field Waiman Long
@ 2016-05-18  1:26 ` Waiman Long
  2016-06-06 17:18   ` Davidlohr Bueso
  2016-06-08 14:25   ` [tip:locking/core] " tip-bot for Waiman Long
  2016-05-18  1:26 ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE() Waiman Long
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Waiman Long @ 2016-05-18  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Jason Low, Dave Chinner,
	Peter Hurley, Paul E. McKenney, Scott J Norton, Douglas Hatch,
	Waiman Long

Currently, it is not possible to determine for sure if a reader
owns an rwsem by looking at the content of the rwsem data structure.
This patch adds a new state RWSEM_READER_OWNED to the owner field
to indicate that readers currently own the lock. This enables us to
address the following 2 issues in the rwsem optimistic spinning code:

 1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
    the owner field is NULL which can mean either the readers own
    the lock or the owning writer hasn't set the owner field yet.
    In the latter case, we miss the chance to do optimistic spinning.

 2) While a writer is waiting in the OSQ and a reader takes the lock,
    the writer will continue to spin when out of the OSQ in the main
    rwsem_optimistic_spin() loop as the owner field is NULL, wasting
    CPU cycles if some of the readers are sleeping.

Adding the new state allows optimistic spinning to go forward as long
as the owner field is not RWSEM_READER_OWNED and the owner, if set, is
running; spinning stops immediately once the reader-owned state is set.

On a 4-socket Haswell machine running a 4.6-rc1 based kernel, fio
multithreaded randrw and randwrite tests were run on the same file on
an XFS partition on top of an NVDIMM. The aggregated bandwidths before
and after the patch were as follows:

  Test      BW before patch     BW after patch  % change
  ----      ---------------     --------------  --------
  randrw         988 MB/s          1192 MB/s      +21%
  randwrite     1513 MB/s          1623 MB/s      +7.3%

The perf profiles of the rwsem_down_write_failed() function in the
randrw test before and after the patch were:

   19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
   14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed

The actual CPU cycles spent in rwsem_down_write_failed() dropped from
5.88% to 1.52% after the patch.

The xfstests suite was also run, and no regression was observed.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Acked-by: Jason Low <jason.low2@hp.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
---
 kernel/locking/rwsem-xadd.c |   41 ++++++++++++++++++++++-------------------
 kernel/locking/rwsem.c      |    8 ++++++--
 kernel/locking/rwsem.h      |   41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 09e30c6..c278f5a 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -155,6 +155,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 			/* Last active locker left. Retry waking readers. */
 			goto try_reader_grant;
 		}
+		/*
+		 * It is not really necessary to set it to reader-owned here,
+		 * but it gives the spinners an early indication that the
+		 * readers now have the lock.
+		 */
+		rwsem_set_reader_owned(sem);
 	}
 
 	/* Grant an infinite number of read locks to the readers at the front
@@ -306,16 +312,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
 	rcu_read_lock();
 	owner = READ_ONCE(sem->owner);
-	if (!owner) {
-		long count = READ_ONCE(sem->count);
+	if (!rwsem_owner_is_writer(owner)) {
 		/*
-		 * If sem->owner is not set, yet we have just recently entered the
-		 * slowpath with the lock being active, then there is a possibility
-		 * reader(s) may have the lock. To be safe, bail spinning in these
-		 * situations.
+		 * Don't spin if the rwsem is readers owned.
 		 */
-		if (count & RWSEM_ACTIVE_MASK)
-			ret = false;
+		ret = !rwsem_owner_is_reader(owner);
 		goto done;
 	}
 
@@ -328,8 +329,6 @@ done:
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
-	long count;
-
 	rcu_read_lock();
 	while (sem->owner == owner) {
 		/*
@@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 	}
 	rcu_read_unlock();
 
-	if (READ_ONCE(sem->owner))
-		return true; /* new owner, continue spinning */
-
 	/*
-	 * When the owner is not set, the lock could be free or
-	 * held by readers. Check the counter to verify the
-	 * state.
+	 * If there is a new owner or the owner is not set, we continue
+	 * spinning.
 	 */
-	count = READ_ONCE(sem->count);
-	return (count == 0 || count == RWSEM_WAITING_BIAS);
+	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
@@ -378,7 +372,16 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 
 	while (true) {
 		owner = READ_ONCE(sem->owner);
-		if (owner && !rwsem_spin_on_owner(sem, owner))
+		/*
+		 * Don't spin if
+		 * 1) the owner is a reader as we can't determine if the
+		 *    reader is actively running or not.
+		 * 2) The rwsem_spin_on_owner() returns false which means
+		 *    the owner isn't running.
+		 */
+		if (rwsem_owner_is_reader(owner) ||
+		   (rwsem_owner_is_writer(owner) &&
+		   !rwsem_spin_on_owner(sem, owner)))
 			break;
 
 		/* wait_lock will be acquired if write_lock is obtained */
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index c817216..5838f56 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -22,6 +22,7 @@ void __sched down_read(struct rw_semaphore *sem)
 	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read);
@@ -33,8 +34,10 @@ int down_read_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_read_trylock(sem);
 
-	if (ret == 1)
+	if (ret == 1) {
 		rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
+		rwsem_set_reader_owned(sem);
+	}
 	return ret;
 }
 
@@ -124,7 +127,7 @@ void downgrade_write(struct rw_semaphore *sem)
 	 * lockdep: a downgraded write will live on as a write
 	 * dependency.
 	 */
-	rwsem_clear_owner(sem);
+	rwsem_set_reader_owned(sem);
 	__downgrade_write(sem);
 }
 
@@ -138,6 +141,7 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read_nested);
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 870ed9a..8f43ba2 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -1,3 +1,20 @@
+/*
+ * The owner field of the rw_semaphore structure will be set to
+ * RWSEM_READER_OWNED when a reader grabs the lock. A writer will clear
+ * the owner field when it unlocks. A reader, on the other hand, will
+ * not touch the owner field when it unlocks.
+ *
+ * In essence, the owner field now has the following 3 states:
+ *  1) 0
+ *     - lock is free or the owner hasn't set the field yet
+ *  2) RWSEM_READER_OWNED
+ *     - lock is currently or previously owned by readers (lock is free
+ *       or not set by owner yet)
+ *  3) Other non-zero value
+ *     - a writer owns the lock
+ */
+#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
+
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
@@ -9,6 +26,26 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 	sem->owner = NULL;
 }
 
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+	/*
+	 * We check the owner value first to make sure that we will only
+	 * do a write to the rwsem cacheline when it is really necessary
+	 * to minimize cacheline contention.
+	 */
+	if (sem->owner != RWSEM_READER_OWNED)
+		sem->owner = RWSEM_READER_OWNED;
+}
+
+static inline bool rwsem_owner_is_writer(struct task_struct *owner)
+{
+	return owner && owner != RWSEM_READER_OWNED;
+}
+
+static inline bool rwsem_owner_is_reader(struct task_struct *owner)
+{
+	return owner == RWSEM_READER_OWNED;
+}
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
@@ -17,4 +54,8 @@ static inline void rwsem_set_owner(struct rw_semaphore *sem)
 static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
 }
+
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+}
 #endif
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE()
  2016-05-18  1:26 [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field Waiman Long
  2016-05-18  1:26 ` [PATCH v4 1/5] " Waiman Long
@ 2016-05-18  1:26 ` Waiman Long
  2016-05-18 14:04   ` Davidlohr Bueso
  2016-06-08 14:25   ` [tip:locking/core] " tip-bot for Waiman Long
  2016-05-18  1:26 ` [PATCH v4 3/5] locking/rwsem: Don't wake up one's own task Waiman Long
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 31+ messages in thread
From: Waiman Long @ 2016-05-18  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Jason Low, Dave Chinner,
	Peter Hurley, Paul E. McKenney, Scott J Norton, Douglas Hatch,
	Waiman Long

Without using WRITE_ONCE(), the compiler can potentially break a
write into multiple smaller ones (store tearing). So a read from the
same data by another task concurrently may return a partial result.
This can result in a kernel crash if the data is a memory address
that is being dereferenced.

This patch changes all writes to rwsem->owner to use WRITE_ONCE()
to make sure that store tearing cannot happen. READ_ONCE() may
not be needed for rwsem->owner as long as the value is only used for
comparison and not dereferenced.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 kernel/locking/rwsem.h |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 8f43ba2..a699f40 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -16,14 +16,21 @@
 #define RWSEM_READER_OWNED	((struct task_struct *)1UL)
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * All writes to owner are protected by WRITE_ONCE() to make sure that
+ * store tearing can't happen as optimistic spinners may read and use
+ * the owner value concurrently without lock. Read from owner, however,
+ * may not need READ_ONCE() as long as the pointer value is only used
+ * for comparison and isn't being dereferenced.
+ */
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
-	sem->owner = current;
+	WRITE_ONCE(sem->owner, current);
 }
 
 static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
-	sem->owner = NULL;
+	WRITE_ONCE(sem->owner, NULL);
 }
 
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
@@ -34,7 +41,7 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 	 * to minimize cacheline contention.
 	 */
 	if (sem->owner != RWSEM_READER_OWNED)
-		sem->owner = RWSEM_READER_OWNED;
+		WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
 }
 
 static inline bool rwsem_owner_is_writer(struct task_struct *owner)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 3/5] locking/rwsem: Don't wake up one's own task
  2016-05-18  1:26 [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field Waiman Long
  2016-05-18  1:26 ` [PATCH v4 1/5] " Waiman Long
  2016-05-18  1:26 ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE() Waiman Long
@ 2016-05-18  1:26 ` Waiman Long
  2016-05-18 10:30   ` Peter Zijlstra
  2016-05-18  1:26 ` [PATCH v4 4/5] locking/rwsem: Improve reader wakeup code Waiman Long
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 31+ messages in thread
From: Waiman Long @ 2016-05-18  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Jason Low, Dave Chinner,
	Peter Hurley, Paul E. McKenney, Scott J Norton, Douglas Hatch,
	Waiman Long

As rwsem_down_read_failed() will queue itself and potentially call
__rwsem_do_wake(sem, RWSEM_WAKE_ANY), it is possible that a reader
will try to wake up its own task. This patch adds a check to make
sure that this won't happen.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
---
 kernel/locking/rwsem-xadd.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index c278f5a..007814f 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -202,7 +202,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 		 */
 		smp_mb();
 		waiter->task = NULL;
-		wake_up_process(tsk);
+		if (tsk != current)
+			wake_up_process(tsk);
 		put_task_struct(tsk);
 	} while (--loop);
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 4/5] locking/rwsem: Improve reader wakeup code
  2016-05-18  1:26 [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field Waiman Long
                   ` (2 preceding siblings ...)
  2016-05-18  1:26 ` [PATCH v4 3/5] locking/rwsem: Don't wake up one's own task Waiman Long
@ 2016-05-18  1:26 ` Waiman Long
  2016-06-08 14:25   ` [tip:locking/core] " tip-bot for Waiman Long
  2016-05-18  1:26 ` [PATCH v4 5/5] locking/rwsem: Streamline the rwsem_optimistic_spin() code Waiman Long
  2016-05-18 10:52 ` [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field Peter Zijlstra
  5 siblings, 1 reply; 31+ messages in thread
From: Waiman Long @ 2016-05-18  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Jason Low, Dave Chinner,
	Peter Hurley, Paul E. McKenney, Scott J Norton, Douglas Hatch,
	Waiman Long

In __rwsem_do_wake(), the reader wakeup code will assume a writer
has stolen the lock if the active reader/writer count is not 0.
However, this is not as reliable an indicator as the original
"< RWSEM_WAITING_BIAS" check. If another reader is present, the code
will still break out and exit even if the writer is gone. This patch
changes it to check the same "< RWSEM_WAITING_BIAS" condition to
reduce the chance of false positives.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
---
 kernel/locking/rwsem-xadd.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 007814f..e3a7e06 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -148,9 +148,14 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
  try_reader_grant:
 		oldcount = rwsem_atomic_update(adjustment, sem) - adjustment;
 		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/* A writer stole the lock. Undo our reader grant. */
-			if (rwsem_atomic_update(-adjustment, sem) &
-						RWSEM_ACTIVE_MASK)
+			/*
+			 * If the count is still less than RWSEM_WAITING_BIAS
+			 * after removing the adjustment, it is assumed that
+			 * a writer has stolen the lock. We have to undo our
+			 * reader grant.
+			 */
+			if (rwsem_atomic_update(-adjustment, sem)
+			    < RWSEM_WAITING_BIAS)
 				goto out;
 			/* Last active locker left. Retry waking readers. */
 			goto try_reader_grant;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v4 5/5] locking/rwsem: Streamline the rwsem_optimistic_spin() code
  2016-05-18  1:26 [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field Waiman Long
                   ` (3 preceding siblings ...)
  2016-05-18  1:26 ` [PATCH v4 4/5] locking/rwsem: Improve reader wakeup code Waiman Long
@ 2016-05-18  1:26 ` Waiman Long
  2016-06-08 14:26   ` [tip:locking/core] " tip-bot for Waiman Long
  2016-05-18 10:52 ` [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field Peter Zijlstra
  5 siblings, 1 reply; 31+ messages in thread
From: Waiman Long @ 2016-05-18  1:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Davidlohr Bueso, Jason Low, Dave Chinner,
	Peter Hurley, Paul E. McKenney, Scott J Norton, Douglas Hatch,
	Waiman Long

This patch moves the owner loading and checking code entirely inside
rwsem_spin_on_owner() to simplify the logic of the
rwsem_optimistic_spin() loop.

Suggested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
---
 kernel/locking/rwsem-xadd.c |   38 ++++++++++++++++++++------------------
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e3a7e06..a85a2bd 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -332,9 +332,16 @@ done:
 	return ret;
 }
 
-static noinline
-bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
+/*
+ * Return true only if we can still spin on the owner field of the rwsem.
+ */
+static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
+	struct task_struct *owner = READ_ONCE(sem->owner);
+
+	if (!rwsem_owner_is_writer(owner))
+		goto out;
+
 	rcu_read_lock();
 	while (sem->owner == owner) {
 		/*
@@ -354,7 +361,7 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 		cpu_relax_lowlatency();
 	}
 	rcu_read_unlock();
-
+out:
 	/*
 	 * If there is a new owner or the owner is not set, we continue
 	 * spinning.
@@ -364,7 +371,6 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
-	struct task_struct *owner;
 	bool taken = false;
 
 	preempt_disable();
@@ -376,21 +382,17 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	if (!osq_lock(&sem->osq))
 		goto done;
 
-	while (true) {
-		owner = READ_ONCE(sem->owner);
+	/*
+	 * Optimistically spin on the owner field and attempt to acquire the
+	 * lock whenever the owner changes. Spinning will be stopped when:
+	 *  1) the owning writer isn't running; or
+	 *  2) readers own the lock as we can't determine if they are
+	 *     actively running or not.
+	 */
+	while (rwsem_spin_on_owner(sem)) {
 		/*
-		 * Don't spin if
-		 * 1) the owner is a reader as we can't determine if the
-		 *    reader is actively running or not.
-		 * 2) The rwsem_spin_on_owner() returns false which means
-		 *    the owner isn't running.
+		 * Try to acquire the lock
 		 */
-		if (rwsem_owner_is_reader(owner) ||
-		   (rwsem_owner_is_writer(owner) &&
-		   !rwsem_spin_on_owner(sem, owner)))
-			break;
-
-		/* wait_lock will be acquired if write_lock is obtained */
 		if (rwsem_try_write_lock_unqueued(sem)) {
 			taken = true;
 			break;
@@ -402,7 +404,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * we're an RT task that will live-lock because we won't let
 		 * the owner complete.
 		 */
-		if (!owner && (need_resched() || rt_task(current)))
+		if (!sem->owner && (need_resched() || rt_task(current)))
 			break;
 
 		/*
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 3/5] locking/rwsem: Don't wake up one's own task
  2016-05-18  1:26 ` [PATCH v4 3/5] locking/rwsem: Don't wake up one's own task Waiman Long
@ 2016-05-18 10:30   ` Peter Zijlstra
  2016-05-18 16:04     ` Waiman Long
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2016-05-18 10:30 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, linux-kernel, Davidlohr Bueso, Jason Low,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch

On Tue, May 17, 2016 at 09:26:21PM -0400, Waiman Long wrote:
> As rwsem_down_read_failed() will queue itself and potentially call
> __rwsem_do_wake(sem, RWSEM_WAKE_ANY), it is possible that a reader
> will try to wake up its own task. This patch adds a check to make
> sure that this won't happen.
> 

Yes, this is 'weird', but why are we fixing it at the cost of an extra
branch?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field
  2016-05-18  1:26 [PATCH v4 0/5] locking/rwsem: Add reader-owned state to the owner field Waiman Long
                   ` (4 preceding siblings ...)
  2016-05-18  1:26 ` [PATCH v4 5/5] locking/rwsem: Streamline the rwsem_optimistic_spin() code Waiman Long
@ 2016-05-18 10:52 ` Peter Zijlstra
  5 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2016-05-18 10:52 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, linux-kernel, Davidlohr Bueso, Jason Low,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch


OK, I frobbed all 3 rwsem patch sets together and hopefully have
something that 'works' ;-)

Please have a look at the result here:

  https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git/log/?h=locking/core 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE()
  2016-05-18  1:26 ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE() Waiman Long
@ 2016-05-18 14:04   ` Davidlohr Bueso
  2016-05-18 17:21     ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE Jason Low
  2016-05-18 17:23     ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE() Jason Low
  2016-06-08 14:25   ` [tip:locking/core] " tip-bot for Waiman Long
  1 sibling, 2 replies; 31+ messages in thread
From: Davidlohr Bueso @ 2016-05-18 14:04 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Jason Low,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch

On Tue, 17 May 2016, Waiman Long wrote:

>Without using WRITE_ONCE(), the compiler can potentially break a
>write into multiple smaller ones (store tearing). So a read from the
>same data by another task concurrently may return a partial result.
>This can result in a kernel crash if the data is a memory address
>that is being dereferenced.
>
>This patch changes all writes to rwsem->owner to use WRITE_ONCE()
>to make sure that store tearing will not happen. READ_ONCE() may
>not be needed for rwsem->owner as long as the value is only used for
>comparison and not dereferencing.
>
>Signed-off-by: Waiman Long <Waiman.Long@hpe.com>

Yes, ->owner can obviously be handled locklessly during optimistic
spinning.

Acked-by: Davidlohr Bueso <dave@stgolabs.net>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 3/5] locking/rwsem: Don't wake up one's own task
  2016-05-18 10:30   ` Peter Zijlstra
@ 2016-05-18 16:04     ` Waiman Long
  0 siblings, 0 replies; 31+ messages in thread
From: Waiman Long @ 2016-05-18 16:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Davidlohr Bueso, Jason Low,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch

On 05/18/2016 06:30 AM, Peter Zijlstra wrote:
> On Tue, May 17, 2016 at 09:26:21PM -0400, Waiman Long wrote:
>> As rwsem_down_read_failed() will queue itself and potentially call
>> __rwsem_do_wake(sem, RWSEM_WAKE_ANY), it is possible that a reader
>> will try to wake up its own task. This patch adds a check to make
>> sure that this won't happen.
>>
> Yes, this is 'weird', but why are we fixing it at the cost of an extra
> branch?

I think I should have put an unlikely tag there. Anyway, the cost of an 
extra branch should be pretty small compared with all the task wakeup 
work that has to be done. If you don't think this is necessary, I am 
totally fine with scrapping it.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-18 14:04   ` Davidlohr Bueso
@ 2016-05-18 17:21     ` Jason Low
  2016-05-18 18:29       ` Waiman Long
  2016-05-18 17:23     ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE() Jason Low
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Low @ 2016-05-18 17:21 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch, jason.low2

On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote:
> On Tue, 17 May 2016, Waiman Long wrote:
> 
> >Without using WRITE_ONCE(), the compiler can potentially break a
> >write into multiple smaller ones (store tearing). So a read from the
> >same data by another task concurrently may return a partial result.
> >This can result in a kernel crash if the data is a memory address
> >that is being dereferenced.
> >
> >This patch changes all writes to rwsem->owner to use WRITE_ONCE()
> >to make sure that store tearing will not happen. READ_ONCE() may
> >not be needed for rwsem->owner as long as the value is only used for
> >comparison and not dereferencing.

It might be okay to leave out READ_ONCE() for reading rwsem->owner, but
couldn't we include it to at least document that we're performing a
"special" lockless read?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE()
  2016-05-18 14:04   ` Davidlohr Bueso
  2016-05-18 17:21     ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE Jason Low
@ 2016-05-18 17:23     ` Jason Low
  1 sibling, 0 replies; 31+ messages in thread
From: Jason Low @ 2016-05-18 17:23 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Waiman Long, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch, jason.low2

On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote:
> On Tue, 17 May 2016, Waiman Long wrote:
> 
> >Without using WRITE_ONCE(), the compiler can potentially break a
> >write into multiple smaller ones (store tearing). So a read from the
> >same data by another task concurrently may return a partial result.
> >This can result in a kernel crash if the data is a memory address
> >that is being dereferenced.
> >
> >This patch changes all writes to rwsem->owner to use WRITE_ONCE()
> >to make sure that store tearing will not happen. READ_ONCE() may
> >not be needed for rwsem->owner as long as the value is only used for
> >comparison and not dereferencing.
> >
> >Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
> 
> Yes, ->owner can obviously be handled locklessly during optimistic
> spinning.
> 
> Acked-by: Davidlohr Bueso <dave@stgolabs.net>

Acked-by: Jason Low <jason.low2@hpe.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-18 17:21     ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE Jason Low
@ 2016-05-18 18:29       ` Waiman Long
  2016-05-18 19:58         ` Jason Low
  0 siblings, 1 reply; 31+ messages in thread
From: Waiman Long @ 2016-05-18 18:29 UTC (permalink / raw)
  To: Jason Low
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch, jason.low2

On 05/18/2016 01:21 PM, Jason Low wrote:
> On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote:
>> On Tue, 17 May 2016, Waiman Long wrote:
>>
>>> Without using WRITE_ONCE(), the compiler can potentially break a
>>> write into multiple smaller ones (store tearing). So a read from the
>>> same data by another task concurrently may return a partial result.
>>> This can result in a kernel crash if the data is a memory address
>>> that is being dereferenced.
>>>
>>> This patch changes all writes to rwsem->owner to use WRITE_ONCE()
>>> to make sure that store tearing will not happen. READ_ONCE() may
>>> not be needed for rwsem->owner as long as the value is only used for
>>> comparison and not dereferencing.
> It might be okay to leave out READ_ONCE() for reading rwsem->owner, but
> couldn't we include it to at least document that we're performing a
> "special" lockless read?
>

Using READ_ONCE() does have a bit of cost as it limits compiler 
optimization. If we change all accesses to rwsem->owner to READ_ONCE() 
and WRITE_ONCE(), we may as well change its type to volatile and be done 
with it. I am not against doing that, but it feels a bit of an over-reach 
to me. On the other hand, we could define a do-nothing macro that 
designates the owner as a special variable for documentation purposes at 
call sites that don't need the protection.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-18 18:29       ` Waiman Long
@ 2016-05-18 19:58         ` Jason Low
  2016-05-19 22:21           ` Jason Low
  2016-05-21 16:04           ` Peter Hurley
  0 siblings, 2 replies; 31+ messages in thread
From: Jason Low @ 2016-05-18 19:58 UTC (permalink / raw)
  To: Waiman Long
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch, jason.low2

On Wed, 2016-05-18 at 14:29 -0400, Waiman Long wrote:
> On 05/18/2016 01:21 PM, Jason Low wrote:
> > On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote:
> >> On Tue, 17 May 2016, Waiman Long wrote:
> >>
> >>> Without using WRITE_ONCE(), the compiler can potentially break a
> >>> write into multiple smaller ones (store tearing). So a read from the
> >>> same data by another task concurrently may return a partial result.
> >>> This can result in a kernel crash if the data is a memory address
> >>> that is being dereferenced.
> >>>
> >>> This patch changes all writes to rwsem->owner to use WRITE_ONCE()
> >>> to make sure that store tearing will not happen. READ_ONCE() may
> >>> not be needed for rwsem->owner as long as the value is only used for
> >>> comparison and not dereferencing.
> > It might be okay to leave out READ_ONCE() for reading rwsem->owner, but
> > couldn't we include it to at least document that we're performing a
> > "special" lockless read?
> >
> 
> Using READ_ONCE() does have a bit of cost as it limits compiler
> optimization. If we change all accesses to rwsem->owner to READ_ONCE()
> and WRITE_ONCE(), we may as well change its type to volatile and be done
> with it.

Right, although there are still places like the init function where
WRITE_ONCE isn't necessary.

> I am not against doing that, but it feels like a bit of an over-reach
> to me. On the other hand, we could define a do-nothing macro that
> designates the owner as a special variable for documentation purposes,
> without adding protection at that particular call site.

It should be fine to use the standard READ_ONCE here, even if it's just
for documentation, as it's probably not going to cost anything in
practice. It would be better to avoid adding any special macros for this
which may just add more complexity.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-18 19:58         ` Jason Low
@ 2016-05-19 22:21           ` Jason Low
  2016-05-20 20:26             ` Waiman Long
  2016-05-21 16:04           ` Peter Hurley
  1 sibling, 1 reply; 31+ messages in thread
From: Jason Low @ 2016-05-19 22:21 UTC (permalink / raw)
  To: Waiman Long
  Cc: jason.low2, Davidlohr Bueso, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Paul E. McKenney, Scott J Norton, Douglas Hatch

On Wed, 2016-05-18 at 12:58 -0700, Jason Low wrote:
> On Wed, 2016-05-18 at 14:29 -0400, Waiman Long wrote:
> > On 05/18/2016 01:21 PM, Jason Low wrote:
> > > On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote:
> > >> On Tue, 17 May 2016, Waiman Long wrote:
> > >>
> > >>> Without using WRITE_ONCE(), the compiler can potentially break a
> > >>> write into multiple smaller ones (store tearing). So a read from the
> > >>> same data by another task concurrently may return a partial result.
> > >>> This can result in a kernel crash if the data is a memory address
> > >>> that is being dereferenced.
> > >>>
> > >>> This patch changes all writes to rwsem->owner to use WRITE_ONCE()
> > >>> to make sure that store tearing will not happen. READ_ONCE() may
> > >>> not be needed for rwsem->owner as long as the value is only used for
> > >>> comparison and not dereferencing.
> > > It might be okay to leave out READ_ONCE() for reading rwsem->owner, but
> > > couldn't we include it to at least document that we're performing a
> > > "special" lockless read?
> > >
> > 
> > Using READ_ONCE() does have a bit of cost as it limits compiler
> > optimization. If we change all accesses to rwsem->owner to READ_ONCE()
> > and WRITE_ONCE(), we may as well change its type to volatile and be done
> > with it.
> 
> Right, although there are still places like the init function where
> WRITE_ONCE isn't necessary.
> 
> > I am not against doing that, but it feels like a bit of an over-reach
> > to me. On the other hand, we could define a do-nothing macro that
> > designates the owner as a special variable for documentation purposes,
> > without adding protection at that particular call site.
> 
> It should be fine to use the standard READ_ONCE here, even if it's just
> for documentation, as it's probably not going to cost anything in
> practice. It would be better to avoid adding any special macros for this
> which may just add more complexity.

By the way, this potential "partial write" issue may also apply to
mutexes as well, so we should also make a similar change to
mutex_set_owner() and mutex_clear_owner().

Jason

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-19 22:21           ` Jason Low
@ 2016-05-20 20:26             ` Waiman Long
  0 siblings, 0 replies; 31+ messages in thread
From: Waiman Long @ 2016-05-20 20:26 UTC (permalink / raw)
  To: Jason Low
  Cc: jason.low2, Davidlohr Bueso, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Paul E. McKenney, Scott J Norton, Douglas Hatch

On 05/19/2016 06:21 PM, Jason Low wrote:
> On Wed, 2016-05-18 at 12:58 -0700, Jason Low wrote:
>> On Wed, 2016-05-18 at 14:29 -0400, Waiman Long wrote:
>>> On 05/18/2016 01:21 PM, Jason Low wrote:
>>>> On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote:
>>>>> On Tue, 17 May 2016, Waiman Long wrote:
>>>>>
>>>>>> Without using WRITE_ONCE(), the compiler can potentially break a
>>>>>> write into multiple smaller ones (store tearing). So a read from the
>>>>>> same data by another task concurrently may return a partial result.
>>>>>> This can result in a kernel crash if the data is a memory address
>>>>>> that is being dereferenced.
>>>>>>
>>>>>> This patch changes all writes to rwsem->owner to use WRITE_ONCE()
>>>>>> to make sure that store tearing will not happen. READ_ONCE() may
>>>>>> not be needed for rwsem->owner as long as the value is only used for
>>>>>> comparison and not dereferencing.
>>>> It might be okay to leave out READ_ONCE() for reading rwsem->owner, but
>>>> couldn't we include it to at least document that we're performing a
>>>> "special" lockless read?
>>>>
>>> Using READ_ONCE() does have a bit of cost as it limits compiler
>>> optimization. If we change all accesses to rwsem->owner to READ_ONCE()
>>> and WRITE_ONCE(), we may as well change its type to volatile and be done
>>> with it.
>> Right, although there are still places like the init function where
>> WRITE_ONCE isn't necessary.
>>
>>> I am not against doing that, but it feels like a bit of an over-reach
>>> to me. On the other hand, we could define a do-nothing macro that
>>> designates the owner as a special variable for documentation purposes,
>>> without adding protection at that particular call site.
>> It should be fine to use the standard READ_ONCE here, even if it's just
>> for documentation, as it's probably not going to cost anything in
>> practice. It would be better to avoid adding any special macros for this
>> which may just add more complexity.
> By the way, this potential "partial write" issue may also apply to
> mutexes as well, so we should also make a similar change to
> mutex_set_owner() and mutex_clear_owner().
>
> Jason
>
Yes, I am aware of that. I just don't have the time to do a mutex
patch yet. As you have sent out a patch on that, this is now covered.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-18 19:58         ` Jason Low
  2016-05-19 22:21           ` Jason Low
@ 2016-05-21 16:04           ` Peter Hurley
  2016-05-22 10:42             ` Peter Zijlstra
  2016-05-23 18:46             ` Jason Low
  1 sibling, 2 replies; 31+ messages in thread
From: Peter Hurley @ 2016-05-21 16:04 UTC (permalink / raw)
  To: Jason Low, Waiman Long
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Dave Chinner, Paul E. McKenney, Scott J Norton, Douglas Hatch,
	jason.low2

On 05/18/2016 12:58 PM, Jason Low wrote:
> On Wed, 2016-05-18 at 14:29 -0400, Waiman Long wrote:
>> On 05/18/2016 01:21 PM, Jason Low wrote:
>>> On Wed, 2016-05-18 at 07:04 -0700, Davidlohr Bueso wrote:
>>>> On Tue, 17 May 2016, Waiman Long wrote:
>>>>
>>>>> Without using WRITE_ONCE(), the compiler can potentially break a
>>>>> write into multiple smaller ones (store tearing). So a read from the
>>>>> same data by another task concurrently may return a partial result.
>>>>> This can result in a kernel crash if the data is a memory address
>>>>> that is being dereferenced.
>>>>>
>>>>> This patch changes all writes to rwsem->owner to use WRITE_ONCE()
>>>>> to make sure that store tearing will not happen. READ_ONCE() may
>>>>> not be needed for rwsem->owner as long as the value is only used for
>>>>> comparison and not dereferencing.
>>> It might be okay to leave out READ_ONCE() for reading rwsem->owner, but
>>> couldn't we include it to at least document that we're performing a
>>> "special" lockless read?
>>>
>>
>> Using READ_ONCE() does have a bit of cost as it limits compiler
>> optimization. If we change all accesses to rwsem->owner to READ_ONCE()
>> and WRITE_ONCE(), we may as well change its type to volatile and be done
>> with it.
> 
> Right, although there are still places like the init function where
> WRITE_ONCE isn't necessary.

Which doesn't cost anything either.


>> I am not against doing that, but it feels like a bit of an over-reach
>> to me. On the other hand, we could define a do-nothing macro that
>> designates the owner as a special variable for documentation purposes,
>> without adding protection at that particular call site.
> 
> It should be fine to use the standard READ_ONCE here, even if it's just
> for documentation, as it's probably not going to cost anything in
> practice. It would be better to avoid adding any special macros for this
> which may just add more complexity.

See, I don't understand this line of reasoning at all.

I read this as "it's ok to be non-optimal here where we're spinning CPU
time but not ok to be non-optimal generally elsewhere where it's
way less important like at init time".

And by the way, it's not just "here" but _everywhere_.
What about reading ->on_cpu locklessly?

Sure it's a bool, but doesn't the "we need to document lockless access"
argument equally apply here?

Regards,
Peter Hurley

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-21 16:04           ` Peter Hurley
@ 2016-05-22 10:42             ` Peter Zijlstra
  2016-05-23 18:46             ` Jason Low
  1 sibling, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2016-05-22 10:42 UTC (permalink / raw)
  To: Peter Hurley
  Cc: Jason Low, Waiman Long, Davidlohr Bueso, Ingo Molnar,
	linux-kernel, Dave Chinner, Paul E. McKenney, Scott J Norton,
	Douglas Hatch, jason.low2

On Sat, May 21, 2016 at 09:04:14AM -0700, Peter Hurley wrote:
> And by the way, it's not just "here" but _everywhere_.
> What about reading ->on_cpu locklessly?

In the smp_cond_acquire() rework that I have pending that one is
actually 'fixed'.

But yeah, the whole load/store tearing thing is a giant pain inflicted
upon us by the C language and GCC people :/

And the problem is that the 'regression' is now so old it doesn't matter
whether they fix it or not; we have to support compilers that think it's
fine to generate tears :-(

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-21 16:04           ` Peter Hurley
  2016-05-22 10:42             ` Peter Zijlstra
@ 2016-05-23 18:46             ` Jason Low
  2016-05-23 19:44               ` Davidlohr Bueso
  2016-05-25  1:25               ` Waiman Long
  1 sibling, 2 replies; 31+ messages in thread
From: Jason Low @ 2016-05-23 18:46 UTC (permalink / raw)
  To: Peter Hurley
  Cc: jason.low2, Waiman Long, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, linux-kernel, Dave Chinner, Paul E. McKenney,
	Scott J Norton, Douglas Hatch

On Sat, 2016-05-21 at 09:04 -0700, Peter Hurley wrote:
> On 05/18/2016 12:58 PM, Jason Low wrote:
> > It should be fine to use the standard READ_ONCE here, even if it's just
> > for documentation, as it's probably not going to cost anything in
> > practice. It would be better to avoid adding any special macros for this
> > which may just add more complexity.
> 
> See, I don't understand this line of reasoning at all.
> 
> I read this as "it's ok to be non-optimal here where we're spinning CPU
> time but not ok to be non-optimal generally elsewhere where it's
> way less important like at init time".

So I think there is a difference between using it during init time and
using it here where we're spinning. During init time, initializing the
owner field locklessly is normal. No other thread should be concurrently
writing to the field, since the structure is just getting
initialized, so there are no surprises there.

Our access of the owner field in this function is special in that we're
using a bit of "lockless magic" to read and write to a field that gets
concurrently accessed without any serialization. Since we're not taking
the wait_lock in a scenario where we'd normally take a lock, it
would be good to have this documented.

> And by the way, it's not just "here" but _everywhere_.
> What about reading ->on_cpu locklessly?

Sure, we could also use READ_ONCE when reading ->on_cpu  :)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-23 18:46             ` Jason Low
@ 2016-05-23 19:44               ` Davidlohr Bueso
  2016-05-23 20:15                 ` Paul E. McKenney
  2016-05-25  1:25               ` Waiman Long
  1 sibling, 1 reply; 31+ messages in thread
From: Davidlohr Bueso @ 2016-05-23 19:44 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Hurley, jason.low2, Waiman Long, Peter Zijlstra,
	Ingo Molnar, linux-kernel, Dave Chinner, Paul E. McKenney,
	Scott J Norton, Douglas Hatch

On Mon, 23 May 2016, Jason Low wrote:

>On Sat, 2016-05-21 at 09:04 -0700, Peter Hurley wrote:
>> On 05/18/2016 12:58 PM, Jason Low wrote:
>> > It should be fine to use the standard READ_ONCE here, even if it's just
>> > for documentation, as it's probably not going to cost anything in
>> > practice. It would be better to avoid adding any special macros for this
>> > which may just add more complexity.
>>
>> See, I don't understand this line of reasoning at all.
>>
>> I read this as "it's ok to be non-optimal here where we're spinning CPU
>> time but not ok to be non-optimal generally elsewhere where it's
>> way less important like at init time".
>
>So I think there is a difference between using it during init time and
>using it here where we're spinning. During init time, initializing the
>owner field locklessly is normal. No other thread should be concurrently
>writing to the field, since the structure is just getting
>initialized, so there are no surprises there.
>
>Our access of the owner field in this function is special in that we're
>using a bit of "lockless magic" to read and write to a field that gets
>concurrently accessed without any serialization. Since we're not taking
>the wait_lock in a scenario where we'd normally take a lock, it
>would be good to have this documented.
>
>> And by the way, it's not just "here" but _everywhere_.
>> What about reading ->on_cpu locklessly?
>
>Sure, we could also use READ_ONCE when reading ->on_cpu  :)

Locking-wise we should be covered with ->on_cpu, as we're always under
rcu_read_lock (barrier, preempt_disable). But I'm not sure this rule
applies throughout the scheduler the way it does in, say,
thread_group_cputime(); cpu_clock_sample() (from posix timers) seems to
mix and match being done under rcu. So ultimately I think you're right.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-23 19:44               ` Davidlohr Bueso
@ 2016-05-23 20:15                 ` Paul E. McKenney
  2016-05-23 21:04                   ` Davidlohr Bueso
  0 siblings, 1 reply; 31+ messages in thread
From: Paul E. McKenney @ 2016-05-23 20:15 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Jason Low, Peter Hurley, jason.low2, Waiman Long, Peter Zijlstra,
	Ingo Molnar, linux-kernel, Dave Chinner, Scott J Norton,
	Douglas Hatch

On Mon, May 23, 2016 at 12:44:16PM -0700, Davidlohr Bueso wrote:
> On Mon, 23 May 2016, Jason Low wrote:
> 
> >On Sat, 2016-05-21 at 09:04 -0700, Peter Hurley wrote:
> >>On 05/18/2016 12:58 PM, Jason Low wrote:
> >>> It should be fine to use the standard READ_ONCE here, even if it's just
> >>> for documentation, as it's probably not going to cost anything in
> >>> practice. It would be better to avoid adding any special macros for this
> >>> which may just add more complexity.
> >>
> >>See, I don't understand this line of reasoning at all.
> >>
> >>I read this as "it's ok to be non-optimal here where we're spinning CPU
> >>time but not ok to be non-optimal generally elsewhere where it's
> >>way less important like at init time".
> >
> >So I think there is a difference between using it during init time and
> >using it here where we're spinning. During init time, initializing the
> >owner field locklessly is normal. No other thread should be concurrently
> >writing to the field, since the structure is just getting
> >initialized, so there are no surprises there.
> >
> >Our access of the owner field in this function is special in that we're
> >using a bit of "lockless magic" to read and write to a field that gets
> >concurrently accessed without any serialization. Since we're not taking
> >the wait_lock in a scenario where we'd normally take a lock, it
> >would be good to have this documented.
> >
> >>And by the way, it's not just "here" but _everywhere_.
> >>What about reading ->on_cpu locklessly?
> >
> >Sure, we could also use READ_ONCE when reading ->on_cpu  :)
> 
> Locking-wise we should be covered with ->on_cpu, as we're always under
> rcu_read_lock (barrier, preempt_disable). But I'm not sure this rule
> applies throughout the scheduler the way it does in, say,
> thread_group_cputime(); cpu_clock_sample() (from posix timers) seems to
> mix and match being done under rcu. So ultimately I think you're right.

But rcu_read_lock() does not exclude updates, which is one reason why
pointer reads use rcu_dereference() rather than normal assignments.

So I do not believe that rcu_read_lock() is helping you in this case.

That said, it is a bit hard to imagine the compiler tearing a load from
an int...  But compilers have uncovered weaknesses in my imagination
more than once in the past.
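Paul's point about an int can be made concrete with a sketch. On some 32-bit targets a compiler may legally split a plain 64-bit store of a constant into two 32-bit immediate stores; a volatile access, which is what WRITE_ONCE() boils down to in this simplified userspace form, is relied upon to be performed as a single full-width access. The macros and names below are illustrative stand-ins, not the kernel definitions.

```c
/* Userspace sketch of why even integer-sized stores can tear. */
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros: volatile full-width accesses. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

static uint64_t shared;

static void publish(uint64_t val)
{
	/*
	 * A plain "shared = val;" could be emitted as two 32-bit stores
	 * on some targets; a concurrent reader could then observe half
	 * of the old value combined with half of the new one.
	 */
	WRITE_ONCE(shared, val);
}

static uint64_t snoop(void)
{
	return READ_ONCE(shared);
}
```

A single thread of course cannot exhibit the race; the sketch only shows the access pattern the volatile cast is meant to protect.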

							Thanx, Paul

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-23 20:15                 ` Paul E. McKenney
@ 2016-05-23 21:04                   ` Davidlohr Bueso
  0 siblings, 0 replies; 31+ messages in thread
From: Davidlohr Bueso @ 2016-05-23 21:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Jason Low, Peter Hurley, jason.low2, Waiman Long, Peter Zijlstra,
	Ingo Molnar, linux-kernel, Dave Chinner, Scott J Norton,
	Douglas Hatch

On Mon, 23 May 2016, Paul E. McKenney wrote:

>But rcu_read_lock() does not exclude updates, which is one reason why
>pointer reads use rcu_dereference() rather than normal assignments.

Yes, I was referring to readers. Updates, otoh, are done holding a number of locks.

>
>So I do not believe that rcu_read_lock() is helping you in this case.
>
>That said, it is a bit hard to imagine the compiler tearing a load from
>an int...

Oh, right, good point.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE
  2016-05-23 18:46             ` Jason Low
  2016-05-23 19:44               ` Davidlohr Bueso
@ 2016-05-25  1:25               ` Waiman Long
  1 sibling, 0 replies; 31+ messages in thread
From: Waiman Long @ 2016-05-25  1:25 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Hurley, jason.low2, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, linux-kernel, Dave Chinner, Paul E. McKenney,
	Scott J Norton, Douglas Hatch

On 05/23/2016 02:46 PM, Jason Low wrote:
> On Sat, 2016-05-21 at 09:04 -0700, Peter Hurley wrote:
>> On 05/18/2016 12:58 PM, Jason Low wrote:
>>> It should be fine to use the standard READ_ONCE here, even if it's just
>>> for documentation, as it's probably not going to cost anything in
>>> practice. It would be better to avoid adding any special macros for this
>>> which may just add more complexity.
>> See, I don't understand this line of reasoning at all.
>>
>> I read this as "it's ok to be non-optimal here where we're spinning CPU
>> time but not ok to be non-optimal generally elsewhere where it's
>> way less important like at init time".
> So I think there is a difference between using it during init time and
> using it here where we're spinning. During init time, initializing the
> owner field locklessly is normal. No other thread should be concurrently
> writing to the field, since the structure is just getting
> initialized, so there are no surprises there.
>
> Our access of the owner field in this function is special in that we're
> using a bit of "lockless magic" to read and write to a field that gets
> concurrently accessed without any serialization. Since we're not taking
> the wait_lock in a scenario where we'd normally take a lock, it
> would be good to have this documented.
>
>> And by the way, it's not just "here" but _everywhere_.
>> What about reading ->on_cpu locklessly?
> Sure, we could also use READ_ONCE when reading ->on_cpu  :)
>

As on_cpu is just a boolean, load tearing isn't really a problem. You
either see bit 0 set or not, never something in between (it's not a
qubit) :-)

Cheers,
Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 1/5] locking/rwsem: Add reader-owned state to the owner field
  2016-05-18  1:26 ` [PATCH v4 1/5] " Waiman Long
@ 2016-06-06 17:18   ` Davidlohr Bueso
  2016-06-06 20:03     ` Waiman Long
  2016-06-06 21:02     ` Peter Zijlstra
  2016-06-08 14:25   ` [tip:locking/core] " tip-bot for Waiman Long
  1 sibling, 2 replies; 31+ messages in thread
From: Davidlohr Bueso @ 2016-06-06 17:18 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Jason Low,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch

Hi Ingo -- is there any reason why this series (or at least this particular
patch) was not picked up (being in Peter's queue)? It seems that all the
recent rwsem changes are now in -tip, with the exception of these.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 1/5] locking/rwsem: Add reader-owned state to the owner field
  2016-06-06 17:18   ` Davidlohr Bueso
@ 2016-06-06 20:03     ` Waiman Long
  2016-06-06 21:02     ` Peter Zijlstra
  1 sibling, 0 replies; 31+ messages in thread
From: Waiman Long @ 2016-06-06 20:03 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Jason Low,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch

On 06/06/2016 01:18 PM, Davidlohr Bueso wrote:
> Hi Ingo -- is there any reason why this series (or at least this 
> particular
> patch) was not picked up (being in Peter's queue)? It seems that all the
> recent rwsem changes are now in -tip, with the exception of these.
>
> Thanks,
> Davidlohr

Yes, I would certainly like to have this patch series tested in -next. 
Given that i_mutex has now become i_rwsem, this patch series will help 
to further narrow the performance gap between mutex and rwsem.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 1/5] locking/rwsem: Add reader-owned state to the owner field
  2016-06-06 17:18   ` Davidlohr Bueso
  2016-06-06 20:03     ` Waiman Long
@ 2016-06-06 21:02     ` Peter Zijlstra
  2016-06-06 21:49       ` Waiman Long
  1 sibling, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2016-06-06 21:02 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Waiman Long, Ingo Molnar, linux-kernel, Jason Low, Dave Chinner,
	Peter Hurley, Paul E. McKenney, Scott J Norton, Douglas Hatch

On Mon, Jun 06, 2016 at 10:18:16AM -0700, Davidlohr Bueso wrote:
> Hi Ingo -- is there any reason why this series (or at least this particular
> patch) was not picked up (being in Peter's queue)? It seems that all the
> recent rwsem changes are now in -tip, with the exception of these.

I placed it after Jason's atomic_long_t patches which fell through,
which resulted in these patches not applying.

I'll try and get the lot fixed up and such..

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 1/5] locking/rwsem: Add reader-owned state to the owner field
  2016-06-06 21:02     ` Peter Zijlstra
@ 2016-06-06 21:49       ` Waiman Long
  0 siblings, 0 replies; 31+ messages in thread
From: Waiman Long @ 2016-06-06 21:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Davidlohr Bueso, Ingo Molnar, linux-kernel, Jason Low,
	Dave Chinner, Peter Hurley, Paul E. McKenney, Scott J Norton,
	Douglas Hatch

On 06/06/2016 05:02 PM, Peter Zijlstra wrote:
> On Mon, Jun 06, 2016 at 10:18:16AM -0700, Davidlohr Bueso wrote:
>> Hi Ingo -- is there any reason why this series (or at least this particular
>> patch) was not picked up (being in Peter's queue)? It seems that all the
>> recent rwsem changes are now in -tip, with the exception of these.
> I placed it after Jason's atomic_long_t patches which fell through,
> which resulted in these patches not applying.
>
> I'll try and get the lot fixed up and such..

Good to hear that my rwsem patches will be in your queue soon :-)

Cheers,
Longman

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [tip:locking/core] locking/rwsem: Add reader-owned state to the owner field
  2016-05-18  1:26 ` [PATCH v4 1/5] " Waiman Long
  2016-06-06 17:18   ` Davidlohr Bueso
@ 2016-06-08 14:25   ` tip-bot for Waiman Long
  1 sibling, 0 replies; 31+ messages in thread
From: tip-bot for Waiman Long @ 2016-06-08 14:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: david, Waiman.Long, paulmck, akpm, jason.low2, peterz,
	scott.norton, peter, dave, torvalds, linux-kernel, mingo,
	doug.hatch, tglx, hpa

Commit-ID:  19c5d690e41697fcdd19379ab9d10d8d37818414
Gitweb:     http://git.kernel.org/tip/19c5d690e41697fcdd19379ab9d10d8d37818414
Author:     Waiman Long <Waiman.Long@hpe.com>
AuthorDate: Tue, 17 May 2016 21:26:19 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 8 Jun 2016 15:16:59 +0200

locking/rwsem: Add reader-owned state to the owner field

Currently, it is not possible to determine for sure if a reader
owns a rwsem by looking at the content of the rwsem data structure.
This patch adds a new state RWSEM_READER_OWNED to the owner field
to indicate that readers currently own the lock. This enables us to
address the following 2 issues in the rwsem optimistic spinning code:

 1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
    the owner field is NULL which can mean either the readers own
    the lock or the owning writer hasn't set the owner field yet.
    In the latter case, we miss the chance to do optimistic spinning.

 2) While a writer is waiting in the OSQ and a reader takes the lock,
    the writer will continue to spin when out of the OSQ in the main
    rwsem_optimistic_spin() loop as the owner field is NULL, wasting
    CPU cycles if some of the readers are sleeping.

Adding the new state will allow optimistic spinning to go forward as
long as the owner field is not RWSEM_READER_OWNED and the owner is
running, if set, but stop immediately when that state has been reached.

On a 4-socket Haswell machine running on a 4.6-rc1 based kernel, the
fio test with multithreaded randrw and randwrite tests on the same
file on a XFS partition on top of a NVDIMM were run, the aggregated
bandwidths before and after the patch were as follows:

  Test      BW before patch     BW after patch  % change
  ----      ---------------     --------------  --------
  randrw         988 MB/s          1192 MB/s      +21%
  randwrite     1513 MB/s          1623 MB/s      +7.3%

The perf profile of the rwsem_down_write_failed() function in randrw
before and after the patch were:

   19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
   14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed

The actual CPU cycles spent in rwsem_down_write_failed() dropped from
5.88% to 1.52% after the patch.

The xfstests suite was also run and no regression was observed.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jason Low <jason.low2@hp.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1463534783-38814-2-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem-xadd.c | 41 ++++++++++++++++++++++-------------------
 kernel/locking/rwsem.c      |  8 ++++++--
 kernel/locking/rwsem.h      | 41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 63b40a5..6b0d060 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -163,6 +163,12 @@ __rwsem_mark_wake(struct rw_semaphore *sem,
 			/* Last active locker left. Retry waking readers. */
 			goto try_reader_grant;
 		}
+		/*
+		 * It is not really necessary to set it to reader-owned here,
+		 * but it gives the spinners an early indication that the
+		 * readers now have the lock.
+		 */
+		rwsem_set_reader_owned(sem);
 	}
 
 	/* Grant an infinite number of read locks to the readers at the front
@@ -325,16 +331,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
 	rcu_read_lock();
 	owner = READ_ONCE(sem->owner);
-	if (!owner) {
-		long count = atomic_long_read(&sem->count);
+	if (!rwsem_owner_is_writer(owner)) {
 		/*
-		 * If sem->owner is not set, yet we have just recently entered the
-		 * slowpath with the lock being active, then there is a possibility
-		 * reader(s) may have the lock. To be safe, bail spinning in these
-		 * situations.
+		 * Don't spin if the rwsem is readers owned.
 		 */
-		if (count & RWSEM_ACTIVE_MASK)
-			ret = false;
+		ret = !rwsem_owner_is_reader(owner);
 		goto done;
 	}
 
@@ -347,8 +348,6 @@ done:
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
-	long count;
-
 	rcu_read_lock();
 	while (sem->owner == owner) {
 		/*
@@ -369,16 +368,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 	}
 	rcu_read_unlock();
 
-	if (READ_ONCE(sem->owner))
-		return true; /* new owner, continue spinning */
-
 	/*
-	 * When the owner is not set, the lock could be free or
-	 * held by readers. Check the counter to verify the
-	 * state.
+	 * If there is a new owner or the owner is not set, we continue
+	 * spinning.
 	 */
-	count = atomic_long_read(&sem->count);
-	return (count == 0 || count == RWSEM_WAITING_BIAS);
+	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
@@ -397,7 +391,16 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 
 	while (true) {
 		owner = READ_ONCE(sem->owner);
-		if (owner && !rwsem_spin_on_owner(sem, owner))
+		/*
+		 * Don't spin if
+		 * 1) the owner is a reader as we can't determine if the
+		 *    reader is actively running or not.
+		 * 2) rwsem_spin_on_owner() returns false, which means
+		 *    the owner isn't running.
+		 */
+		if (rwsem_owner_is_reader(owner) ||
+		   (rwsem_owner_is_writer(owner) &&
+		   !rwsem_spin_on_owner(sem, owner)))
 			break;
 
 		/* wait_lock will be acquired if write_lock is obtained */
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 2e853ad..45ba475 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -22,6 +22,7 @@ void __sched down_read(struct rw_semaphore *sem)
 	rwsem_acquire_read(&sem->dep_map, 0, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read);
@@ -33,8 +34,10 @@ int down_read_trylock(struct rw_semaphore *sem)
 {
 	int ret = __down_read_trylock(sem);
 
-	if (ret == 1)
+	if (ret == 1) {
 		rwsem_acquire_read(&sem->dep_map, 0, 1, _RET_IP_);
+		rwsem_set_reader_owned(sem);
+	}
 	return ret;
 }
 
@@ -124,7 +127,7 @@ void downgrade_write(struct rw_semaphore *sem)
 	 * lockdep: a downgraded write will live on as a write
 	 * dependency.
 	 */
-	rwsem_clear_owner(sem);
+	rwsem_set_reader_owned(sem);
 	__downgrade_write(sem);
 }
 
@@ -138,6 +141,7 @@ void down_read_nested(struct rw_semaphore *sem, int subclass)
 	rwsem_acquire_read(&sem->dep_map, subclass, 0, _RET_IP_);
 
 	LOCK_CONTENDED(sem, __down_read_trylock, __down_read);
+	rwsem_set_reader_owned(sem);
 }
 
 EXPORT_SYMBOL(down_read_nested);
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 870ed9a..8f43ba2 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -1,3 +1,20 @@
+/*
+ * The owner field of the rw_semaphore structure will be set to
+ * RWSEM_READ_OWNED when a reader grabs the lock. A writer will clear
+ * the owner field when it unlocks. A reader, on the other hand, will
+ * not touch the owner field when it unlocks.
+ *
+ * In essence, the owner field now has the following 3 states:
+ *  1) 0
+ *     - lock is free or the owner hasn't set the field yet
+ *  2) RWSEM_READER_OWNED
+ *     - lock is currently or previously owned by readers (lock is free
+ *       or not set by owner yet)
+ *  3) Other non-zero value
+ *     - a writer owns the lock
+ */
+#define RWSEM_READER_OWNED	((struct task_struct *)1UL)
+
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
@@ -9,6 +26,26 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 	sem->owner = NULL;
 }
 
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+	/*
+	 * We check the owner value first to make sure that we will only
+	 * do a write to the rwsem cacheline when it is really necessary
+	 * to minimize cacheline contention.
+	 */
+	if (sem->owner != RWSEM_READER_OWNED)
+		sem->owner = RWSEM_READER_OWNED;
+}
+
+static inline bool rwsem_owner_is_writer(struct task_struct *owner)
+{
+	return owner && owner != RWSEM_READER_OWNED;
+}
+
+static inline bool rwsem_owner_is_reader(struct task_struct *owner)
+{
+	return owner == RWSEM_READER_OWNED;
+}
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
@@ -17,4 +54,8 @@ static inline void rwsem_set_owner(struct rw_semaphore *sem)
 static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
 }
+
+static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
+{
+}
 #endif


* [tip:locking/core] locking/rwsem: Protect all writes to owner by WRITE_ONCE()
  2016-05-18  1:26 ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE() Waiman Long
  2016-05-18 14:04   ` Davidlohr Bueso
@ 2016-06-08 14:25   ` tip-bot for Waiman Long
  1 sibling, 0 replies; 31+ messages in thread
From: tip-bot for Waiman Long @ 2016-06-08 14:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, tglx, jason.low2, linux-kernel, doug.hatch, david,
	Waiman.Long, peter, torvalds, peterz, dave, akpm, paulmck,
	scott.norton, hpa

Commit-ID:  fb6a44f33be542fd81575ff93a4e8118d6a58592
Gitweb:     http://git.kernel.org/tip/fb6a44f33be542fd81575ff93a4e8118d6a58592
Author:     Waiman Long <Waiman.Long@hpe.com>
AuthorDate: Tue, 17 May 2016 21:26:20 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 8 Jun 2016 15:16:59 +0200

locking/rwsem: Protect all writes to owner by WRITE_ONCE()

Without using WRITE_ONCE(), the compiler can potentially break a
write into multiple smaller ones (store tearing). So a read from the
same data by another task concurrently may return a partial result.
This can result in a kernel crash if the data is a memory address
that is being dereferenced.

This patch changes all writes to rwsem->owner to use WRITE_ONCE()
to make sure that store tearing will not happen. READ_ONCE() may
not be needed for rwsem->owner as long as the value is only used for
comparison and not dereferencing.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1463534783-38814-3-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem.h | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index 8f43ba2..a699f40 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -16,14 +16,21 @@
 #define RWSEM_READER_OWNED	((struct task_struct *)1UL)
 
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
+/*
+ * All writes to owner are protected by WRITE_ONCE() to make sure that
+ * store tearing can't happen as optimistic spinners may read and use
+ * the owner value concurrently without lock. Read from owner, however,
+ * may not need READ_ONCE() as long as the pointer value is only used
+ * for comparison and isn't being dereferenced.
+ */
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
-	sem->owner = current;
+	WRITE_ONCE(sem->owner, current);
 }
 
 static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
-	sem->owner = NULL;
+	WRITE_ONCE(sem->owner, NULL);
 }
 
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
@@ -34,7 +41,7 @@ static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 	 * to minimize cacheline contention.
 	 */
 	if (sem->owner != RWSEM_READER_OWNED)
-		sem->owner = RWSEM_READER_OWNED;
+		WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
 }
 
 static inline bool rwsem_owner_is_writer(struct task_struct *owner)


* [tip:locking/core] locking/rwsem: Improve reader wakeup code
  2016-05-18  1:26 ` [PATCH v4 4/5] locking/rwsem: Improve reader wakeup code Waiman Long
@ 2016-06-08 14:25   ` tip-bot for Waiman Long
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Waiman Long @ 2016-06-08 14:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: doug.hatch, linux-kernel, jason.low2, akpm, Waiman.Long, paulmck,
	scott.norton, david, tglx, mingo, peterz, dave, torvalds, peter,
	hpa

Commit-ID:  bf7b4c472db44413251bcef79ca1f6bf1ec81475
Gitweb:     http://git.kernel.org/tip/bf7b4c472db44413251bcef79ca1f6bf1ec81475
Author:     Waiman Long <Waiman.Long@hpe.com>
AuthorDate: Tue, 17 May 2016 21:26:22 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 8 Jun 2016 15:17:00 +0200

locking/rwsem: Improve reader wakeup code

In __rwsem_do_wake(), the reader wakeup code will assume a writer
has stolen the lock if the active reader/writer count is not 0.
However, this is not as reliable an indicator as the original
"< RWSEM_WAITING_BIAS" check. If another reader is present, the code
will still break out and exit even if the writer is gone. This patch
changes it to check the same "< RWSEM_WAITING_BIAS" condition to
reduce the chance of false positives.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1463534783-38814-5-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem-xadd.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 6b0d060..4f1daf5 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -156,9 +156,14 @@ __rwsem_mark_wake(struct rw_semaphore *sem,
 		oldcount = atomic_long_add_return(adjustment, &sem->count) - adjustment;
 
 		if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
-			/* A writer stole the lock. Undo our reader grant. */
-			if (atomic_long_sub_return(adjustment, &sem->count) &
-						RWSEM_ACTIVE_MASK)
+			/*
+			 * If the count is still less than RWSEM_WAITING_BIAS
+			 * after removing the adjustment, it is assumed that
+			 * a writer has stolen the lock. We have to undo our
+			 * reader grant.
+			 */
+			if (atomic_long_add_return(-adjustment, &sem->count) <
+			    RWSEM_WAITING_BIAS)
 				goto out;
 			/* Last active locker left. Retry waking readers. */
 			goto try_reader_grant;


* [tip:locking/core] locking/rwsem: Streamline the rwsem_optimistic_spin() code
  2016-05-18  1:26 ` [PATCH v4 5/5] locking/rwsem: Streamline the rwsem_optimistic_spin() code Waiman Long
@ 2016-06-08 14:26   ` tip-bot for Waiman Long
  0 siblings, 0 replies; 31+ messages in thread
From: tip-bot for Waiman Long @ 2016-06-08 14:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: scott.norton, tglx, hpa, torvalds, linux-kernel, david,
	jason.low2, dave, akpm, doug.hatch, paulmck, peterz, Waiman.Long,
	peter, mingo

Commit-ID:  ddd0fa73c2b71c35de4fe7ae60a5f1a6cddc2cf0
Gitweb:     http://git.kernel.org/tip/ddd0fa73c2b71c35de4fe7ae60a5f1a6cddc2cf0
Author:     Waiman Long <Waiman.Long@hpe.com>
AuthorDate: Tue, 17 May 2016 21:26:23 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 8 Jun 2016 15:17:00 +0200

locking/rwsem: Streamline the rwsem_optimistic_spin() code

This patch moves the owner loading and checking code entirely inside
rwsem_spin_on_owner() to simplify the logic of the
rwsem_optimistic_spin() loop.

Suggested-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Peter Hurley <peter@hurleysoftware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1463534783-38814-6-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rwsem-xadd.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 4f1daf5..2031281 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -350,9 +350,16 @@ done:
 	return ret;
 }
 
-static noinline
-bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
+/*
+ * Return true only if we can still spin on the owner field of the rwsem.
+ */
+static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
+	struct task_struct *owner = READ_ONCE(sem->owner);
+
+	if (!rwsem_owner_is_writer(owner))
+		goto out;
+
 	rcu_read_lock();
 	while (sem->owner == owner) {
 		/*
@@ -372,7 +379,7 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 		cpu_relax_lowlatency();
 	}
 	rcu_read_unlock();
-
+out:
 	/*
 	 * If there is a new owner or the owner is not set, we continue
 	 * spinning.
@@ -382,7 +389,6 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
-	struct task_struct *owner;
 	bool taken = false;
 
 	preempt_disable();
@@ -394,21 +400,17 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	if (!osq_lock(&sem->osq))
 		goto done;
 
-	while (true) {
-		owner = READ_ONCE(sem->owner);
+	/*
+	 * Optimistically spin on the owner field and attempt to acquire the
+	 * lock whenever the owner changes. Spinning will be stopped when:
+	 *  1) the owning writer isn't running; or
+	 *  2) readers own the lock as we can't determine if they are
+	 *     actively running or not.
+	 */
+	while (rwsem_spin_on_owner(sem)) {
 		/*
-		 * Don't spin if
-		 * 1) the owner is a reader as we can't determine if the
-		 *    reader is actively running or not.
-		 * 2) rwsem_spin_on_owner() returns false, which means
-		 *    the owner isn't running.
+		 * Try to acquire the lock
 		 */
-		if (rwsem_owner_is_reader(owner) ||
-		   (rwsem_owner_is_writer(owner) &&
-		   !rwsem_spin_on_owner(sem, owner)))
-			break;
-
-		/* wait_lock will be acquired if write_lock is obtained */
 		if (rwsem_try_write_lock_unqueued(sem)) {
 			taken = true;
 			break;
@@ -420,7 +422,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * we're an RT task that will live-lock because we won't let
 		 * the owner complete.
 		 */
-		if (!owner && (need_resched() || rt_task(current)))
+		if (!sem->owner && (need_resched() || rt_task(current)))
 			break;
 
 		/*


end of thread, other threads:[~2016-06-08 14:27 UTC | newest]

Thread overview: 31+ messages
2016-05-18  1:26 [PATCH v4 0/5] [PATCH v3 0/4] locking/rwsem: Add reader-owned state to the owner field Waiman Long
2016-05-18  1:26 ` [PATCH v4 1/5] " Waiman Long
2016-06-06 17:18   ` Davidlohr Bueso
2016-06-06 20:03     ` Waiman Long
2016-06-06 21:02     ` Peter Zijlstra
2016-06-06 21:49       ` Waiman Long
2016-06-08 14:25   ` [tip:locking/core] " tip-bot for Waiman Long
2016-05-18  1:26 ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE() Waiman Long
2016-05-18 14:04   ` Davidlohr Bueso
2016-05-18 17:21     ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE Jason Low
2016-05-18 18:29       ` Waiman Long
2016-05-18 19:58         ` Jason Low
2016-05-19 22:21           ` Jason Low
2016-05-20 20:26             ` Waiman Long
2016-05-21 16:04           ` Peter Hurley
2016-05-22 10:42             ` Peter Zijlstra
2016-05-23 18:46             ` Jason Low
2016-05-23 19:44               ` Davidlohr Bueso
2016-05-23 20:15                 ` Paul E. McKenney
2016-05-23 21:04                   ` Davidlohr Bueso
2016-05-25  1:25               ` Waiman Long
2016-05-18 17:23     ` [PATCH v4 2/5] locking/rwsem: Protect all writes to owner by WRITE_ONCE() Jason Low
2016-06-08 14:25   ` [tip:locking/core] " tip-bot for Waiman Long
2016-05-18  1:26 ` [PATCH v4 3/5] locking/rwsem: Don't wake up one's own task Waiman Long
2016-05-18 10:30   ` Peter Zijlstra
2016-05-18 16:04     ` Waiman Long
2016-05-18  1:26 ` [PATCH v4 4/5] locking/rwsem: Improve reader wakeup code Waiman Long
2016-06-08 14:25   ` [tip:locking/core] " tip-bot for Waiman Long
2016-05-18  1:26 ` [PATCH v4 5/5] locking/rwsem: Streamline the rwsem_optimistic_spin() code Waiman Long
2016-06-08 14:26   ` [tip:locking/core] " tip-bot for Waiman Long
2016-05-18 10:52 ` [PATCH v4 0/5] [PATCH v3 0/4] locking/rwsem: Add reader-owned state to the owner field Peter Zijlstra
