linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] locking/rwsem: Make owner store task pointer of last owning reader
@ 2018-09-06 20:18 Waiman Long
  2018-09-10  9:31 ` Peter Zijlstra
  2018-09-10 10:54 ` [tip:locking/core] " tip-bot for Waiman Long
  0 siblings, 2 replies; 10+ messages in thread
From: Waiman Long @ 2018-09-06 20:18 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon
  Cc: linux-kernel, Davidlohr Bueso, Waiman Long

Currently, when a reader acquires a lock, it only sets the
RWSEM_READER_OWNED bit in the owner field. The other bits are simply
not used. When debugging hanging cases involving rwsems and readers,
the owner value does not provide much useful information at all.

This patch modifies the current behavior to always store the task_struct
pointer of the last rwsem-acquiring reader in a reader-owned rwsem. This
may be useful in debugging rwsem hanging cases especially if only one
reader is involved. However, the task in the owner field may not the
real owner or one of the real owners at all when the owner value is
examined, for example, in a crash dump. So it is just an additional
hint about the past history.

If CONFIG_DEBUG_RWSEMS is enabled, the owner field will be checked at
unlock time too to make sure the task pointer value is valid. That does
have a slight performance cost and so is only enabled as part of that
debug option.

From the performance point of view, it is expected that the changes
shouldn't have any noticeable performance impact. A rwsem microbenchmark
(with 48 worker threads and 1:1 reader/writer ratio) was ran on a
2-socket 24-core 48-thread Haswell system.  The locking rates on a
4.19-rc1 based kernel were as follows:

  1) Unpatched kernel:				543.3 kops/s
  2) Patched kernel:				549.2 kops/s
  3) Patched kernel (CONFIG_DEBUG_RWSEMS on):	546.6 kops/s

There was actually a slight increase in performance (1.1%) in this
particular case. Maybe it was caused by the elimination of a branch or
just a testing noise. Turning on the CONFIG_DEBUG_RWSEMS option also
had less than the expected impact on performance.

The least significant 2 bits of the owner value are now used to designate
the rwsem is readers owned and the owners are anonymous.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/rwsem.h       |  4 +-
 kernel/locking/rwsem-xadd.c |  2 +-
 kernel/locking/rwsem.c      |  7 ++--
 kernel/locking/rwsem.h      | 95 +++++++++++++++++++++++++++++++++------------
 4 files changed, 78 insertions(+), 30 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index ab93b6e..67dbb57 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -45,10 +45,10 @@ struct rw_semaphore {
 };
 
 /*
- * Setting bit 0 of the owner field with other non-zero bits will indicate
+ * Setting bit 1 of the owner field but not bit 0 will indicate
  * that the rwsem is writer-owned with an unknown owner.
  */
-#define RWSEM_OWNER_UNKNOWN	((struct task_struct *)-1L)
+#define RWSEM_OWNER_UNKNOWN	((struct task_struct *)-2L)
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 01fcb80..09b1800 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -180,7 +180,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		 * but it gives the spinners an early indication that the
 		 * readers now have the lock.
 		 */
-		rwsem_set_reader_owned(sem);
+		__rwsem_set_reader_owned(sem, waiter->task);
 	}
 
 	/*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 776308d..e586f0d 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -117,8 +117,9 @@ int down_write_trylock(struct rw_semaphore *sem)
 void up_read(struct rw_semaphore *sem)
 {
 	rwsem_release(&sem->dep_map, 1, _RET_IP_);
-	DEBUG_RWSEMS_WARN_ON(sem->owner != RWSEM_READER_OWNED);
+	DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED));
 
+	rwsem_clear_reader_owned(sem);
 	__up_read(sem);
 }
 
@@ -181,7 +182,7 @@ void down_read_non_owner(struct rw_semaphore *sem)
 	might_sleep();
 
 	__down_read(sem);
-	rwsem_set_reader_owned(sem);
+	__rwsem_set_reader_owned(sem, NULL);
 }
 
 EXPORT_SYMBOL(down_read_non_owner);
@@ -215,7 +216,7 @@ int __sched down_write_killable_nested(struct rw_semaphore *sem, int subclass)
 
 void up_read_non_owner(struct rw_semaphore *sem)
 {
-	DEBUG_RWSEMS_WARN_ON(sem->owner != RWSEM_READER_OWNED);
+	DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED));
 	__up_read(sem);
 }
 
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index b9d0e72..bad2bca 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -1,24 +1,30 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READER_OWNED when a reader grabs the lock. A writer will clear
- * the owner field when it unlocks. A reader, on the other hand, will
- * not touch the owner field when it unlocks.
+ * The least significant 2 bits of the owner value has the following
+ * meanings when set.
+ *  - RWSEM_READER_OWNED (bit 0): The rwsem is owned by readers
+ *  - RWSEM_ANONYMOUSLY_OWNED (bit 1): The rwsem is anonymously owned,
+ *    i.e. the owner(s) cannot be readily determined. It can be reader
+ *    owned or the owning writer is indeterminate.
  *
- * In essence, the owner field now has the following 4 states:
- *  1) 0
- *     - lock is free or the owner hasn't set the field yet
- *  2) RWSEM_READER_OWNED
- *     - lock is currently or previously owned by readers (lock is free
- *       or not set by owner yet)
- *  3) RWSEM_ANONYMOUSLY_OWNED bit set with some other bits set as well
- *     - lock is owned by an anonymous writer, so spinning on the lock
- *       owner should be disabled.
- *  4) Other non-zero value
- *     - a writer owns the lock and other writers can spin on the lock owner.
+ * When a writer acquires a rwsem, it puts its task_struct pointer
+ * into the owner field. It is cleared after an unlock.
+ *
+ * When a reader acquires a rwsem, it will also puts its task_struct
+ * pointer into the owner field with both the RWSEM_READER_OWNED and
+ * RWSEM_ANONYMOUSLY_OWNED bits set. On unlock, the owner field will
+ * largely be left untouched. So for a free or reader-owned rwsem,
+ * the owner value may contain information about the last reader that
+ * acquires the rwsem. The anonymous bit is set because that particular
+ * reader may or may not still own the lock.
+ *
+ * That information may be helpful in debugging cases where the system
+ * seems to hang on a reader owned rwsem especially if only one reader
+ * is involved. Ideally we would like to track all the readers that own
+ * a rwsem, but the overhead is simply too big.
  */
-#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 0)
-#define RWSEM_READER_OWNED	((struct task_struct *)RWSEM_ANONYMOUSLY_OWNED)
+#define RWSEM_READER_OWNED	(1UL << 0)
+#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 1)
 
 #ifdef CONFIG_DEBUG_RWSEMS
 # define DEBUG_RWSEMS_WARN_ON(c)	DEBUG_LOCKS_WARN_ON(c)
@@ -44,15 +50,26 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 	WRITE_ONCE(sem->owner, NULL);
 }
 
+/*
+ * The task_struct pointer of the last owning reader will be left in
+ * the owner field.
+ *
+ * Note that the owner value just indicates the task has owned the rwsem
+ * previously, it may not be the real owner or one of the real owners
+ * anymore when that field is examined, so take it with a grain of salt.
+ */
+static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
+					    struct task_struct *owner)
+{
+	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED
+						 | RWSEM_ANONYMOUSLY_OWNED;
+
+	WRITE_ONCE(sem->owner, (struct task_struct *)val);
+}
+
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
-	/*
-	 * We check the owner value first to make sure that we will only
-	 * do a write to the rwsem cacheline when it is really necessary
-	 * to minimize cacheline contention.
-	 */
-	if (READ_ONCE(sem->owner) != RWSEM_READER_OWNED)
-		WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
+	__rwsem_set_reader_owned(sem, current);
 }
 
 /*
@@ -72,6 +89,25 @@ static inline bool rwsem_has_anonymous_owner(struct task_struct *owner)
 {
 	return (unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED;
 }
+
+#ifdef CONFIG_DEBUG_RWSEMS
+/*
+ * With CONFIG_DEBUG_RWSEMS configured, it will make sure that if there
+ * is a task pointer in owner of a reader-owned rwsem, it will be the
+ * real owner or one of the real owners. The only exception is when the
+ * unlock is done by up_read_non_owner().
+ */
+#define rwsem_clear_reader_owned rwsem_clear_reader_owned
+static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem)
+{
+	unsigned long val = (unsigned long)current | RWSEM_READER_OWNED
+						   | RWSEM_ANONYMOUSLY_OWNED;
+	if (READ_ONCE(sem->owner) == (struct task_struct *)val)
+		cmpxchg_relaxed((unsigned long *)&sem->owner, val,
+				RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED);
+}
+#endif
+
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
@@ -81,7 +117,18 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
 }
 
+static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
+					   struct task_struct *owner)
+{
+}
+
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
 }
 #endif
+
+#ifndef rwsem_clear_reader_owned
+static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem)
+{
+}
+#endif
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] locking/rwsem: Make owner store task pointer of last owning reader
  2018-09-06 20:18 [PATCH] locking/rwsem: Make owner store task pointer of last owning reader Waiman Long
@ 2018-09-10  9:31 ` Peter Zijlstra
  2018-09-10 15:01   ` Waiman Long
  2018-09-11 19:21   ` Davidlohr Bueso
  2018-09-10 10:54 ` [tip:locking/core] " tip-bot for Waiman Long
  1 sibling, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2018-09-10  9:31 UTC (permalink / raw)
  To: Waiman Long; +Cc: Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso

On Thu, Sep 06, 2018 at 04:18:34PM -0400, Waiman Long wrote:
> Currently, when a reader acquires a lock, it only sets the
> RWSEM_READER_OWNED bit in the owner field. The other bits are simply
> not used. When debugging hanging cases involving rwsems and readers,
> the owner value does not provide much useful information at all.
> 
> This patch modifies the current behavior to always store the task_struct
> pointer of the last rwsem-acquiring reader in a reader-owned rwsem. This
> may be useful in debugging rwsem hanging cases especially if only one
> reader is involved. However, the task in the owner field may not the
> real owner or one of the real owners at all when the owner value is
> examined, for example, in a crash dump. So it is just an additional
> hint about the past history.
> 
> If CONFIG_DEBUG_RWSEMS is enabled, the owner field will be checked at
> unlock time too to make sure the task pointer value is valid. That does
> have a slight performance cost and so is only enabled as part of that
> debug option.
> 
> From the performance point of view, it is expected that the changes
> shouldn't have any noticeable performance impact. A rwsem microbenchmark
> (with 48 worker threads and 1:1 reader/writer ratio) was ran on a
> 2-socket 24-core 48-thread Haswell system.  The locking rates on a
> 4.19-rc1 based kernel were as follows:
> 
>   1) Unpatched kernel:				543.3 kops/s
>   2) Patched kernel:				549.2 kops/s
>   3) Patched kernel (CONFIG_DEBUG_RWSEMS on):	546.6 kops/s
> 
> There was actually a slight increase in performance (1.1%) in this
> particular case. Maybe it was caused by the elimination of a branch or
> just a testing noise. Turning on the CONFIG_DEBUG_RWSEMS option also
> had less than the expected impact on performance.
> 
> The least significant 2 bits of the owner value are now used to designate
> the rwsem is readers owned and the owners are anonymous.

So no immediate objection; but I'm still hoping to some day rewrite the
whole rwsem thing along the lines we did mutex and merge the 'count' and
'owner' fields into one.

[ RT has something along those lines, and I have half a patch that
  starts doing that too, but I never found enough time to really work on
  it :-( ]

Anyway, when we do something like that, this goes out the window of
course.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip:locking/core] locking/rwsem: Make owner store task pointer of last owning reader
  2018-09-06 20:18 [PATCH] locking/rwsem: Make owner store task pointer of last owning reader Waiman Long
  2018-09-10  9:31 ` Peter Zijlstra
@ 2018-09-10 10:54 ` tip-bot for Waiman Long
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Waiman Long @ 2018-09-10 10:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, torvalds, a.p.zijlstra, will.deacon, linux-kernel, hpa,
	longman, mingo, dave, tglx

Commit-ID:  925b9cd1b89a94b7124d128c80dfc48f78a63098
Gitweb:     https://git.kernel.org/tip/925b9cd1b89a94b7124d128c80dfc48f78a63098
Author:     Waiman Long <longman@redhat.com>
AuthorDate: Thu, 6 Sep 2018 16:18:34 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 10 Sep 2018 12:04:07 +0200

locking/rwsem: Make owner store task pointer of last owning reader

Currently, when a reader acquires a lock, it only sets the
RWSEM_READER_OWNED bit in the owner field. The other bits are simply
not used. When debugging hanging cases involving rwsems and readers,
the owner value does not provide much useful information at all.

This patch modifies the current behavior to always store the task_struct
pointer of the last rwsem-acquiring reader in a reader-owned rwsem. This
may be useful in debugging rwsem hanging cases especially if only one
reader is involved. However, the task in the owner field may not the
real owner or one of the real owners at all when the owner value is
examined, for example, in a crash dump. So it is just an additional
hint about the past history.

If CONFIG_DEBUG_RWSEMS=y is enabled, the owner field will be checked at
unlock time too to make sure the task pointer value is valid. That does
have a slight performance cost and so is only enabled as part of that
debug option.

From the performance point of view, it is expected that the changes
shouldn't have any noticeable performance impact. A rwsem microbenchmark
(with 48 worker threads and 1:1 reader/writer ratio) was ran on a
2-socket 24-core 48-thread Haswell system.  The locking rates on a
4.19-rc1 based kernel were as follows:

  1) Unpatched kernel:				543.3 kops/s
  2) Patched kernel:				549.2 kops/s
  3) Patched kernel (CONFIG_DEBUG_RWSEMS on):	546.6 kops/s

There was actually a slight increase in performance (1.1%) in this
particular case. Maybe it was caused by the elimination of a branch or
just a testing noise. Turning on the CONFIG_DEBUG_RWSEMS option also
had less than the expected impact on performance.

The least significant 2 bits of the owner value are now used to designate
the rwsem is readers owned and the owners are anonymous.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1536265114-10842-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/rwsem.h       |  4 +-
 kernel/locking/rwsem-xadd.c |  2 +-
 kernel/locking/rwsem.c      |  7 ++--
 kernel/locking/rwsem.h      | 95 +++++++++++++++++++++++++++++++++------------
 4 files changed, 78 insertions(+), 30 deletions(-)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index ab93b6eae696..67dbb57508b1 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -45,10 +45,10 @@ struct rw_semaphore {
 };
 
 /*
- * Setting bit 0 of the owner field with other non-zero bits will indicate
+ * Setting bit 1 of the owner field but not bit 0 will indicate
  * that the rwsem is writer-owned with an unknown owner.
  */
-#define RWSEM_OWNER_UNKNOWN	((struct task_struct *)-1L)
+#define RWSEM_OWNER_UNKNOWN	((struct task_struct *)-2L)
 
 extern struct rw_semaphore *rwsem_down_read_failed(struct rw_semaphore *sem);
 extern struct rw_semaphore *rwsem_down_read_failed_killable(struct rw_semaphore *sem);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 01fcb807598c..09b180063ee1 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -180,7 +180,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		 * but it gives the spinners an early indication that the
 		 * readers now have the lock.
 		 */
-		rwsem_set_reader_owned(sem);
+		__rwsem_set_reader_owned(sem, waiter->task);
 	}
 
 	/*
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 776308d2fa9e..e586f0d03ad3 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -117,8 +117,9 @@ EXPORT_SYMBOL(down_write_trylock);
 void up_read(struct rw_semaphore *sem)
 {
 	rwsem_release(&sem->dep_map, 1, _RET_IP_);
-	DEBUG_RWSEMS_WARN_ON(sem->owner != RWSEM_READER_OWNED);
+	DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED));
 
+	rwsem_clear_reader_owned(sem);
 	__up_read(sem);
 }
 
@@ -181,7 +182,7 @@ void down_read_non_owner(struct rw_semaphore *sem)
 	might_sleep();
 
 	__down_read(sem);
-	rwsem_set_reader_owned(sem);
+	__rwsem_set_reader_owned(sem, NULL);
 }
 
 EXPORT_SYMBOL(down_read_non_owner);
@@ -215,7 +216,7 @@ EXPORT_SYMBOL(down_write_killable_nested);
 
 void up_read_non_owner(struct rw_semaphore *sem)
 {
-	DEBUG_RWSEMS_WARN_ON(sem->owner != RWSEM_READER_OWNED);
+	DEBUG_RWSEMS_WARN_ON(!((unsigned long)sem->owner & RWSEM_READER_OWNED));
 	__up_read(sem);
 }
 
diff --git a/kernel/locking/rwsem.h b/kernel/locking/rwsem.h
index b9d0e72aa80f..bad2bca0268b 100644
--- a/kernel/locking/rwsem.h
+++ b/kernel/locking/rwsem.h
@@ -1,24 +1,30 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 /*
- * The owner field of the rw_semaphore structure will be set to
- * RWSEM_READER_OWNED when a reader grabs the lock. A writer will clear
- * the owner field when it unlocks. A reader, on the other hand, will
- * not touch the owner field when it unlocks.
+ * The least significant 2 bits of the owner value has the following
+ * meanings when set.
+ *  - RWSEM_READER_OWNED (bit 0): The rwsem is owned by readers
+ *  - RWSEM_ANONYMOUSLY_OWNED (bit 1): The rwsem is anonymously owned,
+ *    i.e. the owner(s) cannot be readily determined. It can be reader
+ *    owned or the owning writer is indeterminate.
  *
- * In essence, the owner field now has the following 4 states:
- *  1) 0
- *     - lock is free or the owner hasn't set the field yet
- *  2) RWSEM_READER_OWNED
- *     - lock is currently or previously owned by readers (lock is free
- *       or not set by owner yet)
- *  3) RWSEM_ANONYMOUSLY_OWNED bit set with some other bits set as well
- *     - lock is owned by an anonymous writer, so spinning on the lock
- *       owner should be disabled.
- *  4) Other non-zero value
- *     - a writer owns the lock and other writers can spin on the lock owner.
+ * When a writer acquires a rwsem, it puts its task_struct pointer
+ * into the owner field. It is cleared after an unlock.
+ *
+ * When a reader acquires a rwsem, it will also puts its task_struct
+ * pointer into the owner field with both the RWSEM_READER_OWNED and
+ * RWSEM_ANONYMOUSLY_OWNED bits set. On unlock, the owner field will
+ * largely be left untouched. So for a free or reader-owned rwsem,
+ * the owner value may contain information about the last reader that
+ * acquires the rwsem. The anonymous bit is set because that particular
+ * reader may or may not still own the lock.
+ *
+ * That information may be helpful in debugging cases where the system
+ * seems to hang on a reader owned rwsem especially if only one reader
+ * is involved. Ideally we would like to track all the readers that own
+ * a rwsem, but the overhead is simply too big.
  */
-#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 0)
-#define RWSEM_READER_OWNED	((struct task_struct *)RWSEM_ANONYMOUSLY_OWNED)
+#define RWSEM_READER_OWNED	(1UL << 0)
+#define RWSEM_ANONYMOUSLY_OWNED	(1UL << 1)
 
 #ifdef CONFIG_DEBUG_RWSEMS
 # define DEBUG_RWSEMS_WARN_ON(c)	DEBUG_LOCKS_WARN_ON(c)
@@ -44,15 +50,26 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 	WRITE_ONCE(sem->owner, NULL);
 }
 
+/*
+ * The task_struct pointer of the last owning reader will be left in
+ * the owner field.
+ *
+ * Note that the owner value just indicates the task has owned the rwsem
+ * previously, it may not be the real owner or one of the real owners
+ * anymore when that field is examined, so take it with a grain of salt.
+ */
+static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
+					    struct task_struct *owner)
+{
+	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED
+						 | RWSEM_ANONYMOUSLY_OWNED;
+
+	WRITE_ONCE(sem->owner, (struct task_struct *)val);
+}
+
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
-	/*
-	 * We check the owner value first to make sure that we will only
-	 * do a write to the rwsem cacheline when it is really necessary
-	 * to minimize cacheline contention.
-	 */
-	if (READ_ONCE(sem->owner) != RWSEM_READER_OWNED)
-		WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
+	__rwsem_set_reader_owned(sem, current);
 }
 
 /*
@@ -72,6 +89,25 @@ static inline bool rwsem_has_anonymous_owner(struct task_struct *owner)
 {
 	return (unsigned long)owner & RWSEM_ANONYMOUSLY_OWNED;
 }
+
+#ifdef CONFIG_DEBUG_RWSEMS
+/*
+ * With CONFIG_DEBUG_RWSEMS configured, it will make sure that if there
+ * is a task pointer in owner of a reader-owned rwsem, it will be the
+ * real owner or one of the real owners. The only exception is when the
+ * unlock is done by up_read_non_owner().
+ */
+#define rwsem_clear_reader_owned rwsem_clear_reader_owned
+static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem)
+{
+	unsigned long val = (unsigned long)current | RWSEM_READER_OWNED
+						   | RWSEM_ANONYMOUSLY_OWNED;
+	if (READ_ONCE(sem->owner) == (struct task_struct *)val)
+		cmpxchg_relaxed((unsigned long *)&sem->owner, val,
+				RWSEM_READER_OWNED | RWSEM_ANONYMOUSLY_OWNED);
+}
+#endif
+
 #else
 static inline void rwsem_set_owner(struct rw_semaphore *sem)
 {
@@ -81,7 +117,18 @@ static inline void rwsem_clear_owner(struct rw_semaphore *sem)
 {
 }
 
+static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
+					   struct task_struct *owner)
+{
+}
+
 static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
 {
 }
 #endif
+
+#ifndef rwsem_clear_reader_owned
+static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem)
+{
+}
+#endif

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] locking/rwsem: Make owner store task pointer of last owning reader
  2018-09-10  9:31 ` Peter Zijlstra
@ 2018-09-10 15:01   ` Waiman Long
  2018-09-10 17:15     ` Davidlohr Bueso
  2018-09-11 19:21   ` Davidlohr Bueso
  1 sibling, 1 reply; 10+ messages in thread
From: Waiman Long @ 2018-09-10 15:01 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso

On 09/10/2018 05:31 AM, Peter Zijlstra wrote:
> On Thu, Sep 06, 2018 at 04:18:34PM -0400, Waiman Long wrote:
>> Currently, when a reader acquires a lock, it only sets the
>> RWSEM_READER_OWNED bit in the owner field. The other bits are simply
>> not used. When debugging hanging cases involving rwsems and readers,
>> the owner value does not provide much useful information at all.
>>
>> This patch modifies the current behavior to always store the task_struct
>> pointer of the last rwsem-acquiring reader in a reader-owned rwsem. This
>> may be useful in debugging rwsem hanging cases especially if only one
>> reader is involved. However, the task in the owner field may not the
>> real owner or one of the real owners at all when the owner value is
>> examined, for example, in a crash dump. So it is just an additional
>> hint about the past history.
>>
>> If CONFIG_DEBUG_RWSEMS is enabled, the owner field will be checked at
>> unlock time too to make sure the task pointer value is valid. That does
>> have a slight performance cost and so is only enabled as part of that
>> debug option.
>>
>> From the performance point of view, it is expected that the changes
>> shouldn't have any noticeable performance impact. A rwsem microbenchmark
>> (with 48 worker threads and 1:1 reader/writer ratio) was ran on a
>> 2-socket 24-core 48-thread Haswell system.  The locking rates on a
>> 4.19-rc1 based kernel were as follows:
>>
>>   1) Unpatched kernel:				543.3 kops/s
>>   2) Patched kernel:				549.2 kops/s
>>   3) Patched kernel (CONFIG_DEBUG_RWSEMS on):	546.6 kops/s
>>
>> There was actually a slight increase in performance (1.1%) in this
>> particular case. Maybe it was caused by the elimination of a branch or
>> just a testing noise. Turning on the CONFIG_DEBUG_RWSEMS option also
>> had less than the expected impact on performance.
>>
>> The least significant 2 bits of the owner value are now used to designate
>> the rwsem is readers owned and the owners are anonymous.
> So no immediate objection; but I'm still hoping to some day rewrite the
> whole rwsem thing along the lines we did mutex and merge the 'count' and
> 'owner' fields into one.
>
> [ RT has something along those lines, and I have half a patch that
>   starts doing that too, but I never found enough time to really work on
>   it :-( ]
>
> Anyway, when we do something like that, this goes out the window of
> course.

That may not be the case. I have also thought about merging owner and
count. For writer, it can write its owner value to the combined
count/owner. For readers, I think we will need to keep a reader count
and so cannot store its owner value there. In that case, we could have a
debug-only field for the reader-owner value.

One major issue with a combined count/owner is that we may have to use
cmpxchg for reader lock which will certainly impact reader-heavy
workloads. I have also thought about ways to compress the task pointer
address so that it can use fewer bits and leave the rests for reader
count. It is probably doable on 64-bit systems, but likely not on 32-bit
system given that there are less bits to play around.

I would certainly like to hear your thought on how to handle these
dilemmas as I may be able to find time to work on that. I had a reader
optimistic spinning patch that was sent out a while ago. I know it won't
get serious time for review until we can resolve this combine
owner/count issue first.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] locking/rwsem: Make owner store task pointer of last owning reader
  2018-09-10 15:01   ` Waiman Long
@ 2018-09-10 17:15     ` Davidlohr Bueso
  2018-09-10 17:35       ` Waiman Long
  2018-09-11  8:17       ` Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread
From: Davidlohr Bueso @ 2018-09-10 17:15 UTC (permalink / raw)
  To: Waiman Long; +Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On Mon, 10 Sep 2018, Waiman Long wrote:

>One major issue with a combined count/owner is that we may have to use
>cmpxchg for reader lock which will certainly impact reader-heavy
>workloads. I have also thought about ways to compress the task pointer
>address so that it can use fewer bits and leave the rests for reader
>count. It is probably doable on 64-bit systems, but likely not on 32-bit
>system given that there are less bits to play around.

Yeah we've discussed this before. As a cleanup it would obviously be good,
but I fear about raw performance loss when using cmpxchg instead of xadd.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] locking/rwsem: Make owner store task pointer of last owning reader
  2018-09-10 17:15     ` Davidlohr Bueso
@ 2018-09-10 17:35       ` Waiman Long
  2018-09-11  8:17       ` Peter Zijlstra
  1 sibling, 0 replies; 10+ messages in thread
From: Waiman Long @ 2018-09-10 17:35 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel

On 09/10/2018 01:15 PM, Davidlohr Bueso wrote:
> On Mon, 10 Sep 2018, Waiman Long wrote:
>
>> One major issue with a combined count/owner is that we may have to use
>> cmpxchg for reader lock which will certainly impact reader-heavy
>> workloads. I have also thought about ways to compress the task pointer
>> address so that it can use fewer bits and leave the rests for reader
>> count. It is probably doable on 64-bit systems, but likely not on 32-bit
>> system given that there are less bits to play around.
>
> Yeah we've discussed this before. As a cleanup it would obviously be
> good,
> but I fear about raw performance loss when using cmpxchg instead of xadd. 

I don't think using cmpxchg for writers will be a performance issue.
However, using cmpxchg for readers will be. One thought that I have is
to use the tid of the writer on the lower half and use the upper half
for reader count. However, there is some overhead in translating tid to
task pointer. So I don't know how useful that may be. So we may still
need to keep the owner field around.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] locking/rwsem: Make owner store task pointer of last owning reader
  2018-09-10 17:15     ` Davidlohr Bueso
  2018-09-10 17:35       ` Waiman Long
@ 2018-09-11  8:17       ` Peter Zijlstra
  2018-09-11 12:56         ` Waiman Long
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2018-09-11  8:17 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel

On Mon, Sep 10, 2018 at 10:15:50AM -0700, Davidlohr Bueso wrote:
> On Mon, 10 Sep 2018, Waiman Long wrote:
> 
> > One major issue with a combined count/owner is that we may have to use
> > cmpxchg for reader lock which will certainly impact reader-heavy
> > workloads. I have also thought about ways to compress the task pointer
> > address so that it can use fewer bits and leave the rests for reader
> > count. It is probably doable on 64-bit systems, but likely not on 32-bit
> > system given that there are less bits to play around.
> 
> Yeah we've discussed this before. As a cleanup it would obviously be good,
> but I fear about raw performance loss when using cmpxchg instead of xadd.

Does it really matter though? Last time I looked at something similar
(refcount_t) the "LOCK INCL" vs "LOCK CMPXCHG" was something like 15 vs
23 cycles (and that was with the cmpxchg loop actually doing a lot
more).

Do we really care about the down_read() path _that_ much? I thought that
with the main pain point, pagefaults, the problem was mostly the line
bouncing, not a few extra cycles.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] locking/rwsem: Make owner store task pointer of last owning reader
  2018-09-11  8:17       ` Peter Zijlstra
@ 2018-09-11 12:56         ` Waiman Long
  0 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2018-09-11 12:56 UTC (permalink / raw)
  To: Peter Zijlstra, Davidlohr Bueso; +Cc: Ingo Molnar, Will Deacon, linux-kernel

On 09/11/2018 04:17 AM, Peter Zijlstra wrote:
> On Mon, Sep 10, 2018 at 10:15:50AM -0700, Davidlohr Bueso wrote:
>> On Mon, 10 Sep 2018, Waiman Long wrote:
>>
>>> One major issue with a combined count/owner is that we may have to use
>>> cmpxchg for reader lock which will certainly impact reader-heavy
>>> workloads. I have also thought about ways to compress the task pointer
>>> address so that it can use fewer bits and leave the rests for reader
>>> count. It is probably doable on 64-bit systems, but likely not on 32-bit
>>> system given that there are less bits to play around.
>> Yeah we've discussed this before. As a cleanup it would obviously be good,
>> but I fear about raw performance loss when using cmpxchg instead of xadd.
> Does it really matter though? Last time I looked at something similar
> (refcount_t) the "LOCK INCL" vs "LOCK CMPXCHG" was something like 15 vs
> 23 cycles (and that was with the cmpxchg loop actually doing a lot
> more).
>
> Do we really care about the down_read() path _that_ much? I thought that
> with the main pain point, pagefaults, the problem was mostly the line
> bouncing, not a few extra cycles.

It is not simply that cmpxchg needs more cycles to run, it is the fact
that a few retries may be necessary to actually increment the count in
case of contention leading to more RMW cycles than with xadd. I am not
saying that we can't do that, but there is a price we need to pay.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] locking/rwsem: Make owner store task pointer of last owning reader
  2018-09-10  9:31 ` Peter Zijlstra
  2018-09-10 15:01   ` Waiman Long
@ 2018-09-11 19:21   ` Davidlohr Bueso
  2018-09-11 19:38     ` Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: Davidlohr Bueso @ 2018-09-11 19:21 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel

On Mon, 10 Sep 2018, Peter Zijlstra wrote:

>[ RT has something along those lines, and I have half a patch that
>  starts doing that too, but I never found enough time to really work on
>  it :-( ]

But don't rt rwsems act pretty much like a regular mutex in that it only
allows readers to nest, not share the lock because of priority inversion
scenarios.

Also, wrt rtmutex, will that top-spinner thing ever make it in?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] locking/rwsem: Make owner store task pointer of last owning reader
  2018-09-11 19:21   ` Davidlohr Bueso
@ 2018-09-11 19:38     ` Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2018-09-11 19:38 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Waiman Long, Ingo Molnar, Will Deacon, linux-kernel

On Tue, Sep 11, 2018 at 12:21:22PM -0700, Davidlohr Bueso wrote:
> On Mon, 10 Sep 2018, Peter Zijlstra wrote:
> 
> > [ RT has something along those lines, and I have half a patch that
> >  starts doing that too, but I never found enough time to really work on
> >  it :-( ]
> 
> But don't rt rwsems act pretty much like a regular mutex in that it only
> allows readers to nest, not share the lock because of priority inversion
> scenarios.
> 
> Also, wrt rtmutex, will that top-spinner thing ever make it in?

Nah, tglx rewrote that a while ago. It's something semi clever now.
Lemme see if I can find the patch.

  https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/kernel/locking/rwsem-rt.c?h=linux-4.18.y-rt&id=1a42d0b3c3f34895cb5d63e1cd2be02f6077919b

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-09-11 19:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-06 20:18 [PATCH] locking/rwsem: Make owner store task pointer of last owning reader Waiman Long
2018-09-10  9:31 ` Peter Zijlstra
2018-09-10 15:01   ` Waiman Long
2018-09-10 17:15     ` Davidlohr Bueso
2018-09-10 17:35       ` Waiman Long
2018-09-11  8:17       ` Peter Zijlstra
2018-09-11 12:56         ` Waiman Long
2018-09-11 19:21   ` Davidlohr Bueso
2018-09-11 19:38     ` Peter Zijlstra
2018-09-10 10:54 ` [tip:locking/core] " tip-bot for Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).