From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org, linux-xtensa@linux-xtensa.org,
	Davidlohr Bueso <dave@stgolabs.net>,
	linux-ia64@vger.kernel.org, Tim Chen <tim.c.chen@linux.intel.com>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-sh@vger.kernel.org, linux-hexagon@vger.kernel.org,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Borislav Petkov <bp@alien8.de>,
	linux-alpha@vger.kernel.org, sparclinux@vger.kernel.org,
	Waiman Long <longman@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
Subject: [PATCH-tip 20/22] locking/rwsem: Enable count-based spinning on reader
Date: Thu,  7 Feb 2019 14:07:24 -0500
Message-ID: <1549566446-27967-21-git-send-email-longman@redhat.com>
In-Reply-To: <1549566446-27967-1-git-send-email-longman@redhat.com>

When the rwsem is owned by readers, writers stop optimistic spinning
simply because there is no easy way to figure out if all the readers
are actively running or not. However, there are scenarios where
the readers are unlikely to sleep and optimistic spinning can help
performance.

This patch provides a simple mechanism for spinning on a reader-owned
rwsem. The spinning is bounded by a loop count threshold; the count is
reset whenever the rwsem reader count value changes, indicating that
the rwsem is still active. A separate maximum count limits the total
number of spins that can happen.

When the loop or max count reaches 0, a bit will be set in the owner
field to indicate that no more optimistic spinning should be done on
this rwsem until it becomes writer-owned again. Not even readers
are allowed to acquire the reader-locked rwsem, for better fairness.

The spinning threshold and maximum values can be overridden by an
architecture-specific header file, if necessary. The current default
threshold value is 512 iterations and the default maximum is 4096
iterations.

With a locking microbenchmark running on a 5.0-based kernel, the total
locking rates (in kops/s) of the benchmark on a 4-socket 56-core x86-64
system with equal numbers of readers and writers before all the reader
spinning patches, before this patch, and after this patch were as follows:

   # of Threads  Pre-rspin    Pre-Patch   Post-patch
   ------------  ---------    ---------   ----------
        2          1,926        2,120        8,057
        4          1,391        1,320        7,680
        8            716          694        7,284
       16            618          606        6,542
       32            501          487        1,449
       64             61           57          480

This patch gives a big boost in performance for mixed reader/writer
workloads.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/lock_events_list.h |  1 +
 kernel/locking/rwsem-xadd.c       | 63 +++++++++++++++++++++++++++++++++++----
 kernel/locking/rwsem-xadd.h       | 45 +++++++++++++++++++++-------
 3 files changed, 94 insertions(+), 15 deletions(-)

diff --git a/kernel/locking/lock_events_list.h b/kernel/locking/lock_events_list.h
index 54b6650..0052534 100644
--- a/kernel/locking/lock_events_list.h
+++ b/kernel/locking/lock_events_list.h
@@ -60,6 +60,7 @@
 LOCK_EVENT(rwsem_opt_rlock)	/* # of read locks opt-spin acquired	*/
 LOCK_EVENT(rwsem_opt_wlock)	/* # of write locks opt-spin acquired	*/
 LOCK_EVENT(rwsem_opt_fail)	/* # of failed opt-spinnings		*/
+LOCK_EVENT(rwsem_opt_nospin)	/* # of disabled reader opt-spinnings	*/
 LOCK_EVENT(rwsem_rlock)		/* # of read locks acquired		*/
 LOCK_EVENT(rwsem_rlock_fast)	/* # of fast read locks acquired	*/
 LOCK_EVENT(rwsem_rlock_fail)	/* # of failed read lock acquisitions	*/
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 015edd6..3beb942 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -95,6 +95,22 @@ enum rwsem_wake_type {
 #define RWSEM_WAIT_TIMEOUT	((HZ - 1)/200 + 1)
 
 /*
+ * Reader-owned rwsem spinning threshold and maximum value
+ *
+ * This threshold and maximum values can be overridden by architecture
+ * specific value. The loop count will be reset whenever the rwsem count
+ * value changes. The max value constrains the total number of reader-owned
+ * lock spinnings that can happen.
+ */
+#ifdef	ARCH_RWSEM_RSPIN_THRESHOLD
+# define RWSEM_RSPIN_THRESHOLD	ARCH_RWSEM_RSPIN_THRESHOLD
+# define RWSEM_RSPIN_MAX	ARCH_RWSEM_RSPIN_MAX
+#else
+# define RWSEM_RSPIN_THRESHOLD	(1 << 9)
+# define RWSEM_RSPIN_MAX	(1 << 12)
+#endif
+
+/*
  * handle the lock release when processes blocked on it that can now run
  * - if we come here from up_xxxx(), then the RWSEM_FLAG_WAITERS bit must
  *   have been set.
@@ -324,7 +340,7 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 	owner = rwsem_get_owner(sem);
 	if (owner) {
 		ret = is_rwsem_owner_spinnable(owner) &&
-		      owner_on_cpu(owner, sem);
+		     (is_rwsem_owner_reader(owner) || owner_on_cpu(owner, sem));
 	}
 	rcu_read_unlock();
 	lockevent_cond_inc(rwsem_opt_fail, !ret);
@@ -359,7 +375,8 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 	 * This enables the spinner to move forward and do a trylock
 	 * earlier.
 	 */
-	while (owner && (READ_ONCE(sem->owner) == owner)) {
+	while (owner && !is_rwsem_owner_reader(owner)
+		     && (READ_ONCE(sem->owner) == owner)) {
 		/*
 		 * Ensure we emit the owner->on_cpu, dereference _after_
 		 * checking sem->owner still matches owner, if that fails,
@@ -394,6 +411,10 @@ static noinline enum owner_state rwsem_spin_on_owner(struct rw_semaphore *sem)
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock)
 {
 	bool taken = false;
+	enum owner_state owner_state;
+	int rspin_cnt = RWSEM_RSPIN_THRESHOLD;
+	int rspin_max = RWSEM_RSPIN_MAX;
+	int old_rcount = 0;
 
 	preempt_disable();
 
@@ -401,14 +422,16 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock)
 	if (!osq_lock(&sem->osq))
 		goto done;
 
+	if (!is_rwsem_spinnable(sem))
+		rspin_cnt = 0;
+
 	/*
 	 * Optimistically spin on the owner field and attempt to acquire the
 	 * lock whenever the owner changes. Spinning will be stopped when:
 	 *  1) the owning writer isn't running; or
-	 *  2) readers own the lock as we can't determine if they are
-	 *     actively running or not.
+	 *  2) readers own the lock and spinning count has reached 0.
 	 */
-	while (rwsem_spin_on_owner(sem) == OWNER_SPINNABLE) {
+	while ((owner_state = rwsem_spin_on_owner(sem)) != OWNER_NONSPINNABLE) {
 		/*
 		 * Try to acquire the lock
 		 */
@@ -429,6 +452,36 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, const long wlock)
 			break;
 
 		/*
+		 * We only decrement rspin_cnt when a writer is trying to
+		 * acquire a lock owned by readers. In which case,
+		 * rwsem_spin_on_owner() will essentially be a no-op
+		 * and we will be spinning in this main loop. The spinning
+		 * count will be reset whenever the rwsem count value
+		 * changes.
+		 */
+		if (wlock && (owner_state == OWNER_READER)) {
+			int rcount;
+
+			if (!rspin_cnt || !rspin_max) {
+				if (is_rwsem_spinnable(sem)) {
+					rwsem_set_nonspinnable(sem);
+					lockevent_inc(rwsem_opt_nospin);
+				}
+				break;
+			}
+
+			rcount = atomic_long_read(&sem->count)
+					>> RWSEM_READER_SHIFT;
+			if (rcount != old_rcount) {
+				old_rcount = rcount;
+				rspin_cnt = RWSEM_RSPIN_THRESHOLD;
+			} else {
+				rspin_cnt--;
+			}
+			rspin_max--;
+		}
+
+		/*
 		 * The cpu_relax() call is a compiler barrier which forces
 		 * everything in this loop to be re-loaded. We don't need
 		 * memory barriers as we'll eventually observe the right
diff --git a/kernel/locking/rwsem-xadd.h b/kernel/locking/rwsem-xadd.h
index eb4ef36..be67dbd 100644
--- a/kernel/locking/rwsem-xadd.h
+++ b/kernel/locking/rwsem-xadd.h
@@ -5,18 +5,20 @@
  *  - RWSEM_READER_OWNED (bit 0): The rwsem is owned by readers
  *  - RWSEM_ANONYMOUSLY_OWNED (bit 1): The rwsem is anonymously owned,
  *    i.e. the owner(s) cannot be readily determined. It can be reader
- *    owned or the owning writer is indeterminate.
+ *    owned or the owning writer is indeterminate. Optimistic spinning
+ *    should be disabled if this flag is set.
  *
  * When a writer acquires a rwsem, it puts its task_struct pointer
- * into the owner field. It is cleared after an unlock.
+ * into the owner field or the count itself (64-bit only). It should
+ * be cleared after an unlock.
  *
  * When a reader acquires a rwsem, it will also puts its task_struct
- * pointer into the owner field with both the RWSEM_READER_OWNED and
- * RWSEM_ANONYMOUSLY_OWNED bits set. On unlock, the owner field will
- * largely be left untouched. So for a free or reader-owned rwsem,
- * the owner value may contain information about the last reader that
- * acquires the rwsem. The anonymous bit is set because that particular
- * reader may or may not still own the lock.
+ * pointer into the owner field with the RWSEM_READER_OWNED bit set.
+ * On unlock, the owner field will largely be left untouched. So
+ * for a free or reader-owned rwsem, the owner value may contain
+ * information about the last reader that acquires the rwsem. The
+ * anonymous bit may also be set to permanently disable optimistic
+ * spinning on a reader-owned rwsem until a writer comes along.
  *
  * That information may be helpful in debugging cases where the system
  * seems to hang on a reader owned rwsem especially if only one reader
@@ -182,8 +184,7 @@ static inline struct task_struct *rwsem_get_owner(struct rw_semaphore *sem)
 static inline void __rwsem_set_reader_owned(struct rw_semaphore *sem,
 					    struct task_struct *owner)
 {
-	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED
-						 | RWSEM_ANONYMOUSLY_OWNED;
+	unsigned long val = (unsigned long)owner | RWSEM_READER_OWNED;
 
 	WRITE_ONCE(sem->owner, (struct task_struct *)val);
 }
@@ -209,6 +210,14 @@ static inline bool is_rwsem_owner_reader(struct task_struct *owner)
 }
 
 /*
+ * Return true if the rwsem is spinnable.
+ */
+static inline bool is_rwsem_spinnable(struct rw_semaphore *sem)
+{
+	return is_rwsem_owner_spinnable(READ_ONCE(sem->owner));
+}
+
+/*
  * Return true if the rwsem is owned by a reader.
  */
 static inline bool is_rwsem_reader_owned(struct rw_semaphore *sem)
@@ -226,6 +235,22 @@ static inline bool is_rwsem_reader_owned(struct rw_semaphore *sem)
 }
 
 /*
+ * Set the RWSEM_ANONYMOUSLY_OWNED flag if the RWSEM_READER_OWNED flag
+ * remains set. Otherwise, the operation will be aborted.
+ */
+static inline void rwsem_set_nonspinnable(struct rw_semaphore *sem)
+{
+	long owner = (long)READ_ONCE(sem->owner);
+
+	while (is_rwsem_owner_reader((struct task_struct *)owner)) {
+		if (!is_rwsem_owner_spinnable((struct task_struct *)owner))
+			break;
+		owner = cmpxchg((long *)&sem->owner, owner,
+				owner | RWSEM_ANONYMOUSLY_OWNED);
+	}
+}
+
+/*
  * Return true if rwsem is owned by an anonymous writer or readers.
  */
 static inline bool rwsem_has_anonymous_owner(struct task_struct *owner)
-- 
1.8.3.1


Thread overview: 44+ messages
2019-02-07 19:07 [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Waiman Long
2019-02-07 19:07 ` [PATCH-tip 01/22] locking/qspinlock_stat: Introduce a generic lockevent counting APIs Waiman Long
2019-02-07 19:07 ` [PATCH-tip 02/22] locking/lock_events: Make lock_events available for all archs & other locks Waiman Long
2019-02-07 19:07 ` [PATCH-tip 03/22] locking/rwsem: Relocate rwsem_down_read_failed() Waiman Long
2019-02-07 19:07 ` [PATCH-tip 04/22] locking/rwsem: Remove arch specific rwsem files Waiman Long
2019-02-07 19:36   ` Peter Zijlstra
2019-02-07 19:43     ` Waiman Long
2019-02-07 19:48     ` Peter Zijlstra
2019-02-07 19:07 ` [PATCH-tip 05/22] locking/rwsem: Move owner setting code from rwsem.c to rwsem.h Waiman Long
2019-02-07 19:07 ` [PATCH-tip 06/22] locking/rwsem: Rename kernel/locking/rwsem.h Waiman Long
2019-02-07 19:07 ` [PATCH-tip 07/22] locking/rwsem: Move rwsem internal function declarations to rwsem-xadd.h Waiman Long
2019-02-07 19:07 ` [PATCH-tip 08/22] locking/rwsem: Add debug check for __down_read*() Waiman Long
2019-02-07 19:07 ` [PATCH-tip 09/22] locking/rwsem: Enhance DEBUG_RWSEMS_WARN_ON() macro Waiman Long
2019-02-07 19:07 ` [PATCH-tip 10/22] locking/rwsem: Enable lock event counting Waiman Long
2019-02-07 19:07 ` [PATCH-tip 11/22] locking/rwsem: Implement a new locking scheme Waiman Long
2019-02-07 19:07 ` [PATCH-tip 12/22] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
2019-02-07 19:07 ` [PATCH-tip 13/22] locking/rwsem: Remove rwsem_wake() wakeup optimization Waiman Long
2019-02-07 19:07 ` [PATCH-tip 14/22] locking/rwsem: Add more rwsem owner access helpers Waiman Long
2019-02-07 19:07 ` [PATCH-tip 15/22] locking/rwsem: Merge owner into count on x86-64 Waiman Long
2019-02-07 19:45   ` Peter Zijlstra
2019-02-07 19:55     ` Waiman Long
2019-02-07 20:08   ` Peter Zijlstra
2019-02-07 20:54     ` Waiman Long
2019-02-08 14:19       ` Waiman Long
2019-02-07 19:07 ` [PATCH-tip 16/22] locking/rwsem: Remove redundant computation of writer lock word Waiman Long
2019-02-07 19:07 ` [PATCH-tip 17/22] locking/rwsem: Recheck owner if it is not on cpu Waiman Long
2019-02-07 19:07 ` [PATCH-tip 18/22] locking/rwsem: Make rwsem_spin_on_owner() return a tri-state value Waiman Long
2019-02-07 19:07 ` [PATCH-tip 19/22] locking/rwsem: Enable readers spinning on writer Waiman Long
2019-02-07 19:07 ` [PATCH-tip 20/22] locking/rwsem: Enable count-based spinning on reader Waiman Long [this message]
2019-02-07 19:07 ` [PATCH-tip 21/22] locking/rwsem: Wake up all readers in wait queue Waiman Long
2019-02-07 19:07 ` [PATCH-tip 22/22] locking/rwsem: Ensure an RT task will not spin on reader Waiman Long
2019-02-07 19:51 ` [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features Davidlohr Bueso
2019-02-07 20:00   ` Waiman Long
2019-02-11  7:38     ` Ingo Molnar
2019-02-08 19:50 ` Linus Torvalds
2019-02-08 20:31   ` Waiman Long
2019-02-09  0:03     ` Linus Torvalds
2019-02-14 13:23     ` Davidlohr Bueso
2019-02-14 15:22       ` Waiman Long
2019-02-13  9:19 ` Chen Rong
2019-02-13 19:56   ` Linus Torvalds
2019-04-10  8:15     ` huang ying
2019-04-10 16:08       ` Waiman Long
2019-04-12  0:49         ` huang ying
