From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	Davidlohr Bueso <dave@stgolabs.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	huang ying <huang.ying.caritas@gmail.com>,
	Waiman Long <longman@redhat.com>
Subject: [PATCH v8 16/19] locking/rwsem: Guard against making count negative
Date: Mon, 20 May 2019 16:59:15 -0400
Message-ID: <20190520205918.22251-17-longman@redhat.com>
In-Reply-To: <20190520205918.22251-1-longman@redhat.com>

The upper bits of the count field are used as the reader count. When a
sufficient number of active readers are present, the most significant
bit will be set and the count becomes negative. If the number of active
readers keeps piling up, we may eventually overflow the reader count.
This is not likely to happen unless the number of bits reserved for
the reader count is reduced because those bits are needed for other
purposes.
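
For a sense of scale, the arithmetic can be sketched in plain C. This
is only an illustration, not part of the patch: READER_SHIFT mirrors
RWSEM_READER_SHIFT (8) from rwsem.c, and both helper names are
hypothetical. Each reader adds 1 << 8 to the count, so the most
significant bit of an N-bit word is first set once 2^(N - 1 - 8)
readers are active.

```c
#include <assert.h>
#include <limits.h>

/* Assumed from the patch: the reader count starts at bit 8. */
#define READER_SHIFT 8

/*
 * Number of concurrent readers whose combined bias first sets the most
 * significant bit of a count word that is word_bits wide.
 * Hypothetical helper, for illustration only.
 */
static unsigned long long readers_to_set_msb(unsigned int word_bits)
{
	assert(word_bits > READER_SHIFT + 1 && word_bits <= 64);
	return 1ULL << (word_bits - 1 - READER_SHIFT);
}

/* Same question for the native long of the build target. */
static unsigned long long native_readers_to_set_msb(void)
{
	return readers_to_set_msb(sizeof(long) * CHAR_BIT);
}
```

With a 32-bit count this gives 2^23 (8388608) readers and with a 64-bit
count 2^55, which is why the condition only becomes a practical concern
if the reader field shrinks.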

To prevent this count overflow from happening, the most significant
bit is now treated as a guard bit (RWSEM_FLAG_READFAIL). Read-lock
attempts will now fail in both the fast and slow paths whenever this
bit is set, so all of those extra readers will be put to sleep in the
wait list. Wakeup will not happen until the reader count reaches 0.
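
The guard-bit check can be tried outside the kernel with a small
userspace model. This is a sketch under stated assumptions, not the
kernel code: it uses C11 <stdatomic.h> in place of the kernel's
atomic_long_*() helpers, and struct demo_rwsem, DEMO_READER_BIAS and
demo_read_trylock() are hypothetical names chosen to echo
rwsem_read_trylock() in the diff below.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Assumed layout, mirroring the patch: reader count starts at bit 8. */
#define DEMO_READER_SHIFT 8
#define DEMO_READER_BIAS  (1L << DEMO_READER_SHIFT)

static_assert(sizeof(long) * CHAR_BIT > DEMO_READER_SHIFT + 1,
	      "count word too narrow for a reader field plus guard bit");

/* Hypothetical stand-in for struct rw_semaphore; only the count matters. */
struct demo_rwsem {
	atomic_long count;
};

/*
 * Add one reader bias and report the previous count through *cnt.
 * If the previous count was negative (guard bit set), take the bias
 * back out right away and return 0; otherwise return the adjustment
 * a slow path would have to apply to undo the increment.
 */
static long demo_read_trylock(struct demo_rwsem *sem, long *cnt)
{
	long adjustment = -DEMO_READER_BIAS;

	*cnt = atomic_fetch_add_explicit(&sem->count, DEMO_READER_BIAS,
					 memory_order_acquire);
	if (*cnt < 0) {
		atomic_fetch_sub_explicit(&sem->count, DEMO_READER_BIAS,
					  memory_order_relaxed);
		adjustment = 0;
	}
	return adjustment;
}
```

Starting from a count of 0, one call leaves the count at
DEMO_READER_BIAS and returns -DEMO_READER_BIAS. Starting from LONG_MIN
(only the guard bit set), the call returns 0 and the count is back at
LONG_MIN afterwards, so the extra reader never stays accounted for.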

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 95 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 80 insertions(+), 15 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 743476f386b2..028f29b39045 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -116,13 +116,28 @@
 #endif
 
 /*
- * The definition of the atomic counter in the semaphore:
+ * On 64-bit architectures, the bit definitions of the count are:
  *
- * Bit  0   - writer locked bit
- * Bit  1   - waiters present bit
- * Bit  2   - lock handoff bit
- * Bits 3-7 - reserved
- * Bits 8-X - 24-bit (32-bit) or 56-bit reader count
+ * Bit  0    - writer locked bit
+ * Bit  1    - waiters present bit
+ * Bit  2    - lock handoff bit
+ * Bits 3-7  - reserved
+ * Bits 8-62 - 55-bit reader count
+ * Bit  63   - read fail bit
+ *
+ * On 32-bit architectures, the bit definitions of the count are:
+ *
+ * Bit  0    - writer locked bit
+ * Bit  1    - waiters present bit
+ * Bit  2    - lock handoff bit
+ * Bits 3-7  - reserved
+ * Bits 8-30 - 23-bit reader count
+ * Bit  31   - read fail bit
+ *
+ * It is not likely that the most significant bit (read fail bit) will ever
+ * be set. This guard bit is still checked anyway in the down_read() fastpath
+ * just in case we need to use up more of the reader bits for other purposes
+ * in the future.
  *
  * atomic_long_fetch_add() is used to obtain reader lock, whereas
  * atomic_long_cmpxchg() will be used to obtain writer lock.
@@ -139,6 +154,7 @@
 #define RWSEM_WRITER_LOCKED	(1UL << 0)
 #define RWSEM_FLAG_WAITERS	(1UL << 1)
 #define RWSEM_FLAG_HANDOFF	(1UL << 2)
+#define RWSEM_FLAG_READFAIL	(1UL << (BITS_PER_LONG - 1))
 
 #define RWSEM_READER_SHIFT	8
 #define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)
@@ -146,7 +162,7 @@
 #define RWSEM_WRITER_MASK	RWSEM_WRITER_LOCKED
 #define RWSEM_LOCK_MASK		(RWSEM_WRITER_MASK|RWSEM_READER_MASK)
 #define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
-				 RWSEM_FLAG_HANDOFF)
+				 RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
 
 /*
  * All writes to owner are protected by WRITE_ONCE() to make sure that
@@ -253,6 +269,28 @@ static inline void rwsem_set_nonspinnable(struct rw_semaphore *sem)
 	}
 }
 
+/*
+ * This function does a read trylock by incrementing the reader count
+ * and then decrementing it immediately if too many readers are present
+ * (count becomes negative) in order to prevent the remote possibility
+ * of overflowing the count with minimal delay between the increment
+ * and decrement.
+ *
+ * It returns the adjustment that should be added back to the count
+ * in the slowpath.
+ */
+static inline long rwsem_read_trylock(struct rw_semaphore *sem, long *cnt)
+{
+	long adjustment = -RWSEM_READER_BIAS;
+
+	*cnt = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
+	if (unlikely(*cnt < 0)) {
+		atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
+		adjustment = 0;
+	}
+	return adjustment;
+}
+
 /*
  * Return just the real task structure pointer of the owner
  */
@@ -401,6 +439,12 @@ static void rwsem_mark_wake(struct rw_semaphore *sem,
 		return;
 	}
 
+	/*
+	 * No reader wakeup if there are too many of them already.
+	 */
+	if (unlikely(atomic_long_read(&sem->count) < 0))
+		return;
+
 	/*
 	 * Writers might steal the lock before we grant it to the next reader.
 	 * We prefer to do the first reader grant before counting readers
@@ -947,13 +991,30 @@ static inline bool rwsem_reader_phase_trylock(struct rw_semaphore *sem,
  * Wait for the read lock to be granted
  */
 static struct rw_semaphore __sched *
-rwsem_down_read_slowpath(struct rw_semaphore *sem, int state)
+rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long adjustment)
 {
-	long count, adjustment = -RWSEM_READER_BIAS;
+	long count;
 	bool wake = false;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
+	if (unlikely(!adjustment)) {
+		/*
+		 * This shouldn't happen. If it does, there is probably
+		 * something wrong in the system.
+		 */
+		WARN_ON_ONCE(1);
+
+		/*
+		 * An adjustment of 0 means that there are too many readers
+		 * holding or trying to acquire the lock. So disable
+		 * optimistic spinning and go directly into the wait list.
+		 */
+		if (rwsem_test_oflags(sem, RWSEM_RD_NONSPINNABLE))
+			rwsem_set_nonspinnable(sem);
+		goto queue;
+	}
+
 	/*
 	 * Save the current read-owner of rwsem, if available, and the
 	 * reader nonspinnable bit.
@@ -1271,9 +1332,10 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
  */
 inline void __down_read(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
-			&sem->count) & RWSEM_READ_FAILED_MASK)) {
-		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
+	long tmp, adjustment = rwsem_read_trylock(sem, &tmp);
+
+	if (unlikely(tmp & RWSEM_READ_FAILED_MASK)) {
+		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE, adjustment);
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
 		rwsem_set_reader_owned(sem);
@@ -1282,9 +1344,11 @@ inline void __down_read(struct rw_semaphore *sem)
 
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
-			&sem->count) & RWSEM_READ_FAILED_MASK)) {
-		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
+	long tmp, adjustment = rwsem_read_trylock(sem, &tmp);
+
+	if (unlikely(tmp & RWSEM_READ_FAILED_MASK)) {
+		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE,
+						    adjustment)))
 			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
@@ -1360,6 +1424,7 @@ inline void __up_read(struct rw_semaphore *sem)
 	DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	rwsem_clear_reader_owned(sem);
 	tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count);
+	DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);
 	if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) ==
 		      RWSEM_FLAG_WAITERS)) {
 		clear_wr_nonspinnable(sem);
-- 
2.18.1

