All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Will Deacon <will.deacon@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	Davidlohr Bueso <dave@stgolabs.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	huang ying <huang.ying.caritas@gmail.com>,
	Waiman Long <longman@redhat.com>
Subject: [PATCH v4 14/16] locking/rwsem: Guard against making count negative
Date: Sat, 13 Apr 2019 13:22:57 -0400	[thread overview]
Message-ID: <20190413172259.2740-15-longman@redhat.com> (raw)
In-Reply-To: <20190413172259.2740-1-longman@redhat.com>

The upper bits of the count field is used as reader count. When
sufficient number of active readers are present, the most significant
bit will be set and the count becomes negative. If the number of active
readers keep on piling up, we may eventually overflow the reader counts.
This is not likely to happen unless the number of bits reserved for
reader count is reduced because those bits are need for other purpose.

To prevent this count overflow from happening, the most significant bit
is now treated as a guard bit (RWSEM_FLAG_READFAIL). Read-lock attempts
will now fail for both the fast and optimistic spinning paths whenever
this bit is set. So all those extra readers will be put to sleep in
the wait queue. Wakeup will not happen until the reader count reaches 0.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/locking/rwsem.c | 84 ++++++++++++++++++++++++++++++++----------
 1 file changed, 64 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index ab26aba65371..f37ab6358fe0 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -73,13 +73,28 @@
 #endif
 
 /*
- * The definition of the atomic counter in the semaphore:
+ * On 64-bit architectures, the bit definitions of the count are:
  *
- * Bit  0   - writer locked bit
- * Bit  1   - waiters present bit
- * Bit  2   - lock handoff bit
- * Bits 3-7 - reserved
- * Bits 8-X - 24-bit (32-bit) or 56-bit reader count
+ * Bit  0    - writer locked bit
+ * Bit  1    - waiters present bit
+ * Bit  2    - lock handoff bit
+ * Bits 3-7  - reserved
+ * Bits 8-62 - 55-bit reader count
+ * Bit  63   - read fail bit
+ *
+ * On 32-bit architectures, the bit definitions of the count are:
+ *
+ * Bit  0    - writer locked bit
+ * Bit  1    - waiters present bit
+ * Bit  2    - lock handoff bit
+ * Bits 3-7  - reserved
+ * Bits 8-30 - 23-bit reader count
+ * Bit  31   - read fail bit
+ *
+ * It is not likely that the most significant bit (read fail bit) will ever
+ * be set. This guard bit is still checked anyway in the down_read() fastpath
+ * just in case we need to use up more of the reader bits for other purpose
+ * in the future.
  *
  * atomic_long_fetch_add() is used to obtain reader lock, whereas
  * atomic_long_cmpxchg() will be used to obtain writer lock.
@@ -96,6 +111,7 @@
 #define RWSEM_WRITER_LOCKED	(1UL << 0)
 #define RWSEM_FLAG_WAITERS	(1UL << 1)
 #define RWSEM_FLAG_HANDOFF	(1UL << 2)
+#define RWSEM_FLAG_READFAIL	(1UL << (BITS_PER_LONG - 1))
 
 #define RWSEM_READER_SHIFT	8
 #define RWSEM_READER_BIAS	(1UL << RWSEM_READER_SHIFT)
@@ -103,7 +119,7 @@
 #define RWSEM_WRITER_MASK	RWSEM_WRITER_LOCKED
 #define RWSEM_LOCK_MASK		(RWSEM_WRITER_MASK|RWSEM_READER_MASK)
 #define RWSEM_READ_FAILED_MASK	(RWSEM_WRITER_MASK|RWSEM_FLAG_WAITERS|\
-				 RWSEM_FLAG_HANDOFF)
+				 RWSEM_FLAG_HANDOFF|RWSEM_FLAG_READFAIL)
 
 #define RWSEM_COUNT_LOCKED(c)	((c) & RWSEM_LOCK_MASK)
 #define RWSEM_COUNT_WLOCKED(c)	((c) & RWSEM_WRITER_MASK)
@@ -315,7 +331,8 @@ enum writer_wait_state {
 /*
  * We limit the maximum number of readers that can be woken up for a
  * wake-up call to not penalizing the waking thread for spending too
- * much time doing it.
+ * much time doing it as well as the unlikely possiblity of overflowing
+ * the reader count.
  */
 #define MAX_READERS_WAKEUP	0x100
 
@@ -799,12 +816,35 @@ rwsem_waiter_is_first(struct rw_semaphore *sem, struct rwsem_waiter *waiter)
  * Wait for the read lock to be granted
  */
 static inline struct rw_semaphore __sched *
-__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
+__rwsem_down_read_failed_common(struct rw_semaphore *sem, int state, long count)
 {
-	long count, adjustment = -RWSEM_READER_BIAS;
+	long adjustment = -RWSEM_READER_BIAS;
 	struct rwsem_waiter waiter;
 	DEFINE_WAKE_Q(wake_q);
 
+	if (unlikely(count < 0)) {
+		/*
+		 * The sign bit has been set meaning that too many active
+		 * readers are present. We need to decrement reader count &
+		 * enter wait queue immediately to avoid overflowing the
+		 * reader count.
+		 *
+		 * As preemption is not disabled, there is a remote
+		 * possibility that premption can happen in the narrow
+		 * timing window between incrementing and decrementing
+		 * the reader count and the task is put to sleep for a
+		 * considerable amount of time. If sufficient number
+		 * of such unfortunate sequence of events happen, we
+		 * may still overflow the reader count. It is extremely
+		 * unlikey, though. If this is a concern, we should consider
+		 * disable preemption during this timing window to make
+		 * sure that such unfortunate event will not happen.
+		 */
+		atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
+		adjustment = 0;
+		goto queue;
+	}
+
 	if (!rwsem_can_spin_on_owner(sem))
 		goto queue;
 
@@ -905,15 +945,15 @@ __rwsem_down_read_failed_common(struct rw_semaphore *sem, int state)
 }
 
 static inline struct rw_semaphore * __sched
-rwsem_down_read_failed(struct rw_semaphore *sem)
+rwsem_down_read_failed(struct rw_semaphore *sem, long cnt)
 {
-	return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE);
+	return __rwsem_down_read_failed_common(sem, TASK_UNINTERRUPTIBLE, cnt);
 }
 
 static inline struct rw_semaphore * __sched
-rwsem_down_read_failed_killable(struct rw_semaphore *sem)
+rwsem_down_read_failed_killable(struct rw_semaphore *sem, long cnt)
 {
-	return __rwsem_down_read_failed_common(sem, TASK_KILLABLE);
+	return __rwsem_down_read_failed_common(sem, TASK_KILLABLE, cnt);
 }
 
 /*
@@ -1118,9 +1158,11 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
  */
 inline void __down_read(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
-			&sem->count) & RWSEM_READ_FAILED_MASK)) {
-		rwsem_down_read_failed(sem);
+	long count = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
+						   &sem->count);
+
+	if (unlikely(count & RWSEM_READ_FAILED_MASK)) {
+		rwsem_down_read_failed(sem, count);
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
 		rwsem_set_reader_owned(sem);
@@ -1129,9 +1171,11 @@ inline void __down_read(struct rw_semaphore *sem)
 
 static inline int __down_read_killable(struct rw_semaphore *sem)
 {
-	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
-			&sem->count) & RWSEM_READ_FAILED_MASK)) {
-		if (IS_ERR(rwsem_down_read_failed_killable(sem)))
+	long count = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
+						   &sem->count);
+
+	if (unlikely(count & RWSEM_READ_FAILED_MASK)) {
+		if (IS_ERR(rwsem_down_read_failed_killable(sem, count)))
 			return -EINTR;
 		DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
 	} else {
-- 
2.18.1


  parent reply	other threads:[~2019-04-13 17:24 UTC|newest]

Thread overview: 112+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-13 17:22 [PATCH v4 00/16] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
2019-04-13 17:22 ` [PATCH v4 01/16] locking/rwsem: Prevent unneeded warning during locking selftest Waiman Long
2019-04-18  8:04   ` [tip:locking/core] " tip-bot for Waiman Long
2019-04-13 17:22 ` [PATCH v4 02/16] locking/rwsem: Make owner available even if !CONFIG_RWSEM_SPIN_ON_OWNER Waiman Long
2019-04-13 17:22 ` [PATCH v4 03/16] locking/rwsem: Remove rwsem_wake() wakeup optimization Waiman Long
2019-04-13 17:22 ` [PATCH v4 04/16] locking/rwsem: Implement a new locking scheme Waiman Long
2019-04-16 13:22   ` Peter Zijlstra
2019-04-16 13:32     ` Waiman Long
2019-04-16 14:18       ` Peter Zijlstra
2019-04-16 14:42         ` Peter Zijlstra
2019-04-13 17:22 ` [PATCH v4 05/16] locking/rwsem: Merge rwsem.h and rwsem-xadd.c into rwsem.c Waiman Long
2019-04-13 17:22 ` [PATCH v4 06/16] locking/rwsem: Code cleanup after files merging Waiman Long
2019-04-16 16:01   ` Peter Zijlstra
2019-04-16 16:17     ` Peter Zijlstra
2019-04-16 19:45       ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 07/16] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
2019-04-16 14:12   ` Peter Zijlstra
2019-04-16 20:26     ` Waiman Long
2019-04-16 21:07       ` Waiman Long
2019-04-17  7:13         ` Peter Zijlstra
2019-04-17 16:22           ` Waiman Long
2019-04-16 15:49   ` Peter Zijlstra
2019-04-16 16:15     ` Peter Zijlstra
2019-04-16 18:41       ` Waiman Long
2019-04-16 18:16     ` Waiman Long
2019-04-16 18:32       ` Peter Zijlstra
2019-04-17  7:35       ` Peter Zijlstra
2019-04-17 16:35         ` Waiman Long
2019-04-17  8:05       ` Peter Zijlstra
2019-04-17 16:39         ` Waiman Long
2019-04-18  8:22           ` Peter Zijlstra
2019-04-17  8:17   ` Peter Zijlstra
2019-04-13 17:22 ` [PATCH v4 08/16] locking/rwsem: Make rwsem_spin_on_owner() return owner state Waiman Long
2019-04-17  9:00   ` Peter Zijlstra
2019-04-17 16:42     ` Waiman Long
2019-04-17 10:19   ` Peter Zijlstra
2019-04-17 16:53     ` Waiman Long
2019-04-17 12:41   ` Peter Zijlstra
2019-04-17 12:47     ` Peter Zijlstra
2019-04-17 18:29       ` Waiman Long
2019-04-18  8:39         ` Peter Zijlstra
2019-04-17 13:00     ` Peter Zijlstra
2019-04-17 18:50       ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 09/16] locking/rwsem: Ensure an RT task will not spin on reader Waiman Long
2019-04-17 13:18   ` Peter Zijlstra
2019-04-17 18:47     ` Waiman Long
2019-04-18  8:52       ` Peter Zijlstra
2019-04-18 13:27         ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 10/16] locking/rwsem: Wake up almost all readers in wait queue Waiman Long
2019-04-16 16:50   ` Davidlohr Bueso
2019-04-16 17:37     ` Waiman Long
2019-04-17 13:39   ` Peter Zijlstra
2019-04-17 17:16     ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 11/16] locking/rwsem: Enable readers spinning on writer Waiman Long
2019-04-17 13:56   ` Peter Zijlstra
2019-04-17 17:34     ` Waiman Long
2019-04-18  8:57       ` Peter Zijlstra
2019-04-18 14:35         ` Waiman Long
2019-04-17 13:58   ` Peter Zijlstra
2019-04-17 17:45     ` Waiman Long
2019-04-18  9:00       ` Peter Zijlstra
2019-04-18 13:40         ` Waiman Long
2019-04-17 14:05   ` Peter Zijlstra
2019-04-17 17:51     ` Waiman Long
2019-04-18  9:11       ` Peter Zijlstra
2019-04-18 14:37         ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 12/16] locking/rwsem: Enable time-based spinning on reader-owned rwsem Waiman Long
2019-04-18 13:06   ` Peter Zijlstra
2019-04-18 15:15     ` Waiman Long
2019-04-19  7:56       ` Peter Zijlstra
2019-04-19 14:33         ` Waiman Long
2019-04-19 15:36           ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 13/16] locking/rwsem: Add more rwsem owner access helpers Waiman Long
2019-04-13 17:22 ` Waiman Long [this message]
2019-04-18 13:51   ` [PATCH v4 14/16] locking/rwsem: Guard against making count negative Peter Zijlstra
2019-04-18 14:08     ` Waiman Long
2019-04-18 14:30       ` Peter Zijlstra
2019-04-18 14:40       ` Peter Zijlstra
2019-04-18 14:54         ` Waiman Long
2019-04-19 10:26           ` Peter Zijlstra
2019-04-19 12:02             ` Peter Zijlstra
2019-04-19 13:03               ` Peter Zijlstra
2019-04-19 13:15                 ` Peter Zijlstra
2019-04-19 19:39                   ` Waiman Long
2019-04-21 21:07                     ` Waiman Long
2019-04-23 14:17                       ` Peter Zijlstra
2019-04-23 14:31                         ` Waiman Long
2019-04-23 16:27                         ` Linus Torvalds
2019-04-23 19:12                           ` Waiman Long
2019-04-23 19:34                             ` Peter Zijlstra
2019-04-23 19:41                               ` Waiman Long
2019-04-23 19:55                                 ` [PATCH] bpf: Fix preempt_enable_no_resched() abuse Peter Zijlstra
2019-04-23 20:03                                   ` [PATCH] trace: " Peter Zijlstra
2019-04-23 23:58                                     ` Steven Rostedt
2019-04-29  6:39                                     ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-04-29 13:31                                       ` Steven Rostedt
2019-04-29 14:08                                         ` Ingo Molnar
2019-04-23 20:27                                   ` [PATCH] bpf: " Linus Torvalds
2019-04-23 20:35                                     ` Peter Zijlstra
2019-04-23 20:45                                       ` Linus Torvalds
2019-04-24 13:19                                       ` Peter Zijlstra
2019-04-25 21:23                                   ` Alexei Starovoitov
2019-04-26  7:14                                     ` Peter Zijlstra
2019-04-24  7:09                             ` [PATCH v4 14/16] locking/rwsem: Guard against making count negative Peter Zijlstra
2019-04-24 16:49                               ` Waiman Long
2019-04-24 17:01                                 ` Peter Zijlstra
2019-04-24 17:10                                   ` Waiman Long
2019-04-24 17:56                                   ` Linus Torvalds
2019-04-13 17:22 ` [PATCH v4 15/16] locking/rwsem: Merge owner into count on x86-64 Waiman Long
2019-04-18 14:28   ` Peter Zijlstra
2019-04-18 14:40     ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 16/16] locking/rwsem: Remove redundant computation of writer lock word Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190413172259.2740-15-longman@redhat.com \
    --to=longman@redhat.com \
    --cc=dave@stgolabs.net \
    --cc=huang.ying.caritas@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=will.deacon@arm.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.