* [RFC PATCH] locking/rwsem: Add upgrade_read()
@ 2021-09-22 19:36 Waiman Long
  2021-09-22 20:06 ` Peter Zijlstra
  2021-09-23  1:16 ` Boqun Feng
  0 siblings, 2 replies; 5+ messages in thread
From: Waiman Long @ 2021-09-22 19:36 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng
  Cc: linux-kernel, Davidlohr Bueso, Waiman Long

Currently there are about 12 instances in the kernel where an up_read()
is immediately followed by a down_write() of the same lock. For example,

  drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
  drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);

Since we have already provided a downgrade_write() function, we may as
well provide an upgrade_read() function to make the code easier to read
and the intention clearer.

If the current task is the only reader, the upgrade can be done by a
single atomic operation. If not, the upgrade will have to be done by a
separate up_read() call followed by a down_write(). In the former case,
the handoff bit is not considered and the waiter will have to wait a
bit longer to acquire the lock.

The new upgrade_read() function returns a value of 0 for safe upgrade
where rwsem protected data won't change. Otherwise a value of 1 is
returned to indicate unsafe upgrade where rwsem protected data may
change during the upgrade process.

For PREEMPT_RT, it falls back to up_read() followed by down_write()
for simplicity.

Some uses of down_write() with long lock hold time may be changed
to the following format in the future:

	down_read()
	/* check data */
	if (upgrade_read()) {
		/* unsafe upgrade, recheck data */
	}
	/* update data */
	up_write();

As long as the "recheck data" and "update data" parts are relatively
short compared with the "check data" part, this conversion may help to
improve parallelism and reduce lock contention.
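
As an illustration only, a hypothetical caller might be converted like
this (the structure and field names are made up and not taken from any
of the existing call sites):

	down_read(&foo->rwsem);
	if (foo->state != FOO_READY) {		/* check data */
		up_read(&foo->rwsem);
		return -EBUSY;
	}
	if (upgrade_read(&foo->rwsem)) {
		/* unsafe upgrade: a writer may have run, recheck data */
		if (foo->state != FOO_READY) {
			up_write(&foo->rwsem);
			return -EBUSY;
		}
	}
	foo->nr_users++;			/* update data */
	up_write(&foo->rwsem);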

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/rwsem.h  |  5 ++++
 kernel/locking/rwsem.c | 53 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 352c6127cb90..8ece58224f25 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -207,6 +207,11 @@ extern void up_write(struct rw_semaphore *sem);
  */
 extern void downgrade_write(struct rw_semaphore *sem);
 
+/*
+ * upgrade read lock to write lock
+ */
+extern int upgrade_read(struct rw_semaphore *sem);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 /*
  * nested locking. NOTE: rwsems are not allowed to recurse
diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 000e8d5a2884..aeb5b0668304 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1203,6 +1203,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
 	return sem;
 }
 
+/*
+ * Try to upgrade read lock to write lock
+ */
+static inline int __try_upgrade_read(struct rw_semaphore *sem)
+{
+	long count = atomic_long_read(&sem->count);
+
+	WARN_ON_ONCE(count & RWSEM_WRITER_LOCKED);
+
+	/*
+	 * When upgrading from shared to exclusive ownership,
+	 * anything inside the write-locked region cannot leak
+	 * into the read side. Use an ACQUIRE semantics.
+	 */
+	if (((count & RWSEM_READER_MASK) == RWSEM_READER_BIAS) &&
+	     atomic_long_try_cmpxchg_acquire(&sem->count, &count,
+			count - RWSEM_READER_BIAS + RWSEM_WRITER_LOCKED)) {
+		rwsem_set_owner(sem);
+		return 1;
+	}
+	return 0;
+}
+
 /*
  * lock for reading
  */
@@ -1438,6 +1461,11 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
 	rwbase_write_downgrade(&sem->rwbase);
 }
 
+static inline int __try_upgrade_read(struct rw_semaphore *sem)
+{
+	return 0;
+}
+
 /* Debug stubs for the common API */
 #define DEBUG_RWSEMS_WARN_ON(c, sem)
 
@@ -1581,6 +1609,31 @@ void downgrade_write(struct rw_semaphore *sem)
 }
 EXPORT_SYMBOL(downgrade_write);
 
+/*
+ * Upgrade read lock to write lock
+ *
+ * Return: 0 when upgrade is safe, i.e. rwsem protected data do not change;
+ *         1 when upgrade is unsafe as rwsem protected data may have changed.
+ */
+int upgrade_read(struct rw_semaphore *sem)
+{
+	if (__try_upgrade_read(sem)) {
+		rwsem_release(&sem->dep_map, _RET_IP_);
+		rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
+		return 0;
+	}
+
+	/*
+	 * We cannot directly upgrade to the write lock, just do a regular
+	 * up_read() and down_write() sequence. The data protected by the
+	 * rwsem may have changed before the write lock is acquired.
+	 */
+	down_read(sem);
+	up_write(sem);
+	return 1;
+}
+EXPORT_SYMBOL(upgrade_read);
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
 void down_read_nested(struct rw_semaphore *sem, int subclass)
-- 
2.18.1



* Re: [RFC PATCH] locking/rwsem: Add upgrade_read()
  2021-09-22 19:36 [RFC PATCH] locking/rwsem: Add upgrade_read() Waiman Long
@ 2021-09-22 20:06 ` Peter Zijlstra
  2021-09-22 20:19   ` Waiman Long
  2021-09-23  1:16 ` Boqun Feng
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2021-09-22 20:06 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel, Davidlohr Bueso

On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
> Currently there are about 12 instances in the kernel where an up_read()
> is immediately followed by a down_write() of the same lock. For example,
> 
>   drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
>   drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);

And TTY is a high performance issue, that requires hacks like this?

> Since we have already provided a downgrade_write() function, we may as
> well provide an upgrade_read() function to make the code easier to read
> and the intention clearer.
> 
> If the current task is the only reader, the upgrade can be done by a
> single atomic operation. If not, the upgrade will have to be done by a
> separate up_read() call followed by a down_write(). In the former case,
> the handoff bit is not considered and the waiter will have to wait a
> bit longer to acquire the lock.
> 
> The new upgrade_read() function returns a value of 0 for safe upgrade
> where rwsem protected data won't change. Otherwise a value of 1 is
> returned to indicate unsafe upgrade where rwsem protected data may
> change during the upgrade process.

Yuck...

Is there any workload where this is a massive win? I'm thinking that
either the lock is contended and you get the unsafe option which is the
same as today, or the lock isn't contended and you would've gotten
fast-paths and you barely save anything anyway.

Also, -ENODATA


* Re: [RFC PATCH] locking/rwsem: Add upgrade_read()
  2021-09-22 20:06 ` Peter Zijlstra
@ 2021-09-22 20:19   ` Waiman Long
  0 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2021-09-22 20:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Boqun Feng, linux-kernel, Davidlohr Bueso

On 9/22/21 4:06 PM, Peter Zijlstra wrote:
> On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
>> Currently there are about 12 instances in the kernel where an up_read()
>> is immediately followed by a down_write() of the same lock. For example,
>>
>>    drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
>>    drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);
> And TTY is a high performance issue, that requires hacks like this?
I won't call it a hack. I consider it better documentation of what the
real intention is.
>
>> Since we have already provided a downgrade_write() function, we may as
>> well provide an upgrade_read() function to make the code easier to read
>> and the intention clearer.
>>
>> If the current task is the only reader, the upgrade can be done by a
>> single atomic operation. If not, the upgrade will have to be done by a
>> separate up_read() call followed by a down_write(). In the former case,
>> the handoff bit is not considered and the waiter will have to wait a
>> bit longer to acquire the lock.
>>
>> The new upgrade_read() function returns a value of 0 for safe upgrade
>> where rwsem protected data won't change. Otherwise a value of 1 is
>> returned to indicate unsafe upgrade where rwsem protected data may
>> change during the upgrade process.
> Yuck...
>
> Is there any workload where this is a massive win? I'm thinking that
> either the lock is contended and you get the unsafe option which is the
> same as today, or the lock isn't contended and you would've gotten
> fast-paths and you barely save anything anyway.
>
> Also, -ENODATA
>
Yes, the best-case saving is just an atomic op.

This is just an RFC. I can look for workloads that can benefit from
using this.

Cheers,
Longman



* Re: [RFC PATCH] locking/rwsem: Add upgrade_read()
  2021-09-22 19:36 [RFC PATCH] locking/rwsem: Add upgrade_read() Waiman Long
  2021-09-22 20:06 ` Peter Zijlstra
@ 2021-09-23  1:16 ` Boqun Feng
  2021-09-23  1:45   ` Waiman Long
  1 sibling, 1 reply; 5+ messages in thread
From: Boqun Feng @ 2021-09-23  1:16 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso

On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
> Currently there are about 12 instances in the kernel where an up_read()
> is immediately followed by a down_write() of the same lock. For example,
> 
>   drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
>   drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);
> 
> Since we have already provided a downgrade_write() function, we may as
> well provide an upgrade_read() function to make the code easier to read
> and the intention clearer.
> 
> If the current task is the only reader, the upgrade can be done by a
> single atomic operation. If not, the upgrade will have to be done by a
> separate up_read() call followed by a down_write(). In the former case,
> the handoff bit is not considered and the waiter will have to wait a
> bit longer to acquire the lock.
> 
> The new upgrade_read() function returns a value of 0 for safe upgrade
> where rwsem protected data won't change. Otherwise a value of 1 is
> returned to indicate unsafe upgrade where rwsem protected data may
> change during the upgrade process.
> 
> For PREEMPT_RT, it falls back to up_read() followed by down_write()
> for simplicity.
> 
> Some uses of down_write() with long lock hold time may be changed
> to the following format in the future:
> 
> 	down_read()
> 	/* check data */
> 	if (upgrade_read()) {
> 		/* unsafe upgrade, recheck data */
> 	}
> 	/* update data */
> 	up_write();
> 
> As long as the "recheck data" and "update data" parts are relatively
> short compared with the "check data" part, this conversion may help to
> improve parallelism and reduce lock contention.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  include/linux/rwsem.h  |  5 ++++
>  kernel/locking/rwsem.c | 53 ++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 58 insertions(+)
> 
> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
> index 352c6127cb90..8ece58224f25 100644
> --- a/include/linux/rwsem.h
> +++ b/include/linux/rwsem.h
> @@ -207,6 +207,11 @@ extern void up_write(struct rw_semaphore *sem);
>   */
>  extern void downgrade_write(struct rw_semaphore *sem);
>  
> +/*
> + * upgrade read lock to write lock
> + */
> +extern int upgrade_read(struct rw_semaphore *sem);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  /*
>   * nested locking. NOTE: rwsems are not allowed to recurse
> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
> index 000e8d5a2884..aeb5b0668304 100644
> --- a/kernel/locking/rwsem.c
> +++ b/kernel/locking/rwsem.c
> @@ -1203,6 +1203,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
>  	return sem;
>  }
>  
> +/*
> + * Try to upgrade read lock to write lock
> + */
> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
> +{
> +	long count = atomic_long_read(&sem->count);
> +
> +	WARN_ON_ONCE(count & RWSEM_WRITER_LOCKED);
> +
> +	/*
> +	 * When upgrading from shared to exclusive ownership,
> +	 * anything inside the write-locked region cannot leak
> +	 * into the read side. Use an ACQUIRE semantics.
> +	 */
> +	if (((count & RWSEM_READER_MASK) == RWSEM_READER_BIAS) &&
> +	     atomic_long_try_cmpxchg_acquire(&sem->count, &count,
> +			count - RWSEM_READER_BIAS + RWSEM_WRITER_LOCKED)) {
> +		rwsem_set_owner(sem);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * lock for reading
>   */
> @@ -1438,6 +1461,11 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
>  	rwbase_write_downgrade(&sem->rwbase);
>  }
>  
> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
> +{
> +	return 0;
> +}
> +
>  /* Debug stubs for the common API */
>  #define DEBUG_RWSEMS_WARN_ON(c, sem)
>  
> @@ -1581,6 +1609,31 @@ void downgrade_write(struct rw_semaphore *sem)
>  }
>  EXPORT_SYMBOL(downgrade_write);
>  
> +/*
> + * Upgrade read lock to write lock
> + *
> + * Return: 0 when upgrade is safe, i.e. rwsem protected data do not change;
> + *         1 when upgrade is unsafe as rwsem protected data may have changed.
> + */
> +int upgrade_read(struct rw_semaphore *sem)
> +{
> +	if (__try_upgrade_read(sem)) {
> +		rwsem_release(&sem->dep_map, _RET_IP_);
> +		rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
> +		return 0;
> +	}
> +
> +	/*
> +	 * We cannot directly upgrade to the write lock, just do a regular
> +	 * up_read() and down_write() sequence. The data protected by the
> +	 * rwsem may have changed before the write lock is acquired.
> +	 */
> +	down_read(sem);
> +	up_write(sem);

Confused, the comment says up_read()+down_write(), however the code is
down_read()+up_write().

Besides, I don't like the idea that the value may have changed before
the write lock is acquired if we call it an "upgrade". Maybe we want an
API like down_read_upgradable(), which can be held in parallel with
other down_read() callers but not with another down_read_upgradable(),
and only the read-side critical section created by
down_read_upgradable() can be upgraded. For the implementation, that
means we need one extra bit for the upgradable state. Thoughts?
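
Something like the following, just to illustrate the intended
semantics (not real code, the exact interface is open for discussion):

	down_read_upgradable(sem);	/* shared with plain readers, but
					 * exclusive against any other
					 * upgradable reader */
	/* check data */
	upgrade_read(sem);		/* wait for the other readers to
					 * drain; no writer can slip in,
					 * so the data cannot have changed */
	/* update data */
	up_write(sem);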

Regards,
Boqun

> +	return 1;
> +}
> +EXPORT_SYMBOL(upgrade_read);
> +
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  
>  void down_read_nested(struct rw_semaphore *sem, int subclass)
> -- 
> 2.18.1
> 


* Re: [RFC PATCH] locking/rwsem: Add upgrade_read()
  2021-09-23  1:16 ` Boqun Feng
@ 2021-09-23  1:45   ` Waiman Long
  0 siblings, 0 replies; 5+ messages in thread
From: Waiman Long @ 2021-09-23  1:45 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, linux-kernel, Davidlohr Bueso

On 9/22/21 9:16 PM, Boqun Feng wrote:
> On Wed, Sep 22, 2021 at 03:36:57PM -0400, Waiman Long wrote:
>> Currently there are about 12 instances in the kernel where an up_read()
>> is immediately followed by a down_write() of the same lock. For example,
>>
>>    drivers/tty/n_tty.c:		up_read(&tty->termios_rwsem);
>>    drivers/tty/n_tty.c-		down_write(&tty->termios_rwsem);
>>
>> Since we have already provided a downgrade_write() function, we may as
>> well provide an upgrade_read() function to make the code easier to read
>> and the intention clearer.
>>
>> If the current task is the only reader, the upgrade can be done by a
>> single atomic operation. If not, the upgrade will have to be done by a
>> separate up_read() call followed by a down_write(). In the former case,
>> the handoff bit is not considered and the waiter will have to wait a
>> bit longer to acquire the lock.
>>
>> The new upgrade_read() function returns a value of 0 for safe upgrade
>> where rwsem protected data won't change. Otherwise a value of 1 is
>> returned to indicate unsafe upgrade where rwsem protected data may
>> change during the upgrade process.
>>
>> For PREEMPT_RT, it falls back to up_read() followed by down_write()
>> for simplicity.
>>
>> Some uses of down_write() with long lock hold time may be changed
>> to the following format in the future:
>>
>> 	down_read()
>> 	/* check data */
>> 	if (upgrade_read()) {
>> 		/* unsafe upgrade, recheck data */
>> 	}
>> 	/* update data */
>> 	up_write();
>>
>> As long as the "recheck data" and "update data" parts are relatively
>> short compared with the "check data" part, this conversion may help to
>> improve parallelism and reduce lock contention.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   include/linux/rwsem.h  |  5 ++++
>>   kernel/locking/rwsem.c | 53 ++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 58 insertions(+)
>>
>> diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
>> index 352c6127cb90..8ece58224f25 100644
>> --- a/include/linux/rwsem.h
>> +++ b/include/linux/rwsem.h
>> @@ -207,6 +207,11 @@ extern void up_write(struct rw_semaphore *sem);
>>    */
>>   extern void downgrade_write(struct rw_semaphore *sem);
>>   
>> +/*
>> + * upgrade read lock to write lock
>> + */
>> +extern int upgrade_read(struct rw_semaphore *sem);
>> +
>>   #ifdef CONFIG_DEBUG_LOCK_ALLOC
>>   /*
>>    * nested locking. NOTE: rwsems are not allowed to recurse
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 000e8d5a2884..aeb5b0668304 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -1203,6 +1203,29 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)
>>   	return sem;
>>   }
>>   
>> +/*
>> + * Try to upgrade read lock to write lock
>> + */
>> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
>> +{
>> +	long count = atomic_long_read(&sem->count);
>> +
>> +	WARN_ON_ONCE(count & RWSEM_WRITER_LOCKED);
>> +
>> +	/*
>> +	 * When upgrading from shared to exclusive ownership,
>> +	 * anything inside the write-locked region cannot leak
>> +	 * into the read side. Use an ACQUIRE semantics.
>> +	 */
>> +	if (((count & RWSEM_READER_MASK) == RWSEM_READER_BIAS) &&
>> +	     atomic_long_try_cmpxchg_acquire(&sem->count, &count,
>> +			count - RWSEM_READER_BIAS + RWSEM_WRITER_LOCKED)) {
>> +		rwsem_set_owner(sem);
>> +		return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>>   /*
>>    * lock for reading
>>    */
>> @@ -1438,6 +1461,11 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
>>   	rwbase_write_downgrade(&sem->rwbase);
>>   }
>>   
>> +static inline int __try_upgrade_read(struct rw_semaphore *sem)
>> +{
>> +	return 0;
>> +}
>> +
>>   /* Debug stubs for the common API */
>>   #define DEBUG_RWSEMS_WARN_ON(c, sem)
>>   
>> @@ -1581,6 +1609,31 @@ void downgrade_write(struct rw_semaphore *sem)
>>   }
>>   EXPORT_SYMBOL(downgrade_write);
>>   
>> +/*
>> + * Upgrade read lock to write lock
>> + *
>> + * Return: 0 when upgrade is safe, i.e. rwsem protected data do not change;
>> + *         1 when upgrade is unsafe as rwsem protected data may have changed.
>> + */
>> +int upgrade_read(struct rw_semaphore *sem)
>> +{
>> +	if (__try_upgrade_read(sem)) {
>> +		rwsem_release(&sem->dep_map, _RET_IP_);
>> +		rwsem_acquire(&sem->dep_map, 0, 0, _RET_IP_);
>> +		return 0;
>> +	}
>> +
>> +	/*
>> +	 * We cannot directly upgrade to the write lock, just do a regular
>> +	 * up_read() and down_write() sequence. The data protected by the
>> +	 * rwsem may have changed before the write lock is acquired.
>> +	 */
>> +	down_read(sem);
>> +	up_write(sem);
> Confused, the comment says up_read()+down_write(), however the code is
> down_read()+up_write().
Thanks for catching that typo. My bad.
>
> Besides, I don't like the idea that the value may have changed before
> the write lock is acquired if we call it an "upgrade". Maybe we want an
> API like down_read_upgradable(), which can be held in parallel with
> other down_read() callers but not with another down_read_upgradable(),
> and only the read-side critical section created by
> down_read_upgradable() can be upgraded. For the implementation, that
> means we need one extra bit for the upgradable state. Thoughts?

I like your idea. There are spare bits available and we can dedicate one
bit for that purpose. After successfully acquiring the bit, the reader
can probably spin a little and then insert itself at the head of the
wait queue to sleep. The last exiting reader can then wake it up to
acquire the write lock.
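
Roughly something like the following pseudo-code (all the helpers here
are made up, just to sketch the flow):

	/* caller holds the read lock and owns the new upgradable bit */
	if (reader_count_is_one(sem) && cmpxchg_reader_to_writer(sem))
		return;				/* upgraded immediately */

	optimistically_spin_a_bit(sem);		/* readers may drain soon */
	queue_waiter_at_head(sem, current);	/* ahead of other waiters */
	wait_for_wakeup();			/* last exiting reader wakes
						 * us up as the write owner */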

I will probably use "try_upgrade_read()" to indicate that the attempt
may fail.

Thanks for the suggestion.

Cheers,
Longman


