From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Will Deacon <will@kernel.org>, Waiman Long <longman@redhat.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Davidlohr Bueso <dave@stgolabs.net>
Subject: [patch 15/63] locking: Add base code for RT rw_semaphore and rwlock
Date: Fri, 30 Jul 2021 15:50:22 +0200
Message-ID: <20210730135206.018846923@linutronix.de>
In-Reply-To: 20210730135007.155909613@linutronix.de

From: Thomas Gleixner <tglx@linutronix.de>

On PREEMPT_RT, rw_semaphores and rwlocks are substituted with an rtmutex and
a reader count. The implementation is writer unfair, as it is not feasible
to do priority inheritance on multiple readers, but experience has shown
that realtime workloads are not the typical workloads which are sensitive
to writer starvation.
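
For readers who want to see how the reader count behaves, here is a minimal
user-space sketch of the counter encoding. It is not the kernel code: it uses
C11 atomics instead of atomic_t, the model_* names are hypothetical, and it
leaves out the rtmutex and the reader slow path entirely. It assumes a
two's complement int, so that (int)READER_BIAS is negative. Note that the
real slow path still admits readers after the bias is removed, until
WRITER_BIAS is set, which is where the writer unfairness comes from.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define READER_BIAS	(1U << 31)
#define WRITER_BIAS	(1U << 30)

/* Hypothetical stand-in for rwbase_rt::readers; unlocked == READER_BIAS. */
static atomic_int model_readers = (int)READER_BIAS;

static void model_reset(void)
{
	atomic_store(&model_readers, (int)READER_BIAS);
}

/* Reader fast path: succeeds only while READER_BIAS is set (value < 0). */
static bool model_read_trylock(void)
{
	int r = atomic_load(&model_readers);

	while (r < 0) {
		/* On failure r is reloaded and the bias check is repeated. */
		if (atomic_compare_exchange_weak(&model_readers, &r, r + 1))
			return true;
	}
	return false;
}

/* Returns true if the count hit 0, i.e. a waiting writer must be woken. */
static bool model_read_unlock(void)
{
	return atomic_fetch_sub(&model_readers, 1) == 1;
}

/* Writer removes the bias; what remains is the number of active readers. */
static int model_writer_remove_bias(void)
{
	atomic_fetch_sub(&model_readers, (int)READER_BIAS);
	return atomic_load(&model_readers);
}

/* Writer marks the lock write locked once no reader is left. */
static void model_mark_write_locked(void)
{
	assert(atomic_load(&model_readers) == 0);
	atomic_store(&model_readers, (int)WRITER_BIAS);
}
```

With two readers inside, removing the bias leaves the counter at 2, the fast
path starts failing, and the second model_read_unlock() reports that the
writer has to be woken.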

The inner workings of rw_semaphores and rwlocks on RT are almost identical
except for the task state and signal handling. rw_semaphores do not preserve
the task state across contention; they are expected to enter and leave with
state == TASK_RUNNING. rwlocks have a mechanism to preserve the state of the
task at entry and restore it after unblocking, taking potential non-lock
related wakeups into account. rw_semaphores can also be subject to signal
handling interrupting a blocked state, while rwlocks ignore signals.

To avoid code duplication, provide a shared implementation which takes the
small differences in state and signal handling into account. The code is
included in the relevant rw_semaphore/rwlock base code and compiled for each
use case separately.
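
As an illustration of the "one body, compiled per use case" idea, here is a
small single-file sketch. The patch itself gets there by including
rwbase_rt.c into each user after that user has defined the rwbase_* hooks;
the sketch swaps in a function-like macro so that it stays self-contained.
All names in it are hypothetical, and the wait loop is reduced to a counter
that drops by one per iteration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* rw_semaphore flavour: a pending signal interrupts the wait. */
static inline bool sem_signal_pending(bool pending)
{
	return pending;
}

/* rwlock flavour: signals are ignored, so the check is constant false. */
static inline bool lock_signal_pending(bool pending)
{
	(void)pending;
	return false;
}

/*
 * Shared body, expanded once per flavour. The hook is resolved at
 * compile time, so the rwlock variant can drop the -EINTR branch.
 */
#define DEFINE_WAIT_FOR_READERS(name, signal_pending_hook)		\
static int name(int active_readers, bool signal_is_pending)		\
{									\
	while (active_readers > 0) {					\
		if (signal_pending_hook(signal_is_pending))		\
			return -EINTR;					\
		active_readers--; /* model: one reader leaves */	\
	}								\
	return 0;							\
}

DEFINE_WAIT_FOR_READERS(sem_wait_for_readers, sem_signal_pending)
DEFINE_WAIT_FOR_READERS(lock_wait_for_readers, lock_signal_pending)
```

Only the rw_semaphore flavour can return -EINTR here, and only while readers
are still active, which mirrors the order of the checks in the write lock
loop.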

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/rwbase_rt.h  |   38 ++++++
 kernel/locking/rwbase_rt.c |  263 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 301 insertions(+)
 create mode 100644 include/linux/rwbase_rt.h
 create mode 100644 kernel/locking/rwbase_rt.c
---
--- /dev/null
+++ b/include/linux/rwbase_rt.h
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#ifndef _LINUX_RW_BASE_RT_H
+#define _LINUX_RW_BASE_RT_H
+
+#include <linux/rtmutex.h>
+#include <linux/atomic.h>
+
+#define READER_BIAS		(1U << 31)
+#define WRITER_BIAS		(1U << 30)
+
+struct rwbase_rt {
+	atomic_t		readers;
+	struct rt_mutex_base	rtmutex;
+};
+
+#define __RWBASE_INITIALIZER(name)				\
+{								\
+	.readers = ATOMIC_INIT(READER_BIAS),			\
+	.rtmutex = __RT_MUTEX_BASE_INITIALIZER(name.rtmutex),	\
+}
+
+#define init_rwbase_rt(rwbase)					\
+	do {							\
+		rt_mutex_base_init(&(rwbase)->rtmutex);		\
+		atomic_set(&(rwbase)->readers, READER_BIAS);	\
+	} while (0)
+
+
+static __always_inline bool rw_base_is_locked(struct rwbase_rt *rwb)
+{
+	return atomic_read(&rwb->readers) != READER_BIAS;
+}
+
+static __always_inline bool rw_base_is_contended(struct rwbase_rt *rwb)
+{
+	return atomic_read(&rwb->readers) > 0;
+}
+#endif
--- /dev/null
+++ b/kernel/locking/rwbase_rt.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * RT-specific reader/writer semaphores and reader/writer locks
+ *
+ * down_write/write_lock()
+ *  1) Lock rtmutex
+ *  2) Remove the reader BIAS to force readers into the slow path
+ *  3) Wait until all readers have left the critical region
+ *  4) Mark it write locked
+ *
+ * up_write/write_unlock()
+ *  1) Remove the write locked marker
+ *  2) Set the reader BIAS so readers can use the fast path again
+ *  3) Unlock rtmutex to release blocked readers
+ *
+ * down_read/read_lock()
+ *  1) Try fast path acquisition (reader BIAS is set)
+ *  2) Take rtmutex::wait_lock which protects the writelocked flag
+ *  3) If !writelocked, acquire it for read
+ *  4) If writelocked, block on rtmutex
+ *  5) unlock rtmutex, goto 1)
+ *
+ * up_read/read_unlock()
+ *  1) Try fast path release (reader count != 1)
+ *  2) Wake the writer waiting in down_write()/write_lock() #3
+ *
+ * down_read/read_lock()#3 has the consequence, that rw semaphores and rw
+ * locks on RT are not writer fair, but writers, which should be avoided in
+ * RT tasks (think mmap_sem), are subject to the rtmutex priority/DL
+ * inheritance mechanism.
+ *
+ * It's possible to make the rw primitives writer fair by keeping a list of
+ * active readers. A blocked writer would force all newly incoming readers
+ * to block on the rtmutex, but the rtmutex would have to be proxy locked
+ * for one reader after the other. We can't use multi-reader inheritance
+ * because there is no way to support that with SCHED_DEADLINE.
+ * Implementing the one by one reader boosting/handover mechanism is a
+ * major surgery for a very dubious value.
+ *
+ * The risk of writer starvation is there, but the pathological use cases
+ * which trigger it are not necessarily the typical RT workloads.
+ *
+ * Common code shared between RT rw_semaphore and rwlock
+ */
+
+static __always_inline int rwbase_read_trylock(struct rwbase_rt *rwb)
+{
+	int r;
+
+	/*
+	 * Increment reader count, if rwb->readers < 0, i.e. READER_BIAS is
+	 * set.
+	 */
+	for (r = atomic_read(&rwb->readers); r < 0;) {
+		if (likely(atomic_try_cmpxchg(&rwb->readers, &r, r + 1)))
+			return 1;
+	}
+	return 0;
+}
+
+static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
+				      unsigned int state)
+{
+	struct rt_mutex_base *rtm = &rwb->rtmutex;
+	int ret;
+
+	raw_spin_lock_irq(&rtm->wait_lock);
+	/*
+	 * Allow readers as long as the writer has not completely
+	 * acquired the semaphore for write.
+	 */
+	if (atomic_read(&rwb->readers) != WRITER_BIAS) {
+		atomic_inc(&rwb->readers);
+		raw_spin_unlock_irq(&rtm->wait_lock);
+		return 0;
+	}
+
+	/*
+	 * Call into the slow lock path with the rtmutex->wait_lock
+	 * held, so this can't result in the following race:
+	 *
+	 * Reader1		Reader2		Writer
+	 *			down_read()
+	 *					down_write()
+	 *					rtmutex_lock(m)
+	 *					wait()
+	 * down_read()
+	 * unlock(m->wait_lock)
+	 *			up_read()
+	 *			wake(Writer)
+	 *					lock(m->wait_lock)
+	 *					sem->writelocked=true
+	 *					unlock(m->wait_lock)
+	 *
+	 *					up_write()
+	 *					sem->writelocked=false
+	 *					rtmutex_unlock(m)
+	 *			down_read()
+	 *					down_write()
+	 *					rtmutex_lock(m)
+	 *					wait()
+	 * rtmutex_lock(m)
+	 *
+	 * That would put Reader1 behind the writer waiting on
+	 * Reader2 to call up_read() which might be unbounded.
+	 */
+
+	/*
+	 * For rwlocks this returns 0 unconditionally, so the below
+	 * !ret conditionals are optimized out.
+	 */
+	ret = rwbase_rtmutex_slowlock_locked(rtm, state);
+
+	/*
+	 * On success the rtmutex is held, so there can't be a writer
+	 * active. Increment the reader count and immediately drop the
+	 * rtmutex again.
+	 *
+	 * rtmutex->wait_lock has to be unlocked in any case of course.
+	 */
+	if (!ret)
+		atomic_inc(&rwb->readers);
+	raw_spin_unlock_irq(&rtm->wait_lock);
+	if (!ret)
+		rwbase_rtmutex_unlock(rtm);
+	return ret;
+}
+
+static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
+					    unsigned int state)
+{
+	if (rwbase_read_trylock(rwb))
+		return 0;
+
+	return __rwbase_read_lock(rwb, state);
+}
+
+static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
+					 unsigned int state)
+{
+	struct rt_mutex_base *rtm = &rwb->rtmutex;
+	struct task_struct *owner;
+
+	raw_spin_lock_irq(&rtm->wait_lock);
+	/*
+	 * Wake the writer, i.e. the rtmutex owner. It might release the
+	 * rtmutex concurrently in the fast path (due to a signal), but to
+	 * clean up rwb->readers it needs to acquire rtm->wait_lock. The
+	 * worst case which can happen is a spurious wakeup.
+	 */
+	owner = rt_mutex_owner(rtm);
+	if (owner)
+		wake_up_state(owner, state);
+
+	raw_spin_unlock_irq(&rtm->wait_lock);
+}
+
+static __always_inline void rwbase_read_unlock(struct rwbase_rt *rwb,
+					       unsigned int state)
+{
+	/*
+	 * rwb->readers can only hit 0 when a writer is waiting for the
+	 * active readers to leave the critical region.
+	 */
+	if (unlikely(atomic_dec_and_test(&rwb->readers)))
+		__rwbase_read_unlock(rwb, state);
+}
+
+static inline void __rwbase_write_unlock(struct rwbase_rt *rwb, int bias,
+					 unsigned long flags)
+{
+	struct rt_mutex_base *rtm = &rwb->rtmutex;
+
+	atomic_add(READER_BIAS - bias, &rwb->readers);
+	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+	rwbase_rtmutex_unlock(rtm);
+}
+
+static inline void rwbase_write_unlock(struct rwbase_rt *rwb)
+{
+	struct rt_mutex_base *rtm = &rwb->rtmutex;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+	__rwbase_write_unlock(rwb, WRITER_BIAS, flags);
+}
+
+static inline void rwbase_write_downgrade(struct rwbase_rt *rwb)
+{
+	struct rt_mutex_base *rtm = &rwb->rtmutex;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+	/* Release it and account current as reader */
+	__rwbase_write_unlock(rwb, WRITER_BIAS - 1, flags);
+}
+
+static int __sched rwbase_write_lock(struct rwbase_rt *rwb,
+				     unsigned int state)
+{
+	struct rt_mutex_base *rtm = &rwb->rtmutex;
+	unsigned long flags;
+
+	/* Take the rtmutex as a first step */
+	if (rwbase_rtmutex_lock_state(rtm, state))
+		return -EINTR;
+
+	/* Force readers into slow path */
+	atomic_sub(READER_BIAS, &rwb->readers);
+
+	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+	/*
+	 * set_current_state() for rw_semaphore
+	 * current_save_and_set_rtlock_wait_state() for rwlock
+	 */
+	rwbase_set_and_save_current_state(state);
+
+	/* Block until all readers have left the critical region. */
+	for (; atomic_read(&rwb->readers);) {
+		/* Optimized out for rwlocks */
+		if (rwbase_signal_pending_state(state, current)) {
+			__set_current_state(TASK_RUNNING);
+			__rwbase_write_unlock(rwb, 0, flags);
+			return -EINTR;
+		}
+		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+
+		/*
+		 * Schedule and wait for the readers to leave the critical
+		 * section. The last reader leaving it wakes the waiter.
+		 */
+		if (atomic_read(&rwb->readers) != 0)
+			rwbase_schedule();
+		set_current_state(state);
+		raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+	}
+
+	atomic_set(&rwb->readers, WRITER_BIAS);
+	rwbase_restore_current_state();
+	raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+	return 0;
+}
+
+static inline int rwbase_write_trylock(struct rwbase_rt *rwb)
+{
+	struct rt_mutex_base *rtm = &rwb->rtmutex;
+	unsigned long flags;
+
+	if (!rwbase_rtmutex_trylock(rtm))
+		return 0;
+
+	atomic_sub(READER_BIAS, &rwb->readers);
+
+	raw_spin_lock_irqsave(&rtm->wait_lock, flags);
+	if (!atomic_read(&rwb->readers)) {
+		atomic_set(&rwb->readers, WRITER_BIAS);
+		raw_spin_unlock_irqrestore(&rtm->wait_lock, flags);
+		return 1;
+	}
+	__rwbase_write_unlock(rwb, 0, flags);
+	return 0;
+}
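
A note on the atomic_add(READER_BIAS - bias, ...) in __rwbase_write_unlock():
the three callers pass WRITER_BIAS (unlock), WRITER_BIAS - 1 (downgrade) and
0 (write lock aborted before the lock was marked write locked). The
following user-space sketch works through that arithmetic. It is only a
model: model_write_unlock() is a hypothetical helper, it uses C11 atomics,
whose signed arithmetic wraps silently, and it assumes a two's complement int.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define READER_BIAS	(1U << 31)
#define WRITER_BIAS	(1U << 30)

/*
 * Hypothetical model of the counter update in __rwbase_write_unlock().
 * Takes the counter value before the update and returns the value after.
 */
static int model_write_unlock(int readers_before, unsigned int bias)
{
	atomic_int readers;

	atomic_init(&readers, readers_before);
	/* The difference is computed unsigned, as with the 1U based macros. */
	atomic_fetch_add(&readers, (int)(READER_BIAS - bias));
	return atomic_load(&readers);
}
```

Starting from WRITER_BIAS, a plain unlock ends at INT_MIN (bias set, no
readers) and a downgrade ends at INT_MIN + 1 (bias set, current accounted as
the one reader). Starting from 3 active readers with bias 0, the abort path
ends at INT_MIN + 3, so the readers stay accounted and the fast path is open
again.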

