[PATCH] locking/rtmutex: Flush the plug before entering the slowpath.

* [PATCH] locking/rtmutex: Flush the plug before entering the slowpath.
@ 2023-03-22 16:27 Sebastian Andrzej Siewior
  2023-03-28 16:54 ` [PATCH] locking/rtmutex: Do the trylock-slowpath with DEBUG_RT_MUTEXES enabled Sebastian Andrzej Siewior
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2023-03-22 16:27 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: Crystal Wood, John Keeping, Boqun Feng, Ingo Molnar,
	Peter Zijlstra, Waiman Long, Will Deacon

blk_flush_plug() is invoked on schedule() to flush out the IO progress
that has been made so far so that it is globally visible. This is
important to avoid deadlocks because a lock owner can wait for IO.
Therefore the IO must be first flushed before a thread can block on a
lock.

The plug flush routine can acquire a sleeping lock which is contended.
Blocking on a lock requires an assignment to task_struct::pi_blocked_on.
If blk_flush_plug() is invoked from the slow path on schedule() then the
variable is already set and will be overwritten by the lock in
blk_flush_plug().
Therefore it is needed to invoke blk_flush_plug() (and block on
potential locks in the process) before the blocking on the actual lock.

Invoke blk_flush_plug() before blocking on a sleeping lock. The
PREEMPT_RT only sleeping locks (spinlock_t and rwlock_t) are excluded
because their slow path does not invoke blk_flush_plug().

Fixes: e17ba59b7e8e1 ("locking/rtmutex: Guard regular sleeping locks specific functions")
Reported-by: Crystal Wood <swood@redhat.com>
Link: https://lore.kernel.org/4b4ab374d3e24e6ea8df5cadc4297619a6d945af.camel@redhat.com
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
On 2023-02-20 19:21:51 [+0100], Thomas Gleixner wrote:
> 
> This still leaves the problem vs. io_wq_worker_sleeping() and it's
> running() counterpart after schedule().

io_wq_worker_sleeping() has a kfree() so it probably should be moved,
too.
io_wq_worker_running() is a OR and INC and is fine.

> Aside of that for CONFIG_DEBUG_RT_MUTEXES=y builds it flushes on every
> lock operation whether the lock is contended or not.

For mutex & ww_mutex operations. rwsem is not affected by
CONFIG_DEBUG_RT_MUTEXES. As for mutex it could be mitigated by invoking
try_to_take_rt_mutex() before blk_flush_plug().

 kernel/locking/rtmutex.c     | 7 +++++++
 kernel/locking/rwbase_rt.c   | 8 ++++++++
 kernel/locking/ww_rt_mutex.c | 5 +++++
 3 files changed, 20 insertions(+)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 728f434de2bbf..c1bc2cb1522cb 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -23,6 +23,7 @@
 #include <linux/sched/rt.h>
 #include <linux/sched/wake_q.h>
 #include <linux/ww_mutex.h>
+#include <linux/blkdev.h>
 
 #include <trace/events/lock.h>
 
@@ -1700,6 +1701,12 @@ static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
 	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 0;
 
+	/*
+	 * If we are going to sleep and we have plugged IO queued, make sure to
+	 * submit it to avoid deadlocks.
+	 */
+	blk_flush_plug(current->plug, true);
+
 	return rt_mutex_slowlock(lock, NULL, state);
 }
 #endif /* RT_MUTEX_BUILD_MUTEX */
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index c201aadb93017..70c08ec4ad8af 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -143,6 +143,14 @@ static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
 	if (rwbase_read_trylock(rwb))
 		return 0;
 
+	if (state != TASK_RTLOCK_WAIT) {
+		/*
+		 * If we are going to sleep and we have plugged IO queued,
+		 * make sure to submit it to avoid deadlocks.
+		 */
+		blk_flush_plug(current->plug, true);
+	}
+
 	return __rwbase_read_lock(rwb, state);
 }
 
diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c
index d1473c624105c..472e3622abf09 100644
--- a/kernel/locking/ww_rt_mutex.c
+++ b/kernel/locking/ww_rt_mutex.c
@@ -67,6 +67,11 @@ __ww_rt_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx,
 			ww_mutex_set_context_fastpath(lock, ww_ctx);
 		return 0;
 	}
+	/*
+	 * If we are going to sleep and we have plugged IO queued, make sure to
+	 * submit it to avoid deadlocks.
+	 */
+	blk_flush_plug(current->plug, true);
 
 	ret = rt_mutex_slowlock(&rtm->rtmutex, ww_ctx, state);
 
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 9+ messages in thread