All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Waiman Long <longman@redhat.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Davidlohr Bueso <dave@stgolabs.net>
Subject: Re: [patch 0/2] locking/rtmutex: Cure two subtle bugs
Date: Thu, 26 Aug 2021 15:26:36 +0200	[thread overview]
Message-ID: <YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20210825101857.420032248@linutronix.de>

On Wed, Aug 25, 2021 at 12:33:11PM +0200, Thomas Gleixner wrote:
> The fixes are straight forward, but the rtmutex based ww_mutex
> implementation still has some more rough egdes which need to be ironed out.

Something like the below (which should be two patches).

There's two issues at hand:

 - 'spurious' -EDEADLK reports due to prior loops
 - actual deadlock detection that will/should be resolved by ww_mutex
   instead.

Both stem from the fact that ww_mutex can legitimately create cycles in
the lock graph. The first issue is because a blocker (P3 blow) can get
trapped in a cycle it didn't cause and hit the iteration depth.

The latter can be observed when we suppose P2 to be the highest priority
and thus should win progress rights over P1, but P2 would also be 'late'
and be the one to close the cycle. In that case PI-walk would report
-EDEADLK to P2, even though w/w would pick and wake P1 to receive
-EDEADLK.

Now; afaict these conditions below capture the above, but then fail to
adequately handle more complex locking chains where, for example, two
w/w classes are inverted, which is a geniune deadlock, which the below
will suppress.

Luckily w/w stuff is in-kernel only and lockdep should capture them; any
user chains should be unaffected.

famous last words etc..

---
 kernel/locking/rtmutex.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index c8fe74ef8db9..b7145d724b5f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -656,6 +656,31 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
 	if (next_lock != waiter->lock)
 		goto out_unlock_pi;
 
+	/*
+	 * There could be 'spurious' loops in the lock graph due to ww_mutex,
+	 * consider:
+	 *
+	 *   P1: A, ww_A, ww_B
+	 *   P2: ww_B, ww_A
+	 *   P3: A
+	 *
+	 * P3 should not return -EDEADLK because it gets trapped in the cycle
+	 * created by P1 and P2 (which will resolve -- and runs into
+	 * max_lock_depth above). Therefore disable detect_deadlock such that
+	 * the below termination condition can trigger once all relevant tasks
+	 * are boosted.
+	 *
+	 * Even when we start with ww_mutex we can disable deadlock detection,
+	 * since we would supress a ww_mutex induced deadlock at [6] anyway.
+	 * Supressing it here however is not sufficient since we might still
+	 * hit [6] due to adjustment driven iteration.
+	 *
+	 * NOTE: if someone were to create a deadlock between 2 ww_classes we'd
+	 * utterly fail to report it; lockdep should.
+	 */
+	if (waiter->ww_ctx && detect_deadlock)
+		detect_deadlock = false;
+
 	/*
 	 * Drop out, when the task has no waiters. Note,
 	 * top_waiter can be NULL, when we are in the deboosting
@@ -717,8 +742,21 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
 	 * walk, we detected a deadlock.
 	 */
 	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
-		raw_spin_unlock(&lock->wait_lock);
 		ret = -EDEADLK;
+
+		/*
+		 * When the deadlock is due to ww_mutex; also see above. Don't
+		 * report the deadlock and instead let the ww_mutex wound/die
+		 * logic pick which of the contending threads gets -EDEADLK.
+		 *
+		 * NOTE: assumes the cycle only contains a single ww_class; any
+		 * other configuration and we fail to report; also, see
+		 * lockdep.
+		 */
+		if (orig_waiter->ww_ctx)
+			ret = 0;
+
+		raw_spin_unlock(&lock->wait_lock);
 		goto out_unlock_pi;
 	}
 

  parent reply	other threads:[~2021-08-26 13:27 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-25 10:33 [patch 0/2] locking/rtmutex: Cure two subtle bugs Thomas Gleixner
2021-08-25 10:33 ` [patch 1/2] locking/rtmutex: Dont dereference waiter lockless Thomas Gleixner
2021-08-25 14:17   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2021-08-25 10:33 ` [patch 2/2] locking/rtmutex: Dequeue waiter on ww_mutex deadlock Thomas Gleixner
2021-08-25 14:17   ` [tip: locking/core] " tip-bot2 for Thomas Gleixner
2021-08-25 11:40 ` [patch 0/2] locking/rtmutex: Cure two subtle bugs Peter Zijlstra
2021-08-26 13:26 ` Peter Zijlstra [this message]
2021-08-27  7:56   ` Sebastian Andrzej Siewior
2021-08-27 12:31   ` [tip: locking/core] locking/rtmutex: Return success on deadlock for ww_mutex waiters tip-bot2 for Peter Zijlstra
2021-08-27 12:31   ` [tip: locking/core] locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes tip-bot2 for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YSeWjCHoK4v5OcOt@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bigeasy@linutronix.de \
    --cc=dave@stgolabs.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longman@redhat.com \
    --cc=mingo@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.