All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: tglx@linutronix.de
Cc: mingo@kernel.org, juri.lelli@arm.com, rostedt@goodmis.org,
	xlpang@redhat.com, bigeasy@linutronix.de,
	linux-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com,
	jdesfossez@efficios.com, bristot@redhat.com,
	dvhart@infradead.org, peterz@infradead.org
Subject: [PATCH -v6 13/13] futex: futex_lock_pi() vs PREEMPT_RT_FULL
Date: Wed, 22 Mar 2017 11:36:00 +0100	[thread overview]
Message-ID: <20170322104152.161341537@infradead.org> (raw)
In-Reply-To: 20170322103547.756091212@infradead.org

[-- Attachment #1: peterz-futex-pi-unlock-12.patch --]
[-- Type: text/plain, Size: 6343 bytes --]

When PREEMPT_RT_FULL does the spinlock -> rt_mutex substitution the PI
chain code will (falsely) report a deadlock and BUG.

The problem is that we hold hb->lock (now an rt_mutex) while doing
task_blocks_on_rt_mutex on the futex's pi_state::rtmutex. This, when
interleaved just right with futex_unlock_pi() leads it to believe we
have an AB-BA deadlock.

  Task1 (holds rt_mutex,	Task2 (does FUTEX_LOCK_PI)
         does FUTEX_UNLOCK_PI)

				lock hb->lock
				lock rt_mutex (as per start_proxy)
  lock hb->lock

Which is a trivial AB-BA.

It is not an actual deadlock, because we won't be holding hb->lock by
the time we actually block on rt_mutex, but the chainwalk code doesn't
know that.

To avoid this problem, do the same thing we do in futex_unlock_pi()
and drop hb->lock after acquiring wait_lock. This still fully
serializes against futex_unlock_pi(), since adding to the wait_list
does the very same lock dance, and removing it holds both locks.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c                  |   30 +++++++++++++++++-------
 kernel/locking/rtmutex.c        |   49 ++++++++++++++++++++++------------------
 kernel/locking/rtmutex_common.h |    3 ++
 3 files changed, 52 insertions(+), 30 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2654,20 +2654,33 @@ static int futex_lock_pi(u32 __user *uad
 		goto no_block;
 	}
 
+	rt_mutex_init_waiter(&rt_waiter);
+
 	/*
-	 * We must add ourselves to the rt_mutex waitlist while holding hb->lock
-	 * such that the hb and rt_mutex wait lists match.
+	 * On PREEMPT_RT_FULL, when hb->lock becomes an rt_mutex, we must not
+	 * hold it while doing rt_mutex_start_proxy(), because then it will
+	 * include hb->lock in the blocking chain, even through we'll not in
+	 * fact hold it while blocking. This will lead it to report -EDEADLK
+	 * and BUG when futex_unlock_pi() interleaves with this.
+	 *
+	 * Therefore acquire wait_lock while holding hb->lock, but drop the
+	 * latter before calling rt_mutex_start_proxy_lock(). This still fully
+	 * serializes against futex_unlock_pi() as that does the exact same
+	 * lock handoff sequence.
 	 */
-	rt_mutex_init_waiter(&rt_waiter);
-	ret = rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
+	raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock);
+	spin_unlock(q.lock_ptr);
+	ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
+	raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
+
 	if (ret) {
 		if (ret == 1)
 			ret = 0;
 
+		spin_lock(q.lock_ptr);
 		goto no_block;
 	}
 
-	spin_unlock(q.lock_ptr);
 
 	if (unlikely(to))
 		hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
@@ -2680,6 +2693,9 @@ static int futex_lock_pi(u32 __user *uad
 	 * first acquire the hb->lock before removing the lock from the
 	 * rt_mutex waitqueue, such that we can keep the hb and rt_mutex
 	 * wait lists consistent.
+	 *
+	 * In particular; it is important that futex_unlock_pi() can not
+	 * observe this inconsistency.
 	 */
 	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
 		ret = 0;
@@ -2791,10 +2807,6 @@ static int futex_unlock_pi(u32 __user *u
 
 		get_pi_state(pi_state);
 		/*
-		 * Since modifying the wait_list is done while holding both
-		 * hb->lock and wait_lock, holding either is sufficient to
-		 * observe it.
-		 *
 		 * By taking wait_lock while still holding hb->lock, we ensure
 		 * there is no point where we hold neither; and therefore
 		 * wake_futex_pi() must observe a state consistent with what we
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1669,31 +1669,14 @@ void rt_mutex_proxy_unlock(struct rt_mut
 	rt_mutex_set_owner(lock, NULL);
 }
 
-/**
- * rt_mutex_start_proxy_lock() - Start lock acquisition for another task
- * @lock:		the rt_mutex to take
- * @waiter:		the pre-initialized rt_mutex_waiter
- * @task:		the task to prepare
- *
- * Returns:
- *  0 - task blocked on lock
- *  1 - acquired the lock for task, caller should wake it up
- * <0 - error
- *
- * Special API call for FUTEX_REQUEUE_PI support.
- */
-int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 			      struct rt_mutex_waiter *waiter,
 			      struct task_struct *task)
 {
 	int ret;
 
-	raw_spin_lock_irq(&lock->wait_lock);
-
-	if (try_to_take_rt_mutex(lock, task, NULL)) {
-		raw_spin_unlock_irq(&lock->wait_lock);
+	if (try_to_take_rt_mutex(lock, task, NULL))
 		return 1;
-	}
 
 	/* We enforce deadlock detection for futexes */
 	ret = task_blocks_on_rt_mutex(lock, waiter, task,
@@ -1712,12 +1695,36 @@ int rt_mutex_start_proxy_lock(struct rt_
 	if (unlikely(ret))
 		remove_waiter(lock, waiter);
 
-	raw_spin_unlock_irq(&lock->wait_lock);
-
 	debug_rt_mutex_print_deadlock(waiter);
 
 	return ret;
 }
+
+/**
+ * rt_mutex_start_proxy_lock() - Start lock acquisition for another task
+ * @lock:		the rt_mutex to take
+ * @waiter:		the pre-initialized rt_mutex_waiter
+ * @task:		the task to prepare
+ *
+ * Returns:
+ *  0 - task blocked on lock
+ *  1 - acquired the lock for task, caller should wake it up
+ * <0 - error
+ *
+ * Special API call for FUTEX_REQUEUE_PI support.
+ */
+int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+			      struct rt_mutex_waiter *waiter,
+			      struct task_struct *task)
+{
+	int ret;
+
+	raw_spin_lock_irq(&lock->wait_lock);
+	ret = __rt_mutex_start_proxy_lock(lock, waiter, task);
+	raw_spin_unlock_irq(&lock->wait_lock);
+
+	return ret;
+}
 
 /**
  * rt_mutex_next_owner - return the next owner of the lock
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -104,6 +104,9 @@ extern void rt_mutex_init_proxy_locked(s
 extern void rt_mutex_proxy_unlock(struct rt_mutex *lock,
 				  struct task_struct *proxy_owner);
 extern void rt_mutex_init_waiter(struct rt_mutex_waiter *waiter);
+extern int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+				     struct rt_mutex_waiter *waiter,
+				     struct task_struct *task);
 extern int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 				     struct rt_mutex_waiter *waiter,
 				     struct task_struct *task);

  parent reply	other threads:[~2017-03-22 10:45 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-22 10:35 [PATCH -v6 00/13] The arduous story of FUTEX_UNLOCK_PI Peter Zijlstra
2017-03-22 10:35 ` [PATCH -v6 01/13] futex: Cleanup variable names for futex_top_waiter() Peter Zijlstra
2017-03-23 18:19   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-03-24 21:11   ` [PATCH -v6 01/13] " Darren Hart
2017-03-22 10:35 ` [PATCH -v6 02/13] futex: Use smp_store_release() in mark_wake_futex() Peter Zijlstra
2017-03-23 18:19   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-03-24 21:16   ` [PATCH -v6 02/13] " Darren Hart
2017-03-22 10:35 ` [PATCH -v6 03/13] futex: Remove rt_mutex_deadlock_account_*() Peter Zijlstra
2017-03-23 18:20   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-03-24 21:29   ` [PATCH -v6 03/13] " Darren Hart
2017-03-24 21:31     ` Darren Hart
2017-03-22 10:35 ` [PATCH -v6 04/13] futex,rt_mutex: Provide futex specific rt_mutex API Peter Zijlstra
2017-03-23 18:20   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-03-25  0:37   ` [PATCH -v6 04/13] " Darren Hart
2017-04-06 12:15     ` Peter Zijlstra
2017-04-06 17:02       ` Darren Hart
2017-04-05 15:02   ` Darren Hart
2017-04-06 12:17     ` Peter Zijlstra
2017-04-06 17:08       ` Darren Hart
2017-03-22 10:35 ` [PATCH -v6 05/13] futex: Change locking rules Peter Zijlstra
2017-03-23 18:21   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-04-05 21:18   ` [PATCH -v6 05/13] " Darren Hart
2017-04-06 12:28     ` Peter Zijlstra
2017-04-06 15:58       ` Joe Perches
2017-04-06 17:21       ` Darren Hart
2017-03-22 10:35 ` [PATCH -v6 06/13] futex: Cleanup refcounting Peter Zijlstra
2017-03-23 18:21   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-04-05 21:29   ` [PATCH -v6 06/13] " Darren Hart
2017-03-22 10:35 ` [PATCH -v6 07/13] futex: Rework inconsistent rt_mutex/futex_q state Peter Zijlstra
2017-03-23 18:22   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-04-05 21:58   ` [PATCH -v6 07/13] " Darren Hart
2017-03-22 10:35 ` [PATCH -v6 08/13] futex: Pull rt_mutex_futex_unlock() out from under hb->lock Peter Zijlstra
2017-03-23 18:22   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-04-05 23:52   ` [PATCH -v6 08/13] " Darren Hart
2017-04-06 12:42     ` Peter Zijlstra
2017-04-06 17:42       ` Darren Hart
2017-03-22 10:35 ` [PATCH -v6 09/13] futex,rt_mutex: Introduce rt_mutex_init_waiter() Peter Zijlstra
2017-03-23 18:23   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-04-05 23:57   ` [PATCH -v6 09/13] " Darren Hart
2017-03-22 10:35 ` [PATCH -v6 10/13] futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock() Peter Zijlstra
2017-03-23 18:23   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-04-07 23:30   ` [PATCH -v6 10/13] " Darren Hart
2017-04-07 23:35     ` Darren Hart
2017-03-22 10:35 ` [PATCH -v6 11/13] futex: Rework futex_lock_pi() to use rt_mutex_*_proxy_lock() Peter Zijlstra
2017-03-23 18:24   ` [tip:locking/core] " tip-bot for Peter Zijlstra
2017-04-08  0:55   ` [PATCH -v6 11/13] " Darren Hart
2017-04-10 15:51   ` alexander.levin
2017-04-10 16:03     ` Thomas Gleixner
2017-04-14  9:30       ` [tip:locking/core] futex: Avoid freeing an active timer tip-bot for Thomas Gleixner
2017-03-22 10:35 ` [PATCH -v6 12/13] futex: futex_unlock_pi() determinism Peter Zijlstra
2017-03-23 18:24   ` [tip:locking/core] futex: Futex_unlock_pi() determinism tip-bot for Peter Zijlstra
2017-04-08  1:27   ` [PATCH -v6 12/13] futex: futex_unlock_pi() determinism Darren Hart
2017-03-22 10:36 ` Peter Zijlstra [this message]
2017-03-23 18:25   ` [tip:locking/core] futex: Drop hb->lock before enqueueing on the rtmutex tip-bot for Peter Zijlstra
2017-04-08  2:26   ` [PATCH -v6 13/13] futex: futex_lock_pi() vs PREEMPT_RT_FULL Darren Hart
2017-04-08  5:22     ` Mike Galbraith
2017-04-10  8:43     ` Sebastian Andrzej Siewior
2017-04-10  9:08     ` Peter Zijlstra
2017-04-10 16:05       ` Darren Hart
2017-03-24  1:45 ` [PATCH -v6 00/13] The arduous story of FUTEX_UNLOCK_PI Darren Hart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170322104152.161341537@infradead.org \
    --to=peterz@infradead.org \
    --cc=bigeasy@linutronix.de \
    --cc=bristot@redhat.com \
    --cc=dvhart@infradead.org \
    --cc=jdesfossez@efficios.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=xlpang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.