All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	juri.lelli@arm.com, bigeasy@linutronix.de, xlpang@redhat.com,
	rostedt@goodmis.org, mathieu.desnoyers@efficios.com,
	jdesfossez@efficios.com, dvhart@infradead.org,
	bristot@redhat.com, Thomas Gleixner <tglx@linutronix.de>,
	Ben Hutchings <ben@decadent.org.uk>
Subject: [PATCH 4.9 03/41] futex: Pull rt_mutex_futex_unlock() out from under hb->lock
Date: Fri,  5 Mar 2021 13:22:10 +0100	[thread overview]
Message-ID: <20210305120851.434265124@linuxfoundation.org> (raw)
In-Reply-To: <20210305120851.255002428@linuxfoundation.org>

From: Ben Hutchings <ben@decadent.org.uk>

From: Peter Zijlstra <peterz@infradead.org>

commit 16ffa12d742534d4ff73e8b3a4e81c1de39196f0 upstream.

There's a number of 'interesting' problems, all caused by holding
hb->lock while doing the rt_mutex_unlock() equivalient.

Notably:

 - a PI inversion on hb->lock; and,

 - a SCHED_DEADLINE crash because of pointer instability.

The previous changes:

 - changed the locking rules to cover {uval,pi_state} with wait_lock.

 - allow to do rt_mutex_futex_unlock() without dropping wait_lock; which in
   turn allows to rely on wait_lock atomicity completely.

 - simplified the waiter conundrum.

It's now sufficient to hold rtmutex::wait_lock and a reference on the
pi_state to protect the state consistency, so hb->lock can be dropped
before calling rt_mutex_futex_unlock().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.900002056@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/futex.c |  111 ++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 68 insertions(+), 43 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -966,10 +966,12 @@ static void exit_pi_state_list(struct ta
 		pi_state->owner = NULL;
 		raw_spin_unlock_irq(&curr->pi_lock);
 
-		rt_mutex_futex_unlock(&pi_state->pi_mutex);
-
+		get_pi_state(pi_state);
 		spin_unlock(&hb->lock);
 
+		rt_mutex_futex_unlock(&pi_state->pi_mutex);
+		put_pi_state(pi_state);
+
 		raw_spin_lock_irq(&curr->pi_lock);
 	}
 	raw_spin_unlock_irq(&curr->pi_lock);
@@ -1083,6 +1085,11 @@ static int attach_to_pi_state(u32 __user
 	 * has dropped the hb->lock in between queue_me() and unqueue_me_pi(),
 	 * which in turn means that futex_lock_pi() still has a reference on
 	 * our pi_state.
+	 *
+	 * The waiter holding a reference on @pi_state also protects against
+	 * the unlocked put_pi_state() in futex_unlock_pi(), futex_lock_pi()
+	 * and futex_wait_requeue_pi() as it cannot go to 0 and consequently
+	 * free pi_state before we can take a reference ourselves.
 	 */
 	WARN_ON(!atomic_read(&pi_state->refcount));
 
@@ -1537,48 +1544,40 @@ static void mark_wake_futex(struct wake_
 	q->lock_ptr = NULL;
 }
 
-static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *top_waiter,
-			 struct futex_hash_bucket *hb)
+/*
+ * Caller must hold a reference on @pi_state.
+ */
+static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_state)
 {
-	struct task_struct *new_owner;
-	struct futex_pi_state *pi_state = top_waiter->pi_state;
 	u32 uninitialized_var(curval), newval;
+	struct task_struct *new_owner;
+	bool deboost = false;
 	WAKE_Q(wake_q);
-	bool deboost;
 	int ret = 0;
 
-	if (!pi_state)
-		return -EINVAL;
-
-	/*
-	 * If current does not own the pi_state then the futex is
-	 * inconsistent and user space fiddled with the futex value.
-	 */
-	if (pi_state->owner != current)
-		return -EINVAL;
-
 	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
-
-	/*
-	 * When we interleave with futex_lock_pi() where it does
-	 * rt_mutex_timed_futex_lock(), we might observe @top_waiter futex_q waiter,
-	 * but the rt_mutex's wait_list can be empty (either still, or again,
-	 * depending on which side we land).
-	 *
-	 * When this happens, give up our locks and try again, giving the
-	 * futex_lock_pi() instance time to complete, either by waiting on the
-	 * rtmutex or removing itself from the futex queue.
-	 */
 	if (!new_owner) {
-		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
-		return -EAGAIN;
+		/*
+		 * Since we held neither hb->lock nor wait_lock when coming
+		 * into this function, we could have raced with futex_lock_pi()
+		 * such that we might observe @this futex_q waiter, but the
+		 * rt_mutex's wait_list can be empty (either still, or again,
+		 * depending on which side we land).
+		 *
+		 * When this happens, give up our locks and try again, giving
+		 * the futex_lock_pi() instance time to complete, either by
+		 * waiting on the rtmutex or removing itself from the futex
+		 * queue.
+		 */
+		ret = -EAGAIN;
+		goto out_unlock;
 	}
 
 	/*
-	 * We pass it to the next owner. The WAITERS bit is always
-	 * kept enabled while there is PI state around. We cleanup the
-	 * owner died bit, because we are the owner.
+	 * We pass it to the next owner. The WAITERS bit is always kept
+	 * enabled while there is PI state around. We cleanup the owner
+	 * died bit, because we are the owner.
 	 */
 	newval = FUTEX_WAITERS | task_pid_vnr(new_owner);
 
@@ -1611,15 +1610,15 @@ static int wake_futex_pi(u32 __user *uad
 		deboost = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q);
 	}
 
+out_unlock:
 	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
-	spin_unlock(&hb->lock);
 
 	if (deboost) {
 		wake_up_q(&wake_q);
 		rt_mutex_adjust_prio(current);
 	}
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -2493,7 +2492,7 @@ retry:
 	if (get_futex_value_locked(&uval, uaddr))
 		goto handle_fault;
 
-	while (1) {
+	for (;;) {
 		newval = (uval & FUTEX_OWNER_DIED) | newtid;
 
 		if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
@@ -3006,10 +3005,36 @@ retry:
 	 */
 	top_waiter = futex_top_waiter(hb, &key);
 	if (top_waiter) {
-		ret = wake_futex_pi(uaddr, uval, top_waiter, hb);
+		struct futex_pi_state *pi_state = top_waiter->pi_state;
+
+		ret = -EINVAL;
+		if (!pi_state)
+			goto out_unlock;
+
 		/*
-		 * In case of success wake_futex_pi dropped the hash
-		 * bucket lock.
+		 * If current does not own the pi_state then the futex is
+		 * inconsistent and user space fiddled with the futex value.
+		 */
+		if (pi_state->owner != current)
+			goto out_unlock;
+
+		/*
+		 * Grab a reference on the pi_state and drop hb->lock.
+		 *
+		 * The reference ensures pi_state lives, dropping the hb->lock
+		 * is tricky.. wake_futex_pi() will take rt_mutex::wait_lock to
+		 * close the races against futex_lock_pi(), but in case of
+		 * _any_ fail we'll abort and retry the whole deal.
+		 */
+		get_pi_state(pi_state);
+		spin_unlock(&hb->lock);
+
+		ret = wake_futex_pi(uaddr, uval, pi_state);
+
+		put_pi_state(pi_state);
+
+		/*
+		 * Success, we're done! No tricky corner cases.
 		 */
 		if (!ret)
 			goto out_putkey;
@@ -3024,7 +3049,6 @@ retry:
 		 * setting the FUTEX_WAITERS bit. Try again.
 		 */
 		if (ret == -EAGAIN) {
-			spin_unlock(&hb->lock);
 			put_futex_key(&key);
 			goto retry;
 		}
@@ -3032,7 +3056,7 @@ retry:
 		 * wake_futex_pi has detected invalid state. Tell user
 		 * space.
 		 */
-		goto out_unlock;
+		goto out_putkey;
 	}
 
 	/*
@@ -3042,8 +3066,10 @@ retry:
 	 * preserve the WAITERS bit not the OWNER_DIED one. We are the
 	 * owner.
 	 */
-	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))
+	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0)) {
+		spin_unlock(&hb->lock);
 		goto pi_faulted;
+	}
 
 	/*
 	 * If uval has changed, let user space handle it.
@@ -3057,7 +3083,6 @@ out_putkey:
 	return ret;
 
 pi_faulted:
-	spin_unlock(&hb->lock);
 	put_futex_key(&key);
 
 	ret = fault_in_user_writeable(uaddr);



  parent reply	other threads:[~2021-03-05 12:41 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-05 12:22 [PATCH 4.9 00/41] 4.9.260-rc1 review Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 01/41] futex: Cleanup variable names for futex_top_waiter() Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 02/41] futex: Cleanup refcounting Greg Kroah-Hartman
2021-03-05 12:22 ` Greg Kroah-Hartman [this message]
2021-03-05 12:22 ` [PATCH 4.9 04/41] futex: Futex_unlock_pi() determinism Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 05/41] futex: Fix pi_state->owner serialization Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 06/41] futex: Fix more put_pi_state() vs. exit_pi_state_list() races Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 07/41] futex: Dont enable IRQs unconditionally in put_pi_state() Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 08/41] net: usb: qmi_wwan: support ZTE P685M modem Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 09/41] arm: kprobes: Allow to handle reentered kprobe on single-stepping Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 10/41] scripts: use pkg-config to locate libcrypto Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 11/41] scripts: set proper OpenSSL include dir also for sign-file Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 12/41] hugetlb: fix update_and_free_page contig page struct assumption Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 13/41] printk: fix deadlock when kernel panic Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 14/41] arm64: Remove redundant mov from LL/SC cmpxchg Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 15/41] arm64: Avoid redundant type conversions in xchg() and cmpxchg() Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 16/41] arm64: cmpxchg: Use "K" instead of "L" for ll/sc immediate constraint Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 17/41] arm64: Use correct ll/sc atomic constraints Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 18/41] JFS: more checks for invalid superblock Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 19/41] xfs: Fix assert failure in xfs_setattr_size() Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 20/41] smackfs: restrict bytes count in smackfs write functions Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 21/41] net: fix up truesize of cloned skb in skb_prepare_for_shift() Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 22/41] mm/hugetlb.c: fix unnecessary address expansion of pmd sharing Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 23/41] staging: fwserial: Fix error handling in fwserial_create Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 24/41] x86/reboot: Add Zotac ZBOX CI327 nano PCI reboot quirk Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 25/41] vt/consolemap: do font sum unsigned Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 26/41] wlcore: Fix command execute failure 19 for wl12xx Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 27/41] pktgen: fix misuse of BUG_ON() in pktgen_thread_worker() Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 28/41] ath10k: fix wmi mgmt tx queue full due to race condition Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 29/41] x86/build: Treat R_386_PLT32 relocation as R_386_PC32 Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 30/41] Bluetooth: Fix null pointer dereference in amp_read_loc_assoc_final_data Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 31/41] staging: most: sound: add sanity check for function argument Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 32/41] media: uvcvideo: Allow entities with no pads Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 33/41] scsi: iscsi: Restrict sessions and handles to admin capabilities Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 34/41] sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 35/41] scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 36/41] scsi: iscsi: Verify lengths on passthrough PDUs Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 37/41] Xen/gnttab: handle p2m update errors on a per-slot basis Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 38/41] xen-netback: respect gnttab_map_refs()s return value Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 39/41] zsmalloc: account the number of compacted pages correctly Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 40/41] swap: fix swapfile read/write offset Greg Kroah-Hartman
2021-03-05 12:22 ` [PATCH 4.9 41/41] media: v4l: ioctl: Fix memory leak in video_usercopy Greg Kroah-Hartman
2021-03-05 17:51 ` [PATCH 4.9 00/41] 4.9.260-rc1 review Jon Hunter
2021-03-06  5:16 ` Florian Fainelli
2021-03-06 16:29 ` Guenter Roeck
2021-03-06 16:30 ` Guenter Roeck
2021-03-07  2:26 ` Naresh Kamboju
2021-03-12 20:24 ` Florian Fainelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210305120851.434265124@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=ben@decadent.org.uk \
    --cc=bigeasy@linutronix.de \
    --cc=bristot@redhat.com \
    --cc=dvhart@infradead.org \
    --cc=jdesfossez@efficios.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=xlpang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.