stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Yang Tao <yang.tao172@zte.com.cn>,
	Yi Wang <wang.yi59@zte.com.cn>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>
Subject: [PATCH 5.4 28/66] futex: Prevent robust futex exit race
Date: Wed, 27 Nov 2019 21:32:23 +0100	[thread overview]
Message-ID: <20191127202701.518579879@linuxfoundation.org> (raw)
In-Reply-To: <20191127202632.536277063@linuxfoundation.org>

From: Yang Tao <yang.tao172@zte.com.cn>

commit ca16d5bee59807bf04deaab0a8eccecd5061528c upstream.

Robust futexes utilize the robust_list mechanism to allow the kernel to
release futexes which are held when a task exits. The exit can be voluntary
or caused by a signal or fault. This prevents that waiters block forever.

The futex operations in user space store a pointer to the futex they are
either locking or unlocking in the op_pending member of the per task robust
list.

After a lock operation has succeeded the futex is queued in the robust list
linked list and the op_pending pointer is cleared.

After an unlock operation has succeeded the futex is removed from the
robust list linked list and the op_pending pointer is cleared.

The robust list exit code checks for the pending operation and any futex
which is queued in the linked list. It carefully checks whether the futex
value is the TID of the exiting task. If so, it sets the OWNER_DIED bit and
tries to wake up a potential waiter.

This is race free for the lock operation but unlock has two race scenarios
where waiters might not be woken up. These issues can be observed with
regular robust pthread mutexes. PI aware pthread mutexes are not affected.

(1) Unlocking task is killed after unlocking the futex value in user space
    before being able to wake a waiter.

        pthread_mutex_unlock()
                |
                V
        atomic_exchange_rel (&mutex->__data.__lock, 0)
                        <------------------------killed
            lll_futex_wake ()                   |
                                                |
                                                |(__lock = 0)
                                                |(enter kernel)
                                                |
                                                V
                                            do_exit()
                                            exit_mm()
                                          mm_release()
                                        exit_robust_list()
                                        handle_futex_death()
                                                |
                                                |(__lock = 0)
                                                |(uval = 0)
                                                |
                                                V
        if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                return 0;

    The sanity check which ensures that the user space futex is owned by
    the exiting task prevents the wakeup of waiters which in consequence
    block infinitely.

(2) Waiting task is killed after a wakeup and before it can acquire the
    futex in user space.

        OWNER                         WAITER
				futex_wait()
   pthread_mutex_unlock()               |
                |                       |
                |(__lock = 0)           |
                |                       |
                V                       |
         futex_wake() ------------>  wakeup()
                                        |
                                        |(return to userspace)
                                        |(__lock = 0)
                                        |
                                        V
                        oldval = mutex->__data.__lock
                                          <-----------------killed
    atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,  |
                        id | assume_other_futex_waiters, 0)      |
                                                                 |
                                                                 |
                                                   (enter kernel)|
                                                                 |
                                                                 V
                                                         do_exit()
                                                        |
                                                        |
                                                        V
                                        handle_futex_death()
                                        |
                                        |(__lock = 0)
                                        |(uval = 0)
                                        |
                                        V
        if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                return 0;

    The sanity check which ensures that the user space futex is owned
    by the exiting task prevents the wakeup of waiters, which seems to
    be correct as the exiting task does not own the futex value, but
    the consequence is that other waiters wont be woken up and block
    infinitely.

In both scenarios the following conditions are true:

   - task->robust_list->list_op_pending != NULL
   - user space futex value == 0
   - Regular futex (not PI)

If these conditions are met then it is reasonably safe to wake up a
potential waiter in order to prevent the above problems.

As this might be a false positive it can cause spurious wakeups, but the
waiter side has to handle other types of unrelated wakeups, e.g. signals
gracefully anyway. So such a spurious wakeup will not affect the
correctness of these operations.

This workaround must not touch the user space futex value and cannot set
the OWNER_DIED bit because the lock value is 0, i.e. uncontended. Setting
OWNER_DIED in this case would result in inconsistent state and subsequently
in malfunction of the owner died handling in user space.

The rest of the user space state is still consistent as no other task can
observe the list_op_pending entry in the exiting tasks robust list.

The eventually woken up waiter will observe the uncontended lock value and
take it over.

[ tglx: Massaged changelog and comment. Made the return explicit and not
  	depend on the subsequent check and added constants to hand into
  	handle_futex_death() instead of plain numbers. Fixed a few coding
	style issues. ]

Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")
Signed-off-by: Yang Tao <yang.tao172@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1573010582-35297-1-git-send-email-wang.yi59@zte.com.cn
Link: https://lkml.kernel.org/r/20191106224555.943191378@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 kernel/futex.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 51 insertions(+), 7 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3452,11 +3452,16 @@ err_unlock:
 	return ret;
 }
 
+/* Constants for the pending_op argument of handle_futex_death */
+#define HANDLE_DEATH_PENDING	true
+#define HANDLE_DEATH_LIST	false
+
 /*
  * Process a futex-list entry, check whether it's owned by the
  * dying task, and do notification if so:
  */
-static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi)
+static int handle_futex_death(u32 __user *uaddr, struct task_struct *curr,
+			      bool pi, bool pending_op)
 {
 	u32 uval, uninitialized_var(nval), mval;
 	int err;
@@ -3469,6 +3474,42 @@ retry:
 	if (get_user(uval, uaddr))
 		return -1;
 
+	/*
+	 * Special case for regular (non PI) futexes. The unlock path in
+	 * user space has two race scenarios:
+	 *
+	 * 1. The unlock path releases the user space futex value and
+	 *    before it can execute the futex() syscall to wake up
+	 *    waiters it is killed.
+	 *
+	 * 2. A woken up waiter is killed before it can acquire the
+	 *    futex in user space.
+	 *
+	 * In both cases the TID validation below prevents a wakeup of
+	 * potential waiters which can cause these waiters to block
+	 * forever.
+	 *
+	 * In both cases the following conditions are met:
+	 *
+	 *	1) task->robust_list->list_op_pending != NULL
+	 *	   @pending_op == true
+	 *	2) User space futex value == 0
+	 *	3) Regular futex: @pi == false
+	 *
+	 * If these conditions are met, it is safe to attempt waking up a
+	 * potential waiter without touching the user space futex value and
+	 * trying to set the OWNER_DIED bit. The user space futex value is
+	 * uncontended and the rest of the user space mutex state is
+	 * consistent, so a woken waiter will just take over the
+	 * uncontended futex. Setting the OWNER_DIED bit would create
+	 * inconsistent state and malfunction of the user space owner died
+	 * handling.
+	 */
+	if (pending_op && !pi && !uval) {
+		futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
+		return 0;
+	}
+
 	if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
 		return 0;
 
@@ -3588,10 +3629,11 @@ void exit_robust_list(struct task_struct
 		 * A pending lock might already be on the list, so
 		 * don't process it twice:
 		 */
-		if (entry != pending)
+		if (entry != pending) {
 			if (handle_futex_death((void __user *)entry + futex_offset,
-						curr, pi))
+						curr, pi, HANDLE_DEATH_LIST))
 				return;
+		}
 		if (rc)
 			return;
 		entry = next_entry;
@@ -3605,9 +3647,10 @@ void exit_robust_list(struct task_struct
 		cond_resched();
 	}
 
-	if (pending)
+	if (pending) {
 		handle_futex_death((void __user *)pending + futex_offset,
-				   curr, pip);
+				   curr, pip, HANDLE_DEATH_PENDING);
+	}
 }
 
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
@@ -3784,7 +3827,8 @@ void compat_exit_robust_list(struct task
 		if (entry != pending) {
 			void __user *uaddr = futex_uaddr(entry, futex_offset);
 
-			if (handle_futex_death(uaddr, curr, pi))
+			if (handle_futex_death(uaddr, curr, pi,
+					       HANDLE_DEATH_LIST))
 				return;
 		}
 		if (rc)
@@ -3803,7 +3847,7 @@ void compat_exit_robust_list(struct task
 	if (pending) {
 		void __user *uaddr = futex_uaddr(pending, futex_offset);
 
-		handle_futex_death(uaddr, curr, pip);
+		handle_futex_death(uaddr, curr, pip, HANDLE_DEATH_PENDING);
 	}
 }
 



  parent reply	other threads:[~2019-11-27 21:16 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-27 20:31 [PATCH 5.4 00/66] 5.4.1-stable review Greg Kroah-Hartman
2019-11-27 20:31 ` [PATCH 5.4 01/66] Bluetooth: Fix invalid-free in bcsp_close() Greg Kroah-Hartman
2019-11-27 20:31 ` [PATCH 5.4 02/66] ath9k_hw: fix uninitialized variable data Greg Kroah-Hartman
2019-11-27 20:31 ` [PATCH 5.4 03/66] ath10k: Fix a NULL-ptr-deref bug in ath10k_usb_alloc_urb_from_pipe Greg Kroah-Hartman
2019-11-27 20:31 ` [PATCH 5.4 04/66] ath10k: Fix HOST capability QMI incompatibility Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 05/66] ath10k: restore QCA9880-AR1A (v1) detection Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 06/66] Revert "Bluetooth: hci_ll: set operational frequency earlier" Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 07/66] Revert "dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues" Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 08/66] md/raid10: prevent access of uninitialized resync_pages offset Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 09/66] x86/insn: Fix awk regexp warnings Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 10/66] x86/speculation: Fix incorrect MDS/TAA mitigation status Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 11/66] x86/speculation: Fix redundant MDS mitigation message Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 12/66] nbd: prevent memory leak Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 13/66] x86/stackframe/32: Repair 32-bit Xen PV Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 14/66] x86/xen/32: Make xen_iret_crit_fixup() independent of frame layout Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 15/66] x86/xen/32: Simplify ring check in xen_iret_crit_fixup() Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 16/66] x86/doublefault/32: Fix stack canaries in the double fault handler Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 17/66] x86/pti/32: Size initial_page_table correctly Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 18/66] x86/cpu_entry_area: Add guard page for entry stack on 32bit Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 19/66] x86/entry/32: Fix IRET exception Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 20/66] x86/entry/32: Use %ss segment where required Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 21/66] x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALL Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 22/66] x86/entry/32: Unwind the ESPFIX stack earlier on exception entry Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 23/66] x86/entry/32: Fix NMI vs ESPFIX Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 24/66] selftests/x86/mov_ss_trap: Fix the SYSENTER test Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 25/66] selftests/x86/sigreturn/32: Invalidate DS and ES when abusing the kernel Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 26/66] x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 27/66] x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3 Greg Kroah-Hartman
2019-11-27 20:32 ` Greg Kroah-Hartman [this message]
2019-11-27 20:32 ` [PATCH 5.4 29/66] ALSA: usb-audio: Fix NULL dereference at parsing BADD Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 30/66] ALSA: usb-audio: Fix Scarlett 6i6 Gen 2 port data Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 31/66] media: vivid: Set vid_cap_streaming and vid_out_streaming to true Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 32/66] media: vivid: Fix wrong locking that causes race conditions on streaming stop Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 33/66] media: usbvision: Fix invalid accesses after device disconnect Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 34/66] media: usbvision: Fix races among open, close, and disconnect Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 35/66] cpufreq: Add NULL checks to show() and store() methods of cpufreq Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 36/66] futex: Move futex exit handling into futex code Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 37/66] futex: Replace PF_EXITPIDONE with a state Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 38/66] exit/exec: Seperate mm_release() Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 39/66] futex: Split futex_mm_release() for exit/exec Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 40/66] futex: Set task::futex_state to DEAD right after handling futex exit Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 41/66] futex: Mark the begin of futex exit explicitly Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 42/66] futex: Sanitize exit state handling Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 43/66] futex: Provide state handling for exec() as well Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 44/66] futex: Add mutex around futex exit Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 45/66] futex: Provide distinct return value when owner is exiting Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 46/66] futex: Prevent exit livelock Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 47/66] media: uvcvideo: Fix error path in control parsing failure Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 48/66] media: b2c2-flexcop-usb: add sanity checking Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 49/66] media: cxusb: detect cxusb_ctrl_msg error in query Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 50/66] media: imon: invalid dereference in imon_touch_event Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 51/66] media: mceusb: fix out of bounds read in MCE receiver buffer Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 52/66] ALSA: hda - Disable audio component for legacy Nvidia HDMI codecs Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 53/66] USBIP: add config dependency for SGL_ALLOC Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 54/66] usbip: tools: fix fd leakage in the function of read_attr_usbip_status Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 55/66] usbip: Fix uninitialized symbol nents in stub_recv_cmd_submit() Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 56/66] usb-serial: cp201x: support Mark-10 digital force gauge Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 57/66] USB: chaoskey: fix error case of a timeout Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 58/66] appledisplay: fix error handling in the scheduled work Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 59/66] USB: serial: mos7840: add USB ID to support Moxa UPort 2210 Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 60/66] USB: serial: mos7720: fix remote wakeup Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 61/66] USB: serial: mos7840: " Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 62/66] USB: serial: option: add support for DW5821e with eSIM support Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 63/66] USB: serial: option: add support for Foxconn T77W968 LTE modules Greg Kroah-Hartman
2019-11-27 20:32 ` [PATCH 5.4 64/66] staging: comedi: usbduxfast: usbduxfast_ai_cmdtest rounding error Greg Kroah-Hartman
2019-11-27 20:33 ` [PATCH 5.4 65/66] powerpc/book3s64: Fix link stack flush on context switch Greg Kroah-Hartman
2019-11-27 20:33 ` [PATCH 5.4 66/66] KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel Greg Kroah-Hartman
2019-11-28  8:56 ` [PATCH 5.4 00/66] 5.4.1-stable review Naresh Kamboju
2019-11-29  8:52   ` Greg Kroah-Hartman
2019-11-28 10:42 ` Jon Hunter
2019-11-29  8:52   ` Greg Kroah-Hartman
2019-11-28 15:40 ` shuah
2019-11-29  8:52   ` Greg Kroah-Hartman
2019-11-28 18:47 ` Guenter Roeck
2019-11-29  8:51   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191127202701.518579879@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=wang.yi59@zte.com.cn \
    --cc=yang.tao172@zte.com.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).