From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Stefan Liebler <stli@linux.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Darren Hart <dvhart@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Sasha Levin <sashal@kernel.org>,
	Sudip Mukherjee <sudipm.mukherjee@gmail.com>,
	Lee Jones <lee.jones@linaro.org>
Subject: [PATCH 4.9 15/49] futex: Cure exit race
Date: Mon, 22 Feb 2021 13:36:13 +0100
Message-ID: <20210222121025.999623772@linuxfoundation.org>
In-Reply-To: <20210222121022.546148341@linuxfoundation.org>

From: Thomas Gleixner <tglx@linutronix.de>

commit da791a667536bf8322042e38ca85d55a78d3c273 upstream.

Stefan reported that the glibc tst-robustpi4 test case fails
occasionally. That case creates the following race between
sys_exit() and sys_futex_lock_pi():

 CPU0				CPU1

 sys_exit()			sys_futex()
  do_exit()			 futex_lock_pi()
   exit_signals(tsk)		  No waiters:
    tsk->flags |= PF_EXITING;	  *uaddr == 0x00000PID
  mm_release(tsk)		  Set waiter bit
   exit_robust_list(tsk) {	  *uaddr = 0x80000PID;
      Set owner died		  attach_to_pi_owner() {
    *uaddr = 0xC0000000;	   tsk = get_task(PID);
   }				   if (!(tsk->flags & PF_EXITING)) {
  ...				     attach();
  tsk->flags |= PF_EXITPIDONE;	   } else {
				     if (!(tsk->flags & PF_EXITPIDONE))
				       return -EAGAIN;
				     return -ESRCH; <--- FAIL
				   }

ESRCH is returned all the way to user space, which triggers the glibc test
case assert. Returning ESRCH unconditionally is wrong here because the user
space value has been changed by the exiting task to 0xC0000000, i.e. the
FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
is a valid state and the kernel has to handle it, i.e. taking the futex.
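
To make the 0xC0000000 state concrete, here is a minimal user-space sketch
of how the futex word decodes. The three masks are the values from
<linux/futex.h>; the helper names futex_owner_tid() and
futex_is_orphaned() are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the 32-bit futex word, values as in <linux/futex.h>. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

/* Hypothetical helper: TID of the recorded owner, 0 if there is none. */
static inline uint32_t futex_owner_tid(uint32_t uval)
{
	return uval & FUTEX_TID_MASK;
}

/*
 * Hypothetical helper: true if the previous owner died and the TID
 * field has been cleared, i.e. the futex can be taken by a new owner.
 */
static inline bool futex_is_orphaned(uint32_t uval)
{
	return (uval & FUTEX_OWNER_DIED) && futex_owner_tid(uval) == 0;
}
```

With these helpers, futex_is_orphaned(0xC0000000) is true, while a word
such as FUTEX_WAITERS | PID still names a live owner.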

Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
are set in the task which 'owns' the futex. If the value has changed, let
the kernel retry the operation, which includes all regular sanity checks
and correctly handles the FUTEX_OWNER_DIED case.

If it hasn't changed, then return ESRCH as there is no way to distinguish
this case from malfunctioning user space. This happens when the exiting
task did not have a robust list, the robust list was corrupted or the user
space value in the futex was simply bogus.
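
The decision described in the two paragraphs above can be modelled in
plain user-space C. This is only a sketch of the logic, not the kernel
code: the function name exit_race_decision() and its parameters are
hypothetical, and the reread of the user space value is reduced to a
value passed in by the caller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the exit race handling.
 *
 * owner_cleanup_done: the exiting owner has finished its futex cleanup
 * uval_seen:          futex word the locker observed before the lookup
 * uval_now:           futex word reread after noticing the exiting owner
 */
static int exit_race_decision(bool owner_cleanup_done,
			      uint32_t uval_seen, uint32_t uval_now)
{
	/* Cleanup still in progress: the caller has to retry later. */
	if (!owner_cleanup_done)
		return -EAGAIN;

	/* The exiting task changed the word: retry with the new value. */
	if (uval_now != uval_seen)
		return -EAGAIN;

	/* Unchanged word: indistinguishable from broken user space. */
	return -ESRCH;
}
```

In this model a caller would loop on -EAGAIN and rerun its normal checks,
which is where a word of 0xC0000000 gets handled as an owner-died futex.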

Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sashal@kernel.org>
Cc: stable@vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Lee: Required to satisfy functional dependency from futex back-port.
 Re-add the missing handle_exit_race() parts from:
 3d4775df0a89 ("futex: Replace PF_EXITPIDONE with a state")]
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 kernel/futex.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 65 insertions(+), 6 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1201,11 +1201,67 @@ static void wait_for_owner_exiting(int r
 	put_task_struct(exiting);
 }
 
+static int handle_exit_race(u32 __user *uaddr, u32 uval,
+			    struct task_struct *tsk)
+{
+	u32 uval2;
+
+	/*
+	 * If the futex exit state is not yet FUTEX_STATE_DEAD, wait
+	 * for it to finish.
+	 */
+	if (tsk && tsk->futex_state != FUTEX_STATE_DEAD)
+		return -EAGAIN;
+
+	/*
+	 * Reread the user space value to handle the following situation:
+	 *
+	 * CPU0				CPU1
+	 *
+	 * sys_exit()			sys_futex()
+	 *  do_exit()			 futex_lock_pi()
+	 *                                futex_lock_pi_atomic()
+	 *   exit_signals(tsk)		    No waiters:
+	 *    tsk->flags |= PF_EXITING;	    *uaddr == 0x00000PID
+	 *  mm_release(tsk)		    Set waiter bit
+	 *   exit_robust_list(tsk) {	    *uaddr = 0x80000PID;
+	 *      Set owner died		    attach_to_pi_owner() {
+	 *    *uaddr = 0xC0000000;	     tsk = get_task(PID);
+	 *   }				     if (!tsk->flags & PF_EXITING) {
+	 *  ...				       attach();
+	 *  tsk->futex_state =               } else {
+	 *	FUTEX_STATE_DEAD;              if (tsk->futex_state !=
+	 *					  FUTEX_STATE_DEAD)
+	 *				         return -EAGAIN;
+	 *				       return -ESRCH; <--- FAIL
+	 *				     }
+	 *
+	 * Returning ESRCH unconditionally is wrong here because the
+	 * user space value has been changed by the exiting task.
+	 *
+	 * The same logic applies to the case where the exiting task is
+	 * already gone.
+	 */
+	if (get_futex_value_locked(&uval2, uaddr))
+		return -EFAULT;
+
+	/* If the user space value has changed, try again. */
+	if (uval2 != uval)
+		return -EAGAIN;
+
+	/*
+	 * The exiting task did not have a robust list, the robust list was
+	 * corrupted or the user space value in *uaddr is simply bogus.
+	 * Give up and tell user space.
+	 */
+	return -ESRCH;
+}
+
 /*
  * Lookup the task for the TID provided from user space and attach to
  * it after doing proper sanity checks.
  */
-static int attach_to_pi_owner(u32 uval, union futex_key *key,
+static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
 			      struct futex_pi_state **ps,
 			      struct task_struct **exiting)
 {
@@ -1216,12 +1272,15 @@ static int attach_to_pi_owner(u32 uval,
 	/*
 	 * We are the first waiter - try to look up the real owner and attach
 	 * the new pi_state to it, but bail out when TID = 0 [1]
+	 *
+	 * The !pid check is paranoid. None of the call sites should end up
+	 * with pid == 0, but better safe than sorry. Let the caller retry
 	 */
 	if (!pid)
-		return -ESRCH;
+		return -EAGAIN;
 	p = futex_find_get_task(pid);
 	if (!p)
-		return -ESRCH;
+		return handle_exit_race(uaddr, uval, NULL);
 
 	if (unlikely(p->flags & PF_KTHREAD)) {
 		put_task_struct(p);
@@ -1240,7 +1299,7 @@ static int attach_to_pi_owner(u32 uval,
 		 * FUTEX_STATE_DEAD, we know that the task has finished
 		 * the cleanup:
 		 */
-		int ret = (p->futex_state = FUTEX_STATE_DEAD) ? -ESRCH : -EAGAIN;
+		int ret = handle_exit_race(uaddr, uval, p);
 
 		raw_spin_unlock_irq(&p->pi_lock);
 		/*
@@ -1306,7 +1365,7 @@ static int lookup_pi_state(u32 __user *u
 	 * We are the first waiter - try to look up the owner based on
 	 * @uval and attach to it.
 	 */
-	return attach_to_pi_owner(uval, key, ps, exiting);
+	return attach_to_pi_owner(uaddr, uval, key, ps, exiting);
 }
 
 static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
@@ -1422,7 +1481,7 @@ static int futex_lock_pi_atomic(u32 __us
 	 * attach to the owner. If that fails, no harm done, we only
 	 * set the FUTEX_WAITERS bit in the user space variable.
 	 */
-	return attach_to_pi_owner(uval, key, ps, exiting);
+	return attach_to_pi_owner(uaddr, newval, key, ps, exiting);
 }
 
 /**


