From: Stefan Liebler <stli@linux.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Darren Hart <dvhart@infradead.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [patch] futex: Cure exit race
Date: Tue, 11 Dec 2018 09:04:50 +0100
Message-ID: <06e7e269-6c87-f5f2-9a15-435b0a376105@linux.ibm.com>
In-Reply-To: <20181210152311.986181245@linutronix.de>

Hi Thomas,

does this also handle the ESRCH returned by
attach_to_pi_owner(...)
{...
	if (!pid)
		return -ESRCH;
	p = find_get_task_by_vpid(pid);
	if (!p)
		return -ESRCH;
...

I think pid should never be zero when attach_to_pi_owner() is called,
but can it happen that p is NULL? At least I traced a "return -ESRCH"
there with the 4.17 kernel. Unfortunately, both returns share the same
instruction address, so the trace does not show which one was taken.
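
For reference, pid in attach_to_pi_owner() is the TID field of the futex
word (uval & FUTEX_TID_MASK). Below is a minimal user-space sketch of how
the values from the race diagram decode. The mask values are the ones from
the uapi <linux/futex.h> header; the helper names are hypothetical and
exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a PI futex word, as in the uapi <linux/futex.h>. */
#define FUTEX_WAITERS     0x80000000u
#define FUTEX_OWNER_DIED  0x40000000u
#define FUTEX_TID_MASK    0x3fffffffu

/* Hypothetical helpers, not kernel functions. */
static inline uint32_t futex_tid(uint32_t uval)
{
	/* Owner TID, or 0 if the futex currently has no owner. */
	return uval & FUTEX_TID_MASK;
}

static inline bool futex_has_waiters(uint32_t uval)
{
	return (uval & FUTEX_WAITERS) != 0;
}

static inline bool futex_owner_died(uint32_t uval)
{
	return (uval & FUTEX_OWNER_DIED) != 0;
}
```

With these helpers, 0x80000PID has the waiter bit set and a non-zero TID,
while 0xC0000000 has both FUTEX_WAITERS and FUTEX_OWNER_DIED set and a TID
of zero.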

Bye
Stefan

On 12/10/2018 04:23 PM, Thomas Gleixner wrote:
> Stefan reported, that the glibc tst-robustpi4 test case fails
> occasionally. That case creates the following race between
> sys_exit() and sys_futex(LOCK_PI):
> 
>   CPU0				CPU1
> 
>   sys_exit()			sys_futex()
>    do_exit()			 futex_lock_pi()
>     exit_signals(tsk)		  No waiters:
>      tsk->flags |= PF_EXITING;	  *uaddr == 0x00000PID
>    mm_release(tsk)		  Set waiter bit
>     exit_robust_list(tsk) {	  *uaddr = 0x80000PID;
>        Set owner died		  attach_to_pi_owner() {
>      *uaddr = 0xC0000000;	   tsk = get_task(PID);
>     }				   if (!(tsk->flags & PF_EXITING)) {
>    ...				     attach();
>    tsk->flags |= PF_EXITPIDONE;	   } else {
> 				     if (!(tsk->flags & PF_EXITPIDONE))
> 				       return -EAGAIN;
> 				     return -ESRCH; <--- FAIL
> 				   }
> 
> ESRCH is returned all the way to user space, which triggers the glibc test
> case assert. Returning ESRCH unconditionally is wrong here because the user
> space value has been changed by the exiting task to 0xC0000000, i.e. the
> FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This
> is a valid state and the kernel has to handle it, i.e. taking the futex.
> 
> Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE
> are set in the task which owns the futex. If the value has changed, let
> the kernel retry the operation, which includes all regular sanity checks
> and correctly handles the FUTEX_OWNER_DIED case.
> 
> If it hasn't changed, then return ESRCH as there is no way to distinguish
> this case from malfunctioning user space. This happens when the exiting
> task did not have a robust list, the robust list was corrupted or the user
> space value in the futex was simply bogus.
> 
> Reported-by: Stefan Liebler <stli@linux.ibm.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Darren Hart <dvhart@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: stable@vger.kernel.org
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467
> ---
>   kernel/futex.c |   57 +++++++++++++++++++++++++++++++++++++++++++++++++++++----
>   1 file changed, 53 insertions(+), 4 deletions(-)
> 
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1148,11 +1148,60 @@ static int attach_to_pi_state(u32 __user
>   	return ret;
>   }
>   
> +static int handle_exit_race(u32 __user *uaddr, u32 uval, struct task_struct *tsk)
> +{
> +	u32 uval2;
> +
> +	/*
> +	 * If PF_EXITPIDONE is not yet set try again.
> +	 */
> +	if (!(tsk->flags & PF_EXITPIDONE))
> +		return -EAGAIN;
> +
> +	/*
> +	 * Reread the user space value to handle the following situation:
> +	 *
> +	 * CPU0				CPU1
> +	 *
> +	 * sys_exit()			sys_futex()
> +	 *  do_exit()			 futex_lock_pi()
> +	 *   exit_signals(tsk)		  No waiters:
> +	 *    tsk->flags |= PF_EXITING;	  *uaddr == 0x00000PID
> +	 *  mm_release(tsk)		  Set waiter bit
> +	 *   exit_robust_list(tsk) {	  *uaddr = 0x80000PID;
> +	 *      Set owner died		  attach_to_pi_owner() {
> +	 *    *uaddr = 0xC0000000;	   tsk = get_task(PID);
> +	 *   }				   if (!tsk->flags & PF_EXITING) {
> +	 *  ...				     attach();
> +	 *  tsk->flags |= PF_EXITPIDONE;   } else {
> +	 *				     if (!(tsk->flags & PF_EXITPIDONE))
> +	 *				       return -EAGAIN;
> +	 *				     return -ESRCH; <--- FAIL
> +	 *				   }
> +	 *
> +	 * Returning ESRCH unconditionally is wrong here because the
> +	 * user space value has been changed by the exiting task.
> +	 */
> +	if (get_futex_value_locked(&uval2, uaddr))
> +		return -EFAULT;
> +
> +	/* If the user space value has changed, try again. */
> +	if (uval2 != uval)
> +		return -EAGAIN;
> +
> +	/*
> +	 * The exiting task did not have a robust list, the robust list was
> +	 * corrupted or the user space value in *uaddr is simply bogus.
> +	 * Give up and tell user space.
> +	 */
> +	return -ESRCH;
> +}
> +
>   /*
>    * Lookup the task for the TID provided from user space and attach to
>    * it after doing proper sanity checks.
>    */
> -static int attach_to_pi_owner(u32 uval, union futex_key *key,
> +static int attach_to_pi_owner(u32 __user *uaddr, u32 uval, union futex_key *key,
>   			      struct futex_pi_state **ps)
>   {
>   	pid_t pid = uval & FUTEX_TID_MASK;
> @@ -1187,7 +1236,7 @@ static int attach_to_pi_owner(u32 uval,
>   		 * set, we know that the task has finished the
>   		 * cleanup:
>   		 */
> -		int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
> +		int ret = handle_exit_race(uaddr, uval, p);
>   
>   		raw_spin_unlock_irq(&p->pi_lock);
>   		put_task_struct(p);
> @@ -1244,7 +1293,7 @@ static int lookup_pi_state(u32 __user *u
>   	 * We are the first waiter - try to look up the owner based on
>   	 * @uval and attach to it.
>   	 */
> -	return attach_to_pi_owner(uval, key, ps);
> +	return attach_to_pi_owner(uaddr, uval, key, ps);
>   }
>   
>   static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
> @@ -1352,7 +1401,7 @@ static int futex_lock_pi_atomic(u32 __us
>   	 * attach to the owner. If that fails, no harm done, we only
>   	 * set the FUTEX_WAITERS bit in the user space variable.
>   	 */
> -	return attach_to_pi_owner(uval, key, ps);
> +	return attach_to_pi_owner(uaddr, uval, key, ps);
>   }
>   
>   /**
> 
> 
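
The decision logic of handle_exit_race() in the quoted patch can be
modelled in user space roughly as follows. This is only a sketch under
stated assumptions: the MODEL_PF_* values and the function name are
stand-ins rather than kernel definitions, and the re-read futex word is
passed in as a parameter instead of being fetched with
get_futex_value_locked(), so the -EFAULT path is not modelled.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel's task flags. */
#define MODEL_PF_EXITING     0x4u
#define MODEL_PF_EXITPIDONE  0x8u

/*
 * task_flags: flags of the (exiting) task that owns the futex
 * uval:       futex word the caller read originally
 * uval2:      futex word as re-read after the owner was seen exiting
 */
static int model_handle_exit_race(uint32_t task_flags, uint32_t uval,
				  uint32_t uval2)
{
	/* Exit cleanup has not finished yet: the caller should retry. */
	if (!(task_flags & MODEL_PF_EXITPIDONE))
		return -EAGAIN;

	/* The exiting task changed the futex word: retry with the new value. */
	if (uval2 != uval)
		return -EAGAIN;

	/* Nothing changed: no robust list, or a bogus user space value. */
	return -ESRCH;
}
```

In this model, only an unchanged futex word combined with a finished exit
cleanup yields -ESRCH. Every other combination asks the caller to retry.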

