[PATCH] futex: Prevent pi_state from double freeing in case of error

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] futex: Prevent pi_state from double freeing in case of error
@ 2015-12-18  8:43 Bhuvanesh_Surachari
  2015-12-18 14:55 ` Davidlohr Bueso
  2015-12-18 23:08 ` Darren Hart
  0 siblings, 2 replies; 8+ messages in thread
From: Bhuvanesh_Surachari @ 2015-12-18  8:43 UTC (permalink / raw)
  To: tglx
  Cc: peterz, mingo, dave, akpm, linux, viresh.kumar, luto, bigeasy,
	mtk.manpages, linux-kernel, Bhuvanesh Surachari,
	Bhuvanesh Surachari, Andy Lowe

From: Bhuvanesh Surachari <bhuvanesh_surachari@mentor.com>

In case of error from rt_mutex_start_proxy_lock pi_state is freed
twice in futex_requeue function. Hence removing free_pi_state in
else branch and branching to the location where pi_state is freed.

Signed-off-by: Bhuvanesh Surachari <Bhuvanesh_Surachari@mentor.com>
Signed-off-by: Andy Lowe <Andy_Lowe@mentor.com>
---
 kernel/futex.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 684d754..264b3f2 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1815,7 +1815,6 @@ retry_private:
 			} else if (ret) {
 				/* -EDEADLK */
 				this->pi_state = NULL;
-				free_pi_state(pi_state);
 				goto out_unlock;
 			}
 		}
-- 
1.7.9.5



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] futex: Prevent pi_state from double freeing in case of error
  2015-12-18  8:43 [PATCH] futex: Prevent pi_state from double freeing in case of error Bhuvanesh_Surachari
@ 2015-12-18 14:55 ` Davidlohr Bueso
  2015-12-19 18:24   ` Thomas Gleixner
  2015-12-18 23:08 ` Darren Hart
  1 sibling, 1 reply; 8+ messages in thread
From: Davidlohr Bueso @ 2015-12-18 14:55 UTC (permalink / raw)
  To: Bhuvanesh_Surachari
  Cc: tglx, peterz, mingo, akpm, linux, viresh.kumar, luto, bigeasy,
	mtk.manpages, linux-kernel, Andy Lowe

On Fri, 18 Dec 2015, Bhuvanesh_Surachari@mentor.com wrote:

>From: Bhuvanesh Surachari <bhuvanesh_surachari@mentor.com>
>
>In case of error from rt_mutex_start_proxy_lock pi_state is freed
>twice in futex_requeue function. Hence removing free_pi_state in
>else branch and branching to the location where pi_state is freed.

This reads weird.

Do note that free_pi_state is already protected against passing it a nil
pi_state, so this is merely cosmetic.

Hmm but yeah, looks legit. The cases were we free the pi_state on error
are when doing retry kind of paths, EDEADLK not being one of those cases.

>
>Signed-off-by: Bhuvanesh Surachari <Bhuvanesh_Surachari@mentor.com>
>Signed-off-by: Andy Lowe <Andy_Lowe@mentor.com>

This last SoB tag is incorrect, Andy Lowe is not carrying the patch for you.

>---
> kernel/futex.c |    1 -
> 1 file changed, 1 deletion(-)
>
>diff --git a/kernel/futex.c b/kernel/futex.c
>index 684d754..264b3f2 100644
>--- a/kernel/futex.c
>+++ b/kernel/futex.c
>@@ -1815,7 +1815,6 @@ retry_private:
> 			} else if (ret) {
> 				/* -EDEADLK */
> 				this->pi_state = NULL;
>-				free_pi_state(pi_state);
> 				goto out_unlock;
> 			}
> 		}


Nit but I think we also want to set pi_state to nil after freeing it
in out_unlock. That way this scenario would simply be goto out_unlock.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] futex: Prevent pi_state from double freeing in case of error
  2015-12-18  8:43 [PATCH] futex: Prevent pi_state from double freeing in case of error Bhuvanesh_Surachari
  2015-12-18 14:55 ` Davidlohr Bueso
@ 2015-12-18 23:08 ` Darren Hart
  1 sibling, 0 replies; 8+ messages in thread
From: Darren Hart @ 2015-12-18 23:08 UTC (permalink / raw)
  To: Bhuvanesh_Surachari
  Cc: tglx, peterz, mingo, dave, akpm, linux, viresh.kumar, luto,
	bigeasy, mtk.manpages, linux-kernel, Andy Lowe

On Fri, Dec 18, 2015 at 02:13:43PM +0530, Bhuvanesh_Surachari@mentor.com wrote:
> From: Bhuvanesh Surachari <bhuvanesh_surachari@mentor.com>
> 
> In case of error from rt_mutex_start_proxy_lock pi_state is freed
> twice in futex_requeue function. Hence removing free_pi_state in
> else branch and branching to the location where pi_state is freed.
> 
> Signed-off-by: Bhuvanesh Surachari <Bhuvanesh_Surachari@mentor.com>
> Signed-off-by: Andy Lowe <Andy_Lowe@mentor.com>

Apparently inadvertently introduced by:

commit 30a6b8031fe14031ab27c1fa3483cb9780e7f63c
    futex: Fix a race condition between REQUEUE_PI and task death

Reviewed-by: Darren Hart <dvhart@linux.intel.com>

> ---
>  kernel/futex.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 684d754..264b3f2 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1815,7 +1815,6 @@ retry_private:
>  			} else if (ret) {
>  				/* -EDEADLK */
>  				this->pi_state = NULL;
> -				free_pi_state(pi_state);
>  				goto out_unlock;
>  			}
>  		}
> -- 
> 1.7.9.5
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] futex: Prevent pi_state from double freeing in case of error
  2015-12-18 14:55 ` Davidlohr Bueso
@ 2015-12-19 18:24   ` Thomas Gleixner
  2015-12-20  5:21     ` Darren Hart
  2015-12-23 14:56     ` Bhuvanesh
  0 siblings, 2 replies; 8+ messages in thread
From: Thomas Gleixner @ 2015-12-19 18:24 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Bhuvanesh_Surachari, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	linux, viresh.kumar, luto, Sebastian Sewior, mtk.manpages, LKML,
	Andy Lowe, Darren Hart

On Fri, 18 Dec 2015, Davidlohr Bueso wrote:
> On Fri, 18 Dec 2015, Bhuvanesh_Surachari@mentor.com wrote:
> 
> > From: Bhuvanesh Surachari <bhuvanesh_surachari@mentor.com>
> > 
> > In case of error from rt_mutex_start_proxy_lock pi_state is freed
> > twice in futex_requeue function. Hence removing free_pi_state in
> > else branch and branching to the location where pi_state is freed.

This is wrong. See below.
 
> This reads weird.
> 
> Do note that free_pi_state is already protected against passing it a nil
> pi_state, so this is merely cosmetic.

No, it's not cosmetic. pi_state is not set to NULL - this->pi_state is.

> Hmm but yeah, looks legit. The cases were we free the pi_state on error
> are when doing retry kind of paths, EDEADLK not being one of those cases.

It does not look legit. It is simply wrong.

> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index 684d754..264b3f2 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -1815,7 +1815,6 @@ retry_private:
> > 			} else if (ret) {
> > 				/* -EDEADLK */
> > 				this->pi_state = NULL;
> > -				free_pi_state(pi_state);
> > 				goto out_unlock;
> > 			}
> > 		}
> 
> 
> Nit but I think we also want to set pi_state to nil after freeing it
> in out_unlock. That way this scenario would simply be goto out_unlock.

No. In the normal exit case we only drop the extra reference on the
pi_state, but we CANNOT clear this->pi_state. That would be fatal as
we just queued a waiter which relies on that state pointer ....

Clearing pi_state itself is pointless as it's a local variable and
ceases to exist because we return.

Let me explain, why the patch is wrong and why some more things are
buggy and suboptimal here with that refcount stuff:

The first thing after we checked the uaddr1 value is:

    if (requeue_pi && (task_count - nr_wake < nr_requeue)) {

       ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
       	     				&key2, &pi_state, nr_requeue);

       Let's look at the return values and the pi_state

       0:        
	  The user space futex @uaddr2 is locked and we either
	  attached to existing pi_state or created new state.

  	  In both cases we hold a reference count on the state, but
  	  the waiter is still enqueued in hb1.

       <0:

	  Any kind of error happened. pi_state is NULL

       >0:

          The user space futex take over was successfull. pi_state is
          NULL. The return value is the vpid of the top waiter,
          i.e. the task on which behalf we acquired the futex.

	  Waiter is dequeued from hb1 and woken up.

	  So in this case we call

	  ret = lookup_pi_state(ret, hb2, &key2, &pi_state);

	  Here the return value means:

	    0:
	        pi_state is !NULL and a reference count on the state is
	       	held

	    <0:
	        Any kind of error happened. pi_state is NULL

  Now we check the return value of the above:

     switch (ret) {
     case 0:
		break;
	 Fine...

     case -EFAULT:
     	  free_pi_state(pi_state);
	  pi_state = NULL;
	  
	  That's pointless and confusing because pi_state IS NULL. If
	  it is not NULL then either futex_proxy_trylock_atomic() or
	  lookup_pi_state() are completely buggered.

     Ditto for -EAGAIN

     For the default case we goto out_unlock, which also calls
     free_pi_state(pi_state), but that's not a real issue as it is
     NULL safe.

If everything went fine, then we enter 

     plist_for_each_entry_safe(this, next, &hb1->chain, list) {

     And in the requeue_pi case we do:
     
      if (requeue_pi) {

      	       atomic_inc(&pi_state->refcount);

      So here we take another refcount on the state, i.e. one for each
      waiter. And we store the pointer in the waiters futex_q object:

	       this->pi_state = pi_state;
	       
	       ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
					       this->rt_waiter,
					       this->task);
	       if (ret == 1) {

      We locked the rt mutex on behalf of this waiter. We do neither
      clear this->pi_state nor drop the refcount because the waiter
      needs the pi_state for fixing up the futex user space value.

      Now the waiter should drop the reference, but that's not the
      case. So this is a real issue here which needs to be
      fixed. Patch comes in seperate mail.

	      } else if (ret) {
	      	     free_pi_state(pi_state);
		     this->pi_state = NULL;

      This undoes the above atomic_inc() and clears the pi_state
      reference in the waiter futex_q object.

      So if we remove this we leak a reference count and have a
      dangling pointer in the waiters futex_q object .... Not so
      desired, right?

		     goto out_unlock;
              }
       }
       ....

out_unlock:
	free_pi_state(pi_state);

  Here we drop the extra reference to the pi_state which we
  acquired at the beginning of the requeue_pi code before we
  started to requeue waiters.


So the patch and the reviews of it are plain wrong.

This refcount stuff certainly lacks a few comments, but I really would
like to know:

 - Why was this patch created in the first place?

   The changelog is completely useless. It does not tell what the
   observed issue was or whether this was merily the result of reading
   the code and assuming that this is a double free. But it was
   certainly not due to a thorough analysis of the code in question.

 - Why are the reviews so sloppy and useless?

   The one I'm answering to is just hillarious and the other one picks
   a random commit, claims that it inadvertantly introduced the issue
   and is done with it. Really helpful.

Folks, this is not the way it works. It took me a couple of hours to
get this analyzed properly, but I don't see that anyone involved in
this thread has spent more than a few seconds on it.

At least I found a real issue related to this refcount stuff, but I
would have preferred that the people involved in this would have found
it in the first place instead of providing half baken crap.

Yours grumpy

      tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] futex: Prevent pi_state from double freeing in case of error
  2015-12-19 18:24   ` Thomas Gleixner
@ 2015-12-20  5:21     ` Darren Hart
  2015-12-21 17:42       ` Davidlohr Bueso
  2015-12-23 14:56     ` Bhuvanesh
  1 sibling, 1 reply; 8+ messages in thread
From: Darren Hart @ 2015-12-20  5:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Davidlohr Bueso, Bhuvanesh_Surachari, Peter Zijlstra,
	Ingo Molnar, Andrew Morton, linux, viresh.kumar, luto,
	Sebastian Sewior, mtk.manpages, LKML, Andy Lowe

On Sat, Dec 19, 2015 at 07:24:38PM +0100, Thomas Gleixner wrote:
>  - Why are the reviews so sloppy and useless?
> 
>    The one I'm answering to is just hillarious and the other one picks
>    a random commit, claims that it inadvertantly introduced the issue
>    and is done with it. Really helpful.
> 
> Folks, this is not the way it works. It took me a couple of hours to
> get this analyzed properly, but I don't see that anyone involved in
> this thread has spent more than a few seconds on it.
> 
> At least I found a real issue related to this refcount stuff, but I
> would have preferred that the people involved in this would have found
> it in the first place instead of providing half baken crap.
> 
> Yours grumpy
> 
>       tglx

More care was clearly warranted on my part. I thought I saw the connection with
the previous commit I referenced (it wasn't random!), but clearly didn't dig
deep enough.

My apologies,

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] futex: Prevent pi_state from double freeing in case of error
  2015-12-20  5:21     ` Darren Hart
@ 2015-12-21 17:42       ` Davidlohr Bueso
  0 siblings, 0 replies; 8+ messages in thread
From: Davidlohr Bueso @ 2015-12-21 17:42 UTC (permalink / raw)
  To: Darren Hart
  Cc: Thomas Gleixner, Bhuvanesh_Surachari, Peter Zijlstra,
	Ingo Molnar, Andrew Morton, linux, viresh.kumar, luto,
	Sebastian Sewior, mtk.manpages, LKML, Andy Lowe

On Sat, 19 Dec 2015, Darren Hart wrote:

>On Sat, Dec 19, 2015 at 07:24:38PM +0100, Thomas Gleixner wrote:
>>  - Why are the reviews so sloppy and useless?
>>
>>    The one I'm answering to is just hillarious and the other one picks
>>    a random commit, claims that it inadvertantly introduced the issue
>>    and is done with it. Really helpful.
>>
>> Folks, this is not the way it works. It took me a couple of hours to
>> get this analyzed properly, but I don't see that anyone involved in
>> this thread has spent more than a few seconds on it.
>>
>> At least I found a real issue related to this refcount stuff, but I
>> would have preferred that the people involved in this would have found
>> it in the first place instead of providing half baken crap.
>>
>> Yours grumpy
>>
>>       tglx
>
>More care was clearly warranted on my part. I thought I saw the connection with
>the previous commit I referenced (it wasn't random!), but clearly didn't dig
>deep enough.

Same here. I hadn't looked at it long enough, so while I was hesitant of sending
a full review/ack, I should not have sent a half-arse answer either. Apologies.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] futex: Prevent pi_state from double freeing in case of error
  2015-12-19 18:24   ` Thomas Gleixner
  2015-12-20  5:21     ` Darren Hart
@ 2015-12-23 14:56     ` Bhuvanesh
  2015-12-23 18:07       ` Thomas Gleixner
  1 sibling, 1 reply; 8+ messages in thread
From: Bhuvanesh @ 2015-12-23 14:56 UTC (permalink / raw)
  To: Thomas Gleixner, Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, linux, viresh.kumar,
	luto, Sebastian Sewior, mtk.manpages, LKML, Andy Lowe,
	Darren Hart

Hi,

>  - Why was this patch created in the first place?
> 
>    The changelog is completely useless. It does not tell what the
>    observed issue was or whether this was merily the result of reading
>    the code and assuming that this is a double free. But it was
>    certainly not due to a thorough analysis of the code in question.

Apologies for not putting the backtrace earlier.

During our regression test of the kernel version 3.14, generated a 
warning in futex code and resulted in crash with the backtrace given 
below:

WARNING: CPU: 0 PID: 1468 at fs/inode.c:399 ihold+0x40/0x48()
Backtrace: 
[<80011b88>] (dump_backtrace) from [<80011d90>] (show_stack+0x18/0x1c)
[<80011d78>] (show_stack) from [<805036b8>] (dump_stack+0x74/0xc0)
[<80503644>] (dump_stack) from [<8002393c>] (warn_slowpath_common+0x70/0x94)
[<800238cc>] (warn_slowpath_common) from [<80023a04>] (warn_slowpath_null+0x24/0x2c)
[<800239e0>] (warn_slowpath_null) from [<80115004>] (ihold+0x40/0x48)
[<80114fc4>] (ihold) from [<8007f33c>] (get_futex_key_refs+0x58/0x64)
[<8007f2e4>] (get_futex_key_refs) from [<8007f524>] (get_futex_key+0x1dc/0x200)
[<8007f348>] (get_futex_key) from [<80080048>] (futex_wake+0x4c/0x144)
[<8007fffc>] (futex_wake) from [<800819dc>] (do_futex+0xf8/0x984)
[<800818e4>] (do_futex) from [<80082358>] (SyS_futex+0xf0/0x15c)
[<80082268>] (SyS_futex) from [<8000e060>] (ret_fast_syscall+0x0/0x50)
Kernel BUG at 80114f5c [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
CPU: 1 PID: 826 Comm: mediaengine_out Tainted: G        W    3.14.51-03408-gf4477ef #1
task: a31c6d40 ti: 96ab2000 task.ti: 96ab2000
PC is at clear_inode+0x5c/0x60
LR is at preempt_count_sub+0xd8/0x104
pc : [<80114f5c>]    lr : [<8050aae0>]    psr: 20070113
sp : 96ab3e78  ip : 96ab3e48  fp : 96ab3e8c
r10: 7eb42200  r9 : 836b5a30  r8 : 96ab3f04
r7 : 836b5a20  r6 : 836b5a30  r5 : 836b5b14  r4 : 836b5a30
r3 : 00000060  r2 : 836b5b58  r1 : 80114f30  r0 : 80508274
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c5387d  Table: 331dc04a  DAC: 00000015
Process mediaengine_out (pid: 826, stack limit = 0x96ab2238)
Stack: (0x96ab3e78 to 0x96ab4000)
3e60:                                                       836b5a20 836b5a20
3e80: 96ab3eb4 96ab3e90 800d5e68 80114f0c ffffffff ffffffff 836b5a30 836b5ad4
3ea0: 80518f30 80518f30 96ab3ed4 96ab3eb8 80115154 800d5d48 00000000 836b5a30
3ec0: 836b5a80 a27a6c00 96ab3ef4 96ab3ed8 80115bc4 801150c4 00000000 a0353aa0
3ee0: 00000000 836c7000 96ab3f94 96ab3ef8 8010bae0 80115a94 a287b310 ffffff9c
3f00: 836b5a30 00000000 a287b310 a0053660 faee90ab 00000027 836c7019 96ab3e38
3f20: 00000000 a0007330 a20f04c0 00000000 00000002 000000e4 00000000 00000000
3f40: 8000e224 8000e224 96ab2000 00000000 96ab3f6c 96ab3f60 800fff9c 800ffd6c
3f60: 96ab3f8c 96ab2000 00000004 0064dc48 49fc2fb8 00000023 0000000a 8000e224
3f80: 96ab2000 00000000 96ab3fa4 96ab3f98 8010c694 8010b9e4 00000000 96ab3fa8
3fa0: 8000e060 8010c688 0064dc48 49fc2fb8 7eb42200 0064dc6c 00000000 006e696f
3fc0: 0064dc48 49fc2fb8 00000023 0000000a 2c90eb88 7eb42c90 7eb45334 7eb4224c
3fe0: 49fcd178 7eb421fc 49fbe248 49f0b6ec 400f0010 7eb42200 00000000 00000000
Backtrace: 
[<80114f00>] (clear_inode) from [<800d5e68>] (shmem_evict_inode+0x12c/0x148)
[<800d5d3c>] (shmem_evict_inode) from [<80115154>] (evict+0x9c/0x160)
[<801150b8>] (evict) from [<80115bc4>] (iput+0x13c/0x144)
[<80115a88>] (iput) from [<8010bae0>] (do_unlinkat+0x108/0x1c8)
[<8010b9d8>] (do_unlinkat) from [<8010c694>] (SyS_unlink+0x18/0x1c)
[<8010c67c>] (SyS_unlink) from [<8000e060>] (ret_fast_syscall+0x0/0x50)
Code: e3130040 03a03060 0584306c 089da830 (e7f001f2) 
Kernel panic - not syncing: Fatal exception
Backtrace: 
[<80011b88>] (dump_backtrace) from [<80011d90>] (show_stack+0x18/0x1c)
[<80011d78>] (show_stack) from [<805036b8>] (dump_stack+0x74/0xc0)
[<80503644>] (dump_stack) from [<800136a0>] (handle_IPI+0xdc/0x184)
[<800135c4>] (handle_IPI) from [<8000858c>] (gic_handle_irq+0x60/0x68)
[<8000852c>] (gic_handle_irq) from [<80508a84>] (__irq_svc+0x44/0x78)
[<8000ed4c>] (arch_cpu_idle) from [<80069644>] (cpu_startup_entry+0x1c0/0x254)
[<80069484>] (cpu_startup_entry) from [<804ff4ec>] (rest_init+0x78/0x90)
[<804ff474>] (rest_init) from [<806d1b40>] (start_kernel+0x314/0x370)
[<806d182c>] (start_kernel) from [<10008074>] (0x10008074)

We observed the above issue thrice in our testing. Unfortunately we 
don't know the usecase or steps which resulted in the above behavior, 
since the testing was random. 

We did the static analysis of the futex code and didn't find anything
suspicious in usage of get_futex_key_refs and drop_futex_key_refs.
Assuming it could be due to corruption of futex_key struct we arrived
at the patch.

Sorry for overlooking the code. Thanks for the clarification.

Thank you,
Regards,
Bhuvanesh

On 12/19/2015 11:54 PM, Thomas Gleixner wrote:
> On Fri, 18 Dec 2015, Davidlohr Bueso wrote:
>> On Fri, 18 Dec 2015, Bhuvanesh_Surachari@mentor.com wrote:
>>
>>> From: Bhuvanesh Surachari <bhuvanesh_surachari@mentor.com>
>>>
>>> In case of error from rt_mutex_start_proxy_lock pi_state is freed
>>> twice in futex_requeue function. Hence removing free_pi_state in
>>> else branch and branching to the location where pi_state is freed.
> 
> This is wrong. See below.
>  
>> This reads weird.
>>
>> Do note that free_pi_state is already protected against passing it a nil
>> pi_state, so this is merely cosmetic.
> 
> No, it's not cosmetic. pi_state is not set to NULL - this->pi_state is.
> 
>> Hmm but yeah, looks legit. The cases were we free the pi_state on error
>> are when doing retry kind of paths, EDEADLK not being one of those cases.
> 
> It does not look legit. It is simply wrong.
> 
>>> diff --git a/kernel/futex.c b/kernel/futex.c
>>> index 684d754..264b3f2 100644
>>> --- a/kernel/futex.c
>>> +++ b/kernel/futex.c
>>> @@ -1815,7 +1815,6 @@ retry_private:
>>> 			} else if (ret) {
>>> 				/* -EDEADLK */
>>> 				this->pi_state = NULL;
>>> -				free_pi_state(pi_state);
>>> 				goto out_unlock;
>>> 			}
>>> 		}
>>
>>
>> Nit but I think we also want to set pi_state to nil after freeing it
>> in out_unlock. That way this scenario would simply be goto out_unlock.
> 
> No. In the normal exit case we only drop the extra reference on the
> pi_state, but we CANNOT clear this->pi_state. That would be fatal as
> we just queued a waiter which relies on that state pointer ....
> 
> Clearing pi_state itself is pointless as it's a local variable and
> ceases to exist because we return.
> 
> Let me explain, why the patch is wrong and why some more things are
> buggy and suboptimal here with that refcount stuff:
> 
> The first thing after we checked the uaddr1 value is:
> 
>     if (requeue_pi && (task_count - nr_wake < nr_requeue)) {
> 
>        ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
>        	     				&key2, &pi_state, nr_requeue);
> 
>        Let's look at the return values and the pi_state
> 
>        0:        
> 	  The user space futex @uaddr2 is locked and we either
> 	  attached to existing pi_state or created new state.
> 
>   	  In both cases we hold a reference count on the state, but
>   	  the waiter is still enqueued in hb1.
> 
>        <0:
> 
> 	  Any kind of error happened. pi_state is NULL
> 
>        >0:
> 
>           The user space futex take over was successfull. pi_state is
>           NULL. The return value is the vpid of the top waiter,
>           i.e. the task on which behalf we acquired the futex.
> 
> 	  Waiter is dequeued from hb1 and woken up.
> 
> 	  So in this case we call
> 
> 	  ret = lookup_pi_state(ret, hb2, &key2, &pi_state);
> 
> 	  Here the return value means:
> 
> 	    0:
> 	        pi_state is !NULL and a reference count on the state is
> 	       	held
> 
> 	    <0:
> 	        Any kind of error happened. pi_state is NULL
> 
>   Now we check the return value of the above:
> 
>      switch (ret) {
>      case 0:
> 		break;
> 	 Fine...
> 
>      case -EFAULT:
>      	  free_pi_state(pi_state);
> 	  pi_state = NULL;
> 	  
> 	  That's pointless and confusing because pi_state IS NULL. If
> 	  it is not NULL then either futex_proxy_trylock_atomic() or
> 	  lookup_pi_state() are completely buggered.
> 
>      Ditto for -EAGAIN
> 
>      For the default case we goto out_unlock, which also calls
>      free_pi_state(pi_state), but that's not a real issue as it is
>      NULL safe.
> 
> If everything went fine, then we enter 
> 
>      plist_for_each_entry_safe(this, next, &hb1->chain, list) {
> 
>      And in the requeue_pi case we do:
>      
>       if (requeue_pi) {
> 
>       	       atomic_inc(&pi_state->refcount);
> 
>       So here we take another refcount on the state, i.e. one for each
>       waiter. And we store the pointer in the waiters futex_q object:
> 
> 	       this->pi_state = pi_state;
> 	       
> 	       ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
> 					       this->rt_waiter,
> 					       this->task);
> 	       if (ret == 1) {
> 
>       We locked the rt mutex on behalf of this waiter. We do neither
>       clear this->pi_state nor drop the refcount because the waiter
>       needs the pi_state for fixing up the futex user space value.
> 
>       Now the waiter should drop the reference, but that's not the
>       case. So this is a real issue here which needs to be
>       fixed. Patch comes in seperate mail.
> 
> 	      } else if (ret) {
> 	      	     free_pi_state(pi_state);
> 		     this->pi_state = NULL;
> 
>       This undoes the above atomic_inc() and clears the pi_state
>       reference in the waiter futex_q object.
> 
>       So if we remove this we leak a reference count and have a
>       dangling pointer in the waiters futex_q object .... Not so
>       desired, right?
> 
> 		     goto out_unlock;
>               }
>        }
>        ....
> 
> out_unlock:
> 	free_pi_state(pi_state);
> 
>   Here we drop the extra reference to the pi_state which we
>   acquired at the beginning of the requeue_pi code before we
>   started to requeue waiters.
> 
> 
> So the patch and the reviews of it are plain wrong.
> 
> This refcount stuff certainly lacks a few comments, but I really would
> like to know:
> 
>  - Why was this patch created in the first place?
> 
>    The changelog is completely useless. It does not tell what the
>    observed issue was or whether this was merily the result of reading
>    the code and assuming that this is a double free. But it was
>    certainly not due to a thorough analysis of the code in question.
> 
>  - Why are the reviews so sloppy and useless?
> 
>    The one I'm answering to is just hillarious and the other one picks
>    a random commit, claims that it inadvertantly introduced the issue
>    and is done with it. Really helpful.
> 
> Folks, this is not the way it works. It took me a couple of hours to
> get this analyzed properly, but I don't see that anyone involved in
> this thread has spent more than a few seconds on it.
> 
> At least I found a real issue related to this refcount stuff, but I
> would have preferred that the people involved in this would have found
> it in the first place instead of providing half baken crap.
> 
> Yours grumpy
> 
>       tglx
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] futex: Prevent pi_state from double freeing in case of error
  2015-12-23 14:56     ` Bhuvanesh
@ 2015-12-23 18:07       ` Thomas Gleixner
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Gleixner @ 2015-12-23 18:07 UTC (permalink / raw)
  To: Bhuvanesh
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	linux, viresh.kumar, luto, Sebastian Sewior, mtk.manpages, LKML,
	Andy Lowe, Darren Hart

Bhuvanesh,

On Wed, 23 Dec 2015, Bhuvanesh wrote:
> Apologies for not putting the backtrace earlier.

No problem. Let's look at it.
 
> During our regression test of the kernel version 3.14, generated a 
> warning in futex code and resulted in crash with the backtrace given 
> below:
> 
> WARNING: CPU: 0 PID: 1468 at fs/inode.c:399 ihold+0x40/0x48()
> Backtrace: 
> [<80011b88>] (dump_backtrace) from [<80011d90>] (show_stack+0x18/0x1c)
> [<80011d78>] (show_stack) from [<805036b8>] (dump_stack+0x74/0xc0)
> [<80503644>] (dump_stack) from [<8002393c>] (warn_slowpath_common+0x70/0x94)
> [<800238cc>] (warn_slowpath_common) from [<80023a04>] (warn_slowpath_null+0x24/0x2c)
> [<800239e0>] (warn_slowpath_null) from [<80115004>] (ihold+0x40/0x48)
> [<80114fc4>] (ihold) from [<8007f33c>] (get_futex_key_refs+0x58/0x64)
> [<8007f2e4>] (get_futex_key_refs) from [<8007f524>] (get_futex_key+0x1dc/0x200)
> [<8007f348>] (get_futex_key) from [<80080048>] (futex_wake+0x4c/0x144)

That's a totally different code path. It comes from get_futex_key_refs() and
ihold() complains about inode refcount being less than 2. So the futex sits in
a memory mapped file.

> [<8007fffc>] (futex_wake) from [<800819dc>] (do_futex+0xf8/0x984)
> [<800818e4>] (do_futex) from [<80082358>] (SyS_futex+0xf0/0x15c)
> [<80082268>] (SyS_futex) from [<8000e060>] (ret_fast_syscall+0x0/0x50)

> Kernel BUG at 80114f5c [verbose debug info unavailable]
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> CPU: 1 PID: 826 Comm: mediaengine_out Tainted: G        W    3.14.51-03408-gf4477ef #1
> task: a31c6d40 ti: 96ab2000 task.ti: 96ab2000
> PC is at clear_inode+0x5c/0x60

Here the shmem code triggers a bug in clear_inode(). I can't tell which one
exactly (there are a couple of them).

> LR is at preempt_count_sub+0xd8/0x104

> [<80114f00>] (clear_inode) from [<800d5e68>] (shmem_evict_inode+0x12c/0x148)
> [<800d5d3c>] (shmem_evict_inode) from [<80115154>] (evict+0x9c/0x160)
> [<801150b8>] (evict) from [<80115bc4>] (iput+0x13c/0x144)
> [<80115a88>] (iput) from [<8010bae0>] (do_unlinkat+0x108/0x1c8)
> [<8010b9d8>] (do_unlinkat) from [<8010c694>] (SyS_unlink+0x18/0x1c)
> [<8010c67c>] (SyS_unlink) from [<8000e060>] (ret_fast_syscall+0x0/0x50)

The call comes from sys_unlink. So a file is removed. I can't tell whether
this is related, but it probably is.

> We observed the above issue thrice in our testing. Unfortunately we 
> don't know the usecase or steps which resulted in the above behavior, 
> since the testing was random. 

You might try to analyse the futex/mmap code of PID 1468. It might be
something when the process shuts down and tears down the map. That might give
you a hint how to reproduce the issue.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-12-23 18:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-18  8:43 [PATCH] futex: Prevent pi_state from double freeing in case of error Bhuvanesh_Surachari
2015-12-18 14:55 ` Davidlohr Bueso
2015-12-19 18:24   ` Thomas Gleixner
2015-12-20  5:21     ` Darren Hart
2015-12-21 17:42       ` Davidlohr Bueso
2015-12-23 14:56     ` Bhuvanesh
2015-12-23 18:07       ` Thomas Gleixner
2015-12-18 23:08 ` Darren Hart

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.