All of lore.kernel.org
 help / color / mirror / Atom feed
* futex question
@ 2009-09-30  1:10 Anirban Sinha
  2009-10-01  9:22 ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Anirban Sinha @ 2009-09-30  1:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar

Hi Folks:

We are observing something interesting regarding how task->robust_list
pointer is being handled across a sys_execve() call. If a task does a
sys_set_robust_list() with a certain head pointer and then at some point
does a execve() call to over-write it's address space, the 'robust-list'
pointer is never cleared. So in essence what happens is that during task
exit, within mm_release(), the 
if (unlikely(tsk->robust_list)) condition might still be true because
the pointer has a non-null address. However, the actual address value
may not belong to the new address space or point to something else
within the new address space. Should we not just clear the pointer (and
it's compat version) within do_execve()?

Granted, within exit_robust_list(), the fetch_robust_entry() calls will
fail and bail out of the function. So in essence, nothing bad should
happen. However, that extra code should save us from entering
exit_robust_list() in the first place.

CCing Ingo since the robust futex support was started by him.

Cheers,

Ani
 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-09-30  1:10 futex question Anirban Sinha
@ 2009-10-01  9:22 ` Ingo Molnar
  2009-10-01 16:54   ` Anirban Sinha
  2009-10-01 23:46   ` Anirban Sinha
  0 siblings, 2 replies; 26+ messages in thread
From: Ingo Molnar @ 2009-10-01  9:22 UTC (permalink / raw)
  To: Anirban Sinha, Darren Hart, Thomas Gleixner, Peter Zijlstra; +Cc: linux-kernel


(Cc:-ed more futex folks.)

* Anirban Sinha <ASinha@zeugmasystems.com> wrote:

> Hi Folks:
> 
> We are observing something interesting regarding how task->robust_list
> pointer is being handled across a sys_execve() call. If a task does a
> sys_set_robust_list() with a certain head pointer and then at some point
> does a execve() call to over-write it's address space, the 'robust-list'
> pointer is never cleared. So in essence what happens is that during task
> exit, within mm_release(), the 
> if (unlikely(tsk->robust_list)) condition might still be true because
> the pointer has a non-null address. However, the actual address value
> may not belong to the new address space or point to something else
> within the new address space. Should we not just clear the pointer (and
> it's compat version) within do_execve()?
> 
> Granted, within exit_robust_list(), the fetch_robust_entry() calls will
> fail and bail out of the function. So in essence, nothing bad should
> happen. However, that extra code should save us from entering
> exit_robust_list() in the first place.
> 
> CCing Ingo since the robust futex support was started by him.
> 
> Cheers,
> 
> Ani
>  

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: futex question
  2009-10-01  9:22 ` Ingo Molnar
@ 2009-10-01 16:54   ` Anirban Sinha
  2009-10-01 23:46   ` Anirban Sinha
  1 sibling, 0 replies; 26+ messages in thread
From: Anirban Sinha @ 2009-10-01 16:54 UTC (permalink / raw)
  To: Ingo Molnar, Darren Hart, Thomas Gleixner, Peter Zijlstra; +Cc: linux-kernel

>
>(Cc:-ed more futex folks.)
>

That is great but I am wondering, what's your opinion on this Ingo? 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: futex question
  2009-10-01  9:22 ` Ingo Molnar
  2009-10-01 16:54   ` Anirban Sinha
@ 2009-10-01 23:46   ` Anirban Sinha
  2009-10-02 23:38     ` Darren Hart
  1 sibling, 1 reply; 26+ messages in thread
From: Anirban Sinha @ 2009-10-01 23:46 UTC (permalink / raw)
  To: Ingo Molnar, Darren Hart, Thomas Gleixner, Peter Zijlstra; +Cc: linux-kernel


> Should we not just clear the pointer (and
>> it's compat version) within do_execve()?


In our private repository, applying the following patch resolved the
issues I mentioned. I no longer see messages like this:

 [futex] ("ifconfig")(pid=2509) exit_robust_list:unable to fetch robust
entry. uaddr=0x000000002abbc4f0

from my instrumented kernel within exit_robust_list(). My
instrumentation looked something like this:

if (fetch_robust_entry(...)) {
printk(...);
return;
}

Just tossing the patch in the community in case someone is interested
...



Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
---
 fs/compat.c |    3 +++
 fs/exec.c   |    3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index 6d6f98f..c3d117c 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1510,6 +1510,9 @@ int compat_do_execve(char * filename,
        if (retval)
                goto out_file;

+#ifdef CONFIG_FUTEX
+       current->compat_robust_list = NULL;
+#endif
        bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
        if ((retval = bprm->argc) < 0)
                goto out;
diff --git a/fs/exec.c b/fs/exec.c
index 172ceb6..e9334b8 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1323,6 +1323,9 @@ int do_execve(char * filename,
        retval = bprm_mm_init(bprm);
        if (retval)
                goto out_file;
+#ifdef CONFIG_FUTEX
+       current->robust_list = NULL;
+#endif

        bprm->argc = count(argv, MAX_ARG_STRINGS);
        if ((retval = bprm->argc) < 0)



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-01 23:46   ` Anirban Sinha
@ 2009-10-02 23:38     ` Darren Hart
  2009-10-03  0:36       ` Anirban Sinha
  2009-10-04  8:44       ` Thomas Gleixner
  0 siblings, 2 replies; 26+ messages in thread
From: Darren Hart @ 2009-10-02 23:38 UTC (permalink / raw)
  To: Anirban Sinha; +Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-kernel

Anirban Sinha wrote:
>> Should we not just clear the pointer (and
>>> it's compat version) within do_execve()?
> 
> 
> In our private repository, applying the following patch resolved the
> issues I mentioned. I no longer see messages like this:
> 
>  [futex] ("ifconfig")(pid=2509) exit_robust_list:unable to fetch robust
> entry. uaddr=0x000000002abbc4f0
> 
> from my instrumented kernel within exit_robust_list(). My
> instrumentation looked something like this:

> 
> if (fetch_robust_entry(...)) {
> printk(...);
> return;
> }
> 
> Just tossing the patch in the community in case someone is interested


Thanks for sending the patch.  I'm looking into it now.  Couple questions:

1) What caused you to instrument this path in the first place?  Were you 
seeing some unexpected behavior?

2) I wonder why we would need to clear the robust list, but I don't see 
other things like pi_blocked_on, etc. in execve being cleared.  I'm 
looking into this now (perhaps we don't do the same cleanup, need to 
check).... have to get on the plane...

--
Darren Hart

> ...
> 
> 
> 
> Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
> ---
>  fs/compat.c |    3 +++
>  fs/exec.c   |    3 +++
>  2 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/compat.c b/fs/compat.c
> index 6d6f98f..c3d117c 100644
> --- a/fs/compat.c
> +++ b/fs/compat.c
> @@ -1510,6 +1510,9 @@ int compat_do_execve(char * filename,
>         if (retval)
>                 goto out_file;
> 
> +#ifdef CONFIG_FUTEX
> +       current->compat_robust_list = NULL;
> +#endif
>         bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
>         if ((retval = bprm->argc) < 0)
>                 goto out;
> diff --git a/fs/exec.c b/fs/exec.c
> index 172ceb6..e9334b8 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1323,6 +1323,9 @@ int do_execve(char * filename,
>         retval = bprm_mm_init(bprm);
>         if (retval)
>                 goto out_file;
> +#ifdef CONFIG_FUTEX
> +       current->robust_list = NULL;
> +#endif
> 
>         bprm->argc = count(argv, MAX_ARG_STRINGS);
>         if ((retval = bprm->argc) < 0)
> 
> 


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: futex question
  2009-10-02 23:38     ` Darren Hart
@ 2009-10-03  0:36       ` Anirban Sinha
  2009-10-03  4:14         ` Eric Dumazet
  2009-10-04  8:44       ` Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Anirban Sinha @ 2009-10-03  0:36 UTC (permalink / raw)
  To: Darren Hart
  Cc: Ingo Molnar, Thomas Gleixner, Peter Zijlstra, linux-kernel, Kaz Kylheku

>
>
>Thanks for sending the patch.  I'm looking into it now.  Couple
questions:
>
>1) What caused you to instrument this path in the first place?  Were
you
>seeing some unexpected behavior?

Basically, all this started as a means to aid debug or at least isolate
cases of memory corruption. When a process holding a futex died, the
robust futex cleanup operation can be foiled if there are any memory
corruptions in the user land. The "carefully inspecting the user land
linked list" part would bail out silently. So no process would get
EOWNERDEAD and wake up. So we decided to add printks so that we can
track these silent return cases.

We thought that actual number of cases of silently bailing out would be
rare so we did not expect any of those logs in the kernel buffer under
regular circumstances. To our surprise, we found lots of those logs!
This puzzled us.  I looked at the code again and again but it deed some
seem to have any issues. Then it occurred to us (kaz) that an execve()
call can also cause invalid pointer values to remain in the task
structure. I did some testing and it seemed to indicate that this was
indeed the case.

There is a discussion on this by Kaz on the linux mips mailing list:

http://www.linux-mips.org/archives/linux-mips/2009-09/msg00130.html



>
>2) I wonder why we would need to clear the robust list, but I don't see
>other things like pi_blocked_on, etc. in execve being cleared. 

Yea, may be. We don't use pi-futexes, so not too much confident of the
pi futex codebase there. You can look into those too.


 I'm
>looking into this now (perhaps we don't do the same cleanup, need to
>check).... have to get on the plane...

Have a good trip and thanks for looking into it. :)

Ani

>>
>>
>> Signed-off-by: Anirban Sinha <asinha@zeugmasystems.com>
>> ---
>>  fs/compat.c |    3 +++
>>  fs/exec.c   |    3 +++
>>  2 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/compat.c b/fs/compat.c
>> index 6d6f98f..c3d117c 100644
>> --- a/fs/compat.c
>> +++ b/fs/compat.c
>> @@ -1510,6 +1510,9 @@ int compat_do_execve(char * filename,
>>         if (retval)
>>                 goto out_file;
>>
>> +#ifdef CONFIG_FUTEX
>> +       current->compat_robust_list = NULL;
>> +#endif
>>         bprm->argc = compat_count(argv, MAX_ARG_STRINGS);
>>         if ((retval = bprm->argc) < 0)
>>                 goto out;
>> diff --git a/fs/exec.c b/fs/exec.c
>> index 172ceb6..e9334b8 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1323,6 +1323,9 @@ int do_execve(char * filename,
>>         retval = bprm_mm_init(bprm);
>>         if (retval)
>>                 goto out_file;
>> +#ifdef CONFIG_FUTEX
>> +       current->robust_list = NULL;
>> +#endif
>>
>>         bprm->argc = count(argv, MAX_ARG_STRINGS);
>>         if ((retval = bprm->argc) < 0)
>>
>>
>
>
>--
>Darren Hart
>IBM Linux Technology Center
>Real-Time Linux Team

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-03  0:36       ` Anirban Sinha
@ 2009-10-03  4:14         ` Eric Dumazet
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2009-10-03  4:14 UTC (permalink / raw)
  To: Anirban Sinha
  Cc: Darren Hart, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
	linux-kernel, Kaz Kylheku

Anirban Sinha a écrit :
>>
>> Thanks for sending the patch.  I'm looking into it now.  Couple
> questions:
>> 1) What caused you to instrument this path in the first place?  Were
> you
>> seeing some unexpected behavior?
> 
> Basically, all this started as a means to aid debug or at least isolate
> cases of memory corruption. When a process holding a futex died, the
> robust futex cleanup operation can be foiled if there are any memory
> corruptions in the user land. The "carefully inspecting the user land
> linked list" part would bail out silently. So no process would get
> EOWNERDEAD and wake up. So we decided to add printks so that we can
> track these silent return cases.
> 
> We thought that actual number of cases of silently bailing out would be
> rare so we did not expect any of those logs in the kernel buffer under
> regular circumstances. To our surprise, we found lots of those logs!
> This puzzled us.  I looked at the code again and again but it deed some
> seem to have any issues. Then it occurred to us (kaz) that an execve()
> call can also cause invalid pointer values to remain in the task
> structure. I did some testing and it seemed to indicate that this was
> indeed the case.
> 
> There is a discussion on this by Kaz on the linux mips mailing list:
> 
> http://www.linux-mips.org/archives/linux-mips/2009-09/msg00130.html

This exactly looks like what I discovered a while ago about futex used
for pthread management. Anirban, this is a real security flaw and this
should be fixed as fast as possible :)

Commit 9c8a8228d0827e0d91d28527209988f672f97d28
author	Eric Dumazet <eric.dumazet@gmail.com>
	Thu, 6 Aug 2009 22:09:28 +0000 (15:09 -0700)
execve: must clear current->clear_child_tid

While looking at Jens Rosenboom bug report
(http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
a dying "ps" program, we found following problem.

clone() syscall has special support for TID of created threads.  This
support includes two features.

One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
TID value.

One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
thread dies.

The integer location is a user provided pointer, provided at clone()
time.

kernel keeps this pointer value into current->clear_child_tid.

At execve() time, we should make sure kernel doesnt keep this user
provided pointer, as full user memory is replaced by a new one.

As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
memory in forked processes.

Following sequence could happen:

1) bash (or any program) starts a new process, by a fork() call that
   glibc maps to a clone( ...  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
   ...) syscall

2) When new process starts, its current->clear_child_tid is set to a
   location that has a meaning only in bash (or initial program) context
   (&THREAD_SELF->tid)

3) This new process does the execve() syscall to start a new program.
   current->clear_child_tid is left unchanged (a non NULL value)

4) If this new program creates some threads, and initial thread exits,
   kernel will attempt to clear the integer pointed by
   current->clear_child_tid from mm_release() :

        if (tsk->clear_child_tid
            && !(tsk->flags & PF_SIGNALED)
            && atomic_read(&mm->mm_users) > 1) {
                u32 __user * tidptr = tsk->clear_child_tid;
                tsk->clear_child_tid = NULL;

                /*
                 * We don't check the error code - if userspace has
                 * not set up a proper pointer then tough luck.
                 */
<< here >>      put_user(0, tidptr);
                sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
        }

5) OR : if new program is not multi-threaded, but spied by /proc/pid
   users (ps command for example), mm_users > 1, and the exiting program
   could corrupt 4 bytes in a persistent memory area (shm or memory mapped
   file)

If current->clear_child_tid points to a writeable portion of memory of the
new program, kernel happily and silently corrupts 4 bytes of memory, with
unexpected effects.

Fix is straightforward and should not break any sane program.

Reported-by: Jens Rosenboom <jens@mcbone.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sonny Rao <sonnyrao@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-02 23:38     ` Darren Hart
  2009-10-03  0:36       ` Anirban Sinha
@ 2009-10-04  8:44       ` Thomas Gleixner
       [not found]         ` <DDFD17CC94A9BD49A82147DDF7D545C501F457C5@exchange.ZeugmaSystems.local>
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-04  8:44 UTC (permalink / raw)
  To: Darren Hart; +Cc: Anirban Sinha, Ingo Molnar, Peter Zijlstra, linux-kernel

On Fri, 2 Oct 2009, Darren Hart wrote:
> Anirban Sinha wrote:
> > > Should we not just clear the pointer (and
> > > > it's compat version) within do_execve()?
> > 
> > 
> > In our private repository, applying the following patch resolved the
> > issues I mentioned. I no longer see messages like this:
> > 
> >  [futex] ("ifconfig")(pid=2509) exit_robust_list:unable to fetch robust
> > entry. uaddr=0x000000002abbc4f0
> > 
> > from my instrumented kernel within exit_robust_list(). My
> > instrumentation looked something like this:
> 
> > 
> > if (fetch_robust_entry(...)) {
> > printk(...);
> > return;
> > }
> > 
> > Just tossing the patch in the community in case someone is interested
> 
> 
> Thanks for sending the patch.  I'm looking into it now.  Couple questions:
> 
> 1) What caused you to instrument this path in the first place?  Were you
> seeing some unexpected behavior?
> 
> 2) I wonder why we would need to clear the robust list, but I don't see other
> things like pi_blocked_on, etc. in execve being cleared.  I'm looking into
> this now (perhaps we don't do the same cleanup, need to check).... have to get
> on the plane...

Hmm, just setting the robust list pointer to NULL fixes the problem at
hand, but I wonder whether we need to call exit_robust_list() as
well. 

Vs. pi_blocked_on: If that's != NULL then you are not executing that
code path because you hang in the scheduler waiting for the lock. 

The interesting question is whether we need to call
exit_pi_state_list() to fix up held locks.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
       [not found]         ` <DDFD17CC94A9BD49A82147DDF7D545C501F457C5@exchange.ZeugmaSystems.local>
@ 2009-10-04 16:37           ` Anirban Sinha
  2009-10-04 16:59             ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Anirban Sinha @ 2009-10-04 16:37 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, linux-kernel, Darren Hart,
	Peter Zijlstra, Kaz Kylheku, Anirban Sinha

>> 1) What caused you to instrument this path in the first place?  Were you
>> seeing some unexpected behavior?
>>
>> 2) I wonder why we would need to clear the robust list, but I don't see other
>> things like pi_blocked_on, etc. in execve being cleared.  I'm looking into
>> this now (perhaps we don't do the same cleanup, need to check).... have to get
>> on the plane...
> 
> Hmm, just setting the robust list pointer to NULL fixes the problem at
> hand, but I wonder whether we need to call exit_robust_list() as
> well. 

hmm. That is an interesting thought. But I wonder if acquiring a lock and then exec()ing in the critical section is a legal thing to do. It does not feel right. Currently, with or without my change, such a thing would indefinitely block other waiters on the same futex.  

Ani

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-04 16:37           ` Anirban Sinha
@ 2009-10-04 16:59             ` Thomas Gleixner
  2009-10-05 10:36               ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-04 16:59 UTC (permalink / raw)
  To: Anirban Sinha
  Cc: Ingo Molnar, linux-kernel, Darren Hart, Peter Zijlstra,
	Kaz Kylheku, Anirban Sinha

On Sun, 4 Oct 2009, Anirban Sinha wrote:

> >> 1) What caused you to instrument this path in the first place?  Were you
> >> seeing some unexpected behavior?
> >>
> >> 2) I wonder why we would need to clear the robust list, but I don't see other
> >> things like pi_blocked_on, etc. in execve being cleared.  I'm looking into
> >> this now (perhaps we don't do the same cleanup, need to check).... have to get
> >> on the plane...
> > 
> > Hmm, just setting the robust list pointer to NULL fixes the problem at
> > hand, but I wonder whether we need to call exit_robust_list() as
> > well. 
 
> hmm. That is an interesting thought. But I wonder if acquiring a
> lock and then exec()ing in the critical section is a legal thing to

It's definitely legal. There is no law which forbids to do that. :)

> do. It does not feel right. Currently, with or without my change,
> such a thing would indefinitely block other waiters on the same
> futex.

Right. Which completely defeats the purpose of the robust list. Will
have a look tomorrow.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-04 16:59             ` Thomas Gleixner
@ 2009-10-05 10:36               ` Peter Zijlstra
  2009-10-05 10:56                 ` Thomas Gleixner
                                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Peter Zijlstra @ 2009-10-05 10:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

On Sun, 2009-10-04 at 18:59 +0200, Thomas Gleixner wrote:

> > do. It does not feel right. Currently, with or without my change,
> > such a thing would indefinitely block other waiters on the same
> > futex.
> 
> Right. Which completely defeats the purpose of the robust list. Will
> have a look tomorrow.

Right, so mm_release() which is meant to destroy the old mm context
actually does exit_robust_list(), but the problem is that it does so on
the new mm, not the old one that got passed down to mm_release().

The other detail is that exit_robust_list() doesn't clear
current->robust_list.

The problem with the patch send my Ani is that it clears the robust
lists before the point of no return, so on a failing execve() we'd have
messed up the state.

Making exit_robust_list() deal with an mm that is not the current mm is
interesting indeed.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 10:36               ` Peter Zijlstra
@ 2009-10-05 10:56                 ` Thomas Gleixner
  2009-10-05 11:16                   ` Peter Zijlstra
  2009-10-05 11:58                 ` Peter Zijlstra
  2009-10-05 18:11                 ` Anirban Sinha
  2 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 10:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

Peter,

On Mon, 5 Oct 2009, Peter Zijlstra wrote:

> On Sun, 2009-10-04 at 18:59 +0200, Thomas Gleixner wrote:
> 
> > > do. It does not feel right. Currently, with or without my change,
> > > such a thing would indefinitely block other waiters on the same
> > > futex.
> > 
> > Right. Which completely defeats the purpose of the robust list. Will
> > have a look tomorrow.
> 
> Right, so mm_release() which is meant to destroy the old mm context
> actually does exit_robust_list(), but the problem is that it does so on
> the new mm, not the old one that got passed down to mm_release().
> 
> The other detail is that exit_robust_list() doesn't clear
> current->robust_list.

I know.

> The problem with the patch send my Ani is that it clears the robust
> lists before the point of no return, so on a failing execve() we'd have
> messed up the state.

Right. We need to do that at the latest possible point.

Looking more into that I think we should check whether the robust list
has an entry (lock held) in do_execve() and return -EWOULDBLOCK to
luser space. Same if pi_waiters is not empty. Holding a lock and
calling execve() is simply broken.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 10:56                 ` Thomas Gleixner
@ 2009-10-05 11:16                   ` Peter Zijlstra
  2009-10-05 11:19                     ` Ingo Molnar
  2009-10-05 11:47                     ` Thomas Gleixner
  0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2009-10-05 11:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

On Mon, 2009-10-05 at 12:56 +0200, Thomas Gleixner wrote:
> 
> Looking more into that I think we should check whether the robust list
> has an entry (lock held) in do_execve() and return -EWOULDBLOCK to
> luser space. Same if pi_waiters is not empty. Holding a lock and
> calling execve() is simply broken.

Fine by me ;-)

something like the below?

The question is of course what Ani was doing that triggered this in the
first place and if he can live with this.. :-)

---
 fs/exec.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index d49be6b..0812ba6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1295,6 +1295,22 @@ int do_execve(char * filename,
 	bool clear_in_exec;
 	int retval;
 
+	retval = -EWOULDBLOCK;
+#ifdef CONFIG_FUTEX
+	if (unlikely(current->robust_list))
+		goto out_ret;
+#ifdef CONFIG_COMPAT
+	if (unlikely(current->compat_robust_list))
+		goto out_ret;
+#endif
+	spin_lock_irq(&current->pi_lock);
+	if (!list_empty(&current->pi_state_list)) {
+		spin_unlock_irq(&current->pi_lock);
+		goto out_ret;
+	}
+	spin_unlock_irq(&current->pi_lock);
+#endif
+
 	retval = unshare_files(&displaced);
 	if (retval)
 		goto out_ret;



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 11:16                   ` Peter Zijlstra
@ 2009-10-05 11:19                     ` Ingo Molnar
  2009-10-05 11:50                       ` Thomas Gleixner
  2009-10-05 11:47                     ` Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2009-10-05 11:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Anirban Sinha, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha


* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> diff --git a/fs/exec.c b/fs/exec.c
> index d49be6b..0812ba6 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1295,6 +1295,22 @@ int do_execve(char * filename,
>  	bool clear_in_exec;
>  	int retval;
>  
> +	retval = -EWOULDBLOCK;
> +#ifdef CONFIG_FUTEX
> +	if (unlikely(current->robust_list))
> +		goto out_ret;
> +#ifdef CONFIG_COMPAT
> +	if (unlikely(current->compat_robust_list))
> +		goto out_ret;
> +#endif
> +	spin_lock_irq(&current->pi_lock);
> +	if (!list_empty(&current->pi_state_list)) {
> +		spin_unlock_irq(&current->pi_lock);
> +		goto out_ret;
> +	}
> +	spin_unlock_irq(&current->pi_lock);
> +#endif

i suspect this should have the form of:

	retval = can_exec_robust_futexes();
	if (retval)
		goto out_ret

  	retval = unshare_files(&displaced);
  	if (retval)
  		goto out_ret;

	...

but ... shouldnt we just do what exec normally does and zap any state 
that shouldnt be carried over into the new context - instead of denying 
the exec? Am i missing something?

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 11:16                   ` Peter Zijlstra
  2009-10-05 11:19                     ` Ingo Molnar
@ 2009-10-05 11:47                     ` Thomas Gleixner
  2009-10-05 13:11                       ` Anirban Sinha
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 11:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

On Mon, 5 Oct 2009, Peter Zijlstra wrote:
> On Mon, 2009-10-05 at 12:56 +0200, Thomas Gleixner wrote:
> > 
> > Looking more into that I think we should check whether the robust list
> > has an entry (lock held) in do_execve() and return -EWOULDBLOCK to
> > luser space. Same if pi_waiters is not empty. Holding a lock and
> > calling execve() is simply broken.
> 
> Fine by me ;-)
> 
> something like the below?
> 
> The question is of course what Ani was doing that triggered this in the
> first place and if he can live with this.. :-)
> 
> ---
>  fs/exec.c |   16 ++++++++++++++++
>  1 files changed, 16 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index d49be6b..0812ba6 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1295,6 +1295,22 @@ int do_execve(char * filename,
>  	bool clear_in_exec;
>  	int retval;
>  
> +	retval = -EWOULDBLOCK;
> +#ifdef CONFIG_FUTEX
> +	if (unlikely(current->robust_list))
> +		goto out_ret;
> +#ifdef CONFIG_COMPAT
> +	if (unlikely(current->compat_robust_list))
> +		goto out_ret;
> +#endif

  That needs to call into the futex code and check whether the list is
  empty. If not empty, return. If empty set the pointer to NULL

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 11:19                     ` Ingo Molnar
@ 2009-10-05 11:50                       ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 11:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Anirban Sinha, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

On Mon, 5 Oct 2009, Ingo Molnar wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> 
> > diff --git a/fs/exec.c b/fs/exec.c
> > index d49be6b..0812ba6 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1295,6 +1295,22 @@ int do_execve(char * filename,
> >  	bool clear_in_exec;
> >  	int retval;
> >  
> > +	retval = -EWOULDBLOCK;
> > +#ifdef CONFIG_FUTEX
> > +	if (unlikely(current->robust_list))
> > +		goto out_ret;
> > +#ifdef CONFIG_COMPAT
> > +	if (unlikely(current->compat_robust_list))
> > +		goto out_ret;
> > +#endif
> > +	spin_lock_irq(&current->pi_lock);
> > +	if (!list_empty(&current->pi_state_list)) {
> > +		spin_unlock_irq(&current->pi_lock);
> > +		goto out_ret;
> > +	}
> > +	spin_unlock_irq(&current->pi_lock);
> > +#endif
> 
> i suspect this should have the form of:
> 
> 	retval = can_exec_robust_futexes();
> 	if (retval)
> 		goto out_ret

  Yes.
 
>   	retval = unshare_files(&displaced);
>   	if (retval)
>   		goto out_ret;
> 
> 	...
> 
> but ... shouldnt we just do what exec normally does and zap any state 
> that shouldnt be carried over into the new context - instead of denying 
> the exec? Am i missing something?

We want to check whether the robust list is empty. If it's not empty
then we deny the exec instead of silently releasing the futexes or
just ignoring the robust list entirely. Same applies for the pi
waiters list.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 10:36               ` Peter Zijlstra
  2009-10-05 10:56                 ` Thomas Gleixner
@ 2009-10-05 11:58                 ` Peter Zijlstra
  2009-10-05 11:59                   ` Thomas Gleixner
  2009-10-05 18:11                 ` Anirban Sinha
  2 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2009-10-05 11:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

On Mon, 2009-10-05 at 12:36 +0200, Peter Zijlstra wrote:
> On Sun, 2009-10-04 at 18:59 +0200, Thomas Gleixner wrote:
> 
> > > do. It does not feel right. Currently, with or without my change,
> > > such a thing would indefinitely block other waiters on the same
> > > futex.
> > 
> > Right. Which completely defeats the purpose of the robust list. Will
> > have a look tomorrow.
> 
> Right, so mm_release() which is meant to destroy the old mm context
> actually does exit_robust_list(), but the problem is that it does so on
> the new mm, not the old one that got passed down to mm_release().
> 
> The other detail is that exit_robust_list() doesn't clear
> current->robust_list.
> 
> The problem with the patch send my Ani is that it clears the robust
> lists before the point of no return, so on a failing execve() we'd have
> messed up the state.
> 
> Making exit_robust_list() deal with an mm that is not the current mm is
> interesting indeed.

Hmm...

static int exec_mmap(struct mm_struct *mm)
{
        struct task_struct *tsk;
        struct mm_struct * old_mm, *active_mm;

        /* Notify parent that we're no longer interested in the old VM */
        tsk = current;
        old_mm = current->mm;
        mm_release(tsk, old_mm);

        if (old_mm) {
                /*
                 * Make sure that if there is a core dump in progress
                 * for the old mm, we get out and die instead of going
                 * through with the exec.  We must hold mmap_sem around
                 * checking core_state and changing tsk->mm.
                 */
                down_read(&old_mm->mmap_sem);
                if (unlikely(old_mm->core_state)) {
                        up_read(&old_mm->mmap_sem);
                        return -EINTR;
                }
        }
        task_lock(tsk);
        active_mm = tsk->active_mm;
        tsk->mm = mm;
        tsk->active_mm = mm;
        activate_mm(active_mm, mm);
        task_unlock(tsk);
        arch_pick_mmap_layout(mm);
        if (old_mm) {
                up_read(&old_mm->mmap_sem);
                BUG_ON(active_mm != old_mm);
                mm_update_next_owner(old_mm);
                mmput(old_mm);
                return 0;
        }
        mmdrop(active_mm);
        return 0;
}

Actually calls mm_release() before the flip, so the below might actually
be sufficient?

---
 kernel/fork.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 266c6af..4c20fff 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -570,12 +570,18 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
 
 	/* Get rid of any futexes when releasing the mm */
 #ifdef CONFIG_FUTEX
-	if (unlikely(tsk->robust_list))
+	if (unlikely(tsk->robust_list)) {
 		exit_robust_list(tsk);
+		tsk->robust_list = NULL;
+	}
 #ifdef CONFIG_COMPAT
-	if (unlikely(tsk->compat_robust_list))
+	if (unlikely(tsk->compat_robust_list)) {
 		compat_exit_robust_list(tsk);
+		tsk->compat_robust_list = NULL;
+	}
 #endif
+	if (unlikely(!list_empty(&tsk->pi_state_list)))
+		exit_pi_state_list(tsk);
 #endif
 
 	/* Get rid of any cached register state */


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 11:58                 ` Peter Zijlstra
@ 2009-10-05 11:59                   ` Thomas Gleixner
  2009-10-05 12:18                     ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 11:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

On Mon, 5 Oct 2009, Peter Zijlstra wrote:

> On Mon, 2009-10-05 at 12:36 +0200, Peter Zijlstra wrote:
> > On Sun, 2009-10-04 at 18:59 +0200, Thomas Gleixner wrote:
> > 
> > > > do. It does not feel right. Currently, with or without my change,
> > > > such a thing would indefinitely block other waiters on the same
> > > > futex.
> > > 
> > > Right. Which completely defeats the purpose of the robust list. Will
> > > have a look tomorrow.
> > 
> > Right, so mm_release() which is meant to destroy the old mm context
> > actually does exit_robust_list(), but the problem is that it does so on
> > the new mm, not the old one that got passed down to mm_release().
> > 
> > The other detail is that exit_robust_list() doesn't clear
> > current->robust_list.
> > 
> > The problem with the patch send my Ani is that it clears the robust
> > lists before the point of no return, so on a failing execve() we'd have
> > messed up the state.
> > 
> > Making exit_robust_list() deal with an mm that is not the current mm is
> > interesting indeed.
> 
> Hmm...
> 
> static int exec_mmap(struct mm_struct *mm)
> {
>         struct task_struct *tsk;
>         struct mm_struct * old_mm, *active_mm;
> 
>         /* Notify parent that we're no longer interested in the old VM */
>         tsk = current;
>         old_mm = current->mm;
>         mm_release(tsk, old_mm);
> 
>         if (old_mm) {
>                 /*
>                  * Make sure that if there is a core dump in progress
>                  * for the old mm, we get out and die instead of going
>                  * through with the exec.  We must hold mmap_sem around
>                  * checking core_state and changing tsk->mm.
>                  */
>                 down_read(&old_mm->mmap_sem);
>                 if (unlikely(old_mm->core_state)) {
>                         up_read(&old_mm->mmap_sem);
>                         return -EINTR;
>                 }
>         }
>         task_lock(tsk);
>         active_mm = tsk->active_mm;
>         tsk->mm = mm;
>         tsk->active_mm = mm;
>         activate_mm(active_mm, mm);
>         task_unlock(tsk);
>         arch_pick_mmap_layout(mm);
>         if (old_mm) {
>                 up_read(&old_mm->mmap_sem);
>                 BUG_ON(active_mm != old_mm);
>                 mm_update_next_owner(old_mm);
>                 mmput(old_mm);
>                 return 0;
>         }
>         mmdrop(active_mm);
>         return 0;
> }
> 
> Actually calls mm_release() before the flip, so the below might actually
> be sufficient?

Stared at the same place a minute ago :) But still I wonder if it's a
good idea to silently release locks and set the state to OWNERDEAD
instead of hitting the app programmer with a big clue stick in case
the app holds locks when calling execve().

Thanks,

	tglx
 
> ---
>  kernel/fork.c |   10 ++++++++--
>  1 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 266c6af..4c20fff 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -570,12 +570,18 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>  
>  	/* Get rid of any futexes when releasing the mm */
>  #ifdef CONFIG_FUTEX
> -	if (unlikely(tsk->robust_list))
> +	if (unlikely(tsk->robust_list)) {
>  		exit_robust_list(tsk);
> +		tsk->robust_list = NULL;
> +	}
>  #ifdef CONFIG_COMPAT
> -	if (unlikely(tsk->compat_robust_list))
> +	if (unlikely(tsk->compat_robust_list)) {
>  		compat_exit_robust_list(tsk);
> +		tsk->compat_robust_list = NULL;
> +	}
>  #endif
> +	if (unlikely(!list_empty(&tsk->pi_state_list)))
> +		exit_pi_state_list(tsk);
>  #endif
>  
>  	/* Get rid of any cached register state */
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 11:59                   ` Thomas Gleixner
@ 2009-10-05 12:18                     ` Peter Zijlstra
  2009-10-05 12:24                       ` Ingo Molnar
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2009-10-05 12:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

On Mon, 2009-10-05 at 13:59 +0200, Thomas Gleixner wrote:
> Stared at the same place a minute ago :) But still I wonder if it's a
> good idea to silently release locks and set the state to OWNERDEAD
> instead of hitting the app programmer with a big clue stick in case
> the app holds locks when calling execve().

Agreed, I rather like the feedback. With regular exit like things
there's just not much we can do to avoid the mess, but here we can
actually avoid it, seems a waste not to do so.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 12:18                     ` Peter Zijlstra
@ 2009-10-05 12:24                       ` Ingo Molnar
  2009-10-05 14:09                         ` Darren Hart
  0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2009-10-05 12:24 UTC (permalink / raw)
  To: Peter Zijlstra, Oleg Nesterov, Andrew Morton, Linus Torvalds
  Cc: Thomas Gleixner, Anirban Sinha, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, 2009-10-05 at 13:59 +0200, Thomas Gleixner wrote:
>
> > Stared at the same place a minute ago :) But still I wonder if it's 
> > a good idea to silently release locks and set the state to OWNERDEAD 
> > instead of hitting the app programmer with a big clue stick in case 
> > the app holds locks when calling execve().
> 
> Agreed, I rather like the feedback. With regular exit like things 
> there's just not much we can do to avoid the mess, but here we can 
> actually avoid it, seems a waste not to do so.

Well, exec() has been a 'exit() + boot-strap next process' kind of thing 
from the get go - with little state carried over into the new task. This 
has security and robustness reasons as well.

So i think exec() should release all existing state, unless told 
otherwise. Making it behave differently for robust futexes sounds 
assymetric to me.

It might make sense though - a 'prevent exec because you are holding 
locks!' thing. Dunno.

Cc:-ed a few execve() semantics experts who might want to chime in.

If a (buggy) app calls execve() with a (robust) futex still held should 
we auto-force-release robust locks held, or fail the exec with an error 
code? I think the forced release is a 'anomalous exit' thing mostly, 
while calling exec() is not anomalous at all.

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 11:47                     ` Thomas Gleixner
@ 2009-10-05 13:11                       ` Anirban Sinha
  2009-10-05 13:28                         ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Anirban Sinha @ 2009-10-05 13:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

Hi all:

catching up with the mails now. still pretty early here in the west coast.

Once upon a time, like on 09-10-05 4:47 AM, Thomas Gleixner wrote:
> On Mon, 5 Oct 2009, Peter Zijlstra wrote:
>> On Mon, 2009-10-05 at 12:56 +0200, Thomas Gleixner wrote:
>>>
>>> Looking more into that I think we should check whether the robust list
>>> has an entry (lock held) in do_execve() and return -EWOULDBLOCK to
>>> luser space. Same if pi_waiters is not empty. Holding a lock and
>>> calling execve() is simply broken.
>>

ha ha! Now you say it's broken :) Indeed, I also thought so. However, I  wonder if this behavior change could potentially break some user space application? 
 

>> Fine by me ;-)
>>
>> something like the below?
>>
>> The question is of course what Ani was doing that triggered this in the
>> first place and if he can live with this.. :-)
>>
>> ---
>>  fs/exec.c |   16 ++++++++++++++++
>>  1 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index d49be6b..0812ba6 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1295,6 +1295,22 @@ int do_execve(char * filename,
>>  	bool clear_in_exec;
>>  	int retval;
>>  
>> +	retval = -EWOULDBLOCK;
>> +#ifdef CONFIG_FUTEX
>> +	if (unlikely(current->robust_list))
>> +		goto out_ret;
>> +#ifdef CONFIG_COMPAT
>> +	if (unlikely(current->compat_robust_list))
>> +		goto out_ret;
>> +#endif
> 
>   That needs to call into the futex code and check whether the list is
>   empty. If not empty, return. If empty set the pointer to NULL

True. Just because the head pointer has been registered does not mean that the list is non-empty. It can point back to itself when no locks are held as it's circular.

We need to clear those pointers regardless. After the exceve(), the address values are meaningless under the new mm context.

Ani

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 13:11                       ` Anirban Sinha
@ 2009-10-05 13:28                         ` Thomas Gleixner
  2009-10-05 14:03                           ` Anirban Sinha
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 13:28 UTC (permalink / raw)
  To: Anirban Sinha
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha

Can you please fix your mail client to do proper line breaks at around
78 chars ?

On Mon, 5 Oct 2009, Anirban Sinha wrote:
>
> We need to clear those pointers regardless. After the exceve(), the
> address values are meaningless under the new mm context.

That's out of question. We just need to come to a decision whether we
silently clean up callers dumbness or not.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 13:28                         ` Thomas Gleixner
@ 2009-10-05 14:03                           ` Anirban Sinha
  2009-10-05 18:36                             ` Anirban Sinha
  0 siblings, 1 reply; 26+ messages in thread
From: Anirban Sinha @ 2009-10-05 14:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Darren Hart,
	Kaz Kylheku, Anirban Sinha



Once upon a time, like on 09-10-05 6:28 AM, Thomas Gleixner wrote:
> Can you please fix your mail client to do proper line breaks at around
> 78 chars ?

Hmm. I made that change deliberately according to instructions in
Documentation/email-clients.txt so that it does not break my patches :) Anyway,
back to previous setting.

>>
>> We need to clear those pointers regardless. After the exceve(), the
>> address values are meaningless under the new mm context.
> 
> That's out of question. We just need to come to a decision whether we
> silently clean up callers dumbness or not.

Correct me if I am wrong, but according to Ingo:

> So i think exec() should release all existing state, unless told
> otherwise. Making it behave differently for robust futexes sounds
> assymetric to me.

I thought clearing the head pointers was a part of the stale existing state?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: futex question
  2009-10-05 12:24                       ` Ingo Molnar
@ 2009-10-05 14:09                         ` Darren Hart
  0 siblings, 0 replies; 26+ messages in thread
From: Darren Hart @ 2009-10-05 14:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Oleg Nesterov, Andrew Morton, Linus Torvalds,
	Thomas Gleixner, Anirban Sinha, linux-kernel, Kaz Kylheku,
	Anirban Sinha

Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Mon, 2009-10-05 at 13:59 +0200, Thomas Gleixner wrote:
>>
>>> Stared at the same place a minute ago :) But still I wonder if it's 
>>> a good idea to silently release locks and set the state to OWNERDEAD 
>>> instead of hitting the app programmer with a big clue stick in case 
>>> the app holds locks when calling execve().
>> Agreed, I rather like the feedback. With regular exit like things 
>> there's just not much we can do to avoid the mess, but here we can 
>> actually avoid it, seems a waste not to do so.
> 
> Well, exec() has been a 'exit() + boot-strap next process' kind of thing 
> from the get go - with little state carried over into the new task. This 
> has security and robustness reasons as well.
> 
> So i think exec() should release all existing state, unless told 
> otherwise. Making it behave differently for robust futexes sounds 
> assymetric to me.
> 
> It might make sense though - a 'prevent exec because you are holding 
> locks!' thing. Dunno.
> 
> Cc:-ed a few execve() semantics experts who might want to chime in.
> 
> If a (buggy) app calls execve() with a (robust) futex still held should 
> we auto-force-release robust locks held, or fail the exec with an error 
> code? I think the forced release is a 'anomalous exit' thing mostly, 
> while calling exec() is not anomalous at all.

My first thought earlier in the thread was that changing the exec 
behavior to fail if either a robust or pi futex is held would be liable 
to break existing applications. I can now see the argument that such 
apps are broken already, and if they aren't hanging, it's simply because 
they are hacking around it.

I think the semantics work for robust mutexes, if you exec, the exec'ing 
"thread" is effectively dead, so EOWNERDEAD makes sense.

This doesn't seem to work for PI futexes, unless they are also Robust of 
course.  Here I would expect a userspace application to hang.

The only locking related statements made in the SUS or our Linux man 
pages is in regards to named semaphores.  And here it is only said that 
the semaphore will be closed like a call to sem_close().  sem_close(3) 
doesn't specify a return value if the semaphore is held when called.

The closing of message queues and canceling of any pending asynchronous 
I/O might provide precedent for just unlocking held locks and moving on 
in the case of PI. EOWNERDEAD still makes more sense to me from a robust 
point of view.

And from the ignorant-fool department, the docs refer to replacing the 
"process image" on execve, doesn't that mean that if there are 20 
threads in a process and one of them calls execve that all 20 are 
destroyed?  If so, then we are only concerned with 
PTHREAD_PROCESS_SHARED futexes, since none of the private futexes will 
have any users after the execve.


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: futex question
  2009-10-05 10:36               ` Peter Zijlstra
  2009-10-05 10:56                 ` Thomas Gleixner
  2009-10-05 11:58                 ` Peter Zijlstra
@ 2009-10-05 18:11                 ` Anirban Sinha
  2 siblings, 0 replies; 26+ messages in thread
From: Anirban Sinha @ 2009-10-05 18:11 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart, Kaz Kylheku


>
>The problem with the patch send my Ani is that it clears the robust
>lists before the point of no return, so on a failing execve() we'd have
>messed up the state.

Ah! yes. I should have added the lines after load_binary() succeeds:

 fs/compat.c |    3 +++
 fs/exec.c   |    3 +++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/compat.c b/fs/compat.c
index 6d6f98f..7d1baf5 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1539,6 +1539,9 @@ int compat_do_execve(char * filename,
        if (retval < 0)
                goto out;

+#ifdef CONFIG_FUTEX
+       current->compat_robust_list = NULL;
+#endif
        /* execve succeeded */
        current->fs->in_exec = 0;
        current->in_execve = 0;
diff --git a/fs/exec.c b/fs/exec.c
index 172ceb6..d7b4ca3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1354,6 +1354,9 @@ int do_execve(char * filename,
        if (retval < 0)
                goto out;

+#ifdef CONFIG_FUTEX
+       current->robust_list = NULL;
+#endif
        /* execve succeeded */
        current->fs->in_exec = 0;
        current->in_execve = 0;



btw, by the same token, shouldn't we call sched_exec() after the exec()
actually succeeds and the process has a new mm (thus having a smallest
effective memory and cache footprint)? I know that I could be missing
something subtle.

Ani


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* RE: futex question
  2009-10-05 14:03                           ` Anirban Sinha
@ 2009-10-05 18:36                             ` Anirban Sinha
  0 siblings, 0 replies; 26+ messages in thread
From: Anirban Sinha @ 2009-10-05 18:36 UTC (permalink / raw)
  To: Anirban Sinha, Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Darren Hart, Kaz Kylheku

>I thought clearing the head pointers was a part of the stale existing
state?

.. and by head pointer I mean the current->robust-list pointer and it's
compat version.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2009-10-05 18:37 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-30  1:10 futex question Anirban Sinha
2009-10-01  9:22 ` Ingo Molnar
2009-10-01 16:54   ` Anirban Sinha
2009-10-01 23:46   ` Anirban Sinha
2009-10-02 23:38     ` Darren Hart
2009-10-03  0:36       ` Anirban Sinha
2009-10-03  4:14         ` Eric Dumazet
2009-10-04  8:44       ` Thomas Gleixner
     [not found]         ` <DDFD17CC94A9BD49A82147DDF7D545C501F457C5@exchange.ZeugmaSystems.local>
2009-10-04 16:37           ` Anirban Sinha
2009-10-04 16:59             ` Thomas Gleixner
2009-10-05 10:36               ` Peter Zijlstra
2009-10-05 10:56                 ` Thomas Gleixner
2009-10-05 11:16                   ` Peter Zijlstra
2009-10-05 11:19                     ` Ingo Molnar
2009-10-05 11:50                       ` Thomas Gleixner
2009-10-05 11:47                     ` Thomas Gleixner
2009-10-05 13:11                       ` Anirban Sinha
2009-10-05 13:28                         ` Thomas Gleixner
2009-10-05 14:03                           ` Anirban Sinha
2009-10-05 18:36                             ` Anirban Sinha
2009-10-05 11:58                 ` Peter Zijlstra
2009-10-05 11:59                   ` Thomas Gleixner
2009-10-05 12:18                     ` Peter Zijlstra
2009-10-05 12:24                       ` Ingo Molnar
2009-10-05 14:09                         ` Darren Hart
2009-10-05 18:11                 ` Anirban Sinha

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.