* Re: futex question
2009-10-05 10:36 ` Peter Zijlstra
@ 2009-10-05 10:56 ` Thomas Gleixner
2009-10-05 11:16 ` Peter Zijlstra
2009-10-05 11:58 ` Peter Zijlstra
2009-10-05 18:11 ` Anirban Sinha
2 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 10:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
Peter,
On Mon, 5 Oct 2009, Peter Zijlstra wrote:
> On Sun, 2009-10-04 at 18:59 +0200, Thomas Gleixner wrote:
>
> > > do. It does not feel right. Currently, with or without my change,
> > > such a thing would indefinitely block other waiters on the same
> > > futex.
> >
> > Right. Which completely defeats the purpose of the robust list. Will
> > have a look tomorrow.
>
> Right, so mm_release() which is meant to destroy the old mm context
> actually does exit_robust_list(), but the problem is that it does so on
> the new mm, not the old one that got passed down to mm_release().
>
> The other detail is that exit_robust_list() doesn't clear
> current->robust_list.
I know.
> The problem with the patch send my Ani is that it clears the robust
> lists before the point of no return, so on a failing execve() we'd have
> messed up the state.
Right. We need to do that at the latest possible point.
Looking more into that I think we should check whether the robust list
has an entry (lock held) in do_execve() and return -EWOULDBLOCK to
luser space. Same if pi_waiters is not empty. Holding a lock and
calling execve() is simply broken.
Thanks,
tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 10:56 ` Thomas Gleixner
@ 2009-10-05 11:16 ` Peter Zijlstra
2009-10-05 11:19 ` Ingo Molnar
2009-10-05 11:47 ` Thomas Gleixner
0 siblings, 2 replies; 26+ messages in thread
From: Peter Zijlstra @ 2009-10-05 11:16 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
On Mon, 2009-10-05 at 12:56 +0200, Thomas Gleixner wrote:
>
> Looking more into that I think we should check whether the robust list
> has an entry (lock held) in do_execve() and return -EWOULDBLOCK to
> luser space. Same if pi_waiters is not empty. Holding a lock and
> calling execve() is simply broken.
Fine by me ;-)
something like the below?
The question is of course what Ani was doing that triggered this in the
first place and if he can live with this.. :-)
---
fs/exec.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index d49be6b..0812ba6 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1295,6 +1295,22 @@ int do_execve(char * filename,
bool clear_in_exec;
int retval;
+ retval = -EWOULDBLOCK;
+#ifdef CONFIG_FUTEX
+ if (unlikely(current->robust_list))
+ goto out_ret;
+#ifdef CONFIG_COMPAT
+ if (unlikely(current->compat_robust_list))
+ goto out_ret;
+#endif
+ spin_lock_irq(¤t->pi_lock);
+ if (!list_empty(¤t->pi_state_list)) {
+ spin_unlock_irq(¤t->pi_lock);
+ goto out_ret;
+ }
+ spin_unlock_irq(¤t->pi_lock);
+#endif
+
retval = unshare_files(&displaced);
if (retval)
goto out_ret;
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 11:16 ` Peter Zijlstra
@ 2009-10-05 11:19 ` Ingo Molnar
2009-10-05 11:50 ` Thomas Gleixner
2009-10-05 11:47 ` Thomas Gleixner
1 sibling, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2009-10-05 11:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Thomas Gleixner, Anirban Sinha, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
* Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> diff --git a/fs/exec.c b/fs/exec.c
> index d49be6b..0812ba6 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1295,6 +1295,22 @@ int do_execve(char * filename,
> bool clear_in_exec;
> int retval;
>
> + retval = -EWOULDBLOCK;
> +#ifdef CONFIG_FUTEX
> + if (unlikely(current->robust_list))
> + goto out_ret;
> +#ifdef CONFIG_COMPAT
> + if (unlikely(current->compat_robust_list))
> + goto out_ret;
> +#endif
> + spin_lock_irq(¤t->pi_lock);
> + if (!list_empty(¤t->pi_state_list)) {
> + spin_unlock_irq(¤t->pi_lock);
> + goto out_ret;
> + }
> + spin_unlock_irq(¤t->pi_lock);
> +#endif
i suspect this should have the form of:
retval = can_exec_robust_futexes();
if (retval)
goto out_ret
retval = unshare_files(&displaced);
if (retval)
goto out_ret;
...
but ... shouldnt we just do what exec normally does and zap any state
that shouldnt be carried over into the new context - instead of denying
the exec? Am i missing something?
Ingo
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 11:19 ` Ingo Molnar
@ 2009-10-05 11:50 ` Thomas Gleixner
0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 11:50 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Anirban Sinha, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
On Mon, 5 Oct 2009, Ingo Molnar wrote:
> * Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > diff --git a/fs/exec.c b/fs/exec.c
> > index d49be6b..0812ba6 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1295,6 +1295,22 @@ int do_execve(char * filename,
> > bool clear_in_exec;
> > int retval;
> >
> > + retval = -EWOULDBLOCK;
> > +#ifdef CONFIG_FUTEX
> > + if (unlikely(current->robust_list))
> > + goto out_ret;
> > +#ifdef CONFIG_COMPAT
> > + if (unlikely(current->compat_robust_list))
> > + goto out_ret;
> > +#endif
> > + spin_lock_irq(¤t->pi_lock);
> > + if (!list_empty(¤t->pi_state_list)) {
> > + spin_unlock_irq(¤t->pi_lock);
> > + goto out_ret;
> > + }
> > + spin_unlock_irq(¤t->pi_lock);
> > +#endif
>
> i suspect this should have the form of:
>
> retval = can_exec_robust_futexes();
> if (retval)
> goto out_ret
Yes.
> retval = unshare_files(&displaced);
> if (retval)
> goto out_ret;
>
> ...
>
> but ... shouldnt we just do what exec normally does and zap any state
> that shouldnt be carried over into the new context - instead of denying
> the exec? Am i missing something?
We want to check whether the robust list is empty. If it's not empty
then we deny the exec instead of silently releasing the futexes or
just ignoring the robust list entirely. Same applies for the pi
waiters list.
Thanks,
tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 11:16 ` Peter Zijlstra
2009-10-05 11:19 ` Ingo Molnar
@ 2009-10-05 11:47 ` Thomas Gleixner
2009-10-05 13:11 ` Anirban Sinha
1 sibling, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 11:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
On Mon, 5 Oct 2009, Peter Zijlstra wrote:
> On Mon, 2009-10-05 at 12:56 +0200, Thomas Gleixner wrote:
> >
> > Looking more into that I think we should check whether the robust list
> > has an entry (lock held) in do_execve() and return -EWOULDBLOCK to
> > luser space. Same if pi_waiters is not empty. Holding a lock and
> > calling execve() is simply broken.
>
> Fine by me ;-)
>
> something like the below?
>
> The question is of course what Ani was doing that triggered this in the
> first place and if he can live with this.. :-)
>
> ---
> fs/exec.c | 16 ++++++++++++++++
> 1 files changed, 16 insertions(+), 0 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index d49be6b..0812ba6 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1295,6 +1295,22 @@ int do_execve(char * filename,
> bool clear_in_exec;
> int retval;
>
> + retval = -EWOULDBLOCK;
> +#ifdef CONFIG_FUTEX
> + if (unlikely(current->robust_list))
> + goto out_ret;
> +#ifdef CONFIG_COMPAT
> + if (unlikely(current->compat_robust_list))
> + goto out_ret;
> +#endif
That needs to call into the futex code and check whether the list is
empty. If not empty, return. If empty set the pointer to NULL
Thanks,
tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 11:47 ` Thomas Gleixner
@ 2009-10-05 13:11 ` Anirban Sinha
2009-10-05 13:28 ` Thomas Gleixner
0 siblings, 1 reply; 26+ messages in thread
From: Anirban Sinha @ 2009-10-05 13:11 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
Hi all:
catching up with the mails now. still pretty early here in the west coast.
Once upon a time, like on 09-10-05 4:47 AM, Thomas Gleixner wrote:
> On Mon, 5 Oct 2009, Peter Zijlstra wrote:
>> On Mon, 2009-10-05 at 12:56 +0200, Thomas Gleixner wrote:
>>>
>>> Looking more into that I think we should check whether the robust list
>>> has an entry (lock held) in do_execve() and return -EWOULDBLOCK to
>>> luser space. Same if pi_waiters is not empty. Holding a lock and
>>> calling execve() is simply broken.
>>
ha ha! Now you say it's broken :) Indeed, I also thought so. However, I wonder if this behavior change could potentially break some user space application?
>> Fine by me ;-)
>>
>> something like the below?
>>
>> The question is of course what Ani was doing that triggered this in the
>> first place and if he can live with this.. :-)
>>
>> ---
>> fs/exec.c | 16 ++++++++++++++++
>> 1 files changed, 16 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/exec.c b/fs/exec.c
>> index d49be6b..0812ba6 100644
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1295,6 +1295,22 @@ int do_execve(char * filename,
>> bool clear_in_exec;
>> int retval;
>>
>> + retval = -EWOULDBLOCK;
>> +#ifdef CONFIG_FUTEX
>> + if (unlikely(current->robust_list))
>> + goto out_ret;
>> +#ifdef CONFIG_COMPAT
>> + if (unlikely(current->compat_robust_list))
>> + goto out_ret;
>> +#endif
>
> That needs to call into the futex code and check whether the list is
> empty. If not empty, return. If empty set the pointer to NULL
True. Just because the head pointer has been registered does not mean that the list is non-empty. It can point back to itself when no locks are held as it's circular.
We need to clear those pointers regardless. After the exceve(), the address values are meaningless under the new mm context.
Ani
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 13:11 ` Anirban Sinha
@ 2009-10-05 13:28 ` Thomas Gleixner
2009-10-05 14:03 ` Anirban Sinha
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 13:28 UTC (permalink / raw)
To: Anirban Sinha
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
Can you please fix your mail client to do proper line breaks at around
78 chars ?
On Mon, 5 Oct 2009, Anirban Sinha wrote:
>
> We need to clear those pointers regardless. After the exceve(), the
> address values are meaningless under the new mm context.
That's out of question. We just need to come to a decision whether we
silently clean up callers dumbness or not.
Thanks,
tglx
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 13:28 ` Thomas Gleixner
@ 2009-10-05 14:03 ` Anirban Sinha
2009-10-05 18:36 ` Anirban Sinha
0 siblings, 1 reply; 26+ messages in thread
From: Anirban Sinha @ 2009-10-05 14:03 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
Once upon a time, like on 09-10-05 6:28 AM, Thomas Gleixner wrote:
> Can you please fix your mail client to do proper line breaks at around
> 78 chars ?
Hmm. I made that change deliberately according to instructions in
Documentation/email-clients.txt so that it does not break my patches :) Anyway,
back to previous setting.
>>
>> We need to clear those pointers regardless. After the exceve(), the
>> address values are meaningless under the new mm context.
>
> That's out of question. We just need to come to a decision whether we
> silently clean up callers dumbness or not.
Correct me if I am wrong, but according to Ingo:
> So i think exec() should release all existing state, unless told
> otherwise. Making it behave differently for robust futexes sounds
> assymetric to me.
I thought clearing the head pointers was a part of the stale existing state?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 10:36 ` Peter Zijlstra
2009-10-05 10:56 ` Thomas Gleixner
@ 2009-10-05 11:58 ` Peter Zijlstra
2009-10-05 11:59 ` Thomas Gleixner
2009-10-05 18:11 ` Anirban Sinha
2 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2009-10-05 11:58 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
On Mon, 2009-10-05 at 12:36 +0200, Peter Zijlstra wrote:
> On Sun, 2009-10-04 at 18:59 +0200, Thomas Gleixner wrote:
>
> > > do. It does not feel right. Currently, with or without my change,
> > > such a thing would indefinitely block other waiters on the same
> > > futex.
> >
> > Right. Which completely defeats the purpose of the robust list. Will
> > have a look tomorrow.
>
> Right, so mm_release() which is meant to destroy the old mm context
> actually does exit_robust_list(), but the problem is that it does so on
> the new mm, not the old one that got passed down to mm_release().
>
> The other detail is that exit_robust_list() doesn't clear
> current->robust_list.
>
> The problem with the patch send my Ani is that it clears the robust
> lists before the point of no return, so on a failing execve() we'd have
> messed up the state.
>
> Making exit_robust_list() deal with an mm that is not the current mm is
> interesting indeed.
Hmm...
static int exec_mmap(struct mm_struct *mm)
{
struct task_struct *tsk;
struct mm_struct * old_mm, *active_mm;
/* Notify parent that we're no longer interested in the old VM */
tsk = current;
old_mm = current->mm;
mm_release(tsk, old_mm);
if (old_mm) {
/*
* Make sure that if there is a core dump in progress
* for the old mm, we get out and die instead of going
* through with the exec. We must hold mmap_sem around
* checking core_state and changing tsk->mm.
*/
down_read(&old_mm->mmap_sem);
if (unlikely(old_mm->core_state)) {
up_read(&old_mm->mmap_sem);
return -EINTR;
}
}
task_lock(tsk);
active_mm = tsk->active_mm;
tsk->mm = mm;
tsk->active_mm = mm;
activate_mm(active_mm, mm);
task_unlock(tsk);
arch_pick_mmap_layout(mm);
if (old_mm) {
up_read(&old_mm->mmap_sem);
BUG_ON(active_mm != old_mm);
mm_update_next_owner(old_mm);
mmput(old_mm);
return 0;
}
mmdrop(active_mm);
return 0;
}
Actually calls mm_release() before the flip, so the below might actually
be sufficient?
---
kernel/fork.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 266c6af..4c20fff 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -570,12 +570,18 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
/* Get rid of any futexes when releasing the mm */
#ifdef CONFIG_FUTEX
- if (unlikely(tsk->robust_list))
+ if (unlikely(tsk->robust_list)) {
exit_robust_list(tsk);
+ tsk->robust_list = NULL;
+ }
#ifdef CONFIG_COMPAT
- if (unlikely(tsk->compat_robust_list))
+ if (unlikely(tsk->compat_robust_list)) {
compat_exit_robust_list(tsk);
+ tsk->compat_robust_list = NULL;
+ }
#endif
+ if (unlikely(!list_empty(&tsk->pi_state_list)))
+ exit_pi_state_list(tsk);
#endif
/* Get rid of any cached register state */
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 11:58 ` Peter Zijlstra
@ 2009-10-05 11:59 ` Thomas Gleixner
2009-10-05 12:18 ` Peter Zijlstra
0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2009-10-05 11:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
On Mon, 5 Oct 2009, Peter Zijlstra wrote:
> On Mon, 2009-10-05 at 12:36 +0200, Peter Zijlstra wrote:
> > On Sun, 2009-10-04 at 18:59 +0200, Thomas Gleixner wrote:
> >
> > > > do. It does not feel right. Currently, with or without my change,
> > > > such a thing would indefinitely block other waiters on the same
> > > > futex.
> > >
> > > Right. Which completely defeats the purpose of the robust list. Will
> > > have a look tomorrow.
> >
> > Right, so mm_release() which is meant to destroy the old mm context
> > actually does exit_robust_list(), but the problem is that it does so on
> > the new mm, not the old one that got passed down to mm_release().
> >
> > The other detail is that exit_robust_list() doesn't clear
> > current->robust_list.
> >
> > The problem with the patch send my Ani is that it clears the robust
> > lists before the point of no return, so on a failing execve() we'd have
> > messed up the state.
> >
> > Making exit_robust_list() deal with an mm that is not the current mm is
> > interesting indeed.
>
> Hmm...
>
> static int exec_mmap(struct mm_struct *mm)
> {
> struct task_struct *tsk;
> struct mm_struct * old_mm, *active_mm;
>
> /* Notify parent that we're no longer interested in the old VM */
> tsk = current;
> old_mm = current->mm;
> mm_release(tsk, old_mm);
>
> if (old_mm) {
> /*
> * Make sure that if there is a core dump in progress
> * for the old mm, we get out and die instead of going
> * through with the exec. We must hold mmap_sem around
> * checking core_state and changing tsk->mm.
> */
> down_read(&old_mm->mmap_sem);
> if (unlikely(old_mm->core_state)) {
> up_read(&old_mm->mmap_sem);
> return -EINTR;
> }
> }
> task_lock(tsk);
> active_mm = tsk->active_mm;
> tsk->mm = mm;
> tsk->active_mm = mm;
> activate_mm(active_mm, mm);
> task_unlock(tsk);
> arch_pick_mmap_layout(mm);
> if (old_mm) {
> up_read(&old_mm->mmap_sem);
> BUG_ON(active_mm != old_mm);
> mm_update_next_owner(old_mm);
> mmput(old_mm);
> return 0;
> }
> mmdrop(active_mm);
> return 0;
> }
>
> Actually calls mm_release() before the flip, so the below might actually
> be sufficient?
Stared at the same place a minute ago :) But still I wonder if it's a
good idea to silently release locks and set the state to OWNERDEAD
instead of hitting the app programmer with a big clue stick in case
the app holds locks when calling execve().
Thanks,
tglx
> ---
> kernel/fork.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 266c6af..4c20fff 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -570,12 +570,18 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm)
>
> /* Get rid of any futexes when releasing the mm */
> #ifdef CONFIG_FUTEX
> - if (unlikely(tsk->robust_list))
> + if (unlikely(tsk->robust_list)) {
> exit_robust_list(tsk);
> + tsk->robust_list = NULL;
> + }
> #ifdef CONFIG_COMPAT
> - if (unlikely(tsk->compat_robust_list))
> + if (unlikely(tsk->compat_robust_list)) {
> compat_exit_robust_list(tsk);
> + tsk->compat_robust_list = NULL;
> + }
> #endif
> + if (unlikely(!list_empty(&tsk->pi_state_list)))
> + exit_pi_state_list(tsk);
> #endif
>
> /* Get rid of any cached register state */
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 11:59 ` Thomas Gleixner
@ 2009-10-05 12:18 ` Peter Zijlstra
2009-10-05 12:24 ` Ingo Molnar
0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2009-10-05 12:18 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
On Mon, 2009-10-05 at 13:59 +0200, Thomas Gleixner wrote:
> Stared at the same place a minute ago :) But still I wonder if it's a
> good idea to silently release locks and set the state to OWNERDEAD
> instead of hitting the app programmer with a big clue stick in case
> the app holds locks when calling execve().
Agreed, I rather like the feedback. With regular exit like things
there's just not much we can do to avoid the mess, but here we can
actually avoid it, seems a waste not to do so.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 12:18 ` Peter Zijlstra
@ 2009-10-05 12:24 ` Ingo Molnar
2009-10-05 14:09 ` Darren Hart
0 siblings, 1 reply; 26+ messages in thread
From: Ingo Molnar @ 2009-10-05 12:24 UTC (permalink / raw)
To: Peter Zijlstra, Oleg Nesterov, Andrew Morton, Linus Torvalds
Cc: Thomas Gleixner, Anirban Sinha, linux-kernel, Darren Hart,
Kaz Kylheku, Anirban Sinha
* Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, 2009-10-05 at 13:59 +0200, Thomas Gleixner wrote:
>
> > Stared at the same place a minute ago :) But still I wonder if it's
> > a good idea to silently release locks and set the state to OWNERDEAD
> > instead of hitting the app programmer with a big clue stick in case
> > the app holds locks when calling execve().
>
> Agreed, I rather like the feedback. With regular exit like things
> there's just not much we can do to avoid the mess, but here we can
> actually avoid it, seems a waste not to do so.
Well, exec() has been a 'exit() + boot-strap next process' kind of thing
from the get go - with little state carried over into the new task. This
has security and robustness reasons as well.
So i think exec() should release all existing state, unless told
otherwise. Making it behave differently for robust futexes sounds
assymetric to me.
It might make sense though - a 'prevent exec because you are holding
locks!' thing. Dunno.
Cc:-ed a few execve() semantics experts who might want to chime in.
If a (buggy) app calls execve() with a (robust) futex still held should
we auto-force-release robust locks held, or fail the exec with an error
code? I think the forced release is a 'anomalous exit' thing mostly,
while calling exec() is not anomalous at all.
Ingo
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: futex question
2009-10-05 12:24 ` Ingo Molnar
@ 2009-10-05 14:09 ` Darren Hart
0 siblings, 0 replies; 26+ messages in thread
From: Darren Hart @ 2009-10-05 14:09 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Oleg Nesterov, Andrew Morton, Linus Torvalds,
Thomas Gleixner, Anirban Sinha, linux-kernel, Kaz Kylheku,
Anirban Sinha
Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Mon, 2009-10-05 at 13:59 +0200, Thomas Gleixner wrote:
>>
>>> Stared at the same place a minute ago :) But still I wonder if it's
>>> a good idea to silently release locks and set the state to OWNERDEAD
>>> instead of hitting the app programmer with a big clue stick in case
>>> the app holds locks when calling execve().
>> Agreed, I rather like the feedback. With regular exit like things
>> there's just not much we can do to avoid the mess, but here we can
>> actually avoid it, seems a waste not to do so.
>
> Well, exec() has been a 'exit() + boot-strap next process' kind of thing
> from the get go - with little state carried over into the new task. This
> has security and robustness reasons as well.
>
> So i think exec() should release all existing state, unless told
> otherwise. Making it behave differently for robust futexes sounds
> assymetric to me.
>
> It might make sense though - a 'prevent exec because you are holding
> locks!' thing. Dunno.
>
> Cc:-ed a few execve() semantics experts who might want to chime in.
>
> If a (buggy) app calls execve() with a (robust) futex still held should
> we auto-force-release robust locks held, or fail the exec with an error
> code? I think the forced release is a 'anomalous exit' thing mostly,
> while calling exec() is not anomalous at all.
My first thought earlier in the thread was that changing the exec
behavior to fail if either a robust or pi futex is held would be liable
to break existing applications. I can now see the argument that such
apps are broken already, and if they aren't hanging, it's simply because
they are hacking around it.
I think the semantics work for robust mutexes, if you exec, the exec'ing
"thread" is effectively dead, so EOWNERDEAD makes sense.
This doesn't seem to work for PI futexes, unless they are also Robust of
course. Here I would expect a userspace application to hang.
The only locking related statements made in the SUS or our Linux man
pages is in regards to named semaphores. And here it is only said that
the semaphore will be closed like a call to sem_close(). sem_close(3)
doesn't specify a return value if the semaphore is held when called.
The closing of message queues and canceling of any pending asynchronous
I/O might provide precedent for just unlocking held locks and moving on
in the case of PI. EOWNERDEAD still makes more sense to me from a robust
point of view.
And from the ignorant-fool department, the docs refer to replacing the
"process image" on execve, doesn't that mean that if there are 20
threads in a process and one of them calls execve that all 20 are
destroyed? If so, then we are only concerned with
PTHREAD_PROCESS_SHARED futexes, since none of the private futexes will
have any users after the execve.
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
^ permalink raw reply [flat|nested] 26+ messages in thread
* RE: futex question
2009-10-05 10:36 ` Peter Zijlstra
2009-10-05 10:56 ` Thomas Gleixner
2009-10-05 11:58 ` Peter Zijlstra
@ 2009-10-05 18:11 ` Anirban Sinha
2 siblings, 0 replies; 26+ messages in thread
From: Anirban Sinha @ 2009-10-05 18:11 UTC (permalink / raw)
To: Peter Zijlstra, Thomas Gleixner
Cc: Anirban Sinha, Ingo Molnar, linux-kernel, Darren Hart, Kaz Kylheku
>
>The problem with the patch send my Ani is that it clears the robust
>lists before the point of no return, so on a failing execve() we'd have
>messed up the state.
Ah! yes. I should have added the lines after load_binary() succeeds:
fs/compat.c | 3 +++
fs/exec.c | 3 +++
2 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/fs/compat.c b/fs/compat.c
index 6d6f98f..7d1baf5 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1539,6 +1539,9 @@ int compat_do_execve(char * filename,
if (retval < 0)
goto out;
+#ifdef CONFIG_FUTEX
+ current->compat_robust_list = NULL;
+#endif
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
diff --git a/fs/exec.c b/fs/exec.c
index 172ceb6..d7b4ca3 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1354,6 +1354,9 @@ int do_execve(char * filename,
if (retval < 0)
goto out;
+#ifdef CONFIG_FUTEX
+ current->robust_list = NULL;
+#endif
/* execve succeeded */
current->fs->in_exec = 0;
current->in_execve = 0;
btw, by the same token, shouldn't we call sched_exec() after the exec()
actually succeeds and the process has a new mm (thus having a smallest
effective memory and cache footprint)? I know that I could be missing
something subtle.
Ani
^ permalink raw reply related [flat|nested] 26+ messages in thread