* [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump
@ 2012-05-07 19:07 Suresh Siddha
  2012-05-07 19:07 ` [PATCH 2/2] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
  2012-05-07 19:15 ` [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump Linus Torvalds
  0 siblings, 2 replies; 30+ messages in thread
From: Suresh Siddha @ 2012-05-07 19:07 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg; +Cc: suresh, linux-kernel, Suresh Siddha

Nalluru reported hitting the BUG_ON(__thread_has_fpu(tsk)) in
arch/x86/kernel/xsave.c:__sanitize_i387_state() during the coredump
of a multi-threaded application.

A look at the exit sequence shows that other threads can potentially still be
on the runqueue at the exit_mm() code snippet shown below:

		if (atomic_dec_and_test(&core_state->nr_threads))
			complete(&core_state->startup);

===> other threads can still be active here, but we notify the thread
===> dumping core to wake up from coredump_wait() after the last thread
===> joins this point. The core-dumping thread will then continue dumping
===> all the threads' state to the core file.

		for (;;) {
			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
			if (!self.task) /* see coredump_finish() */
				break;
			schedule();
		}

As some of those threads are still on the runqueue and haven't called
schedule() yet, their fpu state is still active in the live registers, and the
thread proceeding with the coredump will hit the above-mentioned BUG_ON while
trying to dump the other threads' fpu state to the coredump file.

The BUG_ON() in arch/x86/kernel/xsave.c:__sanitize_i387_state() is
in the code paths for processors supporting xsaveopt. But with or without
xsaveopt, the multi-threaded coredump is broken and may not contain
the correct fpu state at the time of exit.
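
For reference, here is a sketch of where that check sits, based on the hunk
removed by patch 2/2 of this series (surrounding declarations elided):

	void __sanitize_i387_state(struct task_struct *tsk)
	{
		...
		if (!fx)
			return;

		/* expects tsk's fpu state to have been flushed to memory */
		BUG_ON(__thread_has_fpu(tsk));

		xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;
		...
	}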

Fix this by explicitly flushing the fpu state in do_exit() by calling
prepare_to_copy().

Reported-by: Suresh Nalluru <suresh@aristanetworks.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 kernel/exit.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..913f2a6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -974,6 +974,8 @@ void do_exit(long code)
 		tty_audit_exit();
 	audit_free(tsk);
 
+	prepare_to_copy(tsk);
+
 	tsk->exit_code = code;
 	taskstats_exit(tsk, group_dead);
 
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  2012-05-07 19:07 [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump Suresh Siddha
@ 2012-05-07 19:07 ` Suresh Siddha
  2012-05-07 19:15 ` [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump Linus Torvalds
  1 sibling, 0 replies; 30+ messages in thread
From: Suresh Siddha @ 2012-05-07 19:07 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg; +Cc: suresh, linux-kernel, Suresh Siddha

Code paths like fork(), exit() and signal handling flush the fpu
state explicitly to the structures in memory.

The BUG_ON() in __sanitize_i387_state() is checking that the fpu state
is not live any more. But for preempt kernels, a task can be scheduled
out and back in at any point, and the preload_fpu logic during context
switch can make the fpu registers live again.

Similarly, during a core dump, the thread dumping the core can schedule out
and in for page allocations etc. even in the non-preempt case.

So remove the paranoid check, even though it caught a bug in the
multi-threaded core dump case (fixed in the previous patch).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/kernel/xsave.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index e62728e..bd18149 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -48,8 +48,6 @@ void __sanitize_i387_state(struct task_struct *tsk)
 	if (!fx)
 		return;
 
-	BUG_ON(__thread_has_fpu(tsk));
-
 	xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;
 
 	/*
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump
  2012-05-07 19:07 [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump Suresh Siddha
  2012-05-07 19:07 ` [PATCH 2/2] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
@ 2012-05-07 19:15 ` Linus Torvalds
  2012-05-07 20:09   ` Suresh Siddha
  1 sibling, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2012-05-07 19:15 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: hpa, mingo, oleg, suresh, linux-kernel

On Mon, May 7, 2012 at 12:07 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
>
> Fix this by explicitly flushing the fpu state in do_exit() by calling
> prepare_to_copy()

Ugh, I hate this one.

We're making the exit path more expensive for almost no gain. The FPU
state is dead in 99.9% of all cases.

Why isn't this a core-dump-only case?

                    Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump
  2012-05-07 19:15 ` [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump Linus Torvalds
@ 2012-05-07 20:09   ` Suresh Siddha
  2012-05-08 23:18     ` Suresh Siddha
  0 siblings, 1 reply; 30+ messages in thread
From: Suresh Siddha @ 2012-05-07 20:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: hpa, mingo, oleg, suresh, linux-kernel

On Mon, 2012-05-07 at 12:15 -0700, Linus Torvalds wrote:
> On Mon, May 7, 2012 at 12:07 PM, Suresh Siddha
> <suresh.b.siddha@intel.com> wrote:
> >
> > Fix this by explicitly flushing the fpu state in do_exit() by calling
> > prepare_to_copy()
> 
> Ugh, I hate this one.
> 
> We're making the exit path more expensive for almost no gain. The FPU
> state is dead in 99.9% of all cases.
> 
> Why isn't this a core-dump-only case?
> 

Today we do this unlazy_fpu() (which is what prepare_to_copy does)
already as part of the first schedule() in the exit path. I am just
making it explicit by calling prepare_to_copy() before exit_mm() which
synchronizes all the threads before the first thread goes with the core
dump.

I don't think I am adding any more cost to the existing path, unless I
am missing something.

thanks,
suresh


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump
  2012-05-07 20:09   ` Suresh Siddha
@ 2012-05-08 23:18     ` Suresh Siddha
  2012-05-08 23:18       ` [PATCH 1/3] " Suresh Siddha
                         ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Suresh Siddha @ 2012-05-08 23:18 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg; +Cc: Suresh Siddha, linux-kernel, suresh

On Mon, 2012-05-07 at 13:09 -0700, Suresh Siddha wrote:
> On Mon, 2012-05-07 at 12:15 -0700, Linus Torvalds wrote:
> > Ugh, I hate this one.
> > 
> > We're making the exit path more expensive for almost no gain. The FPU
> > state is dead in 99.9% of all cases.
> > 
> > Why isn't this a core-dump-only case?
> > 
> 
> Today we do this unlazy_fpu() (which is what prepare_to_copy does)
> already as part of the first schedule() in the exit path. I am just
> making it explicit by calling prepare_to_copy() before exit_mm() which
> synchronizes all the threads before the first thread goes with the core
> dump.
> 
> I don't think I am adding any more cost to the existing path, unless I
> am missing something.

My quick test confirmed my understanding. For example, mainline does
fpu_save_init() some 1770 times on tasks in the TASK_DEAD state during
the boot of a two-socket system.

So based on your suggestion, I made the first patch a core-dump-only case and
added a third patch which clears the fpu state during thread exit.

thanks.

Suresh Siddha (3):
  coredump: flush the fpu exit state for proper multi-threaded core
    dump
  x86, xsave: remove thread_has_fpu() bug check in
    __sanitize_i387_state()
  x86, fpu: clear the fpu state during thread exit

 arch/x86/kernel/process.c |   19 +++++++++++++------
 arch/x86/kernel/xsave.c   |    2 --
 kernel/exit.c             |    5 +++++
 3 files changed, 18 insertions(+), 8 deletions(-)

-- 
1.7.6.5


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/3] coredump: flush the fpu exit state for proper multi-threaded core dump
  2012-05-08 23:18     ` Suresh Siddha
@ 2012-05-08 23:18       ` Suresh Siddha
  2012-05-09 21:05         ` Oleg Nesterov
  2012-05-08 23:18       ` [PATCH 2/3] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
  2012-05-08 23:18       ` [PATCH 3/3] x86, fpu: clear the fpu state during thread exit Suresh Siddha
  2 siblings, 1 reply; 30+ messages in thread
From: Suresh Siddha @ 2012-05-08 23:18 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg; +Cc: Suresh Siddha, linux-kernel, suresh

Nalluru reported hitting the BUG_ON(__thread_has_fpu(tsk)) in
arch/x86/kernel/xsave.c:__sanitize_i387_state() during the coredump
of a multi-threaded application.

A look at the exit sequence shows that other threads can potentially still be
on the runqueue at the exit_mm() code snippet shown below:

		if (atomic_dec_and_test(&core_state->nr_threads))
			complete(&core_state->startup);

===> other threads can still be active here, but we notify the thread
===> dumping core to wake up from coredump_wait() after the last thread
===> joins this point. The core-dumping thread will then continue dumping
===> all the threads' state to the core file.

		for (;;) {
			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
			if (!self.task) /* see coredump_finish() */
				break;
			schedule();
		}

As some of those threads are still on the runqueue and haven't called
schedule() yet, their fpu state is still active in the live registers, and the
thread proceeding with the coredump will hit the above-mentioned BUG_ON while
trying to dump the other threads' fpu state to the coredump file.

The BUG_ON() in arch/x86/kernel/xsave.c:__sanitize_i387_state() is
in the code paths for processors supporting xsaveopt. But with or without
xsaveopt, the multi-threaded coredump is broken and may not contain
the correct fpu state at the time of exit.

If the coredump is in progress, explicitly flush the fpu state by calling
prepare_to_copy().

Reported-by: Suresh Nalluru <suresh@aristanetworks.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 kernel/exit.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b42..dc90d63 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -656,6 +656,11 @@ static void exit_mm(struct task_struct * tsk)
 		struct core_thread self;
 		up_read(&mm->mmap_sem);
 
+		/*
+		 * Flush the live extended register state to memory.
+		 */
+		prepare_to_copy(tsk);
+
 		self.task = tsk;
 		self.next = xchg(&core_state->dumper.next, &self);
 		/*
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/3] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  2012-05-08 23:18     ` Suresh Siddha
  2012-05-08 23:18       ` [PATCH 1/3] " Suresh Siddha
@ 2012-05-08 23:18       ` Suresh Siddha
  2012-05-09 20:30         ` Oleg Nesterov
  2012-05-08 23:18       ` [PATCH 3/3] x86, fpu: clear the fpu state during thread exit Suresh Siddha
  2 siblings, 1 reply; 30+ messages in thread
From: Suresh Siddha @ 2012-05-08 23:18 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg; +Cc: Suresh Siddha, linux-kernel, suresh

Code paths like fork(), exit() and signal handling flush the fpu
state explicitly to the structures in memory.

The BUG_ON() in __sanitize_i387_state() is checking that the fpu state
is not live any more. But for preempt kernels, a task can be scheduled
out and back in at any point, and the preload_fpu logic during context
switch can make the fpu registers live again.

Similarly, during a core dump, the thread dumping the core can schedule out
and in for page allocations etc. even in the non-preempt case.

So remove the paranoid check, even though it caught a bug in the
multi-threaded core dump case (fixed in the previous patch).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/kernel/xsave.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index e62728e..bd18149 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -48,8 +48,6 @@ void __sanitize_i387_state(struct task_struct *tsk)
 	if (!fx)
 		return;
 
-	BUG_ON(__thread_has_fpu(tsk));
-
 	xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;
 
 	/*
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/3] x86, fpu: clear the fpu state during thread exit
  2012-05-08 23:18     ` Suresh Siddha
  2012-05-08 23:18       ` [PATCH 1/3] " Suresh Siddha
  2012-05-08 23:18       ` [PATCH 2/3] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
@ 2012-05-08 23:18       ` Suresh Siddha
  2 siblings, 0 replies; 30+ messages in thread
From: Suresh Siddha @ 2012-05-08 23:18 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg; +Cc: Suresh Siddha, linux-kernel, suresh

There is no need to save any active fpu state to the task structure
memory if the task is dead. Just clear the state instead.

For example, this saved some 1770 xsaves during the system boot
of a two-socket Xeon system.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/kernel/process.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1d92a5a..1c7566e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -81,6 +81,16 @@ void arch_task_cache_init(void)
 				  SLAB_PANIC | SLAB_NOTRACK, NULL);
 }
 
+static inline void flush_fpu(struct task_struct *tsk)
+{
+	/*
+	 * Forget coprocessor state..
+	 */
+	tsk->fpu_counter = 0;
+	clear_fpu(tsk);
+	clear_used_math();
+}
+
 /*
  * Free current thread data structures etc..
  */
@@ -103,6 +113,8 @@ void exit_thread(void)
 		put_cpu();
 		kfree(bp);
 	}
+
+	flush_fpu(me);
 }
 
 void show_regs(struct pt_regs *regs)
@@ -143,12 +155,7 @@ void flush_thread(void)
 
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
-	/*
-	 * Forget coprocessor state..
-	 */
-	tsk->fpu_counter = 0;
-	clear_fpu(tsk);
-	clear_used_math();
+	flush_fpu(tsk);
 }
 
 static void hard_disable_TSC(void)
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  2012-05-08 23:18       ` [PATCH 2/3] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
@ 2012-05-09 20:30         ` Oleg Nesterov
  2012-05-09 21:18           ` Suresh Siddha
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2012-05-09 20:30 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

Hi Suresh,

I can't really comment on this series; my understanding of this code
is too limited.

But could you explain this patch? I am just curious.

On 05/08, Suresh Siddha wrote:
>
> BUG_ON() in __sanitize_i387_state() is checking that the fpu state
> is not live any more. But for preempt kernels, task can be scheduled
> out and in at any place and the preload_fpu logic during context switch
> can make the fpu registers live again.

And? Do you see any particular scenario when this BUG_ON() is wrong?

Afaics, __sanitize_i387_state() should not be called if the task can
be scheduled in with ->fpu_counter != 0.

> Similarly during core dump, thread dumping the core can schedule out
> and in for page-allocations etc in non-preempt case.

Again, can't understand. The core-dumping thread does init_fpu()
before it calls sanitize_i387_state().

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] coredump: flush the fpu exit state for proper multi-threaded core dump
  2012-05-08 23:18       ` [PATCH 1/3] " Suresh Siddha
@ 2012-05-09 21:05         ` Oleg Nesterov
  2012-05-09 21:32           ` Suresh Siddha
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2012-05-09 21:05 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

On 05/08, Suresh Siddha wrote:
>
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -656,6 +656,11 @@ static void exit_mm(struct task_struct * tsk)
>  		struct core_thread self;
>  		up_read(&mm->mmap_sem);
>
> +		/*
> +		 * Flush the live extended register state to memory.
> +		 */
> +		prepare_to_copy(tsk);

This doesn't look very nice imho, but I guess you understand this...

Perhaps we need an arch-dependent helper which saves the FPU regs
if needed.

I could easily be wrong, but I did a quick grep and I am not sure
we can rely on prepare_to_copy(). For example, it is a nop in
arch/sh/include/asm/processor_64.h, but at the same time that arch
has save_fpu().

OTOH, I am not sure it is safe to use prepare_to_copy() in exit_mm(),
at least in theory. God knows what it can do...

But again, I do not think I can comment on this change. Perhaps this
is the right step anyway.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  2012-05-09 20:30         ` Oleg Nesterov
@ 2012-05-09 21:18           ` Suresh Siddha
  2012-05-10 16:36             ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Suresh Siddha @ 2012-05-09 21:18 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

On Wed, 2012-05-09 at 22:30 +0200, Oleg Nesterov wrote:
> On 05/08, Suresh Siddha wrote:
> >
> > BUG_ON() in __sanitize_i387_state() is checking that the fpu state
> > is not live any more. But for preempt kernels, task can be scheduled
> > out and in at any place and the preload_fpu logic during context switch
> > can make the fpu registers live again.
> 
> And? Do you see any particular scenario when this BUG_ON() is wrong?
> 
> Afaics, __sanitize_i387_state() should not be called if the task can
> be scheduled in with ->fpu_counter != 0.

It is not easy to hit; that is why we haven't seen any issues for so long.
I can give an example with a 64-bit kernel with preempt enabled.

Consider task-A, which uses the fpu frequently, so its fpu_counter is
mostly non-zero. During its time slice, the kernel uses the fpu by doing
kernel_fpu_begin()/kernel_fpu_end(). After this, in the same scheduling
slice, task-A gets a signal to handle. Then during the signal setup path
we get preempted just before the sanitize_i387_state() call in
arch/x86/kernel/xsave.c:save_i387_xstate(). And when we come back, the
fpu registers are live again and can hit the BUG_ON.
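
To spell out that sequence as a sketch (not actual code, just the flow
described above):

	/*
	 * CONFIG_PREEMPT, 64-bit kernel, task-A with fpu_counter > 0:
	 *
	 *   kernel_fpu_begin();			<- kernel uses the fpu
	 *   kernel_fpu_end();
	 *   ... signal delivery for task-A ...
	 *   save_i387_xstate()
	 *     <preempted just before sanitize_i387_state()>
	 *     <rescheduled: preload_fpu makes task-A's fpu registers live
	 *      again, i.e. __thread_has_fpu(task-A) is true>
	 *     sanitize_i387_state()
	 *       __sanitize_i387_state()
	 *         BUG_ON(__thread_has_fpu(tsk));	<- fires
	 */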

This doesn't happen with the 32-bit kernel because it does clear_used_math()
before sanitize_i387_state(), so a preemption shouldn't preload the
fpu state.

The 64-bit kernel didn't do this because of an optimization that copies
the live fpu registers directly onto the stack (see xsave_user() in the
same file).

I am planning to remove this 64-bit-specific signal handling optimization
and share the same signal handling code between the 32-bit/64-bit kernels
(in fact someone posted such patches before, and I am planning to dust
them off and repost them soon).

Also, in the future, because of xstate that can't be handled in a lazy
manner (for example AMD's LWP state), we will always pre-load during
context switch. But that is a different story.

> > Similarly during core dump, thread dumping the core can schedule out
> > and in for page-allocations etc in non-preempt case.
> 
> Again, can't understand. The core-dumping thread does init_fpu()
> before it calls sanitize_i387_state().

Here I actually meant other threads context-switching in and out while
the main thread dumps the core. For example, other threads can
context-switch in and out (because of preemption or the explicit schedule()
in kernel/exit.c:exit_mm()), and the main thread dumping core can run
into this bug when it finds some other thread with its fpu registers
live on some other cpu.

thanks,
suresh


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] coredump: flush the fpu exit state for proper multi-threaded core dump
  2012-05-09 21:05         ` Oleg Nesterov
@ 2012-05-09 21:32           ` Suresh Siddha
  2012-05-10 16:55             ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Suresh Siddha @ 2012-05-09 21:32 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

On Wed, 2012-05-09 at 23:05 +0200, Oleg Nesterov wrote:
> On 05/08, Suresh Siddha wrote:
> >
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -656,6 +656,11 @@ static void exit_mm(struct task_struct * tsk)
> >  		struct core_thread self;
> >  		up_read(&mm->mmap_sem);
> >
> > +		/*
> > +		 * Flush the live extended register state to memory.
> > +		 */
> > +		prepare_to_copy(tsk);
> 
> This doesn't look very nice imho, but I guess you understand this...
> 
> Perhaps we need an arch-dependent helper which saves the FPU regs
> if needed.
> 
> I can be easily wrong, but I did the quick grep and I am not sure
> we can rely on prepare_to_copy(). For example, it is a nop in
> arch/sh/include/asm/processor_64.h. But at the same time it has
> save_fpu().
> 
> OTOH, I am not sure it is safe to use prepare_to_copy() in exit_mm(),
> at least in theory. God knows what it can do...

There is an explicit schedule() just a few lines below, and that schedule()
will also do the same thing. The point is that we want the user's extended
register state flushed to memory (the same flush is also used in the fork
path) before we notify the core-dumping thread that we have reached the
serializing point, so that the dumping thread can continue the dump.

And to make this coredump-specific, I really need to move it there, as that
is where I can reliably find out whether a core dump is in progress for this
thread.

Linus/Peter/Ingo, any comments?

thanks,
suresh


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/3] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  2012-05-09 21:18           ` Suresh Siddha
@ 2012-05-10 16:36             ` Oleg Nesterov
  0 siblings, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2012-05-10 16:36 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

On 05/09, Suresh Siddha wrote:
>
> On Wed, 2012-05-09 at 22:30 +0200, Oleg Nesterov wrote:
> > On 05/08, Suresh Siddha wrote:
> > >
> > > BUG_ON() in __sanitize_i387_state() is checking that the fpu state
> > > is not live any more. But for preempt kernels, task can be scheduled
> > > out and in at any place and the preload_fpu logic during context switch
> > > can make the fpu registers live again.
> >
> > And? Do you see any particular scenario when this BUG_ON() is wrong?
> >
> > Afaics, __sanitize_i387_state() should not be called if the task can
> > be scheduled in with ->fpu_counter != 0.
>
> It is not easy, that is why we haven't seen any issues for so long. I
> can give an example with 64-bit kernel with preempt enabled.
>
> Task-A which uses fpu frequently and as such you will find its
> fpu_counter mostly non-zero. During its time slice, kernel used fpu by
> doing kernel_fpu_begin/kernel_fpu_end(). After this, in the same
> scheduling slice, task-A got a signal to handle. Then during the signal
> setup path we got preempted when we are just before the
> sanitize_i387_state() call in
> arch/x86/kernel/xsave.c:save_i387_xstate(). And when we come back we
> will have the fpu registers live that can hit the bug_on.

Indeed. Thanks a lot for your explanation.

> I am planning to remove this 64-bit specific signal handling
> optimization and share the same signal handling code between 32bit/64bit
> kernels (infact someone posted those patches before and I am planning to
> dust them off soon and repost).

Cool ;)

> > > Similarly during core dump, thread dumping the core can schedule out
> > > and in for page-allocations etc in non-preempt case.
> >
> > Again, can't understand. The core-dumping thread does init_fpu()
> > before it calls sanitize_i387_state().
>
> Here I actually meant other threads context-switching in and out, while
> the main thread dumps the core.

I see, thanks.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] coredump: flush the fpu exit state for proper multi-threaded core dump
  2012-05-09 21:32           ` Suresh Siddha
@ 2012-05-10 16:55             ` Oleg Nesterov
  2012-05-10 17:04               ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2012-05-10 16:55 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

On 05/09, Suresh Siddha wrote:
>
> On Wed, 2012-05-09 at 23:05 +0200, Oleg Nesterov wrote:
> > On 05/08, Suresh Siddha wrote:
> > >
> > > --- a/kernel/exit.c
> > > +++ b/kernel/exit.c
> > > @@ -656,6 +656,11 @@ static void exit_mm(struct task_struct * tsk)
> > >  		struct core_thread self;
> > >  		up_read(&mm->mmap_sem);
> > >
> > > +		/*
> > > +		 * Flush the live extended register state to memory.
> > > +		 */
> > > +		prepare_to_copy(tsk);
> >
> > This doesn't look very nice imho, but I guess you understand this...
> >
> > Perhaps we need an arch-dependent helper which saves the FPU regs
> > if needed.
> >
> > I can be easily wrong, but I did the quick grep and I am not sure
> > we can rely on prepare_to_copy(). For example, it is a nop in
> > arch/sh/include/asm/processor_64.h. But at the same time it has
> > save_fpu().
> >
> > OTOH, I am not sure it is safe to use prepare_to_copy() in exit_mm(),
> > at least in theory. God knows what it can do...
>
> There is an explicit schedule() just few lines below. And the schedule()
> also will do the same thing. The thing is we want the user-specific
> extended registers to be flushed to memory (used also in the fork path)
> before we notify the core dumping thread that we reached the serializing
> point, for the dumping thread to continue the dump process.

I understand.

My point was, there is no guarantee that prepare_to_copy() does the flush.
An architecture can do this in copy_thread() or arch_dup_task_struct(),
for example. In fact I do not understand why x86 doesn't do this.
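
(For reference, the call site in kernel/fork.c looks roughly like the sketch
below; the arch_dup_task_struct() call is from memory and may not match the
exact source:)

	static struct task_struct *dup_task_struct(struct task_struct *orig)
	{
		...
		prepare_to_copy(orig);			/* the flush today */

		tsk = alloc_task_struct_node(node);
		if (!tsk)
			return NULL;
		...
		err = arch_dup_task_struct(tsk, orig);	/* flush could move here */
		...
	}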

prepare_to_copy() doesn't have any documented semantics; it looks "strange"
in exit_mm().

But let me repeat, I do not see a better solution for now.

Maybe we can add wait_task_inactive() in fill_thread_core_info() though;
not sure.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] coredump: flush the fpu exit state for proper multi-threaded core dump
  2012-05-10 16:55             ` Oleg Nesterov
@ 2012-05-10 17:04               ` Linus Torvalds
  2012-05-10 23:33                 ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Suresh Siddha
  2012-05-10 23:48                 ` [PATCH 1/3] coredump: flush the fpu exit state for proper multi-threaded core dump Suresh Siddha
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2012-05-10 17:04 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Suresh Siddha, hpa, mingo, linux-kernel, suresh

On Thu, May 10, 2012 at 9:55 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> My point was, there is no any guarantee prepare_to_copy() does the flush.
> An architecture can do this in copy_thread() or arch_dup_task_struct(),
> for example. In fact I do not understand why x86 doesn't do this.

I agree that it would actually make more sense to do it in
arch_dup_task_struct(). I had trouble finding where the heck the
fork() code did the FPU fixes back when I was fighting the FPU
corruption thing.

The prepare_to_copy() thing is, I think, purely historical, and I
think we should in fact get rid of it. Everybody else makes it a
no-op, I think, with a *few* exceptions that seem to have copied the
x86 model of flushing the FPU there.

So if somebody sends me a patch to remove that thing, and move the few
existing users to arch_dup_task_struct(), I'd take it.

I think it would be a mistake to use it in the exit path. Make an
explicit "drop_thread_state()" or similar macro, which can undo FPU
state and possibly other architecture state.
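
(A minimal x86 sketch of such a helper, reusing the flush_fpu() bits from
patch 3/3 earlier in this thread and treating drop_thread_state() purely as
a placeholder name:)

	static inline void drop_thread_state(struct task_struct *tsk)
	{
		/* forget the lazy fpu state instead of saving it */
		tsk->fpu_counter = 0;
		clear_fpu(tsk);
		clear_used_math();
	}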

                       Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct()
  2012-05-10 17:04               ` Linus Torvalds
@ 2012-05-10 23:33                 ` Suresh Siddha
  2012-05-10 23:33                   ` [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump Suresh Siddha
                                     ` (4 more replies)
  2012-05-10 23:48                 ` [PATCH 1/3] coredump: flush the fpu exit state for proper multi-threaded core dump Suresh Siddha
  1 sibling, 5 replies; 30+ messages in thread
From: Suresh Siddha @ 2012-05-10 23:33 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg
  Cc: Suresh Siddha, linux-kernel, suresh, David Howells,
	Koichi Yasutake, Benjamin Herrenschmidt, Paul Mackerras,
	Paul Mundt, Chris Zankel

The historical prepare_to_copy() is mostly a no-op, duplicated across the
majority of the architectures, with the rest following the x86 model of
flushing the extended register state (like the fpu) there.

Remove it and use arch_dup_task_struct() instead.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
---
 arch/alpha/include/asm/processor.h      |    3 ---
 arch/arm/include/asm/processor.h        |    3 ---
 arch/avr32/include/asm/processor.h      |    3 ---
 arch/blackfin/include/asm/processor.h   |    2 --
 arch/c6x/include/asm/processor.h        |    3 ---
 arch/cris/include/asm/processor.h       |    4 ----
 arch/frv/include/asm/processor.h        |    2 --
 arch/frv/kernel/process.c               |   11 -----------
 arch/h8300/include/asm/processor.h      |    2 --
 arch/hexagon/include/asm/processor.h    |    7 -------
 arch/ia64/include/asm/processor.h       |    3 ---
 arch/m32r/include/asm/processor.h       |    2 --
 arch/m68k/include/asm/processor.h       |    3 ---
 arch/microblaze/include/asm/processor.h |    1 -
 arch/mips/include/asm/processor.h       |    3 ---
 arch/mn10300/include/asm/processor.h    |    3 ---
 arch/mn10300/kernel/process.c           |   10 ++++++----
 arch/openrisc/include/asm/processor.h   |    4 ----
 arch/parisc/include/asm/processor.h     |    3 ---
 arch/powerpc/include/asm/processor.h    |    3 ---
 arch/powerpc/kernel/process.c           |   19 +++++++++++--------
 arch/s390/include/asm/processor.h       |    3 ---
 arch/score/include/asm/processor.h      |    1 -
 arch/sh/include/asm/processor_32.h      |    3 ---
 arch/sh/include/asm/processor_64.h      |    1 -
 arch/sh/kernel/process.c                |    7 +++++++
 arch/sh/kernel/process_32.c             |    9 ---------
 arch/sparc/include/asm/processor_32.h   |    3 ---
 arch/sparc/include/asm/processor_64.h   |    3 ---
 arch/tile/include/asm/processor.h       |    3 ---
 arch/um/include/asm/processor-generic.h |    5 -----
 arch/unicore32/include/asm/processor.h  |    3 ---
 arch/x86/include/asm/processor.h        |    3 ---
 arch/x86/kernel/process.c               |    6 ++++++
 arch/x86/kernel/process_32.c            |    9 ---------
 arch/x86/kernel/process_64.c            |    9 ---------
 arch/xtensa/include/asm/processor.h     |    3 ---
 arch/xtensa/kernel/process.c            |    9 ++++++---
 kernel/fork.c                           |    2 --
 39 files changed, 36 insertions(+), 140 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 94afe58..e37b887 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -49,9 +49,6 @@ extern void start_thread(struct pt_regs *, unsigned long, unsigned long);
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /* Create a kernel thread without removing it from tasklists.  */
 extern long kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 5ac8d3d..6354224 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -77,9 +77,6 @@ struct task_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 unsigned long get_wchan(struct task_struct *p);
 
 #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 108502b..87d8bac 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -145,9 +145,6 @@ extern void release_thread(struct task_struct *);
 /* Create a kernel thread without removing it from tasklists */
 extern int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk) do { } while(0)
-
 /* Return saved PC of a blocked thread */
 #define thread_saved_pc(tsk)    ((tsk)->thread.cpu_context.pc)
 
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 8af7772..4ef7cfe 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -75,8 +75,6 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern int kernel_thread(int (*fn) (void *), void *arg, unsigned long flags);
 
 /*
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 3ff7fab..c50af7e 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -92,9 +92,6 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 #define copy_segments(tsk, mm)		do { } while (0)
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 4210d72..37f522f 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -50,10 +50,6 @@ struct task_struct;
 #define task_pt_regs(task) user_regs(task_thread_info(task))
 #define current_regs() task_pt_regs(current)
 
-static inline void prepare_to_copy(struct task_struct *tsk)
-{
-}
-
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 unsigned long get_wchan(struct task_struct *p);
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 81c2e27..9c76817 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -103,8 +103,6 @@ do {							\
 	__frame->sp	= (_usp);			\
 } while(0)
 
-extern void prepare_to_copy(struct task_struct *tsk);
-
 /* Free all resources held by a thread. */
 static inline void release_thread(struct task_struct *dead_task)
 {
diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
index d4de48b..9f3dfad 100644
--- a/arch/frv/kernel/process.c
+++ b/arch/frv/kernel/process.c
@@ -180,17 +180,6 @@ asmlinkage int sys_clone(unsigned long clone_flags, unsigned long newsp,
 	return do_fork(clone_flags, newsp, __frame, 0, parent_tidptr, child_tidptr);
 } /* end sys_clone() */
 
-/*****************************************************************************/
-/*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
- */
-void prepare_to_copy(struct task_struct *tsk)
-{
-	//unlazy_fpu(tsk);
-} /* end prepare_to_copy() */
-
-/*****************************************************************************/
 /*
  * set up the kernel stack and exception frames for a new process
  */
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 61fabf1..4c9f6f8 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -109,8 +109,6 @@ static inline void release_thread(struct task_struct *dead_task)
 
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /*
  * Free current thread data structures etc..
  */
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index 20c5dda..e8ea459 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -59,13 +59,6 @@ struct thread_struct {
 #define cpu_relax() __vmyield()
 
 /*
- * "Unlazying all lazy status" occurs here.
- */
-static inline void prepare_to_copy(struct task_struct *tsk)
-{
-}
-
-/*
  * Decides where the kernel will search for a free chunk of vm space during
  * mmaps.
  * See also arch_get_unmapped_area.
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 483f6c6..efcca1b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -343,9 +343,6 @@ struct task_struct;
  */
 #define release_thread(dead_task)
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /*
  * This is the mechanism for creating a new kernel thread.
  *
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index e1f46d7..da17253 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -118,8 +118,6 @@ struct mm_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /*
  * create a kernel thread without removing it from tasklists
  */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 46460fa..f17c42a 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -153,9 +153,6 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 /*
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index bffb545..af2bb96 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -23,7 +23,6 @@ extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
 # define cpu_sleep()		do {} while (0)
-# define prepare_to_copy(tsk)	do {} while (0)
 
 #define task_pt_regs(tsk) \
 		(((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 20e9dcf..5e33fab 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -310,9 +310,6 @@ struct task_struct;
 /* Free all resources held by a thread. */
 #define release_thread(thread) do { } while(0)
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 extern unsigned long thread_saved_pc(struct task_struct *tsk);
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index f7b3c9a..247928c 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -139,9 +139,6 @@ static inline void start_thread(struct pt_regs *regs,
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-extern void prepare_to_copy(struct task_struct *tsk);
-
 /*
  * create a kernel thread without removing it from tasklists
  */
diff --git a/arch/mn10300/kernel/process.c b/arch/mn10300/kernel/process.c
index 14707f2..7dab0cd 100644
--- a/arch/mn10300/kernel/process.c
+++ b/arch/mn10300/kernel/process.c
@@ -208,12 +208,14 @@ void copy_segments(struct task_struct *p, struct mm_struct *new_mm)
 }
 
 /*
- * this gets called before we allocate a new thread and copy the current task
- * into it so that we can store lazy state into memory
+ * this gets called so that we can store lazy state into memory and copy the
+ * current task into the new thread.
  */
-void prepare_to_copy(struct task_struct *tsk)
+int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
-	unlazy_fpu(tsk);
+	unlazy_fpu(src);
+	*dst = *src;
+	return 0;
 }
 
 /*
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index f7516fa..30462f1 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -72,10 +72,6 @@ struct thread_struct {
 #define task_pt_regs(task) user_regs(task_thread_info(task))
 #define current_regs() user_regs(current_thread_info())
 
-extern inline void prepare_to_copy(struct task_struct *tsk)
-{
-}
-
 #define INIT_SP         (sizeof(init_stack) + (unsigned long) &init_stack)
 
 #define INIT_THREAD  { }
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index acdf4ca..0e8b7b8 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -328,9 +328,6 @@ struct mm_struct;
 extern void release_thread(struct task_struct *);
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern void map_hpux_gateway_page(struct task_struct *tsk, struct mm_struct *mm);
 
 extern unsigned long get_wchan(struct task_struct *p);
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 8e2d037..854f899 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -74,9 +74,6 @@ struct task_struct;
 void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp);
 void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-extern void prepare_to_copy(struct task_struct *tsk);
-
 /* Create a new kernel thread. */
 extern long kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4937c96..3fa3403 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -711,18 +711,21 @@ release_thread(struct task_struct *t)
 }
 
 /*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
+ * this gets called so that we can store coprocessor state into memory and copy the
+ * current task into the new thread.
  */
-void prepare_to_copy(struct task_struct *tsk)
+int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
-	flush_fp_to_thread(current);
-	flush_altivec_to_thread(current);
-	flush_vsx_to_thread(current);
-	flush_spe_to_thread(current);
+	flush_fp_to_thread(src);
+	flush_altivec_to_thread(src);
+	flush_vsx_to_thread(src);
+	flush_spe_to_thread(src);
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-	flush_ptrace_hw_breakpoint(tsk);
+	flush_ptrace_hw_breakpoint(src);
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
+	*dst = *src;
+	return 0;
 }
 
 /*
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index d499b30..6cbf313 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -141,9 +141,6 @@ struct seq_file;
 extern void release_thread(struct task_struct *);
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /*
  * Return saved PC of a blocked thread.
  */
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index 7e22f21..ab3aceb 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -26,7 +26,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		barrier()
 #define release_thread(thread)	do {} while (0)
-#define prepare_to_copy(tsk)	do {} while (0)
 
 /*
  * User space process size: 2GB. This is hardcoded into a few places,
diff --git a/arch/sh/include/asm/processor_32.h b/arch/sh/include/asm/processor_32.h
index 900f8d7..b6311fd 100644
--- a/arch/sh/include/asm/processor_32.h
+++ b/arch/sh/include/asm/processor_32.h
@@ -126,9 +126,6 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_pc, unsigned lo
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-void prepare_to_copy(struct task_struct *tsk);
-
 /*
  * create a kernel thread without removing it from tasklists
  */
diff --git a/arch/sh/include/asm/processor_64.h b/arch/sh/include/asm/processor_64.h
index e25c4c7..fe99afe 100644
--- a/arch/sh/include/asm/processor_64.h
+++ b/arch/sh/include/asm/processor_64.h
@@ -172,7 +172,6 @@ extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 #define copy_segments(p, mm)	do { } while (0)
 #define release_segments(mm)	do { } while (0)
 #define forget_segments()	do { } while (0)
-#define prepare_to_copy(tsk)	do { } while (0)
 /*
  * FPU lazy state save handling.
  */
diff --git a/arch/sh/kernel/process.c b/arch/sh/kernel/process.c
index 325f98b..2bde59e 100644
--- a/arch/sh/kernel/process.c
+++ b/arch/sh/kernel/process.c
@@ -6,8 +6,15 @@
 struct kmem_cache *task_xstate_cachep = NULL;
 unsigned int xstate_size;
 
+/*
+ * this gets called so that we can store lazy state into memory and copy the
+ * current task into the new thread.
+ */
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
+#ifdef CONFIG_SUPERH32
+	unlazy_fpu(src, task_pt_regs(src));
+#endif
 	*dst = *src;
 
 	if (src->thread.xstate) {
diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c
index 94273aa..dee5961 100644
--- a/arch/sh/kernel/process_32.c
+++ b/arch/sh/kernel/process_32.c
@@ -155,15 +155,6 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpu)
 }
 EXPORT_SYMBOL(dump_fpu);
 
-/*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
- */
-void prepare_to_copy(struct task_struct *tsk)
-{
-	unlazy_fpu(tsk, task_pt_regs(tsk));
-}
-
 asmlinkage void ret_from_fork(void);
 
 int copy_thread(unsigned long clone_flags, unsigned long usp,
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 09521c6..c9c760f 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -109,9 +109,6 @@ static inline void start_thread(struct pt_regs * regs, unsigned long pc,
 #define release_thread(tsk)		do { } while(0)
 extern pid_t kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern unsigned long get_wchan(struct task_struct *);
 
 #define task_pt_regs(tsk) ((tsk)->thread.kregs)
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index e713db2..67df5cc 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -186,9 +186,6 @@ do { \
 /* Free all resources held by a thread. */
 #define release_thread(tsk)		do { } while (0)
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern pid_t kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 extern unsigned long get_wchan(struct task_struct *task);
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 34c1e01..15cd8a4 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -210,9 +210,6 @@ static inline void release_thread(struct task_struct *dead_task)
 	/* Nothing for now */
 }
 
-/* Prepare to copy thread state - unlazy all lazy status. */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
 extern int do_work_pending(struct pt_regs *regs, u32 flags);
diff --git a/arch/um/include/asm/processor-generic.h b/arch/um/include/asm/processor-generic.h
index 98d01bc..63b7160 100644
--- a/arch/um/include/asm/processor-generic.h
+++ b/arch/um/include/asm/processor-generic.h
@@ -76,11 +76,6 @@ static inline void release_thread(struct task_struct *task)
 
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-static inline void prepare_to_copy(struct task_struct *tsk)
-{
-}
-
-
 extern unsigned long thread_saved_pc(struct task_struct *t);
 
 static inline void mm_copy_segments(struct mm_struct *from_mm,
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index f0d780a..14382cb 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -68,9 +68,6 @@ struct task_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4fa7dcc..97fe043 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -579,9 +579,6 @@ extern int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy state */
-extern void prepare_to_copy(struct task_struct *tsk);
-
 unsigned long get_wchan(struct task_struct *p);
 
 /*
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1d92a5a..b7e1e0e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -47,10 +47,16 @@ EXPORT_SYMBOL_GPL(idle_notifier_unregister);
 struct kmem_cache *task_xstate_cachep;
 EXPORT_SYMBOL_GPL(task_xstate_cachep);
 
+/*
+ * this gets called so that we can store lazy state into memory and copy the
+ * current task into the new thread.
+ */
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
 	int ret;
 
+	unlazy_fpu(src);
+
 	*dst = *src;
 	if (fpu_allocated(&src->thread.fpu)) {
 		memset(&dst->thread.fpu, 0, sizeof(dst->thread.fpu));
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index ae68473..2aa57dc 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -126,15 +126,6 @@ void release_thread(struct task_struct *dead_task)
 	release_vm86_irqs(dead_task);
 }
 
-/*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
- */
-void prepare_to_copy(struct task_struct *tsk)
-{
-	unlazy_fpu(tsk);
-}
-
 int copy_thread(unsigned long clone_flags, unsigned long sp,
 	unsigned long unused,
 	struct task_struct *p, struct pt_regs *regs)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 43d8b48..c4c0645 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -145,15 +145,6 @@ static inline u32 read_32bit_tls(struct task_struct *t, int tls)
 	return get_desc_base(&t->thread.tls_array[tls]);
 }
 
-/*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
- */
-void prepare_to_copy(struct task_struct *tsk)
-{
-	unlazy_fpu(tsk);
-}
-
 int copy_thread(unsigned long clone_flags, unsigned long sp,
 		unsigned long unused,
 	struct task_struct *p, struct pt_regs *regs)
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index 3acb26e..5c371d8 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -168,9 +168,6 @@ struct mm_struct;
 /* Free all resources held by a thread. */
 #define release_thread(thread) do { } while(0)
 
-/* Prepare to copy thread state - unlazy all lazy status */
-extern void prepare_to_copy(struct task_struct*);
-
 /* Create a kernel thread without removing it from tasklists */
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index 6a2d6ed..e39af26 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -140,13 +140,16 @@ void flush_thread(void)
 }
 
 /*
- * This is called before the thread is copied. 
+ * this gets called so that we can store coprocessor state into memory and copy the
+ * current task into the new thread.
  */
-void prepare_to_copy(struct task_struct *tsk)
+int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
 #if XTENSA_HAVE_COPROCESSORS
-	coprocessor_flush_all(task_thread_info(tsk));
+	coprocessor_flush_all(task_thread_info(src));
 #endif
+	*dst = *src;
+	return 0;
 }
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index b9372a0..eaf1663 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -260,8 +260,6 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 	int node = tsk_fork_get_node(orig);
 	int err;
 
-	prepare_to_copy(orig);
-
 	tsk = alloc_task_struct_node(node);
 	if (!tsk)
 		return NULL;
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  2012-05-10 23:33                 ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Suresh Siddha
@ 2012-05-10 23:33                   ` Suresh Siddha
  2012-05-11 16:51                     ` Oleg Nesterov
  2012-05-17  0:17                     ` [tip:x86/fpu] " tip-bot for Suresh Siddha
  2012-05-10 23:33                   ` [PATCH v2 3/4] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
                                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 30+ messages in thread
From: Suresh Siddha @ 2012-05-10 23:33 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg; +Cc: Suresh Siddha, linux-kernel, suresh

Nalluru reported hitting the BUG_ON(__thread_has_fpu(tsk)) in
arch/x86/kernel/xsave.c:__sanitize_i387_state() during the coredump
of a multi-threaded application.

A look at the exit sequence shows that other threads can potentially still be
on the runqueue at the exit_mm() code snippet shown below:

		if (atomic_dec_and_test(&core_state->nr_threads))
			complete(&core_state->startup);

===> other threads can still be active here, but we notify the thread
===> dumping core to wake up from coredump_wait() after the last thread
===> joins this point. The core-dumping thread will then continue dumping
===> all the threads' state to the core file.

		for (;;) {
			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
			if (!self.task) /* see coredump_finish() */
				break;
			schedule();
		}

As some of those threads are still on the runqueue and haven't called
schedule() yet, their fpu state is still active in the live registers, and the
thread proceeding with the coredump will hit the above-mentioned BUG_ON while
trying to dump the other threads' fpu state to the coredump file.

The BUG_ON() in arch/x86/kernel/xsave.c:__sanitize_i387_state() is
in the code paths for processors supporting xsaveopt. With or without
xsaveopt, the multi-threaded coredump is broken and may not contain
the correct fpu state at the time of exit.

In coredump_wait(), wait for all the threads to become inactive, so
that we are sure all the extended register state is flushed to memory
and can be reliably copied to the core file.

Reported-by: Suresh Nalluru <suresh@aristanetworks.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 fs/exec.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b1fd202..8e2ddeb 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1930,8 +1930,21 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
 		core_waiters = zap_threads(tsk, mm, core_state, exit_code);
 	up_write(&mm->mmap_sem);
 
-	if (core_waiters > 0)
+	if (core_waiters > 0) {
+		struct core_thread *ptr;
+
 		wait_for_completion(&core_state->startup);
+		/*
+		 * Wait for all the threads to become inactive, so that
+		 * all the thread context (extended register state, like
+		 * fpu etc) gets copied to the memory.
+		 */
+		ptr = core_state->dumper.next;
+		while (ptr != NULL) {
+			wait_task_inactive(ptr->task, 0);
+			ptr = ptr->next;
+		}
+	}
 
 	return core_waiters;
 }
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 3/4] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  2012-05-10 23:33                 ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Suresh Siddha
  2012-05-10 23:33                   ` [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump Suresh Siddha
@ 2012-05-10 23:33                   ` Suresh Siddha
  2012-05-17  0:18                     ` [tip:x86/fpu] " tip-bot for Suresh Siddha
  2012-05-10 23:33                   ` [PATCH v2 4/4] x86, fpu: drop the fpu state during thread exit Suresh Siddha
                                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 30+ messages in thread
From: Suresh Siddha @ 2012-05-10 23:33 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg; +Cc: Suresh Siddha, linux-kernel, suresh

Code paths like fork(), exit() and signal handling flush the fpu
state explicitly to the structures in memory.

BUG_ON() in __sanitize_i387_state() is checking that the fpu state
is not live any more. But for preempt kernels, task can be scheduled
out and in at any place and the preload_fpu logic during context switch
can make the fpu registers live again.

For example, consider a 64-bit task which uses the fpu frequently, so its
fpu_counter is mostly non-zero. During its time slice, the kernel uses the
fpu via kernel_fpu_begin()/kernel_fpu_end(). Later in the same scheduling
slice, the task gets a signal to handle. If we get preempted in the signal
setup path just before the sanitize_i387_state() in
arch/x86/kernel/xsave.c:save_i387_xstate(), the fpu registers will be live
again when we are scheduled back in, and we can hit the bug_on.

Similarly, during a core dump, other threads can context-switch in and out
(because of spurious wakeups while waiting for the coredump to finish in
 kernel/exit.c:exit_mm()) and the main thread dumping core can run into this
bug when it finds some other thread with its fpu registers live on another cpu.

So remove the paranoid check for now, even though it caught a bug in the
multi-threaded core dump case (fixed in the previous patch).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/kernel/xsave.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index e62728e..bd18149 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -48,8 +48,6 @@ void __sanitize_i387_state(struct task_struct *tsk)
 	if (!fx)
 		return;
 
-	BUG_ON(__thread_has_fpu(tsk));
-
 	xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;
 
 	/*
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 4/4] x86, fpu: drop the fpu state during thread exit
  2012-05-10 23:33                 ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Suresh Siddha
  2012-05-10 23:33                   ` [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump Suresh Siddha
  2012-05-10 23:33                   ` [PATCH v2 3/4] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
@ 2012-05-10 23:33                   ` Suresh Siddha
  2012-05-17  0:19                     ` [tip:x86/fpu] " tip-bot for Suresh Siddha
  2012-05-11  0:17                   ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Benjamin Herrenschmidt
  2012-05-17  0:16                   ` [tip:x86/fpu] " tip-bot for Suresh Siddha
  4 siblings, 1 reply; 30+ messages in thread
From: Suresh Siddha @ 2012-05-10 23:33 UTC (permalink / raw)
  To: torvalds, hpa, mingo, oleg; +Cc: Suresh Siddha, linux-kernel, suresh

There is no need to save any active fpu state to the task structure
memory if the task is dead. Just drop the state instead.

For example, this saved some 1770 xsaves during the system boot
of a two-socket Xeon system.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/kernel/process.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b7e1e0e..1219fe2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -87,6 +87,16 @@ void arch_task_cache_init(void)
 				  SLAB_PANIC | SLAB_NOTRACK, NULL);
 }
 
+static inline void drop_fpu(struct task_struct *tsk)
+{
+	/*
+	 * Forget coprocessor state..
+	 */
+	tsk->fpu_counter = 0;
+	clear_fpu(tsk);
+	clear_used_math();
+}
+
 /*
  * Free current thread data structures etc..
  */
@@ -109,6 +119,8 @@ void exit_thread(void)
 		put_cpu();
 		kfree(bp);
 	}
+
+	drop_fpu(me);
 }
 
 void show_regs(struct pt_regs *regs)
@@ -149,12 +161,7 @@ void flush_thread(void)
 
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
-	/*
-	 * Forget coprocessor state..
-	 */
-	tsk->fpu_counter = 0;
-	clear_fpu(tsk);
-	clear_used_math();
+	drop_fpu(tsk);
 }
 
 static void hard_disable_TSC(void)
-- 
1.7.6.5


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/3] coredump: flush the fpu exit state for proper multi-threaded core dump
  2012-05-10 17:04               ` Linus Torvalds
  2012-05-10 23:33                 ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Suresh Siddha
@ 2012-05-10 23:48                 ` Suresh Siddha
  1 sibling, 0 replies; 30+ messages in thread
From: Suresh Siddha @ 2012-05-10 23:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, hpa, mingo, linux-kernel, suresh, dhowells,
	yasutake.koichi, benh, paulus, lethal, chris

On Thu, 2012-05-10 at 10:04 -0700, Linus Torvalds wrote:
> On Thu, May 10, 2012 at 9:55 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > My point was, there is no any guarantee prepare_to_copy() does the flush.
> > An architecture can do this in copy_thread() or arch_dup_task_struct(),
> > for example. In fact I do not understand why x86 doesn't do this.
> 
> I agree that it would actually make more sense to do in
> arch_dup_task_struct(). I had trouble finding where the heck the
> fork() code did the FPU fixes back when I was fighting the FPU
> corruption thing.
> 
> The prepare_to_copy() thing is, I think, purely historical, and I
> think we should in fact get rid of it. Everybody else makes it a
> no-op, I think, with a *few* exceptions that seem to have copied the
> x86 model of flushing the FPU there.
> 
> So if somebody sends me a patch to remove that thing, and move the few
> existing users to arch_dup_task_struct(), I'd take it.

Ok. I just sent the v2 version of the patchset, which includes this. I
didn't compile-test all the architectures, but I copied the relevant arch
maintainers whose architectures were using prepare_to_copy(). I will be
glad if they can review/test that patch and ack it.

> I think it would be a mistake to use it in the exit path. Make an
> explicit "drop_thread_state()" or similar macro, which can undo FPU
> state and possibly other architecture state.

In my previous/current patchset, I am using the arch-specific exit_thread()
to call drop_fpu() (renamed from flush_fpu() in the v2 patchset) during
normal exit.

But in the coredump case, all the other threads are waiting in
exit_mm() as part of coredump_wait(), and this is where we hit the
bug_on, when the main thread finds another thread still active on
the runqueue. So, based on the discussion with Oleg, I added the
wait_task_inactive() check in coredump_wait() to ensure all the other
threads are really off the rq, so that their most recent fpu state has
been saved to memory and can be copied to the core file by the
dumping thread. Hope this is ok.
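
In code, the extra waiting in coredump_wait() is just the loop from the
v2 patch earlier in this thread (restated here as a sketch):

	if (core_waiters > 0) {
		struct core_thread *ptr;

		wait_for_completion(&core_state->startup);
		/*
		 * Every thread has announced itself; now make sure each
		 * one is really off the runqueue, so its fpu/extended
		 * register state has been saved back into its task struct.
		 */
		for (ptr = core_state->dumper.next; ptr != NULL; ptr = ptr->next)
			wait_task_inactive(ptr->task, 0);
	}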

thanks,
suresh


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct()
  2012-05-10 23:33                 ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Suresh Siddha
                                     ` (2 preceding siblings ...)
  2012-05-10 23:33                   ` [PATCH v2 4/4] x86, fpu: drop the fpu state during thread exit Suresh Siddha
@ 2012-05-11  0:17                   ` Benjamin Herrenschmidt
  2012-05-17  0:16                   ` [tip:x86/fpu] " tip-bot for Suresh Siddha
  4 siblings, 0 replies; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2012-05-11  0:17 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: torvalds, hpa, mingo, oleg, linux-kernel, suresh, David Howells,
	Koichi Yasutake, Paul Mackerras, Paul Mundt, Chris Zankel

On Thu, 2012-05-10 at 16:33 -0700, Suresh Siddha wrote:
> The historical prepare_to_copy() is mostly a no-op, duplicated across the
> majority of architectures, with the rest following the x86 model of flushing
> the extended register state (like the fpu) there.
> 
> Remove it and use the arch_dup_task_struct() instead.
> 
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>

For the powerpc part:

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Cheers,
Ben.




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  2012-05-10 23:33                   ` [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump Suresh Siddha
@ 2012-05-11 16:51                     ` Oleg Nesterov
  2012-05-11 19:05                       ` Suresh Siddha
  2012-05-17  0:17                     ` [tip:x86/fpu] " tip-bot for Suresh Siddha
  1 sibling, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2012-05-11 16:51 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

On 05/10, Suresh Siddha wrote:
>
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1930,8 +1930,21 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
>  		core_waiters = zap_threads(tsk, mm, core_state, exit_code);
>  	up_write(&mm->mmap_sem);
>
> -	if (core_waiters > 0)
> +	if (core_waiters > 0) {
> +		struct core_thread *ptr;
> +
>  		wait_for_completion(&core_state->startup);
> +		/*
> +		 * Wait for all the threads to become inactive, so that
> +		 * all the thread context (extended register state, like
> +		 * fpu etc) gets copied to the memory.
> +		 */
> +		ptr = core_state->dumper.next;
> +		while (ptr != NULL) {
> +			wait_task_inactive(ptr->task, 0);
> +			ptr = ptr->next;
> +		}
> +	}

OK, but this adds the unnecessary penalty if we are not going to dump
the core.

Perhaps it makes sense to create a separate helper and call it from
do_coredump() right before "retval = binfmt->core_dump(&cprm)" ?

This also increases the chance that wait_task_inactive() won't actually
wait.
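
Something like this, purely as an illustration (the helper name and exact
placement are made up here, not a concrete proposal):

	/* illustration only */
	static void coredump_wait_inactive(struct core_state *core_state)
	{
		struct core_thread *ptr;

		for (ptr = core_state->dumper.next; ptr; ptr = ptr->next)
			wait_task_inactive(ptr->task, 0);
	}

	/* called from do_coredump(), just before binfmt->core_dump(&cprm) */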

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  2012-05-11 16:51                     ` Oleg Nesterov
@ 2012-05-11 19:05                       ` Suresh Siddha
  2012-05-13 16:11                         ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Suresh Siddha @ 2012-05-11 19:05 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

On Fri, 2012-05-11 at 18:51 +0200, Oleg Nesterov wrote:
> On 05/10, Suresh Siddha wrote:
> >
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -1930,8 +1930,21 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
> >  		core_waiters = zap_threads(tsk, mm, core_state, exit_code);
> >  	up_write(&mm->mmap_sem);
> >
> > -	if (core_waiters > 0)
> > +	if (core_waiters > 0) {
> > +		struct core_thread *ptr;
> > +
> >  		wait_for_completion(&core_state->startup);
> > +		/*
> > +		 * Wait for all the threads to become inactive, so that
> > +		 * all the thread context (extended register state, like
> > +		 * fpu etc) gets copied to the memory.
> > +		 */
> > +		ptr = core_state->dumper.next;
> > +		while (ptr != NULL) {
> > +			wait_task_inactive(ptr->task, 0);
> > +			ptr = ptr->next;
> > +		}
> > +	}
> 
> OK, but this adds the unnecessary penalty if we are not going to dump
> the core.

If we are not planning to dump the core, then we will not be in the
coredump_wait() right?

coredump_wait() already waits for all the threads to respond (referring
to the existing wait_for_completion() line before the proposed
addition). wait_for_completion() already ensures that the other threads
are close to schedule() with TASK_UNINTERRUPTIBLE, so most of the
penalty is already taken and in most cases, wait_task_inactive() will
return success immediately. And in the corner cases (where we hit the
bug_on before) we will spin a bit now while the other thread is still on
the rq.

> Perhaps it makes sense to create a separate helper and call it from
> do_coredump() right before "retval = binfmt->core_dump(&cprm)" ?

I didn't want to spread the core dump waits at multiple places.
coredump_wait() seems to be the natural place, as we are already waiting
for other threads to join.

thanks,
suresh


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  2012-05-11 19:05                       ` Suresh Siddha
@ 2012-05-13 16:11                         ` Oleg Nesterov
  2012-05-15 18:03                           ` Suresh Siddha
  0 siblings, 1 reply; 30+ messages in thread
From: Oleg Nesterov @ 2012-05-13 16:11 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

On 05/11, Suresh Siddha wrote:
>
> On Fri, 2012-05-11 at 18:51 +0200, Oleg Nesterov wrote:
> > On 05/10, Suresh Siddha wrote:
> > >
> > > --- a/fs/exec.c
> > > +++ b/fs/exec.c
> > > @@ -1930,8 +1930,21 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
> > >  		core_waiters = zap_threads(tsk, mm, core_state, exit_code);
> > >  	up_write(&mm->mmap_sem);
> > >
> > > -	if (core_waiters > 0)
> > > +	if (core_waiters > 0) {
> > > +		struct core_thread *ptr;
> > > +
> > >  		wait_for_completion(&core_state->startup);
> > > +		/*
> > > +		 * Wait for all the threads to become inactive, so that
> > > +		 * all the thread context (extended register state, like
> > > +		 * fpu etc) gets copied to the memory.
> > > +		 */
> > > +		ptr = core_state->dumper.next;
> > > +		while (ptr != NULL) {
> > > +			wait_task_inactive(ptr->task, 0);
> > > +			ptr = ptr->next;
> > > +		}
> > > +	}
> >
> > OK, but this adds the unnecessary penalty if we are not going to dump
> > the core.
>
> If we are not planning to dump the core, then we will not be in the
> coredump_wait() right?

No. coredump_wait() is always called if sig_kernel_coredump(sig) == T.
Ignoring the __get_dumpable() security checks. And if RLIMIT_CORE == 0
we do not actually dump the core (unless ispipe of course).

Btw, I do not know if this is essential or not. I mean, we could probably
add the "fast path" check and simply return; the whole process will be
killed anyway by do_group_exit(), and that is faster than coredump_wait().
But note that coredump_wait() kills all ->mm users (not only sub-threads);
perhaps we shouldn't/can't change this behaviour, I dunno.

And yes, do_coredump() does other unnecessary work like override_creds()
in this case, probably this needs some cleanups.

> coredump_wait() already waits for all the threads to respond (referring
> to the existing wait_for_completion() line before the proposed
> addition).

Yes,

> and in most cases, wait_task_inactive() will
> return success immediately.

I agree. Still, we add the O(n) loop which needs task_rq_lock() at least.

> And in the corner cases (where we hit the
> bug_on before) we will spin a bit now while the other thread is still on
> the rq.

Plus the possible schedule_hrtimeout().

But once again, I agree that this all is not very important.

> > Perhaps it makes sense to create a separate helper and call it from
> > do_coredump() right before "retval = binfmt->core_dump(&cprm)" ?
>
> I didn't want to spread the core dump waits at multiple places.
> coredump_wait() seems to be the natural place, as we are already waiting
> for other threads to join.

OK. We can shift this code later if needed.

In fact, perhaps we should move the whole "if (core_waiters > 0)" logic,
including wait_for_completion(), to increase the parallelism. Not sure,
this will complicate the error handling.

And, perhaps, we should use wait_task_inactive(TASK_UNINTERRUPTIBLE) and
change exit_mm() to do set_task_state() before atomic_dec_and_test(), but
this is almost off-topic.
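
Roughly like this, as a sketch only, against the exit_mm() snippet quoted
in the changelog (set the state before announcing ourselves, so the dumper
could use wait_task_inactive(task, TASK_UNINTERRUPTIBLE)):

		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
		if (atomic_dec_and_test(&core_state->nr_threads))
			complete(&core_state->startup);

		for (;;) {
			if (!self.task) /* see coredump_finish() */
				break;
			schedule();
			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
		}
		__set_task_state(tsk, TASK_RUNNING);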


I believe the patch is correct.

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  2012-05-13 16:11                         ` Oleg Nesterov
@ 2012-05-15 18:03                           ` Suresh Siddha
  2012-05-15 18:55                             ` Oleg Nesterov
  0 siblings, 1 reply; 30+ messages in thread
From: Suresh Siddha @ 2012-05-15 18:03 UTC (permalink / raw)
  To: Oleg Nesterov, torvalds; +Cc: hpa, mingo, linux-kernel, suresh

On Sun, 2012-05-13 at 18:11 +0200, Oleg Nesterov wrote:
> On 05/11, Suresh Siddha wrote:
> >
> > On Fri, 2012-05-11 at 18:51 +0200, Oleg Nesterov wrote:
> > > On 05/10, Suresh Siddha wrote:
> > > >
> > > > --- a/fs/exec.c
> > > > +++ b/fs/exec.c
> > > > @@ -1930,8 +1930,21 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
> > > >  		core_waiters = zap_threads(tsk, mm, core_state, exit_code);
> > > >  	up_write(&mm->mmap_sem);
> > > >
> > > > -	if (core_waiters > 0)
> > > > +	if (core_waiters > 0) {
> > > > +		struct core_thread *ptr;
> > > > +
> > > >  		wait_for_completion(&core_state->startup);
> > > > +		/*
> > > > +		 * Wait for all the threads to become inactive, so that
> > > > +		 * all the thread context (extended register state, like
> > > > +		 * fpu etc) gets copied to the memory.
> > > > +		 */
> > > > +		ptr = core_state->dumper.next;
> > > > +		while (ptr != NULL) {
> > > > +			wait_task_inactive(ptr->task, 0);
> > > > +			ptr = ptr->next;
> > > > +		}
> > > > +	}

Linus, are you ok with this?

> > and in most cases, wait_task_inactive() will
> > return success immediately.
> 
> I agree. Still, we add the O(n) loop which needs task_rq_lock() at least.

Yeah, I am not too happy with it. But we have only two choices, and I
presented both of them.

1. What I do in this patch: wait for all the threads to become inactive
so that the core dumping thread can proceed with the dump operation.

2. Call the equivalent of prepare_to_copy() in exit_mm() when
mm->core_state is set, so that each thread's latest state is flushed to
memory before we notify the main thread to start the core dump (a rough
sketch follows below). You, and I think even Linus, didn't like it.

Maybe there is a third one, which I don't want to pursue: using
finish_task_switch() to notify the main core-dumping thread that all the
other threads are now inactive. But adding a core-dump-related check
to the hot task-switch path is probably not a good idea :(

Given that a process exiting with a signal that leads to a coredump is not
a common thing, we can probably live with option 1.
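
For reference, option 2 would look roughly like the below in exit_mm()
(illustration of the rejected approach only, not a patch; on x86 the
flush would be unlazy_fpu(), other arches would need their own flush):

		/* each exiting thread flushes its own live extended
		 * register state before announcing it has joined */
		unlazy_fpu(tsk);
		if (atomic_dec_and_test(&core_state->nr_threads))
			complete(&core_state->startup);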

> I believe the patch is correct.
> 

Oleg, Can I treat this as your Ack for this patch?

thanks,
suresh


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  2012-05-15 18:03                           ` Suresh Siddha
@ 2012-05-15 18:55                             ` Oleg Nesterov
  0 siblings, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2012-05-15 18:55 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: torvalds, hpa, mingo, linux-kernel, suresh

On 05/15, Suresh Siddha wrote:
>
> Oleg, Can I treat this as your Ack for this patch?

Not sure I can ack the change in this area ;) But yes, personally
I think this is the simplest fix for now, whatever we do later.

Acked-by: Oleg Nesterov <oleg@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [tip:x86/fpu] fork: move the real prepare_to_copy() users to arch_dup_task_struct()
  2012-05-10 23:33                 ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Suresh Siddha
                                     ` (3 preceding siblings ...)
  2012-05-11  0:17                   ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Benjamin Herrenschmidt
@ 2012-05-17  0:16                   ` tip-bot for Suresh Siddha
  4 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Suresh Siddha @ 2012-05-17  0:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, torvalds, schwidefsky, cmetcalf, lennox.wu, tony.luck,
	chris, hskinnemoen, linux, suresh.b.siddha, gxt, deller, hpa,
	linux-kernel, yasutake.koichi, liqin.chen, dhowells, benh, rkuo,
	jdike, a-jacquiot, monstr, ralf, vapier, heiko.carstens, oleg,
	tglx, rth, paulus, richard, lethal, msalter, davem, jejb,
	starvik, jonas, hpa, ysato

Commit-ID:  55ccf3fe3f9a3441731aa79cf42a628fc4ecace9
Gitweb:     http://git.kernel.org/tip/55ccf3fe3f9a3441731aa79cf42a628fc4ecace9
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Wed, 16 May 2012 15:03:51 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 16 May 2012 15:16:26 -0700

fork: move the real prepare_to_copy() users to arch_dup_task_struct()

The historical prepare_to_copy() is mostly a no-op, duplicated across the
majority of architectures, with the rest following the x86 model of flushing
the extended register state (like the fpu) there.

Remove it and use the arch_dup_task_struct() instead.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.com
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/alpha/include/asm/processor.h      |    3 ---
 arch/arm/include/asm/processor.h        |    3 ---
 arch/avr32/include/asm/processor.h      |    3 ---
 arch/blackfin/include/asm/processor.h   |    2 --
 arch/c6x/include/asm/processor.h        |    3 ---
 arch/cris/include/asm/processor.h       |    4 ----
 arch/frv/include/asm/processor.h        |    2 --
 arch/frv/kernel/process.c               |   11 -----------
 arch/h8300/include/asm/processor.h      |    2 --
 arch/hexagon/include/asm/processor.h    |    7 -------
 arch/ia64/include/asm/processor.h       |    3 ---
 arch/m32r/include/asm/processor.h       |    2 --
 arch/m68k/include/asm/processor.h       |    3 ---
 arch/microblaze/include/asm/processor.h |    1 -
 arch/mips/include/asm/processor.h       |    3 ---
 arch/mn10300/include/asm/processor.h    |    3 ---
 arch/mn10300/kernel/process.c           |   10 ++++++----
 arch/openrisc/include/asm/processor.h   |    4 ----
 arch/parisc/include/asm/processor.h     |    3 ---
 arch/powerpc/include/asm/processor.h    |    3 ---
 arch/powerpc/kernel/process.c           |   19 +++++++++++--------
 arch/s390/include/asm/processor.h       |    3 ---
 arch/score/include/asm/processor.h      |    1 -
 arch/sh/include/asm/processor_32.h      |    3 ---
 arch/sh/include/asm/processor_64.h      |    1 -
 arch/sh/kernel/process.c                |    7 +++++++
 arch/sh/kernel/process_32.c             |    9 ---------
 arch/sparc/include/asm/processor_32.h   |    3 ---
 arch/sparc/include/asm/processor_64.h   |    3 ---
 arch/tile/include/asm/processor.h       |    3 ---
 arch/um/include/asm/processor-generic.h |    5 -----
 arch/unicore32/include/asm/processor.h  |    3 ---
 arch/x86/include/asm/processor.h        |    3 ---
 arch/x86/kernel/process.c               |    6 ++++++
 arch/x86/kernel/process_32.c            |    9 ---------
 arch/x86/kernel/process_64.c            |    9 ---------
 arch/xtensa/include/asm/processor.h     |    3 ---
 arch/xtensa/kernel/process.c            |    9 ++++++---
 kernel/fork.c                           |    2 --
 39 files changed, 36 insertions(+), 140 deletions(-)

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index 94afe58..e37b887 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -49,9 +49,6 @@ extern void start_thread(struct pt_regs *, unsigned long, unsigned long);
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /* Create a kernel thread without removing it from tasklists.  */
 extern long kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 5ac8d3d..6354224 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -77,9 +77,6 @@ struct task_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 unsigned long get_wchan(struct task_struct *p);
 
 #if __LINUX_ARM_ARCH__ == 6 || defined(CONFIG_ARM_ERRATA_754327)
diff --git a/arch/avr32/include/asm/processor.h b/arch/avr32/include/asm/processor.h
index 108502b..87d8bac 100644
--- a/arch/avr32/include/asm/processor.h
+++ b/arch/avr32/include/asm/processor.h
@@ -145,9 +145,6 @@ extern void release_thread(struct task_struct *);
 /* Create a kernel thread without removing it from tasklists */
 extern int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk) do { } while(0)
-
 /* Return saved PC of a blocked thread */
 #define thread_saved_pc(tsk)    ((tsk)->thread.cpu_context.pc)
 
diff --git a/arch/blackfin/include/asm/processor.h b/arch/blackfin/include/asm/processor.h
index 8af7772..4ef7cfe 100644
--- a/arch/blackfin/include/asm/processor.h
+++ b/arch/blackfin/include/asm/processor.h
@@ -75,8 +75,6 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern int kernel_thread(int (*fn) (void *), void *arg, unsigned long flags);
 
 /*
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 3ff7fab..c50af7e 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -92,9 +92,6 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 #define copy_segments(tsk, mm)		do { } while (0)
diff --git a/arch/cris/include/asm/processor.h b/arch/cris/include/asm/processor.h
index 4210d72..37f522f 100644
--- a/arch/cris/include/asm/processor.h
+++ b/arch/cris/include/asm/processor.h
@@ -50,10 +50,6 @@ struct task_struct;
 #define task_pt_regs(task) user_regs(task_thread_info(task))
 #define current_regs() task_pt_regs(current)
 
-static inline void prepare_to_copy(struct task_struct *tsk)
-{
-}
-
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 unsigned long get_wchan(struct task_struct *p);
diff --git a/arch/frv/include/asm/processor.h b/arch/frv/include/asm/processor.h
index 81c2e27..9c76817 100644
--- a/arch/frv/include/asm/processor.h
+++ b/arch/frv/include/asm/processor.h
@@ -103,8 +103,6 @@ do {							\
 	__frame->sp	= (_usp);			\
 } while(0)
 
-extern void prepare_to_copy(struct task_struct *tsk);
-
 /* Free all resources held by a thread. */
 static inline void release_thread(struct task_struct *dead_task)
 {
diff --git a/arch/frv/kernel/process.c b/arch/frv/kernel/process.c
index d4de48b..9f3dfad 100644
--- a/arch/frv/kernel/process.c
+++ b/arch/frv/kernel/process.c
@@ -180,17 +180,6 @@ asmlinkage int sys_clone(unsigned long clone_flags, unsigned long newsp,
 	return do_fork(clone_flags, newsp, __frame, 0, parent_tidptr, child_tidptr);
 } /* end sys_clone() */
 
-/*****************************************************************************/
-/*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
- */
-void prepare_to_copy(struct task_struct *tsk)
-{
-	//unlazy_fpu(tsk);
-} /* end prepare_to_copy() */
-
-/*****************************************************************************/
 /*
  * set up the kernel stack and exception frames for a new process
  */
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 61fabf1..4c9f6f8 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -109,8 +109,6 @@ static inline void release_thread(struct task_struct *dead_task)
 
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /*
  * Free current thread data structures etc..
  */
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index 20c5dda..e8ea459 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -59,13 +59,6 @@ struct thread_struct {
 #define cpu_relax() __vmyield()
 
 /*
- * "Unlazying all lazy status" occurs here.
- */
-static inline void prepare_to_copy(struct task_struct *tsk)
-{
-}
-
-/*
  * Decides where the kernel will search for a free chunk of vm space during
  * mmaps.
  * See also arch_get_unmapped_area.
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 483f6c6..efcca1b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -343,9 +343,6 @@ struct task_struct;
  */
 #define release_thread(dead_task)
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /*
  * This is the mechanism for creating a new kernel thread.
  *
diff --git a/arch/m32r/include/asm/processor.h b/arch/m32r/include/asm/processor.h
index e1f46d7..da17253 100644
--- a/arch/m32r/include/asm/processor.h
+++ b/arch/m32r/include/asm/processor.h
@@ -118,8 +118,6 @@ struct mm_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /*
  * create a kernel thread without removing it from tasklists
  */
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 46460fa..f17c42a 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -153,9 +153,6 @@ static inline void release_thread(struct task_struct *dead_task)
 {
 }
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 /*
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index bffb545..af2bb96 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -23,7 +23,6 @@ extern const struct seq_operations cpuinfo_op;
 
 # define cpu_relax()		barrier()
 # define cpu_sleep()		do {} while (0)
-# define prepare_to_copy(tsk)	do {} while (0)
 
 #define task_pt_regs(tsk) \
 		(((struct pt_regs *)(THREAD_SIZE + task_stack_page(tsk))) - 1)
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 20e9dcf..5e33fab 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -310,9 +310,6 @@ struct task_struct;
 /* Free all resources held by a thread. */
 #define release_thread(thread) do { } while(0)
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 extern unsigned long thread_saved_pc(struct task_struct *tsk);
diff --git a/arch/mn10300/include/asm/processor.h b/arch/mn10300/include/asm/processor.h
index f7b3c9a..247928c 100644
--- a/arch/mn10300/include/asm/processor.h
+++ b/arch/mn10300/include/asm/processor.h
@@ -139,9 +139,6 @@ static inline void start_thread(struct pt_regs *regs,
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-extern void prepare_to_copy(struct task_struct *tsk);
-
 /*
  * create a kernel thread without removing it from tasklists
  */
diff --git a/arch/mn10300/kernel/process.c b/arch/mn10300/kernel/process.c
index 14707f2..7dab0cd 100644
--- a/arch/mn10300/kernel/process.c
+++ b/arch/mn10300/kernel/process.c
@@ -208,12 +208,14 @@ void copy_segments(struct task_struct *p, struct mm_struct *new_mm)
 }
 
 /*
- * this gets called before we allocate a new thread and copy the current task
- * into it so that we can store lazy state into memory
+ * this gets called so that we can store lazy state into memory and copy the
+ * current task into the new thread.
  */
-void prepare_to_copy(struct task_struct *tsk)
+int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
-	unlazy_fpu(tsk);
+	unlazy_fpu(src);
+	*dst = *src;
+	return 0;
 }
 
 /*
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index f7516fa..30462f1 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -72,10 +72,6 @@ struct thread_struct {
 #define task_pt_regs(task) user_regs(task_thread_info(task))
 #define current_regs() user_regs(current_thread_info())
 
-extern inline void prepare_to_copy(struct task_struct *tsk)
-{
-}
-
 #define INIT_SP         (sizeof(init_stack) + (unsigned long) &init_stack)
 
 #define INIT_THREAD  { }
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index acdf4ca..0e8b7b8 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -328,9 +328,6 @@ struct mm_struct;
 extern void release_thread(struct task_struct *);
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern void map_hpux_gateway_page(struct task_struct *tsk, struct mm_struct *mm);
 
 extern unsigned long get_wchan(struct task_struct *p);
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 8e2d037..854f899 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -74,9 +74,6 @@ struct task_struct;
 void start_thread(struct pt_regs *regs, unsigned long fdptr, unsigned long sp);
 void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-extern void prepare_to_copy(struct task_struct *tsk);
-
 /* Create a new kernel thread. */
 extern long kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 4937c96..bc129f2 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -711,18 +711,21 @@ release_thread(struct task_struct *t)
 }
 
 /*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
+ * this gets called so that we can store coprocessor state into memory and
+ * copy the current task into the new thread.
  */
-void prepare_to_copy(struct task_struct *tsk)
+int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
-	flush_fp_to_thread(current);
-	flush_altivec_to_thread(current);
-	flush_vsx_to_thread(current);
-	flush_spe_to_thread(current);
+	flush_fp_to_thread(src);
+	flush_altivec_to_thread(src);
+	flush_vsx_to_thread(src);
+	flush_spe_to_thread(src);
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
-	flush_ptrace_hw_breakpoint(tsk);
+	flush_ptrace_hw_breakpoint(src);
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
+
+	*dst = *src;
+	return 0;
 }
 
 /*
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index d499b30..6cbf313 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -141,9 +141,6 @@ struct seq_file;
 extern void release_thread(struct task_struct *);
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 /*
  * Return saved PC of a blocked thread.
  */
diff --git a/arch/score/include/asm/processor.h b/arch/score/include/asm/processor.h
index 7e22f21..ab3aceb 100644
--- a/arch/score/include/asm/processor.h
+++ b/arch/score/include/asm/processor.h
@@ -26,7 +26,6 @@ extern unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()		barrier()
 #define release_thread(thread)	do {} while (0)
-#define prepare_to_copy(tsk)	do {} while (0)
 
 /*
  * User space process size: 2GB. This is hardcoded into a few places,
diff --git a/arch/sh/include/asm/processor_32.h b/arch/sh/include/asm/processor_32.h
index 900f8d7..b6311fd 100644
--- a/arch/sh/include/asm/processor_32.h
+++ b/arch/sh/include/asm/processor_32.h
@@ -126,9 +126,6 @@ extern void start_thread(struct pt_regs *regs, unsigned long new_pc, unsigned lo
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-void prepare_to_copy(struct task_struct *tsk);
-
 /*
  * create a kernel thread without removing it from tasklists
  */
diff --git a/arch/sh/include/asm/processor_64.h b/arch/sh/include/asm/processor_64.h
index e25c4c7..fe99afe 100644
--- a/arch/sh/include/asm/processor_64.h
+++ b/arch/sh/include/asm/processor_64.h
@@ -172,7 +172,6 @@ extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 #define copy_segments(p, mm)	do { } while (0)
 #define release_segments(mm)	do { } while (0)
 #define forget_segments()	do { } while (0)
-#define prepare_to_copy(tsk)	do { } while (0)
 /*
  * FPU lazy state save handling.
  */
diff --git a/arch/sh/kernel/process.c b/arch/sh/kernel/process.c
index 325f98b..2bde59e 100644
--- a/arch/sh/kernel/process.c
+++ b/arch/sh/kernel/process.c
@@ -6,8 +6,15 @@
 struct kmem_cache *task_xstate_cachep = NULL;
 unsigned int xstate_size;
 
+/*
+ * this gets called so that we can store lazy state into memory and copy the
+ * current task into the new thread.
+ */
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
+#ifdef CONFIG_SUPERH32
+	unlazy_fpu(src, task_pt_regs(src));
+#endif
 	*dst = *src;
 
 	if (src->thread.xstate) {
diff --git a/arch/sh/kernel/process_32.c b/arch/sh/kernel/process_32.c
index 94273aa..dee5961 100644
--- a/arch/sh/kernel/process_32.c
+++ b/arch/sh/kernel/process_32.c
@@ -155,15 +155,6 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpu)
 }
 EXPORT_SYMBOL(dump_fpu);
 
-/*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
- */
-void prepare_to_copy(struct task_struct *tsk)
-{
-	unlazy_fpu(tsk, task_pt_regs(tsk));
-}
-
 asmlinkage void ret_from_fork(void);
 
 int copy_thread(unsigned long clone_flags, unsigned long usp,
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 09521c6..c9c760f 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -109,9 +109,6 @@ static inline void start_thread(struct pt_regs * regs, unsigned long pc,
 #define release_thread(tsk)		do { } while(0)
 extern pid_t kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern unsigned long get_wchan(struct task_struct *);
 
 #define task_pt_regs(tsk) ((tsk)->thread.kregs)
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index e713db2..67df5cc 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -186,9 +186,6 @@ do { \
 /* Free all resources held by a thread. */
 #define release_thread(tsk)		do { } while (0)
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern pid_t kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
 extern unsigned long get_wchan(struct task_struct *task);
diff --git a/arch/tile/include/asm/processor.h b/arch/tile/include/asm/processor.h
index 34c1e01..15cd8a4 100644
--- a/arch/tile/include/asm/processor.h
+++ b/arch/tile/include/asm/processor.h
@@ -210,9 +210,6 @@ static inline void release_thread(struct task_struct *dead_task)
 	/* Nothing for now */
 }
 
-/* Prepare to copy thread state - unlazy all lazy status. */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 extern int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 
 extern int do_work_pending(struct pt_regs *regs, u32 flags);
diff --git a/arch/um/include/asm/processor-generic.h b/arch/um/include/asm/processor-generic.h
index 98d01bc..63b7160 100644
--- a/arch/um/include/asm/processor-generic.h
+++ b/arch/um/include/asm/processor-generic.h
@@ -76,11 +76,6 @@ static inline void release_thread(struct task_struct *task)
 
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
-static inline void prepare_to_copy(struct task_struct *tsk)
-{
-}
-
-
 extern unsigned long thread_saved_pc(struct task_struct *t);
 
 static inline void mm_copy_segments(struct mm_struct *from_mm,
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index f0d780a..14382cb 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -68,9 +68,6 @@ struct task_struct;
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy status */
-#define prepare_to_copy(tsk)	do { } while (0)
-
 unsigned long get_wchan(struct task_struct *p);
 
 #define cpu_relax()			barrier()
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 4fa7dcc..97fe043 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -579,9 +579,6 @@ extern int kernel_thread(int (*fn)(void *), void *arg, unsigned long flags);
 /* Free all resources held by a thread. */
 extern void release_thread(struct task_struct *);
 
-/* Prepare to copy thread state - unlazy all lazy state */
-extern void prepare_to_copy(struct task_struct *tsk);
-
 unsigned long get_wchan(struct task_struct *p);
 
 /*
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1d92a5a..b7e1e0e 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -47,10 +47,16 @@ EXPORT_SYMBOL_GPL(idle_notifier_unregister);
 struct kmem_cache *task_xstate_cachep;
 EXPORT_SYMBOL_GPL(task_xstate_cachep);
 
+/*
+ * this gets called so that we can store lazy state into memory and copy the
+ * current task into the new thread.
+ */
 int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
 	int ret;
 
+	unlazy_fpu(src);
+
 	*dst = *src;
 	if (fpu_allocated(&src->thread.fpu)) {
 		memset(&dst->thread.fpu, 0, sizeof(dst->thread.fpu));
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index ae68473..2aa57dc 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -126,15 +126,6 @@ void release_thread(struct task_struct *dead_task)
 	release_vm86_irqs(dead_task);
 }
 
-/*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
- */
-void prepare_to_copy(struct task_struct *tsk)
-{
-	unlazy_fpu(tsk);
-}
-
 int copy_thread(unsigned long clone_flags, unsigned long sp,
 	unsigned long unused,
 	struct task_struct *p, struct pt_regs *regs)
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 43d8b48..c4c0645 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -145,15 +145,6 @@ static inline u32 read_32bit_tls(struct task_struct *t, int tls)
 	return get_desc_base(&t->thread.tls_array[tls]);
 }
 
-/*
- * This gets called before we allocate a new thread and copy
- * the current task into it.
- */
-void prepare_to_copy(struct task_struct *tsk)
-{
-	unlazy_fpu(tsk);
-}
-
 int copy_thread(unsigned long clone_flags, unsigned long sp,
 		unsigned long unused,
 	struct task_struct *p, struct pt_regs *regs)
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index 3acb26e..5c371d8 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -168,9 +168,6 @@ struct mm_struct;
 /* Free all resources held by a thread. */
 #define release_thread(thread) do { } while(0)
 
-/* Prepare to copy thread state - unlazy all lazy status */
-extern void prepare_to_copy(struct task_struct*);
-
 /* Create a kernel thread without removing it from tasklists */
 extern int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
 
diff --git a/arch/xtensa/kernel/process.c b/arch/xtensa/kernel/process.c
index 6a2d6ed..9b306e5 100644
--- a/arch/xtensa/kernel/process.c
+++ b/arch/xtensa/kernel/process.c
@@ -140,13 +140,16 @@ void flush_thread(void)
 }
 
 /*
- * This is called before the thread is copied. 
+ * this gets called so that we can store coprocessor state into memory and
+ * copy the current task into the new thread.
  */
-void prepare_to_copy(struct task_struct *tsk)
+int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
 {
 #if XTENSA_HAVE_COPROCESSORS
-	coprocessor_flush_all(task_thread_info(tsk));
+	coprocessor_flush_all(task_thread_info(src));
 #endif
+	*dst = *src;
+	return 0;
 }
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index 687a15d..7aed746 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -261,8 +261,6 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
 	int node = tsk_fork_get_node(orig);
 	int err;
 
-	prepare_to_copy(orig);
-
 	tsk = alloc_task_struct_node(node);
 	if (!tsk)
 		return NULL;

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:x86/fpu] coredump: ensure the fpu state is flushed for proper multi-threaded core dump
  2012-05-10 23:33                   ` [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump Suresh Siddha
  2012-05-11 16:51                     ` Oleg Nesterov
@ 2012-05-17  0:17                     ` tip-bot for Suresh Siddha
  1 sibling, 0 replies; 30+ messages in thread
From: tip-bot for Suresh Siddha @ 2012-05-17  0:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, suresh, suresh.b.siddha, oleg, tglx, hpa

Commit-ID:  11aeca0b3a083a457f5c34fe8c677d5e86a0c6b3
Gitweb:     http://git.kernel.org/tip/11aeca0b3a083a457f5c34fe8c677d5e86a0c6b3
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Wed, 16 May 2012 15:03:52 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 16 May 2012 15:16:48 -0700

coredump: ensure the fpu state is flushed for proper multi-threaded core dump

Nalluru reported hitting the BUG_ON(__thread_has_fpu(tsk)) in
arch/x86/kernel/xsave.c:__sanitize_i387_state() during the coredump
of a multi-threaded application.

A look at the exit sequence shows that other threads can still be on the
runqueue, potentially at the exit_mm() code snippet shown below:

		if (atomic_dec_and_test(&core_state->nr_threads))
			complete(&core_state->startup);

===> other threads can still be active here, but we notify the thread
===> dumping core to wake up from coredump_wait() after the last thread
===> joins this point. The core dumping thread will then continue dumping
===> all the threads' state to the core file.

		for (;;) {
			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
			if (!self.task) /* see coredump_finish() */
				break;
			schedule();
		}

As some of those threads are on the runqueue and haven't called schedule() yet,
their fpu state is still active in the live registers, and the thread
proceeding with the coredump will hit the above-mentioned BUG_ON while
trying to dump the other threads' fpu state to the coredump file.

The BUG_ON() in arch/x86/kernel/xsave.c:__sanitize_i387_state() is
in the code paths for processors supporting xsaveopt. With or without
xsaveopt, the multi-threaded coredump is broken and may not contain
the correct fpu state at the time of exit.

In coredump_wait(), wait for all the threads to become inactive, so
that we are sure all the extended register state is flushed to memory
and can be reliably copied to the core file.

Reported-by: Suresh Nalluru <suresh@aristanetworks.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-2-git-send-email-suresh.b.siddha@intel.com
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 fs/exec.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index b1fd202..8e2ddeb 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1930,8 +1930,21 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
 		core_waiters = zap_threads(tsk, mm, core_state, exit_code);
 	up_write(&mm->mmap_sem);
 
-	if (core_waiters > 0)
+	if (core_waiters > 0) {
+		struct core_thread *ptr;
+
 		wait_for_completion(&core_state->startup);
+		/*
+		 * Wait for all the threads to become inactive, so that
+		 * all the thread context (extended register state, like
+		 * fpu etc) gets copied to the memory.
+		 */
+		ptr = core_state->dumper.next;
+		while (ptr != NULL) {
+			wait_task_inactive(ptr->task, 0);
+			ptr = ptr->next;
+		}
+	}
 
 	return core_waiters;
 }

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:x86/fpu] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()
  2012-05-10 23:33                   ` [PATCH v2 3/4] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
@ 2012-05-17  0:18                     ` tip-bot for Suresh Siddha
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Suresh Siddha @ 2012-05-17  0:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, suresh.b.siddha, oleg, tglx, hpa

Commit-ID:  d75f1b391f5ef73016d14bc6f7e4725820ebaa5b
Gitweb:     http://git.kernel.org/tip/d75f1b391f5ef73016d14bc6f7e4725820ebaa5b
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Wed, 16 May 2012 15:03:53 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 16 May 2012 15:17:17 -0700

x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state()

Code paths like fork(), exit() and signal handling flush the fpu
state explicitly to the structures in memory.

BUG_ON() in __sanitize_i387_state() is checking that the fpu state
is not live any more. But for preempt kernels, task can be scheduled
out and in at any place and the preload_fpu logic during context switch
can make the fpu registers live again.

For example, consider a 64-bit task which uses the fpu frequently, so its
fpu_counter is mostly non-zero. During its time slice, the kernel uses the
fpu via kernel_fpu_begin()/kernel_fpu_end(). Later in the same scheduling
slice, the task gets a signal to handle. If we get preempted in the signal
setup path just before the sanitize_i387_state() in
arch/x86/kernel/xsave.c:save_i387_xstate(), the fpu registers will be live
again when we are scheduled back in, and we can hit the bug_on.

Similarly, during a core dump, other threads can context-switch in and out
(because of spurious wakeups while waiting for the coredump to finish in
 kernel/exit.c:exit_mm()) and the main thread dumping core can run into this
bug when it finds some other thread with its fpu registers live on another cpu.

So remove the paranoid check for now, even though it caught a bug in the
multi-threaded core dump case (fixed in the previous patch).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-3-git-send-email-suresh.b.siddha@intel.com
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/xsave.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index e62728e..bd18149 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -48,8 +48,6 @@ void __sanitize_i387_state(struct task_struct *tsk)
 	if (!fx)
 		return;
 
-	BUG_ON(__thread_has_fpu(tsk));
-
 	xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;
 
 	/*

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [tip:x86/fpu] x86, fpu: drop the fpu state during thread exit
  2012-05-10 23:33                   ` [PATCH v2 4/4] x86, fpu: drop the fpu state during thread exit Suresh Siddha
@ 2012-05-17  0:19                     ` tip-bot for Suresh Siddha
  0 siblings, 0 replies; 30+ messages in thread
From: tip-bot for Suresh Siddha @ 2012-05-17  0:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, suresh.b.siddha, oleg, tglx, hpa

Commit-ID:  1dcc8d7ba235a316a056f993e88f0d18b92c60d9
Gitweb:     http://git.kernel.org/tip/1dcc8d7ba235a316a056f993e88f0d18b92c60d9
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Wed, 16 May 2012 15:03:54 -0700
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Wed, 16 May 2012 15:20:59 -0700

x86, fpu: drop the fpu state during thread exit

There is no need to save any active fpu state to the task structure
memory if the task is dead. Just drop the state instead.

For example, this saved some 1770 xsaves during the boot
of a two-socket Xeon system.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-4-git-send-email-suresh.b.siddha@intel.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
 arch/x86/kernel/process.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b7e1e0e..1219fe2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -87,6 +87,16 @@ void arch_task_cache_init(void)
 				  SLAB_PANIC | SLAB_NOTRACK, NULL);
 }
 
+static inline void drop_fpu(struct task_struct *tsk)
+{
+	/*
+	 * Forget coprocessor state..
+	 */
+	tsk->fpu_counter = 0;
+	clear_fpu(tsk);
+	clear_used_math();
+}
+
 /*
  * Free current thread data structures etc..
  */
@@ -109,6 +119,8 @@ void exit_thread(void)
 		put_cpu();
 		kfree(bp);
 	}
+
+	drop_fpu(me);
 }
 
 void show_regs(struct pt_regs *regs)
@@ -149,12 +161,7 @@ void flush_thread(void)
 
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
-	/*
-	 * Forget coprocessor state..
-	 */
-	tsk->fpu_counter = 0;
-	clear_fpu(tsk);
-	clear_used_math();
+	drop_fpu(tsk);
 }
 
 static void hard_disable_TSC(void)

^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2012-05-17  0:19 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-07 19:07 [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump Suresh Siddha
2012-05-07 19:07 ` [PATCH 2/2] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
2012-05-07 19:15 ` [PATCH 1/2] coredump: flush the fpu exit state for proper multi-threaded core dump Linus Torvalds
2012-05-07 20:09   ` Suresh Siddha
2012-05-08 23:18     ` Suresh Siddha
2012-05-08 23:18       ` [PATCH 1/3] " Suresh Siddha
2012-05-09 21:05         ` Oleg Nesterov
2012-05-09 21:32           ` Suresh Siddha
2012-05-10 16:55             ` Oleg Nesterov
2012-05-10 17:04               ` Linus Torvalds
2012-05-10 23:33                 ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Suresh Siddha
2012-05-10 23:33                   ` [PATCH v2 2/4] coredump: ensure the fpu state is flushed for proper multi-threaded core dump Suresh Siddha
2012-05-11 16:51                     ` Oleg Nesterov
2012-05-11 19:05                       ` Suresh Siddha
2012-05-13 16:11                         ` Oleg Nesterov
2012-05-15 18:03                           ` Suresh Siddha
2012-05-15 18:55                             ` Oleg Nesterov
2012-05-17  0:17                     ` [tip:x86/fpu] " tip-bot for Suresh Siddha
2012-05-10 23:33                   ` [PATCH v2 3/4] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
2012-05-17  0:18                     ` [tip:x86/fpu] " tip-bot for Suresh Siddha
2012-05-10 23:33                   ` [PATCH v2 4/4] x86, fpu: drop the fpu state during thread exit Suresh Siddha
2012-05-17  0:19                     ` [tip:x86/fpu] " tip-bot for Suresh Siddha
2012-05-11  0:17                   ` [PATCH v2 1/4] fork: move the real prepare_to_copy() users to arch_dup_task_struct() Benjamin Herrenschmidt
2012-05-17  0:16                   ` [tip:x86/fpu] " tip-bot for Suresh Siddha
2012-05-10 23:48                 ` [PATCH 1/3] coredump: flush the fpu exit state for proper multi-threaded core dump Suresh Siddha
2012-05-08 23:18       ` [PATCH 2/3] x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() Suresh Siddha
2012-05-09 20:30         ` Oleg Nesterov
2012-05-09 21:18           ` Suresh Siddha
2012-05-10 16:36             ` Oleg Nesterov
2012-05-08 23:18       ` [PATCH 3/3] x86, fpu: clear the fpu state during thread exit Suresh Siddha
