* issue with x86 FPU state after suspend to ram
@ 2012-11-30 18:52 Vincent Palatin
  2012-11-30 18:52 ` [PATCH] x86, fpu: avoid FPU lazy restore after suspend Vincent Palatin
From: Vincent Palatin @ 2012-11-30 18:52 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, linux-kernel, Linus Torvalds
  Cc: Thomas Gleixner, x86, Peter Zijlstra, Jarkko Sakkinen,
	Duncan Laurie, Olof Johansson

Hi,

On a 4-core Ivybridge platform, when doing a lot of suspend-to-ram/resume
cycles, we were observing processes randomly killed by a SIGFPE.
When dumping the FPU register state on the SIGFPE (usually a floating point
stack underflow/overflow on a floating point arithmetic operation), the FPU
registers look empty or at least corrupted, which should be more or less
impossible given the disassembled floating point code.

After doing more tracing, in the faulty case, the process seems to keep
FPU ownership across a secondary CPU unplug/re-plug triggered by the suspend.
Then it does a lazy restore of its FPU context (i.e. just using the current
FPU hardware registers, as it is the owner) instead of writing them back
to the hardware from the version previously saved in the task context,
despite the fact that the whole FPU hardware state has been lost.

Just invalidating the "fpu_owner_task" when disabling a secondary CPU seems
to solve my issue (it's already reset for the primary CPU).
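
The failure mode and the proposed fix can be sketched with a small user-space
toy model in C. This is not kernel code: only the name "fpu_owner_task" comes
from this message. The struct, the helper functions, the single integer that
stands in for the whole FPU register file, and the assumption that a replugged
CPU comes back with zeroed registers are all hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Toy task: one int stands in for the whole saved FPU context. */
struct task {
    int saved_fpu;  /* copy saved in the task context */
    int last_cpu;   /* CPU whose registers last held this state, -1 if none */
};

static int hw_fpu[NR_CPUS];                  /* simulated hardware registers */
static struct task *fpu_owner_task[NR_CPUS]; /* per-CPU owner (name from the email) */

/* Lazy restore is allowed only while the CPU still records this task as owner. */
static bool can_lazy_restore(const struct task *t, int cpu)
{
    return fpu_owner_task[cpu] == t && t->last_cpu == cpu;
}

/* Switch out: save the registers, but keep ownership for a later lazy restore. */
static void switch_out(struct task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    t->saved_fpu = hw_fpu[cpu];
    t->last_cpu = cpu;
    fpu_owner_task[cpu] = t;
}

/* Switch in: reload from the saved copy unless a lazy restore looks valid.
 * Returns the value the task will actually see in the FPU. */
static int switch_in(struct task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (!can_lazy_restore(t, cpu)) {
        hw_fpu[cpu] = t->saved_fpu;
        fpu_owner_task[cpu] = t;
        t->last_cpu = cpu;
    }
    return hw_fpu[cpu];
}

/* CPU unplug/re-plug: the hardware state is lost (modelled as zeroed).
 * invalidate_owner models the proposed fix of clearing the owner. */
static void cpu_replug(int cpu, bool invalidate_owner)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    hw_fpu[cpu] = 0;
    if (invalidate_owner)
        fpu_owner_task[cpu] = NULL;
}
```

In this model, calling cpu_replug(cpu, false) between switch_out() and
switch_in() makes the task see the reset register value instead of its saved
one, which matches the corruption described above. Calling
cpu_replug(cpu, true) forces the reload from the saved copy.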

By the way, when the FPU lazy restore patch was discussed back in February,
Ingo commented (in http://permalink.gmane.org/gmane.linux.kernel/1255423):
"
I guess the CPU hotplug case deserves a comment in the code: CPU
hotplug + replug of the same (but meanwhile reset) CPU is safe
because fpu_owner_task[cpu] gets reset to NULL.
"
That contradicts my previous observation, so maybe I have totally overlooked
something in this mechanism.
Can you comment?

I'm still putting my patch proposal in this thread.
The issue seems to have existed since 3.4, when FPU lazy restore was actually
implemented by commit 7e16838d "i387: support lazy restore of FPU state".
It is mainly visible on 3.4 and 3.6: on tip of tree it is hidden by the
eager FPU implementation for platforms with xsave support, but it still
happens with eagerfpu=off.

To apply this change to 3.4, "this_cpu_write" needs to be replaced by
percpu_write.

--
Vincent



Thread overview: 13+ messages
2012-11-30 18:52 issue with x86 FPU state after suspend to ram Vincent Palatin
2012-11-30 18:52 ` [PATCH] x86, fpu: avoid FPU lazy restore after suspend Vincent Palatin
2012-11-30 18:57   ` H. Peter Anvin
2012-11-30 19:25   ` Linus Torvalds
2012-11-30 19:38     ` H. Peter Anvin
2012-11-30 19:41       ` Linus Torvalds
2012-11-30 19:51         ` H. Peter Anvin
     [not found]           ` <CAP_ceTxmMhQeDi=x9HmYke85hKMg3_YhbXSnfDC12rOcocQJpA@mail.gmail.com>
2012-11-30 19:55             ` H. Peter Anvin
2012-11-30 21:45               ` Vincent Palatin
2012-11-30 19:52         ` [PATCH v2] " Vincent Palatin
2012-11-30 20:15           ` [PATCH v3] " Vincent Palatin
2012-11-30 22:10             ` [tip:x86/urgent] x86, fpu: Avoid " tip-bot for Vincent Palatin
2012-11-30 19:26   ` tip-bot for Vincent Palatin