linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vincent Palatin <vpalatin@chromium.org>
To: Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org, Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Jarkko Sakkinen <jarkko.sakkinen@intel.com>,
	Duncan Laurie <dlaurie@chromium.org>,
	Olof Johansson <olofj@chromium.org>
Subject: issue with x86 FPU state after suspend to ram
Date: Fri, 30 Nov 2012 10:52:02 -0800	[thread overview]
Message-ID: <1354301523-5252-1-git-send-email-vpalatin@chromium.org> (raw)

Hi,

On a 4-core Ivybridge platform, when doing a lot of suspend-to-ram/resume
cycles, we were observing processes randomly killed by a SIGFPE.
When dumping the FPU registers state on the SIGFPE (usually a floating stack
underflow/overflow on a floating point arithmetic operation), the FPU registers
looks empty or at least corrupted which was more or less impossible with
respect to the disassembled floating point code.

After doing more tracing, in the faulty case, the process seems to be keeping
FPU ownership over a secondary CPU unplug/re-plug triggered by the suspend.
Then it's doing a lazy restore of its FPU context (ie just using the current
FPU hardware registers as he is the owner) instead of writing them back
to the hardware from the version previously saved in the task context,
despite the fact the whole FPU hardware state has been lost.

Just invalidating the "fpu_owner_task" when disabling a secondary CPU seems
to solve my issue (it's already reset for the primary CPU).

By the way, when FPU the lazy restore patch was discussed back in february,
Ingo commented (in http://permalink.gmane.org/gmane.linux.kernel/1255423) :
"
I guess the CPU hotplug case deserves a comment in the code: CPU
hotplug + replug of the same (but meanwhile reset) CPU is safe
because fpu_owner_task[cpu] gets reset to NULL.
"
That contradicts my previous observation, so maybe I have totally overlooked
something in this mechanism.
Can you comment ?

I'm still putting my patch proposal in this thread.
The issue seems to exist since 3.4 after the FPU lazy restore was actually
implemented by commit 7e16838d "i387: support lazy restore of FPU state".
But the issue is mainly visible on 3.4 and 3.6 since on tip of tree, it is
hidden by the eager fpu implementation for platforms with xsave support,
but it still happens with eagerfpu=off.

To apply this change to 3.4, "this_cpu_write" needs to be replaced by
percpu_write.

--
Vincent


             reply	other threads:[~2012-11-30 18:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-30 18:52 Vincent Palatin [this message]
2012-11-30 18:52 ` [PATCH] x86, fpu: avoid FPU lazy restore after suspend Vincent Palatin
2012-11-30 18:57   ` H. Peter Anvin
2012-11-30 19:25   ` Linus Torvalds
2012-11-30 19:38     ` H. Peter Anvin
2012-11-30 19:41       ` Linus Torvalds
2012-11-30 19:51         ` H. Peter Anvin
     [not found]           ` <CAP_ceTxmMhQeDi=x9HmYke85hKMg3_YhbXSnfDC12rOcocQJpA@mail.gmail.com>
2012-11-30 19:55             ` H. Peter Anvin
2012-11-30 21:45               ` Vincent Palatin
2012-11-30 19:52         ` [PATCH v2] " Vincent Palatin
2012-11-30 20:15           ` [PATCH v3] " Vincent Palatin
2012-11-30 22:10             ` [tip:x86/urgent] x86, fpu: Avoid " tip-bot for Vincent Palatin
2012-11-30 19:26   ` tip-bot for Vincent Palatin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1354301523-5252-1-git-send-email-vpalatin@chromium.org \
    --to=vpalatin@chromium.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=dlaurie@chromium.org \
    --cc=hpa@zytor.com \
    --cc=jarkko.sakkinen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=olofj@chromium.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).