From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Simon Gaiser <simon@invisiblethingslab.com>,
xen-devel@lists.xenproject.org
Cc: Jan Beulich <jbeulich@suse.com>
Subject: Re: Resume from suspend to RAM broken when using early microcode updates
Date: Wed, 11 Apr 2018 13:12:51 +0100
Message-ID: <f587c412-68a7-4f19-c52a-1bb0c3c1a54f@citrix.com>
In-Reply-To: <d8da2a1f-b2e8-f8c3-4e33-bc84ac0cc7d4@invisiblethingslab.com>

On 11/04/18 13:01, Simon Gaiser wrote:
> Andrew Cooper:
>> On 11/04/18 12:48, Simon Gaiser wrote:
>>> Hi,
>>>
>>> when I use early microcode loading with the microcode update that
>>> contains the BTI mitigations, resuming from suspend to RAM is broken.
>>>
>>> Based on logging I added to enter_state() (from power.c), it doesn't
>>> survive the local_irq_restore(flags) call (at least a printk() after the
>>> call doesn't output anything on the serial console).
>>>
>>> I guess that some irq handler tries to use IBRS/IBPB. But the microcode
>>> is only loaded later.
>>>
>>> If I simply move the microcode_resume_cpu(0) directly before the
>>> local_irq_restore(flags) everything seems to work fine. But I'm not sure
>>> if this has unintended consequences.
>>>
>>> I tested the above with Xen 4.8.3 from Qubes which includes the BTI and
>>> microcode patches from staging-4.8. AFAICS there are no commits which
>>> change the affected code or other commits which sound relevant, so this
>>> probably also affects all the newer branches.
>> S3 support is a very unloved area of the hypervisor.
>>
>> Yes - we definitely need to get microcode reloaded before interrupts are
>> enabled.
> Do you see any problems with simply moving microcode_resume_cpu(0)
> directly before the local_irq_restore(flags) call? (I'm not familiar
> with the code at all and (early) resume handling sounds like something
> which is easy to break in non-obvious ways.)
Judging by what is going on, it wants to be between tboot_s3_error() and
the done label.

We only need to restore microcode if we successfully went into S3. The
done and enable_cpu labels are only used by paths which don't need to
restore microcode.

OTOH, you should check the return value and panic if restoration
failed. As you've seen, the system won't survive trying to blindly
continue resuming.
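
To make the ordering concrete, here is a rough, self-contained sketch of the
control flow described above. It is not the real enter_state() from power.c:
every name in it (sketch_enter_state(), the stub_* helpers, struct
resume_trace, the RESUME_* values) is a hypothetical stand-in, and it assumes
the microcode restore reports success as 0. It only models "reload microcode
after a real S3 entry, bail out on failure, and only then re-enable
interrupts".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Xen definitions. */
enum resume_result { RESUME_OK, RESUME_PANIC };

struct resume_trace {
    bool microcode_reloaded;            /* restore step ran and succeeded */
    bool irqs_enabled;                  /* local_irq_restore() stand-in ran */
    bool irqs_enabled_before_microcode; /* the bug reported in this thread */
};

/* Stub for microcode_resume_cpu(0); assumed to return 0 on success. */
static int stub_microcode_resume(struct resume_trace *t, bool fail)
{
    if (fail)
        return -1;
    t->microcode_reloaded = true;
    return 0;
}

/* Stub for local_irq_restore(flags); records whether the order was wrong. */
static void stub_irq_restore(struct resume_trace *t, bool need_ucode)
{
    if (need_ucode && !t->microcode_reloaded)
        t->irqs_enabled_before_microcode = true;
    t->irqs_enabled = true;
}

enum resume_result sketch_enter_state(bool entered_s3, bool ucode_fails,
                                      struct resume_trace *t)
{
    assert(t != NULL);
    *t = (struct resume_trace){0};

    /* Paths that never reached S3 still have their microcode loaded. */
    if (!entered_s3)
        goto done;

    /* ... the tboot_s3_error() check would sit here ... */

    /* Restore microcode before anything can take an interrupt. */
    if (stub_microcode_resume(t, ucode_fails) != 0)
        return RESUME_PANIC; /* stands in for panic() */

 done:
    stub_irq_restore(t, entered_s3);
    return RESUME_OK;
}
```

Calling sketch_enter_state(true, false, &t) models a normal resume, and
passing ucode_fails = true models the case where we must not carry on.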
>
>> That said, I would have expected a backtrace complaining about
>> a GP fault if we had hit the use of IBRS/IBPB before the microcode was
>> reloaded.
> Yeah, not sure what's happening here. I don't get any output from after
> local_irq_restore(flags). If you have some ideas for more debug output I
> can easily test it.
In hindsight, I think I know what is happening. We take a #GP fault
because of a bad MSR, and at the head of the exception handler try to use
the same bad MSR. It will repeatedly fault until hitting a guard page (or
other read-only page), at which point we take a double fault, and suffer
a #GP yet again.

Taking a #DF will reset the stack to a moderately sane value, and the
system will livelock taking faults.

This is an unfortunate consequence of having $MAGIC in the exception
handlers.
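
As a toy model of that loop (purely illustrative, nothing like the real
entry code): the sketch below counts faults under the assumption that every
handler entry touches the missing MSR and consumes one stack frame, and that
a #DF resets the stack depth to zero. STACK_FRAMES, MAX_STEPS, struct
fault_stats and simulate_faults() are all made-up names and numbers.

```c
#include <stdbool.h>

/* Arbitrary illustrative values, not real stack sizes. */
enum { STACK_FRAMES = 8, MAX_STEPS = 100 };

struct fault_stats {
    unsigned gp_faults;     /* #GP taken on the bad MSR access */
    unsigned double_faults; /* #DF taken on hitting the guard page */
    bool made_progress;     /* did we ever get past handler entry? */
};

struct fault_stats simulate_faults(bool msr_available)
{
    struct fault_stats s = { 0, 0, false };
    unsigned depth = 0;

    /* With the microcode loaded the MSR access just works. */
    if (msr_available) {
        s.made_progress = true;
        return s;
    }

    /* Bounded here; on real hardware nothing stops the loop. */
    for (unsigned step = 0; step < MAX_STEPS; step++) {
        /* The MSR access faults, and handler entry repeats the access. */
        s.gp_faults++;
        depth++;

        if (depth == STACK_FRAMES) {
            /* Guard page hit: #DF resets the stack, then we fault again. */
            s.double_faults++;
            depth = 0;
        }
    }

    return s;
}
```

The point of the model is only that made_progress never becomes true once
the MSR is missing, however many faults are taken, which matches seeing no
output at all on the serial console.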
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel