From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Simon Gaiser <simon@invisiblethingslab.com>,
Jan Beulich <JBeulich@suse.com>,
xen-devel <xen-devel@lists.xenproject.org>
Cc: Juergen Gross <jgross@suse.com>
Subject: Re: [PATCH 0/3] x86: S3 resume adjustments
Date: Sun, 15 Apr 2018 18:34:02 +0100
Message-ID: <7ac35f02-c3e3-9572-c41e-9c0fa4210afb@citrix.com>
In-Reply-To: <72e260c1-1804-1526-2b94-b7dda32313a3@invisiblethingslab.com>
On 15/04/18 16:52, Simon Gaiser wrote:
> Andrew Cooper:
>> On 14/04/18 06:49, Simon Gaiser wrote:
>>> Jan Beulich:
>>>> 1: correct ordering of operations during S3 resume
>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>> 3: check feature flags after resume
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> Simon, could you give this a try please?
>>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>>
>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>> panics about a double fault somewhere after it starts to enable the
>>> non-boot CPUs. Since the IBRS/IPBP problem happens before that point I
>>> could test the patches anyway. With them it gets again to the point
>>> where it double faults. So the patches are most likely fine.
>>>
>>> I didn't really looked yet at the cause of the double fault.
>> Do you at least have the crash log from the attempt?
> Sure, it' a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
> Debian sid:
I can't find that object. I presume this isn't an upstream tree?
>
> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs ...
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
Bad dom0. It shouldn't be playing with APIC_BASE at all, but I guess
this means I can't fix the hypervisor behaviour to throw #GP back at a
PV guest.
> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7
Can you disassemble the binary and find out where this is? On current
staging, handle_exception+0x9c is in the middle of
SPEC_CTRL_ENTRY_FROM_INTR, but this might not be the case for you.
> (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
> (XEN) rax: ffffc90040cd4068 rbx: 0000000000000000 rcx: 000000000000000a
> (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: 0000000000000000
> (XEN) rbp: 000036ffbf32bf77 rsp: ffffc90040cd4000 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: ffffc90040cd7fff
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426e0
> (XEN) cr3: 000000022200a000 cr2: ffffc90040cd3ff8
> (XEN) fsb: 0000000000000000 gsb: ffff88021e6c0000 gss: 0000000000000000
> (XEN) ds: 002b es: 002b fs: 8a00 gs: 0010 ss: e010 cs: e008
> (XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0
Given the %rsp and %cr2 values, it looks like we entered with a bad %rsp
pointing into a region which isn't mapped, tried to push a value, and got
#PF. Invoking the #PF exception handler then faulted again on the same
stack, which escalated to #DF, which followed the TSS and moved back to
reality.
The only way to come in with stack pointers other than TSS.RSP0 is via
syscall and sysenter. SYSENTER_ESP should be identical to TSS.RSP0
--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -257,6 +257,13 @@ void do_double_fault(struct cpu_user_regs *regs)
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
     show_stack_overflow(cpu, regs);
+    {
+        uint64_t val;
+
+        rdmsrl(MSR_IA32_SYSENTER_ESP, val);
+        printk("*** SYSENTER_ESP: %p\n", _p(val));
+    }
+
     panic("DOUBLE FAULT -- system shutdown");
 }
so this bit of debugging should help track things down. If not, then
we've probably got an issue (re)writing the syscall trampolines.
~Andrew