Re: [PATCH 0/3] x86: S3 resume adjustments

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Simon Gaiser <simon@invisiblethingslab.com>,
	Jan Beulich <JBeulich@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Cc: Juergen Gross <jgross@suse.com>
Subject: Re: [PATCH 0/3] x86: S3 resume adjustments
Date: Sun, 15 Apr 2018 18:34:02 +0100
Message-ID: <7ac35f02-c3e3-9572-c41e-9c0fa4210afb@citrix.com>
In-Reply-To: <72e260c1-1804-1526-2b94-b7dda32313a3@invisiblethingslab.com>

On 15/04/18 16:52, Simon Gaiser wrote:
> Andrew Cooper:
>> On 14/04/18 06:49, Simon Gaiser wrote:
>>> Jan Beulich:
>>>> 1: correct ordering of operations during S3 resume
>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>> 3: check feature flags after resume
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> Simon, could you give this a try please?
>>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>>
>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>> panics about a double fault somewhere after it starts to enable the
>>> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
>>> could test the patches anyway. With them it gets again to the point
>>> where it double faults. So the patches are most likely fine.
>>>
>>> I didn't really look yet at the cause of the double fault.
>> Do you at least have the crash log from the attempt?
> Sure, it's a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
> Debian sid:

I can't find that object.  I presume this isn't an upstream tree?

>
> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs  ...
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800

Bad dom0.  It shouldn't be playing with APIC_BASE at all, but I guess
this means I can't fix the hypervisor behaviour to throw #GP back at a
PV guest.
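
For reference, the values in those WRMSR lines differ only in bits 10 and
11 of APIC_BASE.  The following is a rough sketch (not Xen code; the enum
and function names are made up for illustration) which decodes them using
the architectural bit positions.  Under that reading, dom0 observes
x2APIC mode (0xfee00c00) and tries to write first the fully disabled
value (0xfee00000) and then the xAPIC-mode value (0xfee00800).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in IA32_APIC_BASE (MSR 0x1b). */
#define APIC_BASE_BSP    (UINT64_C(1) << 8)   /* bootstrap processor flag */
#define APIC_BASE_EXTD   (UINT64_C(1) << 10)  /* x2APIC mode enable */
#define APIC_BASE_ENABLE (UINT64_C(1) << 11)  /* APIC global enable */

/* Hypothetical names, used only for this illustration. */
enum apic_mode { APIC_DISABLED, APIC_XAPIC, APIC_X2APIC, APIC_INVALID };

/* Classify an APIC_BASE value by its EN and EXTD bits. */
static enum apic_mode apic_mode_of(uint64_t apic_base)
{
    bool en   = apic_base & APIC_BASE_ENABLE;
    bool extd = apic_base & APIC_BASE_EXTD;

    if ( !en )
        return extd ? APIC_INVALID : APIC_DISABLED; /* EXTD without EN is reserved */

    return extd ? APIC_X2APIC : APIC_XAPIC;
}
```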

> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7

Can you disassemble the binary and find out where this is?  On current
staging, handle_exception+0x9c is in the middle of
SPEC_CTRL_ENTRY_FROM_INTR but this might not be the case for you.

> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> (XEN) rax: ffffc90040cd4068   rbx: 0000000000000000   rcx: 000000000000000a
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
> (XEN) rbp: 000036ffbf32bf77   rsp: ffffc90040cd4000   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc90040cd7fff
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000426e0
> (XEN) cr3: 000000022200a000   cr2: ffffc90040cd3ff8
> (XEN) fsb: 0000000000000000   gsb: ffff88021e6c0000   gss: 0000000000000000
> (XEN) ds: 002b   es: 002b   fs: 8a00   gs: 0010   ss: e010   cs: e008
> (XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0

Given the %rsp and %cr2 values, it looks like we have a bad %rsp over a
region which isn't mapped, tried to push a value, got #PF, tried to
invoke the #PF exception handler which faulted again, and escalated to
#DF which followed the TSS and moved back to reality.
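
To spell out the arithmetic: a 64-bit push writes to %rsp - 8, and with
%rsp = ffffc90040cd4000 that is ffffc90040cd3ff8, which is exactly the
%cr2 value in the dump.  A minimal sketch of that check follows (the
helper names are hypothetical, and treating the reported "Valid stack
range" as a half-open interval is my assumption, not something the log
states):

```c
#include <stdbool.h>
#include <stdint.h>

/* Address which an 8-byte push would write to in 64-bit mode. */
static uint64_t push_target(uint64_t rsp)
{
    return rsp - 8;
}

/* True if addr lies within [lo, hi).  Half-open bounds are an assumption. */
static bool in_range(uint64_t addr, uint64_t lo, uint64_t hi)
{
    return addr >= lo && addr < hi;
}
```

Feeding in the values from the register dump, the push target matches
%cr2, and %rsp sits outside the reported valid stack range.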

The only way to come in with stack pointers other than TSS.RSP0 is via
syscall and sysenter.  SYSENTER_ESP should be identical to TSS.RSP0

--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -257,6 +257,13 @@ void do_double_fault(struct cpu_user_regs *regs)
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
     show_stack_overflow(cpu, regs);
 
+    {
+        uint64_t val;
+
+        rdmsrl(MSR_IA32_SYSENTER_ESP, val);
+        printk("*** SYSENTER_ESP: %p\n", _p(val));
+    }
+
     panic("DOUBLE FAULT -- system shutdown");
 }
 
so this bit of debugging should help track things down.  If not, then
we've probably got an issue (re)writing the syscall trampolines.

~Andrew
