From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Simon Gaiser <simon@invisiblethingslab.com>,
	Jan Beulich <JBeulich@suse.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Cc: Juergen Gross <jgross@suse.com>
Subject: Re: [PATCH 0/3] x86: S3 resume adjustments
Date: Sun, 15 Apr 2018 18:34:02 +0100
Message-ID: <7ac35f02-c3e3-9572-c41e-9c0fa4210afb@citrix.com>
In-Reply-To: <72e260c1-1804-1526-2b94-b7dda32313a3@invisiblethingslab.com>

On 15/04/18 16:52, Simon Gaiser wrote:
> Andrew Cooper:
>> On 14/04/18 06:49, Simon Gaiser wrote:
>>> Jan Beulich:
>>>> 1: correct ordering of operations during S3 resume
>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>> 3: check feature flags after resume
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> Simon, could you give this a try please?
>>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>>
>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>> panics about a double fault somewhere after it starts to enable the
>>> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
>>> could test the patches anyway. With them it gets again to the point
>>> where it double faults. So the patches are most likely fine.
>>>
>>> I haven't really looked at the cause of the double fault yet.
>> Do you at least have the crash log from the attempt?
> Sure, it's a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
> Debian sid:

I can't find that object.  I presume this isn't an upstream tree?

>
> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs  ...
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800

Bad dom0.  It shouldn't be playing with APIC_BASE at all, but I guess
this means I can't fix the hypervisor behaviour to throw #GP back at a
PV guest.
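
For reference, MSR 0x1b is IA32_APIC_BASE, where bit 11 is the global
enable and bit 10 is the x2APIC enable.  A minimal sketch of decoding the
three values in the log above; the enum and function names are
hypothetical and are not Xen or Linux code:

```c
#include <stdint.h>

/* Architectural bits in IA32_APIC_BASE (MSR 0x1b). */
#define APIC_BASE_EXTD   (1ULL << 10)  /* x2APIC mode enable */
#define APIC_BASE_ENABLE (1ULL << 11)  /* APIC global enable */

/* Hypothetical names, used only for this illustration. */
enum apic_mode {
    APIC_MODE_DISABLED,
    APIC_MODE_XAPIC,
    APIC_MODE_X2APIC,
    APIC_MODE_INVALID,   /* EXTD set without ENABLE */
};

static inline enum apic_mode apic_mode(uint64_t apic_base)
{
    int en   = !!(apic_base & APIC_BASE_ENABLE);
    int extd = !!(apic_base & APIC_BASE_EXTD);

    if ( en && extd )
        return APIC_MODE_X2APIC;
    if ( en )
        return APIC_MODE_XAPIC;
    if ( extd )
        return APIC_MODE_INVALID;
    return APIC_MODE_DISABLED;
}
```

With the logged values, 0xfee00c00 decodes as x2APIC, 0xfee00000 as
disabled and 0xfee00800 as xAPIC, i.e. each vcpu is attempting the
x2APIC -> disabled -> xAPIC sequence.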

> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7

Can you disassemble the binary and find out where this is?  On current
staging, handle_exception+0x9c is in the middle of
SPEC_CTRL_ENTRY_FROM_INTR but this might not be the case for you.

> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> (XEN) rax: ffffc90040cd4068   rbx: 0000000000000000   rcx: 000000000000000a
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
> (XEN) rbp: 000036ffbf32bf77   rsp: ffffc90040cd4000   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc90040cd7fff
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000426e0
> (XEN) cr3: 000000022200a000   cr2: ffffc90040cd3ff8
> (XEN) fsb: 0000000000000000   gsb: ffff88021e6c0000   gss: 0000000000000000
> (XEN) ds: 002b   es: 002b   fs: 8a00   gs: 0010   ss: e010   cs: e008
> (XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0

Given the %rsp and %cr2 values, it looks like we have a bad %rsp over a
region which isn't mapped, tried to push a value, got #PF, tried to
invoke the #PF exception handler which faulted again, and escalated to
#DF which followed the TSS and moved back to reality.
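
The arithmetic behind that reading can be sketched as follows.  The
helper names are hypothetical, and the 8-page stack size is an
assumption about this build rather than something the log states:

```c
#include <stdint.h>

#define PAGE_SIZE  4096ULL
/* Assumption: per-CPU stacks are 8 pages (32KiB) and naturally aligned. */
#define STACK_SIZE (8 * PAGE_SIZE)

/* Address written by an 8-byte push with the given stack pointer. */
static inline uint64_t push_target(uint64_t rsp)
{
    return rsp - 8;
}

/* Base of the aligned stack block containing addr. */
static inline uint64_t stack_base(uint64_t addr)
{
    return addr & ~(STACK_SIZE - 1);
}
```

Plugging in the register dump: push_target(0xffffc90040cd4000) is
0xffffc90040cd3ff8, which matches %cr2, so the fault is consistent with a
push just below the reported %rsp.  Likewise
stack_base(0xffffc90040cd4000) is 0xffffc90040cd0000 and
stack_base(0xffff8300cec8ffa0) is 0xffff8300cec88000, matching the
"current" and "expected" stack bases in the log.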

The only way to come in with a stack pointer other than TSS.RSP0 is via
syscall or sysenter.  SYSENTER_ESP should be identical to TSS.RSP0,

--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -257,6 +257,13 @@ void do_double_fault(struct cpu_user_regs *regs)
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
     show_stack_overflow(cpu, regs);
 
+    {
+        uint64_t val;
+
+        rdmsrl(MSR_IA32_SYSENTER_ESP, val);
+        printk("*** SYSENTER_ESP: %p\n", _p(val));
+    }
+
     panic("DOUBLE FAULT -- system shutdown");
 }
 
so this bit of debugging should help track things down.  If not, then
we've probably got an issue (re)writing the syscall trampolines.

~Andrew
