[PATCH 0/3] x86: S3 resume adjustments

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] x86: S3 resume adjustments
@ 2018-04-13 11:49 Jan Beulich
  2018-04-13 11:56 ` [PATCH 1/3] x86: correct ordering of operations during S3 resume Jan Beulich
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Jan Beulich @ 2018-04-13 11:49 UTC (permalink / raw)
  To: xen-devel; +Cc: Simon Gaiser, Andrew Cooper, Juergen Gross

1: correct ordering of operations during S3 resume
2: suppress BTI mitigations around S3 suspend/resume
3: check feature flags after resume

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Simon, could you give this a try please?

Thanks, Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] x86: correct ordering of operations during S3 resume
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
@ 2018-04-13 11:56 ` Jan Beulich
  2018-04-13 11:57 ` [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume Jan Beulich
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2018-04-13 11:56 UTC (permalink / raw)
  To: xen-devel; +Cc: Simon Gaiser, Andrew Cooper, Juergen Gross

Microcode loading needs to happen before re-enabling interrupts, in case
only updated microcode allows the use of e.g. the SPEC_{CTRL,CMD} MSRs.
Otoh it doesn't need to happen at all when we didn't suspend in the
first place. It needs to happen before spin_debug_enable() though, as it
acquires a lock and hence would otherwise make
common/spinlock.c:check_lock() unhappy. As micrcode loading can be
pretty verbose, also make sure it only runs after console_end_sync().

cpufreq_add_cpu() doesn't need calling on the only "goto enable_cpu"
path, which sits ahead of cpufreq_del_cpu().

Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -203,6 +203,7 @@ static int enter_state(u32 state)
         printk(XENLOG_ERR "Some devices failed to power down.");
         system_state = SYS_STATE_resume;
         device_power_up(error);
+        console_end_sync();
         error = -EIO;
         goto done;
     }
@@ -243,17 +244,19 @@ static int enter_state(u32 state)
     if ( (state == ACPI_STATE_S3) && error )
         tboot_s3_error(error);
 
+    console_end_sync();
+
+    microcode_resume_cpu(0);
+
  done:
     spin_debug_enable();
     local_irq_restore(flags);
-    console_end_sync();
     acpi_sleep_post(state);
     if ( hvm_cpu_up() )
         BUG();
+    cpufreq_add_cpu(0);
 
  enable_cpu:
-    cpufreq_add_cpu(0);
-    microcode_resume_cpu(0);
     rcu_barrier();
     mtrr_aps_sync_begin();
     enable_nonboot_cpus();




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
  2018-04-13 11:56 ` [PATCH 1/3] x86: correct ordering of operations during S3 resume Jan Beulich
@ 2018-04-13 11:57 ` Jan Beulich
  2018-04-13 18:25   ` Simon Gaiser
  2018-04-13 11:58 ` [PATCH 3/3] x86: check feature flags after resume Jan Beulich
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2018-04-13 11:57 UTC (permalink / raw)
  To: xen-devel; +Cc: Simon Gaiser, Andrew Cooper, Juergen Gross

NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
may become available only once we're reloaded microcode. Make
SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
the critical period of time.

Also set the MSR back to its intended value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -28,6 +28,7 @@
 #include <asm/tboot.h>
 #include <asm/apic.h>
 #include <asm/io_apic.h>
+#include <asm/spec_ctrl.h>
 #include <acpi/cpufreq/cpufreq.h>
 
 uint32_t system_reset_counter = 1;
@@ -163,6 +164,7 @@ static int enter_state(u32 state)
 {
     unsigned long flags;
     int error;
+    struct cpu_info *ci;
     unsigned long cr4;
 
     if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
@@ -210,6 +212,10 @@ static int enter_state(u32 state)
     else
         error = 0;
 
+    ci = get_cpu_info();
+    ci->use_shadow_spec_ctrl = 0;
+    ci->bti_ist_info = 0;
+
     ACPI_FLUSH_CPU_CACHE();
 
     switch ( state )
@@ -248,6 +254,11 @@ static int enter_state(u32 state)
 
     microcode_resume_cpu(0);
 
+    ci->bti_ist_info = default_bti_ist_info;
+    asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
+                  :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
+                  : "memory");
+
  done:
     spin_debug_enable();
     local_irq_restore(flags);




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume
  2018-04-13 11:57 ` [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume Jan Beulich
@ 2018-04-13 18:25   ` Simon Gaiser
  2018-04-13 18:27     ` Andrew Cooper
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-13 18:25 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Juergen Gross, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 1865 bytes --]

Jan Beulich:
> NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
> may become available only once we're reloaded microcode. Make
> SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
> the critical period of time.
> 
> Also set the MSR back to its intended value.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -28,6 +28,7 @@
>  #include <asm/tboot.h>
>  #include <asm/apic.h>
>  #include <asm/io_apic.h>
> +#include <asm/spec_ctrl.h>
>  #include <acpi/cpufreq/cpufreq.h>
>  
>  uint32_t system_reset_counter = 1;
> @@ -163,6 +164,7 @@ static int enter_state(u32 state)
>  {
>      unsigned long flags;
>      int error;
> +    struct cpu_info *ci;
>      unsigned long cr4;
>  
>      if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
> @@ -210,6 +212,10 @@ static int enter_state(u32 state)
>      else
>          error = 0;
>  
> +    ci = get_cpu_info();
> +    ci->use_shadow_spec_ctrl = 0;
> +    ci->bti_ist_info = 0;
> +
>      ACPI_FLUSH_CPU_CACHE();
>  
>      switch ( state )
> @@ -248,6 +254,11 @@ static int enter_state(u32 state)
>  
>      microcode_resume_cpu(0);
>  
> +    ci->bti_ist_info = default_bti_ist_info;
> +    asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)

This does not compile for me:

  power.c: Assembler messages:
  power.c:272: Error: value of 257 too large for field of 1 bytes at 0

Changing the alternative based on the other "wrmsr" calls fixes it:

  asm volatile (ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_XEN_IBRS_SET)

> +                  :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
> +                  : "memory");
> +
>   done:
>      spin_debug_enable();
>      local_irq_restore(flags);


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume
  2018-04-13 18:25   ` Simon Gaiser
@ 2018-04-13 18:27     ` Andrew Cooper
  2018-04-13 18:34       ` Simon Gaiser
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2018-04-13 18:27 UTC (permalink / raw)
  To: Simon Gaiser, Jan Beulich, xen-devel; +Cc: Juergen Gross

On 13/04/18 19:25, Simon Gaiser wrote:
> Jan Beulich:
>> NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
>> may become available only once we're reloaded microcode. Make
>> SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
>> the critical period of time.
>>
>> Also set the MSR back to its intended value.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -28,6 +28,7 @@
>>  #include <asm/tboot.h>
>>  #include <asm/apic.h>
>>  #include <asm/io_apic.h>
>> +#include <asm/spec_ctrl.h>
>>  #include <acpi/cpufreq/cpufreq.h>
>>  
>>  uint32_t system_reset_counter = 1;
>> @@ -163,6 +164,7 @@ static int enter_state(u32 state)
>>  {
>>      unsigned long flags;
>>      int error;
>> +    struct cpu_info *ci;
>>      unsigned long cr4;
>>  
>>      if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
>> @@ -210,6 +212,10 @@ static int enter_state(u32 state)
>>      else
>>          error = 0;
>>  
>> +    ci = get_cpu_info();
>> +    ci->use_shadow_spec_ctrl = 0;
>> +    ci->bti_ist_info = 0;
>> +
>>      ACPI_FLUSH_CPU_CACHE();
>>  
>>      switch ( state )
>> @@ -248,6 +254,11 @@ static int enter_state(u32 state)
>>  
>>      microcode_resume_cpu(0);
>>  
>> +    ci->bti_ist_info = default_bti_ist_info;
>> +    asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
> This does not compile for me:
>
>   power.c: Assembler messages:
>   power.c:272: Error: value of 257 too large for field of 1 bytes at 0
>
> Changing the alternative based on the other "wrmsr" calls fixes it:
>
>   asm volatile (ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_XEN_IBRS_SET)

Ah - you're presumably back-porting this to 4.8?

Jan's code is correct for staging, and your version here is correct for
all currently-released versions of Xen.  (I've done quite a lot of
playing with alternatives generation for 4.11.)

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume
  2018-04-13 18:27     ` Andrew Cooper
@ 2018-04-13 18:34       ` Simon Gaiser
  0 siblings, 0 replies; 18+ messages in thread
From: Simon Gaiser @ 2018-04-13 18:34 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Juergen Gross


[-- Attachment #1.1.1: Type: text/plain, Size: 2158 bytes --]

Andrew Cooper:
> On 13/04/18 19:25, Simon Gaiser wrote:
>> Jan Beulich:
>>> NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
>>> may become available only once we're reloaded microcode. Make
>>> SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
>>> the critical period of time.
>>>
>>> Also set the MSR back to its intended value.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/xen/arch/x86/acpi/power.c
>>> +++ b/xen/arch/x86/acpi/power.c
>>> @@ -28,6 +28,7 @@
>>>  #include <asm/tboot.h>
>>>  #include <asm/apic.h>
>>>  #include <asm/io_apic.h>
>>> +#include <asm/spec_ctrl.h>
>>>  #include <acpi/cpufreq/cpufreq.h>
>>>  
>>>  uint32_t system_reset_counter = 1;
>>> @@ -163,6 +164,7 @@ static int enter_state(u32 state)
>>>  {
>>>      unsigned long flags;
>>>      int error;
>>> +    struct cpu_info *ci;
>>>      unsigned long cr4;
>>>  
>>>      if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
>>> @@ -210,6 +212,10 @@ static int enter_state(u32 state)
>>>      else
>>>          error = 0;
>>>  
>>> +    ci = get_cpu_info();
>>> +    ci->use_shadow_spec_ctrl = 0;
>>> +    ci->bti_ist_info = 0;
>>> +
>>>      ACPI_FLUSH_CPU_CACHE();
>>>  
>>>      switch ( state )
>>> @@ -248,6 +254,11 @@ static int enter_state(u32 state)
>>>  
>>>      microcode_resume_cpu(0);
>>>  
>>> +    ci->bti_ist_info = default_bti_ist_info;
>>> +    asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>> This does not compile for me:
>>
>>   power.c: Assembler messages:
>>   power.c:272: Error: value of 257 too large for field of 1 bytes at 0
>>
>> Changing the alternative based on the other "wrmsr" calls fixes it:
>>
>>   asm volatile (ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_XEN_IBRS_SET)
> 
> Ah - you're presumably back-porting this to 4.8?
> 
> Jan's code is correct for staging, and your version here is correct for
> all currently-released versions of Xen.  (I've done quite a lot of
> playing with alternatives generation for 4.11.)

Yeah, sorry, I should have checked if it works with staging.


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/3] x86: check feature flags after resume
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
  2018-04-13 11:56 ` [PATCH 1/3] x86: correct ordering of operations during S3 resume Jan Beulich
  2018-04-13 11:57 ` [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume Jan Beulich
@ 2018-04-13 11:58 ` Jan Beulich
  2018-04-13 18:29   ` Simon Gaiser
  2018-04-13 12:01 ` [PATCH 0/3] x86: S3 resume adjustments Andrew Cooper
  2018-04-14  5:49 ` Simon Gaiser
  4 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2018-04-13 11:58 UTC (permalink / raw)
  To: xen-devel; +Cc: Simon Gaiser, Andrew Cooper, Juergen Gross

Make sure no previously present features are missing after resume (and
the re-loading of microcode), to avoid later crashes or (likely silent)
hangs / live locks. This doesn't go beyond checking x86_capability[],
but this should be good enough for the immediate need of making sure
that the BIT mitigation MSRs are still available.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -254,6 +254,9 @@ static int enter_state(u32 state)
 
     microcode_resume_cpu(0);
 
+    if ( !recheck_cpu_features(0) )
+        panic("Missing previously available feature(s).");
+
     ci->bti_ist_info = default_bti_ist_info;
     asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
                   :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -501,6 +501,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
 	printk("\n");
 #endif
 
+	if (system_state == SYS_STATE_resume)
+		return;
+
 	/*
 	 * On SMP, boot_cpu_data holds the common feature set between
 	 * all CPUs; so make sure that we indicate which features are
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -473,6 +473,28 @@ void __init init_guest_cpuid(void)
     calculate_hvm_max_policy();
 }
 
+bool recheck_cpu_features(unsigned int cpu)
+{
+    bool okay = true;
+    struct cpuinfo_x86 c;
+    const struct cpuinfo_x86 *bsp = &boot_cpu_data;
+    unsigned int i;
+
+    identify_cpu(&c);
+
+    for ( i = 0; i < NCAPINTS; ++i )
+    {
+        if ( !(~c.x86_capability[i] & bsp->x86_capability[i]) )
+            continue;
+
+        printk(XENLOG_ERR "CPU%u: cap[%2u] is %08x (expected %08x)\n",
+               cpu, i, c.x86_capability[i], bsp->x86_capability[i]);
+        okay = false;
+    }
+
+    return okay;
+}
+
 const uint32_t *lookup_deep_deps(uint32_t feature)
 {
     static const struct {
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -90,11 +90,14 @@ void initialize_cpu_data(unsigned int cp
     cpu_data[cpu] = boot_cpu_data;
 }
 
-static void smp_store_cpu_info(int id)
+static bool smp_store_cpu_info(unsigned int id)
 {
     unsigned int socket;
 
-    identify_cpu(&cpu_data[id]);
+    if ( system_state != SYS_STATE_resume )
+        identify_cpu(&cpu_data[id]);
+    else if ( !recheck_cpu_features(id) )
+        return false;
 
     socket = cpu_to_socket(id);
     if ( !socket_cpumask[socket] )
@@ -102,6 +105,8 @@ static void smp_store_cpu_info(int id)
         socket_cpumask[socket] = secondary_socket_cpumask;
         secondary_socket_cpumask = NULL;
     }
+
+    return true;
 }
 
 /*
@@ -187,12 +192,19 @@ static void smp_callin(void)
     setup_local_APIC();
 
     /* Save our processor parameters. */
-    smp_store_cpu_info(cpu);
+    if ( !smp_store_cpu_info(cpu) )
+    {
+        printk("CPU%u: Failed to validate features - not coming back online\n",
+               cpu);
+        cpu_error = -ENXIO;
+        goto halt;
+    }
 
     if ( (rc = hvm_cpu_up()) != 0 )
     {
         printk("CPU%d: Failed to initialise HVM. Not coming online.\n", cpu);
         cpu_error = rc;
+    halt:
         clear_local_APIC();
         spin_debug_enable();
         cpu_exit_clear(cpu);
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -253,6 +253,9 @@ static inline void cpuid_featureset_to_p
 extern struct cpuid_policy raw_cpuid_policy, host_cpuid_policy,
     pv_max_cpuid_policy, hvm_max_cpuid_policy;
 
+/* Check that all previously present features are still available. */
+bool recheck_cpu_features(unsigned int cpu);
+
 /* Allocate and initialise a CPUID policy suitable for the domain. */
 int init_domain_cpuid_policy(struct domain *d);
 





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] x86: check feature flags after resume
  2018-04-13 11:58 ` [PATCH 3/3] x86: check feature flags after resume Jan Beulich
@ 2018-04-13 18:29   ` Simon Gaiser
  2018-04-13 18:56     ` Simon Gaiser
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-13 18:29 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Juergen Gross, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 5053 bytes --]

Jan Beulich:
> Make sure no previously present features are missing after resume (and
> the re-loading of microcode), to avoid later crashes or (likely silent)
> hangs / live locks. This doesn't go beyond checking x86_capability[],
> but this should be good enough for the immediate need of making sure
> that the BIT mitigation MSRs are still available.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -254,6 +254,9 @@ static int enter_state(u32 state)
>  
>      microcode_resume_cpu(0);
>  
> +    if ( !recheck_cpu_features(0) )
> +        panic("Missing previously available feature(s).");
> +
>      ci->bti_ist_info = default_bti_ist_info;
>      asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>                    :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -501,6 +501,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
>  	printk("\n");
>  #endif
>  
> +	if (system_state == SYS_STATE_resume)
> +		return;
> +
>  	/*
>  	 * On SMP, boot_cpu_data holds the common feature set between
>  	 * all CPUs; so make sure that we indicate which features are
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -473,6 +473,28 @@ void __init init_guest_cpuid(void)
>      calculate_hvm_max_policy();
>  }
>  
> +bool recheck_cpu_features(unsigned int cpu)
> +{
> +    bool okay = true;
> +    struct cpuinfo_x86 c;
> +    const struct cpuinfo_x86 *bsp = &boot_cpu_data;
> +    unsigned int i;
> +
> +    identify_cpu(&c);

This runs into a bug in identify_cpu(). x86_vendor_id does not get
zeroed, so the x86_vendor_id is not null terminated and the vendor
identification fails.

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 4feaa2ceb6..5750d26216 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -366,8 +366,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
        c->x86_vendor = X86_VENDOR_UNKNOWN;
        c->cpuid_level = -1;    /* CPUID not detected */
        c->x86_model = c->x86_mask = 0; /* So far unknown... */
-       c->x86_vendor_id[0] = '\0'; /* Unset */
-       c->x86_model_id[0] = '\0';  /* Unset */
+       memset(&c->x86_vendor_id, 0, sizeof(c->x86_vendor_id));
+       memset(&c->x86_model_id, 0, sizeof(c->x86_model_id));
        c->x86_max_cores = 1;
        c->x86_num_siblings = 1;
        c->x86_clflush_size = 0;

With this patch it works for me.

> +
> +    for ( i = 0; i < NCAPINTS; ++i )
> +    {
> +        if ( !(~c.x86_capability[i] & bsp->x86_capability[i]) )
> +            continue;
> +
> +        printk(XENLOG_ERR "CPU%u: cap[%2u] is %08x (expected %08x)\n",
> +               cpu, i, c.x86_capability[i], bsp->x86_capability[i]);
> +        okay = false;
> +    }
> +
> +    return okay;
> +}
> +
>  const uint32_t *lookup_deep_deps(uint32_t feature)
>  {
>      static const struct {
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -90,11 +90,14 @@ void initialize_cpu_data(unsigned int cp
>      cpu_data[cpu] = boot_cpu_data;
>  }
>  
> -static void smp_store_cpu_info(int id)
> +static bool smp_store_cpu_info(unsigned int id)
>  {
>      unsigned int socket;
>  
> -    identify_cpu(&cpu_data[id]);
> +    if ( system_state != SYS_STATE_resume )
> +        identify_cpu(&cpu_data[id]);
> +    else if ( !recheck_cpu_features(id) )
> +        return false;
>  
>      socket = cpu_to_socket(id);
>      if ( !socket_cpumask[socket] )
> @@ -102,6 +105,8 @@ static void smp_store_cpu_info(int id)
>          socket_cpumask[socket] = secondary_socket_cpumask;
>          secondary_socket_cpumask = NULL;
>      }
> +
> +    return true;
>  }
>  
>  /*
> @@ -187,12 +192,19 @@ static void smp_callin(void)
>      setup_local_APIC();
>  
>      /* Save our processor parameters. */
> -    smp_store_cpu_info(cpu);
> +    if ( !smp_store_cpu_info(cpu) )
> +    {
> +        printk("CPU%u: Failed to validate features - not coming back online\n",
> +               cpu);
> +        cpu_error = -ENXIO;
> +        goto halt;
> +    }
>  
>      if ( (rc = hvm_cpu_up()) != 0 )
>      {
>          printk("CPU%d: Failed to initialise HVM. Not coming online.\n", cpu);
>          cpu_error = rc;
> +    halt:
>          clear_local_APIC();
>          spin_debug_enable();
>          cpu_exit_clear(cpu);
> --- a/xen/include/asm-x86/cpuid.h
> +++ b/xen/include/asm-x86/cpuid.h
> @@ -253,6 +253,9 @@ static inline void cpuid_featureset_to_p
>  extern struct cpuid_policy raw_cpuid_policy, host_cpuid_policy,
>      pv_max_cpuid_policy, hvm_max_cpuid_policy;
>  
> +/* Check that all previously present features are still available. */
> +bool recheck_cpu_features(unsigned int cpu);
> +
>  /* Allocate and initialise a CPUID policy suitable for the domain. */
>  int init_domain_cpuid_policy(struct domain *d);
>  


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] x86: check feature flags after resume
  2018-04-13 18:29   ` Simon Gaiser
@ 2018-04-13 18:56     ` Simon Gaiser
  2018-04-16 10:16       ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-13 18:56 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Juergen Gross, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 2830 bytes --]

Simon Gaiser:
> Jan Beulich:
>> Make sure no previously present features are missing after resume (and
>> the re-loading of microcode), to avoid later crashes or (likely silent)
>> hangs / live locks. This doesn't go beyond checking x86_capability[],
>> but this should be good enough for the immediate need of making sure
>> that the BIT mitigation MSRs are still available.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -254,6 +254,9 @@ static int enter_state(u32 state)
>>  
>>      microcode_resume_cpu(0);
>>  
>> +    if ( !recheck_cpu_features(0) )
>> +        panic("Missing previously available feature(s).");
>> +
>>      ci->bti_ist_info = default_bti_ist_info;
>>      asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>>                    :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -501,6 +501,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>  	printk("\n");
>>  #endif
>>  
>> +	if (system_state == SYS_STATE_resume)
>> +		return;
>> +
>>  	/*
>>  	 * On SMP, boot_cpu_data holds the common feature set between
>>  	 * all CPUs; so make sure that we indicate which features are
>> --- a/xen/arch/x86/cpuid.c
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -473,6 +473,28 @@ void __init init_guest_cpuid(void)
>>      calculate_hvm_max_policy();
>>  }
>>  
>> +bool recheck_cpu_features(unsigned int cpu)
>> +{
>> +    bool okay = true;
>> +    struct cpuinfo_x86 c;
>> +    const struct cpuinfo_x86 *bsp = &boot_cpu_data;
>> +    unsigned int i;
>> +
>> +    identify_cpu(&c);
> 
> This runs into a bug in identify_cpu(). x86_vendor_id does not get
> zeroed, so the x86_vendor_id is not null terminated and the vendor
> identification fails.
> 
> diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
> index 4feaa2ceb6..5750d26216 100644
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -366,8 +366,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
>         c->x86_vendor = X86_VENDOR_UNKNOWN;
>         c->cpuid_level = -1;    /* CPUID not detected */
>         c->x86_model = c->x86_mask = 0; /* So far unknown... */
> -       c->x86_vendor_id[0] = '\0'; /* Unset */
> -       c->x86_model_id[0] = '\0';  /* Unset */
> +       memset(&c->x86_vendor_id, 0, sizeof(c->x86_vendor_id));
> +       memset(&c->x86_model_id, 0, sizeof(c->x86_model_id));
>         c->x86_max_cores = 1;
>         c->x86_num_siblings = 1;
>         c->x86_clflush_size = 0;
> 
> With this patch it works for me.

Meh, also a backport failure from me. Since e34bc403c3c7 this problem
should not appear since it does not assume a null terminated string.


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] x86: check feature flags after resume
  2018-04-13 18:56     ` Simon Gaiser
@ 2018-04-16 10:16       ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2018-04-16 10:16 UTC (permalink / raw)
  To: Simon Gaiser; +Cc: Juergen Gross, Andrew Cooper, xen-devel

>>> On 13.04.18 at 20:56, <simon@invisiblethingslab.com> wrote:
> Simon Gaiser:
>> Jan Beulich:
>>> Make sure no previously present features are missing after resume (and
>>> the re-loading of microcode), to avoid later crashes or (likely silent)
>>> hangs / live locks. This doesn't go beyond checking x86_capability[],
>>> but this should be good enough for the immediate need of making sure
>>> that the BIT mitigation MSRs are still available.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/xen/arch/x86/acpi/power.c
>>> +++ b/xen/arch/x86/acpi/power.c
>>> @@ -254,6 +254,9 @@ static int enter_state(u32 state)
>>>  
>>>      microcode_resume_cpu(0);
>>>  
>>> +    if ( !recheck_cpu_features(0) )
>>> +        panic("Missing previously available feature(s).");
>>> +
>>>      ci->bti_ist_info = default_bti_ist_info;
>>>      asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>>>                    :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
>>> --- a/xen/arch/x86/cpu/common.c
>>> +++ b/xen/arch/x86/cpu/common.c
>>> @@ -501,6 +501,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>>  	printk("\n");
>>>  #endif
>>>  
>>> +	if (system_state == SYS_STATE_resume)
>>> +		return;
>>> +
>>>  	/*
>>>  	 * On SMP, boot_cpu_data holds the common feature set between
>>>  	 * all CPUs; so make sure that we indicate which features are
>>> --- a/xen/arch/x86/cpuid.c
>>> +++ b/xen/arch/x86/cpuid.c
>>> @@ -473,6 +473,28 @@ void __init init_guest_cpuid(void)
>>>      calculate_hvm_max_policy();
>>>  }
>>>  
>>> +bool recheck_cpu_features(unsigned int cpu)
>>> +{
>>> +    bool okay = true;
>>> +    struct cpuinfo_x86 c;
>>> +    const struct cpuinfo_x86 *bsp = &boot_cpu_data;
>>> +    unsigned int i;
>>> +
>>> +    identify_cpu(&c);
>> 
>> This runs into a bug in identify_cpu(). x86_vendor_id does not get
>> zeroed, so the x86_vendor_id is not null terminated and the vendor
>> identification fails.
>> 
>> diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
>> index 4feaa2ceb6..5750d26216 100644
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -366,8 +366,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>         c->x86_vendor = X86_VENDOR_UNKNOWN;
>>         c->cpuid_level = -1;    /* CPUID not detected */
>>         c->x86_model = c->x86_mask = 0; /* So far unknown... */
>> -       c->x86_vendor_id[0] = '\0'; /* Unset */
>> -       c->x86_model_id[0] = '\0';  /* Unset */
>> +       memset(&c->x86_vendor_id, 0, sizeof(c->x86_vendor_id));
>> +       memset(&c->x86_model_id, 0, sizeof(c->x86_model_id));
>>         c->x86_max_cores = 1;
>>         c->x86_num_siblings = 1;
>>         c->x86_clflush_size = 0;
>> 
>> With this patch it works for me.
> 
> Meh, also a backport failure from me. Since e34bc403c3c7 this problem
> should not appear since it does not assume a null terminated string.

NP - it's good to be aware of such issues in case we as well decide to
backport this.

Thanks for the feedback,
Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
                   ` (2 preceding siblings ...)
  2018-04-13 11:58 ` [PATCH 3/3] x86: check feature flags after resume Jan Beulich
@ 2018-04-13 12:01 ` Andrew Cooper
  2018-04-16 11:57   ` Juergen Gross
  2018-04-14  5:49 ` Simon Gaiser
  4 siblings, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2018-04-13 12:01 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Simon Gaiser, Juergen Gross

On 13/04/18 12:49, Jan Beulich wrote:
> 1: correct ordering of operations during S3 resume
> 2: suppress BTI mitigations around S3 suspend/resume
> 3: check feature flags after resume
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-13 12:01 ` [PATCH 0/3] x86: S3 resume adjustments Andrew Cooper
@ 2018-04-16 11:57   ` Juergen Gross
  0 siblings, 0 replies; 18+ messages in thread
From: Juergen Gross @ 2018-04-16 11:57 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Simon Gaiser

On 13/04/18 14:01, Andrew Cooper wrote:
> On 13/04/18 12:49, Jan Beulich wrote:
>> 1: correct ordering of operations during S3 resume
>> 2: suppress BTI mitigations around S3 suspend/resume
>> 3: check feature flags after resume
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
> 

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
                   ` (3 preceding siblings ...)
  2018-04-13 12:01 ` [PATCH 0/3] x86: S3 resume adjustments Andrew Cooper
@ 2018-04-14  5:49 ` Simon Gaiser
  2018-04-15 13:08   ` Andrew Cooper
  4 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-14  5:49 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Juergen Gross, Andrew Cooper


[-- Attachment #1.1.1: Type: text/plain, Size: 755 bytes --]

Jan Beulich:
> 1: correct ordering of operations during S3 resume
> 2: suppress BTI mitigations around S3 suspend/resume
> 3: check feature flags after resume
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Simon, could you give this a try please?

Backported to 4.8 it works fine with the two fixes I sent earlier.

I now also tried staging. Resume is broken even without IBRS/IBPB. It
panics about a double fault somewhere after it starts to enable the
non-boot CPUs. Since the IBRS/IPBP problem happens before that point I
could test the patches anyway. With them it gets again to the point
where it double faults. So the patches are most likely fine.

I didn't really looked yet at the cause of the double fault.

Simon


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-14  5:49 ` Simon Gaiser
@ 2018-04-15 13:08   ` Andrew Cooper
  2018-04-15 15:52     ` Simon Gaiser
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2018-04-15 13:08 UTC (permalink / raw)
  To: Simon Gaiser, Jan Beulich, xen-devel; +Cc: Juergen Gross

On 14/04/18 06:49, Simon Gaiser wrote:
> Jan Beulich:
>> 1: correct ordering of operations during S3 resume
>> 2: suppress BTI mitigations around S3 suspend/resume
>> 3: check feature flags after resume
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> Simon, could you give this a try please?
> Backported to 4.8 it works fine with the two fixes I sent earlier.
>
> I now also tried staging. Resume is broken even without IBRS/IBPB. It
> panics about a double fault somewhere after it starts to enable the
> non-boot CPUs. Since the IBRS/IPBP problem happens before that point I
> could test the patches anyway. With them it gets again to the point
> where it double faults. So the patches are most likely fine.
>
> I didn't really looked yet at the cause of the double fault.

Do you at least have the crash log from the attempt?

~Andrew

>
> Simon
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-15 13:08   ` Andrew Cooper
@ 2018-04-15 15:52     ` Simon Gaiser
  2018-04-15 17:34       ` Andrew Cooper
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-15 15:52 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Juergen Gross


[-- Attachment #1.1.1: Type: text/plain, Size: 3216 bytes --]

Andrew Cooper:
> On 14/04/18 06:49, Simon Gaiser wrote:
>> Jan Beulich:
>>> 1: correct ordering of operations during S3 resume
>>> 2: suppress BTI mitigations around S3 suspend/resume
>>> 3: check feature flags after resume
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> Simon, could you give this a try please?
>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>
>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>> panics about a double fault somewhere after it starts to enable the
>> non-boot CPUs. Since the IBRS/IPBP problem happens before that point I
>> could test the patches anyway. With them it gets again to the point
>> where it double faults. So the patches are most likely fine.
>>
>> I didn't really looked yet at the cause of the double fault.
> 
> Do you at least have the crash log from the attempt?

Sure, it' a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
Debian sid:

(XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: ffffc90040cd4068   rbx: 0000000000000000   rcx: 000000000000000a
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
(XEN) rbp: 000036ffbf32bf77   rsp: ffffc90040cd4000   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc90040cd7fff
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000426e0
(XEN) cr3: 000000022200a000   cr2: ffffc90040cd3ff8
(XEN) fsb: 0000000000000000   gsb: ffff88021e6c0000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 8a00   gs: 0010   ss: e010   cs: e008
(XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
(XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-15 15:52     ` Simon Gaiser
@ 2018-04-15 17:34       ` Andrew Cooper
  2018-04-15 20:15         ` Simon Gaiser
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2018-04-15 17:34 UTC (permalink / raw)
  To: Simon Gaiser, Jan Beulich, xen-devel; +Cc: Juergen Gross

On 15/04/18 16:52, Simon Gaiser wrote:
> Andrew Cooper:
>> On 14/04/18 06:49, Simon Gaiser wrote:
>>> Jan Beulich:
>>>> 1: correct ordering of operations during S3 resume
>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>> 3: check feature flags after resume
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> Simon, could you give this a try please?
>>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>>
>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>> panics about a double fault somewhere after it starts to enable the
>>> non-boot CPUs. Since the IBRS/IPBP problem happens before that point I
>>> could test the patches anyway. With them it gets again to the point
>>> where it double faults. So the patches are most likely fine.
>>>
>>> I didn't really looked yet at the cause of the double fault.
>> Do you at least have the crash log from the attempt?
> Sure, it' a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
> Debian sid:

I can't find that object.  I presume this isn't an upstream tree?

>
> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs  ...
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800

Bad dom0.  It shouldn't be playing with APIC_BASE at all, but I guess
this means I can't fix the hypervisor behaviour to throw #GP back at a
PV guest.

> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7

Can you disassemble the binary and find out where this is?  On current
staging, handle_exception+0x9c is in the middle of
SPEC_CTRL_ENTRY_FROM_INTR but this might not be the case for you.

> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
> (XEN) rax: ffffc90040cd4068   rbx: 0000000000000000   rcx: 000000000000000a
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
> (XEN) rbp: 000036ffbf32bf77   rsp: ffffc90040cd4000   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc90040cd7fff
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000426e0
> (XEN) cr3: 000000022200a000   cr2: ffffc90040cd3ff8
> (XEN) fsb: 0000000000000000   gsb: ffff88021e6c0000   gss: 0000000000000000
> (XEN) ds: 002b   es: 002b   fs: 8a00   gs: 0010   ss: e010   cs: e008
> (XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0

Given the %rsp and %cr2 values, it looks like we have a bad %rsp over a
region which isn't mapped, tried to push a value, got #PF, tried to
invoke the #PF exception handler which faulted again, and escalated to
#DF which followed the TSS and moved back to reality.

The only way to come in with stack pointers other than TSS.RSP0 is via
syscall and sysenter.  SYSENTER_ESP should be identical to TSS.RSP0

--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -257,6 +257,13 @@ void do_double_fault(struct cpu_user_regs *regs)
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
     show_stack_overflow(cpu, regs);
 
+    {
+        uint64_t val;
+
+        rdmsrl(MSR_IA32_SYSENTER_ESP, val);
+        printk("*** SYSENTER_ESP: %p\n", _p(val));
+    }
+
     panic("DOUBLE FAULT -- system shutdown");
 }
 
so this bit of debugging should help track things down.  If not, then
we've probably got an issue (re)writing the syscall trampolines.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-15 17:34       ` Andrew Cooper
@ 2018-04-15 20:15         ` Simon Gaiser
  2018-04-16 13:13           ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-15 20:15 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Juergen Gross


[-- Attachment #1.1.1: Type: text/plain, Size: 16458 bytes --]

Andrew Cooper:
> On 15/04/18 16:52, Simon Gaiser wrote:
>> Andrew Cooper:
>>> On 14/04/18 06:49, Simon Gaiser wrote:
>>>> Jan Beulich:
>>>>> 1: correct ordering of operations during S3 resume
>>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>>> 3: check feature flags after resume
>>>>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>> Simon, could you give this a try please?
>>>> Backported to 4.8 it works fine with the two fixes I sent earlier.
>>>>
>>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>>> panics about a double fault somewhere after it starts to enable the
>>>> non-boot CPUs. Since the IBRS/IPBP problem happens before that point I
>>>> could test the patches anyway. With them it gets again to the point
>>>> where it double faults. So the patches are most likely fine.
>>>>
>>>> I didn't really looked yet at the cause of the double fault.
>>> Do you at least have the crash log from the attempt?
>> Sure, it' a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on a
>> Debian sid:
> 
> I can't find that object.  I presume this isn't an upstream tree?

That's the head of upstream staging as of Friday/Saturday night. And
AFAICS it still is:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5

>> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
>> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
>> (XEN) Finishing wakeup from ACPI S3 state.
>> (XEN) Enabling non-boot CPUs  ...
>> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
>> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
>> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
>> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
>> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
>> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> 
> Bad dom0.  It shouldn't be playing with APIC_BASE at all, but I guess
> this means I can't fix the hypervisor behaviour to throw #GP back at a
> PV guest.
> 
>> (XEN) *** DOUBLE FAULT ***
>> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7
> 
> Can you disassemble the binary and find out where this is?  On current
> staging, handle_exception+0x9c is in the middle of
> SPEC_CTRL_ENTRY_FROM_INTR but this might not be the case for you.

Dump of assembler code for function handle_exception:
   0xffff82d08037a8a8 <+0>:	0f 1f 00	nopl   (%rax)
   0xffff82d08037a8ab <+3>:	48 83 c4 88	add    $0xffffffffffffff88,%rsp
   0xffff82d08037a8af <+7>:	fc	cld    
   0xffff82d08037a8b0 <+8>:	48 89 7c 24 70	mov    %rdi,0x70(%rsp)
   0xffff82d08037a8b5 <+13>:	31 ff	xor    %edi,%edi
   0xffff82d08037a8b7 <+15>:	48 89 74 24 68	mov    %rsi,0x68(%rsp)
   0xffff82d08037a8bc <+20>:	31 f6	xor    %esi,%esi
   0xffff82d08037a8be <+22>:	48 89 54 24 60	mov    %rdx,0x60(%rsp)
   0xffff82d08037a8c3 <+27>:	31 d2	xor    %edx,%edx
   0xffff82d08037a8c5 <+29>:	48 89 4c 24 58	mov    %rcx,0x58(%rsp)
   0xffff82d08037a8ca <+34>:	31 c9	xor    %ecx,%ecx
   0xffff82d08037a8cc <+36>:	48 89 44 24 50	mov    %rax,0x50(%rsp)
   0xffff82d08037a8d1 <+41>:	31 c0	xor    %eax,%eax
   0xffff82d08037a8d3 <+43>:	4c 89 44 24 48	mov    %r8,0x48(%rsp)
   0xffff82d08037a8d8 <+48>:	4c 89 4c 24 40	mov    %r9,0x40(%rsp)
   0xffff82d08037a8dd <+53>:	4c 89 54 24 38	mov    %r10,0x38(%rsp)
   0xffff82d08037a8e2 <+58>:	4c 89 5c 24 30	mov    %r11,0x30(%rsp)
   0xffff82d08037a8e7 <+63>:	45 31 c0	xor    %r8d,%r8d
   0xffff82d08037a8ea <+66>:	45 31 c9	xor    %r9d,%r9d
   0xffff82d08037a8ed <+69>:	45 31 d2	xor    %r10d,%r10d
   0xffff82d08037a8f0 <+72>:	45 31 db	xor    %r11d,%r11d
   0xffff82d08037a8f3 <+75>:	48 89 5c 24 28	mov    %rbx,0x28(%rsp)
   0xffff82d08037a8f8 <+80>:	31 db	xor    %ebx,%ebx
   0xffff82d08037a8fa <+82>:	48 89 6c 24 20	mov    %rbp,0x20(%rsp)
   0xffff82d08037a8ff <+87>:	48 8d 6c 24 20	lea    0x20(%rsp),%rbp
   0xffff82d08037a904 <+92>:	48 f7 d5	not    %rbp
   0xffff82d08037a907 <+95>:	4c 89 64 24 18	mov    %r12,0x18(%rsp)
   0xffff82d08037a90c <+100>:	4c 89 6c 24 10	mov    %r13,0x10(%rsp)
   0xffff82d08037a911 <+105>:	4c 89 74 24 08	mov    %r14,0x8(%rsp)
   0xffff82d08037a916 <+110>:	4c 89 3c 24	mov    %r15,(%rsp)
   0xffff82d08037a91a <+114>:	45 31 e4	xor    %r12d,%r12d
   0xffff82d08037a91d <+117>:	45 31 ed	xor    %r13d,%r13d
   0xffff82d08037a920 <+120>:	45 31 f6	xor    %r14d,%r14d
   0xffff82d08037a923 <+123>:	45 31 ff	xor    %r15d,%r15d
   0xffff82d08037a926 <+126>:	49 c7 c6 ff 7f 00 00	mov    $0x7fff,%r14
   0xffff82d08037a92d <+133>:	49 09 e6	or     %rsp,%r14
   0xffff82d08037a930 <+136>:	90	nop
   0xffff82d08037a931 <+137>:	90	nop
   0xffff82d08037a932 <+138>:	90	nop
   0xffff82d08037a933 <+139>:	90	nop
   0xffff82d08037a934 <+140>:	90	nop
   0xffff82d08037a935 <+141>:	90	nop
   0xffff82d08037a936 <+142>:	90	nop
   0xffff82d08037a937 <+143>:	90	nop
   0xffff82d08037a938 <+144>:	90	nop
   0xffff82d08037a939 <+145>:	90	nop
   0xffff82d08037a93a <+146>:	90	nop
   0xffff82d08037a93b <+147>:	90	nop
   0xffff82d08037a93c <+148>:	90	nop
   0xffff82d08037a93d <+149>:	90	nop
   0xffff82d08037a93e <+150>:	90	nop
   0xffff82d08037a93f <+151>:	90	nop
   0xffff82d08037a940 <+152>:	90	nop
   0xffff82d08037a941 <+153>:	90	nop
   0xffff82d08037a942 <+154>:	90	nop
   0xffff82d08037a943 <+155>:	90	nop
   0xffff82d08037a944 <+156>:	90	nop
   0xffff82d08037a945 <+157>:	90	nop
   0xffff82d08037a946 <+158>:	90	nop
   0xffff82d08037a947 <+159>:	90	nop
   0xffff82d08037a948 <+160>:	90	nop
   0xffff82d08037a949 <+161>:	90	nop
   0xffff82d08037a94a <+162>:	90	nop
   0xffff82d08037a94b <+163>:	90	nop
   0xffff82d08037a94c <+164>:	90	nop
   0xffff82d08037a94d <+165>:	90	nop
   0xffff82d08037a94e <+166>:	90	nop
   0xffff82d08037a94f <+167>:	90	nop
   0xffff82d08037a950 <+168>:	90	nop
   0xffff82d08037a951 <+169>:	90	nop
   0xffff82d08037a952 <+170>:	90	nop
   0xffff82d08037a953 <+171>:	90	nop
   0xffff82d08037a954 <+172>:	90	nop
   0xffff82d08037a955 <+173>:	90	nop
   0xffff82d08037a956 <+174>:	90	nop
   0xffff82d08037a957 <+175>:	90	nop
   0xffff82d08037a958 <+176>:	90	nop
   0xffff82d08037a959 <+177>:	90	nop
   0xffff82d08037a95a <+178>:	90	nop
   0xffff82d08037a95b <+179>:	90	nop
   0xffff82d08037a95c <+180>:	90	nop
   0xffff82d08037a95d <+181>:	90	nop
   0xffff82d08037a95e <+182>:	90	nop
   0xffff82d08037a95f <+183>:	90	nop
   0xffff82d08037a960 <+184>:	90	nop
   0xffff82d08037a961 <+185>:	90	nop
   0xffff82d08037a962 <+186>:	90	nop
   0xffff82d08037a963 <+187>:	90	nop
   0xffff82d08037a964 <+188>:	90	nop
   0xffff82d08037a965 <+189>:	90	nop
   0xffff82d08037a966 <+190>:	90	nop
   0xffff82d08037a967 <+191>:	90	nop
   0xffff82d08037a968 <+192>:	90	nop
   0xffff82d08037a969 <+193>:	90	nop
   0xffff82d08037a96a <+194>:	90	nop
   0xffff82d08037a96b <+195>:	90	nop
   0xffff82d08037a96c <+196>:	90	nop
   0xffff82d08037a96d <+197>:	90	nop
   0xffff82d08037a96e <+198>:	90	nop
   0xffff82d08037a96f <+199>:	90	nop
   0xffff82d08037a970 <+200>:	90	nop
   0xffff82d08037a971 <+201>:	90	nop
   0xffff82d08037a972 <+202>:	90	nop
   0xffff82d08037a973 <+203>:	90	nop
   0xffff82d08037a974 <+204>:	90	nop
   0xffff82d08037a975 <+205>:	49 8b 4e e1	mov    -0x1f(%r14),%rcx
   0xffff82d08037a979 <+209>:	49 89 cf	mov    %rcx,%r15
   0xffff82d08037a97c <+212>:	48 f7 d9	neg    %rcx
   0xffff82d08037a97f <+215>:	74 1e	je     0xffff82d08037a99f <handle_exception_saved>
   0xffff82d08037a981 <+217>:	79 07	jns    0xffff82d08037a98a <handle_exception+226>
   0xffff82d08037a983 <+219>:	49 89 4e e1	mov    %rcx,-0x1f(%r14)
   0xffff82d08037a987 <+223>:	48 f7 d9	neg    %rcx
   0xffff82d08037a98a <+226>:	0f 22 d9	mov    %rcx,%cr3
   0xffff82d08037a98d <+229>:	31 c9	xor    %ecx,%ecx
   0xffff82d08037a98f <+231>:	49 89 4e e1	mov    %rcx,-0x1f(%r14)
   0xffff82d08037a993 <+235>:	f6 84 24 88 00 00 00 03	testb  $0x3,0x88(%rsp)
   0xffff82d08037a99b <+243>:	4c 0f 45 f9	cmovne %rcx,%r15
End of assembler dump.

Is there an easy way to get gdb to resolve alternatives?

BTW:
(XEN) Speculative mitigation facilities:
(XEN)   Hardware features:
(XEN)   Compiled-in support: INDIRECT_THUNK
(XEN) BTI mitigations: Thunk RETPOLINE, Others: RSB_NATIVE RSB_VMEXIT
(XEN) XPTI: enabled

With 'bti=rsb_native=0' it fails somewhere else:

(XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08027c35d>] search_pre_exception_table+0/0x54
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffffc90040cd4028
(XEN) rbp: 000036ffbf32bfb7   rsp: ffffc90040cd4020   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc90040cd7fff
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000426e0
(XEN) cr3: 000000022200a000   cr2: ffffc90040cd3ff8
(XEN) fsb: 00007fd74515e740   gsb: ffff88021e6c0000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
(XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4020, tss.rsp0=ffff8300cec8ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN) *** SYSENTER_ESP: ffff8300cec8ffa0
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...

Dump of assembler code for function search_pre_exception_table:
   0xffff82d08027c35d <+0>:	55	push   %rbp
   0xffff82d08027c35e <+1>:	48 89 e5	mov    %rsp,%rbp
   0xffff82d08027c361 <+4>:	41 54	push   %r12
   0xffff82d08027c363 <+6>:	53	push   %rbx
   0xffff82d08027c364 <+7>:	4c 8b a7 80 00 00 00	mov    0x80(%rdi),%r12
   0xffff82d08027c36b <+14>:	4c 89 e2	mov    %r12,%rdx
   0xffff82d08027c36e <+17>:	48 8d 35 e3 61 17 00	lea    0x1761e3(%rip),%rsi        # 0xffff82d0803f2558
   0xffff82d08027c375 <+24>:	48 8d 3d d4 61 17 00	lea    0x1761d4(%rip),%rdi        # 0xffff82d0803f2550
   0xffff82d08027c37c <+31>:	e8 0c fe ff ff	callq  0xffff82d08027c18d <search_one_extable>
   0xffff82d08027c381 <+36>:	48 89 c3	mov    %rax,%rbx
   0xffff82d08027c384 <+39>:	48 85 c0	test   %rax,%rax
   0xffff82d08027c387 <+42>:	75 08	jne    0xffff82d08027c391 <search_pre_exception_table+52>
   0xffff82d08027c389 <+44>:	48 89 d8	mov    %rbx,%rax
   0xffff82d08027c38c <+47>:	5b	pop    %rbx
   0xffff82d08027c38d <+48>:	41 5c	pop    %r12
   0xffff82d08027c38f <+50>:	5d	pop    %rbp
   0xffff82d08027c390 <+51>:	c3	retq   
   0xffff82d08027c391 <+52>:	49 89 c0	mov    %rax,%r8
   0xffff82d08027c394 <+55>:	4c 89 e1	mov    %r12,%rcx
   0xffff82d08027c397 <+58>:	ba ca 00 00 00	mov    $0xca,%edx
   0xffff82d08027c39c <+63>:	48 8d 35 0a df 16 00	lea    0x16df0a(%rip),%rsi        # 0xffff82d0803ea2ad
   0xffff82d08027c3a3 <+70>:	48 8d 3d 56 89 15 00	lea    0x158956(%rip),%rdi        # 0xffff82d0803d4d00
   0xffff82d08027c3aa <+77>:	e8 58 71 fd ff	callq  0xffff82d080253507 <printk>
   0xffff82d08027c3af <+82>:	eb d8	jmp    0xffff82d08027c389 <search_pre_exception_table+44>
End of assembler dump.

>> (XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
>> (XEN) rax: ffffc90040cd4068   rbx: 0000000000000000   rcx: 000000000000000a
>> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
>> (XEN) rbp: 000036ffbf32bf77   rsp: ffffc90040cd4000   r8:  0000000000000000
>> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
>> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc90040cd7fff
>> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000426e0
>> (XEN) cr3: 000000022200a000   cr2: ffffc90040cd3ff8
>> (XEN) fsb: 0000000000000000   gsb: ffff88021e6c0000   gss: 0000000000000000
>> (XEN) ds: 002b   es: 002b   fs: 8a00   gs: 0010   ss: e010   cs: e008
>> (XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
>> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0
> 
> Given the %rsp and %cr2 values, it looks like we have a bad %rsp over a
> region which isn't mapped, tried to push a value, got #PF, tried to
> invoke the #PF exception handler which faulted again, and escalated to
> #DF which followed the TSS and moved back to reality.
> 
> The only way to come in with stack pointers other than TSS.RSP0 is via
> syscall and sysenter.  SYSENTER_ESP should be identical to TSS.RSP0
> 
> --- a/xen/arch/x86/x86_64/traps.c
> +++ b/xen/arch/x86/x86_64/traps.c
> @@ -257,6 +257,13 @@ void do_double_fault(struct cpu_user_regs *regs)
>      _show_registers(regs, crs, CTXT_hypervisor, NULL);
>      show_stack_overflow(cpu, regs);
>  
> +    {
> +        uint64_t val;
> +
> +        rdmsrl(MSR_IA32_SYSENTER_ESP, val);
> +        printk("*** SYSENTER_ESP: %p\n", _p(val));
> +    }
> +
>      panic("DOUBLE FAULT -- system shutdown");
>  }
>  
> so this bit of debugging should help track things down.  If not, then
> we've probably got an issue (re)writing the syscall trampolines.

(XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: ffffc90040cc4068   rbx: 0000000000000000   rcx: 000000000000000a
(XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
(XEN) rbp: 000036ffbf33bf77   rsp: ffffc90040cc4000   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc90040cc7fff
(XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000426e0
(XEN) cr3: 000000022200a000   cr2: ffffc90040cc3ff8
(XEN) fsb: 0000000000000000   gsb: ffff88021e640000   gss: 0000000000000000
(XEN) ds: 002b   es: 002b   fs: 8a00   gs: 0010   ss: e010   cs: e008
(XEN) Current stack base ffffc90040cc0000 differs from expected ffff8300cec88000
(XEN) Valid stack range: ffffc90040cc6000-ffffc90040cc8000, sp=ffffc90040cc4000, tss.rsp0=ffff8300cec8ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN) *** SYSENTER_ESP: ffff8300cec8ffa0
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...

Thanks, Simon


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-15 20:15         ` Simon Gaiser
@ 2018-04-16 13:13           ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2018-04-16 13:13 UTC (permalink / raw)
  To: Andrew Cooper, Simon Gaiser; +Cc: Juergen Gross, xen-devel

>>> On 15.04.18 at 22:15, <simon@invisiblethingslab.com> wrote:
> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-unstable  x86_64  debug=y   Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d08027c35d>] search_pre_exception_table+0/0x54
> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 0000000000000000
> (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: ffffc90040cd4028
> (XEN) rbp: 000036ffbf32bfb7   rsp: ffffc90040cd4020   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffffc90040cd7fff
> (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000426e0
> (XEN) cr3: 000000022200a000   cr2: ffffc90040cd3ff8
> (XEN) fsb: 00007fd74515e740   gsb: ffff88021e6c0000   gss: 0000000000000000
> (XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4020, tss.rsp0=ffff8300cec8ffa0

The fact that the exact location varies where the #DF triggers is of no big
interest - it all depends on when exactly the stack overflow occurs. What
I note though: ffffc90040cd4020 is a guest (presumably Dom0) kernel
address, far outside the Xen range. I guess we'd need to see all of that
(wrong) stack's contents logged up to the original entry into Xen to
understand how that could have happened.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-04-16 13:13 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
2018-04-13 11:56 ` [PATCH 1/3] x86: correct ordering of operations during S3 resume Jan Beulich
2018-04-13 11:57 ` [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume Jan Beulich
2018-04-13 18:25   ` Simon Gaiser
2018-04-13 18:27     ` Andrew Cooper
2018-04-13 18:34       ` Simon Gaiser
2018-04-13 11:58 ` [PATCH 3/3] x86: check feature flags after resume Jan Beulich
2018-04-13 18:29   ` Simon Gaiser
2018-04-13 18:56     ` Simon Gaiser
2018-04-16 10:16       ` Jan Beulich
2018-04-13 12:01 ` [PATCH 0/3] x86: S3 resume adjustments Andrew Cooper
2018-04-16 11:57   ` Juergen Gross
2018-04-14  5:49 ` Simon Gaiser
2018-04-15 13:08   ` Andrew Cooper
2018-04-15 15:52     ` Simon Gaiser
2018-04-15 17:34       ` Andrew Cooper
2018-04-15 20:15         ` Simon Gaiser
2018-04-16 13:13           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.