* [PATCH 0/3] x86: S3 resume adjustments
@ 2018-04-13 11:49 Jan Beulich
  2018-04-13 11:56 ` [PATCH 1/3] x86: correct ordering of operations during S3 resume Jan Beulich
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Jan Beulich @ 2018-04-13 11:49 UTC
  To: xen-devel; +Cc: Simon Gaiser, Andrew Cooper, Juergen Gross

1: correct ordering of operations during S3 resume
2: suppress BTI mitigations around S3 suspend/resume
3: check feature flags after resume

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Simon, could you give this a try please?

Thanks, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
* [PATCH 1/3] x86: correct ordering of operations during S3 resume
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
@ 2018-04-13 11:56 ` Jan Beulich
  2018-04-13 11:57 ` [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume Jan Beulich
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2018-04-13 11:56 UTC
  To: xen-devel; +Cc: Simon Gaiser, Andrew Cooper, Juergen Gross

Microcode loading needs to happen before re-enabling interrupts, in case
only updated microcode allows the use of e.g. the SPEC_{CTRL,CMD} MSRs.
On the other hand, it doesn't need to happen at all when we didn't
suspend in the first place. It needs to happen before spin_debug_enable()
though, as it acquires a lock and hence would otherwise make
common/spinlock.c:check_lock() unhappy. As microcode loading can be
pretty verbose, also make sure it only runs after console_end_sync().

cpufreq_add_cpu() doesn't need calling on the only "goto enable_cpu"
path, which sits ahead of cpufreq_del_cpu().

Reported-by: Simon Gaiser <simon@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -203,6 +203,7 @@ static int enter_state(u32 state)
         printk(XENLOG_ERR "Some devices failed to power down.");
         system_state = SYS_STATE_resume;
         device_power_up(error);
+        console_end_sync();
         error = -EIO;
         goto done;
     }
@@ -243,17 +244,19 @@ static int enter_state(u32 state)
     if ( (state == ACPI_STATE_S3) && error )
         tboot_s3_error(error);
 
+    console_end_sync();
+
+    microcode_resume_cpu(0);
+
  done:
     spin_debug_enable();
     local_irq_restore(flags);
-    console_end_sync();
     acpi_sleep_post(state);
     if ( hvm_cpu_up() )
         BUG();
+    cpufreq_add_cpu(0);
 
  enable_cpu:
-    cpufreq_add_cpu(0);
-    microcode_resume_cpu(0);
     rcu_barrier();
     mtrr_aps_sync_begin();
     enable_nonboot_cpus();
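The ordering constraints in the commit message above can be checked mechanically with a toy model. This is a hypothetical sketch, not Xen code: the step and path names are invented, and each "step" only records the slot in which it ran. It encodes three claims from the message, as the post-patch control flow implements them: microcode is reloaded after console_end_sync() but before spin_debug_enable() and interrupt re-enabling; it is not reloaded when we never actually suspended; and cpufreq_add_cpu() is skipped on the "goto enable_cpu" path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the tail of enter_state() after this patch.
 * None of these names are Xen's; each "step" only records when it ran. */
enum step { CONSOLE_END_SYNC, MICROCODE_RESUME, SPIN_DEBUG_ENABLE,
            IRQ_RESTORE, CPUFREQ_ADD, ENABLE_NONBOOT, NR_STEPS };

/* The three ways of reaching the resume code. */
enum path {
    PATH_SUSPENDED,   /* went through S3 and came back */
    PATH_DEVICE_FAIL, /* device_power_down() failed: "goto done" */
    PATH_CPUS_FAIL,   /* disable_nonboot_cpus() failed: "goto enable_cpu" */
};

static int ran_at[NR_STEPS]; /* slot in which each step ran, -1 = never */
static int slot;

static void run(enum step s)
{
    ran_at[s] = slot++;
}

static void resume_tail(enum path p)
{
    for ( int i = 0; i < NR_STEPS; i++ )
        ran_at[i] = -1;
    slot = 0;

    if ( p == PATH_CPUS_FAIL )
        goto enable_cpu;          /* ahead of cpufreq_del_cpu() */

    run(CONSOLE_END_SYNC);        /* needed on both remaining paths */
    if ( p == PATH_DEVICE_FAIL )
        goto done;                /* never suspended: no microcode reload */

    run(MICROCODE_RESUME);        /* IRQs off, spin debugging still off */

 done:
    run(SPIN_DEBUG_ENABLE);
    run(IRQ_RESTORE);
    run(CPUFREQ_ADD);

 enable_cpu:
    run(ENABLE_NONBOOT);
}

/* True if both steps ran and a ran strictly before b. */
static bool ran_before(enum step a, enum step b)
{
    return ran_at[a] >= 0 && ran_at[b] >= 0 && ran_at[a] < ran_at[b];
}
```

After resume_tail(PATH_DEVICE_FAIL), ran_at[MICROCODE_RESUME] stays -1. After resume_tail(PATH_CPUS_FAIL), only ENABLE_NONBOOT has run.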
* [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
  2018-04-13 11:56 ` [PATCH 1/3] x86: correct ordering of operations during S3 resume Jan Beulich
@ 2018-04-13 11:57 ` Jan Beulich
  2018-04-13 18:25   ` Simon Gaiser
  2018-04-13 11:58 ` [PATCH 3/3] x86: check feature flags after resume Jan Beulich
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2018-04-13 11:57 UTC
  To: xen-devel; +Cc: Simon Gaiser, Andrew Cooper, Juergen Gross

NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
may become available only once we've reloaded microcode. Make
SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
the critical period of time.

Also set the MSR back to its intended value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -28,6 +28,7 @@
 #include <asm/tboot.h>
 #include <asm/apic.h>
 #include <asm/io_apic.h>
+#include <asm/spec_ctrl.h>
 #include <acpi/cpufreq/cpufreq.h>
 
 uint32_t system_reset_counter = 1;
@@ -163,6 +164,7 @@ static int enter_state(u32 state)
 {
     unsigned long flags;
     int error;
+    struct cpu_info *ci;
     unsigned long cr4;
 
     if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
@@ -210,6 +212,10 @@ static int enter_state(u32 state)
     else
         error = 0;
 
+    ci = get_cpu_info();
+    ci->use_shadow_spec_ctrl = 0;
+    ci->bti_ist_info = 0;
+
     ACPI_FLUSH_CPU_CACHE();
 
     switch ( state )
@@ -248,6 +254,11 @@ static int enter_state(u32 state)
 
     microcode_resume_cpu(0);
 
+    ci->bti_ist_info = default_bti_ist_info;
+    asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
+                  :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
+                  : "memory");
+
  done:
     spin_debug_enable();
     local_irq_restore(flags);
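Why zeroing the per-CPU flags helps can be illustrated with a small simulation. This is a hypothetical model, not Xen's implementation: the struct, the flag value HYP_BTI_IST_WRMSR and all function names are invented. It assumes that the IST entry path touches MSR_SPEC_CTRL only when a bit in bti_ist_info asks it to, and that a WRMSR to an MSR that is not (yet) available would raise #GP. The final write stands in for the ALTERNATIVE-patched wrmsr, assuming X86_FEATURE_XEN_IBRS_SET is in effect. The exit path (DO_SPEC_CTRL_EXIT_TO_XEN) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: names and flag values are invented for this sketch. */
#define HYP_BTI_IST_WRMSR   0x01u  /* "IST entry should write SPEC_CTRL" */
#define HYP_SPEC_CTRL_IBRS  0x01u  /* IBRS is bit 0 of IA32_SPEC_CTRL */

struct hyp_cpu_info {              /* cut-down analogue of struct cpu_info */
    bool    use_shadow_spec_ctrl;
    uint8_t bti_ist_info;
};

struct hyp_cpu {
    bool     spec_ctrl_present;    /* false after S3 until microcode reload */
    uint32_t spec_ctrl;            /* current MSR value */
    int      gp_faults;            /* a WRMSR to a missing MSR would #GP */
};

static void hyp_wrmsr_spec_ctrl(struct hyp_cpu *cpu, uint32_t val)
{
    if ( !cpu->spec_ctrl_present )
        cpu->gp_faults++;
    else
        cpu->spec_ctrl = val;
}

/* NMI/#MC entry path: touches the MSR only if the per-CPU flag asks. */
static void hyp_ist_entry(struct hyp_cpu *cpu, const struct hyp_cpu_info *ci)
{
    if ( ci->bti_ist_info & HYP_BTI_IST_WRMSR )
        hyp_wrmsr_spec_ctrl(cpu, HYP_SPEC_CTRL_IBRS);
}

/* One S3 cycle with an NMI arriving before microcode is reloaded.
 * Returns the number of #GP faults the NMI path would have taken. */
static int hyp_s3_cycle(struct hyp_cpu *cpu, bool suppress)
{
    struct hyp_cpu_info ci = { true, HYP_BTI_IST_WRMSR };
    const uint8_t default_info = ci.bti_ist_info;

    if ( suppress )                 /* the patch's change before suspend */
    {
        ci.use_shadow_spec_ctrl = false;
        ci.bti_ist_info = 0;
    }

    cpu->spec_ctrl_present = false; /* MSR unusable until microcode reload */
    cpu->spec_ctrl = 0;
    hyp_ist_entry(cpu, &ci);        /* NMI in the critical window */

    cpu->spec_ctrl_present = true;  /* microcode_resume_cpu() */
    ci.bti_ist_info = default_info; /* restore the default flags */
    hyp_wrmsr_spec_ctrl(cpu, HYP_SPEC_CTRL_IBRS); /* set the MSR back */

    return cpu->gp_faults;
}
```

With suppression the simulated NMI takes no fault and the MSR ends up with IBRS set. Without it, the NMI path faults once.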
* Re: [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume
  2018-04-13 11:57 ` [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume Jan Beulich
@ 2018-04-13 18:25   ` Simon Gaiser
  2018-04-13 18:27     ` Andrew Cooper
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-13 18:25 UTC
  To: Jan Beulich, xen-devel; +Cc: Juergen Gross, Andrew Cooper

Jan Beulich:
> NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
> may become available only once we've reloaded microcode. Make
> SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
> the critical period of time.
>
> Also set the MSR back to its intended value.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -28,6 +28,7 @@
>  #include <asm/tboot.h>
>  #include <asm/apic.h>
>  #include <asm/io_apic.h>
> +#include <asm/spec_ctrl.h>
>  #include <acpi/cpufreq/cpufreq.h>
>
>  uint32_t system_reset_counter = 1;
> @@ -163,6 +164,7 @@ static int enter_state(u32 state)
>  {
>      unsigned long flags;
>      int error;
> +    struct cpu_info *ci;
>      unsigned long cr4;
>
>      if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
> @@ -210,6 +212,10 @@ static int enter_state(u32 state)
>      else
>          error = 0;
>
> +    ci = get_cpu_info();
> +    ci->use_shadow_spec_ctrl = 0;
> +    ci->bti_ist_info = 0;
> +
>      ACPI_FLUSH_CPU_CACHE();
>
>      switch ( state )
> @@ -248,6 +254,11 @@ static int enter_state(u32 state)
>
>      microcode_resume_cpu(0);
>
> +    ci->bti_ist_info = default_bti_ist_info;
> +    asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)

This does not compile for me:

power.c: Assembler messages:
power.c:272: Error: value of 257 too large for field of 1 bytes at 0

Changing the alternative based on the other "wrmsr" calls fixes it:

    asm volatile (ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_XEN_IBRS_SET)

> +                  :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
> +                  : "memory");
> +
>   done:
>      spin_debug_enable();
>      local_irq_restore(flags);
* Re: [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume
  2018-04-13 18:25   ` Simon Gaiser
@ 2018-04-13 18:27     ` Andrew Cooper
  2018-04-13 18:34       ` Simon Gaiser
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2018-04-13 18:27 UTC
  To: Simon Gaiser, Jan Beulich, xen-devel; +Cc: Juergen Gross

On 13/04/18 19:25, Simon Gaiser wrote:
> Jan Beulich:
>> NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
>> may become available only once we've reloaded microcode. Make
>> SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
>> the critical period of time.
>>
>> Also set the MSR back to its intended value.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -28,6 +28,7 @@
>>  #include <asm/tboot.h>
>>  #include <asm/apic.h>
>>  #include <asm/io_apic.h>
>> +#include <asm/spec_ctrl.h>
>>  #include <acpi/cpufreq/cpufreq.h>
>>
>>  uint32_t system_reset_counter = 1;
>> @@ -163,6 +164,7 @@ static int enter_state(u32 state)
>>  {
>>      unsigned long flags;
>>      int error;
>> +    struct cpu_info *ci;
>>      unsigned long cr4;
>>
>>      if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
>> @@ -210,6 +212,10 @@ static int enter_state(u32 state)
>>      else
>>          error = 0;
>>
>> +    ci = get_cpu_info();
>> +    ci->use_shadow_spec_ctrl = 0;
>> +    ci->bti_ist_info = 0;
>> +
>>      ACPI_FLUSH_CPU_CACHE();
>>
>>      switch ( state )
>> @@ -248,6 +254,11 @@ static int enter_state(u32 state)
>>
>>      microcode_resume_cpu(0);
>>
>> +    ci->bti_ist_info = default_bti_ist_info;
>> +    asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
> This does not compile for me:
>
> power.c: Assembler messages:
> power.c:272: Error: value of 257 too large for field of 1 bytes at 0
>
> Changing the alternative based on the other "wrmsr" calls fixes it:
>
>     asm volatile (ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_XEN_IBRS_SET)

Ah - you're presumably back-porting this to 4.8?

Jan's code is correct for staging, and your version here is correct for
all currently-released versions of Xen. (I've done quite a lot of
playing with alternatives generation for 4.11.)

~Andrew
* Re: [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume
  2018-04-13 18:27     ` Andrew Cooper
@ 2018-04-13 18:34       ` Simon Gaiser
  0 siblings, 0 replies; 18+ messages in thread
From: Simon Gaiser @ 2018-04-13 18:34 UTC
  To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Juergen Gross

Andrew Cooper:
> On 13/04/18 19:25, Simon Gaiser wrote:
>> Jan Beulich:
>>> NMI and #MC can occur at any time after S3 resume, yet the MSR_SPEC_CTRL
>>> may become available only once we've reloaded microcode. Make
>>> SPEC_CTRL_ENTRY_FROM_INTR_IST and DO_SPEC_CTRL_EXIT_TO_XEN no-ops for
>>> the critical period of time.
>>>
>>> Also set the MSR back to its intended value.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/xen/arch/x86/acpi/power.c
>>> +++ b/xen/arch/x86/acpi/power.c
>>> @@ -28,6 +28,7 @@
>>>  #include <asm/tboot.h>
>>>  #include <asm/apic.h>
>>>  #include <asm/io_apic.h>
>>> +#include <asm/spec_ctrl.h>
>>>  #include <acpi/cpufreq/cpufreq.h>
>>>
>>>  uint32_t system_reset_counter = 1;
>>> @@ -163,6 +164,7 @@ static int enter_state(u32 state)
>>>  {
>>>      unsigned long flags;
>>>      int error;
>>> +    struct cpu_info *ci;
>>>      unsigned long cr4;
>>>
>>>      if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
>>> @@ -210,6 +212,10 @@ static int enter_state(u32 state)
>>>      else
>>>          error = 0;
>>>
>>> +    ci = get_cpu_info();
>>> +    ci->use_shadow_spec_ctrl = 0;
>>> +    ci->bti_ist_info = 0;
>>> +
>>>      ACPI_FLUSH_CPU_CACHE();
>>>
>>>      switch ( state )
>>> @@ -248,6 +254,11 @@ static int enter_state(u32 state)
>>>
>>>      microcode_resume_cpu(0);
>>>
>>> +    ci->bti_ist_info = default_bti_ist_info;
>>> +    asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>> This does not compile for me:
>>
>> power.c: Assembler messages:
>> power.c:272: Error: value of 257 too large for field of 1 bytes at 0
>>
>> Changing the alternative based on the other "wrmsr" calls fixes it:
>>
>>     asm volatile (ALTERNATIVE(ASM_NOP3, "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>
> Ah - you're presumably back-porting this to 4.8?
>
> Jan's code is correct for staging, and your version here is correct for
> all currently-released versions of Xen. (I've done quite a lot of
> playing with alternatives generation for 4.11.)

Yeah, sorry, I should have checked whether it works with staging.
* [PATCH 3/3] x86: check feature flags after resume
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
  2018-04-13 11:56 ` [PATCH 1/3] x86: correct ordering of operations during S3 resume Jan Beulich
  2018-04-13 11:57 ` [PATCH 2/3] x86: suppress BTI mitigations around S3 suspend/resume Jan Beulich
@ 2018-04-13 11:58 ` Jan Beulich
  2018-04-13 18:29   ` Simon Gaiser
  2018-04-13 12:01 ` [PATCH 0/3] x86: S3 resume adjustments Andrew Cooper
  2018-04-14 5:49 ` Simon Gaiser
  4 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2018-04-13 11:58 UTC
  To: xen-devel; +Cc: Simon Gaiser, Andrew Cooper, Juergen Gross

Make sure no previously present features are missing after resume (and
the re-loading of microcode), to avoid later crashes or (likely silent)
hangs / live locks. This doesn't go beyond checking x86_capability[],
but this should be good enough for the immediate need of making sure
that the BTI mitigation MSRs are still available.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -254,6 +254,9 @@ static int enter_state(u32 state)
 
     microcode_resume_cpu(0);
 
+    if ( !recheck_cpu_features(0) )
+        panic("Missing previously available feature(s).");
+
     ci->bti_ist_info = default_bti_ist_info;
     asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
                   :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -501,6 +501,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
 	printk("\n");
 #endif
 
+	if (system_state == SYS_STATE_resume)
+		return;
+
 	/*
 	 * On SMP, boot_cpu_data holds the common feature set between
 	 * all CPUs; so make sure that we indicate which features are
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -473,6 +473,28 @@ void __init init_guest_cpuid(void)
     calculate_hvm_max_policy();
 }
 
+bool recheck_cpu_features(unsigned int cpu)
+{
+    bool okay = true;
+    struct cpuinfo_x86 c;
+    const struct cpuinfo_x86 *bsp = &boot_cpu_data;
+    unsigned int i;
+
+    identify_cpu(&c);
+
+    for ( i = 0; i < NCAPINTS; ++i )
+    {
+        if ( !(~c.x86_capability[i] & bsp->x86_capability[i]) )
+            continue;
+
+        printk(XENLOG_ERR "CPU%u: cap[%2u] is %08x (expected %08x)\n",
+               cpu, i, c.x86_capability[i], bsp->x86_capability[i]);
+        okay = false;
+    }
+
+    return okay;
+}
+
 const uint32_t *lookup_deep_deps(uint32_t feature)
 {
     static const struct {
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -90,11 +90,14 @@ void initialize_cpu_data(unsigned int cp
     cpu_data[cpu] = boot_cpu_data;
 }
 
-static void smp_store_cpu_info(int id)
+static bool smp_store_cpu_info(unsigned int id)
 {
     unsigned int socket;
 
-    identify_cpu(&cpu_data[id]);
+    if ( system_state != SYS_STATE_resume )
+        identify_cpu(&cpu_data[id]);
+    else if ( !recheck_cpu_features(id) )
+        return false;
 
     socket = cpu_to_socket(id);
     if ( !socket_cpumask[socket] )
@@ -102,6 +105,8 @@ static void smp_store_cpu_info(int id)
         socket_cpumask[socket] = secondary_socket_cpumask;
         secondary_socket_cpumask = NULL;
     }
+
+    return true;
 }
 
 /*
@@ -187,12 +192,19 @@ static void smp_callin(void)
     setup_local_APIC();
 
     /* Save our processor parameters. */
-    smp_store_cpu_info(cpu);
+    if ( !smp_store_cpu_info(cpu) )
+    {
+        printk("CPU%u: Failed to validate features - not coming back online\n",
+               cpu);
+        cpu_error = -ENXIO;
+        goto halt;
+    }
 
     if ( (rc = hvm_cpu_up()) != 0 )
     {
         printk("CPU%d: Failed to initialise HVM. Not coming online.\n", cpu);
         cpu_error = rc;
+    halt:
         clear_local_APIC();
         spin_debug_enable();
         cpu_exit_clear(cpu);
--- a/xen/include/asm-x86/cpuid.h
+++ b/xen/include/asm-x86/cpuid.h
@@ -253,6 +253,9 @@ static inline void cpuid_featureset_to_p
 extern struct cpuid_policy raw_cpuid_policy, host_cpuid_policy,
     pv_max_cpuid_policy, hvm_max_cpuid_policy;
 
+/* Check that all previously present features are still available. */
+bool recheck_cpu_features(unsigned int cpu);
+
 /* Allocate and initialise a CPUID policy suitable for the domain. */
 int init_domain_cpuid_policy(struct domain *d);
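The core test in recheck_cpu_features(), `~c.x86_capability[i] & bsp->x86_capability[i]`, is non-zero exactly when some bit that was set at boot is now clear. Features that newly appear (for instance after a microcode update) are tolerated. The stand-alone sketch below isolates that bit logic. The function name and the three-word array size are hypothetical and chosen only for illustration. Xen's NCAPINTS and struct cpuinfo_x86 are not reproduced here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

#define HYP_NCAPINTS 3  /* hypothetical size; Xen's NCAPINTS differs */

/* True if every bit set in 'boot' is still set in 'now'. Bits that
 * appear only in 'now' (e.g. enabled by newer microcode) are tolerated. */
static bool hyp_recheck(const uint32_t boot[HYP_NCAPINTS],
                        const uint32_t now[HYP_NCAPINTS])
{
    bool okay = true;

    for ( unsigned int i = 0; i < HYP_NCAPINTS; ++i )
    {
        uint32_t lost = ~now[i] & boot[i];  /* set at boot, clear now */

        if ( !lost )
            continue;

        printf("cap[%2u] is %08" PRIx32 " (expected %08" PRIx32 ")\n",
               i, now[i], boot[i]);
        okay = false;
    }

    return okay;
}
```

Like the patch, the sketch keeps scanning after the first mismatch, so every affected capability word is reported before it returns false.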
* Re: [PATCH 3/3] x86: check feature flags after resume
  2018-04-13 11:58 ` [PATCH 3/3] x86: check feature flags after resume Jan Beulich
@ 2018-04-13 18:29   ` Simon Gaiser
  2018-04-13 18:56     ` Simon Gaiser
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-13 18:29 UTC
  To: Jan Beulich, xen-devel; +Cc: Juergen Gross, Andrew Cooper

Jan Beulich:
> Make sure no previously present features are missing after resume (and
> the re-loading of microcode), to avoid later crashes or (likely silent)
> hangs / live locks. This doesn't go beyond checking x86_capability[],
> but this should be good enough for the immediate need of making sure
> that the BTI mitigation MSRs are still available.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -254,6 +254,9 @@ static int enter_state(u32 state)
>
>      microcode_resume_cpu(0);
>
> +    if ( !recheck_cpu_features(0) )
> +        panic("Missing previously available feature(s).");
> +
>      ci->bti_ist_info = default_bti_ist_info;
>      asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>                    :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -501,6 +501,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
>  	printk("\n");
>  #endif
>
> +	if (system_state == SYS_STATE_resume)
> +		return;
> +
>  	/*
>  	 * On SMP, boot_cpu_data holds the common feature set between
>  	 * all CPUs; so make sure that we indicate which features are
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -473,6 +473,28 @@ void __init init_guest_cpuid(void)
>      calculate_hvm_max_policy();
>  }
>
> +bool recheck_cpu_features(unsigned int cpu)
> +{
> +    bool okay = true;
> +    struct cpuinfo_x86 c;
> +    const struct cpuinfo_x86 *bsp = &boot_cpu_data;
> +    unsigned int i;
> +
> +    identify_cpu(&c);

This runs into a bug in identify_cpu(): x86_vendor_id does not get fully
zeroed, so it is not NUL-terminated and vendor identification fails.

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 4feaa2ceb6..5750d26216 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -366,8 +366,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
 	c->x86_vendor = X86_VENDOR_UNKNOWN;
 	c->cpuid_level = -1;	/* CPUID not detected */
 	c->x86_model = c->x86_mask = 0;	/* So far unknown... */
-	c->x86_vendor_id[0] = '\0'; /* Unset */
-	c->x86_model_id[0] = '\0';  /* Unset */
+	memset(&c->x86_vendor_id, 0, sizeof(c->x86_vendor_id));
+	memset(&c->x86_model_id, 0, sizeof(c->x86_model_id));
 	c->x86_max_cores = 1;
 	c->x86_num_siblings = 1;
 	c->x86_clflush_size = 0;

With this patch it works for me.

> +
> +    for ( i = 0; i < NCAPINTS; ++i )
> +    {
> +        if ( !(~c.x86_capability[i] & bsp->x86_capability[i]) )
> +            continue;
> +
> +        printk(XENLOG_ERR "CPU%u: cap[%2u] is %08x (expected %08x)\n",
> +               cpu, i, c.x86_capability[i], bsp->x86_capability[i]);
> +        okay = false;
> +    }
> +
> +    return okay;
> +}
> +
>  const uint32_t *lookup_deep_deps(uint32_t feature)
>  {
>      static const struct {
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -90,11 +90,14 @@ void initialize_cpu_data(unsigned int cp
>      cpu_data[cpu] = boot_cpu_data;
>  }
>
> -static void smp_store_cpu_info(int id)
> +static bool smp_store_cpu_info(unsigned int id)
>  {
>      unsigned int socket;
>
> -    identify_cpu(&cpu_data[id]);
> +    if ( system_state != SYS_STATE_resume )
> +        identify_cpu(&cpu_data[id]);
> +    else if ( !recheck_cpu_features(id) )
> +        return false;
>
>      socket = cpu_to_socket(id);
>      if ( !socket_cpumask[socket] )
> @@ -102,6 +105,8 @@ static void smp_store_cpu_info(int id)
>          socket_cpumask[socket] = secondary_socket_cpumask;
>          secondary_socket_cpumask = NULL;
>      }
> +
> +    return true;
>  }
>
>  /*
> @@ -187,12 +192,19 @@ static void smp_callin(void)
>      setup_local_APIC();
>
>      /* Save our processor parameters. */
> -    smp_store_cpu_info(cpu);
> +    if ( !smp_store_cpu_info(cpu) )
> +    {
> +        printk("CPU%u: Failed to validate features - not coming back online\n",
> +               cpu);
> +        cpu_error = -ENXIO;
> +        goto halt;
> +    }
>
>      if ( (rc = hvm_cpu_up()) != 0 )
>      {
>          printk("CPU%d: Failed to initialise HVM. Not coming online.\n", cpu);
>          cpu_error = rc;
> +    halt:
>          clear_local_APIC();
>          spin_debug_enable();
>          cpu_exit_clear(cpu);
> --- a/xen/include/asm-x86/cpuid.h
> +++ b/xen/include/asm-x86/cpuid.h
> @@ -253,6 +253,9 @@ static inline void cpuid_featureset_to_p
>  extern struct cpuid_policy raw_cpuid_policy, host_cpuid_policy,
>      pv_max_cpuid_policy, hvm_max_cpuid_policy;
>
> +/* Check that all previously present features are still available. */
> +bool recheck_cpu_features(unsigned int cpu);
> +
>  /* Allocate and initialise a CPUID policy suitable for the domain. */
>  int init_domain_cpuid_policy(struct domain *d);
>
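The failure mode Simon describes can be reproduced in isolation. CPUID leaf 0 supplies exactly 12 vendor bytes with no terminator. If only byte 0 of the buffer was cleared beforehand, whatever followed those 12 bytes is left in place. The problem presumably only surfaces on the tree he tested because recheck_cpu_features() passes an uninitialised on-stack struct cpuinfo_x86 to identify_cpu(), whereas earlier callers operate on statically allocated or previously filled-in structures. The sketch below is hypothetical: the 16-byte buffer size is an assumption about x86_vendor_id[], and the 'X' fill stands in for stale stack contents. The fixed-length memcmp() comparison is included only to illustrate terminator-independent matching. It does not claim to be what commit e34bc403c3c7 actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define HYP_VENDOR_ID_LEN 16  /* assumed size of x86_vendor_id[] */

/* Model of the vendor-string handling in identify_cpu(). The 'X' fill
 * stands in for whatever an uninitialised on-stack struct happens to hold. */
static void hyp_identify_vendor(char id[HYP_VENDOR_ID_LEN], bool full_reset)
{
    memset(id, 'X', HYP_VENDOR_ID_LEN);
    if ( full_reset )
        memset(id, 0, HYP_VENDOR_ID_LEN);  /* Simon's proposed fix */
    else
        id[0] = '\0';                      /* original "Unset" */
    /* CPUID leaf 0 yields exactly 12 vendor bytes and no terminator. */
    memcpy(id, "GenuineIntel", 12);
}

static bool hyp_is_nul_terminated(const char id[HYP_VENDOR_ID_LEN])
{
    return memchr(id, '\0', HYP_VENDOR_ID_LEN) != NULL;
}

/* A comparison that never relies on a terminator at all. */
static bool hyp_is_intel(const char id[HYP_VENDOR_ID_LEN])
{
    return memcmp(id, "GenuineIntel", 12) == 0;
}
```

With the partial reset the buffer has no NUL anywhere in its 16 bytes, so a strcmp()-based vendor check would read past the end of the array. The fixed-length comparison still matches. With the full reset, both styles of comparison succeed.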
* Re: [PATCH 3/3] x86: check feature flags after resume
  2018-04-13 18:29   ` Simon Gaiser
@ 2018-04-13 18:56     ` Simon Gaiser
  2018-04-16 10:16       ` Jan Beulich
  0 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-13 18:56 UTC
  To: Jan Beulich, xen-devel; +Cc: Juergen Gross, Andrew Cooper

Simon Gaiser:
> Jan Beulich:
>> Make sure no previously present features are missing after resume (and
>> the re-loading of microcode), to avoid later crashes or (likely silent)
>> hangs / live locks. This doesn't go beyond checking x86_capability[],
>> but this should be good enough for the immediate need of making sure
>> that the BTI mitigation MSRs are still available.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> --- a/xen/arch/x86/acpi/power.c
>> +++ b/xen/arch/x86/acpi/power.c
>> @@ -254,6 +254,9 @@ static int enter_state(u32 state)
>>
>>      microcode_resume_cpu(0);
>>
>> +    if ( !recheck_cpu_features(0) )
>> +        panic("Missing previously available feature(s).");
>> +
>>      ci->bti_ist_info = default_bti_ist_info;
>>      asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>>                    :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -501,6 +501,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>  	printk("\n");
>>  #endif
>>
>> +	if (system_state == SYS_STATE_resume)
>> +		return;
>> +
>>  	/*
>>  	 * On SMP, boot_cpu_data holds the common feature set between
>>  	 * all CPUs; so make sure that we indicate which features are
>> --- a/xen/arch/x86/cpuid.c
>> +++ b/xen/arch/x86/cpuid.c
>> @@ -473,6 +473,28 @@ void __init init_guest_cpuid(void)
>>      calculate_hvm_max_policy();
>>  }
>>
>> +bool recheck_cpu_features(unsigned int cpu)
>> +{
>> +    bool okay = true;
>> +    struct cpuinfo_x86 c;
>> +    const struct cpuinfo_x86 *bsp = &boot_cpu_data;
>> +    unsigned int i;
>> +
>> +    identify_cpu(&c);
>
> This runs into a bug in identify_cpu(): x86_vendor_id does not get fully
> zeroed, so it is not NUL-terminated and vendor identification fails.
>
> diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
> index 4feaa2ceb6..5750d26216 100644
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -366,8 +366,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
>  	c->x86_vendor = X86_VENDOR_UNKNOWN;
>  	c->cpuid_level = -1;	/* CPUID not detected */
>  	c->x86_model = c->x86_mask = 0;	/* So far unknown... */
> -	c->x86_vendor_id[0] = '\0'; /* Unset */
> -	c->x86_model_id[0] = '\0';  /* Unset */
> +	memset(&c->x86_vendor_id, 0, sizeof(c->x86_vendor_id));
> +	memset(&c->x86_model_id, 0, sizeof(c->x86_model_id));
>  	c->x86_max_cores = 1;
>  	c->x86_num_siblings = 1;
>  	c->x86_clflush_size = 0;
>
> With this patch it works for me.

Meh, that is also a backport failure on my side. Since commit
e34bc403c3c7 this problem should not appear, as the code no longer
assumes a NUL-terminated string.
* Re: [PATCH 3/3] x86: check feature flags after resume
  2018-04-13 18:56     ` Simon Gaiser
@ 2018-04-16 10:16       ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2018-04-16 10:16 UTC
  To: Simon Gaiser; +Cc: Juergen Gross, Andrew Cooper, xen-devel

>>> On 13.04.18 at 20:56, <simon@invisiblethingslab.com> wrote:
> Simon Gaiser:
>> Jan Beulich:
>>> Make sure no previously present features are missing after resume (and
>>> the re-loading of microcode), to avoid later crashes or (likely silent)
>>> hangs / live locks. This doesn't go beyond checking x86_capability[],
>>> but this should be good enough for the immediate need of making sure
>>> that the BTI mitigation MSRs are still available.
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> --- a/xen/arch/x86/acpi/power.c
>>> +++ b/xen/arch/x86/acpi/power.c
>>> @@ -254,6 +254,9 @@ static int enter_state(u32 state)
>>>
>>>      microcode_resume_cpu(0);
>>>
>>> +    if ( !recheck_cpu_features(0) )
>>> +        panic("Missing previously available feature(s).");
>>> +
>>>      ci->bti_ist_info = default_bti_ist_info;
>>>      asm volatile (ALTERNATIVE("", "wrmsr", X86_FEATURE_XEN_IBRS_SET)
>>>                    :: "a" (SPEC_CTRL_IBRS), "c" (MSR_SPEC_CTRL), "d" (0)
>>> --- a/xen/arch/x86/cpu/common.c
>>> +++ b/xen/arch/x86/cpu/common.c
>>> @@ -501,6 +501,9 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>>  	printk("\n");
>>>  #endif
>>>
>>> +	if (system_state == SYS_STATE_resume)
>>> +		return;
>>> +
>>>  	/*
>>>  	 * On SMP, boot_cpu_data holds the common feature set between
>>>  	 * all CPUs; so make sure that we indicate which features are
>>> --- a/xen/arch/x86/cpuid.c
>>> +++ b/xen/arch/x86/cpuid.c
>>> @@ -473,6 +473,28 @@ void __init init_guest_cpuid(void)
>>>      calculate_hvm_max_policy();
>>>  }
>>>
>>> +bool recheck_cpu_features(unsigned int cpu)
>>> +{
>>> +    bool okay = true;
>>> +    struct cpuinfo_x86 c;
>>> +    const struct cpuinfo_x86 *bsp = &boot_cpu_data;
>>> +    unsigned int i;
>>> +
>>> +    identify_cpu(&c);
>>
>> This runs into a bug in identify_cpu(): x86_vendor_id does not get fully
>> zeroed, so it is not NUL-terminated and vendor identification fails.
>>
>> diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
>> index 4feaa2ceb6..5750d26216 100644
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -366,8 +366,8 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>  	c->x86_vendor = X86_VENDOR_UNKNOWN;
>>  	c->cpuid_level = -1;	/* CPUID not detected */
>>  	c->x86_model = c->x86_mask = 0;	/* So far unknown... */
>> -	c->x86_vendor_id[0] = '\0'; /* Unset */
>> -	c->x86_model_id[0] = '\0';  /* Unset */
>> +	memset(&c->x86_vendor_id, 0, sizeof(c->x86_vendor_id));
>> +	memset(&c->x86_model_id, 0, sizeof(c->x86_model_id));
>>  	c->x86_max_cores = 1;
>>  	c->x86_num_siblings = 1;
>>  	c->x86_clflush_size = 0;
>>
>> With this patch it works for me.
>
> Meh, that is also a backport failure on my side. Since commit
> e34bc403c3c7 this problem should not appear, as the code no longer
> assumes a NUL-terminated string.

NP - it's good to be aware of such issues in case we decide to backport
this as well.

Thanks for the feedback, Jan
* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
                   ` (2 preceding siblings ...)
  2018-04-13 11:58 ` [PATCH 3/3] x86: check feature flags after resume Jan Beulich
@ 2018-04-13 12:01 ` Andrew Cooper
  2018-04-16 11:57   ` Juergen Gross
  2018-04-14 5:49 ` Simon Gaiser
  4 siblings, 1 reply; 18+ messages in thread
From: Andrew Cooper @ 2018-04-13 12:01 UTC
  To: Jan Beulich, xen-devel; +Cc: Simon Gaiser, Juergen Gross

On 13/04/18 12:49, Jan Beulich wrote:
> 1: correct ordering of operations during S3 resume
> 2: suppress BTI mitigations around S3 suspend/resume
> 3: check feature flags after resume
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-13 12:01 ` [PATCH 0/3] x86: S3 resume adjustments Andrew Cooper
@ 2018-04-16 11:57   ` Juergen Gross
  0 siblings, 0 replies; 18+ messages in thread
From: Juergen Gross @ 2018-04-16 11:57 UTC
  To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Simon Gaiser

On 13/04/18 14:01, Andrew Cooper wrote:
> On 13/04/18 12:49, Jan Beulich wrote:
>> 1: correct ordering of operations during S3 resume
>> 2: suppress BTI mitigations around S3 suspend/resume
>> 3: check feature flags after resume
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
>

Release-acked-by: Juergen Gross <jgross@suse.com>


Juergen
* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-13 11:49 [PATCH 0/3] x86: S3 resume adjustments Jan Beulich
                   ` (3 preceding siblings ...)
  2018-04-13 12:01 ` [PATCH 0/3] x86: S3 resume adjustments Andrew Cooper
@ 2018-04-14 5:49 ` Simon Gaiser
  2018-04-15 13:08   ` Andrew Cooper
  4 siblings, 1 reply; 18+ messages in thread
From: Simon Gaiser @ 2018-04-14 5:49 UTC
  To: Jan Beulich, xen-devel; +Cc: Juergen Gross, Andrew Cooper

Jan Beulich:
> 1: correct ordering of operations during S3 resume
> 2: suppress BTI mitigations around S3 suspend/resume
> 3: check feature flags after resume
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Simon, could you give this a try please?

Backported to 4.8, it works fine with the two fixes I sent earlier.

I have now also tried staging. Resume is broken even without IBRS/IBPB.
It panics with a double fault somewhere after it starts to enable the
non-boot CPUs. Since the IBRS/IBPB problem happens before that point, I
could test the patches anyway. With them, it again gets to the point
where it double faults. So the patches are most likely fine.

I haven't really looked at the cause of the double fault yet.

Simon
* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-14 5:49 ` Simon Gaiser
@ 2018-04-15 13:08 ` Andrew Cooper
  2018-04-15 15:52 ` Simon Gaiser

From: Andrew Cooper @ 2018-04-15 13:08 UTC
To: Simon Gaiser, Jan Beulich, xen-devel; +Cc: Juergen Gross

On 14/04/18 06:49, Simon Gaiser wrote:
> Jan Beulich:
>> 1: correct ordering of operations during S3 resume
>> 2: suppress BTI mitigations around S3 suspend/resume
>> 3: check feature flags after resume
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>
>> Simon, could you give this a try please?
> Backported to 4.8, it works fine with the two fixes I sent earlier.
>
> I now also tried staging. Resume is broken even without IBRS/IBPB. It
> panics about a double fault somewhere after it starts to enable the
> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
> could test the patches anyway. With them it again gets to the point
> where it double faults. So the patches are most likely fine.
>
> I haven't really looked yet at the cause of the double fault.

Do you at least have the crash log from the attempt?

~Andrew
* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-15 13:08 ` Andrew Cooper
@ 2018-04-15 15:52 ` Simon Gaiser
  2018-04-15 17:34 ` Andrew Cooper

From: Simon Gaiser @ 2018-04-15 15:52 UTC
To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Juergen Gross

Andrew Cooper:
> On 14/04/18 06:49, Simon Gaiser wrote:
>> Jan Beulich:
>>> 1: correct ordering of operations during S3 resume
>>> 2: suppress BTI mitigations around S3 suspend/resume
>>> 3: check feature flags after resume
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>
>>> Simon, could you give this a try please?
>> Backported to 4.8, it works fine with the two fixes I sent earlier.
>>
>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>> panics about a double fault somewhere after it starts to enable the
>> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
>> could test the patches anyway. With them it again gets to the point
>> where it double faults. So the patches are most likely fine.
>>
>> I haven't really looked yet at the cause of the double fault.
>
> Do you at least have the crash log from the attempt?

Sure, it's a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on
Debian sid:

(XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs ...
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7
(XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
(XEN) rax: ffffc90040cd4068 rbx: 0000000000000000 rcx: 000000000000000a
(XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: 0000000000000000
(XEN) rbp: 000036ffbf32bf77 rsp: ffffc90040cd4000 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: ffffc90040cd7fff
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426e0
(XEN) cr3: 000000022200a000 cr2: ffffc90040cd3ff8
(XEN) fsb: 0000000000000000 gsb: ffff88021e6c0000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 8a00 gs: 0010 ss: e010 cs: e008
(XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
(XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-15 15:52 ` Simon Gaiser
@ 2018-04-15 17:34 ` Andrew Cooper
  2018-04-15 20:15 ` Simon Gaiser

From: Andrew Cooper @ 2018-04-15 17:34 UTC
To: Simon Gaiser, Jan Beulich, xen-devel; +Cc: Juergen Gross

On 15/04/18 16:52, Simon Gaiser wrote:
> Andrew Cooper:
>> On 14/04/18 06:49, Simon Gaiser wrote:
>>> Jan Beulich:
>>>> 1: correct ordering of operations during S3 resume
>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>> 3: check feature flags after resume
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>
>>>> Simon, could you give this a try please?
>>> Backported to 4.8, it works fine with the two fixes I sent earlier.
>>>
>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>> panics about a double fault somewhere after it starts to enable the
>>> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
>>> could test the patches anyway. With them it again gets to the point
>>> where it double faults. So the patches are most likely fine.
>>>
>>> I haven't really looked yet at the cause of the double fault.
>> Do you at least have the crash log from the attempt?
> Sure, it's a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on
> Debian sid:

I can't find that object. I presume this isn't an upstream tree?

>
> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
> (XEN) Finishing wakeup from ACPI S3 state.
> (XEN) Enabling non-boot CPUs ...
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800

Bad dom0. It shouldn't be playing with APIC_BASE at all, but I guess
this means I can't fix the hypervisor behaviour to throw #GP back at a
PV guest.

> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7

Can you disassemble the binary and find out where this is? On current
staging, handle_exception+0x9c is in the middle of
SPEC_CTRL_ENTRY_FROM_INTR but this might not be the case for you.
> (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
> (XEN) rax: ffffc90040cd4068 rbx: 0000000000000000 rcx: 000000000000000a
> (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: 0000000000000000
> (XEN) rbp: 000036ffbf32bf77 rsp: ffffc90040cd4000 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: ffffc90040cd7fff
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426e0
> (XEN) cr3: 000000022200a000 cr2: ffffc90040cd3ff8
> (XEN) fsb: 0000000000000000 gsb: ffff88021e6c0000 gss: 0000000000000000
> (XEN) ds: 002b es: 002b fs: 8a00 gs: 0010 ss: e010 cs: e008
> (XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0

Given the %rsp and %cr2 values, it looks like we have a bad %rsp over a
region which isn't mapped, tried to push a value, got #PF, tried to
invoke the #PF exception handler which faulted again, and escalated to
#DF which followed the TSS and moved back to reality.

The only way to come in with stack pointers other than TSS.RSP0 is via
syscall and sysenter. SYSENTER_ESP should be identical to TSS.RSP0

--- a/xen/arch/x86/x86_64/traps.c
+++ b/xen/arch/x86/x86_64/traps.c
@@ -257,6 +257,13 @@ void do_double_fault(struct cpu_user_regs *regs)
     _show_registers(regs, crs, CTXT_hypervisor, NULL);
     show_stack_overflow(cpu, regs);
 
+    {
+        uint64_t val;
+
+        rdmsrl(MSR_IA32_SYSENTER_ESP, val);
+        printk("*** SYSENTER_ESP: %p\n", _p(val));
+    }
+
     panic("DOUBLE FAULT -- system shutdown");
 }

so this bit of debugging should help track things down. If not, then
we've probably got an issue (re)writing the syscall trampolines.
~Andrew
* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-15 17:34 ` Andrew Cooper
@ 2018-04-15 20:15 ` Simon Gaiser
  2018-04-16 13:13 ` Jan Beulich

From: Simon Gaiser @ 2018-04-15 20:15 UTC
To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Juergen Gross

Andrew Cooper:
> On 15/04/18 16:52, Simon Gaiser wrote:
>> Andrew Cooper:
>>> On 14/04/18 06:49, Simon Gaiser wrote:
>>>> Jan Beulich:
>>>>> 1: correct ordering of operations during S3 resume
>>>>> 2: suppress BTI mitigations around S3 suspend/resume
>>>>> 3: check feature flags after resume
>>>>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>>
>>>>> Simon, could you give this a try please?
>>>> Backported to 4.8, it works fine with the two fixes I sent earlier.
>>>>
>>>> I now also tried staging. Resume is broken even without IBRS/IBPB. It
>>>> panics about a double fault somewhere after it starts to enable the
>>>> non-boot CPUs. Since the IBRS/IBPB problem happens before that point I
>>>> could test the patches anyway. With them it again gets to the point
>>>> where it double faults. So the patches are most likely fine.
>>>>
>>>> I haven't really looked yet at the cause of the double fault.
>>> Do you at least have the crash log from the attempt?
>> Sure, it's a build of 16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5 on
>> Debian sid:
>
> I can't find that object. I presume this isn't an upstream tree?

That's the head of upstream staging as of Friday/Saturday night. And
AFAICS it still is:
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=16fb4b5a9a79f95df17f10ba62e9f44d21cf89b5

>> (XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
>> (XEN) CPU0 CMCI LVT vector (0xf2) already installed
>> (XEN) Finishing wakeup from ACPI S3 state.
>> (XEN) Enabling non-boot CPUs ...
>> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
>> (XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
>> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
>> (XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
>> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
>> (XEN) emul-priv-op.c:1179:d0v3 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
>
> Bad dom0. It shouldn't be playing with APIC_BASE at all, but I guess
> this means I can't fix the hypervisor behaviour to throw #GP back at a
> PV guest.
>
>> (XEN) *** DOUBLE FAULT ***
>> (XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7
>
> Can you disassemble the binary and find out where this is? On current
> staging, handle_exception+0x9c is in the middle of
> SPEC_CTRL_ENTRY_FROM_INTR but this might not be the case for you.
Dump of assembler code for function handle_exception:
0xffff82d08037a8a8 <+0>: 0f 1f 00 nopl (%rax)
0xffff82d08037a8ab <+3>: 48 83 c4 88 add $0xffffffffffffff88,%rsp
0xffff82d08037a8af <+7>: fc cld
0xffff82d08037a8b0 <+8>: 48 89 7c 24 70 mov %rdi,0x70(%rsp)
0xffff82d08037a8b5 <+13>: 31 ff xor %edi,%edi
0xffff82d08037a8b7 <+15>: 48 89 74 24 68 mov %rsi,0x68(%rsp)
0xffff82d08037a8bc <+20>: 31 f6 xor %esi,%esi
0xffff82d08037a8be <+22>: 48 89 54 24 60 mov %rdx,0x60(%rsp)
0xffff82d08037a8c3 <+27>: 31 d2 xor %edx,%edx
0xffff82d08037a8c5 <+29>: 48 89 4c 24 58 mov %rcx,0x58(%rsp)
0xffff82d08037a8ca <+34>: 31 c9 xor %ecx,%ecx
0xffff82d08037a8cc <+36>: 48 89 44 24 50 mov %rax,0x50(%rsp)
0xffff82d08037a8d1 <+41>: 31 c0 xor %eax,%eax
0xffff82d08037a8d3 <+43>: 4c 89 44 24 48 mov %r8,0x48(%rsp)
0xffff82d08037a8d8 <+48>: 4c 89 4c 24 40 mov %r9,0x40(%rsp)
0xffff82d08037a8dd <+53>: 4c 89 54 24 38 mov %r10,0x38(%rsp)
0xffff82d08037a8e2 <+58>: 4c 89 5c 24 30 mov %r11,0x30(%rsp)
0xffff82d08037a8e7 <+63>: 45 31 c0 xor %r8d,%r8d
0xffff82d08037a8ea <+66>: 45 31 c9 xor %r9d,%r9d
0xffff82d08037a8ed <+69>: 45 31 d2 xor %r10d,%r10d
0xffff82d08037a8f0 <+72>: 45 31 db xor %r11d,%r11d
0xffff82d08037a8f3 <+75>: 48 89 5c 24 28 mov %rbx,0x28(%rsp)
0xffff82d08037a8f8 <+80>: 31 db xor %ebx,%ebx
0xffff82d08037a8fa <+82>: 48 89 6c 24 20 mov %rbp,0x20(%rsp)
0xffff82d08037a8ff <+87>: 48 8d 6c 24 20 lea 0x20(%rsp),%rbp
0xffff82d08037a904 <+92>: 48 f7 d5 not %rbp
0xffff82d08037a907 <+95>: 4c 89 64 24 18 mov %r12,0x18(%rsp)
0xffff82d08037a90c <+100>: 4c 89 6c 24 10 mov %r13,0x10(%rsp)
0xffff82d08037a911 <+105>: 4c 89 74 24 08 mov %r14,0x8(%rsp)
0xffff82d08037a916 <+110>: 4c 89 3c 24 mov %r15,(%rsp)
0xffff82d08037a91a <+114>: 45 31 e4 xor %r12d,%r12d
0xffff82d08037a91d <+117>: 45 31 ed xor %r13d,%r13d
0xffff82d08037a920 <+120>: 45 31 f6 xor %r14d,%r14d
0xffff82d08037a923 <+123>: 45 31 ff xor %r15d,%r15d
0xffff82d08037a926 <+126>: 49 c7 c6 ff 7f 00 00 mov $0x7fff,%r14
0xffff82d08037a92d <+133>: 49 09 e6 or %rsp,%r14
0xffff82d08037a930 <+136>: 90 nop
0xffff82d08037a931 <+137>: 90 nop
0xffff82d08037a932 <+138>: 90 nop
0xffff82d08037a933 <+139>: 90 nop
0xffff82d08037a934 <+140>: 90 nop
0xffff82d08037a935 <+141>: 90 nop
0xffff82d08037a936 <+142>: 90 nop
0xffff82d08037a937 <+143>: 90 nop
0xffff82d08037a938 <+144>: 90 nop
0xffff82d08037a939 <+145>: 90 nop
0xffff82d08037a93a <+146>: 90 nop
0xffff82d08037a93b <+147>: 90 nop
0xffff82d08037a93c <+148>: 90 nop
0xffff82d08037a93d <+149>: 90 nop
0xffff82d08037a93e <+150>: 90 nop
0xffff82d08037a93f <+151>: 90 nop
0xffff82d08037a940 <+152>: 90 nop
0xffff82d08037a941 <+153>: 90 nop
0xffff82d08037a942 <+154>: 90 nop
0xffff82d08037a943 <+155>: 90 nop
0xffff82d08037a944 <+156>: 90 nop
0xffff82d08037a945 <+157>: 90 nop
0xffff82d08037a946 <+158>: 90 nop
0xffff82d08037a947 <+159>: 90 nop
0xffff82d08037a948 <+160>: 90 nop
0xffff82d08037a949 <+161>: 90 nop
0xffff82d08037a94a <+162>: 90 nop
0xffff82d08037a94b <+163>: 90 nop
0xffff82d08037a94c <+164>: 90 nop
0xffff82d08037a94d <+165>: 90 nop
0xffff82d08037a94e <+166>: 90 nop
0xffff82d08037a94f <+167>: 90 nop
0xffff82d08037a950 <+168>: 90 nop
0xffff82d08037a951 <+169>: 90 nop
0xffff82d08037a952 <+170>: 90 nop
0xffff82d08037a953 <+171>: 90 nop
0xffff82d08037a954 <+172>: 90 nop
0xffff82d08037a955 <+173>: 90 nop
0xffff82d08037a956 <+174>: 90 nop
0xffff82d08037a957 <+175>: 90 nop
0xffff82d08037a958 <+176>: 90 nop
0xffff82d08037a959 <+177>: 90 nop
0xffff82d08037a95a <+178>: 90 nop
0xffff82d08037a95b <+179>: 90 nop
0xffff82d08037a95c <+180>: 90 nop
0xffff82d08037a95d <+181>: 90 nop
0xffff82d08037a95e <+182>: 90 nop
0xffff82d08037a95f <+183>: 90 nop
0xffff82d08037a960 <+184>: 90 nop
0xffff82d08037a961 <+185>: 90 nop
0xffff82d08037a962 <+186>: 90 nop
0xffff82d08037a963 <+187>: 90 nop
0xffff82d08037a964 <+188>: 90 nop
0xffff82d08037a965 <+189>: 90 nop
0xffff82d08037a966 <+190>: 90 nop
0xffff82d08037a967 <+191>: 90 nop
0xffff82d08037a968 <+192>: 90 nop
0xffff82d08037a969 <+193>: 90 nop
0xffff82d08037a96a <+194>: 90 nop
0xffff82d08037a96b <+195>: 90 nop
0xffff82d08037a96c <+196>: 90 nop
0xffff82d08037a96d <+197>: 90 nop
0xffff82d08037a96e <+198>: 90 nop
0xffff82d08037a96f <+199>: 90 nop
0xffff82d08037a970 <+200>: 90 nop
0xffff82d08037a971 <+201>: 90 nop
0xffff82d08037a972 <+202>: 90 nop
0xffff82d08037a973 <+203>: 90 nop
0xffff82d08037a974 <+204>: 90 nop
0xffff82d08037a975 <+205>: 49 8b 4e e1 mov -0x1f(%r14),%rcx
0xffff82d08037a979 <+209>: 49 89 cf mov %rcx,%r15
0xffff82d08037a97c <+212>: 48 f7 d9 neg %rcx
0xffff82d08037a97f <+215>: 74 1e je 0xffff82d08037a99f <handle_exception_saved>
0xffff82d08037a981 <+217>: 79 07 jns 0xffff82d08037a98a <handle_exception+226>
0xffff82d08037a983 <+219>: 49 89 4e e1 mov %rcx,-0x1f(%r14)
0xffff82d08037a987 <+223>: 48 f7 d9 neg %rcx
0xffff82d08037a98a <+226>: 0f 22 d9 mov %rcx,%cr3
0xffff82d08037a98d <+229>: 31 c9 xor %ecx,%ecx
0xffff82d08037a98f <+231>: 49 89 4e e1 mov %rcx,-0x1f(%r14)
0xffff82d08037a993 <+235>: f6 84 24 88 00 00 00 03 testb $0x3,0x88(%rsp)
0xffff82d08037a99b <+243>: 4c 0f 45 f9 cmovne %rcx,%r15
End of assembler dump.

Is there an easy way to get gdb to resolve alternatives?

BTW:

(XEN) Speculative mitigation facilities:
(XEN) Hardware features:
(XEN) Compiled-in support: INDIRECT_THUNK
(XEN) BTI mitigations: Thunk RETPOLINE, Others: RSB_NATIVE RSB_VMEXIT
(XEN) XPTI: enabled

With 'bti=rsb_native=0' it fails somewhere else:

(XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs ...
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v2 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d08027c35d>] search_pre_exception_table+0/0x54
(XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: ffffc90040cd4028
(XEN) rbp: 000036ffbf32bfb7 rsp: ffffc90040cd4020 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: ffffc90040cd7fff
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426e0
(XEN) cr3: 000000022200a000 cr2: ffffc90040cd3ff8
(XEN) fsb: 00007fd74515e740 gsb: ffff88021e6c0000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
(XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4020, tss.rsp0=ffff8300cec8ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN) *** SYSENTER_ESP: ffff8300cec8ffa0
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
Dump of assembler code for function search_pre_exception_table:
0xffff82d08027c35d <+0>: 55 push %rbp
0xffff82d08027c35e <+1>: 48 89 e5 mov %rsp,%rbp
0xffff82d08027c361 <+4>: 41 54 push %r12
0xffff82d08027c363 <+6>: 53 push %rbx
0xffff82d08027c364 <+7>: 4c 8b a7 80 00 00 00 mov 0x80(%rdi),%r12
0xffff82d08027c36b <+14>: 4c 89 e2 mov %r12,%rdx
0xffff82d08027c36e <+17>: 48 8d 35 e3 61 17 00 lea 0x1761e3(%rip),%rsi # 0xffff82d0803f2558
0xffff82d08027c375 <+24>: 48 8d 3d d4 61 17 00 lea 0x1761d4(%rip),%rdi # 0xffff82d0803f2550
0xffff82d08027c37c <+31>: e8 0c fe ff ff callq 0xffff82d08027c18d <search_one_extable>
0xffff82d08027c381 <+36>: 48 89 c3 mov %rax,%rbx
0xffff82d08027c384 <+39>: 48 85 c0 test %rax,%rax
0xffff82d08027c387 <+42>: 75 08 jne 0xffff82d08027c391 <search_pre_exception_table+52>
0xffff82d08027c389 <+44>: 48 89 d8 mov %rbx,%rax
0xffff82d08027c38c <+47>: 5b pop %rbx
0xffff82d08027c38d <+48>: 41 5c pop %r12
0xffff82d08027c38f <+50>: 5d pop %rbp
0xffff82d08027c390 <+51>: c3 retq
0xffff82d08027c391 <+52>: 49 89 c0 mov %rax,%r8
0xffff82d08027c394 <+55>: 4c 89 e1 mov %r12,%rcx
0xffff82d08027c397 <+58>: ba ca 00 00 00 mov $0xca,%edx
0xffff82d08027c39c <+63>: 48 8d 35 0a df 16 00 lea 0x16df0a(%rip),%rsi # 0xffff82d0803ea2ad
0xffff82d08027c3a3 <+70>: 48 8d 3d 56 89 15 00 lea 0x158956(%rip),%rdi # 0xffff82d0803d4d00
0xffff82d08027c3aa <+77>: e8 58 71 fd ff callq 0xffff82d080253507 <printk>
0xffff82d08027c3af <+82>: eb d8 jmp 0xffff82d08027c389 <search_pre_exception_table+44>
End of assembler dump.
>> (XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
>> (XEN) rax: ffffc90040cd4068 rbx: 0000000000000000 rcx: 000000000000000a
>> (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: 0000000000000000
>> (XEN) rbp: 000036ffbf32bf77 rsp: ffffc90040cd4000 r8: 0000000000000000
>> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
>> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: ffffc90040cd7fff
>> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426e0
>> (XEN) cr3: 000000022200a000 cr2: ffffc90040cd3ff8
>> (XEN) fsb: 0000000000000000 gsb: ffff88021e6c0000 gss: 0000000000000000
>> (XEN) ds: 002b es: 002b fs: 8a00 gs: 0010 ss: e010 cs: e008
>> (XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
>> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4000, tss.rsp0=ffff8300cec8ffa0
>
> Given the %rsp and %cr2 values, it looks like we have a bad %rsp over a
> region which isn't mapped, tried to push a value, got #PF, tried to
> invoke the #PF exception handler which faulted again, and escalated to
> #DF which followed the TSS and moved back to reality.
>
> The only way to come in with stack pointers other than TSS.RSP0 is via
> syscall and sysenter. SYSENTER_ESP should be identical to TSS.RSP0
>
> --- a/xen/arch/x86/x86_64/traps.c
> +++ b/xen/arch/x86/x86_64/traps.c
> @@ -257,6 +257,13 @@ void do_double_fault(struct cpu_user_regs *regs)
>      _show_registers(regs, crs, CTXT_hypervisor, NULL);
>      show_stack_overflow(cpu, regs);
>  
> +    {
> +        uint64_t val;
> +
> +        rdmsrl(MSR_IA32_SYSENTER_ESP, val);
> +        printk("*** SYSENTER_ESP: %p\n", _p(val));
> +    }
> +
>      panic("DOUBLE FAULT -- system shutdown");
>  }
>
> so this bit of debugging should help track things down. If not, then
> we've probably got an issue (re)writing the syscall trampolines.
(XEN) mce_intel.c:782: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI
(XEN) CPU0 CMCI LVT vector (0xf2) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs ...
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00000
(XEN) emul-priv-op.c:1179:d0v1 Domain attempted WRMSR 0000001b from 0x00000000fee00c00 to 0x00000000fee00800
(XEN) *** DOUBLE FAULT ***
(XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d08037a944>] handle_exception+0x9c/0xf7
(XEN) RFLAGS: 0000000000010006 CONTEXT: hypervisor
(XEN) rax: ffffc90040cc4068 rbx: 0000000000000000 rcx: 000000000000000a
(XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: 0000000000000000
(XEN) rbp: 000036ffbf33bf77 rsp: ffffc90040cc4000 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: ffffc90040cc7fff
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426e0
(XEN) cr3: 000000022200a000 cr2: ffffc90040cc3ff8
(XEN) fsb: 0000000000000000 gsb: ffff88021e640000 gss: 0000000000000000
(XEN) ds: 002b es: 002b fs: 8a00 gs: 0010 ss: e010 cs: e008
(XEN) Current stack base ffffc90040cc0000 differs from expected ffff8300cec88000
(XEN) Valid stack range: ffffc90040cc6000-ffffc90040cc8000, sp=ffffc90040cc4000, tss.rsp0=ffff8300cec8ffa0
(XEN) No stack overflow detected. Skipping stack trace.
(XEN) *** SYSENTER_ESP: ffff8300cec8ffa0
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) DOUBLE FAULT -- system shutdown
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
Thanks,
Simon
* Re: [PATCH 0/3] x86: S3 resume adjustments
  2018-04-15 20:15 ` Simon Gaiser
@ 2018-04-16 13:13 ` Jan Beulich

From: Jan Beulich @ 2018-04-16 13:13 UTC
To: Andrew Cooper, Simon Gaiser; +Cc: Juergen Gross, xen-devel

>>> On 15.04.18 at 22:15, <simon@invisiblethingslab.com> wrote:
> (XEN) *** DOUBLE FAULT ***
> (XEN) ----[ Xen-4.11-unstable x86_64 debug=y Not tainted ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82d08027c35d>] search_pre_exception_table+0/0x54
> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
> (XEN) rdx: 0000000000000000 rsi: 0000000000000000 rdi: ffffc90040cd4028
> (XEN) rbp: 000036ffbf32bfb7 rsp: ffffc90040cd4020 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: ffffc90040cd7fff
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426e0
> (XEN) cr3: 000000022200a000 cr2: ffffc90040cd3ff8
> (XEN) fsb: 00007fd74515e740 gsb: ffff88021e6c0000 gss: 0000000000000000
> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Current stack base ffffc90040cd0000 differs from expected ffff8300cec88000
> (XEN) Valid stack range: ffffc90040cd6000-ffffc90040cd8000, sp=ffffc90040cd4020, tss.rsp0=ffff8300cec8ffa0

That the exact location where the #DF triggers varies is of little
interest: it all depends on when exactly the stack overflow occurs.

What I do note, though: ffffc90040cd4020 is a guest (presumably Dom0)
kernel address, far outside the Xen range. I guess we'd need to see all
of that (wrong) stack's contents logged, up to the original entry into
Xen, to understand how that could have happened.
Jan