On Tue, Aug 17, 2021 at 03:25:21AM +0200, Marek Marczykowski-Górecki wrote: > Hi, > > I've got another S3 issue: > > (XEN) Preparing system for ACPI S3 state. > (XEN) Disabling non-boot CPUs ... > (XEN) Broke affinity for IRQ1, new: ffff > (XEN) Broke affinity for IRQ16, new: ffff > (XEN) Broke affinity for IRQ9, new: ffff > (XEN) Broke affinity for IRQ139, new: ffff > (XEN) Broke affinity for IRQ8, new: ffff > (XEN) Broke affinity for IRQ14, new: ffff > (XEN) Broke affinity for IRQ20, new: ffff > (XEN) Broke affinity for IRQ137, new: ffff > (XEN) Broke affinity for IRQ138, new: ffff > (XEN) Entering ACPI S3 state. > (XEN) mce_intel.c:773: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, CMCI > (XEN) CPU0 CMCI LVT vector (0xf1) already installed > (XEN) Finishing wakeup from ACPI S3 state. > (XEN) microcode: CPU0 updated from revision 0xca to 0xea, date = 2021-01-05 > (XEN) xstate: size: 0x440 (uncompressed 0x440) and states: 0x1f > (XEN) Enabling non-boot CPUs ... > (XEN) xstate: size: 0x440 (uncompressed 0x240) and states: 0x1f > (XEN) Xen BUG at xstate.c:673 > (XEN) ----[ Xen-4.16-unstable x86_64 debug=y Not tainted ]---- > (XEN) CPU: 1 > (XEN) RIP: e008:[] xstate_init+0x24b/0x2ff > (XEN) RFLAGS: 0000000000010087 CONTEXT: hypervisor > (XEN) rax: 0000000000000240 rbx: 000000000000001f rcx: 0000000000000440 > (XEN) rdx: 0000000000000001 rsi: 000000000000000a rdi: 000000000000001f > (XEN) rbp: ffff83025dc9fd38 rsp: ffff83025dc9fd20 r8: 0000000000000001 > (XEN) r9: ffff83025dc9fc88 r10: 0000000000000001 r11: 0000000000000001 > (XEN) r12: ffff83025dc9fd80 r13: 000000000000001f r14: 0000000000000001 > (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000003526e0 > (XEN) cr3: 0000000049656000 cr2: 0000000000000000 > (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000 > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > (XEN) Xen code around (xstate_init+0x24b/0x2ff): > (XEN) ff e9 a2 00 00 00 0f 0b <0f> 0b 89 f8 89 f1 0f a2 89 f2 4c 8b 0d cb b4 0f > (XEN) Xen stack trace from rsp=ffff83025dc9fd20: > (XEN) 0000000000000240 ffff83025dc9fd80 0000000000000001 ffff83025dc9fd70 > (XEN) ffff82d04027e7a1 000000004035a7f1 7ffafbbf01100800 00000000bfebfbff > (XEN) 0000000000000001 00000000000000c8 ffff83025dc9feb8 ffff82d0402e43ce > (XEN) 000000160a9e0106 bfebfbff80000008 2c1008007ffaf3bf 0000000f00000121 > (XEN) 00000000029c6fbf 0000000000000100 000000009c002e00 02afcd7f00000000 > (XEN) 756e654700000000 6c65746e49656e69 65746e4904b21920 726f43202952286c > (XEN) 376920294d542865 432048303537382d 322e322040205550 000000007a484730 > (XEN) ffff830000000000 ffff83025dc9fe18 00002400402e8e0b 000000085dc9fe30 > (XEN) 00000002402e9f21 0000000000000001 ffffffff00000000 ffff82d0402e0040 > (XEN) 00000000003526e0 ffff83025dc9fe68 ffff82d04027bd15 0000000000000001 > (XEN) ffff8302590a0000 0000000000000000 00000000000000c8 0000000000000001 > (XEN) 0000000000000001 ffff83025dc9feb8 ffff82d0402e32b7 0000000000000001 > (XEN) 0000000000000001 00000000000000c8 0000000000000001 ffff83025dc9fee8 > (XEN) ffff82d04030e401 0000000000000001 0000000000000000 0000000000000000 > (XEN) 0000000000000000 0000000000000000 ffff82d040200122 0800002000000002 > (XEN) 0100000400010000 0000002000000000 2000000000100000 0000001000000000 > (XEN) 2000000000000000 0000000029000000 0000008000000000 00110000a0000000 > (XEN) 8000000080000000 4000000000000008 0000100000000000 0200000040000080 > (XEN) 0004000000000000 0000010000000002 0400002030000000 0000000060000000 > (XEN) 0400001000010000 0000000010000000 0000004010000000 0000000000000000 > (XEN) Xen call trace: > (XEN) [] R xstate_init+0x24b/0x2ff > (XEN) [] F identify_cpu+0x318/0x4af > (XEN) [] F recheck_cpu_features+0x1f/0x72 > (XEN) [] F start_secondary+0x255/0x38a > (XEN) [] F __high_start+0x82/0x91 > (XEN) > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 1: > (XEN) Xen BUG at xstate.c:673 > (XEN) **************************************** > (XEN) > (XEN) Reboot in five seconds... > > This is with added debug patch: > > diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c > index 6aaf9a2f1546..7873a21b356a 100644 > --- a/xen/arch/x86/xstate.c > +++ b/xen/arch/x86/xstate.c > @@ -668,6 +668,8 @@ void xstate_init(struct cpuinfo_x86 *c) > else > { > BUG_ON(xfeature_mask != feature_mask); > + printk("xstate: size: %#x (uncompressed %#x) and states: %#"PRIx64"\n", > + xsave_cntxt_size, hw_uncompressed_size(feature_mask), feature_mask); > BUG_ON(xsave_cntxt_size != hw_uncompressed_size(feature_mask)); > } > > > As can be seen above - the xsave size differs between BSP and other > CPU(s) - likely because of (not) loaded ucode update there. > I guess it's a matter of moving ucode loading somewhere else, right? Few more data points: 1. The CPU is i7-8750H (family 6, model 158, stepping 10). 2. I do have "smt=off" on the Xen cmdline, if that matters. I've tried the same without letting Xen load the ucode update (so, staying at 0xca) and got the same effect. So, I think it isn't about ucode... -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab