On 31/07/13 09:30, Thimo E. wrote:
> Hello all,
>
> I also have a Haswell system. I am running XenServer 6.2 (with Xen
> 4.1.5) on it and I am experiencing the same issue. Do you already have
> a solution for this problem?
>
> Best regards
> Thimo

Hi,

We are still none the wiser on this issue. I have a debugging patch to
get more information, but the problem hasn't reoccurred since.

This is now 2 crashes on Xen 4.1 and a single crash on Xen 4.2 that I
have seen.

For the benefit of anyone else who runs over this issue in the meantime,
the patch (against Xen-4.3) is attached.

Thimo: I shall put a new version of the XenServer 6.2 Xen with the
debugging patch on the forum thread.

~Andrew

>
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
> (XEN) ----[ Xen-4.1.5.debug x86_64 debug=y Not tainted ]----
> (XEN) CPU: 1
> (XEN) RIP: e008:[] do_IRQ+0x3ba/0x6d9
> (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
> (XEN) rax: 0000000000000001 rbx: ffff83081f080f00 rcx: ffff83081f05b340
> (XEN) rdx: 0000000000000001 rsi: 000000000000002b rdi: 0000000000000001
> (XEN) rbp: ffff83081f057d88 rsp: ffff83081f057d18 r8: ffff83081f05b63c
> (XEN) r9: 000070044fb97100 r10: ffff8300b858c060 r11: 000020f3f5a4dea5
> (XEN) r12: 000000000000002b r13: ffff83081f004e80 r14: 000000000000001d
> (XEN) r15: 0000000000000002 cr0: 000000008005003b cr4: 00000000001026f0
> (XEN) cr3: 000000045915f000 cr2: 0000000000150008
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff83081f057d18:
> (XEN) 000000000000001d 000000000000001d ffff83081f080f00 0000000000000000
> (XEN) 00000000ffffffea ffff83081f080f00 0000000000000000 0000000000000000
> (XEN) ffffffffffffffff ffff83081f057f18 ffff83081f06bb00 ffff83081f06bb90
> (XEN) ffff8300b858c000 0000000000000002 00007cf7e0fa8247 ffff82c480161a66
> (XEN) 0000000000000002 ffff8300b858c000 ffff83081f06bb90 ffff83081f06bb00
> (XEN) ffff83081f057ef0 ffff83081f057f18 000020f3f5a4dea5 ffff8300b858c060
> (XEN) 000070044fb97100 ffff83081f05bb80 0000000000007f40 0000000000000001
> (XEN) 0000000000000000 000020f3c755a972 ffff83081f06bb90 0000002b00000000
> (XEN) ffff82c4801a21f0 000000000000e008 0000000000000246 ffff83081f057e48
> (XEN) 000000000000e010 ffff83081f057ef0 ffff82c4801a3dc4 000020f3f595c09c
> (XEN) 000020f3f596987e ffff8306383e3010 ffff83081f05b100 ffffffffffffffff
> (XEN) 0000000000000001 0000000000000001 ffffffffffffffff ffff83081f057f18
> (XEN) 00000000802d4680 0000000000000000 0000000000000000 ffff82c4802d4680
> (XEN) 000002a80000024b ffff8300b8586000 ffff83081f057f18 ffff8300b8586000
> (XEN) ffff8300b858c000 ffff8300b858c000 0000000000000002 ffff83081f057f10
> (XEN) ffff82c48015a261 ffff82c480126ccd 0000000000000001 ffff83081f057d18
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000246 ffff88001a8093a0
> (XEN) 0000000100885e0f 000000000000000f 0000000000000000 ffffffff802063aa
> (XEN) 0000000000000001 00000000deadbeef 00000000deadbeef 0000010000000000
> (XEN) Xen call trace:
> (XEN) [] do_IRQ+0x3ba/0x6d9
> (XEN) [] common_interrupt+0x26/0x30
> (XEN) [] lapic_timer_nop+0x0/0x6
> (XEN) [] idle_loop+0x48/0x59
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
> On 31.05.2013 22:32, Andrew Cooper wrote:
>> Recently our automated testing system has caught a curious assertion
>> while testing Xen 4.1.5 on a HaswellDT system.
>>
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
>> (XEN) ----[ Xen-4.1.5 x86_64 debug=n Not tainted ]----
>> (XEN) CPU: 0
>> (XEN) RIP: e008:[] do_IRQ+0x514/0x750
>> (XEN) RFLAGS: 0000000000010093 CONTEXT: hypervisor
>> (XEN) rax: 000000000000002f rbx: ffff830249841e80 rcx: ffff82c4803127c0
>> (XEN) rdx: 0000000000000004 rsi: 0000000000000027 rdi: 0000000000000001
>> (XEN) rbp: 0000000000001e00 rsp: ffff82c4802bfd48 r8: ffff82c480312abc
>> (XEN) r9: ffff8302498a5948 r10: 0000000000000009 r11: ffff8302498c6c80
>> (XEN) r12: ffff830243b07f50 r13: ffff8300a24f8000 r14: 00000af8373788e3
>> (XEN) r15: ffff830249841e80 cr0: 000000008005003b cr4: 00000000001026f0
>> (XEN) cr3: 00000002479e6000 cr2: 00000000e6d3c090
>> (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c4802bfd48:
>> (XEN) ffff830249841eb4 ffff82c480312ec0 000000000000001e 0000001e00000000
>> (XEN) 0000000000000000 00000000498a5670 ffff830249841d80 ffff830249840080
>> (XEN) ffff830249841db4 0000000000000000 ffff8302498a55e0 ffff8302498a5670
>> (XEN) ffff8300a24f8000 00000af8373788e3 00000af83736b8ed ffff82c480162ca0
>> (XEN) 00000af83736b8ed 00000af8373788e3 ffff8300a24f8000 ffff8302498a5670
>> (XEN) ffff8302498a55e0 0000000000000000 ffff8302498c6c80 0000000000000009
>> (XEN) ffff8302498a5948 ffff82c480313000 0000000000007f40 0000000000000001
>> (XEN) 0000000000000000 0000000000000000 00000af80db652fd 0000002700000000
>> (XEN) ffff82c4801a50a0 000000000000e008 0000000000000246 ffff82c4802bfe78
>> (XEN) 0000000000000000 ffff8302498a5670 ffff82c4801a6a56 ffffffffffffffff
>> (XEN) ffff830249818000 0000000000000000 ffff8300a24f8000 ffff82c480122c11
>> (XEN) 00000af839021119 0000000000000000 0000000000000000 00000000802bff18
>> (XEN) 0000025c0000013b ffff82c4802e7580 ffff82c4802bff18 ffff8300a2838000
>> (XEN) ffff82c4802f61a0 ffff8300a24f8000 0000000000000002 00000af837304b45
>> (XEN) ffff82c48015b67a 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 00000000ee8a3f8c 0000000000000001
>> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN) 0000000000000000 0000000000000000 00000000ee8a3f74 0000000000000af8
>> (XEN) 0000000000000001 0000010000000000 00000000c01013a7 0000000000000061
>> (XEN) 0000000000000246 00000000ee8a3f70 0000000000000069 0000000000000000
>> (XEN) Xen call trace:
>> (XEN) [] do_IRQ+0x514/0x750
>> (XEN) 15[] common_interrupt+0x20/0x30
>> (XEN) 32[] lapic_timer_nop+0x0/0x10
>> (XEN) 38[] acpi_processor_idle+0x376/0x740
>> (XEN) 43[] do_block+0x71/0xd0
>> (XEN) 56[] idle_loop+0x1a/0x50
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
>> (XEN) ****************************************
>>
>> And the disassembly before the assertion:
>>
>> ffff82c48016b29f: 48 8d 14 85 00 00 00    lea 0x0(,%rax,4),%rdx
>> ffff82c48016b2a6: 00
>> ffff82c48016b2a7: 0f b6 44 11 ff          movzbl -0x1(%rcx,%rdx,1),%eax
>> ffff82c48016b2ac: 39 c6                   cmp %eax,%esi
>> ffff82c48016b2ae: 0f 8f 5c ff ff ff       jg ffff82c48016b210
>> ffff82c48016b2b4: 0f 0b                   ud2
>>
>>
>> Xen has been woken up by an interrupt of vector 0x27, but has a vector
>> 0x2f on the top of the pending EOI stack for the local APIC.
>>
>> I have put in more debugging to dump the LAPIC state of the two
>> interesting vectors and the IOAPIC state, but I have no idea if/when the
>> problem might reoccur.
>>
>> My understanding of LAPIC priority leads me to think that Xen really
>> shouldn't be woken up by a lower priority vector if a higher priority
>> one is still un-eoi'd. There is not yet sufficient information to tell
>> whether this is truly the case, or that Xen has simply gotten confused
>> about which vectors it eoi'd.
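
For readers unfamiliar with the check, the condition in the failed
assertion can be modelled in a few lines of C. This is only a sketch:
the names sp, peoi and vector are taken from the assertion text, while
the struct names, the capacity constant and the helper functions are
made up for illustration and are not Xen's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PENDING_EOI_MAX 256 /* hypothetical capacity, not taken from Xen */

/* Hypothetical entry type; only the 'vector' field appears in the assertion. */
struct pending_eoi_entry {
    uint8_t vector;
};

/* Hypothetical per-CPU stack of vectors that have not been EOI'd yet. */
struct pending_eoi_stack {
    struct pending_eoi_entry peoi[PENDING_EOI_MAX];
    unsigned int sp; /* number of entries currently on the stack */
};

/* The condition from the assertion: empty stack, or strictly higher vector. */
static bool peoi_can_push(const struct pending_eoi_stack *s, uint8_t vector)
{
    return (s->sp == 0) || (s->peoi[s->sp - 1].vector < vector);
}

/* Push a vector; returns false where the real code would hit the assertion. */
static bool peoi_push(struct pending_eoi_stack *s, uint8_t vector)
{
    if (s->sp >= PENDING_EOI_MAX || !peoi_can_push(s, vector))
        return false;
    s->peoi[s->sp++].vector = vector;
    return true;
}
```

With 0x2f already on top of the stack, peoi_push() for 0x27 returns
false, which corresponds to the situation the first trace reports
(rax = 0x2f, rsi = 0x27).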
>>
>> Having said that, we do keep line level interrupts un-eoi'd for extended
>> periods while guests service the interrupt. Given that vectors are
>> chosen at random, we could get into a situation where a line interrupt
>> has a vector 0xdf and stays pending for 150ms (which I measured as a
>> not-overly-uncommon mean-time-till-eoi for line level interrupts). This
>> would starve any other guest interrupts for an extended period.
>>
>> Given directed-eoi support in the past few generations of processor, the
>> requirement for the pending EOI stack has disappeared as far as I am
>> aware. Would it be a sensible idea in general to make use of the pending
>> EOI stack conditional on not having/using directed EOI support?
>>
>> ~Andrew
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>
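
As a rough illustration of the priority argument in the quoted mail,
the local APIC rule can be sketched as follows. The priority class of
a vector is its upper four bits, and a pending vector is only
delivered if its class is strictly higher than that of the highest
in-service vector. The function names are hypothetical, and the sketch
assumes the task priority register is 0; it is not code from Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class of an interrupt vector: bits 7:4. */
static unsigned int lapic_priority_class(uint8_t vector)
{
    return vector >> 4;
}

/*
 * Hypothetical helper: would the local APIC deliver 'pending' while
 * 'in_service' has not been EOI'd yet? Assumes the task priority
 * register is 0, so only the in-service vector matters.
 */
static bool lapic_would_deliver(uint8_t pending, uint8_t in_service)
{
    return lapic_priority_class(pending) > lapic_priority_class(in_service);
}
```

Under this model 0x27 and 0x2f share class 2, so 0x27 should not be
delivered while 0x2f is in service, and a long-lived 0xdf holds back
every vector up to and including class 0xd.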