* cpuidle and un-eoid interrupts at the local apic
@ 2013-05-31 20:32 Andrew Cooper
  2013-06-03 14:30 ` Jan Beulich
  2013-07-31  8:30 ` Thimo E.
  0 siblings, 2 replies; 63+ messages in thread
From: Andrew Cooper @ 2013-05-31 20:32 UTC (permalink / raw)
  To: Xen-devel List; +Cc: Keir Fraser, Jan Beulich

Our automated testing system recently caught a curious assertion failure
while testing Xen 4.1.5 on a HaswellDT system.

(XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
(XEN) ----[ Xen-4.1.5  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48016b2b4>] do_IRQ+0x514/0x750
(XEN) RFLAGS: 0000000000010093   CONTEXT: hypervisor
(XEN) rax: 000000000000002f   rbx: ffff830249841e80   rcx: ffff82c4803127c0
(XEN) rdx: 0000000000000004   rsi: 0000000000000027   rdi: 0000000000000001
(XEN) rbp: 0000000000001e00   rsp: ffff82c4802bfd48   r8:  ffff82c480312abc
(XEN) r9:  ffff8302498a5948   r10: 0000000000000009   r11: ffff8302498c6c80
(XEN) r12: ffff830243b07f50   r13: ffff8300a24f8000   r14: 00000af8373788e3
(XEN) r15: ffff830249841e80   cr0: 000000008005003b   cr4: 00000000001026f0
(XEN) cr3: 00000002479e6000   cr2: 00000000e6d3c090
(XEN) ds: 007b   es: 007b   fs: 00d8   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802bfd48:
(XEN)    ffff830249841eb4 ffff82c480312ec0 000000000000001e 0000001e00000000
(XEN)    0000000000000000 00000000498a5670 ffff830249841d80 ffff830249840080
(XEN)    ffff830249841db4 0000000000000000 ffff8302498a55e0 ffff8302498a5670
(XEN)    ffff8300a24f8000 00000af8373788e3 00000af83736b8ed ffff82c480162ca0
(XEN)    00000af83736b8ed 00000af8373788e3 ffff8300a24f8000 ffff8302498a5670
(XEN)    ffff8302498a55e0 0000000000000000 ffff8302498c6c80 0000000000000009
(XEN)    ffff8302498a5948 ffff82c480313000 0000000000007f40 0000000000000001
(XEN)    0000000000000000 0000000000000000 00000af80db652fd 0000002700000000
(XEN)    ffff82c4801a50a0 000000000000e008 0000000000000246 ffff82c4802bfe78
(XEN)    0000000000000000 ffff8302498a5670 ffff82c4801a6a56 ffffffffffffffff
(XEN)    ffff830249818000 0000000000000000 ffff8300a24f8000 ffff82c480122c11
(XEN)    00000af839021119 0000000000000000 0000000000000000 00000000802bff18
(XEN)    0000025c0000013b ffff82c4802e7580 ffff82c4802bff18 ffff8300a2838000
(XEN)    ffff82c4802f61a0 ffff8300a24f8000 0000000000000002 00000af837304b45
(XEN)    ffff82c48015b67a 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00000000ee8a3f8c 0000000000000001
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00000000ee8a3f74 0000000000000af8
(XEN)    0000000000000001 0000010000000000 00000000c01013a7 0000000000000061
(XEN)    0000000000000246 00000000ee8a3f70 0000000000000069 0000000000000000
(XEN) Xen call trace:
(XEN)       [<ffff82c48016b2b4>] do_IRQ+0x514/0x750
(XEN)     15[<ffff82c480162ca0>] common_interrupt+0x20/0x30
(XEN)     32[<ffff82c4801a50a0>] lapic_timer_nop+0x0/0x10
(XEN)     38[<ffff82c4801a6a56>] acpi_processor_idle+0x376/0x740
(XEN)     43[<ffff82c480122c11>] do_block+0x71/0xd0
(XEN)     56[<ffff82c48015b67a>] idle_loop+0x1a/0x50
(XEN)    
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
(XEN) ****************************************

And the disassembly before the assertion:

ffff82c48016b29f:       48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
ffff82c48016b2a6:       00 
ffff82c48016b2a7:       0f b6 44 11 ff          movzbl -0x1(%rcx,%rdx,1),%eax
ffff82c48016b2ac:       39 c6                   cmp    %eax,%esi
ffff82c48016b2ae:       0f 8f 5c ff ff ff       jg     ffff82c48016b210 <do_IRQ+0x470>
ffff82c48016b2b4:       0f 0b                   ud2
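
A note on reading the disassembly: assuming each pending-EOI entry is 4 bytes, with a 1-bit ready flag and a 23-bit IRQ number packed ahead of an 8-bit vector (a layout assumed here for illustration, not copied from the Xen tree), `lea 0x0(,%rax,4),%rdx` scales the stack index by the entry size and `movzbl -0x1(%rcx,%rdx,1),%eax` loads the last byte of `peoi[sp-1]`, i.e. its vector. The hypothetical `pending_eoi_model` below reproduces that byte-level access in C:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one pending-EOI stack entry (layout assumed). */
struct pending_eoi_model {
    uint32_t ready:1;   /* ready for EOI now? */
    uint32_t irq:23;    /* IRQ number */
    uint8_t  vector;    /* vector awaiting EOI */
};

/* The layout the disassembly implies: 4-byte entries, vector in the last byte. */
static_assert(sizeof(struct pending_eoi_model) == 4, "entry is 4 bytes");
static_assert(offsetof(struct pending_eoi_model, vector) == 3, "vector is byte 3");

/*
 * Byte-level equivalent of the two instructions before the compare:
 *   lea    0x0(,%rax,4),%rdx        ; rdx = sp * sizeof(entry)
 *   movzbl -0x1(%rcx,%rdx,1),%eax   ; eax = byte at peoi + rdx - 1
 * The byte just before peoi[sp] is the vector field of peoi[sp-1].
 */
static uint8_t top_vector_by_bytes(const struct pending_eoi_model *peoi,
                                   unsigned int sp)
{
    const unsigned char *base = (const unsigned char *)peoi;  /* %rcx */
    size_t offset = (size_t)sp * sizeof(*peoi);               /* %rdx */

    assert(sp > 0);  /* only meaningful for a non-empty stack */
    return base[offset - 1];
}
```

With that reading, `cmp %eax,%esi` / `jg` continue only when the incoming vector in %esi is strictly greater than the top-of-stack vector in %eax; here %esi is 0x27 and %eax is 0x2f, so execution falls through to `ud2`.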


Xen has been woken up by an interrupt with vector 0x27, but has vector
0x2f on top of the pending EOI stack for the local APIC.
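
The check that fired encodes an ordering invariant: vectors are pushed in strictly increasing order, so the top entry is always the highest-priority vector still awaiting EOI. A simplified model of the push (the names `peoi_model`, `peoi_model_push` and `PEOI_MODEL_DEPTH` are hypothetical, and the real code asserts rather than returning false):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PEOI_MODEL_DEPTH 16   /* hypothetical stack depth */

struct peoi_model {
    uint8_t vector[PEOI_MODEL_DEPTH];
    unsigned int sp;          /* number of entries on the stack */
};

/*
 * Push a vector awaiting EOI.  Returns false (where the real code would
 * assert) if the new vector is not strictly greater than the vector
 * currently on top of the stack.
 */
static bool peoi_model_push(struct peoi_model *s, uint8_t vector)
{
    if (s->sp != 0 && !(s->vector[s->sp - 1] < vector))
        return false;                 /* same condition as the ASSERT */
    assert(s->sp < PEOI_MODEL_DEPTH);
    s->vector[s->sp++] = vector;
    return true;
}
```

Pushing 0x2f and then 0x27 makes the second `peoi_model_push()` return false, which corresponds to the assertion failure reported above.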

I have put in more debugging to dump the LAPIC state of the two
vectors involved and the IOAPIC state, but I have no idea if or when the
problem might recur.

My understanding of LAPIC priority leads me to think that Xen really
shouldn't be woken up by a lower priority vector while a higher priority
one is still un-EOI'd.  There is not yet sufficient information to tell
whether this is truly the case, or whether Xen has simply become confused
about which vectors it has EOI'd.
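
The architectural rule behind this expectation: the LAPIC groups vectors into 16 priority classes (vector bits 7:4) and delivers a fixed interrupt only if its class is strictly higher than the processor-priority class, i.e. the higher of the TPR class and the class of the highest in-service vector. A minimal sketch with illustrative helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector: bits 7:4. */
static unsigned int prio_class(uint8_t vector)
{
    return vector >> 4;
}

/*
 * Would the LAPIC deliver fixed interrupt `vector`, given the task priority
 * `tpr` and the highest in-service vector `isrv` (0 if ISR is empty)?
 * PPR class = max(TPR class, ISRV class); delivery needs a strictly higher class.
 */
static bool lapic_would_deliver(uint8_t tpr, uint8_t isrv, uint8_t vector)
{
    unsigned int ppr_class = prio_class(tpr) > prio_class(isrv) ?
                             prio_class(tpr) : prio_class(isrv);

    assert(vector >= 16);   /* vectors 0-15 are illegal for LAPIC delivery */
    return prio_class(vector) > ppr_class;
}
```

With TPR at 0 and 0x2f in service, `lapic_would_deliver(0, 0x2f, 0x27)` is false, since both vectors are in class 2; if the LAPIC state really matched Xen's stack, 0x27 should have stayed pending in IRR.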

Having said that, we do keep line-level interrupts un-EOI'd for extended
periods while guests service them.  Given that vectors are chosen at
random, we could get into a situation where a line interrupt has vector
0xdf and stays pending for 150ms (which I measured as a not uncommon
mean time to EOI for line-level interrupts).  This would starve every
other guest interrupt in the same or a lower priority class for an
extended period.
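
Applying the same rule to the starvation case: while vector 0xdf (class 0xd) is in service, every vector of class 0xd or below stays pending. The sketch below counts the blocked vectors in a range; treating 0x20-0xdf as the range guest interrupts are allocated from is an assumption for illustration, since the exact dynamic vector range depends on the Xen version.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count vectors in [lo, hi] that would stay pending in IRR while `isrv` is
 * in service (TPR assumed 0): those whose priority class (bits 7:4) is not
 * strictly above the class of `isrv`.
 */
static unsigned int count_blocked(uint8_t isrv, unsigned int lo, unsigned int hi)
{
    unsigned int blocked = 0;

    assert(lo <= hi && hi <= 0xff);
    for (unsigned int v = lo; v <= hi; v++)
        if ((v >> 4) <= (unsigned int)(isrv >> 4))
            blocked++;
    return blocked;
}
```

`count_blocked(0xdf, 0x20, 0xdf)` returns 192, the whole assumed range, whereas an in-service 0x2f only holds back the 16 vectors of class 2.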

Given directed EOI support in the past few generations of processors, the
requirement for the pending EOI stack has, as far as I am aware,
disappeared.  Would it be a sensible idea in general to use the pending
EOI stack only when directed EOI is not available or not in use?

~Andrew

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-05-31 20:32 cpuidle and un-eoid interrupts at the local apic Andrew Cooper
@ 2013-06-03 14:30 ` Jan Beulich
  2013-07-31  8:30 ` Thimo E.
  1 sibling, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2013-06-03 14:30 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Xen-devel List

>>> On 31.05.13 at 22:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Xen has been woken up by an interrupt of vector 0x27, but has a vector
> 0x2f on the top of the pending EOI stack for the local APIC.
> 
> I have put in more debugging to dump the LAPIC state of the two
> interesting vectors and the IOAPIC state, but I have no idea if/when the
> problem might recur.
> 
> My understanding of LAPIC priority leads me to think that Xen really
> shouldn't be woken up by a lower priority vector if a higher priority
> one is still un-eoi'd.  There is not yet sufficient information to tell
> whether this is truly the case, or that Xen has simply gotten confused
> about which vectors it eoi'd.

Considering that this was on a Haswell, and has so far not been reported
by anyone else, I wonder whether it's related to some effect of (or flaw
in) APIC virtualization. But of course, without knowing the state of the
LAPIC, that's hard to tell for sure, all the more so since a stray
ack_APIC_irq() could lead to the same effect, and since EDX (holding
"sp") has a value of 4: that is quite a few lower priority vectors
awaiting an EOI, considering that vector group 2x is the lowest possible
one (i.e. the other entries on the stack ought to have even larger
vector numbers).
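
The stray-EOI hypothesis can be illustrated with a toy model of the In-Service Register: a write to the LAPIC EOI register clears the highest-priority bit set in ISR, regardless of which interrupt software believes it is acknowledging. The `isr_model_*` names are hypothetical. One unmatched EOI removes 0x2f from ISR, after which a pending 0x27 becomes deliverable even though Xen's stack still has 0x2f on top:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the 256-bit In-Service Register as eight 32-bit words. */
struct isr_model {
    uint32_t word[8];
};

static void isr_model_set(struct isr_model *isr, uint8_t vector)
{
    isr->word[vector / 32] |= UINT32_C(1) << (vector % 32);
}

/* Highest vector set in ISR, or -1 if ISR is empty. */
static int isr_model_highest(const struct isr_model *isr)
{
    for (int v = 255; v >= 0; v--)
        if (isr->word[v / 32] & (UINT32_C(1) << (v % 32)))
            return v;
    return -1;
}

/* An EOI write clears the highest-priority in-service bit, nothing else. */
static void isr_model_eoi(struct isr_model *isr)
{
    int v = isr_model_highest(isr);

    if (v >= 0)
        isr->word[v / 32] &= ~(UINT32_C(1) << (v % 32));
}
```

For example, with 0x2f and 0x60 both in service, a single EOI leaves 0x2f as the highest in-service vector; a second, unmatched EOI empties ISR entirely.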

> Having said that, we do keep line level interrupts un-eoi'd for extended
> periods while guests service the interrupt.  Given that vectors are
> chosen at random, we could get into a situation where a line interrupt
> has a vector 0xdf and stays pending for 150ms (which I measured as a
> not-overly-uncommon mean-time-till-eoi for line level interrupt).  This
> would starve any other guest interrupts for an extended period.
> 
> Given directed-eoi support in the past few generations of processor, the
> requirement for the pending EOI stack has disappeared as far as I am
> aware.  Would it be a sensible idea in general to make use of the pending
> eoi stack conditional on not having/using directed EOI support?

We don't use ACKTYPE_EOI in that case: setup_IO_APIC() only sets
ioapic_level_type.ack to irq_complete_move (consumed by
pirq_acktype()) when ioapic_ack_new, and directed EOI implies
!ioapic_ack_new (see verify_local_APIC()). The only other case of
using ACKTYPE_EOI is for non-maskable MSIs.
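
The conditions above for ending up with ACKTYPE_EOI can be condensed into a small decision function. This is a simplified model and not Xen's `pirq_acktype()`: the `model_*` names and boolean inputs are hypothetical, edge-triggered sources are simply treated as needing no final acknowledgement, and the non-EOI result for level-triggered IO-APIC interrupts is shown as an unmask-style acknowledgement, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the acknowledgement types discussed above. */
enum model_acktype { MODEL_ACK_NONE, MODEL_ACK_UNMASK, MODEL_ACK_EOI };

struct model_irq {
    bool is_msi;            /* MSI source? */
    bool msi_maskable;      /* only meaningful when is_msi */
    bool ioapic_level;      /* level-triggered IO-APIC source? */
};

/* Directed EOI implies the "new" IO-APIC ack method is not used. */
static bool model_ioapic_ack_new(bool directed_eoi, bool ack_new_requested)
{
    return ack_new_requested && !directed_eoi;
}

/* Only ACK_EOI results would put an entry on the pending-EOI stack. */
static enum model_acktype model_acktype(const struct model_irq *irq,
                                        bool ioapic_ack_new)
{
    assert(irq->is_msi || !irq->msi_maskable);
    if (irq->is_msi)
        return irq->msi_maskable ? MODEL_ACK_NONE : MODEL_ACK_EOI;
    if (irq->ioapic_level)
        return ioapic_ack_new ? MODEL_ACK_EOI : MODEL_ACK_UNMASK;
    return MODEL_ACK_NONE;  /* edge-triggered: no final ack in this model */
}
```

With directed EOI enabled the level-triggered case never yields `MODEL_ACK_EOI`, so only non-maskable MSIs would still reach the pending-EOI stack.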

Jan

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-05-31 20:32 cpuidle and un-eoid interrupts at the local apic Andrew Cooper
  2013-06-03 14:30 ` Jan Beulich
@ 2013-07-31  8:30 ` Thimo E.
  2013-07-31  9:47   ` Andrew Cooper
  1 sibling, 1 reply; 63+ messages in thread
From: Thimo E. @ 2013-07-31  8:30 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, Xen-devel List

Hello all,

I also have a Haswell system. I am running XenServer 6.2 (with Xen
4.1.5) on it and am experiencing the same issue. Do you already have a
solution for this problem?

Best regards
   Thimo

(XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
(XEN) ----[ Xen-4.1.5.debug  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
(XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
(XEN) rax: 0000000000000001   rbx: ffff83081f080f00   rcx: ffff83081f05b340
(XEN) rdx: 0000000000000001   rsi: 000000000000002b   rdi: 0000000000000001
(XEN) rbp: ffff83081f057d88   rsp: ffff83081f057d18   r8:  ffff83081f05b63c
(XEN) r9:  000070044fb97100   r10: ffff8300b858c060   r11: 000020f3f5a4dea5
(XEN) r12: 000000000000002b   r13: ffff83081f004e80   r14: 000000000000001d
(XEN) r15: 0000000000000002   cr0: 000000008005003b   cr4: 00000000001026f0
(XEN) cr3: 000000045915f000   cr2: 0000000000150008
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83081f057d18:
(XEN)    000000000000001d 000000000000001d ffff83081f080f00 0000000000000000
(XEN)    00000000ffffffea ffff83081f080f00 0000000000000000 0000000000000000
(XEN)    ffffffffffffffff ffff83081f057f18 ffff83081f06bb00 ffff83081f06bb90
(XEN)    ffff8300b858c000 0000000000000002 00007cf7e0fa8247 ffff82c480161a66
(XEN)    0000000000000002 ffff8300b858c000 ffff83081f06bb90 ffff83081f06bb00
(XEN)    ffff83081f057ef0 ffff83081f057f18 000020f3f5a4dea5 ffff8300b858c060
(XEN)    000070044fb97100 ffff83081f05bb80 0000000000007f40 0000000000000001
(XEN)    0000000000000000 000020f3c755a972 ffff83081f06bb90 0000002b00000000
(XEN)    ffff82c4801a21f0 000000000000e008 0000000000000246 ffff83081f057e48
(XEN)    000000000000e010 ffff83081f057ef0 ffff82c4801a3dc4 000020f3f595c09c
(XEN)    000020f3f596987e ffff8306383e3010 ffff83081f05b100 ffffffffffffffff
(XEN)    0000000000000001 0000000000000001 ffffffffffffffff ffff83081f057f18
(XEN)    00000000802d4680 0000000000000000 0000000000000000 ffff82c4802d4680
(XEN)    000002a80000024b ffff8300b8586000 ffff83081f057f18 ffff8300b8586000
(XEN)    ffff8300b858c000 ffff8300b858c000 0000000000000002 ffff83081f057f10
(XEN)    ffff82c48015a261 ffff82c480126ccd 0000000000000001 ffff83081f057d18
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000246 ffff88001a8093a0
(XEN)    0000000100885e0f 000000000000000f 0000000000000000 ffffffff802063aa
(XEN)    0000000000000001 00000000deadbeef 00000000deadbeef 0000010000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
(XEN)    [<ffff82c480161a66>] common_interrupt+0x26/0x30
(XEN)    [<ffff82c4801a21f0>] lapic_timer_nop+0x0/0x6
(XEN)    [<ffff82c48015a261>] idle_loop+0x48/0x59
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-07-31  8:30 ` Thimo E.
@ 2013-07-31  9:47   ` Andrew Cooper
  2013-08-02 22:50     ` Thimo E.
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-07-31  9:47 UTC (permalink / raw)
  To: Thimo E.; +Cc: Keir Fraser, Jan Beulich, Xen-devel List

[-- Attachment #1: Type: text/plain, Size: 9270 bytes --]

On 31/07/13 09:30, Thimo E. wrote:
> Hello all,
>
> I have also a Haswell system. I am running XenServer 6.2 (with Xen
> 4.1.5) on it and I am experiencing the same issue. Do you already have
> a solution for this problem ?
>
> Best regards
>   Thimo

Hi,

We are still none the wiser on this issue.  I have a debugging patch to
get more information, but the problem hasn't recurred since.  So far I
have seen two crashes on Xen 4.1 and one on Xen 4.2.

For the benefit of anyone else who runs into this issue in the meantime,
the patch (against Xen-4.3) is attached.

Thimo: I shall put a new version of the XenServer 6.2 Xen with the
debugging patch on the forum thread.

~Andrew

[-- Attachment #2: ca-107844-debug.patch --]
[-- Type: text/x-patch, Size: 2919 bytes --]

# HG changeset patch
# Parent 3bc8894f281f3ee68406a565beb2f811c67c6b5e

diff -r 3bc8894f281f xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1100,7 +1100,7 @@ static inline void UNEXPECTED_IO_APIC(vo
 {
 }
 
-static void /*__init*/ __print_IO_APIC(void)
+void /*__init*/ __print_IO_APIC(void)
 {
     int apic, i;
     union IO_APIC_reg_00 reg_00;
diff -r 3bc8894f281f xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1115,6 +1115,8 @@ static void irq_guest_eoi_timer_fn(void 
     spin_unlock_irqrestore(&desc->lock, flags);
 }
 
+static void dump_irqs(unsigned char key);
+void __print_IO_APIC(void);
 static void __do_IRQ_guest(int irq)
 {
     struct irq_desc         *desc = irq_to_desc(irq);
@@ -1137,7 +1139,36 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
-        ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
+        if ( unlikely( !((sp == 0) || (peoi[sp-1].vector < vector)) ))
+        {
+            printk("**Pending EOI error\n");
+            printk("  irq %d, vector 0x%x\n", irq, vector);
+
+            for ( i = sp-1; i >= 0; --i )
+            {
+                printk("  s[%d] irq %d, vec 0x%x, ready %u, "
+                       "ISR %u, TMR %u, IRR %u\n",
+                       i, peoi[i].irq, peoi[i].vector, peoi[i].ready,
+                       apic_isr_read(peoi[i].vector),
+                       apic_tmr_read(peoi[i].vector),
+                       apic_irr_read(peoi[i].vector) );
+            }
+
+            printk("All LAPIC state:\n");
+            printk("[vector] %8s %8s %8s\n", "ISR", "TMR", "IRR");
+            for ( i = 0; i < APIC_ISR_NR; ++i )
+                printk("[%02x:%02x] %08"PRIx32" %08"PRIx32" %08"PRIx32"\n",
+                       (i * 32)+31, i*32,
+                       apic_read(APIC_ISR + i*0x10),
+                       apic_read(APIC_TMR + i*0x10),
+                       apic_read(APIC_IRR + i*0x10) );
+
+            spin_unlock(&desc->lock);
+            dump_irqs('i');
+            __print_IO_APIC();
+
+            panic("CA-107844");
+        }
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
         peoi[sp].irq = irq;
         peoi[sp].vector = vector;
diff -r 3bc8894f281f xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -152,6 +152,18 @@ static __inline bool_t apic_isr_read(u8 
             (vector & 0x1f)) & 1;
 }
 
+static __inline bool_t apic_tmr_read(u8 vector)
+{
+    return (apic_read(APIC_TMR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
+static __inline bool_t apic_irr_read(u8 vector)
+{
+    return (apic_read(APIC_IRR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
 static __inline u32 get_apic_id(void) /* Get the physical APIC id */
 {
     u32 id = apic_read(APIC_ID);

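The new `apic_tmr_read()` and `apic_irr_read()` helpers reuse the indexing of `apic_isr_read()`: each 256-bit LAPIC bitmap is split across eight 32-bit registers spaced 0x10 bytes apart, so `(vector & ~0x1f) >> 1` equals `(vector / 32) * 0x10` and `vector & 0x1f` selects the bit within that register. A standalone check of that arithmetic, with the register base left out (the function names are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the 32-bit register holding `vector`, relative to the
 * start of the ISR/TMR/IRR block, computed as in the patch. */
static unsigned int bitmap_reg_offset(uint8_t vector)
{
    return (vector & ~0x1fu) >> 1;
}

/* Bit position of `vector` inside that register. */
static unsigned int bitmap_bit(uint8_t vector)
{
    return vector & 0x1f;
}
```

For the two vectors in the first report, 0x27 and 0x2f both live in the register at offset 0x10 (bits 7 and 15 respectively).
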
[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-07-31  9:47   ` Andrew Cooper
@ 2013-08-02 22:50     ` Thimo E.
  2013-08-02 23:32       ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Thimo E. @ 2013-08-02 22:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, Xen-devel List


[-- Attachment #1.1: Type: text/plain, Size: 9961 bytes --]

Hi,

I've already posted it in the forum thread, but to keep all of you up to
date on this issue I am copying the log file into this thread, too:

XenServer crashed again; attached you'll find the output with the verbose
messages Andrew inserted into the code.

Best regards
   Thimo


On 31.07.2013 11:47, Andrew Cooper wrote:
> On 31/07/13 09:30, Thimo E. wrote:
>> Hello all,
>>
>> I have also a Haswell system. I am running XenServer 6.2 (with Xen
>> 4.1.5) on it and I am experiencing the same issue. Do you already have
>> a solution for this problem ?
>>
>> Best regards
>>    Thimo
> Hi,
>
> We are still none the wiser on this issue.  I have a debugging patch to
> get more information, but the problem hasn't reoccurred since.  This is
> now 2 crashes on Xen 4.1 and a single crash on Xen 4.2 that I have seen.
>
> For the benefit of anyone else who runs over this issue in the meantime,
> the patch (against Xen-4.3) is attached.
>
> Thimo: I shall put a new version of the XenServer 6.2 Xen with the
> debugging patch on the forum thread.
>
> ~Andrew
>
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
>> (XEN) ----[ Xen-4.1.5.debug  x86_64  debug=y  Not tainted ]----
>> (XEN) CPU:    1
>> (XEN) RIP:    e008:[<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
>> (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
>> (XEN) rax: 0000000000000001   rbx: ffff83081f080f00   rcx: ffff83081f05b340
>> (XEN) rdx: 0000000000000001   rsi: 000000000000002b   rdi: 0000000000000001
>> (XEN) rbp: ffff83081f057d88   rsp: ffff83081f057d18   r8:  ffff83081f05b63c
>> (XEN) r9:  000070044fb97100   r10: ffff8300b858c060   r11: 000020f3f5a4dea5
>> (XEN) r12: 000000000000002b   r13: ffff83081f004e80   r14: 000000000000001d
>> (XEN) r15: 0000000000000002   cr0: 000000008005003b   cr4: 00000000001026f0
>> (XEN) cr3: 000000045915f000   cr2: 0000000000150008
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen stack trace from rsp=ffff83081f057d18:
>> (XEN)    000000000000001d 000000000000001d ffff83081f080f00 0000000000000000
>> (XEN)    00000000ffffffea ffff83081f080f00 0000000000000000 0000000000000000
>> (XEN)    ffffffffffffffff ffff83081f057f18 ffff83081f06bb00 ffff83081f06bb90
>> (XEN)    ffff8300b858c000 0000000000000002 00007cf7e0fa8247 ffff82c480161a66
>> (XEN)    0000000000000002 ffff8300b858c000 ffff83081f06bb90 ffff83081f06bb00
>> (XEN)    ffff83081f057ef0 ffff83081f057f18 000020f3f5a4dea5 ffff8300b858c060
>> (XEN)    000070044fb97100 ffff83081f05bb80 0000000000007f40 0000000000000001
>> (XEN)    0000000000000000 000020f3c755a972 ffff83081f06bb90 0000002b00000000
>> (XEN)    ffff82c4801a21f0 000000000000e008 0000000000000246 ffff83081f057e48
>> (XEN)    000000000000e010 ffff83081f057ef0 ffff82c4801a3dc4 000020f3f595c09c
>> (XEN)    000020f3f596987e ffff8306383e3010 ffff83081f05b100 ffffffffffffffff
>> (XEN)    0000000000000001 0000000000000001 ffffffffffffffff ffff83081f057f18
>> (XEN)    00000000802d4680 0000000000000000 0000000000000000 ffff82c4802d4680
>> (XEN)    000002a80000024b ffff8300b8586000 ffff83081f057f18 ffff8300b8586000
>> (XEN)    ffff8300b858c000 ffff8300b858c000 0000000000000002 ffff83081f057f10
>> (XEN)    ffff82c48015a261 ffff82c480126ccd 0000000000000001 ffff83081f057d18
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 0000000000000246 ffff88001a8093a0
>> (XEN)    0000000100885e0f 000000000000000f 0000000000000000 ffffffff802063aa
>> (XEN)    0000000000000001 00000000deadbeef 00000000deadbeef 0000010000000000
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
>> (XEN)    [<ffff82c480161a66>] common_interrupt+0x26/0x30
>> (XEN)    [<ffff82c4801a21f0>] lapic_timer_nop+0x0/0x6
>> (XEN)    [<ffff82c48015a261>] idle_loop+0x48/0x59
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 1:
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>> Am 31.05.2013 22:32, schrieb Andrew Cooper:
>>> Recently our automated testing system has caught a curious assertion
>>> while testing Xen 4.1.5 on a HaswellDT system.
>>>
>>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
>>> (XEN) ----[ Xen-4.1.5  x86_64  debug=n  Not tainted ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82c48016b2b4>] do_IRQ+0x514/0x750
>>> (XEN) RFLAGS: 0000000000010093   CONTEXT: hypervisor
>>> (XEN) rax: 000000000000002f   rbx: ffff830249841e80   rcx: ffff82c4803127c0
>>> (XEN) rdx: 0000000000000004   rsi: 0000000000000027   rdi: 0000000000000001
>>> (XEN) rbp: 0000000000001e00   rsp: ffff82c4802bfd48   r8:  ffff82c480312abc
>>> (XEN) r9:  ffff8302498a5948   r10: 0000000000000009   r11: ffff8302498c6c80
>>> (XEN) r12: ffff830243b07f50   r13: ffff8300a24f8000   r14: 00000af8373788e3
>>> (XEN) r15: ffff830249841e80   cr0: 000000008005003b   cr4: 00000000001026f0
>>> (XEN) cr3: 00000002479e6000   cr2: 00000000e6d3c090
>>> (XEN) ds: 007b   es: 007b   fs: 00d8   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen stack trace from rsp=ffff82c4802bfd48:
>>> (XEN)    ffff830249841eb4 ffff82c480312ec0 000000000000001e 0000001e00000000
>>> (XEN)    0000000000000000 00000000498a5670 ffff830249841d80 ffff830249840080
>>> (XEN)    ffff830249841db4 0000000000000000 ffff8302498a55e0 ffff8302498a5670
>>> (XEN)    ffff8300a24f8000 00000af8373788e3 00000af83736b8ed ffff82c480162ca0
>>> (XEN)    00000af83736b8ed 00000af8373788e3 ffff8300a24f8000 ffff8302498a5670
>>> (XEN)    ffff8302498a55e0 0000000000000000 ffff8302498c6c80 0000000000000009
>>> (XEN)    ffff8302498a5948 ffff82c480313000 0000000000007f40 0000000000000001
>>> (XEN)    0000000000000000 0000000000000000 00000af80db652fd 0000002700000000
>>> (XEN)    ffff82c4801a50a0 000000000000e008 0000000000000246 ffff82c4802bfe78
>>> (XEN)    0000000000000000 ffff8302498a5670 ffff82c4801a6a56 ffffffffffffffff
>>> (XEN)    ffff830249818000 0000000000000000 ffff8300a24f8000 ffff82c480122c11
>>> (XEN)    00000af839021119 0000000000000000 0000000000000000 00000000802bff18
>>> (XEN)    0000025c0000013b ffff82c4802e7580 ffff82c4802bff18 ffff8300a2838000
>>> (XEN)    ffff82c4802f61a0 ffff8300a24f8000 0000000000000002 00000af837304b45
>>> (XEN)    ffff82c48015b67a 0000000000000000 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 00000000ee8a3f8c 0000000000000001
>>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 00000000ee8a3f74 0000000000000af8
>>> (XEN)    0000000000000001 0000010000000000 00000000c01013a7 0000000000000061
>>> (XEN)    0000000000000246 00000000ee8a3f70 0000000000000069 0000000000000000
>>> (XEN) Xen call trace:
>>> (XEN)       [<ffff82c48016b2b4>] do_IRQ+0x514/0x750
>>> (XEN)     15[<ffff82c480162ca0>] common_interrupt+0x20/0x30
>>> (XEN)     32[<ffff82c4801a50a0>] lapic_timer_nop+0x0/0x10
>>> (XEN)     38[<ffff82c4801a6a56>] acpi_processor_idle+0x376/0x740
>>> (XEN)     43[<ffff82c480122c11>] do_block+0x71/0xd0
>>> (XEN)     56[<ffff82c48015b67a>] idle_loop+0x1a/0x50
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
>>> (XEN) ****************************************
>>>
>>> And the disassembly before the assertion:
>>>
>>> ffff82c48016b29f:       48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
>>> ffff82c48016b2a6:       00
>>> ffff82c48016b2a7:       0f b6 44 11 ff          movzbl -0x1(%rcx,%rdx,1),%eax
>>> ffff82c48016b2ac:       39 c6                   cmp    %eax,%esi
>>> ffff82c48016b2ae:       0f 8f 5c ff ff ff       jg     ffff82c48016b210 <do_IRQ+0x470>
>>> ffff82c48016b2b4:       0f 0b                   ud2
>>> ffff82c48016b2b4:       0f 0b                   ud2
>>>
>>>
>>> Xen has been woken up by an interrupt of vector 0x27, but has a vector
>>> 0x2f on the top of the pending EOI stack for the local APIC.
>>>
>>> I have put in more debugging to dump the LAPIC state of the two
>>> interesting vectors and the IOAPIC state, but I have no idea if/when the
>>> problem might reoccur.
>>>
>>> My understanding of LAPIC priority leads me to think that Xen really
>>> shouldn't be woken up by a lower priority vector if a higher priority
>>> one is still un-eoi'd.  There is not yet sufficient information to tell
>>> whether this is truly the case, or whether Xen has simply gotten confused
>>> about which vectors it eoi'd.
>>>
>>> Having said that, we do keep line level interrupts un-eoi'd for extended
>>> periods while guests service the interrupt.  Given that vectors are
>>> chosen at random, we could get into a situation where a line interrupt
>>> has a vector 0xdf and stays pending for 150ms (which I measured as a
>>> not-overly-uncommon mean-time-till-eoi for line level interrupt).  This
>>> would starve any other guest interrupts for an extended period.
>>>
>>> Given directed-EOI support in the past few generations of processors, the
>>> requirement for the pending EOI stack has disappeared as far as I am
>>> aware.  Would it be a sensible idea in general to use the pending EOI
>>> stack only when directed EOI is unsupported or not in use?
>>>
>>> ~Andrew
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xen.org
>>> http://lists.xen.org/xen-devel
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


[-- Attachment #1.2: Type: text/html, Size: 10484 bytes --]

[-- Attachment #2: 20130803-crash.log --]
[-- Type: text/plain, Size: 9521 bytes --]

(XEN) HVM4: 182520].
(XEN) HVM4: [bind=FFFFFA8002A98240] Binding reference count-- = 1.
(XEN) HVM4: [FFFFFA8002155170] WskKnrCompleteRequest: complete irp with IO status = 000000
(XEN) HVM4: 00.
(XEN) HVM4: [addr=FFFFF8A007182520] WskProAPIFreeAddressInfo freed addrinfo.
(XEN) **Pending EOI error
(XEN)   irq 29, vector 0x2e
(XEN)   s[0] irq 29, vec 0x2e, ready 0, ISR 1, TMR 0, IRR 0
(XEN) All LAPIC state:
(XEN) [vector]      ISR      TMR      IRR
(XEN) [1f:02x] 00000000 00000000 00000000
(XEN) [3f:202x] 00016384 4095716568 00000000
(XEN) [5f:402x] 00000000 4041382474 00000000
(XEN) [7f:602x] 00000000 3967325758 00000000
(XEN) [9f:802x] 00000000 2123395250 00000000
(XEN) [bf:a02x] 00000000 1502837374 00000000
(XEN) [df:c02x] 00000000 4270415335 00000000
(XEN) [ff:e02x] 00000000 00000000 00000000
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:1 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:1 vec:38 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  1(----),
(XEN)    IRQ:   2 affinity:f vec:00 type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:1 vec:40 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:1 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   5 affinity:1 vec:50 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  5(----),
(XEN)    IRQ:   6 affinity:1 vec:58 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:1 vec:60 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   8 affinity:1 vec:68 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(----),
(XEN)    IRQ:   9 affinity:1 vec:70 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0:  9(----),
(XEN)    IRQ:  10 affinity:1 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:1 vec:88 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:1 vec:90 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  13 affinity:1 vec:98 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:1 vec:a0 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:1 vec:a8 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:4 vec:a4 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 16(----),
(XEN)    IRQ:  18 affinity:2 vec:36 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 18(----),
(XEN)    IRQ:  19 affinity:1 vec:39 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:1 vec:29 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  22 affinity:2 vec:3e type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 22(----),
(XEN)    IRQ:  23 affinity:1 vec:d8 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 23(----),
(XEN)    IRQ:  24 affinity:1 vec:28 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:1 vec:30 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:f vec:c0 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  27 affinity:f vec:c8 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  28 affinity:f vec:d0 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  29 affinity:4 vec:2e type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:276(----),
(XEN)    IRQ:  30 affinity:4 vec:d5 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:275(----),
(XEN)    IRQ:  31 affinity:2 vec:dc type=PCI-MSI         status=00000054 in-flight=0 domain-list=0:274(----),
(XEN)    IRQ:  32 affinity:4 vec:44 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(----),
(XEN)    IRQ:  33 affinity:8 vec:a7 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:272(----),
(XEN)    IRQ:  34 affinity:1 vec:59 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:271(----),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  1 Vec 56:
(XEN)       Apic 0x00, Pin  1: vec=38 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  4 Vec 72:
(XEN)       Apic 0x00, Pin  4: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  5 Vec 80:
(XEN)       Apic 0x00, Pin  5: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  6 Vec 88:
(XEN)       Apic 0x00, Pin  6: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  7 Vec 96:
(XEN)       Apic 0x00, Pin  7: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  8 Vec104:
(XEN)       Apic 0x00, Pin  8: vec=68 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  9 Vec112:
(XEN)       Apic 0x00, Pin  9: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 10 Vec120:
(XEN)       Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 11 Vec136:
(XEN)       Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 12 Vec144:
(XEN)       Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 13 Vec152:
(XEN)       Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 14 Vec160:
(XEN)       Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 15 Vec168:
(XEN)       Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 16 Vec164:
(XEN)       Apic 0x00, Pin 16: vec=a4 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 18 Vec 54:
(XEN)       Apic 0x00, Pin 18: vec=36 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 19 Vec 57:
(XEN)       Apic 0x00, Pin 19: vec=39 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 20 Vec 41:
(XEN)       Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 22 Vec 62:
(XEN)       Apic 0x00, Pin 22: vec=3e delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 23 Vec216:
(XEN)       Apic 0x00, Pin 23: vec=d8 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) .......    : physical APIC id: 02
(XEN) .......    : Delivery Type: 0
(XEN) .......    : LTS          : 0
(XEN) .... register #01: 00170020
(XEN) .......     : max redirection entries: 0017
(XEN) .......     : PRQ implemented: 0
(XEN) .......     : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN)  00 000 00  1    0    0   0   0    0    0    00
(XEN)  01 000 00  0    0    0   0   0    1    1    38
(XEN)  02 000 00  0    0    0   0   0    1    1    F0
(XEN)  03 000 00  0    0    0   0   0    1    1    40
(XEN)  04 000 00  0    0    0   0   0    1    1    48
(XEN)  05 000 00  0    0    0   0   0    1    1    50
(XEN)  06 000 00  0    0    0   0   0    1    1    58
(XEN)  07 000 00  0    0    0   0   0    1    1    60
(XEN)  08 000 00  0    0    0   0   0    1    1    68
(XEN)  09 000 00  0    1    0   0   0    1    1    70
(XEN)  0a 000 00  0    0    0   0   0    1    1    78
(XEN)  0b 000 00  0    0    0   0   0    1    1    88
(XEN)  0c 000 00  0    0    0   0   0    1    1    90
(XEN)  0d 000 00  0    0    0   0   0    1    1    98
(XEN)  0e 000 00  0    0    0   0   0    1    1    A0
(XEN)  0f 000 00  0    0    0   0   0    1    1    A8
(XEN)  10 000 00  0    1    0   1   0    1    1    A4
(XEN)  11 000 00  1    0    0   0   0    0    0    00
(XEN)  12 000 00  0    1    0   1   0    1    1    36
(XEN)  13 000 00  1    1    0   1   0    1    1    39
(XEN)  14 000 00  1    1    0   1   0    1    1    29
(XEN)  15 07A 0A  1    0    0   0   0    0    2    B4
(XEN)  16 000 00  0    1    0   1   0    1    1    3E
(XEN)  17 000 00  0    1    0   1   0    1    1    D8
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ56 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ72 -> 0:4
(XEN) IRQ80 -> 0:5
(XEN) IRQ88 -> 0:6
(XEN) IRQ96 -> 0:7
(XEN) IRQ104 -> 0:8
(XEN) IRQ112 -> 0:9
(XEN) IRQ120 -> 0:10
(XEN) IRQ136 -> 0:11
(XEN) IRQ144 -> 0:12
(XEN) IRQ152 -> 0:13
(XEN) IRQ160 -> 0:14
(XEN) IRQ168 -> 0:15
(XEN) IRQ164 -> 0:16
(XEN) IRQ54 -> 0:18
(XEN) IRQ57 -> 0:19
(XEN) IRQ41 -> 0:20
(XEN) IRQ62 -> 0:22
(XEN) IRQ216 -> 0:23
(XEN) .................................... done.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) CA-107844****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing crash image



* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-02 22:50     ` Thimo E.
@ 2013-08-02 23:32       ` Andrew Cooper
  2013-08-05 12:45         ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-08-02 23:32 UTC (permalink / raw)
  To: Thimo E.; +Cc: Keir Fraser, Jan Beulich, Xen-devel List


[-- Attachment #1.1: Type: text/plain, Size: 1364 bytes --]

On 02/08/2013 23:50, Thimo E. wrote:
> Hi,
>
> I've already posted it in the forum thread, but to keep all of you up
> to date on this issue I am copying the log file into this thread too:
>
> XenServer crashed again; attached you'll find the output with the
> verbose messages Andrew inserted into the code.
>
> Best regards
>   Thimo

So I can see that I did screw up the debugging patch a tad, but the
information is still salvageable.

Adjusted from my "interesting" idea of printk formatting,

(XEN) **Pending EOI error
(XEN)   irq 29, vector 0x2e
(XEN)   s[0] irq 29, vec 0x2e, ready 0, ISR 1, TMR 0, IRR 0
(XEN) All LAPIC state:
(XEN) [vector]      ISR      TMR      IRR
(XEN) [1f:01] 00000000 00000000 00000000
(XEN) [3f:20] 00016384 4095716568 00000000
(XEN) [5f:40] 00000000 4041382474 00000000
(XEN) [7f:60] 00000000 3967325758 00000000
(XEN) [9f:80] 00000000 2123395250 00000000
(XEN) [bf:a0] 00000000 1502837374 00000000
(XEN) [df:c0] 00000000 4270415335 00000000
(XEN) [ff:e0] 00000000 00000000 00000000

So Xen has been interrupted by an interrupt which it believes it has
already seen, and is outstanding on the PendingEOI stack, waiting for
Dom0 to actually deal with.

The In Service Register indicates (given the hex/dec snafu) that only
vector 0x2e is in service.

I will update my debugging patch with some extra information tomorrow.

~Andrew

[-- Attachment #1.2: Type: text/html, Size: 2496 bytes --]



* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-02 23:32       ` Andrew Cooper
@ 2013-08-05 12:45         ` Jan Beulich
  2013-08-05 14:51           ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2013-08-05 12:45 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Thimo E., Keir Fraser, Xen-develList

>>> On 03.08.13 at 01:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Adjusted from my "interesting" idea of printk formatting,
> 
> (XEN) **Pending EOI error
> (XEN)   irq 29, vector 0x2e
> (XEN)   s[0] irq 29, vec 0x2e, ready 0, ISR 1, TMR 0, IRR 0
> (XEN) All LAPIC state:
> (XEN) [vector]      ISR      TMR      IRR
> (XEN) [1f:01] 00000000 00000000 00000000
> (XEN) [3f:20] 00016384 4095716568 00000000
> (XEN) [5f:40] 00000000 4041382474 00000000
> (XEN) [7f:60] 00000000 3967325758 00000000
> (XEN) [9f:80] 00000000 2123395250 00000000
> (XEN) [bf:a0] 00000000 1502837374 00000000
> (XEN) [df:c0] 00000000 4270415335 00000000
> (XEN) [ff:e0] 00000000 00000000 00000000
> 
> So Xen has been interrupted by an interrupt which it believes it has
> already seen, and is outstanding on the PendingEOI stack, waiting for
> Dom0 to actually deal with.

And which hence should be masked. Is this perhaps a non-maskable
MSI, and the device (erroneously?) issues a new interrupt before
the old one was really finished with?

Jan


* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-05 12:45         ` Jan Beulich
@ 2013-08-05 14:51           ` Andrew Cooper
  2013-08-09 21:27             ` Thimo E.
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-08-05 14:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Thimo E., Keir Fraser, Xen-develList

On 05/08/13 13:45, Jan Beulich wrote:
>>>> On 03.08.13 at 01:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> Adjusted from my "interesting" idea of printk formatting,
>>
>> (XEN) **Pending EOI error
>> (XEN)   irq 29, vector 0x2e
>> (XEN)   s[0] irq 29, vec 0x2e, ready 0, ISR 1, TMR 0, IRR 0
>> (XEN) All LAPIC state:
>> (XEN) [vector]      ISR      TMR      IRR
>> (XEN) [1f:01] 00000000 00000000 00000000
>> (XEN) [3f:20] 00016384 4095716568 00000000
>> (XEN) [5f:40] 00000000 4041382474 00000000
>> (XEN) [7f:60] 00000000 3967325758 00000000
>> (XEN) [9f:80] 00000000 2123395250 00000000
>> (XEN) [bf:a0] 00000000 1502837374 00000000
>> (XEN) [df:c0] 00000000 4270415335 00000000
>> (XEN) [ff:e0] 00000000 00000000 00000000
>>
>> So Xen has been interrupted by an interrupt which it believes it has
>> already seen, and is outstanding on the PendingEOI stack, waiting for
>> Dom0 to actually deal with.
> And which hence should be masked. Is this perhaps a non-maskable
> MSI, and the device (erroneously?) issues a new interrupt before
> the old one was really finished with?
>
> Jan
>

All of these crashes are coming out of mwait_idle, so the CPU in
question has literally just been in a lower power state.

I am wondering whether there is some caching issue where an update to
the Pending EOI stack pointer got "lost", but this seems a little
too specific to be reasonably explained as a caching issue.

A new debugging patch is on its way (Sorry - it has been a very busy few
days)

~Andrew


* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-05 14:51           ` Andrew Cooper
@ 2013-08-09 21:27             ` Thimo E.
  2013-08-09 21:40               ` Andrew Cooper
  2013-08-12  8:20               ` Jan Beulich
  0 siblings, 2 replies; 63+ messages in thread
From: Thimo E. @ 2013-08-09 21:27 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, Xen-develList


[-- Attachment #1.1: Type: text/plain, Size: 11332 bytes --]

The next crash occurred; debugging output is included.

One remark: over the last few days, besides many Linux PV guests, one
Windows guest (with PV drivers) was running. Today I started another
Windows guest, and within 3 hours two crashes occurred. Coincidence?

Best regards
   Thimo

(XEN) **Pending EOI error
(XEN)   irq 29, vector 0x24
(XEN)   s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
(XEN) All LAPIC state:
(XEN) [vector]      ISR      TMR      IRR
(XEN) [1f:00] 00000000 00000000 00000000
(XEN) [3f:20] 00000010 76efa12e 00000000
(XEN) [5f:40] 00000000 e6f0f2fc 00000000
(XEN) [7f:60] 00000000 32d096ca 00000000
(XEN) [9f:80] 00000000 78fcf87a 00000000
(XEN) [bf:a0] 00000000 f9b9fe4e 00000000
(XEN) [df:c0] 00000000 ffdfe7ab 00000000
(XEN) [ff:e0] 00000000 00000000 00000000
(XEN) Peoi stack trace records:
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN)   Marked {sp 0, irq 29, vec 0x24} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x24}
(XEN)   Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:1 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:1 vec:38 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0:  1(----),
(XEN)    IRQ:   2 affinity:f vec:00 type=XT-PIC status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:1 vec:40 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:1 vec:48 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:   5 affinity:1 vec:50 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0:  5(----),
(XEN)    IRQ:   6 affinity:1 vec:58 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:1 vec:60 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:   8 affinity:1 vec:68 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0:  8(----),
(XEN)    IRQ:   9 affinity:1 vec:70 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0:  9(----),
(XEN)    IRQ:  10 affinity:1 vec:78 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:1 vec:88 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:1 vec:90 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  13 affinity:1 vec:98 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:1 vec:a0 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:1 vec:a8 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:1 vec:db type=IO-APIC-level status=00000010 in-flight=0 domain-list=0: 16(----),
(XEN)    IRQ:  18 affinity:1 vec:2c type=IO-APIC-level status=00000010 in-flight=0 domain-list=0: 18(----),
(XEN)    IRQ:  19 affinity:1 vec:51 type=IO-APIC-level status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:1 vec:29 type=IO-APIC-level status=00000002 mapped, unbound
(XEN)    IRQ:  22 affinity:1 vec:bb type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 22(----),
(XEN)    IRQ:  23 affinity:8 vec:c2 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 23(----),
(XEN)    IRQ:  24 affinity:1 vec:28 type=DMA_MSI status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:1 vec:30 type=DMA_MSI status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:f vec:c0 type=PCI-MSI status=00000002 mapped, unbound
(XEN)    IRQ:  27 affinity:f vec:c8 type=PCI-MSI status=00000002 mapped, unbound
(XEN)    IRQ:  28 affinity:f vec:d0 type=PCI-MSI status=00000002 mapped, unbound
(XEN)    IRQ:  29 affinity:2 vec:24 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:276(----),
(XEN)    IRQ:  30 affinity:4 vec:93 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:275(----),
(XEN)    IRQ:  31 affinity:2 vec:4a type=PCI-MSI status=00000050 in-flight=0 domain-list=0:274(----),
(XEN)    IRQ:  32 affinity:2 vec:73 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:273(----),
(XEN)    IRQ:  33 affinity:1 vec:49 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:272(----),
(XEN)    IRQ:  34 affinity:8 vec:5f type=PCI-MSI status=00000050 in-flight=0 domain-list=0:271(----),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  1 Vec 56:
(XEN)       Apic 0x00, Pin  1: vec=38 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  4 Vec 72:
(XEN)       Apic 0x00, Pin  4: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  5 Vec 80:
(XEN)       Apic 0x00, Pin  5: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  6 Vec 88:
(XEN)       Apic 0x00, Pin  6: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  7 Vec 96:
(XEN)       Apic 0x00, Pin  7: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  8 Vec104:
(XEN)       Apic 0x00, Pin  8: vec=68 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  9 Vec112:
(XEN)       Apic 0x00, Pin  9: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 10 Vec120:
(XEN)       Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 11 Vec136:
(XEN)       Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 12 Vec144:
(XEN)       Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 13 Vec152:
(XEN)       Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 14 Vec160:
(XEN)       Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 15 Vec168:
(XEN)       Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 16 Vec219:
(XEN)       Apic 0x00, Pin 16: vec=db delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 18 Vec 44:
(XEN)       Apic 0x00, Pin 18: vec=2c delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 19 Vec 81:
(XEN)       Apic 0x00, Pin 19: vec=51 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 20 Vec 41:
(XEN)       Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 22 Vec187:
(XEN)       Apic 0x00, Pin 22: vec=bb delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 23 Vec194:
(XEN)       Apic 0x00, Pin 23: vec=c2 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) .......    : physical APIC id: 02
(XEN) .......    : Delivery Type: 0
(XEN) .......    : LTS          : 0
(XEN) .... register #01: 00170020
(XEN) .......     : max redirection entries: 0017
(XEN) .......     : PRQ implemented: 0
(XEN) .......     : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN)  00 000 00  1    0    0   0   0    0    0    00
(XEN)  01 000 00  0    0    0   0   0    1    1    38
(XEN)  02 000 00  0    0    0   0   0    1    1    F0
(XEN)  03 000 00  0    0    0   0   0    1    1    40
(XEN)  04 000 00  0    0    0   0   0    1    1    48
(XEN)  05 000 00  0    0    0   0   0    1    1    50
(XEN)  06 000 00  0    0    0   0   0    1    1    58
(XEN)  07 000 00  0    0    0   0   0    1    1    60
(XEN)  08 000 00  0    0    0   0   0    1    1    68
(XEN)  09 000 00  0    1    0   0   0    1    1    70
(XEN)  0a 000 00  0    0    0   0   0    1    1    78
(XEN)  0b 000 00  0    0    0   0   0    1    1    88
(XEN)  0c 000 00  0    0    0   0   0    1    1    90
(XEN)  0d 000 00  0    0    0   0   0    1    1    98
(XEN)  0e 000 00  0    0    0   0   0    1    1    A0
(XEN)  0f 000 00  0    0    0   0   0    1    1    A8
(XEN)  10 000 00  0    1    0   1   0    1    1    DB
(XEN)  11 000 00  1    0    0   0   0    0    0    00
(XEN)  12 000 00  0    1    0   1   0    1    1    2C
(XEN)  13 000 00  1    1    0   1   0    1    1    51
(XEN)  14 000 00  1    1    0   1   0    1    1    29
(XEN)  15 07A 0A  1    0    0   0   0    0    2    B4
(XEN)  16 000 00  0    1    0   1   0    1    1    BB
(XEN)  17 000 00  0    1    0   1   0    1    1    C2
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ56 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ72 -> 0:4
(XEN) IRQ80 -> 0:5
(XEN) IRQ88 -> 0:6
(XEN) IRQ96 -> 0:7
(XEN) IRQ104 -> 0:8
(XEN) IRQ112 -> 0:9
(XEN) IRQ120 -> 0:10
(XEN) IRQ136 -> 0:11
(XEN) IRQ144 -> 0:12
(XEN) IRQ152 -> 0:13
(XEN) IRQ160 -> 0:14
(XEN) IRQ168 -> 0:15
(XEN) IRQ219 -> 0:16
(XEN) IRQ44 -> 0:18
(XEN) IRQ81 -> 0:19
(XEN) IRQ41 -> 0:20
(XEN) IRQ187 -> 0:22
(XEN) IRQ194 -> 0:23
(XEN) .................................... done.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) CA-107844****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing crash image


On 05.08.2013 16:51, Andrew Cooper wrote:
> All of these crashes are coming out of mwait_idle, so the cpu in
> question has literally just been in a lower power state.
>
> I am wondering whether there is some caching issue where an update to
> the Pending EOI stack pointer got "lost", but this seems a little
> too specific to be reasonably explained as a caching issue.
>
> A new debugging patch is on its way (Sorry - it has been a very busy few
> days)
>
> ~Andrew
>

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-09 21:27             ` Thimo E.
@ 2013-08-09 21:40               ` Andrew Cooper
  2013-08-09 21:44                 ` Andrew Cooper
  2013-08-12  5:50                 ` Zhang, Yang Z
  2013-08-12  8:20               ` Jan Beulich
  1 sibling, 2 replies; 63+ messages in thread
From: Andrew Cooper @ 2013-08-09 21:40 UTC (permalink / raw)
  To: Thimo E.
  Cc: Keir Fraser, Jan Beulich, Eddie Dong, Xen-develList,
	Jun Nakajima, Xiantao Zhang


[-- Attachment #1.1: Type: text/plain, Size: 12520 bytes --]

On 09/08/13 22:27, Thimo E. wrote:
> Next crash occurred, debugging output included.
>
> One remark: over the last few days (besides many Linux PV guests) one
> Windows guest (with PV drivers) was running. Today I started
> another Windows guest, and within 3 hours two crashes occurred.
> Coincidence?
>
> Best regards
>   Thimo

So according to my debugging, we really have just pushed the same irq
which we have subsequently seen again unexpectedly.

This bug has only ever been seen on Haswell hardware, and appears linked
to running HVM guests.

So either there is an erroneous ACK to the LAPIC which is clearing the ISR
before the PEOI stack expects it (which I can't obviously see, looking at
the code), or something more funky is going on with the hardware.

CC'ing in the Intel maintainers:  Do you have any ideas?  Could this be
related to APICv?

~Andrew




* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-09 21:40               ` Andrew Cooper
@ 2013-08-09 21:44                 ` Andrew Cooper
  2013-08-11 17:46                   ` Thimo E.
  2013-08-12  5:50                 ` Zhang, Yang Z
  1 sibling, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-08-09 21:44 UTC (permalink / raw)
  To: Thimo E.
  Cc: Keir Fraser, Jan Beulich, Eddie Dong, Xen-develList,
	Jun Nakajima, Xiantao Zhang


[-- Attachment #1.1: Type: text/plain, Size: 13052 bytes --]

On 09/08/13 22:40, Andrew Cooper wrote:
> On 09/08/13 22:27, Thimo E. wrote:
>> Next crash occurred, debugging output included.
>>
>> One remark: over the last few days (besides many Linux PV guests) one
>> Windows guest (with PV drivers) was running. Today I started
>> another Windows guest, and within 3 hours two crashes occurred.
>> Coincidence?
>>
>> Best regards
>>   Thimo
>
> So according to my debugging, we really have just pushed the same irq
> which we have subsequently seen again unexpectedly.
>
> This bug has only ever been seen on Haswell hardware, and appears
> linked to running HVM guests.
>
> So either there is an erroneous ACK to the LAPIC which is clearing the
> ISR before the PEOI stack is expecting (which I 

"can't"

Apologies for the confusion.

~Andrew

> obviously see, looking at the code), or something more funky is going
> on with the hardware.
>
> CC'ing in the Intel maintainers:  Do you have any ideas?  Could this
> be related to APICv?
>
> ~Andrew



* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-09 21:44                 ` Andrew Cooper
@ 2013-08-11 17:46                   ` Thimo E.
  2013-08-12  6:02                     ` Zhang, Yang Z
                                       ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Thimo E. @ 2013-08-11 17:46 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Jan Beulich, Eddie Dong, Xen-develList,
	Jun Nakajima, Xiantao Zhang


[-- Attachment #1.1: Type: text/plain, Size: 1103 bytes --]

Hello again,

attached you'll find another crash dump from today. I don't know whether it
gives you more information than the last one.

Just FYI, this is a system with an Intel mainboard (H87 chipset) and a
Core i5-4670 CPU.

Best regards
   Thimo

On 09.08.2013 23:44, Andrew Cooper wrote:
> On 09/08/13 22:40, Andrew Cooper wrote:
>>
>> So according to my debugging, we really have just pushed the same irq 
>> which we have subsequently seen again unexpectedly.
>>
>> This bug has only ever been seen on Haswell hardware, and appears 
>> linked to running HVM guests.
>>
>> So either there is an erroneous ACK to the LAPIC which is clearing the
>> ISR before the PEOI stack expects it (which I
>
> "can't"
>
> Apologies for the confusion.
>
> ~Andrew
>
>> obviously see, looking at the code), or something more funky is going 
>> on with the hardware.
>>
>> CC'ing in the Intel maintainers:  Do you have any ideas?  Could this 
>> be related to APICv?
>>
>> ~Andrew

[-- Attachment #2: crash20130811.txt --]
[-- Type: text/plain, Size: 10671 bytes --]

(XEN) **Pending EOI error
(XEN)   irq 29, vector 0x26
(XEN)   s[0] irq 29, vec 0x26, ready 0, ISR 00000001, TMR 00000000, IRR 00000001
(XEN) All LAPIC state:
(XEN) [vector]      ISR      TMR      IRR
(XEN) [1f:00] 00000000 00000000 00000000
(XEN) [3f:20] 00000040 8c334526 00000040
(XEN) [5f:40] 00000000 006c000a 00000000
(XEN) [7f:60] 00000000 00240278 00000000
(XEN) [9f:80] 00000000 40000a10 00000000
(XEN) [bf:a0] 00000000 060d708a 00000000
(XEN) [df:c0] 00000000 42054000 00000000
(XEN) [ff:e0] 00000000 00000000 00000000
(XEN) Peoi stack trace records:
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0x26}
(XEN)   Marked {sp 0, irq 29, vec 0x26} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0x26}
(XEN)   Marked {sp 0, irq 29, vec 0x26} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0x26}
(XEN)   Marked {sp 0, irq 29, vec 0x26} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0x26}
(XEN)   Marked {sp 0, irq 29, vec 0x26} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0xc7}
(XEN)   Marked {sp 0, irq 29, vec 0x31} ready
(XEN)   Pushed {sp 0, irq 29, vec 0xc7}
(XEN)   Poped entry {sp 1, irq 29, vec 0xc7}
(XEN)   Marked {sp 0, irq 29, vec 0xc7} ready
(XEN)   Pushed {sp 0, irq 29, vec 0xc7}
(XEN)   Poped entry {sp 1, irq 29, vec 0xc7}
(XEN)   Marked {sp 0, irq 29, vec 0xc7} ready
(XEN)   Pushed {sp 0, irq 29, vec 0xc7}
(XEN)   Poped entry {sp 1, irq 29, vec 0xc7}
(XEN)   Marked {sp 0, irq 29, vec 0xc7} ready
(XEN)   Pushed {sp 0, irq 29, vec 0xc7}
(XEN)   Poped entry {sp 1, irq 29, vec 0xcd}
(XEN)   Marked {sp 0, irq 29, vec 0x9e} ready
(XEN)   Pushed {sp 0, irq 29, vec 0xcd}
(XEN)   Poped entry {sp 1, irq 29, vec 0xcd}
(XEN)   Marked {sp 0, irq 29, vec 0xcd} ready
(XEN)   Pushed {sp 0, irq 29, vec 0xcd}
(XEN)   Poped entry {sp 1, irq 29, vec 0xcd}
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:1 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:1 vec:38 type=IO-APIC-edge    status=00000054 in-flight=0 domain-list=0:  1(----),
(XEN)    IRQ:   2 affinity:f vec:00 type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:1 vec:40 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:1 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   5 affinity:1 vec:50 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  5(----),
(XEN)    IRQ:   6 affinity:1 vec:58 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:1 vec:60 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   8 affinity:1 vec:68 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(----),
(XEN)    IRQ:   9 affinity:1 vec:70 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0:  9(----),
(XEN)    IRQ:  10 affinity:1 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:1 vec:88 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:1 vec:90 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  13 affinity:1 vec:98 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:1 vec:a0 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:1 vec:a8 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:1 vec:36 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 16(----),
(XEN)    IRQ:  18 affinity:4 vec:b5 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 18(----),
(XEN)    IRQ:  19 affinity:1 vec:39 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:1 vec:29 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  22 affinity:8 vec:66 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 22(----),
(XEN)    IRQ:  23 affinity:1 vec:d8 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 23(----),
(XEN)    IRQ:  24 affinity:1 vec:28 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:1 vec:30 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:f vec:c0 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  27 affinity:f vec:c8 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  28 affinity:f vec:d0 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  29 affinity:8 vec:26 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:276(----),
(XEN)    IRQ:  30 affinity:2 vec:9b type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:275(----),
(XEN)    IRQ:  31 affinity:2 vec:72 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:274(----),
(XEN)    IRQ:  32 affinity:8 vec:3e type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(----),
(XEN)    IRQ:  33 affinity:8 vec:5e type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:272(----),
(XEN)    IRQ:  34 affinity:1 vec:59 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:271(----),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  1 Vec 56:
(XEN)       Apic 0x00, Pin  1: vec=38 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  4 Vec 72:
(XEN)       Apic 0x00, Pin  4: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  5 Vec 80:
(XEN)       Apic 0x00, Pin  5: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  6 Vec 88:
(XEN)       Apic 0x00, Pin  6: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  7 Vec 96:
(XEN)       Apic 0x00, Pin  7: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  8 Vec104:
(XEN)       Apic 0x00, Pin  8: vec=68 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  9 Vec112:
(XEN)       Apic 0x00, Pin  9: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 10 Vec120:
(XEN)       Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 11 Vec136:
(XEN)       Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 12 Vec144:
(XEN)       Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 13 Vec152:
(XEN)       Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 14 Vec160:
(XEN)       Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 15 Vec168:
(XEN)       Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 16 Vec 54:
(XEN)       Apic 0x00, Pin 16: vec=36 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 18 Vec181:
(XEN)       Apic 0x00, Pin 18: vec=b5 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 19 Vec 57:
(XEN)       Apic 0x00, Pin 19: vec=39 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 20 Vec 41:
(XEN)       Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 22 Vec102:
(XEN)       Apic 0x00, Pin 22: vec=66 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 23 Vec216:
(XEN)       Apic 0x00, Pin 23: vec=d8 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) .......    : physical APIC id: 02
(XEN) .......    : Delivery Type: 0
(XEN) .......    : LTS          : 0
(XEN) .... register #01: 00170020
(XEN) .......     : max redirection entries: 0017
(XEN) .......     : PRQ implemented: 0
(XEN) .......     : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN)  00 000 00  1    0    0   0   0    0    0    00
(XEN)  01 000 00  0    0    0   0   0    1    1    38
(XEN)  02 000 00  0    0    0   0   0    1    1    F0
(XEN)  03 000 00  0    0    0   0   0    1    1    40
(XEN)  04 000 00  0    0    0   0   0    1    1    48
(XEN)  05 000 00  0    0    0   0   0    1    1    50
(XEN)  06 000 00  0    0    0   0   0    1    1    58
(XEN)  07 000 00  0    0    0   0   0    1    1    60
(XEN)  08 000 00  0    0    0   0   0    1    1    68
(XEN)  09 000 00  0    1    0   0   0    1    1    70
(XEN)  0a 000 00  0    0    0   0   0    1    1    78
(XEN)  0b 000 00  0    0    0   0   0    1    1    88
(XEN)  0c 000 00  0    0    0   0   0    1    1    90
(XEN)  0d 000 00  0    0    0   0   0    1    1    98
(XEN)  0e 000 00  0    0    0   0   0    1    1    A0
(XEN)  0f 000 00  0    0    0   0   0    1    1    A8
(XEN)  10 000 00  0    1    0   1   0    1    1    36
(XEN)  11 000 00  1    0    0   0   0    0    0    00
(XEN)  12 000 00  0    1    0   1   0    1    1    B5
(XEN)  13 000 00  1    1    0   1   0    1    1    39
(XEN)  14 000 00  1    1    0   1   0    1    1    29
(XEN)  15 07A 0A  1    0    0   0   0    0    2    B4
(XEN)  16 000 00  0    1    0   1   0    1    1    66
(XEN)  17 000 00  0    1    0   1   0    1    1    D8
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ56 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ72 -> 0:4
(XEN) IRQ80 -> 0:5
(XEN) IRQ88 -> 0:6
(XEN) IRQ96 -> 0:7
(XEN) IRQ104 -> 0:8
(XEN) IRQ112 -> 0:9
(XEN) IRQ120 -> 0:10
(XEN) IRQ136 -> 0:11
(XEN) IRQ144 -> 0:12
(XEN) IRQ152 -> 0:13
(XEN) IRQ160 -> 0:14
(XEN) IRQ168 -> 0:15
(XEN) IRQ54 -> 0:16
(XEN) IRQ181 -> 0:18
(XEN) IRQ57 -> 0:19
(XEN) IRQ41 -> 0:20
(XEN) IRQ102 -> 0:22
(XEN) IRQ216 -> 0:23
(XEN) .................................... done.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) CA-107844****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing crash image


* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-09 21:40               ` Andrew Cooper
  2013-08-09 21:44                 ` Andrew Cooper
@ 2013-08-12  5:50                 ` Zhang, Yang Z
  1 sibling, 0 replies; 63+ messages in thread
From: Zhang, Yang Z @ 2013-08-12  5:50 UTC (permalink / raw)
  To: Andrew Cooper, Thimo E.
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Xiantao

Andrew Cooper wrote on 2013-08-10:
> On 09/08/13 22:27, Thimo E. wrote:
> 
> 
> 	Next crash occurred, debugging output included.
> 
> 
> 	One remark: over the last few days (besides many Linux PV guests) one
> Windows guest (with PV drivers) was running. Today I started
> another Windows guest, and within 3 hours two crashes occurred. Coincidence?
> 
> 	Best regards
> 	  Thimo
> 
> 
> 
> So according to my debugging, the irq which we unexpectedly see again
> really is the same one we have just pushed.
> 
> This bug has only ever been seen on Haswell hardware, and appears to be
> linked to running HVM guests.
> 
> So either there is an erroneous ACK to the LAPIC which is clearing the
> ISR before the PEOI stack expects it (which I can't obviously see, looking
> at the code), or something more funky is going on with the hardware.
> 
> CC'ing in the Intel maintainers:  Do you have any ideas?  Could this 
> be related to APICv?
Does your machine support APIC-v?

> 
> ~Andrew
> 
> 
> 	On 05.08.2013 16:51, Andrew Cooper wrote:
> 
> 		All of these crashes are coming out of mwait_idle, so the CPU in
> 		question has literally just been in a lower power state.
> 
> 		I am wondering whether there is some caching issue where an update to
> 		the Pending EOI stack pointer got "lost", but this seems a little
> 		too specific to be reasonably explained as a caching issue.
> 
> 		A new debugging patch is on its way (Sorry - it has been a very busy
> 		few days)
> 
> 		~Andrew
> 
>


Best regards,
Yang

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-11 17:46                   ` Thimo E.
@ 2013-08-12  6:02                     ` Zhang, Yang Z
  2013-08-12  8:49                     ` Zhang, Yang Z
  2013-08-12  9:10                     ` Andrew Cooper
  2 siblings, 0 replies; 63+ messages in thread
From: Zhang, Yang Z @ 2013-08-12  6:02 UTC (permalink / raw)
  To: Thimo E., Andrew Cooper
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Xiantao


[-- Attachment #1.1: Type: text/plain, Size: 1506 bytes --]

Hi Thimo,

Can you provide the Xen boot log?

Best regards,
Yang

From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Thimo E.
Sent: Monday, August 12, 2013 1:47 AM
To: Andrew Cooper
Cc: Keir Fraser; Jan Beulich; Dong, Eddie; Xen-develList; Nakajima, Jun; Zhang, Xiantao
Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic


* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-09 21:27             ` Thimo E.
  2013-08-09 21:40               ` Andrew Cooper
@ 2013-08-12  8:20               ` Jan Beulich
  2013-08-12  9:28                 ` Andrew Cooper
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2013-08-12  8:20 UTC (permalink / raw)
  To: Andrew Cooper, Thimo E.; +Cc: xen-devel, Keir Fraser

>>> On 09.08.13 at 23:27, "Thimo E." <abc@digithi.de> wrote:
> (XEN) **Pending EOI error
> (XEN)   irq 29, vector 0x24
> (XEN)   s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
> (XEN) All LAPIC state:
> (XEN) [vector]      ISR      TMR      IRR
> (XEN) [1f:00] 00000000 00000000 00000000
> (XEN) [3f:20] 00000010 76efa12e 00000000
> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
> (XEN) [7f:60] 00000000 32d096ca 00000000
> (XEN) [9f:80] 00000000 78fcf87a 00000000
> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
> (XEN) [ff:e0] 00000000 00000000 00000000
> (XEN) Peoi stack trace records:

Mind providing (a link to) the patch that was used here, so that
one can make sense of the printed information (and perhaps
also suggest adjustments to that debugging code)? Nothing I
was able to find on the list fully matches the output above...

Jan

> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN)   Marked {sp 0, irq 29, vec 0x24} ready
> (XEN)   Pushed {sp 0, irq 29, vec 0x24}
> (XEN)   Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Guest interrupt information:
> (XEN)    IRQ:   0 affinity:1 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
> (XEN)    IRQ:   1 affinity:1 vec:38 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0:  1(----),
> (XEN)    IRQ:   2 affinity:f vec:00 type=XT-PIC status=00000000 mapped, unbound
> (XEN)    IRQ:   3 affinity:1 vec:40 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:   4 affinity:1 vec:48 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:   5 affinity:1 vec:50 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0:  5(----),
> (XEN)    IRQ:   6 affinity:1 vec:58 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:   7 affinity:1 vec:60 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:   8 affinity:1 vec:68 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0:  8(----),
> (XEN)    IRQ:   9 affinity:1 vec:70 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0:  9(----),
> (XEN)    IRQ:  10 affinity:1 vec:78 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:  11 affinity:1 vec:88 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:  12 affinity:1 vec:90 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:  13 affinity:1 vec:98 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:  14 affinity:1 vec:a0 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:  15 affinity:1 vec:a8 type=IO-APIC-edge status=00000002 mapped, unbound
> (XEN)    IRQ:  16 affinity:1 vec:db type=IO-APIC-level status=00000010 in-flight=0 domain-list=0: 16(----),
> (XEN)    IRQ:  18 affinity:1 vec:2c type=IO-APIC-level status=00000010 in-flight=0 domain-list=0: 18(----),
> (XEN)    IRQ:  19 affinity:1 vec:51 type=IO-APIC-level status=00000002 mapped, unbound
> (XEN)    IRQ:  20 affinity:1 vec:29 type=IO-APIC-level status=00000002 mapped, unbound
> (XEN)    IRQ:  22 affinity:1 vec:bb type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 22(----),
> (XEN)    IRQ:  23 affinity:8 vec:c2 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 23(----),
> (XEN)    IRQ:  24 affinity:1 vec:28 type=DMA_MSI status=00000000 mapped, unbound
> (XEN)    IRQ:  25 affinity:1 vec:30 type=DMA_MSI status=00000000 mapped, unbound
> (XEN)    IRQ:  26 affinity:f vec:c0 type=PCI-MSI status=00000002 mapped, unbound
> (XEN)    IRQ:  27 affinity:f vec:c8 type=PCI-MSI status=00000002 mapped, unbound
> (XEN)    IRQ:  28 affinity:f vec:d0 type=PCI-MSI status=00000002 mapped, unbound
> (XEN)    IRQ:  29 affinity:2 vec:24 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:276(----),
> (XEN)    IRQ:  30 affinity:4 vec:93 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:275(----),
> (XEN)    IRQ:  31 affinity:2 vec:4a type=PCI-MSI status=00000050 in-flight=0 domain-list=0:274(----),
> (XEN)    IRQ:  32 affinity:2 vec:73 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:273(----),
> (XEN)    IRQ:  33 affinity:1 vec:49 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:272(----),
> (XEN)    IRQ:  34 affinity:8 vec:5f type=PCI-MSI status=00000050 in-flight=0 domain-list=0:271(----),
> (XEN) IO-APIC interrupt information:
> (XEN)     IRQ  0 Vec240:
> (XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  1 Vec 56:
> (XEN)       Apic 0x00, Pin  1: vec=38 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  3 Vec 64:
> (XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  4 Vec 72:
> (XEN)       Apic 0x00, Pin  4: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  5 Vec 80:
> (XEN)       Apic 0x00, Pin  5: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  6 Vec 88:
> (XEN)       Apic 0x00, Pin  6: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  7 Vec 96:
> (XEN)       Apic 0x00, Pin  7: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  8 Vec104:
> (XEN)       Apic 0x00, Pin  8: vec=68 delivery=LoPri dest=L status=0 
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  9 Vec112:
> (XEN)       Apic 0x00, Pin  9: vec=70 delivery=LoPri dest=L status=0 
> polarity=0 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 10 Vec120:
> (XEN)       Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0 
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 11 Vec136:
> (XEN)       Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0 
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 12 Vec144:
> (XEN)       Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0 
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 13 Vec152:
> (XEN)       Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0 
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 14 Vec160:
> (XEN)       Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0 
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 15 Vec168:
> (XEN)       Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0 
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 16 Vec219:
> (XEN)       Apic 0x00, Pin 16: vec=db delivery=LoPri dest=L status=0 
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 18 Vec 44:
> (XEN)       Apic 0x00, Pin 18: vec=2c delivery=LoPri dest=L status=0 
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 19 Vec 81:
> (XEN)       Apic 0x00, Pin 19: vec=51 delivery=LoPri dest=L status=0 
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN)     IRQ 20 Vec 41:
> (XEN)       Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=0 
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN)     IRQ 22 Vec187:
> (XEN)       Apic 0x00, Pin 22: vec=bb delivery=LoPri dest=L status=0 
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 23 Vec194:
> (XEN)       Apic 0x00, Pin 23: vec=c2 delivery=LoPri dest=L status=0 
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN) number of MP IRQ sources: 15.
> (XEN) number of IO-APIC #2 registers: 24.
> (XEN) testing the IO APIC.......................
> (XEN) IO APIC #2......
> (XEN) .... register #00: 02000000
> (XEN) .......    : physical APIC id: 02
> (XEN) .......    : Delivery Type: 0
> (XEN) .......    : LTS          : 0
> (XEN) .... register #01: 00170020
> (XEN) .......     : max redirection entries: 0017
> (XEN) .......     : PRQ implemented: 0
> (XEN) .......     : IO APIC version: 0020
> (XEN) .... IRQ redirection table:
> (XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> (XEN)  00 000 00  1    0    0   0   0    0    0    00
> (XEN)  01 000 00  0    0    0   0   0    1    1    38
> (XEN)  02 000 00  0    0    0   0   0    1    1    F0
> (XEN)  03 000 00  0    0    0   0   0    1    1    40
> (XEN)  04 000 00  0    0    0   0   0    1    1    48
> (XEN)  05 000 00  0    0    0   0   0    1    1    50
> (XEN)  06 000 00  0    0    0   0   0    1    1    58
> (XEN)  07 000 00  0    0    0   0   0    1    1    60
> (XEN)  08 000 00  0    0    0   0   0    1    1    68
> (XEN)  09 000 00  0    1    0   0   0    1    1    70
> (XEN)  0a 000 00  0    0    0   0   0    1    1    78
> (XEN)  0b 000 00  0    0    0   0   0    1    1    88
> (XEN)  0c 000 00  0    0    0   0   0    1    1    90
> (XEN)  0d 000 00  0    0    0   0   0    1    1    98
> (XEN)  0e 000 00  0    0    0   0   0    1    1    A0
> (XEN)  0f 000 00  0    0    0   0   0    1    1    A8
> (XEN)  10 000 00  0    1    0   1   0    1    1    DB
> (XEN)  11 000 00  1    0    0   0   0    0    0    00
> (XEN)  12 000 00  0    1    0   1   0    1    1    2C
> (XEN)  13 000 00  1    1    0   1   0    1    1    51
> (XEN)  14 000 00  1    1    0   1   0    1    1    29
> (XEN)  15 07A 0A  1    0    0   0   0    0    2    B4
> (XEN)  16 000 00  0    1    0   1   0    1    1    BB
> (XEN)  17 000 00  0    1    0   1   0    1    1    C2
> (XEN) Using vector-based indexing
> (XEN) IRQ to pin mappings:
> (XEN) IRQ240 -> 0:2
> (XEN) IRQ56 -> 0:1
> (XEN) IRQ64 -> 0:3
> (XEN) IRQ72 -> 0:4
> (XEN) IRQ80 -> 0:5
> (XEN) IRQ88 -> 0:6
> (XEN) IRQ96 -> 0:7
> (XEN) IRQ104 -> 0:8
> (XEN) IRQ112 -> 0:9
> (XEN) IRQ120 -> 0:10
> (XEN) IRQ136 -> 0:11
> (XEN) IRQ144 -> 0:12
> (XEN) IRQ152 -> 0:13
> (XEN) IRQ160 -> 0:14
> (XEN) IRQ168 -> 0:15
> (XEN) IRQ219 -> 0:16
> (XEN) IRQ44 -> 0:18
> (XEN) IRQ81 -> 0:19
> (XEN) IRQ41 -> 0:20
> (XEN) IRQ187 -> 0:22
> (XEN) IRQ194 -> 0:23
> (XEN) .................................... done.
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) CA-107844****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN) Executing crash image
> 
> 
> On 05.08.2013 16:51, Andrew Cooper wrote:
>> All of these crashes are coming out of mwait_idle, so the cpu in
>> question has literally just been in a lower power state.
>>
>> I am wondering whether there is some caching issue where an update to
>> the Pending EOI stack pointer got "lost", but this seems a little
>> too specific to be reasonably explained as a caching issue.
>>
>> A new debugging patch is on its way (Sorry - it has been a very busy few
>> days)
>>
>> ~Andrew
>>

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-11 17:46                   ` Thimo E.
  2013-08-12  6:02                     ` Zhang, Yang Z
@ 2013-08-12  8:49                     ` Zhang, Yang Z
  2013-08-12  8:57                       ` Jan Beulich
                                         ` (2 more replies)
  2013-08-12  9:10                     ` Andrew Cooper
  2 siblings, 3 replies; 63+ messages in thread
From: Zhang, Yang Z @ 2013-08-12  8:49 UTC (permalink / raw)
  To: Thimo E., Andrew Cooper
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Xiantao


[-- Attachment #1.1: Type: text/plain, Size: 2696 bytes --]

Hi Thimo,
From your previous experience and logs, it shows:

1.       The interrupt that triggers the issue is an MSI.

2.       MSIs are normally treated as edge-triggered interrupts, except when there is no way to mask the device. In this case, your previous log indicates the device is unmaskable (what special device are you using? Modern PCI devices should be maskable).

3.       IRQ 29 belongs to dom0, so it seems this is not an HVM-related issue.

4.       The status of IRQ 29 is 10, which means the guest has already issued the EOI, because the IRQ_GUEST_EOI_PENDING bit is cleared; so there should be no pending EOI in the EOI stack. If possible, can you add some debug messages in the guest EOI code path (like _irq_guest_eoi()) to track the EOI?

5.       Both logs show that when the issue occurred, most of the other interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is it a coincidence? Or does it happen only under special conditions like heavy IRQ migration? Perhaps you can disable irq balance in dom0 and pin the IRQ manually.

6.       I guess interrupt remapping is enabled on your machine. Can you try disabling IR to see whether it is still reproducible?
Also, please provide the whole Xen log.
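
To make points 4 and 5 concrete, the sketch below decodes the `status=` field of the 'i' debug-key dump. The bit values are an assumption: they are believed to match xen/include/xen/irq.h around Xen 4.1, and they agree with the reading above (0x10 has IRQ_GUEST_EOI_PENDING clear, 0x50 includes IRQ_MOVE_PENDING), but they should be checked against the tree actually being debugged. `irq_status_names()` and `guest_eoi_done()` are hypothetical helpers, not Xen functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASSUMED bit values, believed to match xen/include/xen/irq.h of the
 * Xen 4.1 era; verify against the tree actually being debugged. */
#define IRQ_INPROGRESS        0x01u
#define IRQ_DISABLED          0x02u
#define IRQ_PENDING           0x04u
#define IRQ_REPLAY            0x08u
#define IRQ_GUEST             0x10u
#define IRQ_GUEST_EOI_PENDING 0x20u
#define IRQ_MOVE_PENDING      0x40u

/* Hypothetical helper: guest-bound, and Xen is not waiting on a guest EOI. */
static bool guest_eoi_done(unsigned int status)
{
    return (status & IRQ_GUEST) && !(status & IRQ_GUEST_EOI_PENDING);
}

/* Hypothetical helper: write the names of the flags set in a "status="
 * value into buf, separated by '|'. */
static void irq_status_names(unsigned int status, char *buf, size_t len)
{
    static const struct { unsigned int bit; const char *name; } flags[] = {
        { IRQ_INPROGRESS, "INPROGRESS" },
        { IRQ_DISABLED, "DISABLED" },
        { IRQ_PENDING, "PENDING" },
        { IRQ_REPLAY, "REPLAY" },
        { IRQ_GUEST, "GUEST" },
        { IRQ_GUEST_EOI_PENDING, "GUEST_EOI_PENDING" },
        { IRQ_MOVE_PENDING, "MOVE_PENDING" },
    };

    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(status & flags[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, flags[i].name, len - strlen(buf) - 1);
    }
}
```

With these assumed values, IRQ 29's `status=00000010` decodes to just GUEST (no guest EOI outstanding), the `00000050` entries decode to GUEST|MOVE_PENDING, and the unbound `00000002` entries to DISABLED.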

Best regards,
Yang

From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Thimo E.
Sent: Monday, August 12, 2013 1:47 AM
To: Andrew Cooper
Cc: Keir Fraser; Jan Beulich; Dong, Eddie; Xen-develList; Nakajima, Jun; Zhang, Xiantao
Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic

Hello again,

attached you'll find another crash dump from today. Don't know if it gives you more information than the last one.

Just FYI, this is a system with an Intel Mainboard (H87 chipset) and a Core i5-4670 CPU.

Best regards
  Thimo

On 09.08.2013 23:44, Andrew Cooper wrote:
> On 09/08/13 22:40, Andrew Cooper wrote:
>> So according to my debugging, we really have just pushed the same irq which we have subsequently seen again unexpectedly.
>>
>> This bug has only ever been seen on Haswell hardware, and appears linked to running HVM guests.
>>
>> So either there is an erroneous ACK to the LAPIC which is clearing the ISR before the PEOI stack is expecting (which I
>
> "can't"
>
> Apologies for the confusion.
>
> ~Andrew
>
>> obviously see, looking at the code), or something more funky is going on with the hardware.
>>
>> CC'ing in the Intel maintainers:  Do you have any ideas?  Could this be related to APICv?
>>
>> ~Andrew


[-- Attachment #1.2: Type: text/html, Size: 13841 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12  8:49                     ` Zhang, Yang Z
@ 2013-08-12  8:57                       ` Jan Beulich
  2013-08-12 11:52                       ` Thimo E
  2013-08-12 13:54                       ` Thimo E
  2 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2013-08-12  8:57 UTC (permalink / raw)
  To: Thimo E., Yang Z Zhang
  Cc: Keir Fraser, Andrew Cooper, Eddie Dong, Xen-develList,
	Jun Nakajima, Xiantao Zhang

>>> On 12.08.13 at 10:49, "Zhang, Yang Z" <yang.z.zhang@intel.com> wrote:
> 5.       Both logs show that when the issue occurred, most of the other 
> interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is it a 
> coincidence? Or does it happen only under special conditions like heavy IRQ 
> migration? Perhaps you can disable irq balance in dom0 and pin the IRQ 
> manually.

Since guest IRQs' affinities track the vCPU's placement on pCPU-s,
suppressing IRQ movement would not only require IRQ balancing
to be suppressed in the respective domain, but also that the vCPU
be bound to a single pCPU.

Jan

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-11 17:46                   ` Thimo E.
  2013-08-12  6:02                     ` Zhang, Yang Z
  2013-08-12  8:49                     ` Zhang, Yang Z
@ 2013-08-12  9:10                     ` Andrew Cooper
  2 siblings, 0 replies; 63+ messages in thread
From: Andrew Cooper @ 2013-08-12  9:10 UTC (permalink / raw)
  To: Thimo E.
  Cc: Keir Fraser, Jan Beulich, Eddie Dong, Xen-develList,
	Jun Nakajima, Xiantao Zhang


[-- Attachment #1.1: Type: text/plain, Size: 514 bytes --]

On 11/08/13 18:46, Thimo E. wrote:
> Hello again,
>
> attached you'll find another crash dump from today. Don't know if it
> gives you more information than the last one.
>
> Just FYI, this is a system with an Intel Mainboard (H87 chipset) and a
> Core i5-4670 CPU.
>
> Best regards
>   Thimo

It is still saying the same.  irq 29 should already be in-service at the
LAPIC (because it is present on the PEOI stack), but isn't, and we
subsequently get reinterrupted with it, causing the assertion to fail.
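
The failing check can be illustrated with a tiny user-space model of the per-CPU pending-EOI (PEOI) stack. This is a simplified sketch, not Xen's code: the struct layout, `PEOI_DEPTH` and the function names are hypothetical, and it keeps only the ordering rule that the ASSERT at irq.c:1030 enforces (each pushed vector must be strictly greater than the vector on top of the stack).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fixed depth; Xen bounds its stack by NR_DYNAMIC_VECTORS. */
#define PEOI_DEPTH 16

struct peoi_entry {
    int irq;
    unsigned int vector;
    bool ready;            /* guest has EOIed; LAPIC EOI may now be sent */
};

struct peoi_stack {
    struct peoi_entry e[PEOI_DEPTH];
    unsigned int sp;       /* number of entries in use */
};

/* Push on delivery. Vectors must strictly increase towards the top: while
 * a vector is in service, the LAPIC should only deliver higher-priority
 * vectors. Returns false where Xen's ASSERT would fire. */
static bool peoi_push(struct peoi_stack *s, int irq, unsigned int vector)
{
    if (s->sp != 0 && s->e[s->sp - 1].vector >= vector)
        return false;
    assert(s->sp < PEOI_DEPTH);
    s->e[s->sp].irq = irq;
    s->e[s->sp].vector = vector;
    s->e[s->sp].ready = false;
    s->sp++;
    return true;
}

/* Mark the most recent entry for irq as ready (guest EOI seen). */
static bool peoi_set_ready(struct peoi_stack *s, int irq)
{
    for (unsigned int i = s->sp; i-- > 0; ) {
        if (s->e[i].irq == irq) {
            s->e[i].ready = true;
            return true;
        }
    }
    return false;
}

/* Pop ready entries from the top only; each pop stands for one LAPIC EOI.
 * Returns how many entries were popped. */
static unsigned int peoi_flush_ready(struct peoi_stack *s)
{
    unsigned int n = 0;
    while (s->sp != 0 && s->e[s->sp - 1].ready) {
        s->sp--;
        n++;
    }
    return n;
}
```

In this model, pushing irq 29 / vector 0x24 a second time before its entry has been flushed makes `peoi_push()` return false, which corresponds to the situation described above: the irq has just been pushed and is then seen again.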

~Andrew

[-- Attachment #1.2: Type: text/html, Size: 1083 bytes --]

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12  8:20               ` Jan Beulich
@ 2013-08-12  9:28                 ` Andrew Cooper
  2013-08-12 10:05                   ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-08-12  9:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Thimo E., Keir Fraser

[-- Attachment #1: Type: text/plain, Size: 12210 bytes --]

On 12/08/13 09:20, Jan Beulich wrote:
>>>> On 09.08.13 at 23:27, "Thimo E." <abc@digithi.de> wrote:
>> (XEN) **Pending EOI error
>> (XEN)   irq 29, vector 0x24
>> (XEN)   s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>> (XEN) All LAPIC state:
>> (XEN) [vector]      ISR      TMR      IRR
>> (XEN) [1f:00] 00000000 00000000 00000000
>> (XEN) [3f:20] 00000010 76efa12e 00000000
>> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
>> (XEN) [7f:60] 00000000 32d096ca 00000000
>> (XEN) [9f:80] 00000000 78fcf87a 00000000
>> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
>> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
>> (XEN) [ff:e0] 00000000 00000000 00000000
>> (XEN) Peoi stack trace records:
> Mind providing (a link to) the patch that was used here, so that
> one can make sense of the printed information (and perhaps
> also suggest adjustments to that debugging code)? Nothing I
> was able to find on the list fully matches the output above...
>
> Jan

Attached

~Andrew
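
The attached debug patch keeps the last NR_PEOI_RECORDS (32) PEOI operations per CPU in a ring: a free-running index is incremented on every record and masked with `NR_PEOI_RECORDS-1`, which only works because 32 is a power of two, and the dump walks backwards from the newest entry. A stripped-down, single-CPU sketch of that scheme follows; the names are hypothetical and the per-CPU storage is replaced by a plain struct.

```c
#include <assert.h>

/* Must be a power of two, as in the patch (NR_PEOI_RECORDS = 32). */
#define NR_RECORDS 32u
_Static_assert((NR_RECORDS & (NR_RECORDS - 1)) == 0,
               "NR_RECORDS must be a power of two");

enum peoi_action { ACT_PUSH, ACT_SETREADY, ACT_FLUSH, ACT_POP };

struct record {
    enum peoi_action action;
    unsigned int sp, irq, vector;
};

struct trace {
    struct record rec[NR_RECORDS];
    unsigned int idx;      /* free-running count; low bits pick the slot */
};

/* Append, overwriting the oldest slot once the ring is full. */
static void trace_add(struct trace *t, struct record r)
{
    t->rec[t->idx++ & (NR_RECORDS - 1)] = r;
}

/* age 1 is the newest record, age NR_RECORDS the oldest still held; this
 * mirrors the patch's (idx - i) & (NR_PEOI_RECORDS-1) walk. Because
 * NR_RECORDS divides 2^32, unsigned wrap-around of idx is harmless. */
static const struct record *trace_get(const struct trace *t, unsigned int age)
{
    assert(age >= 1 && age <= NR_RECORDS);
    return &t->rec[(t->idx - age) & (NR_RECORDS - 1)];
}
```

Since the dump prints newest first, the first "Pushed {sp 0, irq 29, vec 0x24}" line in the trace is the most recent operation. Also note that in the patch, slots not yet written are zero-initialised, and action 0 is PEOI_PUSH, so until 32 records exist the older entries would print as "Pushed {sp 0, irq 0, vec 0x00}".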


[-- Attachment #2: ca-107844-debug.patch --]
[-- Type: text/x-patch, Size: 5875 bytes --]

# HG changeset patch
# Parent bbd6b6d05c06f6331974467467cf567d60915b3d

diff -r bbd6b6d05c06 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1176,7 +1176,7 @@ static inline void UNEXPECTED_IO_APIC(vo
 {
 }
 
-static void /*__init*/ __print_IO_APIC(void)
+void /*__init*/ __print_IO_APIC(void)
 {
     int apic, i;
     union IO_APIC_reg_00 reg_00;
diff -r bbd6b6d05c06 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1003,6 +1003,46 @@ static void irq_guest_eoi_timer_fn(void 
     spin_unlock_irqrestore(&desc->lock, flags);
 }
 
+struct peoi_record {
+    enum { PEOI_PUSH,
+           PEOI_SETREADY,
+           PEOI_FLUSH,
+           PEOI_POP } action;
+    unsigned sp, irq, vector;
+};
+
+static void print_peoi_record(const struct peoi_record *r)
+{
+    switch ( r->action )
+    {
+    case PEOI_PUSH:
+        printk("  Pushed {sp %d, irq %d, vec 0x%02x}\n",
+               r->sp, r->irq, r->vector);
+        break;
+    case PEOI_SETREADY:
+        printk("  Marked {sp %d, irq %d, vec 0x%02x} ready\n",
+               r->sp, r->irq, r->vector);
+        break;
+    case PEOI_FLUSH:
+        printk("  Flushed %d -> 0 \n", r->sp);
+        break;
+    case PEOI_POP:
+        printk("  Poped entry {sp %d, irq %d, vec 0x%02x}\n",
+               r->sp, r->irq, r->vector);
+        break;
+    default:
+        printk("  Unknown: {%d, %d, %d, 0x%02x}\n",
+               r->action, r->sp, r->irq, r->vector);
+        break;
+    }
+}
+
+#define NR_PEOI_RECORDS 32
+static DEFINE_PER_CPU(struct peoi_record, _peoi_dbg[NR_PEOI_RECORDS]) = {{0}};
+static DEFINE_PER_CPU(unsigned int, _peoi_dbg_idx) = 0;
+
+static void dump_irqs(unsigned char key);
+void __print_IO_APIC(void);
 static void __do_IRQ_guest(int irq)
 {
     struct irq_desc         *desc = irq_to_desc(irq);
@@ -1024,13 +1064,53 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
-        ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
+        if ( unlikely( !((sp == 0) || (peoi[sp-1].vector < vector)) ))
+        {
+            int p;
+            unsigned i, idx;
+            printk("**Pending EOI error\n");
+            printk("  irq %d, vector 0x%x\n", irq, vector);
+
+            for ( p = sp-1; p >= 0; --p )
+            {
+                printk("  s[%d] irq %d, vec 0x%x, ready %u, "
+                       "ISR %08"PRIx32", TMR %08"PRIx32", IRR %08"PRIx32"\n",
+                       p, peoi[p].irq, peoi[p].vector, peoi[p].ready,
+                       apic_isr_read(peoi[p].vector),
+                       apic_tmr_read(peoi[p].vector),
+                       apic_irr_read(peoi[p].vector) );
+            }
+
+            printk("All LAPIC state:\n");
+            printk("[vector] %8s %8s %8s\n", "ISR", "TMR", "IRR");
+            for ( i = 0; i < APIC_ISR_NR; ++i )
+                printk("[%02x:%02x] %08"PRIx32" %08"PRIx32" %08"PRIx32"\n",
+                       (i * 32)+31, i*32,
+                       apic_read(APIC_ISR + i*0x10),
+                       apic_read(APIC_TMR + i*0x10),
+                       apic_read(APIC_IRR + i*0x10) );
+
+            printk("Peoi stack trace records:\n");
+            idx = this_cpu(_peoi_dbg_idx);
+            for ( i = 1; i <= NR_PEOI_RECORDS; ++i )
+                print_peoi_record(&this_cpu(_peoi_dbg)[(idx - i) &
+                                                       (NR_PEOI_RECORDS-1)] );
+
+            spin_unlock(&desc->lock);
+            dump_irqs('i');
+            __print_IO_APIC();
+
+            panic("CA-107844");
+        }
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
         peoi[sp].irq = irq;
         peoi[sp].vector = vector;
         peoi[sp].ready = 0;
         pending_eoi_sp(peoi) = sp+1;
         cpu_set(smp_processor_id(), action->cpu_eoi_map);
+
+        this_cpu(_peoi_dbg)[(this_cpu(_peoi_dbg_idx)++) & (NR_PEOI_RECORDS-1)]
+            = (struct peoi_record){PEOI_PUSH, sp, irq, peoi[sp].vector};
     }
 
     for ( i = 0; i < action->nr_guests; i++ )
@@ -1130,6 +1210,9 @@ static void flush_ready_eoi(void)
         spin_lock(&desc->lock);
         desc->handler->end(irq, peoi[sp].vector);
         spin_unlock(&desc->lock);
+
+        this_cpu(_peoi_dbg)[(this_cpu(_peoi_dbg_idx)++) & (NR_PEOI_RECORDS-1)]
+            = (struct peoi_record){PEOI_POP, sp+1, irq, peoi[sp].vector};
     }
 
     pending_eoi_sp(peoi) = sp+1;
@@ -1155,6 +1238,9 @@ static void __set_eoi_ready(struct irq_d
     } while ( peoi[--sp].irq != irq );
     ASSERT(!peoi[sp].ready);
     peoi[sp].ready = 1;
+
+    this_cpu(_peoi_dbg)[(this_cpu(_peoi_dbg_idx)++) & (NR_PEOI_RECORDS-1)]
+        = (struct peoi_record){PEOI_SETREADY, sp, irq, desc->chip_data->vector};
 }
 
 /* Mark specified IRQ as ready-for-EOI (if it really is) and attempt to EOI. */
@@ -1976,6 +2062,8 @@ void fixup_irqs(void)
 
     /* Flush the interrupt EOI stack. */
     peoi = this_cpu(pending_eoi);
+    this_cpu(_peoi_dbg)[(this_cpu(_peoi_dbg_idx)++) & (NR_PEOI_RECORDS-1)]
+        = (struct peoi_record){PEOI_FLUSH, pending_eoi_sp(peoi)};
     for ( sp = 0; sp < pending_eoi_sp(peoi); sp++ )
         peoi[sp].ready = 1;
     flush_ready_eoi();
diff -r bbd6b6d05c06 xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -152,6 +152,18 @@ static __inline bool_t apic_isr_read(u8 
             (vector & 0x1f)) & 1;
 }
 
+static __inline bool_t apic_tmr_read(u8 vector)
+{
+    return (apic_read(APIC_TMR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
+static __inline bool_t apic_irr_read(u8 vector)
+{
+    return (apic_read(APIC_IRR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
 static __inline u32 get_apic_id(void) /* Get the physical APIC id */
 {
     u32 id = apic_read(APIC_ID);
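For readers following the debug patch above: it logs every push, set-ready and pop on the per-CPU pending-EOI stack into a small per-CPU ring indexed as idx++ & (NR_PEOI_RECORDS-1), which assumes NR_PEOI_RECORDS is a power of two. The sketch below is a simplified, hypothetical model of that stack and ring. The struct layouts, sizes and the peoi_push/peoi_set_ready/peoi_flush_ready names are illustrative assumptions, not Xen's code. It mirrors the loops in the patched functions and the ordering invariant behind the '(sp == 0) || (peoi[sp-1].vector < vector)' assertion: vectors must strictly increase towards the top of the stack.

```c
#include <assert.h>

/* Hypothetical, simplified model of the per-CPU pending-EOI stack and of
 * the debug patch's trace ring; names and sizes are illustrative only. */
#define NR_PEOI_RECORDS 16   /* power of two, so "& (N-1)" wraps the index */
#define PEOI_STACK_SIZE 8

enum peoi_op { PEOI_PUSH, PEOI_POP, PEOI_SETREADY };

struct peoi_record { enum peoi_op op; int sp; int irq; unsigned vector; };
struct pending_eoi { int irq; unsigned vector; int ready; };

static struct pending_eoi peoi[PEOI_STACK_SIZE];
static int peoi_sp;                      /* number of entries on the stack */
static struct peoi_record dbg[NR_PEOI_RECORDS];
static unsigned dbg_idx;                 /* free-running record counter */

/* Keep only the newest NR_PEOI_RECORDS records, overwriting the oldest. */
static void trace(enum peoi_op op, int sp, int irq, unsigned vector)
{
    dbg[dbg_idx++ & (NR_PEOI_RECORDS - 1)] =
        (struct peoi_record){ op, sp, irq, vector };
}

/* Push a not-yet-EOIed interrupt. Vectors must strictly increase towards
 * the top of the stack; returns 0 where Xen's ASSERT() would fire. */
static int peoi_push(int irq, unsigned vector)
{
    int sp = peoi_sp;

    if (sp == PEOI_STACK_SIZE ||
        (sp != 0 && !(peoi[sp - 1].vector < vector)))
        return 0;
    peoi[sp] = (struct pending_eoi){ irq, vector, 0 };
    trace(PEOI_PUSH, sp, irq, vector);
    peoi_sp = sp + 1;
    return 1;
}

/* Mark the topmost entry for `irq` as ready (it must be on the stack). */
static void peoi_set_ready(int irq)
{
    int sp = peoi_sp;

    do {
        assert(sp > 0);
    } while (peoi[--sp].irq != irq);
    peoi[sp].ready = 1;
    trace(PEOI_SETREADY, sp, irq, peoi[sp].vector);
}

/* EOI ready entries from the top down, stopping at the first entry that
 * is still waiting for its guest. */
static void peoi_flush_ready(void)
{
    int sp = peoi_sp;

    while (--sp >= 0 && peoi[sp].ready)
        trace(PEOI_POP, sp + 1, peoi[sp].irq, peoi[sp].vector);
    peoi_sp = sp + 1;
}
```

In this model, pushing vector 0x24 a second time while the first instance is still on the stack is refused, which is the state the original assertion caught; the ring keeps the preceding push/set-ready/pop history, the kind of record the patch's "Peoi stack trace records" output is meant to preserve.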

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12  9:28                 ` Andrew Cooper
@ 2013-08-12 10:05                   ` Jan Beulich
  2013-08-12 10:27                     ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2013-08-12 10:05 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Thimo E., Keir Fraser

>>> On 12.08.13 at 11:28, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 12/08/13 09:20, Jan Beulich wrote:
>>>>> On 09.08.13 at 23:27, "Thimo E." <abc@digithi.de> wrote:
>>> (XEN) **Pending EOI error
>>> (XEN)   irq 29, vector 0x24
>>> (XEN)   s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>> (XEN) All LAPIC state:
>>> (XEN) [vector]      ISR      TMR      IRR
>>> (XEN) [1f:00] 00000000 00000000 00000000
>>> (XEN) [3f:20] 00000010 76efa12e 00000000
>>> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
>>> (XEN) [7f:60] 00000000 32d096ca 00000000
>>> (XEN) [9f:80] 00000000 78fcf87a 00000000
>>> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
>>> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
>>> (XEN) [ff:e0] 00000000 00000000 00000000
>>> (XEN) Peoi stack trace records:
>> Mind providing (a link to) the patch that was used here, so that
>> one can make sense of the printed information (and perhaps
>> also suggest adjustments to that debugging code)? Nothing I
>> was able to find on the list fully matches the output above...
> 
> Attached

Thanks. Actually, the second case he sent has an interesting
difference:

(XEN)   s[0] irq 29, vec 0x26, ready 0, ISR 00000001, TMR 00000000, IRR 00000001

i.e. we in fact have _three_ instances of the interrupt (two in service
and one pending request). I don't see an explanation for this other than
buggy hardware. Sadly we still don't know which device is behaving
that way (including confirmation that it's a non-maskable MSI one).

Jan
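When reading the s[0] lines and the "All LAPIC state" table above, it helps to know how the patch's apic_isr_read()/apic_tmr_read()/apic_irr_read() helpers find a vector: each bank is eight 32-bit registers spaced 0x10 apart, so the register sits at base + ((vector & ~0x1f) >> 1) and the bit is vector & 0x1f. A standalone sketch of that arithmetic follows, with no real APIC access; the helper names are hypothetical, and it assumes each dump column prints the raw 32-bit register in hex.

```c
#include <stdint.h>

/* Register offset (relative to the ISR, TMR or IRR base) holding `vector`:
 * one 32-bit register per 32 vectors, spaced 0x10 bytes apart, so
 * (vector / 32) * 0x10 == (vector & ~0x1f) >> 1, as in the patch. */
static unsigned lapic_bank_offset(uint8_t vector)
{
    return (vector & ~0x1fu) >> 1;
}

/* Bit position of `vector` inside that 32-bit register. */
static unsigned lapic_bank_bit(uint8_t vector)
{
    return vector & 0x1fu;
}

/* Is `vector` set in one row of the "All LAPIC state" dump?  row_base is
 * the lowest vector covered by the row, e.g. 0x20 for "[3f:20]". */
static int dump_row_has_vector(uint32_t row_value, uint8_t row_base,
                               uint8_t vector)
{
    if ((vector & ~0x1fu) != row_base)
        return 0;               /* vector belongs to a different row */
    return (int)((row_value >> lapic_bank_bit(vector)) & 1u);
}
```

Applied to the first dump, the single ISR bit in the [3f:20] row (00000010) is bit 4, i.e. vector 0x24, matching the s[0] line's ISR 00000001; bit 4 of that row's TMR value (76efa12e) is clear, matching TMR 00000000.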


* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12 10:05                   ` Jan Beulich
@ 2013-08-12 10:27                     ` Andrew Cooper
  2013-08-14  2:53                       ` Zhang, Yang Z
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-08-12 10:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Thimo E., Keir Fraser

On 12/08/13 11:05, Jan Beulich wrote:
>>>> On 12.08.13 at 11:28, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 12/08/13 09:20, Jan Beulich wrote:
>>>>>> On 09.08.13 at 23:27, "Thimo E." <abc@digithi.de> wrote:
>>>> (XEN) **Pending EOI error
>>>> (XEN)   irq 29, vector 0x24
>>>> (XEN)   s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>>> (XEN) All LAPIC state:
>>>> (XEN) [vector]      ISR      TMR      IRR
>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>> (XEN) [3f:20] 00000010 76efa12e 00000000
>>>> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
>>>> (XEN) [7f:60] 00000000 32d096ca 00000000
>>>> (XEN) [9f:80] 00000000 78fcf87a 00000000
>>>> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
>>>> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
>>>> (XEN) [ff:e0] 00000000 00000000 00000000
>>>> (XEN) Peoi stack trace records:
>>> Mind providing (a link to) the patch that was used here, so that
>>> one can make sense of the printed information (and perhaps
>>> also suggest adjustments to that debugging code)? Nothing I
>>> was able to find on the list fully matches the output above...
>> Attached
> Thanks. Actually, the second case he sent has an interesting
> difference:
>
> (XEN)   s[0] irq 29, vec 0x26, ready 0, ISR 00000001, TMR 00000000, IRR 00000001
>
> i.e. we in fact have _three_ instances of the interrupt (two in service
> and one pending request). I don't see an explanation for this other than
> buggy hardware. Sadly we still don't know which device is behaving
> that way (including confirmation that it's a non-maskable MSI one).
>
> Jan
>

On the XenServer hardware where we have seen this issue, the problematic
interrupt was from:

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 02)
	Subsystem: Intel Corporation Device 0000
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 1275
	Region 0: Memory at c2700000 (32-bit, non-prefetchable) [size=128K]
	Region 1: Memory at c273e000 (32-bit, non-prefetchable) [size=4K]
	Region 2: I/O ports at 7080 [size=32]
	Capabilities: [c8] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee00318 Data: 0000
	Capabilities: [e0] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: e1000e
	Kernel modules: e1000e

I am still attempting to reproduce the issue, but we haven’t seen it
again since my email at the root of this thread.
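On the MSI capability above: if the Address value is read with the Intel VT-d interrupt-remapping layout (an assumption here, based on the VT-d specification's MSI address formats), bit 4 selects remappable (1) or compatibility (0) format. In remappable format, bits 19:5 plus bit 2 hold a 16-bit interrupt handle and bit 3 is the subhandle-valid (SHV) flag; in compatibility format, bits 19:12 are the destination APIC ID. A minimal decoding sketch (the struct and function names are hypothetical):

```c
#include <stdint.h>

/* Decode the low 32 bits of an MSI address, assuming the Intel VT-d
 * layout (bits 31:20 are always 0xFEE):
 *   bit 4      : interrupt format (1 = remappable, 0 = compatibility)
 *   remappable : bits 19:5 = handle[14:0], bit 2 = handle[15],
 *                bit 3 = SHV (subhandle valid)
 *   compat     : bits 19:12 = destination APIC ID */
struct msi_addr_info {
    int remappable;
    unsigned handle;    /* meaningful only when remappable */
    int shv;            /* meaningful only when remappable */
    unsigned dest_id;   /* meaningful only when !remappable */
};

static struct msi_addr_info decode_msi_addr(uint32_t addr)
{
    struct msi_addr_info info = {0};

    info.remappable = (addr >> 4) & 1;
    if (info.remappable) {
        info.handle = ((addr >> 5) & 0x7fffu) | (((addr >> 2) & 1u) << 15);
        info.shv = (addr >> 3) & 1;
    } else {
        info.dest_id = (addr >> 12) & 0xffu;
    }
    return info;
}
```

Under this reading, decode_msi_addr(0xfee00318) gives remappable = 1, handle = 24 and SHV = 1, i.e. the e1000e MSI would go through an interrupt-remapping table entry rather than naming a destination APIC directly. This is an interpretation of the lspci value, not something established in the thread.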

~Andrew


* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12  8:49                     ` Zhang, Yang Z
  2013-08-12  8:57                       ` Jan Beulich
@ 2013-08-12 11:52                       ` Thimo E
  2013-08-12 12:04                         ` Andrew Cooper
  2013-08-12 13:54                       ` Thimo E
  2 siblings, 1 reply; 63+ messages in thread
From: Thimo E @ 2013-08-12 11:52 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao


[-- Attachment #1.1: Type: text/plain, Size: 2618 bytes --]

Hello Yang,

Attached you'll find the kernel dmesg, Xen dmesg, lspci and the output of
/proc/interrupts. If you want to see further log files, please let me know.

The processor is a Core i5-4670. The board is an Intel DH87MC
mainboard. I am really not sure whether it supports APICv, but VT-d is
supported and enabled.


> 4. The status of IRQ 29 is 10, which means the guest has already issued
> the EOI, because the IRQ_GUEST_EOI_PENDING bit is clear; so there should
> be no pending EOI on the EOI stack. If possible, can you add some
> debug messages in the guest EOI code path (like _irq_guest_eoi()) to
> track the EOI?
>
I don't see IRQ 29 in /proc/interrupts; what I see is:

cat xen-dmesg.txt | grep "29":
(XEN) allocated vector 29 for irq 20

cat dmesg.txt | grep "eth0":
[   23.152355] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[   23.330408] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection

So is the Ethernet IRQ the bad one? It is an onboard Intel network
adapter.

> 6. I guess interrupt remapping is enabled on your machine. Can you
> try disabling IR to see whether it is still reproducible?
>
Just to be sure: your proposal is to try the "no-intremap" parameter?

Best regards
   Thimo

On 12.08.2013 10:49, Zhang, Yang Z wrote:
>
> Hi Thimo,
>
> From your previous experience and logs, it appears that:
>
> 1. The interrupt that triggers the issue is an MSI.
>
> 2. MSIs are normally treated as edge-triggered interrupts, except when
> there is no way to mask the device. In this case, your previous log
> indicates the device is unmaskable (what special device are you
> using? Modern PCI devices should be maskable).
>
> 3. IRQ 29 belongs to dom0, so it seems this is not an HVM-related issue.
>
> 4. The status of IRQ 29 is 10, which means the guest has already issued
> the EOI, because the IRQ_GUEST_EOI_PENDING bit is clear; so there should
> be no pending EOI on the EOI stack. If possible, can you add some
> debug messages in the guest EOI code path (like _irq_guest_eoi()) to
> track the EOI?
>
> 5. Both logs show that when the issue occurred, most of the other
> interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is this
> a coincidence, or does it happen only under special conditions such as
> heavy IRQ migration? Perhaps you can disable IRQ balancing in dom0 and
> pin the IRQ manually.
>
> 6. I guess interrupt remapping is enabled on your machine. Can you
> try disabling IR to see whether it is still reproducible?
>
> Also, please provide the whole Xen log.
>
> Best regards,
>
> Yang
>


[-- Attachment #1.2: Type: text/html, Size: 15679 bytes --]

[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 37951 bytes --]

[root@localhost ~]# dmesg
[    0.000000] Reserving virtual address space above 0xfdc00000
[    0.000000] Linux version 2.6.32.43-0.4.1.xs1.8.0.835.170778xen (geeko@buildhost) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-51)) #1 SMP Wed May 29 18:06:30 EDT 2013
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   NSC Geode by NSC
[    0.000000]   Cyrix CyrixInstead
[    0.000000]   Centaur CentaurHauls
[    0.000000]   Transmeta GenuineTMx86
[    0.000000]   Transmeta TransmetaCPU
[    3.886977] Xen-provided machine memory map:
[    3.886979]  BIOS: 0000000000000000 - 000000000009d800 (usable)
[    3.886981]  BIOS: 000000000009d800 - 00000000000a0000 (reserved)
[    3.886982]  BIOS: 00000000000e0000 - 0000000000100000 (reserved)
[    3.886984]  BIOS: 0000000000100000 - 00000000b857f000 (usable)
[    3.886985]  BIOS: 00000000b857f000 - 00000000b8586000 (ACPI NVS)
[    3.886986]  BIOS: 00000000b8586000 - 00000000cb8dc000 (usable)
[    3.886987]  BIOS: 00000000cb8dc000 - 00000000cbae4000 (reserved)
[    3.886989]  BIOS: 00000000cbae4000 - 00000000cbaf9000 (ACPI data)
[    3.886990]  BIOS: 00000000cbaf9000 - 00000000cbc66000 (ACPI NVS)
[    3.886991]  BIOS: 00000000cbc66000 - 00000000cbfff000 (reserved)
[    3.886992]  BIOS: 00000000cbfff000 - 00000000cc000000 (usable)
[    3.886994]  BIOS: 00000000d7000000 - 00000000df200000 (reserved)
[    3.886995]  BIOS: 00000000f8000000 - 00000000fc000000 (reserved)
[    3.886996]  BIOS: 00000000fec00000 - 00000000fec01000 (reserved)
[    3.886997]  BIOS: 00000000fed00000 - 00000000fed04000 (reserved)
[    3.886998]  BIOS: 00000000fed1c000 - 00000000fed20000 (reserved)
[    3.887000]  BIOS: 00000000fee00000 - 00000000fee01000 (reserved)
[    3.887001]  BIOS: 00000000ff000000 - 0000000100000000 (reserved)
[    3.887002]  BIOS: 0000000100000000 - 000000081fe00000 (usable)
[    3.887003] Xen-provided physical RAM map:
[    3.887004]  Xen: 0000000000000000 - 000000002f800000 (usable)
[    3.887021] DMI 2.7 present.
[    3.887331] last_pfn = 0x2f800 max_arch_pfn = 0x10000000
[    3.887332] initial memory mapped : 0 - 00000000
[    3.887334] init_memory_mapping: 0000000000000000-000000002f800000
[    3.887337] NX (Execute Disable) protection: active
[    3.887339]  0000000000 - 002f800000 page 4k
[    3.887565] kernel direct mapping tables up to 2f800000 @ 1040000-11c1000
[    4.119725] RAMDISK: 00787000 - 00f70200
[    4.121445] ACPI: RSDP 000f0490 00024 (v02  INTEL)
[    4.121450] ACPI: XSDT cbae8078 00074 (v01 INTEL  DH87MC   0000002F AMI  00010013)
[    4.121455] ACPI: FACP cbaf3ef0 0010C (v05 INTEL  DH87MC   0000002F AMI  00010013)
[    4.121459] ACPI Warning: FADT (revision 5) is longer than ACPI 2.0 version, truncating length 0x10C to 0xF4 (20090903/tbfadt-288)
[    4.121463] ACPI: DSDT cbae8180 0BD6E (v02 INTEL  DH87MC   0000002F INTL 20091112)
[    4.121466] ACPI: FACS cbc64080 00040
[    4.121469] ACPI: APIC cbaf4000 00072 (v03 INTEL  DH87MC   0000002F AMI  00010013)
[    4.121472] ACPI: FPDT cbaf4078 00044 (v01 INTEL  DH87MC   0000002F AMI  00010013)
[    4.121475] ACPI: SSDT cbaf40c0 00539 (v01 INTEL  DH87MC   0000002F INTL 20051117)
[    4.121477] ACPI: SSDT cbaf4600 00AD8 (v01 INTEL  DH87MC   0000002F INTL 20051117)
[    4.121480] ACPI: MCFG cbaf50d8 0003C (v01 INTEL  DH87MC   0000002F MSFT 00000097)
[    4.121483] ACPI: HPET cbaf5118 00038 (v01 INTEL  DH87MC   0000002F AMI. 00000005)
[    4.121486] ACPI: SSDT cbaf5150 0036D (v01 INTEL  DH87MC   0000002F INTL 20091112)
[    4.121489] ACPI: SSDT cbaf54c0 02EDB (v01 INTEL  DH87MC   0000002F INTL 20091112)
[    4.121492] ACPI: XMAR cbaf83a0 000B8 (v01 INTEL  DH87MC   0000002F INTL 00000001)
[    4.121505] 0MB HIGHMEM available.
[    4.121506] 760MB LOWMEM available.
[    4.121507]   mapped low ram: 0 - 2f800000
[    4.121508]   low ram: 0 - 2f800000
[    4.121511]   node 0 low ram: 00000000 - 2f000000
[    4.121512]   node 0 bootmap 00000000 - 00005e00
[    4.122600] (5 early reservations) ==> bootmem [0000000000 - 002f000000]
[    4.122601]   #0 [0000100000 - 0000766414]    TEXT DATA BSS ==> [0000100000 - 0000766414]
[    4.122613]   #1 [0000787000 - 0001040000]     Xen provided ==> [0000787000 - 0001040000]
[    4.122627]   #2 [0000767000 - 00007671c5]              BRK ==> [0000767000 - 00007671c5]
[    4.122628]   #3 [0001040000 - 00011b4000]          PGTABLE ==> [0001040000 - 00011b4000]
[    4.122632]   #4 [0000000000 - 0000006000]          BOOTMAP ==> [0000000000 - 0000006000]
[    4.122640] found SMP MP-table at [fdbf8740] 000fd740
[    4.126292] Zone PFN ranges:
[    4.126293]   DMA      0x00000000 -> 0x00001000
[    4.126294]   Normal   0x00001000 -> 0x0002f800
[    4.126295]   HighMem  0x0002f800 -> 0x0002f800
[    4.126296] Movable zone start PFN for each node
[    4.126297] early_node_map[2] active PFN ranges
[    4.126298]     0: 0x00000000 -> 0x0002f000
[    4.126298]     0: 0x0002f800 -> 0x0002f800
[    4.126300] On node 0 totalpages: 192512
[    4.126565] free_area_init_node: node 0, pgdat c05cee80, node_mem_map c11b6000
[    4.126567]   DMA zone: 32 pages used for memmap
[    4.126568]   DMA zone: 0 pages reserved
[    4.126569]   DMA zone: 4064 pages, LIFO batch:0
[    4.126617]   Normal zone: 1488 pages used for memmap
[    4.126619]   Normal zone: 186928 pages, LIFO batch:31
[    4.128804] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[    4.128805] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
[    4.128806] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
[    4.128807] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
[    4.128810] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
[    4.128813] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    4.128818] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
[    4.128821] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    4.128822] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    4.128824] ACPI: IRQ0 used by override.
[    4.128825] ACPI: IRQ2 used by override.
[    4.128826] ACPI: IRQ9 used by override.
[    4.128830] Using ACPI (MADT) for SMP configuration information
[    4.128845] Allocating PCI resources starting at df200000 (gap: df200000:18e00000)
[    4.128847] NR_CPUS:32 nr_cpumask_bits:32 nr_cpu_ids:4 nr_node_ids:1
[    4.128866] PERCPU: Embedded 10 pages/cpu @c1868000 s18456 r0 d22504 u65536
[    4.128869] pcpu-alloc: s18456 r0 d22504 u65536 alloc=16*4096
[    4.128870] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
[    4.128881] Swapping MFNs for PFN 639 and 186a (MFN b9639 and 808795)
[    4.128894] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 190992
[    4.128896] Kernel command line: root=LABEL=root-logtaqlb ro console=tty0 xencons=hvc console=hvc0
[    4.128913] PID hash table entries: 4096 (order: 2, 16384 bytes)
[    4.128937] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    4.129036] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    4.129217] Enabling fast FPU save and restore... done.
[    4.129220] Enabling unmasked SIMD FPU exception support... done.
[    4.129223] Initializing CPU#0
[    4.150637] Software IO TLB enabled:
[    4.150638]  Aperture:     64 megabytes
[    4.150638]  Address size: 28 bits
[    4.150638]  Kernel range: c1962000 - c5962000
[    4.150639] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    4.150641] PCI-DMA: mask is set to 37 bits
[    4.153027] Initializing HighMem for node 0 (00000000:00000000)
[    4.153029] Memory: 679608k/778240k available (2935k kernel code, 90012k reserved, 2030k data, 392k init, 0k highmem)
[    4.153032] virtual kernel memory layout:
[    4.153033]     fixmap  : 0xfd877000 - 0xfdbff000   (3616 kB)
[    4.153033]     pkmap   : 0xfd400000 - 0xfd600000   (2048 kB)
[    4.153034]     vmalloc : 0xf0000000 - 0xfd3fe000   ( 211 MB)
[    4.153034]     lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
[    4.153035]       .init : 0xc05da000 - 0xc063c000   ( 392 kB)
[    4.153035]       .data : 0xc03ddc79 - 0xc05d95fc   (2030 kB)
[    4.153036]       .text : 0xc0100000 - 0xc03ddc79   (2935 kB)
[    4.153037] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[    4.153083] Hierarchical RCU implementation.
[    4.153086] NR_IRQS:5376
[    4.153564] Extended CMOS year: 2000
[    4.153566] Xen reported: 3392.246 MHz processor.
[    4.155016] Console: colour VGA+ 80x25
[    4.163658] console [tty0] enabled
[    4.173441] console [hvc0] enabled
[    4.298919] Calibrating delay using timer specific routine.. 6819.24 BogoMIPS (lpj=34096222)
[    4.299060] pid_max: default: 32768 minimum: 301
[    4.299155] Mount-cache hash table entries: 512
[    4.299292] mce: CPU supports 9 MCE banks
[    4.299383] Checking 'hlt' instruction... OK.
[    4.299798] SMP alternatives: switching to UP code
[    4.314046] ACPI: Core revision 20090903
[    4.328131] ftrace: converting mcount calls to 0f 1f 44 00 00
[    4.328205] ftrace: allocating 13078 entries in 26 pages
[    4.331244] SMP alternatives: switching to SMP code
[    4.345359] Initializing CPU#1
[    4.345402] Initializing CPU#2
[    4.345438] Brought up 4 CPUs
[    4.345441] Initializing CPU#3
[    4.427837] Grant table version 2
[    4.427949] NET: Registered protocol family 16
[    4.428286] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[    4.428376] ACPI: bus type pci registered
[    4.428469] PCI: MCFG configuration 0: base f8000000 segment 0 buses 0 - 63
[    4.428545] PCI: MCFG area at f8000000 reserved in E820
[    4.428615] PCI: Using MMCONFIG for extended config space
[    4.428687] PCI: Using configuration type 1 for base access
[    4.430194] bio: create slab <bio-0> at 0
[    4.431141] ACPI: EC: Look up EC in DSDT
[    4.433028] ACPI: Executed 1 blocks of module-level executable AML code
[    4.434934] ACPI: BIOS _OSI(Linux) query ignored
[    4.436410] ACPI: Interpreter enabled
[    4.436479] ACPI: (supports S0 S5)
[    4.436652] ACPI: Using IOAPIC for interrupt routing
[    4.443922] ACPI: Power Resource [FN00] (off)
[    4.444001] ACPI: Power Resource [FN01] (off)
[    4.444079] ACPI: Power Resource [FN02] (off)
[    4.444156] ACPI: Power Resource [FN03] (off)
[    4.444233] ACPI: Power Resource [FN04] (off)
[    4.444700] ACPI: No dock devices found.
[    4.444773] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    4.445207] ACPI: PCI Root Bridge [PCI0] (0000:00)
[    4.445955] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
[    4.446030] pci 0000:00:01.0: PME# disabled
[    4.446156] pci 0000:00:02.0: reg 10 64bit mmio: [0xf7800000-0xf7bfffff]
[    4.446166] pci 0000:00:02.0: reg 18 64bit mmio pref: [0xe0000000-0xefffffff]
[    4.446171] pci 0000:00:02.0: reg 20 io port: [0xf000-0xf03f]
[    4.446283] pci 0000:00:14.0: reg 10 64bit mmio: [0xf7e20000-0xf7e2ffff]
[    4.446338] pci 0000:00:14.0: PME# supported from D3hot D3cold
[    4.446416] pci 0000:00:14.0: PME# disabled
[    4.446584] pci 0000:00:16.0: reg 10 64bit mmio: [0xf7e3b000-0xf7e3b00f]
[    4.446643] pci 0000:00:16.0: PME# supported from D0 D3hot D3cold
[    4.446718] pci 0000:00:16.0: PME# disabled
[    4.446857] pci 0000:00:19.0: reg 10 32bit mmio: [0xf7e00000-0xf7e1ffff]
[    4.446864] pci 0000:00:19.0: reg 14 32bit mmio: [0xf7e39000-0xf7e39fff]
[    4.446871] pci 0000:00:19.0: reg 18 io port: [0xf080-0xf09f]
[    4.446927] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
[    4.447002] pci 0000:00:19.0: PME# disabled
[    4.447136] pci 0000:00:1a.0: reg 10 32bit mmio: [0xf7e38000-0xf7e383ff]
[    4.447212] pci 0000:00:1a.0: PME# supported from D0 D3hot D3cold
[    4.447289] pci 0000:00:1a.0: PME# disabled
[    4.447413] pci 0000:00:1b.0: reg 10 64bit mmio: [0xf7e30000-0xf7e33fff]
[    4.447473] pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
[    4.447548] pci 0000:00:1b.0: PME# disabled
[    4.447709] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
[    4.447784] pci 0000:00:1c.0: PME# disabled
[    4.447944] pci 0000:00:1c.2: PME# supported from D0 D3hot D3cold
[    4.448019] pci 0000:00:1c.2: PME# disabled
[    4.448164] pci 0000:00:1d.0: reg 10 32bit mmio: [0xf7e37000-0xf7e373ff]
[    4.448239] pci 0000:00:1d.0: PME# supported from D0 D3hot D3cold
[    4.448315] pci 0000:00:1d.0: PME# disabled
[    4.448562] pci 0000:00:1f.2: reg 10 io port: [0xf0d0-0xf0d7]
[    4.448569] pci 0000:00:1f.2: reg 14 io port: [0xf0c0-0xf0c3]
[    4.448575] pci 0000:00:1f.2: reg 18 io port: [0xf0b0-0xf0b7]
[    4.448582] pci 0000:00:1f.2: reg 1c io port: [0xf0a0-0xf0a3]
[    4.448588] pci 0000:00:1f.2: reg 20 io port: [0xf060-0xf07f]
[    4.448595] pci 0000:00:1f.2: reg 24 32bit mmio: [0xf7e36000-0xf7e367ff]
[    4.448674] pci 0000:00:1f.2: PME# supported from D3hot
[    4.448748] pci 0000:00:1f.2: PME# disabled
[    4.448851] pci 0000:00:1f.3: reg 10 64bit mmio: [0xf7e35000-0xf7e350ff]
[    4.448870] pci 0000:00:1f.3: reg 20 io port: [0xf040-0xf05f]
[    4.448937] pci 0000:01:00.0: reg 10 32bit mmio: [0xf7d10000-0xf7d11fff]
[    4.448973] pci 0000:01:00.0: reg 30 32bit mmio pref: [0xf7d00000-0xf7d0ffff]
[    4.449004] pci 0000:01:00.0: supports D1
[    4.449047] pci 0000:00:01.0: PCI bridge to [bus 01-01]
[    4.449122] pci 0000:00:01.0: bridge 32bit mmio: [0xf7d00000-0xf7dfffff]
[    4.449284] pci 0000:02:00.0: supports D1 D2
[    4.449285] pci 0000:02:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    4.449363] pci 0000:02:00.0: PME# disabled
[    4.449490] pci 0000:00:1c.0: PCI bridge to [bus 02-03]
[    4.449564] pci 0000:00:1c.0: bridge io port: [0xe000-0xefff]
[    4.449627] pci 0000:03:02.0: reg 10 io port: [0xe050-0xe057]
[    4.449639] pci 0000:03:02.0: reg 14 io port: [0xe040-0xe047]
[    4.449652] pci 0000:03:02.0: reg 18 io port: [0xe030-0xe037]
[    4.449664] pci 0000:03:02.0: reg 1c io port: [0xe020-0xe027]
[    4.449676] pci 0000:03:02.0: reg 20 io port: [0xe010-0xe017]
[    4.449689] pci 0000:03:02.0: reg 24 io port: [0xe000-0xe00f]
[    4.449807] pci 0000:02:00.0: PCI bridge to [bus 03-03] (subtractive decode)
[    4.449891] pci 0000:02:00.0: bridge io port: [0xe000-0xefff]
[    4.449908] pci 0000:02:00.0:   bridge window [0xe000-0xefff] (subtractive decode)
[    4.449909] pci 0000:02:00.0:   bridge window [0x0-0x0] (subtractive decode)
[    4.449910] pci 0000:02:00.0:   bridge window [0x0-0x0] (subtractive decode)
[    4.449911] pci 0000:02:00.0:   bridge window [0x0-0x0] (subtractive decode)
[    4.450003] pci 0000:04:00.0: reg 10 32bit mmio: [0xf7cc0000-0xf7cdffff]
[    4.450016] pci 0000:04:00.0: reg 14 32bit mmio: [0xf7c00000-0xf7c7ffff]
[    4.450028] pci 0000:04:00.0: reg 18 io port: [0xd000-0xd01f]
[    4.450041] pci 0000:04:00.0: reg 1c 32bit mmio: [0xf7ce0000-0xf7ce3fff]
[    4.450076] pci 0000:04:00.0: reg 30 32bit mmio pref: [0xf7c80000-0xf7cbffff]
[    4.450141] pci 0000:04:00.0: PME# supported from D0 D3hot D3cold
[    4.450219] pci 0000:04:00.0: PME# disabled
[    4.450367] pci 0000:00:1c.2: PCI bridge to [bus 04-04]
[    4.450440] pci 0000:00:1c.2: bridge io port: [0xd000-0xdfff]
[    4.450444] pci 0000:00:1c.2: bridge 32bit mmio: [0xf7c00000-0xf7cfffff]
[    4.450468] pci_bus 0000:00: on NUMA node 0
[    4.450470] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[    4.450528] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.RP01._PRT]
[    4.450551] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.RP03._PRT]
[    4.450570] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT]
[    4.450597] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEG0._PRT]
[    4.454249] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
[    4.454900] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[    4.455702] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 10 *11 12 14 15)
[    4.456356] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 *10 11 12 14 15)
[    4.457005] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 10 *11 12 14 15)
[    4.457693] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[    4.458462] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 *10 11 12 14 15)
[    4.459110] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 *10 11 12 14 15)
[    4.459854] xen_mem: Initialising balloon driver.
[    4.460070] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    4.460163] vgaarb: loaded
[    4.460327] PCI: Using ACPI for IRQ routing
[    4.460566] Switching to clocksource xen
[    4.461400] pnp: PnP ACPI init
[    4.461475] ACPI: bus type pnp registered
[    4.463957] pnp: PnP ACPI: found 15 devices
[    4.464028] ACPI: ACPI bus type pnp unregistered
[    4.464104] system 00:01: iomem range 0xfed40000-0xfed44fff has been reserved
[    4.464182] system 00:06: ioport range 0x680-0x69f has been reserved
[    4.464256] system 00:06: ioport range 0xffff-0xffff has been reserved
[    4.464329] system 00:06: ioport range 0xffff-0xffff has been reserved
[    4.464401] system 00:06: ioport range 0xffff-0xffff has been reserved
[    4.464475] system 00:06: ioport range 0x1c00-0x1cfe has been reserved
[    4.465082] system 00:06: ioport range 0x1d00-0x1dfe has been reserved
[    4.465155] system 00:06: ioport range 0x1e00-0x1efe has been reserved
[    4.465229] system 00:06: ioport range 0x1f00-0x1ffe has been reserved
[    4.465303] system 00:06: ioport range 0x1800-0x18fe has been reserved
[    4.465376] system 00:06: ioport range 0x164e-0x164f has been reserved
[    4.465450] system 00:08: ioport range 0x1854-0x1857 has been reserved
[    4.465526] system 00:09: ioport range 0xa00-0xa1f has been reserved
[    4.465600] system 00:09: ioport range 0xa20-0xa3f has been reserved
[    4.465676] system 00:0d: ioport range 0x4d0-0x4d1 has been reserved
[    4.465750] system 00:0e: iomem range 0xfed1c000-0xfed1ffff has been reserved
[    4.465861] system 00:0e: iomem range 0xfed10000-0xfed17fff has been reserved
[    4.465935] system 00:0e: iomem range 0xfed18000-0xfed18fff has been reserved
[    4.466009] system 00:0e: iomem range 0xfed19000-0xfed19fff has been reserved
[    4.466084] system 00:0e: iomem range 0xf8000000-0xfbffffff has been reserved
[    4.466158] system 00:0e: iomem range 0xfed20000-0xfed3ffff has been reserved
[    4.466231] system 00:0e: iomem range 0xfed90000-0xfed93fff has been reserved
[    4.466306] system 00:0e: iomem range 0xfed45000-0xfed8ffff has been reserved
[    4.466380] system 00:0e: iomem range 0xff000000-0xffffffff has been reserved
[    4.466454] system 00:0e: iomem range 0xfee00000-0xfeefffff could not be reserved
[    4.466541] system 00:0e: iomem range 0xf7fef000-0xf7feffff has been reserved
[    4.466616] system 00:0e: iomem range 0xf7ff0000-0xf7ff0fff has been reserved
[    4.467142] pci 0000:00:01.0: PCI bridge, secondary bus 0000:01
[    4.467216] pci 0000:00:01.0:   IO window: disabled
[    4.467291] pci 0000:00:01.0:   MEM window: 0xf7d00000-0xf7dfffff
[    4.467365] pci 0000:00:01.0:   PREFETCH window: disabled
[    4.467440] pci 0000:02:00.0: PCI bridge, secondary bus 0000:03
[    4.467515] pci 0000:02:00.0:   IO window: 0xe000-0xefff
[    4.467595] pci 0000:02:00.0:   MEM window: disabled
[    4.467670] pci 0000:02:00.0:   PREFETCH window: disabled
[    4.467752] pci 0000:00:1c.0: PCI bridge, secondary bus 0000:02
[    4.467825] pci 0000:00:1c.0:   IO window: 0xe000-0xefff
[    4.467902] pci 0000:00:1c.0:   MEM window: disabled
[    4.468011] pci 0000:00:1c.0:   PREFETCH window: disabled
[    4.468088] pci 0000:00:1c.2: PCI bridge, secondary bus 0000:04
[    4.468162] pci 0000:00:1c.2:   IO window: 0xd000-0xdfff
[    4.468237] pci 0000:00:1c.2:   MEM window: 0xf7c00000-0xf7cfffff
[    4.468311] pci 0000:00:1c.2:   PREFETCH window: disabled
[    4.468413] pci 0000:00:01.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    4.468490] pci 0000:00:01.0: setting latency timer to 64
[    4.468499] pci 0000:00:1c.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    4.468575] pci 0000:00:1c.0: setting latency timer to 64
[    4.468588] pci 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    4.468667] pci 0000:02:00.0: setting latency timer to 64
[    4.468693] pci 0000:00:1c.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
[    4.468770] pci 0000:00:1c.2: setting latency timer to 64
[    4.468773] pci_bus 0000:00: resource 4 io:  [0x00-0xcf7]
[    4.468774] pci_bus 0000:00: resource 5 io:  [0xd00-0xffff]
[    4.468776] pci_bus 0000:00: resource 6 mem: [0x0a0000-0x0bffff]
[    4.468777] pci_bus 0000:00: resource 7 mem: [0x0d4000-0x0d7fff]
[    4.468778] pci_bus 0000:00: resource 8 mem: [0x0d8000-0x0dbfff]
[    4.468778] pci_bus 0000:00: resource 9 mem: [0x0dc000-0x0dffff]
[    4.468779] pci_bus 0000:00: resource 10 mem: [0x0e0000-0x0e3fff]
[    4.468780] pci_bus 0000:00: resource 11 mem: [0x0e4000-0x0e7fff]
[    4.468782] pci_bus 0000:00: resource 12 mem: [0xdf200000-0xfeafffff]
[    4.468783] pci_bus 0000:01: resource 1 mem: [0xf7d00000-0xf7dfffff]
[    4.468784] pci_bus 0000:02: resource 0 io:  [0xe000-0xefff]
[    4.468785] pci_bus 0000:03: resource 0 io:  [0xe000-0xefff]
[    4.468786] pci_bus 0000:03: resource 4 io:  [0xe000-0xefff]
[    4.468787] pci_bus 0000:04: resource 0 io:  [0xd000-0xdfff]
[    4.468788] pci_bus 0000:04: resource 1 mem: [0xf7c00000-0xf7cfffff]
[    4.468822] NET: Registered protocol family 2
[    4.468915] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[    4.469057] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[    4.469268] TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
[    4.469398] TCP: Hash tables configured (established 131072 bind 65536)
[    4.469473] TCP reno registered
[    4.469579] NET: Registered protocol family 1
[    4.469676] pci 0000:00:02.0: Boot video device
[    4.514610] Trying to unpack rootfs image as initramfs...
[    4.517847] Freeing initrd memory: 8100k freed
[    4.519616] MCE: bind virq for DOM0 logging
[    4.523285] VFS: Disk quotas dquot_6.5.2
[    4.523364] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    4.523496] msgmni has been set to 379
[    4.523730] alg: No test for stdrng (krng)
[    4.523840] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
[    4.523929] io scheduler noop registered
[    4.523999] io scheduler anticipatory registered
[    4.524070] io scheduler deadline registered
[    4.524153] io scheduler cfq registered (default)
[    4.524353] pcieport 0000:00:01.0: get owner: 7ff0
[    4.524441] pcieport 0000:00:01.0: irq 1279 (279) for MSI/MSI-X
[    4.524449] pcieport 0000:00:01.0: setting latency timer to 64
[    4.524629] pcieport 0000:00:1c.0: get owner: 7ff0
[    4.524720] pcieport 0000:00:1c.0: irq 1278 (278) for MSI/MSI-X
[    4.524730] pcieport 0000:00:1c.0: setting latency timer to 64
[    4.524882] pcieport 0000:00:1c.2: get owner: 7ff0
[    4.524973] pcieport 0000:00:1c.2: irq 1277 (277) for MSI/MSI-X
[    4.524982] pcieport 0000:00:1c.2: setting latency timer to 64
[    4.544605] floppy0: Unable to grab DMA2 for the floppy driver
[    7.564496] floppy0: no floppy controllers found
[    7.565633] brd: module loaded
[    7.574697] loop: module loaded
[    7.574824] Xen virtual console successfully installed as hvc0
[    7.574934] Event-channel device installed.
[    7.575044] blktap_device_init: blktap device major 253
[    7.575117] blktap_ring_init: blktap ring major: 253
[    7.578818] netfront: Initialising virtual ethernet driver.
[    7.579073] Uniform Multi-Platform E-IDE driver
[    7.579562] PNP: No PS/2 controller found. Probing ports directly.
[    7.824855] serio: i8042 KBD port at 0x60,0x64 irq 1
[    7.825038] mice: PS/2 mouse device common for all mice
[    7.825278] NET: Registered protocol family 17
[    7.825471] registered taskstats version 1
[    7.826002] PCI IO multiplexer device installed.
[    7.826082] BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
[    7.826263] Freeing unused kernel memory: 392k freed
[    8.146851] usbcore: registered new interface driver usbfs
[    8.146969] usbcore: registered new interface driver hub
[    8.147070] usbcore: registered new device driver usb
[    8.148617] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    8.148716] ehci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    8.148802] ehci_hcd 0000:00:1a.0: setting latency timer to 64
[    8.148805] ehci_hcd 0000:00:1a.0: EHCI Host Controller
[    8.148884] ehci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 1
[    8.149008] ehci_hcd 0000:00:1a.0: debug port 2
[    8.152975] ehci_hcd 0000:00:1a.0: cache line size of 32 is not supported
[    8.153172] ehci_hcd 0000:00:1a.0: irq 16, io mem 0xf7e38000
[    8.174490] ehci_hcd 0000:00:1a.0: USB 2.0 started, EHCI 1.00
[    8.174641] usb usb1: configuration #1 chosen from 1 choice
[    8.174739] hub 1-0:1.0: USB hub found
[    8.174816] hub 1-0:1.0: 2 ports detected
[    8.174960] ehci_hcd 0000:00:1d.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
[    8.175046] ehci_hcd 0000:00:1d.0: setting latency timer to 64
[    8.175049] ehci_hcd 0000:00:1d.0: EHCI Host Controller
[    8.175124] ehci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
[    8.175239] ehci_hcd 0000:00:1d.0: debug port 2
[    8.179190] ehci_hcd 0000:00:1d.0: cache line size of 32 is not supported
[    8.179361] ehci_hcd 0000:00:1d.0: irq 23, io mem 0xf7e37000
[    8.194473] ehci_hcd 0000:00:1d.0: USB 2.0 started, EHCI 1.00
[    8.194616] usb usb2: configuration #1 chosen from 1 choice
[    8.194713] hub 2-0:1.0: USB hub found
[    8.194789] hub 2-0:1.0: 2 ports detected
[    8.196136] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    8.197428] uhci_hcd: USB Universal Host Controller Interface driver
[    8.231334] SCSI subsystem initialized
[    8.234919] arcmsr 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[    8.235024] arcmsr 0000:01:00.0: setting latency timer to 64
[    8.254518] Areca RAID Controller0: Model ARC-1212, F/W V1.51 2012-07-04
[    8.254688] scsi0 : Areca SAS Host Adapter RAID Controller( RAID6 capable)
[    8.254688]         v1.20.0X.15.130619
[    8.254923] arcmsr 0000:01:00.0: get owner: 7ff0
[    8.255021] arcmsr 0000:01:00.0: irq 1276 (276) for MSI/MSI-X
[    8.255028] IRQ 1276/arcmsr: IRQF_DISABLED is not guaranteed on shared IRQs
[    8.255278] arcmsr0: msi enabled
[    8.274633] scsi 0:0:0:0: Direct-Access     Areca    ARC-1212-VOL#000 R001 PQ: 0 ANSI: 5
[    8.274844] sd 0:0:0:0: [sda] 39061504 512-byte logical blocks: (19.9 GB/18.6 GiB)
[    8.274973] scsi 0:0:0:1: Direct-Access     Areca    ARC-1212-VOL#001 R001 PQ: 0 ANSI: 5
[    8.275121] sd 0:0:0:0: [sda] Write Protect is off
[    8.275192] sd 0:0:0:0: [sda] Mode Sense: cb 00 00 08
[    8.275259] sd 0:0:0:1: [sdb] 7773435904 512-byte logical blocks: (3.97 TB/3.61 TiB)
[    8.275283] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    8.275507] sda: detected capacity change from 0 to 19999490048
[    8.275592]  sda:
[    8.275654] sd 0:0:0:1: [sdb] Write Protect is off
[    8.275793] sd 0:0:0:1: [sdb] Mode Sense: cb 00 00 08
[    8.275834] sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    8.276006] scsi 0:0:16:0: Processor         Areca    RAID controller  R001 PQ: 0 ANSI: 0
[    8.276159] sdb: detected capacity change from 0 to 3979999182848
[    8.276233]  sdb: unknown partition table
[    8.284771]  sda1 sda2 sda3
[    8.285266] sd 0:0:0:0: [sda] Attached SCSI disk
[    8.285267] sd 0:0:0:1: [sdb] Attached SCSI disk
[    8.288664] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-devel@redhat.com
[    8.494474] usb 1-1: new high speed USB device using ehci_hcd and address 2
[    8.644803] usb 1-1: configuration #1 chosen from 1 choice
[    8.645032] hub 1-1:1.0: USB hub found
[    8.645141] hub 1-1:1.0: 6 ports detected
[    8.764493] usb 2-1: new high speed USB device using ehci_hcd and address 2
[    8.914915] usb 2-1: configuration #1 chosen from 1 choice
[    8.915146] hub 2-1:1.0: USB hub found
[    8.915258] hub 2-1:1.0: 8 ports detected
[    8.994635] usb 1-1.1: new full speed USB device using ehci_hcd and address 3
[    9.105825] usb 1-1.1: configuration #1 chosen from 1 choice
[    9.182281] usb 1-1.2: new low speed USB device using ehci_hcd and address 4
[    9.301559] usb 1-1.2: configuration #1 chosen from 1 choice
[   19.543871] EXT3-fs: INFO: recovery required on readonly filesystem.
[   19.543953] EXT3-fs: write access will be enabled during recovery.
[   20.703097] kjournald starting.  Commit interval 15 seconds
[   20.703103] EXT3-fs: recovery complete.
[   20.717913] EXT3-fs: mounted filesystem with ordered data mode.
[   23.105239] input: PC Speaker as /class/input/input0
[   23.152294] e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-NAPI
[   23.152295] e1000e: Copyright(c) 1999 - 2013 Intel Corporation.
[   23.152355] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[   23.152362] e1000e 0000:00:19.0: setting latency timer to 64
[   23.152435] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[   23.152483] e1000e 0000:00:19.0: get owner: 7ff0
[   23.152512] e1000e 0000:00:19.0: irq 1275 (275) for MSI/MSI-X
[   23.161131] libata version 3.00 loaded.
[   23.263318] rtc_cmos 00:07: RTC can wake from S4
[   23.263381] rtc_cmos 00:07: rtc core: registered rtc_cmos as rtc0
[   23.263436] rtc0: alarms up to one month, y3k, 242 bytes nvram
[   23.273904] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[   23.274025] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[   23.274212] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[   23.276733] parport_pc 00:0c: reported by Plug and Play ACPI
[   23.276795] parport0: PC-style at 0x378, irq 5 [PCSPP]
[   23.330406] e1000e 0000:00:19.0: eth0: (PCI Express:2.5GT/s:Width x1) 7c:05:07:0e:a0:b2
[   23.330408] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
[   23.330444] e1000e 0000:00:19.0: eth0: MAC: 11, PHY: 12, PBA No: FFFFFF-0FF
[   23.330474] e1000e 0000:04:00.0: Disabling ASPM L0s L1
[   23.330477] ahci 0000:00:1f.2: version 3.0
[   23.330535] ahci 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[   23.330622] ahci 0000:00:1f.2: get owner: 7ff0
[   23.330658] ahci 0000:00:1f.2: irq 1274 (274) for MSI/MSI-X
[   23.330702] e1000e 0000:04:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[   23.330712] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 5 ports 6 Gbps 0x2 impl SATA mode
[   23.330714] ahci 0000:00:1f.2: flags: 64bit ncq pm led clo pio slum part ems apst
[   23.330724] ahci 0000:00:1f.2: setting latency timer to 64
[   23.330748] e1000e 0000:04:00.0: setting latency timer to 64
[   23.330887] e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[   23.331012] e1000e 0000:04:00.0: get owner: 7ff0
[   23.331119] e1000e 0000:04:00.0: irq 1273 (273) for MSI/MSI-X
[   23.331120] e1000e 0000:04:00.0: get owner: 7ff0
[   23.331217] e1000e 0000:04:00.0: irq 1272 (272) for MSI/MSI-X
[   23.331219] e1000e 0000:04:00.0: get owner: 7ff0
[   23.331316] e1000e 0000:04:00.0: irq 1271 (271) for MSI/MSI-X
[   23.356728] 00:0a: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[   23.356991] 00:0b: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[   23.357303] scsi1 : ahci
[   23.360806] ACPI: SSDT cbad9c18 003D3 (v01  PmRef  Cpu0Cst 00003001 INTL 20051117)
[   23.361621] ACPI: SSDT cbad9618 005AA (v01  PmRef    ApIst 00003000 INTL 20051117)
[   23.361913] ACPI: SSDT cbad8d98 00119 (v01  PmRef    ApCst 00003000 INTL 20051117)
[   23.368734] scsi2 : ahci
[   23.379227] thermal LNXTHERM:01: registered as thermal_zone0
[   23.379233] ACPI: Thermal Zone [TZ00] (28 C)
[   23.379371] thermal LNXTHERM:02: registered as thermal_zone1
[   23.379376] ACPI: Thermal Zone [TZ01] (30 C)
[   23.386146] scsi3 : ahci
[   23.386812] input: Power Button as /class/input/input1
[   23.386821] ACPI: Power Button [PWRB]
[   23.386848] input: Power Button as /class/input/input2
[   23.386849] ACPI: Power Button [PWRF]
[   23.412559] scsi4 : ahci
[   23.423034] scsi5 : ahci
[   23.423214] ata1: DUMMY
[   23.423216] ata2: SATA max UDMA/133 abar m2048@0xf7e36000 port 0xf7e36180 irq 1274
[   23.423218] ata3: DUMMY
[   23.423218] ata4: DUMMY
[   23.423219] ata5: DUMMY
[   23.488122] e1000e 0000:04:00.0: eth1: (PCI Express:2.5GT/s:Width x1) 68:05:ca:09:f2:b0
[   23.488124] e1000e 0000:04:00.0: eth1: Intel(R) PRO/1000 Network Connection
[   23.488136] e1000e 0000:04:00.0: eth1: MAC: 3, PHY: 8, PBA No: E46981-007
[   23.701429] serial 0000:03:02.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[   23.701750] 0000:03:02.0: ttyS2 at I/O 0xe040 (irq = 18) is a 16550A
[   23.712716] ACPI: Fan [FAN0] (off)
[   23.712739] ACPI: Fan [FAN1] (off)
[   23.712760] ACPI: Fan [FAN2] (off)
[   23.712782] ACPI: Fan [FAN3] (off)
[   23.712804] ACPI: Fan [FAN4] (off)
[   23.778912] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   23.780959] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   23.780962] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   23.780963] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   23.782563] ata2.00: ATAPI: Optiarc DVD RW AD-5260S, 1.00, max UDMA/100
[   23.784987] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[   23.784989] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[   23.784991] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[   23.786603] ata2.00: configured for UDMA/100
[   23.810591] scsi 2:0:0:0: CD-ROM            Optiarc  DVD RW AD-5260S  1.00 PQ: 0 ANSI: 5
[   24.080565] acpi device:53: registered as cooling_device13
[   24.080619] input: Video Bus as /class/input/input3
[   24.080622] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[   24.091733] Too many connections
[   24.200236] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   24.200279] sd 0:0:0:1: Attached scsi generic sg1 type 0
[   24.200321] scsi 0:0:16:0: Attached scsi generic sg2 type 3
[   24.200354] scsi 2:0:0:0: Attached scsi generic sg3 type 5
[   24.321204] sr0: scsi3-mmc drive: 48x/48x writer cd/rw xa/form2 cdda tray
[   24.321206] Uniform CD-ROM driver Revision: 3.20
[   24.321291] sr 2:0:0:0: Attached scsi CD-ROM sr0
[   24.600239] usbcore: registered new interface driver usbserial
[   24.600245] USB Serial support registered for generic
[   24.600260] usbcore: registered new interface driver usbserial_generic
[   24.600261] usbserial: USB Serial Driver core
[   24.624363] USB Serial support registered for FTDI USB Serial Device
[   24.624385] ftdi_sio 1-1.1:1.0: FTDI USB Serial Device converter detected
[   24.624397] usb 1-1.1: Detected FT232BM
[   24.624398] usb 1-1.1: Number of endpoints 2
[   24.624398] usb 1-1.1: Endpoint 1 MaxPacketSize 64
[   24.624399] usb 1-1.1: Endpoint 2 MaxPacketSize 64
[   24.624400] usb 1-1.1: Setting MaxPacketSize 64
[   24.626957] ftdi_sio ttyUSB0: Unable to read latency timer: -32
[   24.627027] usb 1-1.1: FTDI USB Serial Device converter now attached to ttyUSB0
[   24.627034] usbcore: registered new interface driver hiddev
[   24.627043] usbcore: registered new interface driver ftdi_sio
[   24.627044] ftdi_sio: v1.5.0:USB FTDI Serial Converters Driver
[   24.629412] input: LITEON Technology USB Multimedia Keyboard as /class/input/input4
[   24.629462] generic-usb 0003:046D:C313.0001: input: USB HID v1.10 Keyboard [LITEON Technology USB Multimedia Keyboard] on usb-0000:00:1a.0-1.2/input0
[   24.632066] input: LITEON Technology USB Multimedia Keyboard as /class/input/input5
[   24.632112] generic-usb 0003:046D:C313.0002: input: USB HID v1.10 Device [LITEON Technology USB Multimedia Keyboard] on usb-0000:00:1a.0-1.2/input1
[   24.632122] usbcore: registered new interface driver usbhid
[   24.632123] usbhid: v2.6:USB HID core driver
[   26.780806] Non-volatile memory driver v1.3
[   26.814258] lp0: using parport0 (interrupt-driven).
[   27.181481] md: Autodetecting RAID arrays.
[   27.181487] md: Scanned 0 and added 0 devices.
[   27.181490] md: autorun ...
[   27.181492] md: ... autorun DONE.
[   27.592946] EXT3 FS on sda1, internal journal
[   27.693968] ISO 9660 Extensions: Microsoft Joliet Level 3
[   27.721014] ISO 9660 Extensions: RRIP_1991A
[   29.122667] Adding 524280k swap on /var/swap/swap.001.  Priority:-1 extents:149 across:1024864k
[   30.060497] Microcode Update Driver: v2.00-xen <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[   30.118808] Microcode Update Driver: v2.00-xen removed.
[   31.609507] openvswitch_mod: Open vSwitch switching datapath 1.4.6, built Jun 14 2013 09:23:22
[   32.219454] ip_tables: (C) 2000-2006 Netfilter Core Team
[   33.242878] device xenbr1 entered promiscuous mode
[   33.394197] device eth1 entered promiscuous mode
[   34.541604] RPC: Registered udp transport module.
[   34.541606] RPC: Registered tcp transport module.
[   34.541607] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   36.475617] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   75.935592] warning: `ntpdate' uses 32-bit capabilities (legacy support in use)
[   80.400227] suspend: event channel 42
[   86.244189] device xenbr0 entered promiscuous mode
[   86.354851] e1000e 0000:00:19.0: get owner: 7ff0
[   86.354911] e1000e 0000:00:19.0: get owner: 7ff0
[   86.354939] e1000e 0000:00:19.0: irq 1275 (275) for MSI/MSI-X
[   86.465037] e1000e 0000:00:19.0: get owner: 7ff0
[   86.465098] e1000e 0000:00:19.0: get owner: 7ff0
[   86.465126] e1000e 0000:00:19.0: irq 1275 (275) for MSI/MSI-X
[   86.469989] device eth0 entered promiscuous mode
[   89.976206] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

[-- Attachment #3: lspci.txt --]
[-- Type: text/plain, Size: 1577 bytes --]

00:00.0 Host bridge: Intel Corporation Haswell DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Haswell PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Haswell Integrated Graphics Controller (rev 06)
00:14.0 USB controller: Intel Corporation Lynx Point USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation Lynx Point MEI Controller #1 (rev 04)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-V (rev 04)
00:1a.0 USB controller: Intel Corporation Lynx Point USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation Lynx Point High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation Lynx Point PCI Express Root Port #1 (rev d4)
00:1c.2 PCI bridge: Intel Corporation Lynx Point PCI Express Root Port #3 (rev d4)
00:1d.0 USB controller: Intel Corporation Lynx Point USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Lynx Point LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation Lynx Point 6-port SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation Lynx Point SMBus Controller (rev 04)
01:00.0 RAID bus controller: Areca Technology Corp. ARC-1680 8 port PCIe/PCI-X to SAS/SATA II RAID Controller
02:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 41)
03:02.0 Serial controller: MosChip Semiconductor Technology Ltd. PCI 9835 Multi-I/O Controller (rev 01)
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

[-- Attachment #4: lspci-vv.txt --]
[-- Type: text/plain, Size: 29251 bytes --]

[root@localhost ~]# lspci -vv
00:00.0 Host bridge: Intel Corporation Haswell DRAM Controller (rev 06)
        Subsystem: Intel Corporation Device 2049
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
        Latency: 0
        Capabilities: [e0] Vendor Specific Information: Len=0c <?>

00:01.0 PCI bridge: Intel Corporation Haswell PCI Express x16 Controller (rev 06) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Memory behind bridge: f7d00000-f7dfffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [88] Subsystem: Intel Corporation Device 2049
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee002f8  Data: 0000
        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #2, Speed unknown, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <8us
                        ClockPM- Surprise- LLActRep- BwNot+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet- LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis- ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
                LnkCtl2: Target Link Speed: Unknown, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [140 v1] Root Complex Link
                Desc:   PortNumber=02 ComponentID=01 EltType=Config
                Link0:  Desc:   TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+
                        Addr:   00000000fed19000
        Capabilities: [d94 v1] #19
        Kernel driver in use: pcieport

00:02.0 VGA compatible controller: Intel Corporation Haswell Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Device 2049
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at f000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-

00:14.0 USB controller: Intel Corporation Lynx Point USB xHCI Host Controller (rev 04) (prog-if 30 [XHCI])
        Subsystem: Intel Corporation Device 2049
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at f7e20000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [70] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [80] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000

00:16.0 Communication controller: Intel Corporation Lynx Point MEI Controller #1 (rev 04)
        Subsystem: Intel Corporation Device 2049
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at f7e3b000 (64-bit, non-prefetchable) [size=16]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-V (rev 04)
        Subsystem: Intel Corporation Device 2049
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 1275
        Region 0: Memory at f7e00000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f7e39000 (32-bit, non-prefetchable) [size=4K]
        Region 2: I/O ports at f080 [size=32]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00378  Data: 0000
        Capabilities: [e0] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: e1000e
        Kernel modules: e1000e

00:1a.0 USB controller: Intel Corporation Lynx Point USB Enhanced Host Controller #2 (rev 04) (prog-if 20 [EHCI])
        Subsystem: Intel Corporation Device 2049
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at f7e38000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Capabilities: [98] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP+
        Kernel driver in use: ehci_hcd
        Kernel modules: ehci-hcd

00:1b.0 Audio device: Intel Corporation Lynx Point High Definition Audio Controller (rev 04)
        Subsystem: Intel Corporation Device 2049
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 22
        Region 0: Memory at f7e30000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [60] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v1) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE- FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed unknown, Width x0, ASPM unknown, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
                VC1:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=2 ArbSelect=Fixed TC/VC=04
                        Status: NegoPending- InProgress-
        Kernel driver in use: HDA Intel
        Kernel modules: snd-hda-intel

00:1c.0 PCI bridge: Intel Corporation Lynx Point PCI Express Root Port #1 (rev d4) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=02, subordinate=03, sec-latency=0
        I/O behind bridge: 0000e000-0000efff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <1us, L1 <16us
                        ClockPM- Surprise- LLActRep+ BwNot+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 10.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet- LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+ ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee00318  Data: 0000
        Capabilities: [90] Subsystem: Intel Corporation Device 2049
        Capabilities: [a0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: pcieport

00:1c.2 PCI bridge: Intel Corporation Lynx Point PCI Express Root Port #3 (rev d4) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: f7c00000-f7cfffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #3, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <16us
                        ClockPM- Surprise- LLActRep+ BwNot+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #2, PowerLimit 10.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet- LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABC, TimeoutDis+ ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee00338  Data: 0000
        Capabilities: [90] Subsystem: Intel Corporation Device 2049
        Capabilities: [a0] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: pcieport

00:1d.0 USB controller: Intel Corporation Lynx Point USB Enhanced Host Controller #1 (rev 04) (prog-if 20 [EHCI])
        Subsystem: Intel Corporation Device 2049
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 23
        Region 0: Memory at f7e37000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Debug port: BAR=1 offset=00a0
        Capabilities: [98] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: ehci_hcd
        Kernel modules: ehci-hcd

00:1f.0 ISA bridge: Intel Corporation Lynx Point LPC Controller (rev 04)
        Subsystem: Intel Corporation Device 2049
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Capabilities: [e0] Vendor Specific Information: Len=0c <?>

00:1f.2 SATA controller: Intel Corporation Lynx Point 6-port SATA Controller 1 [AHCI mode] (rev 04) (prog-if 01 [AHCI 1.0])
        Subsystem: Intel Corporation Device 2049
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 1274
        Region 0: I/O ports at f0d0 [size=8]
        Region 1: I/O ports at f0c0 [size=4]
        Region 2: I/O ports at f0b0 [size=8]
        Region 3: I/O ports at f0a0 [size=4]
        Region 4: I/O ports at f060 [size=32]
        Region 5: Memory at f7e36000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee00398  Data: 0000
        Capabilities: [70] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004
        Kernel driver in use: ahci
        Kernel modules: ahci

00:1f.3 SMBus: Intel Corporation Lynx Point SMBus Controller (rev 04)
        Subsystem: Intel Corporation Device 2049
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin C routed to IRQ 11
        Region 0: Memory at f7e35000 (64-bit, non-prefetchable) [size=256]
        Region 4: I/O ports at f040 [size=32]

01:00.0 RAID bus controller: Areca Technology Corp. ARC-1680 8 port PCIe/PCI-X to SAS/SATA II RAID Controller
        Subsystem: Areca Technology Corp. ARC-1212 4-Port PCIe to SAS/SATA II RAID Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 1276
        Region 0: Memory at f7d10000 (32-bit, non-prefetchable) [size=8K]
        Expansion ROM at f7d00000 [disabled] [size=64K]
        Capabilities: [98] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a0] MSI: Enable+ Count=1/2 Maskable- 64bit+
                Address: 00000000fee00358  Data: 0000
        Capabilities: [d0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 256 bytes
                DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM unknown, Latency L0 <128ns, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Kernel driver in use: arcmsr
        Kernel modules: arcmsr

02:00.0 PCI bridge: Integrated Technology Express, Inc. Device 8892 (rev 41) (prog-if 01 [Subtractive decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=02, secondary=03, subordinate=03, sec-latency=32
        I/O behind bridge: 0000e000-0000efff
        Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr+ DiscTmrStat- DiscTmrSERREn-
        Capabilities: [90] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=55mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a0] Subsystem: Intel Corporation Device 2049

03:02.0 Serial controller: MosChip Semiconductor Technology Ltd. PCI 9835 Multi-I/O Controller (rev 01) (prog-if 02 [16550])
        Subsystem: LSI Logic / Symbios Logic 2S (16C550 UART)
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at e050 [size=8]
        Region 1: I/O ports at e040 [size=8]
        Region 2: I/O ports at e030 [size=8]
        Region 3: I/O ports at e020 [size=8]
        Region 4: I/O ports at e010 [size=8]
        Region 5: I/O ports at e000 [size=16]
        Kernel driver in use: serial
        Kernel modules: parport_serial, 8250_pci

04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Intel Corporation Gigabit CT Desktop Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at f7cc0000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at f7c00000 (32-bit, non-prefetchable) [size=512K]
        Region 2: I/O ports at d000 [size=32]
        Region 3: Memory at f7ce0000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at f7c80000 [disabled] [size=256K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable+ Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number 68-05-ca-ff-ff-09-f2-b0
        Kernel driver in use: e1000e
        Kernel modules: e1000e

[-- Attachment #5: proc_interrupts.txt --]
[-- Type: text/plain, Size: 3556 bytes --]

            CPU0       CPU1       CPU2       CPU3
   1:          2          0          0          0      Phys-fasteoi   i8042
   3:          2          0          0          0      Phys-fasteoi
   4:          2          0          0          0      Phys-fasteoi
   5:          0          0          0          0      Phys-fasteoi   parport0
   8:          0          0          0          0      Phys-fasteoi   rtc0
   9:          0          0          0          0      Phys-fasteoi   acpi
  16:   21540272    2019848      10967      36641      Phys-fasteoi   ehci_hcd:usb1
  18:    1136455          0          0          0      Phys-fasteoi   serial
  23:        722          0          0          0      Phys-fasteoi   ehci_hcd:usb2
1271:          2          0          0          0      Phys-fasteoi   eth1
1272:     366278    1042267      53929          0      Phys-fasteoi   eth1-tx-0
1273:     228149    2202205      61538      31428      Phys-fasteoi   eth1-rx-0
1274:     478864          0          0          0      Phys-fasteoi   ahci
1275:     534255     355514          0       2135      Phys-fasteoi   eth0
1276:    1290076     614329          0          0      Phys-fasteoi   arcmsr
1280:    8049519    6344980    6955891    6191609   Dynamic-percpu    timer
1281:    1934700    2150070   13180166    5099824   Dynamic-percpu    resched
1282:        172        649        646        629   Dynamic-percpu    callfunc
1283:     179585     377636     463599     314766   Dynamic-percpu    call1func
1284:          0          0          0          0   Dynamic-percpu    reboot
1285:      58111      43898      42617      40707   Dynamic-percpu    spinlock
1286:          0          0          0          0   Dynamic-fasteoi   mce
1287:          0          0          0          0   Dynamic-fasteoi   console
1288:       3380        888          0          0   Dynamic-fasteoi   xenbus
1289:          0          0          0          0   Dynamic-fasteoi   suspend
1290:      23725      31039        745         31   Dynamic-fasteoi   blkif-backend
1291:       3348       4493        127         12   Dynamic-fasteoi   blkif-backend
1292:      11833      19939        579         19   Dynamic-fasteoi   vif1.0
1293:     616790     545740      66804      20724   Dynamic-fasteoi   vif2.0
1294:     143273     134973      38705      13997   Dynamic-fasteoi   vif2.1
1295:     375537     209553      17189        248   Dynamic-fasteoi   vif2.2
1296:     102045      83304       7994         83   Dynamic-fasteoi   blkif-backend
1297:     283886     314708       6045        183   Dynamic-fasteoi   blkif-backend
1298:     419807     356574      29085        321   Dynamic-fasteoi   vif5.0
1299:     371863     452805     121437       5079   Dynamic-fasteoi   vif3.0
1300:     100407      94982       4729        230   Dynamic-fasteoi   blkif-backend
1301:      27931      33386      20341         19   Dynamic-fasteoi   vif4.0
1302:     209051     156930       6768          0   Dynamic-fasteoi   blkif-backend
1303:     213914     156507      55802          0   Dynamic-fasteoi   vif6.0
1304:      26383      32040       7129          0   Dynamic-fasteoi   blkif-backend
 NMI:          0          0          0          0   Non-maskable interrupts
 RES:    1934700    2150071   13180167    5099825   Rescheduling interrupts
 CAL:     179757     378285     464245     315395   Function call interrupts
 MCE:          0          0          0          0   Machine check exceptions
 MCP:          1          1          1          1   Machine check polls

[-- Attachment #6: xen-dmesg.txt --]
[-- Type: text/plain, Size: 11725 bytes --]

[root@localhost log]# cat xen-dmesg
 __  __            _  _    _   ____         _      _                 ____
 \ \/ /___ _ __   | || |  / | | ___|     __| | ___| |__  _   _  __ _| ___|
  \  // _ \ '_ \  | || |_ | | |___ \ __ / _` |/ _ \ '_ \| | | |/ _` |___ \
  /  \  __/ | | | |__   _|| |_ ___) |__| (_| |  __/ |_) | |_| | (_| |___) |
 /_/\_\___|_| |_|    |_|(_)_(_)____/    \__,_|\___|_.__/ \__,_|\__, |____/
                                                               |___/
(XEN) Xen version 4.1.5-debug5 (andrewcoop@uk.xensource.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) Fri Aug  9 14:43:57 EDT 2013
(XEN) Latest ChangeSet: Fri Aug 09 19:33:04 2013 +0100 23792:19aca3f08703
(XEN) Bootloader: SYSLINUX 4.06 0x51a10931
(XEN) Command line: com1=115200,8n1,0xe050,0 console=com1,vga mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M cpuid_mask_xsave_eax=0
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009d800 (usable)
(XEN)  000000000009d800 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 00000000b857f000 (usable)
(XEN)  00000000b857f000 - 00000000b8586000 (ACPI NVS)
(XEN)  00000000b8586000 - 00000000cb8dc000 (usable)
(XEN)  00000000cb8dc000 - 00000000cbae4000 (reserved)
(XEN)  00000000cbae4000 - 00000000cbaf9000 (ACPI data)
(XEN)  00000000cbaf9000 - 00000000cbc66000 (ACPI NVS)
(XEN)  00000000cbc66000 - 00000000cbfff000 (reserved)
(XEN)  00000000cbfff000 - 00000000cc000000 (usable)
(XEN)  00000000d7000000 - 00000000df200000 (reserved)
(XEN)  00000000f8000000 - 00000000fc000000 (reserved)
(XEN)  00000000fec00000 - 00000000fec01000 (reserved)
(XEN)  00000000fed00000 - 00000000fed04000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN)  00000000fee00000 - 00000000fee01000 (reserved)
(XEN)  00000000ff000000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 000000081fe00000 (usable)
(XEN) Kdump: 64MB (65536kB) at 0x2000000
(XEN) ACPI: RSDP 000F0490, 0024 (r2  INTEL)
(XEN) ACPI: XSDT CBAE8078, 0074 (r1 INTEL  DH87MC         2F AMI     10013)
(XEN) ACPI: FACP CBAF3EF0, 010C (r5 INTEL  DH87MC         2F AMI     10013)
(XEN) ACPI Warning (tbfadt-0232): FADT (revision 5) is longer than ACPI 2.0 version, truncating length 0x10C to 0xF4 [20070126]
(XEN) ACPI: DSDT CBAE8180, BD6E (r2 INTEL  DH87MC         2F INTL 20091112)
(XEN) ACPI: FACS CBC64080, 0040
(XEN) ACPI: APIC CBAF4000, 0072 (r3 INTEL  DH87MC         2F AMI     10013)
(XEN) ACPI: FPDT CBAF4078, 0044 (r1 INTEL  DH87MC         2F AMI     10013)
(XEN) ACPI: SSDT CBAF40C0, 0539 (r1 INTEL  DH87MC         2F INTL 20051117)
(XEN) ACPI: SSDT CBAF4600, 0AD8 (r1 INTEL  DH87MC         2F INTL 20051117)
(XEN) ACPI: MCFG CBAF50D8, 003C (r1 INTEL  DH87MC         2F MSFT       97)
(XEN) ACPI: HPET CBAF5118, 0038 (r1 INTEL  DH87MC         2F AMI.        5)
(XEN) ACPI: SSDT CBAF5150, 036D (r1 INTEL  DH87MC         2F INTL 20091112)
(XEN) ACPI: SSDT CBAF54C0, 2EDB (r1 INTEL  DH87MC         2F INTL 20091112)
(XEN) ACPI: DMAR CBAF83A0, 00B8 (r1 INTEL  DH87MC         2F INTL        1)
(XEN) System RAM: 32438MB (33216972kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-000000081fe00000
(XEN) Domain heap initialised DMA width 32 bits
(XEN) found SMP MP-table at 000fd740
(XEN) DMI 2.7 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x1808
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[1804,0], pm1x_evt[1800,0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - cbc64080/0000000000000000, using 32
(XEN) ACPI:                  wakeup_vec[cbc6408c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 7:12 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 7:12 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
(XEN) Processor #4 7:12 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
(XEN) Processor #6 7:12 APIC version 21
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1])
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a701 base: 0xfed00000
(XEN) PCI: MCFG configuration 0: base f8000000 segment 0 buses 0 - 63
(XEN) PCI: MCFG area at f8000000 reserved in E820
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 4 CPUs (0 hotplug CPUs)
(XEN) IRQ limits: 24 GSI, 760 MSI/MSI-X
(XEN) Switched to APIC driver x2apic_cluster.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 3392.247 MHz processor.
(XEN) Initing memory sharing.
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB.
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) TSC deadline timer enabled
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 64 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) Brought up 4 CPUs
(XEN) Testing NMI watchdog --- CPU#0 okay. CPU#1 okay. CPU#2 okay. CPU#3 okay.
(XEN) ACPI sleep modes: S3
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) elf_parse_binary: phdr: paddr=0x100000 memsz=0x3fd000
(XEN) elf_parse_binary: phdr: paddr=0x4fd000 memsz=0x28a000
(XEN) elf_parse_binary: memory: 0x100000 -> 0x787000
(XEN) elf_xen_parse_note: GUEST_OS = "linux"
(XEN) elf_xen_parse_note: GUEST_VERSION = "2.6"
(XEN) elf_xen_parse_note: XEN_VERSION = "xen-3.0"
(XEN) elf_xen_parse_note: VIRT_BASE = 0xc0000000
(XEN) elf_xen_parse_note: PADDR_OFFSET = 0x0
(XEN) elf_xen_parse_note: ENTRY = 0xc0100000
(XEN) elf_xen_parse_note: HYPERCALL_PAGE = 0xc0101000
(XEN) elf_xen_parse_note: HV_START_LOW = 0xf5800000
(XEN) elf_xen_parse_note: FEATURES = "writable_page_tables|writable_descriptor_tables|auto_translated_physmap|pae_pgdir_above_4gb|supervisor_mode_kernel"
(XEN) elf_xen_parse_note: PAE_MODE = "yes"
(XEN) elf_xen_parse_note: unknown xen elf note (0xd)
(XEN) elf_xen_parse_note: LOADER = "generic"
(XEN) elf_xen_parse_note: SUSPEND_CANCEL = 0x1
(XEN) elf_xen_addr_calc_check: addresses:
(XEN)     virt_base        = 0xc0000000
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0xc0000000
(XEN)     virt_kstart      = 0xc0100000
(XEN)     virt_kend        = 0xc0787000
(XEN)     virt_entry       = 0xc0100000
(XEN)     p2m_base         = 0xffffffffffffffff
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0x100000 -> 0x787000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   00000000b9000000->00000000ba000000 (186390 pages to be allocated)
(XEN)  Init. ramdisk: 000000081f616000->000000081fdff200
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: 00000000c0100000->00000000c0787000
(XEN)  Init. ramdisk: 00000000c0787000->00000000c0f70200
(XEN)  Phys-Mach map: 00000000c0f71000->00000000c102d000
(XEN)  Start info:    00000000c102d000->00000000c102d4b4
(XEN)  Page tables:   00000000c102e000->00000000c103e000
(XEN)  Boot stack:    00000000c103e000->00000000c103f000
(XEN)  TOTAL:         00000000c0000000->00000000c1400000
(XEN)  ENTRY ADDRESS: 00000000c0100000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) elf_load_binary: phdr 0 at 0x3222274048 -> 0x3226456064
(XEN) elf_load_binary: phdr 1 at 0x3226456064 -> 0x3227783168
(XEN) Scrubbing Free RAM: ........................................................................................................................................................................................................................................................................................................................done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 276kB init memory.
(XEN) __csched_vcpu_acct_start: setting dom 0 as the privileged domain
(XEN) PCI add device 00:00.0
(XEN) PCI add device 00:01.0
(XEN) PCI add device 00:02.0
(XEN) PCI add device 00:14.0
(XEN) PCI add device 00:16.0
(XEN) PCI add device 00:19.0
(XEN) PCI add device 00:1a.0
(XEN) PCI add device 00:1b.0
(XEN) PCI add device 00:1c.0
(XEN) PCI add device 00:1c.2
(XEN) PCI add device 00:1d.0
(XEN) PCI add device 00:1f.0
(XEN) PCI add device 00:1f.2
(XEN) PCI add device 00:1f.3
(XEN) PCI add device 01:00.0
(XEN) PCI add device 02:00.0
(XEN) PCI add device 03:02.0
(XEN) PCI add device 04:00.0
(XEN) allocated vector b0 for irq 16
(XEN) allocated vector b8 for irq 18
(XEN) PCI add device 00:01.0
(XEN) PCI add device 00:1c.0
(XEN) PCI add device 00:1c.2
(XEN) PCI add device 00:1a.0
(XEN) PCI add device 00:1d.0
(XEN) allocated vector d8 for irq 23
(XEN) PCI add device 01:00.0
(XEN) PCI add device 00:19.0
(XEN) allocated vector 29 for irq 20
(XEN) PCI add device 04:00.0
(XEN) PCI add device 00:1f.2
(XEN) allocated vector 39 for irq 19
(XEN) cpuid.MWAIT[.eax=40, .ebx=40, .ecx=3, .edx=42120]
(XEN) Monitor-Mwait will be used to enter C-1 state
(XEN) cpuid.MWAIT[.eax=40, .ebx=40, .ecx=3, .edx=42120]
(XEN) Monitor-Mwait will be used to enter C-2 state
(XEN) cpuid.MWAIT[.eax=40, .ebx=40, .ecx=3, .edx=42120]
(XEN) Monitor-Mwait will be used to enter C-1 state
(XEN) cpuid.MWAIT[.eax=40, .ebx=40, .ecx=3, .edx=42120]
(XEN) Monitor-Mwait will be used to enter C-2 state
(XEN) cpuid.MWAIT[.eax=40, .ebx=40, .ecx=3, .edx=42120]
(XEN) Monitor-Mwait will be used to enter C-1 state
(XEN) cpuid.MWAIT[.eax=40, .ebx=40, .ecx=3, .edx=42120]
(XEN) Monitor-Mwait will be used to enter C-2 state
(XEN) cpuid.MWAIT[.eax=40, .ebx=40, .ecx=3, .edx=42120]
(XEN) Monitor-Mwait will be used to enter C-1 state
(XEN) cpuid.MWAIT[.eax=40, .ebx=40, .ecx=3, .edx=42120]
(XEN) Monitor-Mwait will be used to enter C-2 state
(XEN) no cpu_id for acpi_id 5
(XEN) no cpu_id for acpi_id 6
(XEN) no cpu_id for acpi_id 7
(XEN) no cpu_id for acpi_id 8
(XEN) PCI add device 03:02.0
(XEN) PCI add device 00:1b.0
(XEN) allocated vector 61 for irq 22
(XEN) paging.c:732:d0 Tried to do a paging op on itself.
(XEN) paging.c:732:d0 Tried to do a paging op on itself.

[-- Attachment #7: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12 11:52                       ` Thimo E
@ 2013-08-12 12:04                         ` Andrew Cooper
  2013-08-19 15:14                           ` Thimo E.
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-08-12 12:04 UTC (permalink / raw)
  To: Thimo E
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Yang Z, Zhang, Xiantao


[-- Attachment #1.1: Type: text/plain, Size: 2916 bytes --]

On 12/08/13 12:52, Thimo E wrote:
> Hello Yang,
>
> attached you'll find the kernel dmesg, xen dmesg, lspci and output of
> /proc/interrupts. If you want to see further logfiles, please let me know.
>
> The processor is a Core i5-4670. The board is an Intel DH87MC
> mainboard. I am not sure whether it supports APICv, but VT-d is
> supported and enabled.
>
>
>> 4.       The status of IRQ 29 is 10, which means the guest has already
>> issued the EOI because the IRQ_GUEST_EOI_PENDING bit is cleared, so
>> there should be no pending EOI in the EOI stack. If possible, can you
>> add some debug messages in the guest EOI code path (like
>> _irq_guest_eoi()) to track the EOI?
>>
> I don't see IRQ 29 in /proc/interrupts; what I see is:
> cat xen-dmesg.txt | grep "29":
>   (XEN) allocated vector 29 for irq 20
> cat dmesg.txt | grep "eth0":
>   [   23.152355] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
>   [   23.330408] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
>
> So is the Ethernet IRQ the bad one? That is an onboard Intel network
> adapter.

That would be consistent with the crash seen on our hardware in XenServer.

>
>> 6.       I guess interrupt remapping is enabled on your machine.
>> Can you try disabling IR to see whether it is still reproducible?
>>
>>  
>>
> Just to be sure, your proposal is to try the parameter "no-intremap"?

Specifically, iommu=no-intremap on the Xen command line.

>
> Best regards
>   Thimo

~Andrew

>
> Am 12.08.2013 10:49, schrieb Zhang, Yang Z:
>>
>> Hi Thimo,
>>
>> From your previous experience and log, it shows:
>>
>> 1.       The interrupt that triggers the issue is an MSI.
>>
>> 2.       MSIs are normally treated as edge-triggered interrupts, except
>> when there is no way to mask the device. In this case, your previous
>> log indicates the device is unmaskable (what special device are you
>> using? Modern PCI devices should be maskable).
>>
>> 3.       IRQ 29 belongs to dom0, so it does not seem to be an
>> HVM-related issue.
>>
>> 4.       The status of IRQ 29 is 10 which means the guest already
>> issues the EOI because the bit IRQ_GUEST_EOI_PENDING is cleared, so
>> there should be no pending EOI in the EOI stack. If possible, can you
>> add some debug message in the guest EOI code path(like
>> _irq_guest_eoi())) to track the EOI?
>>
>> 5.       Both of the log show when the issue occured, most of the
>> other interrupts which owned by dom0 were in IRQ_MOVE_PENDING status.
>> Is it a coincidence? Or it happened only on the special condition
>> like heavy of IRQ migration?Perhaps you can disable irq balance in
>> dom0 and pin the IRQ manually.
>>
> |6.       I guess the interrupt remapping is enabled in your machine.
> Can you try to disable IR to see whether it still reproduceable?
>>
>> Also, please provide the whole Xen log.
>>
>>  
>>
>> Best regards,
>>
>> Yang
>>
>
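The assertion at the root of this thread, `ASSERT((sp == 0) || (peoi[sp-1].vector < vector))` in `__do_IRQ_guest()`, requires the per-CPU pending-EOI stack to hold strictly increasing vectors, as one would expect if a new interrupt can only nest on top of in-service interrupts of lower priority. As a rough sketch only, assuming nothing beyond what that assertion and the debug patch posted later in the thread show, the model below (all type and function names are hypothetical, not Xen's) reproduces the failing pattern from the crash dump: vector 0x26 is still on the stack, not yet marked ready, when vector 0x26 arrives again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-CPU pending-EOI stack; not Xen's real types. */
#define MODEL_STACK_MAX 16

struct model_peoi_entry {
    unsigned int irq;
    unsigned int vector;
    bool ready;            /* guest has EOI'd; the LAPIC EOI is still deferred */
};

struct model_peoi_stack {
    struct model_peoi_entry e[MODEL_STACK_MAX];
    size_t sp;             /* number of entries in use */
};

/*
 * Push a newly delivered interrupt. Returns false, instead of asserting, when
 * the top entry's vector is not strictly lower than the new one: the same
 * condition the debug patch turns into "**Pending EOI error" and a panic.
 */
static bool model_push(struct model_peoi_stack *s, unsigned int irq,
                       unsigned int vector)
{
    if (s->sp != 0 && !(s->e[s->sp - 1].vector < vector))
        return false;
    assert(s->sp < MODEL_STACK_MAX);
    s->e[s->sp].irq = irq;
    s->e[s->sp].vector = vector;
    s->e[s->sp].ready = false;
    s->sp++;
    return true;
}

/* Mark the topmost entry for this irq as ready (the guest issued its EOI). */
static void model_set_ready(struct model_peoi_stack *s, unsigned int irq)
{
    for (size_t i = s->sp; i-- > 0; ) {
        if (s->e[i].irq == irq) {
            s->e[i].ready = true;
            return;
        }
    }
}

/* Pop ready entries off the top; each pop stands in for one LAPIC EOI. */
static void model_flush_ready(struct model_peoi_stack *s)
{
    while (s->sp != 0 && s->e[s->sp - 1].ready)
        s->sp--;
}
```

In this model the push, mark-ready and flush cycle for IRQ 29 succeeds repeatedly, and only a second push of the same vector without an intervening pop is rejected, which is the state the `s[0] irq 29, vec 0x26, ready 0` line in the crash dump describes.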


[-- Attachment #1.2: Type: text/html, Size: 16253 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12  8:49                     ` Zhang, Yang Z
  2013-08-12  8:57                       ` Jan Beulich
  2013-08-12 11:52                       ` Thimo E
@ 2013-08-12 13:54                       ` Thimo E
  2013-08-12 14:06                         ` Andrew Cooper
  2013-08-13 11:39                         ` Wu, Feng
  2 siblings, 2 replies; 63+ messages in thread
From: Thimo E @ 2013-08-12 13:54 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao


[-- Attachment #1.1: Type: text/plain, Size: 1869 bytes --]

Hello Yang,

and attached is the next crash dump, which occurred today, only a few
minutes after I created the log files I sent in my previous mail.
Perhaps together with those log files it gives you a better
understanding of what is going on.

I've disabled interrupt remapping now.

> 4. ...
> can you add some debug messages in the guest EOI code path (like
> _irq_guest_eoi()) to track the EOI?

@Andrew: Is it possible for you to integrate the changes Yang requested
into your Xen debugging version?

Best regards
   Thimo

[-- Attachment #1.2: Type: text/html, Size: 11804 bytes --]

[-- Attachment #2: crash20130812.txt --]
[-- Type: text/plain, Size: 11060 bytes --]

(XEN) **Pending EOI error
(XEN)   irq 29, vector 0x26
(XEN)   s[0] irq 29, vec 0x26, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
(XEN) All LAPIC state:
(XEN) [vector]      ISR      TMR      IRR
(XEN) [1f:00] 00000000 00000000 00000000
(XEN) [3f:20] 00000040 74fe50aa 00000000
(XEN) [5f:40] 00000000 ecbaeed2 00000000
(XEN) [7f:60] 00000000 be38f2d6 00000000
(XEN) [9f:80] 00000000 f0bc768a 00000000
(XEN) [bf:a0] 00000000 fadff8dc 00000000
(XEN) [df:c0] 00000000 df20fe40 00000000
(XEN) [ff:e0] 00000000 00000000 00000000
(XEN) Peoi stack trace records:
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0x26}
(XEN)   Marked {sp 0, irq 29, vec 0x26} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0x26}
(XEN)   Marked {sp 0, irq 29, vec 0x26} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0x26}
(XEN)   Marked {sp 0, irq 29, vec 0x26} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0x26}
(XEN)   Marked {sp 0, irq 29, vec 0x26} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 29, vec 0x26}
(XEN)   Marked {sp 0, irq 29, vec 0x26} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x26}
(XEN)   Poped entry {sp 1, irq 30, vec 0x45}
(XEN)   Marked {sp 0, irq 30, vec 0x4d} ready
(XEN)   Pushed {sp 0, irq 30, vec 0x45}
(XEN)   Poped entry {sp 1, irq 30, vec 0x45}
(XEN)   Marked {sp 0, irq 30, vec 0x45} ready
(XEN)   Pushed {sp 0, irq 30, vec 0x45}
(XEN)   Poped entry {sp 1, irq 30, vec 0x45}
(XEN)   Marked {sp 0, irq 30, vec 0x45} ready
(XEN)   Pushed {sp 0, irq 30, vec 0x45}
(XEN)   Poped entry {sp 1, irq 30, vec 0x45}
(XEN)   Marked {sp 0, irq 30, vec 0x45} ready
(XEN)   Pushed {sp 0, irq 30, vec 0x45}
(XEN)   Poped entry {sp 1, irq 30, vec 0x45}
(XEN)   Marked {sp 0, irq 30, vec 0x45} ready
(XEN)   Pushed {sp 0, irq 30, vec 0x45}
(XEN)   Poped entry {sp 1, irq 30, vec 0xd3}
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:1 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:1 vec:38 type=IO-APIC-edge    status=00000054 in-flight=0 domain-list=0:  1(----),
(XEN)    IRQ:   2 affinity:f vec:00 type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:1 vec:40 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:1 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   5 affinity:1 vec:50 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  5(----),
(XEN)    IRQ:   6 affinity:1 vec:58 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:1 vec:60 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   8 affinity:1 vec:68 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(----),
(XEN)    IRQ:   9 affinity:1 vec:70 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0:  9(----),
(XEN)    IRQ:  10 affinity:1 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:1 vec:88 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:1 vec:90 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  13 affinity:1 vec:98 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:1 vec:a0 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:1 vec:a8 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:8 vec:dd type=IO-APIC-level   status=00000010 in-flight=1 domain-list=0: 16(PS-M),
(XEN)    IRQ:  18 affinity:8 vec:2e type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 18(-S--),
(XEN)    IRQ:  19 affinity:1 vec:39 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:1 vec:29 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  22 affinity:8 vec:d5 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 22(-S--),
(XEN)    IRQ:  23 affinity:2 vec:9b type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 23(-S--),
(XEN)    IRQ:  24 affinity:1 vec:28 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:1 vec:30 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:f vec:c0 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  27 affinity:f vec:c8 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  28 affinity:f vec:d0 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  29 affinity:1 vec:26 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:276(-S--),
(XEN)    IRQ:  30 affinity:4 vec:cd type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:275(-S--),
(XEN)    IRQ:  31 affinity:1 vec:24 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:274(-S--),
(XEN)    IRQ:  32 affinity:2 vec:2c type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(-S--),
(XEN)    IRQ:  33 affinity:8 vec:7b type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:272(-S--),
(XEN)    IRQ:  34 affinity:1 vec:59 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:271(-S--),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  1 Vec 56:
(XEN)       Apic 0x00, Pin  1: vec=38 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  4 Vec 72:
(XEN)       Apic 0x00, Pin  4: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  5 Vec 80:
(XEN)       Apic 0x00, Pin  5: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  6 Vec 88:
(XEN)       Apic 0x00, Pin  6: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  7 Vec 96:
(XEN)       Apic 0x00, Pin  7: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  8 Vec104:
(XEN)       Apic 0x00, Pin  8: vec=68 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  9 Vec112:
(XEN)       Apic 0x00, Pin  9: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 10 Vec120:
(XEN)       Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 11 Vec136:
(XEN)       Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 12 Vec144:
(XEN)       Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 13 Vec152:
(XEN)       Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 14 Vec160:
(XEN)       Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 15 Vec168:
(XEN)       Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 16 Vec221:
(XEN)       Apic 0x00, Pin 16: vec=dd delivery=LoPri dest=L status=1 polarity=1 irr=1 trig=L mask=0 dest_id:0
(XEN)     IRQ 18 Vec 46:
(XEN)       Apic 0x00, Pin 18: vec=2e delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 19 Vec 57:
(XEN)       Apic 0x00, Pin 19: vec=39 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 20 Vec 41:
(XEN)       Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 22 Vec213:
(XEN)       Apic 0x00, Pin 22: vec=d5 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 23 Vec155:
(XEN)       Apic 0x00, Pin 23: vec=9b delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) .......    : physical APIC id: 02
(XEN) .......    : Delivery Type: 0
(XEN) .......    : LTS          : 0
(XEN) .... register #01: 00170020
(XEN) .......     : max redirection entries: 0017
(XEN) .......     : PRQ implemented: 0
(XEN) .......     : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN)  00 000 00  1    0    0   0   0    0    0    00
(XEN)  01 000 00  0    0    0   0   0    1    1    38
(XEN)  02 000 00  0    0    0   0   0    1    1    F0
(XEN)  03 000 00  0    0    0   0   0    1    1    40
(XEN)  04 000 00  0    0    0   0   0    1    1    48
(XEN)  05 000 00  0    0    0   0   0    1    1    50
(XEN)  06 000 00  0    0    0   0   0    1    1    58
(XEN)  07 000 00  0    0    0   0   0    1    1    60
(XEN)  08 000 00  0    0    0   0   0    1    1    68
(XEN)  09 000 00  0    1    0   0   0    1    1    70
(XEN)  0a 000 00  0    0    0   0   0    1    1    78
(XEN)  0b 000 00  0    0    0   0   0    1    1    88
(XEN)  0c 000 00  0    0    0   0   0    1    1    90
(XEN)  0d 000 00  0    0    0   0   0    1    1    98
(XEN)  0e 000 00  0    0    0   0   0    1    1    A0
(XEN)  0f 000 00  0    0    0   0   0    1    1    A8
(XEN)  10 000 00  0    1    1   1   1    1    1    DD
(XEN)  11 000 00  1    0    0   0   0    0    0    00
(XEN)  12 000 00  0    1    0   1   0    1    1    2E
(XEN)  13 000 00  1    1    0   1   0    1    1    39
(XEN)  14 000 00  1    1    0   1   0    1    1    29
(XEN)  15 07A 0A  1    0    0   0   0    0    2    B4
(XEN)  16 000 00  0    1    0   1   0    1    1    D5
(XEN)  17 000 00  0    1    0   1   0    1    1    9B
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ56 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ72 -> 0:4
(XEN) IRQ80 -> 0:5
(XEN) IRQ88 -> 0:6
(XEN) IRQ96 -> 0:7
(XEN) IRQ104 -> 0:8
(XEN) IRQ112 -> 0:9
(XEN) IRQ120 -> 0:10
(XEN) IRQ136 -> 0:11
(XEN) IRQ144 -> 0:12
(XEN) IRQ152 -> 0:13
(XEN) IRQ160 -> 0:14
(XEN) IRQ168 -> 0:15
(XEN) IRQ221 -> 0:16
(XEN) IRQ46 -> 0:18
(XEN) IRQ57 -> 0:19
(XEN) IRQ41 -> 0:20
(XEN) IRQ213 -> 0:22
(XEN) IRQ155 -> 0:23
(XEN) .................................... done.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) CA-107844****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing crash image

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12 13:54                       ` Thimo E
@ 2013-08-12 14:06                         ` Andrew Cooper
  2013-08-13  1:43                           ` Zhang, Yang Z
  2013-08-13 11:39                         ` Wu, Feng
  1 sibling, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-08-12 14:06 UTC (permalink / raw)
  To: Thimo E
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Yang Z, Zhang, Xiantao


[-- Attachment #1.1: Type: text/plain, Size: 2110 bytes --]

On 12/08/13 14:54, Thimo E wrote:
> Hello Yang,
>
> and attached is the next crash dump, which occurred today, only a few
> minutes after I created the log files I sent in my previous mail.
> Perhaps together with those log files it gives you a better
> understanding of what is going on.
>
> I've disabled interrupt remapping now.
>
> > 4. ...
> > can you add some debug messages in the guest EOI code path (like
> > _irq_guest_eoi()) to track the EOI?
> @Andrew: Is it possible for you to integrate the changes Yang requested
> into your Xen debugging version?

I already have.  That would be "Marked {foo} ready" debugging in the
PEOI stack section.
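The "All LAPIC state" rows in the crash dump can be cross-checked by hand. The debug patch's `apic_tmr_read()` and `apic_irr_read()`, which appear to mirror the existing `apic_isr_read()`, find a vector's bit in the register at `base + ((vector & ~0x1f) >> 1)`, bit `vector & 0x1f`: each 32-bit ISR/TMR/IRR register covers 32 vectors, and the registers are 0x10 apart. A minimal offline sketch of the same arithmetic, applied to the printed words rather than live registers (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the 32-bit ISR/TMR/IRR register holding this vector, relative
 * to the first register of the bank (same expression as the debug patch). */
static unsigned int lapic_bank_offset(uint8_t vector)
{
    return (unsigned int)(vector & ~0x1f) >> 1;
}

/* Bit position of the vector within that 32-bit register. */
static unsigned int lapic_bank_bit(uint8_t vector)
{
    return vector & 0x1f;
}

/*
 * Test a vector's bit in a dumped bank, where words[i] is the value printed
 * on the "[i*32+31 : i*32]" row of the crash dump.
 */
static int lapic_dump_test(const uint32_t words[8], uint8_t vector)
{
    unsigned int idx = lapic_bank_offset(vector) / 0x10;

    assert(idx < 8);
    return (int)((words[idx] >> lapic_bank_bit(vector)) & 1u);
}
```

For vector 0x26 this selects the `[3f:20]` row and bit 6. The ISR word there is 00000040, so 0x26 is still in service at the local APIC, while bit 6 of the TMR word 74fe50aa is clear (edge-triggered), matching the `ISR 00000001, TMR 00000000` values printed for `s[0]`.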

~Andrew

[-- Attachment #1.2: Type: text/html, Size: 12593 bytes --]

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12 14:06                         ` Andrew Cooper
@ 2013-08-13  1:43                           ` Zhang, Yang Z
  2013-08-13  6:39                             ` Thimo E.
  0 siblings, 1 reply; 63+ messages in thread
From: Zhang, Yang Z @ 2013-08-13  1:43 UTC (permalink / raw)
  To: Andrew Cooper, Thimo E
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Xiantao

Andrew Cooper wrote on 2013-08-12:
> I already have.  That would be "Marked {foo} ready" debugging in the
> PEOI stack section.
I didn't find your debug patch that adds PEOI stack tracing. Could you resend it? Thanks.

Best regards,
Yang

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-13  1:43                           ` Zhang, Yang Z
@ 2013-08-13  6:39                             ` Thimo E.
  0 siblings, 0 replies; 63+ messages in thread
From: Thimo E. @ 2013-08-13  6:39 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

[-- Attachment #1: Type: text/plain, Size: 447 bytes --]

Hello,

Andrew sent it yesterday in another branch of this thread; attached
you'll find the patch that corresponds to my debugging output.

Best regards
   Thimo

On 13.08.2013 03:43, Zhang, Yang Z wrote:
> Andrew Cooper wrote on 2013-08-12:
>> I already have.  That would be "Marked {foo} ready" debugging in the
>> PEOI stack section.
> I didn't find your debug patch that adds PEOI stack tracing. Could you resend it? Thanks.
>


[-- Attachment #2: ca-107844-debug.patch --]
[-- Type: text/plain, Size: 5875 bytes --]

# HG changeset patch
# Parent bbd6b6d05c06f6331974467467cf567d60915b3d

diff -r bbd6b6d05c06 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1176,7 +1176,7 @@ static inline void UNEXPECTED_IO_APIC(vo
 {
 }
 
-static void /*__init*/ __print_IO_APIC(void)
+void /*__init*/ __print_IO_APIC(void)
 {
     int apic, i;
     union IO_APIC_reg_00 reg_00;
diff -r bbd6b6d05c06 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1003,6 +1003,46 @@ static void irq_guest_eoi_timer_fn(void 
     spin_unlock_irqrestore(&desc->lock, flags);
 }
 
+struct peoi_record {
+    enum { PEOI_PUSH,
+           PEOI_SETREADY,
+           PEOI_FLUSH,
+           PEOI_POP } action;
+    unsigned sp, irq, vector;
+};
+
+static void print_peoi_record(const struct peoi_record *r)
+{
+    switch ( r->action )
+    {
+    case PEOI_PUSH:
+        printk("  Pushed {sp %d, irq %d, vec 0x%02x}\n",
+               r->sp, r->irq, r->vector);
+        break;
+    case PEOI_SETREADY:
+        printk("  Marked {sp %d, irq %d, vec 0x%02x} ready\n",
+               r->sp, r->irq, r->vector);
+        break;
+    case PEOI_FLUSH:
+        printk("  Fushed %d -> 0 \n", r->sp);
+        break;
+    case PEOI_POP:
+        printk("  Poped entry {sp %d, irq %d, vec 0x%02x}\n",
+               r->sp, r->irq, r->vector);
+        break;
+    default:
+        printk("  Unknown: {%d, %d, %d, 0x%02x}\n",
+               r->action, r->sp, r->irq, r->vector);
+        break;
+    }
+}
+
+#define NR_PEOI_RECORDS 32
+static DEFINE_PER_CPU(struct peoi_record, _peoi_dbg[NR_PEOI_RECORDS]) = {{0}};
+static DEFINE_PER_CPU(unsigned int, _peoi_dbg_idx) = 0;
+
+static void dump_irqs(unsigned char key);
+void __print_IO_APIC(void);
 static void __do_IRQ_guest(int irq)
 {
     struct irq_desc         *desc = irq_to_desc(irq);
@@ -1024,13 +1064,53 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
-        ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
+        if ( unlikely( !((sp == 0) || (peoi[sp-1].vector < vector)) ))
+        {
+            int p;
+            unsigned i, idx;
+            printk("**Pending EOI error\n");
+            printk("  irq %d, vector 0x%x\n", irq, vector);
+
+            for ( p = sp-1; p >= 0; --p )
+            {
+                printk("  s[%d] irq %d, vec 0x%x, ready %u, "
+                       "ISR %08"PRIx32", TMR %08"PRIx32", IRR %08"PRIx32"\n",
+                       p, peoi[p].irq, peoi[p].vector, peoi[p].ready,
+                       apic_isr_read(peoi[p].vector),
+                       apic_tmr_read(peoi[p].vector),
+                       apic_irr_read(peoi[p].vector) );
+            }
+
+            printk("All LAPIC state:\n");
+            printk("[vector] %8s %8s %8s\n", "ISR", "TMR", "IRR");
+            for ( i = 0; i < APIC_ISR_NR; ++i )
+                printk("[%02x:%02x] %08"PRIx32" %08"PRIx32" %08"PRIx32"\n",
+                       (i * 32)+31, i*32,
+                       apic_read(APIC_ISR + i*0x10),
+                       apic_read(APIC_TMR + i*0x10),
+                       apic_read(APIC_IRR + i*0x10) );
+
+            printk("Peoi stack trace records:\n");
+            idx = this_cpu(_peoi_dbg_idx);
+            for ( i = 1; i <= NR_PEOI_RECORDS; ++i )
+                print_peoi_record(&this_cpu(_peoi_dbg)[(idx - i) &
+                                                       (NR_PEOI_RECORDS-1)] );
+
+            spin_unlock(&desc->lock);
+            dump_irqs('i');
+            __print_IO_APIC();
+
+            panic("CA-107844");
+        }
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
         peoi[sp].irq = irq;
         peoi[sp].vector = vector;
         peoi[sp].ready = 0;
         pending_eoi_sp(peoi) = sp+1;
         cpu_set(smp_processor_id(), action->cpu_eoi_map);
+
+        this_cpu(_peoi_dbg)[(this_cpu(_peoi_dbg_idx)++) & (NR_PEOI_RECORDS-1)]
+            = (struct peoi_record){PEOI_PUSH, sp, irq, peoi[sp].vector};
     }
 
     for ( i = 0; i < action->nr_guests; i++ )
@@ -1130,6 +1210,9 @@ static void flush_ready_eoi(void)
         spin_lock(&desc->lock);
         desc->handler->end(irq, peoi[sp].vector);
         spin_unlock(&desc->lock);
+
+        this_cpu(_peoi_dbg)[(this_cpu(_peoi_dbg_idx)++) & (NR_PEOI_RECORDS-1)]
+            = (struct peoi_record){PEOI_POP, sp+1, irq, peoi[sp].vector};
     }
 
     pending_eoi_sp(peoi) = sp+1;
@@ -1155,6 +1238,9 @@ static void __set_eoi_ready(struct irq_d
     } while ( peoi[--sp].irq != irq );
     ASSERT(!peoi[sp].ready);
     peoi[sp].ready = 1;
+
+    this_cpu(_peoi_dbg)[(this_cpu(_peoi_dbg_idx)++) & (NR_PEOI_RECORDS-1)]
+        = (struct peoi_record){PEOI_SETREADY, sp, irq, desc->chip_data->vector};
 }
 
 /* Mark specified IRQ as ready-for-EOI (if it really is) and attempt to EOI. */
@@ -1976,6 +2062,8 @@ void fixup_irqs(void)
 
     /* Flush the interrupt EOI stack. */
     peoi = this_cpu(pending_eoi);
+    this_cpu(_peoi_dbg)[(this_cpu(_peoi_dbg_idx)++) & (NR_PEOI_RECORDS-1)]
+        = (struct peoi_record){PEOI_FLUSH, pending_eoi_sp(peoi)};
     for ( sp = 0; sp < pending_eoi_sp(peoi); sp++ )
         peoi[sp].ready = 1;
     flush_ready_eoi();
diff -r bbd6b6d05c06 xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -152,6 +152,18 @@ static __inline bool_t apic_isr_read(u8 
             (vector & 0x1f)) & 1;
 }
 
+static __inline bool_t apic_tmr_read(u8 vector)
+{
+    return (apic_read(APIC_TMR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
+static __inline bool_t apic_irr_read(u8 vector)
+{
+    return (apic_read(APIC_IRR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
 static __inline u32 get_apic_id(void) /* Get the physical APIC id */
 {
     u32 id = apic_read(APIC_ID);

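The debug patch logs PEOI operations into a per-CPU ring of `NR_PEOI_RECORDS` (32) entries. Each record is written at `idx++ & (NR_PEOI_RECORDS-1)`, and the dump walks `(idx - i) & (NR_PEOI_RECORDS-1)` for `i = 1 .. NR_PEOI_RECORDS`, so the newest record is printed first and the "Peoi stack trace records" in the crash log read backwards in time. The masking only works because 32 is a power of two, which also makes the unsigned wraparound of `idx` harmless. A standalone sketch of just that indexing, with hypothetical names and `int` payloads in place of `struct peoi_record`:

```c
#include <assert.h>

#define RING_SIZE 32u   /* must be a power of two for the masking below */
_Static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
               "RING_SIZE must be a power of two");

struct ring {
    int rec[RING_SIZE];
    unsigned int idx;   /* free-running write counter; wraps harmlessly */
};

/* Record a value, overwriting the oldest entry once the ring is full. */
static void ring_put(struct ring *r, int value)
{
    r->rec[r->idx++ & (RING_SIZE - 1)] = value;
}

/* Return the n-th most recent record; n = 1 is the newest (n <= RING_SIZE). */
static int ring_recent(const struct ring *r, unsigned int n)
{
    assert(n >= 1 && n <= RING_SIZE);
    return r->rec[(r->idx - n) & (RING_SIZE - 1)];
}
```

With this ordering, the first line of the trace in the crash dump, `Pushed {sp 0, irq 29, vec 0x26}`, is the most recent operation recorded before the failing push.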
* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12 13:54                       ` Thimo E
  2013-08-12 14:06                         ` Andrew Cooper
@ 2013-08-13 11:39                         ` Wu, Feng
  2013-08-13 12:46                           ` Andrew Cooper
  1 sibling, 1 reply; 63+ messages in thread
From: Wu, Feng @ 2013-08-13 11:39 UTC (permalink / raw)
  To: Thimo E, Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao


[-- Attachment #1.1: Type: text/plain, Size: 2512 bytes --]

Hi Thimo,

I am trying to reproduce this issue on my side; unfortunately, I failed to boot a RHEL 6.4 guest on top of Xen 4.1.5 RC1 with a 3.9.3 domain0 kernel. Since Xen 4.1.5 is a little old, could you please share the guest configuration file you were using when this issue happened? Thanks a lot!

Thanks,
Feng

[-- Attachment #1.2: Type: text/html, Size: 13147 bytes --]

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-13 11:39                         ` Wu, Feng
@ 2013-08-13 12:46                           ` Andrew Cooper
  0 siblings, 0 replies; 63+ messages in thread
From: Andrew Cooper @ 2013-08-13 12:46 UTC (permalink / raw)
  To: Wu, Feng
  Cc: Thimo E, Keir Fraser, Nakajima, Jun, Dong, Eddie, Xen-develList,
	Jan Beulich, Zhang, Yang Z, Zhang, Xiantao


[-- Attachment #1.1: Type: text/plain, Size: 3091 bytes --]

On 13/08/13 12:39, Wu, Feng wrote:
>
> Hi Thimo,
>
> I am trying to reproduce this issue on my side; unfortunately, I
> failed to boot a RHEL 6.4 guest on top of Xen 4.1.5 RC1 with a 3.9.3
> domain0 kernel. Since Xen 4.1.5 is a little old, could you please
> share the guest configuration file you were using when this issue
> happened? Thanks a lot!
>
> Thanks,
>
> Feng
>

Stepping in here for a moment: Thimo is running XenServer 6.2.

This issue started on the XenServer forums but moved here.  For
reference, we found this once in XenServer testing (as seen at the root
of this email thread), but I have been unable to reproduce the issue
since.  We have seen the crash on Xen 4.1 and 4.2.

~Andrew

>
> From: xen-devel-bounces@lists.xen.org
> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Thimo E
> Sent: Monday, August 12, 2013 9:55 PM
> To: Zhang, Yang Z
> Cc: Keir Fraser; Jan Beulich; Andrew Cooper; Dong, Eddie;
> Xen-develList; Nakajima, Jun; Zhang, Xiantao
> Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local
> apic
>
> Hello Yang,
>
> And attached is the next crash dump, which occurred today, only a few
> minutes after I created the log files I sent in my previous mail.
> Perhaps together with the log files from that mail it gives you a
> better understanding of what is going on.
>
> I've disabled interrupt remapping now.
>
> > 4.....
> > can you add some debug message in the guest EOI code path (like
> > _irq_guest_eoi()) to track the EOI?
> @Andrew: Is it possible for you to integrate the requested changes
> from Yang into your Xen debugging version?
>
> Best regards
>   Thimo
>
> On 12.08.2013 10:49, Zhang, Yang Z wrote:
>
>     Hi Thimo,
>
>     From your previous experience and log, it shows:
>
>     1.      The interrupt that triggers the issue is an MSI.
>
>     2.      MSIs are normally treated as edge-triggered interrupts,
>     except when there is no way to mask the device. In this case, your
>     previous log indicates the device is unmaskable (what special
>     device are you using? Modern PCI devices should be maskable).
>
>     3.      IRQ 29 belongs to dom0, so it does not seem to be an
>     HVM-related issue.
>
>     4.      The status of IRQ 29 is 10, which means the guest has
>     already issued the EOI because the IRQ_GUEST_EOI_PENDING bit is
>     cleared, so there should be no pending EOI in the EOI stack. If
>     possible, can you add some debug messages in the guest EOI code
>     path (like _irq_guest_eoi()) to track the EOI?
>
>     5.      Both logs show that when the issue occurred, most of the
>     other interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is
>     it a coincidence, or does it happen only under special conditions
>     such as heavy IRQ migration? Perhaps you can disable irqbalance in
>     dom0 and pin the IRQ manually.
>
>     6.      I guess interrupt remapping is enabled on your machine. Can
>     you try disabling IR to see whether it is still reproducible?
>
> Also, please provide the whole Xen log.
>
> Best regards,
>
> Yang
>


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12 10:27                     ` Andrew Cooper
@ 2013-08-14  2:53                       ` Zhang, Yang Z
  2013-08-14  7:51                         ` Thimo E.
  2013-08-14  9:52                         ` Andrew Cooper
  0 siblings, 2 replies; 63+ messages in thread
From: Zhang, Yang Z @ 2013-08-14  2:53 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: xen-devel, Thimo E., Keir Fraser

Andrew Cooper wrote on 2013-08-12:
> On 12/08/13 11:05, Jan Beulich wrote:
>>>>> On 12.08.13 at 11:28, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 12/08/13 09:20, Jan Beulich wrote:
>>>>>>> On 09.08.13 at 23:27, "Thimo E." <abc@digithi.de> wrote:
>>>>> (XEN) **Pending EOI error
>>>>> (XEN)   irq 29, vector 0x24
>>>>> (XEN)   s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>>>> (XEN) All LAPIC state:
>>>>> (XEN) [vector]      ISR      TMR      IRR
>>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>>> (XEN) [3f:20] 00000010 76efa12e 00000000
>>>>> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
>>>>> (XEN) [7f:60] 00000000 32d096ca 00000000
>>>>> (XEN) [9f:80] 00000000 78fcf87a 00000000
>>>>> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
>>>>> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
>>>>> (XEN) [ff:e0] 00000000 00000000 00000000
>>>>> (XEN) Peoi stack trace records:
>>>> Mind providing (a link to) the patch that was used here, so that
>>>> one can make sense of the printed information (and perhaps also
>>>> suggest adjustments to that debugging code)? Nothing I was able to
>>>> find on the list fully matches the output above...
>>> Attached
>> Thanks. Actually, the second case he sent has an interesting
>> difference:
>> 
>> (XEN)   s[0] irq 29, vec 0x26, ready 0, ISR 00000001, TMR 00000000, IRR 00000001
>> 
>> i.e. we in fact have _three_ instances of the interrupt (two
>> in-service, and one request). I don't see an explanation for this
>> other than buggy hardware. Sadly we still don't know what device it
>> is that is behaving that way (including the confirmation that it's a
>> non-maskable MSI one).
>> 
>> Jan
>> 
> 
> On the XenServer hardware where we have seen this issue, the
> problematic interrupt was from:
> 
> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 02)
> 	Subsystem: Intel Corporation Device 0000
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0
> 	Interrupt: pin A routed to IRQ 1275
> 	Region 0: Memory at c2700000 (32-bit, non-prefetchable) [size=128K]
> 	Region 1: Memory at c273e000 (32-bit, non-prefetchable) [size=4K]
> 	Region 2: I/O ports at 7080 [size=32]
> 	Capabilities: [c8] Power Management version 2
> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
> 	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> 		Address: 00000000fee00318  Data: 0000
> 	Capabilities: [e0] PCI Advanced Features
> 		AFCap: TP+ FLR+
> 		AFCtrl: FLR-
> 		AFStatus: TP-
> 	Kernel driver in use: e1000e
> 	Kernel modules: e1000e
> 
> I am still attempting to reproduce the issue, but we haven't seen it
> again since my email at the root of this thread.
Did you see the issue on other HSW machines without this NIC? Also, Thimo, have you tried pinning the vCPUs and stopping irqbalance in dom0?

> 
> ~Andrew


Best regards,
Yang
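The `s[0] ... ready 0` lines and the Pushed/Poped/Marked ready records printed by the debugging patch describe Xen's per-CPU pending-EOI ("peoi") stack, and the assertion at the root of this thread, `(sp == 0) || (peoi[sp-1].vector < vector)`, requires each newly pushed vector to be strictly higher than the entry below it. The following is a simplified C model of that invariant only, not Xen's `irq.c`: the type and function names and the fixed depth are hypothetical, and "ready" stands for "the guest has EOI'd this entry".

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PEOI_DEPTH 16 /* hypothetical depth, not Xen's real limit */

struct model_peoi_entry {
    int irq;
    unsigned vector;
    bool ready; /* set once the guest has EOI'd this interrupt */
};

struct model_peoi_stack {
    struct model_peoi_entry e[MODEL_PEOI_DEPTH];
    int sp; /* number of live entries; e[sp - 1] is the top */
};

/* Record a newly delivered interrupt whose LAPIC EOI is deferred until the
 * guest EOIs it. Returns false where do_IRQ()'s assertion
 * (sp == 0) || (peoi[sp-1].vector < vector) would fire. */
static bool model_peoi_push(struct model_peoi_stack *s, int irq,
                            unsigned vector)
{
    assert(s->sp >= 0 && s->sp <= MODEL_PEOI_DEPTH);
    if (s->sp == MODEL_PEOI_DEPTH)
        return false;
    if (s->sp != 0 && s->e[s->sp - 1].vector >= vector)
        return false;
    s->e[s->sp++] = (struct model_peoi_entry){ irq, vector, false };
    return true;
}

/* Guest EOI for a vector: mark its entry ready, then pop ready entries from
 * the top only, returning how many LAPIC EOIs the model would issue. */
static int model_peoi_guest_eoi(struct model_peoi_stack *s, unsigned vector)
{
    int eois = 0;

    for (int i = 0; i < s->sp; i++)
        if (s->e[i].vector == vector)
            s->e[i].ready = true;
    while (s->sp > 0 && s->e[s->sp - 1].ready) {
        s->sp--;
        eois++;
    }
    return eois;
}
```

Replaying the state from the 20 August dump in this thread (irq 30 / vector 0x31 still pending at `s[0]` when irq 29 / vector 0x21 arrives), the model rejects the second push, which corresponds to the condition behind the assertion.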

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-14  2:53                       ` Zhang, Yang Z
@ 2013-08-14  7:51                         ` Thimo E.
  2013-08-14  9:52                         ` Andrew Cooper
  1 sibling, 0 replies; 63+ messages in thread
From: Thimo E. @ 2013-08-14  7:51 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, xen-devel

Hello,

On the last reboot I disabled interrupt remapping, and since then it has
not crashed (but the crashes have taken anywhere from 3 hours to 7 days
to appear). So I am still waiting to see whether that option helped.

Here is my plan, based on the conclusions from this thread:
1) If the server crashes again, I'll set dom0_max_vcpus=1 and
dom0_vcpus_pin.
Could the problem also be a driver bug? Another idea is to update
the e1000e driver from 2.3.2-NAPI to 2.4.14.

2) I have two Intel NICs in the server, one onboard and one PCIe NIC.
The crash seems to come from the onboard device. So if the server
crashes again, I'll disable the onboard Intel NIC and put another PCIe
network card in.

Best regards
   Thimo

On 14.08.2013 04:53, Zhang, Yang Z wrote:
>
> Did you see the issue on other HSW machines without this NIC? Also, Thimo, have you tried pinning the vCPUs and stopping irqbalance in dom0?
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-14  2:53                       ` Zhang, Yang Z
  2013-08-14  7:51                         ` Thimo E.
@ 2013-08-14  9:52                         ` Andrew Cooper
  2013-09-07 13:27                           ` Thimo E.
  1 sibling, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-08-14  9:52 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: xen-devel, Thimo E., Keir Fraser, Jan Beulich

On 14/08/13 03:53, Zhang, Yang Z wrote:
> Andrew Cooper wrote on 2013-08-12:
>> On 12/08/13 11:05, Jan Beulich wrote:
>>>>>> On 12.08.13 at 11:28, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> On 12/08/13 09:20, Jan Beulich wrote:
>>>>>>>> On 09.08.13 at 23:27, "Thimo E." <abc@digithi.de> wrote:
>>>>>> (XEN) **Pending EOI error
>>>>>> (XEN)   irq 29, vector 0x24
>>>>>> (XEN)   s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>>>>> (XEN) All LAPIC state:
>>>>>> (XEN) [vector]      ISR      TMR      IRR
>>>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>>>> (XEN) [3f:20] 00000010 76efa12e 00000000
>>>>>> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
>>>>>> (XEN) [7f:60] 00000000 32d096ca 00000000
>>>>>> (XEN) [9f:80] 00000000 78fcf87a 00000000
>>>>>> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
>>>>>> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
>>>>>> (XEN) [ff:e0] 00000000 00000000 00000000
>>>>>> (XEN) Peoi stack trace records:
>>>>> Mind providing (a link to) the patch that was used here, so that
>>>>> one can make sense of the printed information (and perhaps also
>>>>> suggest adjustments to that debugging code)? Nothing I was able to
>>>>> find on the list fully matches the output above...
>>>> Attached
>>> Thanks. Actually, the second case he sent has an interesting
>>> difference:
>>>
>>> (XEN)   s[0] irq 29, vec 0x26, ready 0, ISR 00000001, TMR 00000000, IRR 00000001
>>>
>>> i.e. we in fact have _three_ instances of the interrupt (two
>>> in-service, and one request). I don't see an explanation for this
>>> other than buggy hardware. Sadly we still don't know what device it
>>> is that is behaving that way (including the confirmation that it's a
>>> non-maskable MSI one).
>>>
>>> Jan
>>>
>> On the XenServer hardware where we have seen this issue, the
>> problematic interrupt was from:
>>
>> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 02)
>> 	Subsystem: Intel Corporation Device 0000
>> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> 	Latency: 0
>> 	Interrupt: pin A routed to IRQ 1275
>> 	Region 0: Memory at c2700000 (32-bit, non-prefetchable) [size=128K]
>> 	Region 1: Memory at c273e000 (32-bit, non-prefetchable) [size=4K]
>> 	Region 2: I/O ports at 7080 [size=32]
>> 	Capabilities: [c8] Power Management version 2
>> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>> 	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> 		Address: 00000000fee00318  Data: 0000
>> 	Capabilities: [e0] PCI Advanced Features
>> 		AFCap: TP+ FLR+
>> 		AFCtrl: FLR-
>> 		AFStatus: TP-
>> 	Kernel driver in use: e1000e
>> 	Kernel modules: e1000e
>>
>> I am still attempting to reproduce the issue, but we haven't seen it
>> again since my email at the root of this thread.
> Did you see the issue on other HSW machines without this NIC? Also, Thimo, have you tried pinning the vCPUs and stopping irqbalance in dom0?

We do not have any Haswell hardware without this NIC.
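The `MSI: Enable+ Count=1/1 Maskable- 64bit+` line in the lspci output quoted above is lspci's summary of the device's MSI Message Control register, and `Maskable-` means bit 8 (per-vector masking capable) is clear, which fits Yang's observation that the device cannot be masked. A small C sketch that rebuilds a control word from those lspci flags and tests the masking bit; the helper names are hypothetical, the bit layout is the one in the PCI Local Bus Specification, and the value 0x0081 is reconstructed from the lspci summary rather than read from the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSI Message Control register bits (PCI Local Bus Specification 3.0). */
#define MSI_CTRL_ENABLE    0x0001u /* bit 0: MSI Enable */
#define MSI_CTRL_MMC_SHIFT 1       /* bits 3:1: Multiple Message Capable, log2 */
#define MSI_CTRL_MME_SHIFT 4       /* bits 6:4: Multiple Message Enable, log2 */
#define MSI_CTRL_64BIT     0x0080u /* bit 7: 64-bit address capable */
#define MSI_CTRL_MASKABLE  0x0100u /* bit 8: per-vector masking capable */

/* log2 of a power-of-two vector count in 1..32. */
static unsigned count_log2(unsigned n)
{
    unsigned l = 0;

    assert(n >= 1 && n <= 32 && (n & (n - 1)) == 0);
    while ((1u << l) < n)
        l++;
    return l;
}

/* Rebuild the Message Control word that lspci summarises as
 * "MSI: Enable? Count=enabled/capable Maskable? 64bit?". */
static uint16_t msi_ctrl_from_lspci(bool enable, unsigned count_enabled,
                                    unsigned count_capable, bool maskable,
                                    bool addr64)
{
    uint16_t w = 0;

    if (enable)
        w |= MSI_CTRL_ENABLE;
    w |= (uint16_t)(count_log2(count_capable) << MSI_CTRL_MMC_SHIFT);
    w |= (uint16_t)(count_log2(count_enabled) << MSI_CTRL_MME_SHIFT);
    if (addr64)
        w |= MSI_CTRL_64BIT;
    if (maskable)
        w |= MSI_CTRL_MASKABLE;
    return w;
}

/* Without bit 8 the capability has no Mask Bits register, so software has
 * no way to mask this function's MSI at the device. */
static bool msi_is_maskable(uint16_t ctrl)
{
    return (ctrl & MSI_CTRL_MASKABLE) != 0;
}
```

For the I217-LM line (`Enable+ Count=1/1 Maskable- 64bit+`), `msi_ctrl_from_lspci(true, 1, 1, false, true)` gives 0x0081, and `msi_is_maskable()` reports false for it.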

~Andrew

>
> Best regards,
> Yang
>
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-12 12:04                         ` Andrew Cooper
@ 2013-08-19 15:14                           ` Thimo E.
  2013-08-20  5:43                             ` Thimo Eichstädt
  0 siblings, 1 reply; 63+ messages in thread
From: Thimo E. @ 2013-08-19 15:14 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Yang Z, Zhang, Xiantao


Hello,

After one week of testing, an intermediate result:

Since I set iommu=no-intremap, no crash has occurred so far. The server
has never run this long without a crash. So a cautious "it's working",
but because only 7 days have passed so far, not a final hooray.

Even if this option really avoids the problem, I classify it as nothing
more than a workaround: obviously a good one because it works, but
still a workaround.

Where could the source of the problem be? A bug in hardware? A bug in
software?

And what does interrupt remapping really do? Does disabling remapping
have a performance impact?

Best regards
   Thimo

On 12.08.2013 14:04, Andrew Cooper wrote:
> On 12/08/13 12:52, Thimo E wrote:
>> Hello Yang,
>>
>> Attached you'll find the kernel dmesg, Xen dmesg, lspci and output of
>> /proc/interrupts. If you want to see further log files, please let me
>> know.
>>
>> The processor is a Core i5-4670. The board is an Intel DH87MC
>> mainboard. I am really not sure whether it supports APICv, but VT-d is
>> supported and enabled.
>>
>>
>>> 4. The status of IRQ 29 is 10, which means the guest has already issued
>>> the EOI because the IRQ_GUEST_EOI_PENDING bit is cleared, so there
>>> should be no pending EOI in the EOI stack. If possible, can you add
>>> some debug messages in the guest EOI code path (like
>>> _irq_guest_eoi()) to track the EOI?
>>>
>> I don't see IRQ 29 in /proc/interrupts; what I see is:
>> cat xen-dmesg.txt | grep "29":  (XEN) allocated vector 29 for irq 20
>> cat dmesg.txt | grep "eth0":    [   23.152355] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
>>                                 [   23.330408] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
>>
>> So is the Ethernet IRQ the bad one? That is an onboard Intel network
>> adapter.
>
> That would be consistent with the crash seen on our hardware in
> XenServer.
>
>>
>>> 6. I guess interrupt remapping is enabled on your machine. Can
>>> you try disabling IR to see whether it is still reproducible?
>>>
>> Just to be sure, your proposal is to try the parameter "no-intremap"?
>
> Specifically, iommu=no-intremap
>
>>
>> Best regards
>>   Thimo
>
> ~Andrew
>
>>
>> On 12.08.2013 10:49, Zhang, Yang Z wrote:
>>>
>>> Hi Thimo,
>>>
>>> From your previous experience and log, it shows:
>>>
>>> 1. The interrupt that triggers the issue is an MSI.
>>>
>>> 2. MSIs are normally treated as edge-triggered interrupts, except when
>>> there is no way to mask the device. In this case, your previous log
>>> indicates the device is unmaskable (what special device are you
>>> using? Modern PCI devices should be maskable).
>>>
>>> 3. IRQ 29 belongs to dom0, so it does not seem to be an HVM-related issue.
>>>
>>> 4. The status of IRQ 29 is 10, which means the guest has already issued
>>> the EOI because the IRQ_GUEST_EOI_PENDING bit is cleared, so there
>>> should be no pending EOI in the EOI stack. If possible, can you add
>>> some debug messages in the guest EOI code path (like
>>> _irq_guest_eoi()) to track the EOI?
>>>
>>> 5. Both logs show that when the issue occurred, most of the other
>>> interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is it a
>>> coincidence, or does it happen only under special conditions such as
>>> heavy IRQ migration? Perhaps you can disable irqbalance in dom0
>>> and pin the IRQ manually.
>>>
>>> 6. I guess interrupt remapping is enabled on your machine. Can
>>> you try disabling IR to see whether it is still reproducible?
>>>
>>> Also, please provide the whole Xen log.
>>>
>>> Best regards,
>>>
>>> Yang
>>>
>>
>


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-19 15:14                           ` Thimo E.
@ 2013-08-20  5:43                             ` Thimo Eichstädt
  2013-08-20  8:40                               ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: Thimo Eichstädt @ 2013-08-20  5:43 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Yang Z, Zhang, Xiantao


Hello again,

OK, I was happy too soon: it crashed again. Now I've set the following
Xen parameters:
   iommu=no-intremap dom0_max_vcpus=1-1 dom0_vcpus_pin noirqbalance

Best regards
   Thimo

Here is the crash dump:

(XEN) **Pending EOI error
(XEN)   irq 29, vector 0x21
(XEN)   s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
(XEN) All LAPIC state:
(XEN) [vector]      ISR      TMR      IRR
(XEN) [1f:00] 00000000 00000000 00000000
(XEN) [3f:20] 00020002 00000000 00000000
(XEN) [5f:40] 00000000 00000000 00000000
(XEN) [7f:60] 00000000 00000002 00000000
(XEN) [9f:80] 00000000 00000000 00000000
(XEN) [bf:a0] 00000000 01010000 00000000
(XEN) [df:c0] 00000000 01000000 00000000
(XEN) [ff:e0] 00000000 00000000 08000000
(XEN) Peoi stack trace records:
(XEN)   Pushed {sp 0, irq 30, vec 0x31}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 31, vec 0x71}
(XEN)   Marked {sp 0, irq 31, vec 0x71} ready
(XEN)   Pushed {sp 0, irq 31, vec 0x71}
(XEN)   Poped entry {sp 1, irq 30, vec 0x31}
(XEN)   Marked {sp 0, irq 30, vec 0x31} ready
(XEN)   Pushed {sp 0, irq 30, vec 0x31}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN)   Marked {sp 0, irq 29, vec 0x21} ready
(XEN)   Pushed {sp 0, irq 29, vec 0x21}
(XEN)   Poped entry {sp 1, irq 29, vec 0x21}
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:1 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:1 vec:38 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0:  1(----),
(XEN)    IRQ:   2 affinity:f vec:00 type=XT-PIC status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:1 vec:40 type=IO-APIC-edge status=00000006 mapped, unbound
(XEN)    IRQ:   4 affinity:1 vec:48 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:   5 affinity:1 vec:50 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0:  5(----),
(XEN)    IRQ:   6 affinity:1 vec:58 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:1 vec:60 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:   8 affinity:1 vec:68 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0:  8(----),
(XEN)    IRQ:   9 affinity:1 vec:70 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0:  9(----),
(XEN)    IRQ:  10 affinity:1 vec:78 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:1 vec:88 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:1 vec:90 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  13 affinity:1 vec:98 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:1 vec:a0 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:1 vec:a8 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:4 vec:b0 type=IO-APIC-level status=00000010 in-flight=0 domain-list=0: 16(----),
(XEN)    IRQ:  18 affinity:8 vec:b8 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 18(----),
(XEN)    IRQ:  19 affinity:f vec:29 type=IO-APIC-level status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:f vec:39 type=IO-APIC-level status=00000002 mapped, unbound
(XEN)    IRQ:  22 affinity:8 vec:61 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 22(----),
(XEN)    IRQ:  23 affinity:4 vec:d8 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 23(----),
(XEN)    IRQ:  24 affinity:1 vec:28 type=DMA_MSI status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:1 vec:30 type=DMA_MSI status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:f vec:c0 type=PCI-MSI status=00000002 mapped, unbound
(XEN)    IRQ:  27 affinity:f vec:c8 type=PCI-MSI status=00000002 mapped, unbound
(XEN)    IRQ:  28 affinity:f vec:d0 type=PCI-MSI status=00000002 mapped, unbound
(XEN)    IRQ:  29 affinity:4 vec:21 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:276(----),
(XEN)    IRQ:  30 affinity:4 vec:31 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:275(----),
(XEN)    IRQ:  31 affinity:8 vec:71 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:274(----),
(XEN)    IRQ:  32 affinity:4 vec:49 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:273(----),
(XEN)    IRQ:  33 affinity:8 vec:51 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:272(----),
(XEN)    IRQ:  34 affinity:1 vec:59 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:271(----),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  1 Vec 56:
(XEN)       Apic 0x00, Pin  1: vec=38 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  4 Vec 72:
(XEN)       Apic 0x00, Pin  4: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  5 Vec 80:
(XEN)       Apic 0x00, Pin  5: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  6 Vec 88:
(XEN)       Apic 0x00, Pin  6: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  7 Vec 96:
(XEN)       Apic 0x00, Pin  7: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  8 Vec104:
(XEN)       Apic 0x00, Pin  8: vec=68 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ  9 Vec112:
(XEN)       Apic 0x00, Pin  9: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:1
(XEN)     IRQ 10 Vec120:
(XEN)       Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 11 Vec136:
(XEN)       Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 12 Vec144:
(XEN)       Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 13 Vec152:
(XEN)       Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 14 Vec160:
(XEN)       Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 15 Vec168:
(XEN)       Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:1
(XEN)     IRQ 16 Vec176:
(XEN)       Apic 0x00, Pin 16: vec=b0 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:4
(XEN)     IRQ 18 Vec184:
(XEN)       Apic 0x00, Pin 18: vec=b8 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:8
(XEN)     IRQ 19 Vec 41:
(XEN)       Apic 0x00, Pin 19: vec=29 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:15
(XEN)     IRQ 20 Vec 57:
(XEN)       Apic 0x00, Pin 20: vec=39 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:15
(XEN)     IRQ 22 Vec 97:
(XEN)       Apic 0x00, Pin 22: vec=61 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:8
(XEN)     IRQ 23 Vec216:
(XEN)       Apic 0x00, Pin 23: vec=d8 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:4
(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) .......    : physical APIC id: 02
(XEN) .......    : Delivery Type: 0
(XEN) .......    : LTS          : 0
(XEN) .... register #01: 00170020
(XEN) .......     : max redirection entries: 0017
(XEN) .......     : PRQ implemented: 0
(XEN) .......     : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN)  00 000 00  1    0    0   0   0    0    0    00
(XEN)  01 001 01  0    0    0   0   0    1    1    38
(XEN)  02 001 01  0    0    0   0   0    1    1    F0
(XEN)  03 001 01  0    0    0   0   0    1    1    40
(XEN)  04 001 01  0    0    0   0   0    1    1    48
(XEN)  05 001 01  0    0    0   0   0    1    1    50
(XEN)  06 001 01  0    0    0   0   0    1    1    58
(XEN)  07 001 01  0    0    0   0   0    1    1    60
(XEN)  08 001 01  0    0    0   0   0    1    1    68
(XEN)  09 001 01  0    1    0   0   0    1    1    70
(XEN)  0a 001 01  0    0    0   0   0    1    1    78
(XEN)  0b 001 01  0    0    0   0   0    1    1    88
(XEN)  0c 001 01  0    0    0   0   0    1    1    90
(XEN)  0d 001 01  0    0    0   0   0    1    1    98
(XEN)  0e 001 01  0    0    0   0   0    1    1    A0
(XEN)  0f 001 01  0    0    0   0   0    1    1    A8
(XEN)  10 004 04  0    1    1   1   1    1    1    B0
(XEN)  11 000 00  1    0    0   0   0    0    0    00
(XEN)  12 008 08  0    1    0   1   0    1    1    B8
(XEN)  13 00F 0F  1    1    0   1   0    1    1    29
(XEN)  14 00F 0F  1    1    0   1   0    1    1    39
(XEN)  15 07A 0A  1    0    0   0   0    0    2    B4
(XEN)  16 008 08  0    1    0   1   0    1    1    61
(XEN)  17 004 04  0    1    0   1   0    1    1    D8
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ56 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ72 -> 0:4
(XEN) IRQ80 -> 0:5
(XEN) IRQ88 -> 0:6
(XEN) IRQ96 -> 0:7
(XEN) IRQ104 -> 0:8
(XEN) IRQ112 -> 0:9
(XEN) IRQ120 -> 0:10
(XEN) IRQ136 -> 0:11
(XEN) IRQ144 -> 0:12
(XEN) IRQ152 -> 0:13
(XEN) IRQ160 -> 0:14
(XEN) IRQ168 -> 0:15
(XEN) IRQ176 -> 0:16
(XEN) IRQ184 -> 0:18
(XEN) IRQ41 -> 0:19
(XEN) IRQ57 -> 0:20
(XEN) IRQ97 -> 0:22
(XEN) IRQ216 -> 0:23
(XEN) .................................... done.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) CA-107844****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing crash image
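Each row of the "All LAPIC state" dump covers the 32 vectors named in brackets. Assuming the debugging patch prints the raw register words, bit n of a word stands for vector (low end + n), as in the local APIC's own ISR, TMR and IRR registers. A minimal C sketch of that decoding (the function name is hypothetical): applied to the dump above, the ISR word 00020002 in row [3f:20] yields vectors 0x21 and 0x31, so irq 29 and irq 30 are both marked in service at the same time, and the IRR word 08000000 in row [ff:e0] yields a single pending vector, 0xfb.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 32-bit word from a "[hi:lo] ISR TMR IRR" row of the LAPIC
 * dump. Bit n of the word stands for vector lo + n, matching the layout of
 * the local APIC's own ISR/TMR/IRR registers. Writes the set vectors to
 * out[] in ascending order and returns how many there are. */
static int lapic_row_vectors(unsigned row_base, uint32_t word,
                             unsigned out[32])
{
    int n = 0;

    assert(row_base % 32 == 0 && row_base <= 0xe0);
    for (unsigned bit = 0; bit < 32; bit++)
        if (word & (UINT32_C(1) << bit))
            out[n++] = row_base + bit;
    return n;
}
```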


Am 19.08.2013 17:14, schrieb Thimo E.:
> Hello,
>
> after one week of testing an intermediate result:
>
> Since I've set iommu=no-intremap no crash occured so far. The server 
> never ran longer without a crash. So a careful "it's working", but, 
> because only one 7 days passed so far, not a final horray.
>
> Even if this option really avoids the problem I classify it as nothing 
> more than a workaround...obviously a good one because it's working, 
> but still a workaround.
>
> Where could the problem of the source be ? Bug in hardware ? Bug in 
> software ?
>
> And what does interrupt remapping really do ? Does disabling remapping 
> have a performance impact ?
>
> Best regards
>   Thimo
>
> Am 12.08.2013 14:04, schrieb Andrew Cooper:
>> On 12/08/13 12:52, Thimo E wrote:
>>> Hello Yang,
>>>
>>> attached you'll find the kernel dmesg, xen dmesg, lspci and output 
>>> of /proc/interrupts. If you want to see further logfiles, please let 
>>> me know.
>>>
>>> The processor is a Core i5-4670. The board is an Intel DH87MC 
>>> Mainboard. I am really not sure if it supports APICv, but VT-d is 
>>> supported enabled enabled.
>>>
>>>
>>>> 4.The status of IRQ 29 is 10 which means the guest already issues 
>>>> the EOI because the bit IRQ_GUEST_EOI_PENDING is cleared, so there 
>>>> should be no pending EOI in the EOI stack. If possible, can you add 
>>>> some debug message in the guest EOI code path(like 
>>>> _irq_guest_eoi())) to track the EOI?
>>>>
>>> I don't see the IRQ29 in /proc/interrupts, what I see is:
>>> cat xen-dmesg.txt |grep "29": (XEN) allocated vector 29 for irq 20
>>> cat dmesg.txt | grep "eth0": [   23.152355] e1000e 0000:00:19.0: PCI 
>>> INT A -> GSI 20 (level, low) -> IRQ 20
>>>                                                   [ 23.330408] 
>>> e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
>>>
>>> So is the ethernet irq the bad one ? That is an Onboard Intel 
>>> network adapter.
>>
>> That would be consistent with the crash seen with our hardware in 
>> XenServer
>>
>>>
>>>> 6.I guess the interrupt remapping is enabled in your machine. Can 
>>>> you try to disable IR to see whether it still reproduceable?
>>>>
>>> Just to be sure, your proposal is to try the parameter "no-intremap" ?
>>
>> specifically, iommu=no-intremap
>>
>>>
>>> Best regards
>>>   Thimo
>>
>> ~Andrew
>>
>>>
>>> Am 12.08.2013 10:49, schrieb Zhang, Yang Z:
>>>>
>>>> Hi Thimo,
>>>>
>>>> From your previous experience and log, it shows:
>>>>
>>>> 1.The interrupt that triggers the issue is a MSI.
>>>>
>>>> 2.MSI are treated as edge-triggered interrupts nomally, except when 
>>>> there is no way to mask the device. In this case, your previous log 
>>>> indicates the device is unmaskable(What special device are you 
>>>> using?Modern PCI devcie should be maskable).
>>>>
>>>> 3.The IRQ 29 is belong to dom0, it seems it is not a HVM related issue.
>>>>
>>>> 4.The status of IRQ 29 is 10 which means the guest already issues 
>>>> the EOI because the bit IRQ_GUEST_EOI_PENDING is cleared, so there 
>>>> should be no pending EOI in the EOI stack. If possible, can you add 
>>>> some debug message in the guest EOI code path(like 
>>>> _irq_guest_eoi())) to track the EOI?
>>>>
>>>> 5.Both of the log show when the issue occured, most of the other 
>>>> interrupts which owned by dom0 were in IRQ_MOVE_PENDING status. Is 
>>>> it a coincidence? Or it happened only on the special condition like 
>>>> heavy of IRQ migration?Perhaps you can disable irq balance in dom0 
>>>> and pin the IRQ manually.
>>>>
>>>> 6. I guess interrupt remapping is enabled on your machine. Can
>>>> you try disabling IR to see whether it is still reproducible?
>>>>
>>>> Also, please provide the whole Xen log.
>>>>
>>>> Best regards,
>>>>
>>>> Yang
>>>>
>>>
>>
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>


[-- Attachment #1.2: Type: text/html, Size: 39342 bytes --]


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-20  5:43                             ` Thimo Eichstädt
@ 2013-08-20  8:40                               ` Jan Beulich
  2013-08-20  8:50                                 ` Zhang, Yang Z
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2013-08-20  8:40 UTC (permalink / raw)
  To: Thimo Eichstädt
  Cc: Keir Fraser, Andrew Cooper, Eddie Dong, Xen-develList,
	Jun Nakajima, Yang Z Zhang, Xiantao Zhang

>>> On 20.08.13 at 07:43, Thimo Eichstädt<thimoe@digithi.de> wrote:
> (XEN) **Pending EOI error
> (XEN)   irq 29, vector 0x21
> (XEN)   s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
> (XEN) All LAPIC state:
> (XEN) [vector]      ISR      TMR      IRR
> (XEN) [1f:00] 00000000 00000000 00000000
> (XEN) [3f:20] 00020002 00000000 00000000
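The dump above shows the ordering problem directly: the top of the pending-EOI stack, s[0], still holds vector 0x31 when vector 0x21 arrives, while the assertion that opened this thread requires each new entry to carry a higher vector than the one below it. The C sketch below models such an ordered stack under simplifying assumptions; it is not Xen's implementation, and the names `struct eoi_stack`, `eoi_push()` and `eoi_pop()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU pending-EOI stack; names are illustrative only. */
#define EOI_STACK_MAX 32

struct pending_eoi {
    int irq;
    unsigned int vector;
};

struct eoi_stack {
    struct pending_eoi entry[EOI_STACK_MAX];
    size_t sp;                  /* number of valid entries */
};

/* Record an interrupt that still needs an EOI.  Returns false and leaves the
 * stack unchanged if it is full, or if the new vector is not strictly higher
 * than the vector on top of the stack: the same condition rejected by
 * "(sp == 0) || (peoi[sp-1].vector < vector)". */
static bool eoi_push(struct eoi_stack *s, int irq, unsigned int vector)
{
    if (s->sp == EOI_STACK_MAX)
        return false;
    if (s->sp != 0 && s->entry[s->sp - 1].vector >= vector)
        return false;
    s->entry[s->sp].irq = irq;
    s->entry[s->sp].vector = vector;
    s->sp++;
    return true;
}

/* Remove the most recent entry once its EOI has been issued. */
static struct pending_eoi eoi_pop(struct eoi_stack *s)
{
    assert(s->sp > 0);
    return s->entry[--s->sp];
}
```

Pushing irq 30 at vector 0x31 and then irq 29 at vector 0x21, as in the log, makes `eoi_push()` return false, which corresponds to the situation the hypervisor flagged.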

It ought to be plain impossible to receive an interrupt at vector
0x21 while the ISR bit for vector 0x31 is still set.

Intel folks - any input on this?

Jan
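Jan's statement can be checked against the LAPIC rows in the dump. In the `[3f:20]` row, bit n stands for vector 0x20 + n, so ISR 00020002 (bits 1 and 17) marks both 0x21 and 0x31 as in service. The local APIC ranks vectors by priority class (the upper four bits of the vector) and only delivers a new interrupt whose class is above that of the highest in-service vector. The sketch below assumes the task priority register is 0 and ignores everything else that feeds into processor priority; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n of a 32-bit ISR/TMR/IRR word describes vector base + n, where base is
 * the low end of the row label, e.g. 0x20 for "[3f:20]".  Fills out[] with the
 * vectors whose bits are set and returns how many were found. */
static int lapic_word_vectors(uint32_t word, unsigned int base,
                              unsigned int out[32])
{
    int n = 0;

    assert(base % 32 == 0);     /* each word covers an aligned block of 32 */
    for (unsigned int bit = 0; bit < 32; bit++)
        if (word & (UINT32_C(1) << bit))
            out[n++] = base + bit;
    return n;
}

/* Priority class of a vector: its upper four bits. */
static unsigned int prio_class(unsigned int vector)
{
    return (vector >> 4) & 0xf;
}

/* Simplified delivery rule, assuming TPR == 0: a pending vector may be
 * delivered only if its class is above the class of the highest vector
 * currently in service (pass 0 when nothing is in service). */
static int lapic_can_deliver(unsigned int vector, unsigned int highest_isr)
{
    return prio_class(vector) > prio_class(highest_isr);
}
```

With vector 0x31 in service, `lapic_can_deliver(0x21, 0x31)` is false, which is why an in-service 0x21 stacked on top of 0x31 should not arise through normal delivery.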


^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-20  8:40                               ` Jan Beulich
@ 2013-08-20  8:50                                 ` Zhang, Yang Z
  2013-08-23  7:22                                   ` Thimo Eichstädt
  0 siblings, 1 reply; 63+ messages in thread
From: Zhang, Yang Z @ 2013-08-20  8:50 UTC (permalink / raw)
  To: Jan Beulich, Thimo Eichstädt
  Cc: Keir Fraser, Andrew Cooper, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Xiantao

Jan Beulich wrote on 2013-08-20:
>>>> On 20.08.13 at 07:43, Thimo Eichstädt<thimoe@digithi.de> wrote:
>> (XEN) **Pending EOI error
>> (XEN)   irq 29, vector 0x21
>> (XEN)   s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>> (XEN) All LAPIC state:
>> (XEN) [vector]      ISR      TMR      IRR
>> (XEN) [1f:00] 00000000 00000000 00000000
>> (XEN) [3f:20] 00020002 00000000 00000000
> 
> It ought to be plain impossible to receive an interrupt at vector
> 0x21 while the ISR bit for vector 0x31 is still set.
> 
> Intel folks - any input on this?
I have no idea about this, but I will forward the information to some experts internally for help.

> 
> Jan


Best regards,
Yang



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-20  8:50                                 ` Zhang, Yang Z
@ 2013-08-23  7:22                                   ` Thimo Eichstädt
  2013-08-23  7:30                                     ` Zhang, Yang Z
  2013-08-27  1:03                                     ` Zhang, Yang Z
  0 siblings, 2 replies; 63+ messages in thread
From: Thimo Eichstädt @ 2013-08-23  7:22 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

Hello Yang,

any update from your side? Did your experts have any idea? Could it be a
hardware problem?

Best regards
   Thimo

Am 20.08.2013 10:50, schrieb Zhang, Yang Z:
> Jan Beulich wrote on 2013-08-20:
>>>>> On 20.08.13 at 07:43, Thimo Eichstädt<thimoe@digithi.de> wrote:
>>> (XEN) **Pending EOI error
>>> (XEN)   irq 29, vector 0x21
>>> (XEN)   s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>> (XEN) All LAPIC state:
>>> (XEN) [vector]      ISR      TMR      IRR
>>> (XEN) [1f:00] 00000000 00000000 00000000
>>> (XEN) [3f:20] 00020002 00000000 00000000
>> It ought to be plain impossible to receive an interrupt at vector
>> 0x21 while the ISR bit for vector 0x31 is still set.
>>
>> Intel folks - any input on this?
> I have no idea about this, but I will forward the information to some experts internally for help.
>
>> Jan
>
> Best regards,
> Yang
>
>



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-23  7:22                                   ` Thimo Eichstädt
@ 2013-08-23  7:30                                     ` Zhang, Yang Z
  2013-08-27  1:03                                     ` Zhang, Yang Z
  1 sibling, 0 replies; 63+ messages in thread
From: Zhang, Yang Z @ 2013-08-23  7:30 UTC (permalink / raw)
  To: Thimo Eichstädt
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

Thimo Eichstädt wrote on 2013-08-23:
> Hello Yang,
> 
> any update from your side? Did your experts have any idea? Could it be a
> hardware problem?
Sorry, no update on this. I am still waiting for an answer from the hardware team.

> 
> Best regards
>    Thimo
> Am 20.08.2013 10:50, schrieb Zhang, Yang Z:
>> Jan Beulich wrote on 2013-08-20:
>>>>>> On 20.08.13 at 07:43, Thimo Eichstädt<thimoe@digithi.de> wrote:
>>>> (XEN) **Pending EOI error
>>>> (XEN)   irq 29, vector 0x21
>>>> (XEN)   s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>>> (XEN) All LAPIC state:
>>>> (XEN) [vector]      ISR      TMR      IRR
>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>> (XEN) [3f:20] 00020002 00000000 00000000
>>> It ought to be plain impossible to receive an interrupt at vector
>>> 0x21 while the ISR bit for vector 0x31 is still set.
>>> 
>>> Intel folks - any input on this?
>> I have no idea about this, but I will forward the information to some
>> experts internally for help.
>> 
>>> Jan
>> 
>> Best regards,
>> Yang
>> 
>> 


Best regards,
Yang



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-23  7:22                                   ` Thimo Eichstädt
  2013-08-23  7:30                                     ` Zhang, Yang Z
@ 2013-08-27  1:03                                     ` Zhang, Yang Z
  2013-09-04 18:32                                       ` Thimo E.
  1 sibling, 1 reply; 63+ messages in thread
From: Zhang, Yang Z @ 2013-08-27  1:03 UTC (permalink / raw)
  To: Thimo Eichstädt
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

Zhang, Yang Z wrote on 2013-08-23:
> Thimo Eichstädt wrote on 2013-08-23:
>> Hello Yang,
>> 
>> any update from your side? Did your experts have any idea? Could it be a
>> hardware problem?
> Sorry, no update on this. I am still waiting for an answer from the hardware team.
Hi Thimo,

I recall that the CPU is always in an idle state when this issue happens. Could you try disabling C-states in Xen to see if that helps?

> 
>> 
>> Best regards
>>    Thimo
>> Am 20.08.2013 10:50, schrieb Zhang, Yang Z:
>>> Jan Beulich wrote on 2013-08-20:
>>>>>>> On 20.08.13 at 07:43, Thimo Eichstädt<thimoe@digithi.de> wrote:
>>>>> (XEN) **Pending EOI error
>>>>> (XEN)   irq 29, vector 0x21
>>>>> (XEN)   s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>>>> (XEN) All LAPIC state:
>>>>> (XEN) [vector]      ISR      TMR      IRR
>>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>>> (XEN) [3f:20] 00020002 00000000 00000000
>>>> It ought to be plain impossible to receive an interrupt at vector
>>>> 0x21 while the ISR bit for vector 0x31 is still set.
>>>> 
>>>> Intel folks - any input on this?
>>> I have no idea about this, but I will forward the information to
>>> some experts internally for help.
>>> 
>>>> Jan
>>> 
>>> Best regards,
>>> Yang
>>> 
>>> 
> 
> 
> Best regards,
> Yang
>


Best regards,
Yang



^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-27  1:03                                     ` Zhang, Yang Z
@ 2013-09-04 18:32                                       ` Thimo E.
  2013-09-04 18:55                                         ` Andrew Cooper
                                                           ` (2 more replies)
  0 siblings, 3 replies; 63+ messages in thread
From: Thimo E. @ 2013-09-04 18:32 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

[-- Attachment #1: Type: text/plain, Size: 2860 bytes --]

Hello again,

For the last two weeks there was no crash with dom0_vcpus_pin and dom0
restricted to 1 CPU. But yesterday it crashed again, so I changed the
command line again to:

iommu=no-intremap noirqbalance com1=115200,8n1,0xe050,0 console=com1,vga 
mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M watchdog_timeout=300 
lowmem_emergency_pool=1M crashkernel=64M@32M cpuid_mask_xsave_eax=0

And today the server crashed again and produced a lot of debugging messages,
see attached. The "..." in the log files means that the message above the
dots was repeated many times.

My summary so far:
- With only 1 CPU attached to dom0 the server was stable for 2 weeks;
the crash there did not really show any IRQ problems, see crash20130903.txt.
    You can find Andrew's ideas on this in
http://forums.citrix.com/thread.jspa?messageID=1760771#1760771
- With more than 1 CPU and irqbalance the server produced the crashes
I've already posted before.
- Without irqbalance it crashed with some other fancy output, see
crash20130904.txt.

Next step is to change the network card.

Zhang, any update from your side? Or do the others have any ideas?
Could "ioapic_ack=old" help somewhere?

Best regards
   Thimo

Am 27.08.2013 03:03, schrieb Zhang, Yang Z:
> Zhang, Yang Z wrote on 2013-08-23:
>> Thimo Eichstädt wrote on 2013-08-23:
>>> Hello Yang,
>>>
>>> any update from your side? Did your experts have any idea? Could it be a
>>> hardware problem?
>> Sorry, no update on this. I am still waiting for an answer from the hardware team.
> Hi Thimo,
>
> I recall that the CPU is always in an idle state when this issue happens. Could you try disabling C-states in Xen to see if that helps?
>
>>> Best regards
>>>     Thimo
>>> Am 20.08.2013 10:50, schrieb Zhang, Yang Z:
>>>> Jan Beulich wrote on 2013-08-20:
>>>>>>>> On 20.08.13 at 07:43, Thimo Eichstädt<thimoe@digithi.de> wrote:
>>>>>> (XEN) **Pending EOI error
>>>>>> (XEN)   irq 29, vector 0x21
>>>>>> (XEN)   s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>>>>> (XEN) All LAPIC state:
>>>>>> (XEN) [vector]      ISR      TMR      IRR
>>>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>>>> (XEN) [3f:20] 00020002 00000000 00000000
>>>>> It ought to be plain impossible to receive an interrupt at vector
>>>>> 0x21 while the ISR bit for vector 0x31 is still set.
>>>>>
>>>>> Intel folks - any input on this?
>>>> I have no idea about this, but I will forward the information to
>>>> some experts internally for help.
>>>>
>>>>> Jan
>>>> Best regards,
>>>> Yang
>>>>
>>>>
>>
>> Best regards,
>> Yang
>>
>
> Best regards,
> Yang
>
>


[-- Attachment #2: crash20130904.txt --]
[-- Type: text/plain, Size: 31115 bytes --]

(XEN) Assertion 'entry->next->prev == entry' failed at /bind/myrepos/xen-4.1.hg/xen/include/xen/list.h:169
(XEN) vmx.c:2615:d4 Bad vmexit (reason 3)
(XEN) ----[ Xen-4.1.5-debug5  x86_64  debug=y  Not tainted ]----
(XEN) domain_crash called from vmx.c:2616
(XEN) CPU:    2
(XEN) Domain 4 (vcpu#0) crashed on cpu#1:
(XEN) RIP:    e008:[<ffff82c48012a2a3>]----[ Xen-4.1.5-debug5  x86_64  debug=y  Not tainted ]----
(XEN)  set_timer+0x112/0x1cfCPU:    1
(XEN)
(XEN) RFLAGS: 0000000000010007   RIP:    0010:[<fffff80001632450>]CONTEXT: hypervisor
(XEN)
(XEN) RFLAGS: 0000000000000046   rax: ffff83081f02cce0   rbx: 0000000000000002   rcx: ffff83081f04a068
(XEN) CONTEXT: hvm guest
(XEN) rdx: 0000000000000086   rsi: 000078e965c15ea0   rdi: ffff83081f04a060
(XEN) ----[ Xen-4.1.5-debug5  x86_64  debug=y  Not tainted ]----
(XEN) rbp: ffff83081f047e10   rsp: ffff83081f047dd0   r8:  ffff83081f04ad80
(XEN) CPU:    1
(XEN) r9:  00019a62f338562a   r10: ffff8304bb5f8750   r11: 000079613423c962
(XEN) RIP:    e008:[<ffff82c480169d9e>]r12: ffff83081f04a060   r13: 0000000000000002   r14: ffff82c4802f5400
(XEN)  do_IRQ+0x968/0xbefr15: ffff82c4802e2c60   cr0: 000000008005003b   cr4: 00000000001026f0
(XEN)
(XEN) RFLAGS: 0000000000010016   cr3: 000000053316b000   cr2: 0000000000320000
(XEN) CONTEXT: hypervisor
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) Xen stack trace from rsp=ffff83081f047dd0:
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) rax: 00000000000000aa   rbx: ffff83081f081000   rcx: ffff82c4802f5860
(XEN)  ffff82c4802f5400CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  000078e965c15ea0rdx: 000000439ed64d00   rsi: 0000000000000000   rdi: ffff83081f001ec0
(XEN)  0000000000000082rbp: ffff83081f057988   rsp: ffff83081f0578c8   r8:  0000000000000002
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) r9:  0000000000000002   r10: 0000000000000002   r11: 0000000000000002
(XEN)  ffff8300b8576000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   r12: ffff83081f02a000   r13: 0000000000000000   r14: ffff830808699a30
(XEN)  0000000001c9c380CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f04a060CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) r15: 000000000000007c   cr0: 000000008005003b   cr4: 00000000001026f0
(XEN)  ffff8300b857c000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c4802e2c60CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047ea0cr3: 00000004bb5ce000   cr2: 00000000000000aa
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480124c23ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN)  000078e963f79b20CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  00ff82c4801294c5Xen stack trace from rsp=ffff83081f0578c8:
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  00000003361e6f80CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f04a040 ffff83081f057918 0000000280154d5d ffff83081f057910 ffff83081f04a100CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f073ca8CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff8300b8576000 000000000000001f
(XEN)    0000000001c9c380 000000000000001fCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83053a72a600CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f04a100CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)    ffff83081f081000 ffffffffffffffffCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000002CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000002CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffffffffffffffff 0000000000000000
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48013fdbfCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047f18
(XEN)    ffff82c4802d4680CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f057928CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000020CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047ee0 0000000080243f49CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f081000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480126b71 ffff83081f057978CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)    ffff82c4802d4680 ffff82c4801402e7CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047f18 00000008ffd090f0 ffff8300b857c000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c4802663a0 ffff8300b8576000
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)    0000000000000086CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480243f44CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff8300b8576000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f057a50 0000000000000002CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047ef0CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480126ccdCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)    0000000000000000
(XEN)    ffff83081f047f10 ffff83081f057a50 0000000000000282CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48015a26bCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480126ccd ffff8300b8588000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000003
(XEN)    0000000000000002CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  00007cf7e0fa8647CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480161a66 ffff83081f047d18CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffffffffffffffff 0000000000000003CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000000 ffff8300b8588000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)
(XEN)    0000000000000282 ffffffff81c0b2e0 ffff83081f057a50CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffffffff81a01f00CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffffffff81a00000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000246CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f057b48CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000000
(XEN)
(XEN)
(XEN)    0000000000000002 000078bcade46669 0000000000000002 0000000000000004 ffff88000f14f848CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000000 0000000000000004
(XEN)
(XEN)    ffffffff810013aa 0000000000000000 0000000000000000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  000000000000000a 00000000deadbeef 000000000000000a 00000000deadbeefCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  000000000000000a
(XEN)
(XEN)    0000010000000000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffffffff810013aa ffff82c4802662e8 000000000000e033CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000246CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000003900000000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffffffff81a01ee8CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48013eef4CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  000000000000e02bCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  2477815fcb0dbeefCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  fdcc0a71e54abeefCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  000000000000e008
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  a7647993c47cbeefCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)    0000000000000282CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f057a48CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  58ce58ce58cebeef 0000000000000000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  bb582f7d00000002CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff8300b857c000 0000000000000282
(XEN)
(XEN)    000000439ed54d00 ffff83081f057a58 ffff82c4802d5e93 11295b7bcfdee294 0000003000000010CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) Xen call trace:
(XEN)    CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) [<ffff82c48012a2a3>] ffff83081f057b58
(XEN)    set_timer+0x112/0x1cf
(XEN)    CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) [<ffff82c480124c23>] ffff83081f057a78CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  schedule+0x19d/0x883
(XEN)     ffff82c480126f2cCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f057b88CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48024d471CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000002CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000020 0000000000000004CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) [<ffff82c480126b71>]CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  __do_softirq+0x87/0x98
(XEN)     0000000000000004[<ffff82c480126ccd>]CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  do_softirq+0x5d/0x60
(XEN)
(XEN)   [<ffff82c48015a26b>]CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f057ab8 idle_loop+0x52/0x59
(XEN)    CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)  ffff82c48013fdbfCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN) ****************************************
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f057ad8CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) Panic on CPU 2:
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000001Assertion 'entry->next->prev == entry' failed at /bind/myrepos/xen-4.1.hg/xen/include/xen/list.h:169
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) Executing crash image
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   Assertion 'gate->b == new->b' failed at /bind/myrepos/xen-4.1.hg/xen/include/asm/desc.h:149
(XEN)  000000000000000aCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  000000000000000a----[ Xen-4.1.5-debug5  x86_64  debug=y  Not tainted ]----
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) CPU:    2
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) RIP:    e008:[<ffff82c48019f5d6>]CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  machine_crash_shutdown+0x106/0x1f5
(XEN) RFLAGS: 0000000000010002   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) CONTEXT: hypervisor
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) rax: 00000000ffff82c4   rbx: 0000000000000002   rcx: ffff83081f02e030
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) rdx: 0000000000000000   rsi: ffff82c480219990   rdi: 0000000000000000
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) rbp: ffff83081f047b88   rsp: ffff83081f047b68   r8:  0000000000000003
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) r9:  ffff82c4802604b8   r10: 0000ffffffff0000   r11: 0000000000000002
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) r12: fffffff8ffffffff   r13: ffff82c4802604a0   r14: 00008e00e0080000
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) r15: 00000000000000a9   cr0: 000000008005003b   cr4: 00000000001026f0
(XEN) cr3: 000000053316b000   cr2: 0000000000320000
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN) Xen stack trace from rsp=ffff83081f047b68:
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000003CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000082CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480241198CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48012a2adCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)    ffff83081f047ba8CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480111782CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000082CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480253770CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047c98CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48013f37dCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000003000000020CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047ca8CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047bd8CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f04ad80CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047c28CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48023e660CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480241198CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  00000000000000a9CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000004CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000004CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047c48CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480127ec5CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000010007CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000059CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)    0000000000000052CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047f10 ffff82c48015a26bCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047ef0CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047c78CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480187fabCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047d28CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047d28CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480241198CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48012a2adCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047c98CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480187fcdCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48023e660 ffff83081f047d28CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047d18CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480188a7bCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48011c020CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c4801248b5CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  000000021f047d18CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047d00CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  000000001143b3bcCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000004CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  00000002a7c20b0fCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c480159e06CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  000078e963f79331CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff8300b857c000CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f04a060CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000002CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c4802f5400CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c4802e2c60CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  00007cf7e0fb82b7CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c4802196ceCPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)    ffff82c4802e2c60 ffff82c4802f5400CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000002CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f04a060CPU252117232: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047e10CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000002CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  000079613423c962CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff8304bb5f8750CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  00019a62f338562aCPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f04ad80CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f02cce0CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f04a068CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000086CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  000078e965c15ea0 ffff83081f04a060CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000600000000CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c48012a2a3CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  000000000000e008CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000010007CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff83081f047dd0CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN)   CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  000000000000e010CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  ffff82c4802f5400CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  000078e965c15ea0CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  0000000000000082CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)
(XEN) Xen call trace:
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)
(XEN) [<ffff82c48019f5d6>]CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  machine_crash_shutdown+0x106/0x1f5
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)
(XEN) [<ffff82c480111782>]CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  kexec_crash+0x4c/0x70
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)
(XEN) [<ffff82c48013f37d>]CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  panic+0x16d/0x1a1
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)
(XEN) [<ffff82c480188a7b>]CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  do_invalid_op+0x3cc/0x457
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)
(XEN) [<ffff82c4802196ce>]CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  handle_exception_saved+0x30/0x6e
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)
(XEN) [<ffff82c48012a2a3>]CPU1: No irq handler for vector 2e (IRQ -1)
(XEN)  set_timer+0x112/0x1cf
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) [<ffff82c480124c23>]CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN)  schedule+0x19d/0x883^M
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) [<ffff82c480126b71>]CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN)  __do_softirq+0x87/0x98^M
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) [<ffff82c480126ccd>]CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN)  do_softirq+0x5d/0x60^M
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) [<ffff82c48015a26b>]CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN)  idle_loop+0x52/0x59^M
(XEN)    CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) ^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) ^M
(XEN) ****************************************^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) Panic on CPU 2:^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) Assertion 'gate->b == new->b' failed at /bind/myrepos/xen-4.1.hg/xen/include/asm/desc.h:149^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) ****************************************^M
(XEN) ^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) Reboot in five seconds...^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) Executing crash image^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU251789552: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU251789552: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU251789552: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU251789552: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU251789552: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU251691312: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU3267543746: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU48526: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU52622: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU60814: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU32038: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU217612592: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU30206: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU217088304: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU42492: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU46588: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU54780: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU4083646447: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU211320816: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU211320816: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU211255600: No irq handler for vector 2e (IRQ -65536)^M
...
(XEN) CPU211255600: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU41980: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU50172: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU120581: No irq handler for vector 2e (IRQ -2147483648)^M
(XEN) CPU4100587375: No irq handler for vector 2e (IRQ -65536)^M
...
(XEN) CPU4100587375: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU185008272: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU3267543746: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU49155: No irq handler for vector 2e (IRQ -1027423550)^M
(XEN) CPU53251: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU184516912: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU184516912: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU1: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU181633168: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU4294967295: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU148963632: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU148963632: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU3006840808: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU3006840808: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU3006840808: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU224981527: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU17695024: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU17695024: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU67174464: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU16974128: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU16974128: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU134858908: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU942945636: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU0: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU64: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU64: No irq handler for vector 2e (IRQ -65536)^M
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU12157232: No irq handler for vector 2e (IRQ -1)^M
...
(XEN) CPU12157232: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU64: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU64: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU1514292291: No irq handler for vector 2e (IRQ -65536)^M
...
(XEN) CPU2149999336: No irq handler for vector 2e (IRQ -1)^M
(XEN) CPU10387760: No irq handler for vector 2e (IRQ -1)^M
...

[-- Attachment #3: crash20130903.txt --]
[-- Type: text/plain, Size: 4657 bytes --]

(XEN) HVM4: [FFFFFA8003BCBC00] WskKnrCompleteRequest: complete irp with IO status = 000000^M
(XEN) HVM4: 00.^M
(XEN) HVM4: [addr=FFFFF8A0010779C0] WskProAPIFreeAddressInfo freed addrinfo.^M
(XEN) ----[ Xen-4.1.5-debug5  x86_64  debug=y  Not tainted ]----^M
(XEN) CPU:    3^M
(XEN) RIP:    e008:[<ffff82c480126b1c>] __do_softirq+0x32/0x98^M
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor^M
(XEN) rax: 0000000000000000   rbx: 0000000000000003   rcx: 00000000000006e0^M
(XEN) rdx: 0000000000000001   rsi: 00000000d91b056b   rdi: ffff83081f03a100^M
(XEN) rbp: ffff83081f037ee0   rsp: ffff83081f037eb0   r8:  ffff83081f03ad80^M
(XEN) r9:  000e1f52c67afb54   r10: ffff8304bab42750   r11: 000429b904f1b609^M
(XEN) r12: ffff83081f03a100   r13: 0000000000000003   r14: ffffffffffffffff^M
(XEN) r15: ffff83081f037f18   cr0: 000000008005003b   cr4: 00000000001026f0^M
(XEN) cr3: 0000000730157000   cr2: 00000000000000c7^M
(XEN) ds: 004b   es: 004b   fs: 0000   gs: 01c3   ss: e010   cs: e008^M
(XEN) Xen stack trace from rsp=ffff83081f037eb0:^M
(XEN)    ffff82c4802d4680 ffff83081f037f18 ffff8300b83fc000 ffff8300b8572000^M
(XEN)    ffff8300b8572000 0000000000000002 ffff83081f037ef0 ffff82c480126ccd^M
(XEN)    ffff83081f037f10 ffff82c48015a26b ffff82c480126ccd 0000000000000003^M
(XEN)    ffff83081f037d18 0000000000000000 0000000000000000 fffffffffbcc5100^M
(XEN)    fffffffffbc618a0 ffffff0003e05bc0 0000000000000000----[ Xen-4.1.5-debug5  x86_64  debug=y  Not tainted ]----^M
(XEN)  0000000000000282CPU:    0^M
(XEN) RIP:    e008:[<ffff82c480126b1c>]^M
(XEN)    ffffff01479abb98 __do_softirq+0x32/0x98 0000000000000000^M
(XEN) RFLAGS: 0000000000010286    0000000000000001 0000000000000000CONTEXT: hypervisor^M
(XEN) rax: 0000000000000000   rbx: 0000000000000000   rcx: 00000000000006e0^M
(XEN) ^M
(XEN)   rdx: 0000000000000001   rsi: 00000000d91b056b   rdi: ffff82c4802f5400^M
(XEN)  fffffffffb84f621rbp: ffff82c4802b7ee0   rsp: ffff82c4802b7eb0   r8:  ffff82c4802f6080^M
(XEN)  0000000000000000r9:  000e1f52c67af967   r10: ffff83081f0bcf10   r11: 000429b8f5b76200^M
(XEN)  00000000deadbeefr12: ffff82c4802f5400   r13: 0000000000000000   r14: ffffffffffffffff^M
(XEN)  00000000deadbeefr15: ffff82c4802b7f18   cr0: 000000008005003b   cr4: 00000000001026f0^M
(XEN) ^M
(XEN)   cr3: 000000081f004000   cr2: 00000000000000c7^M
(XEN)  0000010000000000ds: 007b   es: 007b   fs: 00d8   gs: 0000   ss: 0000   cs: e008^M
(XEN)  fffffffffb84f621Xen stack trace from rsp=ffff82c4802b7eb0:^M
(XEN)    000000000000e033 0000000000000282 ffff82c4802d4680^M
(XEN)    ffff82c4802b7f18 ffffff0003e05bb0 ffff8300cb4fc000 000000000000e02b ffff8300cb4fa000 78bc1b84dd95beef^M
(XEN)    9565eebb1d06beef ffff8300cb4fa000^M
(XEN)    0000000000000002 519abee16014beef ffff82c4802b7ef0 2cdfeaee668dbeef ffff82c480126ccd 1a62503000000003^M
(XEN)    ffff8300b83fc000 ffff82c4802b7f10----[ Xen-4.1.5-debug5  x86_64  debug=y  Not tainted ]----^M
(XEN)  ffff82c48015a26b ffff82c480126ccd^M
(XEN)    0000000000000000 000000439ed44d00CPU:    2^M
(XEN)  970920b4cfcf1138RIP:    e008:[<ffff82c480126b1c>]^M
(XEN) ^M
(XEN)    __do_softirq+0x32/0x98Xen call trace:^M
(XEN)    ^M
(XEN) RFLAGS: 0000000000010286   [<ffff82c480126b1c>]CONTEXT: hypervisor^M
(XEN)  __do_softirq+0x32/0x98^M
(XEN)    rax: 0000000000000000   rbx: 0000000000000002   rcx: 00000000000006e0^M
(XEN) [<ffff82c480126ccd>]rdx: 0000000000000001   rsi: 00000000d91b056b   rdi: ffff83081f04a100^M
(XEN)  ffff82c4802b7d18 do_softirq+0x5d/0x60^M
(XEN)     0000000000000000 0000000000000000[<ffff82c48015a26b>] 0000000000000000 idle_loop+0x52/0x59^M
(XEN)    ^M
(XEN)   ^M
(XEN)  0000000000000000Pagetable walk from 00000000000000c7:^M
(XEN)  00000000c04fff9c 00000000deadbeef L4[0x000] = 0000000000000000 ffffffffffffffff^M
(XEN)  0000000000000000^M
(XEN) ****************************************^M
(XEN) Panic on CPU 3:^M
(XEN) ^M
(XEN)   FATAL PAGE FAULT^M
(XEN) [error_code=0000]^M
(XEN) Faulting linear address: 00000000000000c7^M
(XEN)  0000000000000000****************************************^M
(XEN) ^M
(XEN) ----[ Xen-4.1.5-debug5  x86_64  debug=y  Not tainted ]----^M
(XEN)  0000000000000000rbp: ffff83081f047ee0   rsp: ffff83081f047eb0   r8:  ffff83081f04ad80^M
(XEN)  0000000000000000r9:  000e1f52c67afba9   r10: ffff8306379cc660   r11: 000429b8f5b76200^M
(XEN) Reboot in five seconds...^M
(XEN) r12: ffff83081f04a100   r13: 0000000000000002   r14: ffffffffffffffff^M
(XEN) CPU:    1^M
(XEN) r15: ffff83081f047f18   cr0: 0000000080050033   cr4: 00000000001026f0^M
(XEN)  0000000000000000RIP:    e008:[<ffff82c480126b1c>]Executing crash image^M

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-04 18:32                                       ` Thimo E.
@ 2013-09-04 18:55                                         ` Andrew Cooper
  2013-09-04 19:56                                           ` Thimo E.
  2013-09-05  1:15                                         ` Zhang, Yang Z
  2013-09-17  2:09                                         ` Zhang, Yang Z
  2 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-09-04 18:55 UTC (permalink / raw)
  To: Thimo E.
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Yang Z, Zhang, Xiantao

On 04/09/13 19:32, Thimo E. wrote:
> Hello again,
>
> For the last two weeks there was no crash with dom0_vcpus_pin and
> dom0 restricted to 1 CPU. But yesterday it crashed again, so I changed
> the command line again to:
>
> iommu=no-intremap noirqbalance com1=115200,8n1,0xe050,0
> console=com1,vga mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M
> watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M
> cpuid_mask_xsave_eax=0
>
> And today the server crashed again and produced a lot of debugging
> messages, see attached. The "..." in the log files means that the
> message above it was repeated many times.
>
> My summary so far:
> - With only 1 CPU attached to dom0 the server was stable for 2 weeks;
> that crash did not really show any IRQ problems, see
> crash20130903.txt
>    You can find Andrew's ideas about this in
> http://forums.citrix.com/thread.jspa?messageID=1760771#1760771
> - With more than 1 CPU and irqbalance the server produced the crashes
> I've already posted before
> - Without irqbalance it crashed with some other fancy output, see
> crash20130904.txt
>
> Next step is to change the network card.
>
> Zhang, any update from your side? Or does anyone else have any ideas?
> Could "ioapic_ack=old" help here?
>
> Best regards
>   Thimo
>

Ok - the second attachment (crash20130903.txt) is the one I have triaged
before, and the crash is impossible given the expected code flow through
the function.

%r14 is calculated as the per-cpu cpu_info pointer, which cannot possibly
be -1 at the point of the fault.  The only explanation is that the page
fault is the result of a spurious jump to this location.
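For illustration, here is how a per-CPU cpu_info block kept at the top of each hypervisor stack can be located by rounding the stack pointer up to the stack boundary, which is why a correctly computed value cannot come out as -1. This is a minimal C sketch, not Xen's code: the 32 KiB STACK_SIZE_SKETCH and the layout of struct cpu_info_sketch are assumptions, and only the arithmetic matters.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stack size for illustration: 8 pages of 4 KiB (not taken from Xen). */
#define STACK_SIZE_SKETCH UINT64_C(0x8000)

/* Hypothetical stand-in for the per-CPU block at the top of each stack. */
struct cpu_info_sketch {
    uint64_t saved_regs[21]; /* placeholder for the saved register frame */
    uint32_t processor_id;
    uint32_t pad;
    uint64_t current_vcpu;
};

/*
 * Locate the per-CPU block from any stack pointer on that CPU's stack:
 * round down to the stack base, step to the top, back off by one struct.
 */
static uint64_t cpu_info_from_rsp(uint64_t rsp)
{
    /* The masking trick only works for a power-of-two stack size. */
    assert((STACK_SIZE_SKETCH & (STACK_SIZE_SKETCH - 1)) == 0);
    uint64_t stack_base = rsp & ~(STACK_SIZE_SKETCH - 1);
    return stack_base + STACK_SIZE_SKETCH - sizeof(struct cpu_info_sketch);
}
```

With rsp = ffff83081f037eb0 from CPU 3 in crash20130903.txt, the sketch gives ffff83081f037f48, just above the stack pointer and below the stack top at ffff83081f038000. Under these assumptions the low 15 bits of the result are always 0x7f48, so no stack pointer can yield ffffffffffffffff.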

From a quick glance at the other crash, vector 2e was the problematic
one (iirc).  The "Bad vmexit (reason 3)" at the top would suggest that
something on the system has sent an INIT to pcpu 2, which seems antisocial.
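The "reason 3" figure can be decoded from the VMX basic exit reasons listed in the Intel SDM (Vol. 3, Appendix C), where 3 is "INIT signal". The small lookup below covers only the first few codes; it is a sketch for reading such log lines, not Xen's own table.

```c
/*
 * Partial table of VMX basic exit reasons (Intel SDM Vol. 3, Appendix C).
 * Only the first few reasons are listed here.
 */
static const char *vmx_exit_reason_name(unsigned int exit_reason)
{
    static const char *const names[] = {
        "exception or NMI",    /* 0 */
        "external interrupt",  /* 1 */
        "triple fault",        /* 2 */
        "INIT signal",         /* 3 */
        "start-up IPI (SIPI)", /* 4 */
    };
    /* Bits 15:0 of the exit-reason field hold the basic exit reason. */
    unsigned int basic = exit_reason & 0xffffu;

    if (basic < sizeof(names) / sizeof(names[0]))
        return names[basic];
    return "not in this partial table";
}
```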

As we have identified that the hardware is delivering invalid
interrupts, I wouldn't necessarily read any more into this new crash;
something is very broken in the hardware.

I would be interested in any update from Intel regarding the ISR violation.

~Andrew

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-04 18:55                                         ` Andrew Cooper
@ 2013-09-04 19:56                                           ` Thimo E.
  2013-09-04 20:54                                             ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Thimo E. @ 2013-09-04 19:56 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Yang Z, Zhang, Xiantao

Hello Andrew,

Thanks for your response. At least I've seen the trigger of the new
crash (vector 2e) before, so the two seem to belong together.

I can't imagine that I am the only one in the world who is using a
Haswell board. And as I haven't seen any other Xen bug/crash reports
like mine (apart from yours, once), nor bug reports from users of other
operating systems, I wonder whether only my hardware is buggy
or whether other operating systems handle those "spurious" interrupts
differently.

What does "ioapic_ack=old" change?

Best regards
   Thimo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-04 19:56                                           ` Thimo E.
@ 2013-09-04 20:54                                             ` Andrew Cooper
  2013-09-05  1:45                                               ` Zhang, Yang Z
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-09-04 20:54 UTC (permalink / raw)
  To: Thimo E.
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Yang Z, Zhang, Xiantao

On 04/09/2013 20:56, Thimo E. wrote:
> Hello Andrew,
>
> Thanks for your response. At least I've seen the trigger of the new
> crash (vector 2e) before, so the two seem to belong together.
>
> I can't imagine that I am the only one in the world who is using a
> Haswell board. And as I haven't seen any other Xen bug/crash reports
> like mine (apart from yours, once), nor bug reports from users of other
> operating systems, I wonder whether only my hardware is buggy
> or whether other operating systems handle those "spurious" interrupts
> differently.
>
> What does "ioapic_ack=old" change?
>
> Best regards
>   Thimo

ioapic_ack=old is already in effect - see "Enabled directed EOI with
ioapic_ack_old on!" in the boot dmesg.

Originally, it was a workaround for ancient IO-APIC hardware
which had a bug on one of the mask bits.  Nowadays, it is used with EOI
broadcast suppression, which is an APIC transaction performance
improvement on recent processors.  What it does is affect whether an
IO-APIC interrupt gets masked when an interrupt is received.

You could certainly try "ioapic_ack=new" and see whether that makes a
difference, given a lack of any other ideas.  It will disable EOI
broadcast suppression.
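At the register level, EOI broadcast suppression is described by two local APIC bits in the Intel SDM: bit 24 of the Local APIC Version Register says whether the feature is supported, and bit 12 of the Spurious-Interrupt Vector Register turns it on. When it is on, a local APIC EOI for a level-triggered interrupt is no longer broadcast to the IO-APICs, so the IO-APIC has to be sent its EOI directly (the "directed EOI" in the boot message). The helper below is a sketch that only decodes raw register values assumed to have been read already; the register values in the test are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local APIC Version Register (offset 0x30), bit 24: suppression supported. */
#define LAPIC_VER_EOI_BCAST_SUPP_SUPPORTED (UINT32_C(1) << 24)
/* Spurious-Interrupt Vector Register (offset 0xF0), bit 12: suppression enabled. */
#define LAPIC_SVR_EOI_BCAST_SUPP_ENABLED   (UINT32_C(1) << 12)

/*
 * Decode raw register values only; reading the local APIC is out of scope.
 * Returns true when EOI broadcast suppression is both supported and enabled,
 * i.e. when level-triggered IO-APIC interrupts need a directed EOI.
 */
static bool eoi_broadcast_suppressed(uint32_t lapic_version, uint32_t svr)
{
    bool supported = (lapic_version & LAPIC_VER_EOI_BCAST_SUPP_SUPPORTED) != 0;
    bool enabled = (svr & LAPIC_SVR_EOI_BCAST_SUPP_ENABLED) != 0;

    return supported && enabled;
}
```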

~Andrew

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-04 18:32                                       ` Thimo E.
  2013-09-04 18:55                                         ` Andrew Cooper
@ 2013-09-05  1:15                                         ` Zhang, Yang Z
  2013-09-17  2:09                                         ` Zhang, Yang Z
  2 siblings, 0 replies; 63+ messages in thread
From: Zhang, Yang Z @ 2013-09-05  1:15 UTC (permalink / raw)
  To: Thimo E.
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

Thimo E. wrote on 2013-09-05:
> Hello again,
> 
> For the last two weeks there was no crash with dom0_vcpus_pin and
> dom0 restricted to 1 CPU. But yesterday it crashed again, so I changed
> the command line again to:
> 
> iommu=no-intremap noirqbalance com1=115200,8n1,0xe050,0 
> console=com1,vga mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M 
> watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M 
> cpuid_mask_xsave_eax=0
> 
> And today the server crashed again and produced a lot of debugging
> messages, see attached. The "..." in the log files means that the
> message above it was repeated many times.
> 
> My summary so far:
> - With only 1 CPU attached to dom0 the server was stable for 2 weeks;
> that crash did not really show any IRQ problems, see crash20130903.txt
>     You can find Andrew's ideas about this in
> http://forums.citrix.com/thread.jspa?messageID=1760771#1760771
> - With more than 1 CPU and irqbalance the server produced the crashes
> I've already posted before
> - Without irqbalance it crashed with some other fancy output, see
> crash20130904.txt
> 
> Next step is to change the network card.
> 
> Zhang, any update from your side? Or does anyone else have any ideas?
Our hardware team says they are not aware of such an issue with this CPU. We are now trying to find the same platform to reproduce it.

> Could "ioapic_ack=old" help here?
> 
> Best regards
>    Thimo
> On 27.08.2013 03:03, Zhang, Yang Z wrote:
>> Zhang, Yang Z wrote on 2013-08-23:
>>> Thimo Eichstädt wrote on 2013-08-23:
>>>> Hello Yang,
>>>> 
>>>> Any update from your side? Did your experts have any ideas?
>>>> Possibly a hardware problem?
>>> Sorry, no update on this. I am still waiting for the answer from the hardware team.
>> Hi Thimo,
>> 
>> I remember that the CPU was always in an idle state when this issue happened.
>> So can you try disabling C-states in Xen to see if it helps?
>> 
>>>> Best regards
>>>>     Thimo
>>>> On 20.08.2013 10:50, Zhang, Yang Z wrote:
>>>>> Jan Beulich wrote on 2013-08-20:
>>>>>>>>> On 20.08.13 at 07:43, Thimo Eichstädt <thimoe@digithi.de> wrote:
>>>>>>> (XEN) **Pending EOI error
>>>>>>> (XEN)   irq 29, vector 0x21
>>>>>>> (XEN) s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>>>>>> (XEN) All LAPIC state:
>>>>>>> (XEN) [vector]      ISR TMR IRR
>>>>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>>>>> (XEN) [3f:20] 00020002 00000000 00000000
>>>>>> It ought to be plain impossible to receive an interrupt at vector
>>>>>> 0x21 while the ISR bit for vector 0x31 is still set.
>>>>>> 
>>>>>> Intel folks - any input on this?
>>>>> I have no idea about this, but I will forward the information to
>>>>> some internal experts for help.
>>>>> 
>>>>>> Jan
>>>>> Best regards,
>>>>> Yang
>>>>> 
>>> 
>>> Best regards,
>>> Yang
>>> 
>> 
>> Best regards,
>> Yang
>> 


Best regards,
Yang
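Jan's point rests on local APIC priority classes: a vector's class is its upper four bits, and (with the task priority at 0) a pending vector is only dispatched if its class is higher than that of the highest vector already in service. The sketch below decodes the "[3f:20] 00020002" ISR row from the dump quoted above, which shows vectors 0x21 and 0x31 in service together, and applies that simplified rule; the "s[0] ... vec 0x31" line appears to show that 0x31 was already on Xen's pending-EOI stack when vector 0x21 arrived. The function names are illustrative, not Xen's.

```c
#include <stdint.h>

/*
 * Each row of the "All LAPIC state" dump is one 32-bit word covering
 * vectors [base+31:base]; bit i set means vector base+i is flagged.
 * Stores up to max_out vectors in out[] and returns how many were stored.
 */
static int vectors_in_row(unsigned int base, uint32_t word,
                          unsigned int *out, int max_out)
{
    int n = 0;

    for (unsigned int bit = 0; bit < 32 && n < max_out; bit++)
        if (word & (UINT32_C(1) << bit))
            out[n++] = base + bit;
    return n;
}

/* The local APIC ranks vectors by priority class: the upper nibble. */
static unsigned int priority_class(unsigned int vector)
{
    return vector >> 4;
}

/*
 * Simplified acceptance rule, assuming TPR = 0: a pending vector is only
 * dispatched if its class is higher than the class of the highest vector
 * already in service.
 */
static int may_be_dispatched(unsigned int pending, unsigned int highest_in_service)
{
    return priority_class(pending) > priority_class(highest_in_service);
}
```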

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-04 20:54                                             ` Andrew Cooper
@ 2013-09-05  1:45                                               ` Zhang, Yang Z
  2013-09-05  7:20                                                 ` Thimo E.
  0 siblings, 1 reply; 63+ messages in thread
From: Zhang, Yang Z @ 2013-09-05  1:45 UTC (permalink / raw)
  To: Andrew Cooper, Thimo E.
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Xiantao

Andrew Cooper wrote on 2013-09-05:
> On 04/09/2013 20:56, Thimo E. wrote:
>> Hello Andrew,
>> 
>> Thanks for your response. At least I've seen the trigger of the new
>> crash (vector 2e) before, so the two seem to belong together.
>> 
>> I can't imagine that I am the only one in the world who is using a
>> Haswell board. And as I haven't seen any other Xen bug/crash reports
>> like mine (apart from yours, once), nor bug reports from users of other
>> operating systems, I wonder whether only my hardware is buggy or
>> whether other operating systems handle those "spurious" interrupts
>> differently.
>> 
>> What does "ioapic_ack=old" change?
>> 
>> Best regards
>>   Thimo
> 
> ioapic_ack=old is already in effect - see "Enabled directed EOI with
> ioapic_ack_old on!" in the boot dmesg.
> 
> Originally, it was a workaround for ancient IO-APIC hardware
> which had a bug on one of the mask bits.  Nowadays, it is used with
> EOI broadcast suppression, which is an APIC transaction performance
> improvement on recent processors.  What it does is affect whether an
> IO-APIC interrupt gets masked when an interrupt is received.
> 
> You could certainly try "ioapic_ack=new" and see whether that makes a
> difference, given a lack of any other ideas.  It will disable EOI
> broadcast suppression.
> 
> ~Andrew
Hi Thimo

Do you see this issue only when an HVM guest is running? If so, can you try to isolate dom0's VCPUs from the HVM guests' VCPUs? For example, pin all of dom0's VCPUs to some PCPUs and pin all of the HVM guests' VCPUs to the remaining PCPUs.

BTW: you weren't using device pass-through when the issue occurred, were you?

Best regards,
Yang

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-05  1:45                                               ` Zhang, Yang Z
@ 2013-09-05  7:20                                                 ` Thimo E.
  0 siblings, 0 replies; 63+ messages in thread
From: Thimo E. @ 2013-09-05  7:20 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

On 05.09.2013 03:45, Zhang, Yang Z wrote:
> Hi Thimo
>
> Do you see this issue only when an HVM guest is running? If so, can you try to isolate dom0's VCPUs from the HVM guests' VCPUs? For example, pin all of dom0's VCPUs to some PCPUs and pin all of the HVM guests' VCPUs to the remaining PCPUs.
I have both PV guests and HVM guests. I tried shutting down all the
HVM guests and using only PV guests; the problem is still there. I did not
try the other way round (only HVM guests), because this is a production
system and the PV guests are needed.

>
> BTW: you weren't using device pass-through when the issue occurred, were you?
I am not using device pass-through.

Best regards
   Thimo

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-08-14  9:52                         ` Andrew Cooper
@ 2013-09-07 13:27                           ` Thimo E.
  2013-09-07 17:02                             ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Thimo E. @ 2013-09-07 13:27 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Jan Beulich

Hello again,

I've disabled the internal network card and used another one; the problem
still exists. I had two crashes within 5 minutes, which is frustrating.
So (assuming that disabling the internal card in the BIOS works) the
source of the problem is not the internal NIC.

Every time the pending EOI error occurs I see the mysterious interrupt 
 >>29<<. Only the vectors are changing. See below a summary of the last 
5 crashes.

My Questions:
- How can I see to which hardware device int 29 belongs ? I can't find 
int 29 in /proc/interrupts or lspci -vv nor in kernel dmesg or xen dmesg 
?!?!
- Andrew, what does your output "domain-list=0:276" mean and why is it
always 0:276 for interrupt 29? Is it the VM number?

1)
(XEN)   irq 29, vector 0x21
(XEN)    IRQ:  29 affinity:4 vec:21 type=PCI-MSI status=00000010 
in-flight=0 domain-list=0:276(----),

2)
(XEN)   irq 29, vector 0x26
(XEN)    IRQ:  29 affinity:8 vec:26 type=PCI-MSI status=00000010 
in-flight=0 domain-list=0:276(----),

3)
(XEN)   irq 29, vector 0x31
(XEN)    IRQ:  29 affinity:2 vec:24 type=PCI-MSI status=00000010 
in-flight=0 domain-list=0:276(----),

4)
(XEN)   irq 29, vector 0x2e
(XEN)    IRQ:  29 affinity:8 vec:7e type=PCI-MSI status=00000010 
in-flight=0 domain-list=0:276(----),

5)
(XEN)   irq 29, vector 0x3b
(XEN)    IRQ:  29 affinity:2 vec:3b type=PCI-MSI status=00000010 
in-flight=0 domain-list=0:276(----),



Best regards
   Thimo



Am 14.08.2013 11:52, schrieb Andrew Cooper:
> On 14/08/13 03:53, Zhang, Yang Z wrote:
>> Andrew Cooper wrote on 2013-08-12:
>>>
>>> On the XenServer hardware where we have seen this issue, the
>>> problematic interrupt was from:
>>>
>>> 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 02)
>>> 	Subsystem: Intel Corporation Device 0000
>>> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>> 	Latency: 0
>>> 	Interrupt: pin A routed to IRQ 1275
>>> 	Region 0: Memory at c2700000 (32-bit, non-prefetchable) [size=128K]
>>> 	Region 1: Memory at c273e000 (32-bit, non-prefetchable) [size=4K]
>>> 	Region 2: I/O ports at 7080 [size=32]
>>> 	Capabilities: [c8] Power Management version 2
>>> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>>> 	Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>> 		Address: 00000000fee00318  Data: 0000
>>> 	Capabilities: [e0] PCI Advanced Features
>>> 		AFCap: TP+ FLR+
>>> 		AFCtrl: FLR-
>>> 		AFStatus: TP-
>>> 	Kernel driver in use: e1000e
>>> 	Kernel modules: e1000e
>>>
>>> I am still attempting to reproduce the issue, but we haven't seen it
>>> again since my email at the root of this thread.
>> Did you see the issue on other HSW machines without this NIC? Also, Thimo, have you tried pinning the vcpus and stopping irqbalance in dom0?
> We do not have any Haswell hardware without this NIC.
>
> ~Andrew
>

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-07 13:27                           ` Thimo E.
@ 2013-09-07 17:02                             ` Andrew Cooper
  2013-09-07 23:37                               ` Thimo E.
  2013-09-09  7:59                               ` Jan Beulich
  0 siblings, 2 replies; 63+ messages in thread
From: Andrew Cooper @ 2013-09-07 17:02 UTC (permalink / raw)
  To: Thimo E.; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Jan Beulich

On 07/09/2013 14:27, Thimo E. wrote:
> Hello again,
>
> I've disabled the internal network card and used another one, problem
> still exists. I had two crashes within 5 minutes, which is frustrating.
> So (assuming disabling the internal card in the bios is working) the
> source of the problem is not the internal NIC.
>
> Every time the pending EOI error occurs I see the mysterious interrupt
> >>29<<. Only the vectors are changing. See below a summary of the last
> 5 crashes.
>
> My Questions:
> - How can I see to which hardware device int 29 belongs ? I can't find
> int 29 in /proc/interrupts or lspci -vv nor in kernel dmesg or xen
> dmesg ?!?!
> - Andrew, what does your output "domain-list=0:276" mean and why is it
> always 0:276 for interrupt 29? Is it the VM number?
>
> 1)
> (XEN)   irq 29, vector 0x21
> (XEN)    IRQ:  29 affinity:4 vec:21 type=PCI-MSI status=00000010
> in-flight=0 domain-list=0:276(----),
>
> 2)
> (XEN)   irq 29, vector 0x26
> (XEN)    IRQ:  29 affinity:8 vec:26 type=PCI-MSI status=00000010
> in-flight=0 domain-list=0:276(----),
>
> 3)
> (XEN)   irq 29, vector 0x31
> (XEN)    IRQ:  29 affinity:2 vec:24 type=PCI-MSI status=00000010
> in-flight=0 domain-list=0:276(----),
>
> 4)
> (XEN)   irq 29, vector 0x2e
> (XEN)    IRQ:  29 affinity:8 vec:7e type=PCI-MSI status=00000010
> in-flight=0 domain-list=0:276(----),
>
> 5)
> (XEN)   irq 29, vector 0x3b
> (XEN)    IRQ:  29 affinity:2 vec:3b type=PCI-MSI status=00000010
> in-flight=0 domain-list=0:276(----),
>

irq 29 is just an internal Xen number for accounting all interrupts.  It
doesn't mean anything specific regarding hardware etc.  The vector and
affinity would be expected to change as dom0's vcpus are moved around by
the scheduler.

domain-list=0 means that this interrupt is targeted at dom0 (it is a
list because certain interrupts have to be shared by more than one
domain).  Helpfully, the keyhandler truncates the pirq field, so 276 is
unlikely to be correct.  As it is a dom0 MSI, I am guessing it actually
matches up with interrupt 1276 in /proc/interrupts, if there is one.
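If you want to read these dumps programmatically, the domain-list pair can be pulled out of a "Guest interrupt information" line with a small C helper. This is a minimal sketch. The helper name `parse_domain_list` is hypothetical, and the field layout is inferred only from the output quoted in this thread. Following the explanation above, the first number is the target domain and the second is that domain's pirq.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the first "domain:pirq" pair that follows
 * "domain-list=" in a keyhandler line. Returns 1 on success, 0 if the
 * line has no domain list (e.g. "mapped, unbound" entries). */
static int parse_domain_list(const char *line, int *domid, int *pirq)
{
    const char *p = strstr(line, "domain-list=");

    if (p == NULL)
        return 0;
    p += strlen("domain-list=");
    /* %d skips leading whitespace, so padded pairs like "0:  1" parse too */
    return sscanf(p, "%d:%d", domid, pirq) == 2;
}
```

For the IRQ 29 line in these dumps, the helper yields domain 0 and pirq 276. It returns 0 for unbound entries.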

Can you provide the results of `xl debug-keys iMQ`, and attach
/proc/interrupts to this email (just in case the setup has changed after
playing with your BIOS)

~Andrew

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-07 17:02                             ` Andrew Cooper
@ 2013-09-07 23:37                               ` Thimo E.
  2013-09-08  9:53                                 ` Andrew Cooper
  2013-09-09  7:59                               ` Jan Beulich
  1 sibling, 1 reply; 63+ messages in thread
From: Thimo E. @ 2013-09-07 23:37 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Jan Beulich

Hello Andrew,

ok, thanks. This is what I assumed.

The output of "xl debug-keys iMQ" is empty.

[root@localhost ~]#  dmesg |grep arcmsr
[    8.159321] arcmsr 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> 
IRQ 16
[    8.159413] arcmsr 0000:01:00.0: setting latency timer to 64
[    8.170316] arcmsr 0000:01:00.0: get owner: 7ff0
[    8.170414] arcmsr 0000:01:00.0: irq 1276 (276) for MSI/MSI-X
[    8.170421] IRQ 1276/arcmsr: IRQF_DISABLED is not guaranteed on 
shared IRQs
[    8.170654] arcmsr0: msi enabled

[root@localhost /]# cat /proc/irq/1276/spurious
count 61007
unhandled 8
last_unhandled 36736990 ms

arcmsr is the driver for the Areca storage RAID controller. I had
already used it for years with XenServer 6.0.2 without problems.

The messages "IRQF_DISABLED is not guaranteed...." and "8 unhandled
interrupts" look interesting. I am not a kernel hacker, but this is what
I read from
http://lxr.free-electrons.com/source/kernel/irq/manage.c?v=2.6.32:

1025         if ((irqflags & (IRQF_SHARED|IRQF_DISABLED)) ==
1026                         (IRQF_SHARED|IRQF_DISABLED)) {
1027                 pr_warning(
1028                   "IRQ %d/%s: IRQF_DISABLED is not guaranteed on shared IRQs\n",
1029                         irq, devname);
...
738                  * Force MSI interrupts to run with interrupts
739                  * disabled. The multi vector cards can cause stack
740                  * overflows due to nested interrupts when enough of
741                  * them are directed to a core and fire at the same
742                  * time.
743                  */
744                 if (desc->msi_desc)
745                         new->flags |= IRQF_DISABLED;

--> The "IRQF_DISABLED is not guaranteed on shared IRQs" warning is only
printed when both IRQF_SHARED and IRQF_DISABLED are set in irqflags.
--> Is what we see in the kernel oops the stack overflow that the
comment in lines 738-742 talks about?!
--> IRQF_SHARED is set, so MSI interrupt 1276 is shared?! I thought
that MSI interrupts cannot be shared. Attached you'll
see my /proc/interrupts.

So what I will do now is disable MSI for the arcmsr driver. Could this be
the source of the problem?! But why is 1276 shared?!
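The manage.c condition quoted above is a plain bitmask test, shown here as a minimal C sketch. The `DEMO_` constant values are illustrative assumptions and are not copied from any particular kernel header. The test only inspects the flags word it is given. A hit therefore shows that the word carried IRQF_SHARED. It does not by itself show that a second device is attached to the same vector.

```c
/* Illustrative bit values only; the real IRQF_* constants live in the
 * kernel's interrupt header and may differ between versions. */
#define DEMO_IRQF_DISABLED 0x20u
#define DEMO_IRQF_SHARED   0x80u

/* Mirrors the quoted condition: warn only when BOTH bits are present in
 * the flags word being checked. */
static int would_warn(unsigned int irqflags)
{
    const unsigned int both = DEMO_IRQF_SHARED | DEMO_IRQF_DISABLED;

    return (irqflags & both) == both;
}
```

A word with only one of the two bits set never triggers the warning.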

Best regards
   Thimo

Am 07.09.2013 19:02, schrieb Andrew Cooper:
>
> irq 29 is just an internal Xen number for accounting all interrupts.  It
> doesn't mean anything specific regarding hardware etc.  The vector and
> affinity would be expected to change as dom0's vcpus are moved around by the
> scheduler.
>
> domain-list=0 means that this interrupt is targeted at dom0 (It is a
> list because certain interrupts have to be shared by more than one
> domain).  Helpfully, the keyhandler truncates the pirq field, so 276 is
> unlikely to be correct.  As it is a dom0 MSI, I am guessing it actually
> matches up with interrupt 1276 in /proc/interrupts, if there is one.
>
> Can you provide the results of `xl debug-keys iMQ`, and attach
> /proc/interrupts to this email (just in case the setup has changed after
> playing with your BIOS)
>
> ~Andrew
>

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-07 23:37                               ` Thimo E.
@ 2013-09-08  9:53                                 ` Andrew Cooper
  2013-09-08 10:24                                   ` Thimo E.
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-09-08  9:53 UTC (permalink / raw)
  To: Thimo E.; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Jan Beulich

On 08/09/2013 00:37, Thimo E. wrote:
> Hello Andrew,
>
> ok, thanks. This is what I assumed.
>
> The output of "xl debug-keys iMQ" is empty.

Sorry - I should have been more clear.  `xl debug-keys` dumps its
information into the xen dmesg buffer, so `xl dmesg` will capture the
results.

~Andrew

>
> [root@localhost ~]#  dmesg |grep arcmsr
> [    8.159321] arcmsr 0000:01:00.0: PCI INT A -> GSI 16 (level, low)
> -> IRQ 16
> [    8.159413] arcmsr 0000:01:00.0: setting latency timer to 64
> [    8.170316] arcmsr 0000:01:00.0: get owner: 7ff0
> [    8.170414] arcmsr 0000:01:00.0: irq 1276 (276) for MSI/MSI-X
> [    8.170421] IRQ 1276/arcmsr: IRQF_DISABLED is not guaranteed on
> shared IRQs
> [    8.170654] arcmsr0: msi enabled
>
> [root@localhost /]# cat /proc/irq/1276/spurious
> count 61007
> unhandled 8
> last_unhandled 36736990 ms
>
> arcmsr is the driver of the Areca Storage Raid Controller. Used it
> already before with Xenserver 6.0.2 for years, no problems.
>
> The messages "IRQF_DISABLED is not guaranteed...." and "8 unhandled
> interrupts" look interesting. I am not a kernel hacker but what I
> interpret from
> http://lxr.free-electrons.com/source/kernel/irq/manage.c?v=2.6.32:
>
> 1025         if ((irqflags & (IRQF_SHARED|IRQF_DISABLED)) ==
> 1026 (IRQF_SHARED|IRQF_DISABLED)) {
> 1027                 pr_warning(
> 1028                   "IRQ %d/%s: IRQF_DISABLED is not guaranteed on
> shared IRQs\n",
> 1029                         irq, devname);
> ...
> 738                  * Force MSI interrupts to run with interrupts
> 739                  * disabled. The multi vector cards can cause stack
> 740                  * overflows due to nested interrupts when enough of
> 741                  * them are directed to a core and fire at the same
> 742                  * time.
> 743                  */
> 744                 if (desc->msi_desc)
> 745                         new->flags |= IRQF_DISABLED;
>
> --> "IRQF_DISABLED is not guaranteed on shared IRQs" warning is only
> printed when irqflags IRQF_SHARED and IRQF_DISABLED are set
> --> Is that what we see in the kernel oops the stack overflow the
> comment in lines 738-742 is talking about ?!
> --> IRQF_SHARED is set, so MSI interrupt 1276 is shared ?! I thought
> that it is not possible that MSI interrupts are shared. Attached
> you'll see my /proc/interrupts
>
> So what I do now is disabling MSI for the arcmsr driver. Could this be
> the source of the problem ?! But why is 1276 shared ?!
>
> Best regards
>   Thimo
>
> Am 07.09.2013 19:02, schrieb Andrew Cooper:
>>
>> irq 29 is just an internal Xen number for accounting all interrupts.  It
>> doesn't mean anything specific regarding hardware etc.  The vector and
>> affinity would be expected to change as dom0's vcpus are moved around by the
>> scheduler.
>>
>> domain-list=0 means that this interrupt is targeted at dom0 (It is a
>> list because certain interrupts have to be shared my more than 1
>> domain).  Helpfully, the keyhandler truncates the pirq field, so 276 is
>> unlikely to be correct.  As it is a dom0 MSI, I am guessing it actually
>> matches up with interrupt 1276 in /proc/interrupts, if there is one.
>>
>> Can you provide the results of `xl debug-keys iMQ`, and attach
>> /proc/interrupts to this email (just in case the setup has changed after
>> playing with your BIOS)
>>
>> ~Andrew
>>
>

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-08  9:53                                 ` Andrew Cooper
@ 2013-09-08 10:24                                   ` Thimo E.
  2013-09-09 13:16                                     ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Thimo E. @ 2013-09-08 10:24 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Jan Beulich

[-- Attachment #1: Type: text/plain, Size: 3514 bytes --]

Ah, sorry.  Output is attached.

Am 08.09.2013 11:53, schrieb Andrew Cooper:
> On 08/09/2013 00:37, Thimo E. wrote:
>> Hello Andrew,
>>
>> ok, thanks. This is what I assumed.
>>
>> The output of "xl debug-keys iMQ" is empty.
> Sorry - I should have been more clear.  `xl debug-keys` dumps its
> information into the xen dmesg buffer, so `xl dmesg` will capture the
> results.
>
> ~Andrew
>
>> [root@localhost ~]#  dmesg |grep arcmsr
>> [    8.159321] arcmsr 0000:01:00.0: PCI INT A -> GSI 16 (level, low)
>> -> IRQ 16
>> [    8.159413] arcmsr 0000:01:00.0: setting latency timer to 64
>> [    8.170316] arcmsr 0000:01:00.0: get owner: 7ff0
>> [    8.170414] arcmsr 0000:01:00.0: irq 1276 (276) for MSI/MSI-X
>> [    8.170421] IRQ 1276/arcmsr: IRQF_DISABLED is not guaranteed on
>> shared IRQs
>> [    8.170654] arcmsr0: msi enabled
>>
>> [root@localhost /]# cat /proc/irq/1276/spurious
>> count 61007
>> unhandled 8
>> last_unhandled 36736990 ms
>>
>> arcmsr is the driver of the Areca Storage Raid Controller. Used it
>> already before with Xenserver 6.0.2 for years, no problems.
>>
>> The messages "IRQF_DISABLED is not guaranteed...." and "8 unhandled
>> interrupts" look interesting. I am not a kernel hacker but what I
>> interpret from
>> http://lxr.free-electrons.com/source/kernel/irq/manage.c?v=2.6.32:
>>
>> 1025         if ((irqflags & (IRQF_SHARED|IRQF_DISABLED)) ==
>> 1026 (IRQF_SHARED|IRQF_DISABLED)) {
>> 1027                 pr_warning(
>> 1028                   "IRQ %d/%s: IRQF_DISABLED is not guaranteed on
>> shared IRQs\n",
>> 1029                         irq, devname);
>> ...
>> 738                  * Force MSI interrupts to run with interrupts
>> 739                  * disabled. The multi vector cards can cause stack
>> 740                  * overflows due to nested interrupts when enough of
>> 741                  * them are directed to a core and fire at the same
>> 742                  * time.
>> 743                  */
>> 744                 if (desc->msi_desc)
>> 745                         new->flags |= IRQF_DISABLED;
>>
>> --> "IRQF_DISABLED is not guaranteed on shared IRQs" warning is only
>> printed when irqflags IRQF_SHARED and IRQF_DISABLED are set
>> --> Is that what we see in the kernel oops the stack overflow the
>> comment in lines 738-742 is talking about ?!
>> --> IRQF_SHARED is set, so MSI interrupt 1276 is shared ?! I thought
>> that it is not possible that MSI interrupts are shared. Attached
>> you'll see my /proc/interrupts
>>
>> So what I do now is disabling MSI for the arcmsr driver. Could this be
>> the source of the problem ?! But why is 1276 shared ?!
>>
>> Best regards
>>    Thimo
>>
>> Am 07.09.2013 19:02, schrieb Andrew Cooper:
>>> irq 29 is just an internal Xen number for accounting all interrupts.  It
>>> doesn't mean anything specific regarding hardware etc.  The vector and
>>> affinity would be expected to change as dom0's vcpus are moved around by the
>>> scheduler.
>>>
>>> domain-list=0 means that this interrupt is targeted at dom0 (It is a
>>> list because certain interrupts have to be shared by more than one
>>> domain).  Helpfully, the keyhandler truncates the pirq field, so 276 is
>>> unlikely to be correct.  As it is a dom0 MSI, I am guessing it actually
>>> matches up with interrupt 1276 in /proc/interrupts, if there is one.
>>>
>>> Can you provide the results of `xl debug-keys iMQ`, and attach
>>> /proc/interrupts to this email (just in case the setup has changed after
>>> playing with your BIOS)
>>>
>>> ~Andrew
>>>


[-- Attachment #2: xl_debus_keys20130908.txt --]
[-- Type: text/plain, Size: 6746 bytes --]

(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:1 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:1 vec:38 type=IO-APIC-edge    status=00000014 in-flight=0 domain-list=0:  1(----),
(XEN)    IRQ:   2 affinity:f vec:00 type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:1 vec:40 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:1 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   5 affinity:1 vec:50 type=IO-APIC-edge    status=00000010 in-flight=0 domain-list=0:  5(----),
(XEN)    IRQ:   6 affinity:1 vec:58 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:1 vec:60 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   8 affinity:1 vec:68 type=IO-APIC-edge    status=00000010 in-flight=0 domain-list=0:  8(----),
(XEN)    IRQ:   9 affinity:1 vec:70 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0:  9(----),
(XEN)    IRQ:  10 affinity:1 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:1 vec:88 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:1 vec:90 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  13 affinity:1 vec:98 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:1 vec:a0 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:1 vec:a8 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:1 vec:b0 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 16(----),
(XEN)    IRQ:  17 affinity:1 vec:39 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  18 affinity:1 vec:31 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 18(----),
(XEN)    IRQ:  22 affinity:1 vec:29 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 22(----),
(XEN)    IRQ:  23 affinity:1 vec:d0 type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0: 23(----),
(XEN)    IRQ:  24 affinity:1 vec:28 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:1 vec:30 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:f vec:b8 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  27 affinity:f vec:c0 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  28 affinity:f vec:c8 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  29 affinity:f vec:d8 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:276(----),
(XEN)    IRQ:  30 affinity:f vec:51 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:275(----),
(XEN)    IRQ:  31 affinity:f vec:61 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:274(----),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  1 Vec 56:
(XEN)       Apic 0x00, Pin  1: vec=38 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  4 Vec 72:
(XEN)       Apic 0x00, Pin  4: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  5 Vec 80:
(XEN)       Apic 0x00, Pin  5: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  6 Vec 88:
(XEN)       Apic 0x00, Pin  6: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  7 Vec 96:
(XEN)       Apic 0x00, Pin  7: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  8 Vec104:
(XEN)       Apic 0x00, Pin  8: vec=68 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  9 Vec112:
(XEN)       Apic 0x00, Pin  9: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 10 Vec120:
(XEN)       Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 11 Vec136:
(XEN)       Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 12 Vec144:
(XEN)       Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 13 Vec152:
(XEN)       Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 14 Vec160:
(XEN)       Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 15 Vec168:
(XEN)       Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 16 Vec176:
(XEN)       Apic 0x00, Pin 16: vec=b0 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 17 Vec 57:
(XEN)       Apic 0x00, Pin 17: vec=39 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 18 Vec 49:
(XEN)       Apic 0x00, Pin 18: vec=31 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 22 Vec 41:
(XEN)       Apic 0x00, Pin 22: vec=29 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)       Apic 0x00, Pin 23: vec=d0 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) PCI-MSI interrupt information:
(XEN)  MSI    26 vec=b8 lowest  edge   assert  log lowest dest=00000001 mask=0/1/-1
(XEN)  MSI    27 vec=c0 lowest  edge   assert  log lowest dest=00000001 mask=0/1/-1
(XEN)  MSI    28 vec=c8 lowest  edge   assert  log lowest dest=00000001 mask=0/1/-1
(XEN)  MSI    29 vec=d8 lowest  edge   assert  log lowest dest=00000001 mask=0/0/-1
(XEN)  MSI    30 vec=51 lowest  edge   assert  log lowest dest=00000001 mask=0/0/-1
(XEN)  MSI    31 vec=61 lowest  edge   assert  log lowest dest=00000001 mask=0/0/-1
(XEN) ==== PCI devices ====
(XEN) 04:00.1 - dom 0   - MSIs < 31 >
(XEN) 04:00.0 - dom 0   - MSIs < 30 >
(XEN) 03:02.0 - dom 0   - MSIs < >
(XEN) 02:00.0 - dom 0   - MSIs < >
(XEN) 01:00.0 - dom 0   - MSIs < 29 >
(XEN) 00:1f.3 - dom 0   - MSIs < >
(XEN) 00:1f.0 - dom 0   - MSIs < >
(XEN) 00:1d.0 - dom 0   - MSIs < >
(XEN) 00:1c.4 - dom 0   - MSIs < 28 >
(XEN) 00:1c.0 - dom 0   - MSIs < 27 >
(XEN) 00:1b.0 - dom 0   - MSIs < >
(XEN) 00:1a.0 - dom 0   - MSIs < >
(XEN) 00:16.0 - dom 0   - MSIs < >
(XEN) 00:14.0 - dom 0   - MSIs < >
(XEN) 00:02.0 - dom 0   - MSIs < >
(XEN) 00:01.0 - dom 0   - MSIs < 26 >
(XEN) 00:00.0 - dom 0   - MSIs < >

[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-07 17:02                             ` Andrew Cooper
  2013-09-07 23:37                               ` Thimo E.
@ 2013-09-09  7:59                               ` Jan Beulich
  2013-09-09 12:53                                 ` Andrew Cooper
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2013-09-09  7:59 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Yang Z Zhang, xen-devel, Thimo E., Keir Fraser

>>> On 07.09.13 at 19:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> domain-list=0 means that this interrupt is targeted at dom0 (It is a
> list because certain interrupts have to be shared by more than one
> domain).  Helpfully, the keyhandler truncates the pirq field, so 276 is
> unlikely to be correct.  As it is a dom0 MSI, I am guessing it actually
> matches up with interrupt 1276 in /proc/interrupts, if there is one.

What truncation are you seeing here? %3d merely pads the number
with spaces if it ends up being less than three digits. Wider numbers
still get printed in full. Whether that padding is really useful is another
question...
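Jan's point about `%3d` is easy to check, because the field width in a printf conversion is a minimum, not a maximum. The following minimal C sketch formats a pirq the same way and reports how many characters come out. The helper name `printed_width` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Format a pirq with a "%3d" conversion into buf and return the length
 * of the result. Short numbers are left-padded with spaces; numbers wider
 * than three digits are printed in full, never cut off. */
static size_t printed_width(int pirq, char *buf, size_t len)
{
    snprintf(buf, len, "%3d", pirq);
    return strlen(buf);
}
```

With a 16-byte buffer, the results are:
- pirq 1 prints as "  1" (3 characters).
- pirq 276 prints as "276" (3 characters).
- pirq 1276 prints as "1276" (4 characters).

So a printed 276 really is 276.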

Jan

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-09  7:59                               ` Jan Beulich
@ 2013-09-09 12:53                                 ` Andrew Cooper
  0 siblings, 0 replies; 63+ messages in thread
From: Andrew Cooper @ 2013-09-09 12:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Yang Z Zhang, xen-devel, Thimo E., Keir Fraser

On 09/09/13 08:59, Jan Beulich wrote:
>>>> On 07.09.13 at 19:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> domain-list=0 means that this interrupt is targeted at dom0 (It is a
>> list because certain interrupts have to be shared by more than one
>> domain).  Helpfully, the keyhandler truncates the pirq field, so 276 is
>> unlikely to be correct.  As it is a dom0 MSI, I am guessing it actually
>> matches up with interrupt 1276 in /proc/interrupts, if there is one.
> What truncation are you seeing here? %3d merely pads the number
> with spaces if it ends up being less than three digits. Wider numbers
> still get printed in full. Whether that padding is really useful is another
> question...
>
> Jan
>

Yes, very true.  Which means there is some other reason for the 1000
discrepancy between Xen's and dom0's ideas of dom0's pirqs.

~Andrew

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-08 10:24                                   ` Thimo E.
@ 2013-09-09 13:16                                     ` Andrew Cooper
  2013-09-09 14:48                                       ` Thimo Eichstädt
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2013-09-09 13:16 UTC (permalink / raw)
  To: Thimo E.; +Cc: Zhang, Yang Z, xen-devel, Keir Fraser, Jan Beulich

On 08/09/13 11:24, Thimo E. wrote:
> Ah, sorry.  Output is attached.

So in this case, irq29 is now your SATA controller.
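This identification can be read off the "==== PCI devices ====" section of the attachment, where device 01:00.0 lists MSI 29. The earlier dmesg also shows arcmsr bound to 0000:01:00.0. Below is a minimal C sketch of that lookup. It assumes the line format shown in the attachment, and `device_owns_msi` is a hypothetical helper, not a Xen tool.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if `line` is an entry from the "PCI devices"
 * section whose MSI list contains `irq`, copy the device's bus:dev.fn
 * into `bdf` and return 1; otherwise return 0. */
static int device_owns_msi(const char *line, int irq, char *bdf, size_t bdf_len)
{
    char dev[16];
    const char *p;
    int n, consumed;

    if (strstr(line, " - dom ") == NULL)
        return 0;                      /* not a PCI device entry */
    if (sscanf(line, "(XEN) %15s", dev) != 1)
        return 0;
    p = strchr(line, '<');
    if (p == NULL)
        return 0;
    p++;                               /* step past '<' */

    /* Walk the whitespace-separated Xen IRQ numbers up to '>' */
    while (sscanf(p, "%d%n", &n, &consumed) == 1) {
        if (n == irq) {
            snprintf(bdf, bdf_len, "%s", dev);
            return 1;
        }
        p += consumed;
    }
    return 0;
}
```

Feeding it each line of the section with `irq` set to 29 matches only the 01:00.0 entry.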

I presume you are still falling over the same basic assertion for the
pending EOI stack?
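The pending-EOI assertion from the start of the thread is `(sp == 0) || (peoi[sp-1].vector < vector)`. The local APIC only interrupts an in-service handler for a higher-priority vector. Entries still awaiting EOI should therefore be stacked in strictly increasing vector order. A new vector that is equal to or lower than the top entry suggests an earlier interrupt was never EOI'd. The following C code is a simplified model under that reading. The struct names, the depth and the push/pop functions are all hypothetical; only the ordering check mirrors the assertion.

```c
#include <stdbool.h>

#define PEOI_DEPTH 16  /* illustrative capacity, not Xen's value */

/* Hypothetical model of a per-CPU stack of interrupts awaiting EOI. */
struct peoi_entry { unsigned char vector; };

struct peoi_stack {
    struct peoi_entry e[PEOI_DEPTH];
    unsigned int sp;
};

/* Push a newly delivered vector. Returns false where the quoted ASSERT
 * would fire: the new vector must be strictly higher than the top one. */
static bool peoi_push(struct peoi_stack *s, unsigned char vector)
{
    if (s->sp == PEOI_DEPTH)
        return false;
    if (!(s->sp == 0 || s->e[s->sp - 1].vector < vector))
        return false;
    s->e[s->sp++].vector = vector;
    return true;
}

/* EOI the most recently pushed (highest-priority) entry. */
static bool peoi_pop(struct peoi_stack *s)
{
    if (s->sp == 0)
        return false;
    s->sp--;
    return true;
}
```

In this model, pushing 0x21 and then 0x2f succeeds. A subsequent 0x27 is rejected until 0x2f has been popped (EOI'd).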

~Andrew

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-09 13:16                                     ` Andrew Cooper
@ 2013-09-09 14:48                                       ` Thimo Eichstädt
  2013-09-09 15:12                                         ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Thimo Eichstädt @ 2013-09-09 14:48 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Zhang, Yang Z, xen-devel, Thimo E., Keir Fraser, Jan Beulich

Hello Andrew,

I've disabled MSI on that controller, now it is running with level 
triggered IRQs. No crash so far with these settings.

But what I see is a lot of spurious interrupts for every type of IRQ on
my machine. Here is an example:

[root@localhost /]# cat /proc/irq/1276/spurious
count 61007
unhandled 0
last_unhandled 36736990 ms

I can see this for the ethernet irqs, usb, sata and so on.
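The `/proc/irq/<n>/spurious` file quoted here has three fixed fields. Below is a minimal C parser for that text. It assumes the exact layout shown in this thread, and the struct and function names are hypothetical. When reading the file, note that `unhandled` is the field counting interrupts that no handler claimed. As far as we know, `count` is a running total of interrupts in the kernel's current sampling window, not a count of spurious ones.

```c
#include <stdio.h>

/* Hypothetical container for the three fields of a spurious file. */
struct spurious_stats {
    unsigned long count;
    unsigned long unhandled;
    unsigned long last_unhandled_ms;
};

/* Parse "count N\nunhandled N\nlast_unhandled N ms"; whitespace in the
 * format string matches the newlines. Returns 1 if all three fields parse. */
static int parse_spurious(const char *text, struct spurious_stats *out)
{
    return sscanf(text, "count %lu unhandled %lu last_unhandled %lu ms",
                  &out->count, &out->unhandled, &out->last_unhandled_ms) == 3;
}
```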

I already wrote about this in another mail on Sunday:

 >http://lxr.free-electrons.com/source/kernel/irq/manage.c?v=2.6.32:
 >1025         if ((irqflags & (IRQF_SHARED|IRQF_DISABLED)) ==
 >1026 (IRQF_SHARED|IRQF_DISABLED)) {
 >1027                 pr_warning(
 >1028                   "IRQ %d/%s: IRQF_DISABLED is not guaranteed on
 >shared IRQs\n",
 >1029                         irq, devname);
 >...
 >738                  * Force MSI interrupts to run with interrupts
 >739                  * disabled. The multi vector cards can cause stack
 >740                  * overflows due to nested interrupts when enough of
 >741                  * them are directed to a core and fire at the same
 >742                  * time.
 >743                  */
 >744                 if (desc->msi_desc)
 >745                         new->flags |= IRQF_DISABLED;

--> When using MSI on the SATA controller, the kernel tells me that
IRQF_SHARED is set for that interrupt, so the MSI is shared?! I
thought that MSI interrupts cannot be shared.
--> Is what we see in the kernel oops the stack overflow that the
comment in lines 738-742 talks about?! Especially because the
warning in line 1028 tells me that IRQF_DISABLED might not be set on
shared interrupts.

Am 09.09.2013 15:16, schrieb Andrew Cooper:
> So in this case, irq29 is now your SATA controller.
>
> I presume you are still falling over the same basic assertion for the
> pending EOI stack?
>
> ~Andrew
>

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-09 14:48                                       ` Thimo Eichstädt
@ 2013-09-09 15:12                                         ` Andrew Cooper
  0 siblings, 0 replies; 63+ messages in thread
From: Andrew Cooper @ 2013-09-09 15:12 UTC (permalink / raw)
  To: Thimo Eichstädt
  Cc: Zhang, Yang Z, xen-devel, Thimo E., Keir Fraser, Jan Beulich

On 09/09/13 15:48, Thimo Eichstädt wrote:
> Hello Andrew,
>
> I've disabled MSI on that controller, now it is running with level
> triggered IRQs. No crash so far with these settings.
>
> But what I see is a lot of spurious interrupts for every type of IRQ
> on my machine. Here is an example:

Given the nature of the problem, I am not surprised in the slightest
that there are spurious interrupts.

>
> [root@localhost /]# cat /proc/irq/1276/spurious
> count 61007
> unhandled 0
> last_unhandled 36736990 ms
>
> I can see this for the ethernet irqs, usb, sata and so on.

Line-level interrupts are shared between multiple pieces of hardware,
leading to the possibility that no device driver claims the interrupt
(which is when the interrupt is declared spurious).
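The following simplified C model, with entirely hypothetical names, shows how a shared line ends up with "unhandled" interrupts. One interrupt on the line is offered to every attached handler. Each handler reports whether its own device raised it. If none does, the interrupt is counted as unhandled.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_SHARERS 4

/* Hypothetical per-device handler: returns true if its device raised the line. */
typedef bool (*demo_handler)(void *dev);

/* One shared, level-triggered line and the drivers attached to it. */
struct demo_line {
    demo_handler handlers[DEMO_MAX_SHARERS];
    void *devs[DEMO_MAX_SHARERS];
    size_t nr;
    unsigned long unhandled;  /* interrupts that no handler claimed */
};

/* Example handler: the device state is just a "pending" flag. */
static bool demo_dev_pending(void *dev)
{
    return *(bool *)dev;
}

/* Offer one interrupt to every handler on the line; if none claims it,
 * count it as unhandled (spurious). */
static bool demo_dispatch(struct demo_line *l)
{
    bool claimed = false;

    for (size_t i = 0; i < l->nr; i++)
        if (l->handlers[i](l->devs[i]))
            claimed = true;
    if (!claimed)
        l->unhandled++;
    return claimed;
}
```

Every handler is called even after one has claimed the interrupt. That reflects the possibility that several devices on a shared line asserted it at once.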

>
> I've already written it into another mail on Sunday:
>
> >http://lxr.free-electrons.com/source/kernel/irq/manage.c?v=2.6.32:
> >1025         if ((irqflags & (IRQF_SHARED|IRQF_DISABLED)) ==
> >1026 (IRQF_SHARED|IRQF_DISABLED)) {
> >1027                 pr_warning(
> >1028                   "IRQ %d/%s: IRQF_DISABLED is not guaranteed on
> >shared IRQs\n",
> >1029                         irq, devname);
> >...
> >738                  * Force MSI interrupts to run with interrupts
> >739                  * disabled. The multi vector cards can cause stack
> >740                  * overflows due to nested interrupts when enough of
> >741                  * them are directed to a core and fire at the same
> >742                  * time.
> >743                  */
> >744                 if (desc->msi_desc)
> >745                         new->flags |= IRQF_DISABLED;
>
> --> When using MSI on the SATA controller, the kernel tells me that
> IRQF_SHARED is set for that interrupt, so the MSI is shared?! I
> thought it was not possible for MSI interrupts to be shared.
> --> Is the kernel oops we see the stack overflow that the comment in
> lines 738-742 is talking about?! Especially because the warning at
> line 1028 tells me that IRQF_DISABLED is not guaranteed on shared
> interrupts.

I suspect that this is a red herring.  It looks like a generic error
path for both legacy and MSI interrupts.
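The check quoted from manage.c looks only at the flags the caller passes in: it fires when both IRQF_SHARED and IRQF_DISABLED are set, and nothing in that test depends on whether the interrupt is MSI or legacy. A self-contained sketch of the same mask test; the `DEMO_` flag values are placeholders, not the kernel's real definitions.

```c
#include <stdbool.h>

/* Placeholder bit values; the real ones live in <linux/interrupt.h>. */
#define DEMO_IRQF_SHARED   0x1u
#define DEMO_IRQF_DISABLED 0x2u

/* True only when *both* bits are set, mirroring
 * (irqflags & (IRQF_SHARED|IRQF_DISABLED)) == (IRQF_SHARED|IRQF_DISABLED). */
static bool warns_disabled_on_shared(unsigned long irqflags)
{
    const unsigned long both = DEMO_IRQF_SHARED | DEMO_IRQF_DISABLED;

    return (irqflags & both) == both;
}
```

Either flag on its own leaves the test false; only the combination triggers the warning.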

Furthermore, dom0's interrupt handling is rather different under Xen,
not least as the event channel mechanism essentially serialises the
delivery of interrupts.
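As a rough illustration of why an event-channel path serialises delivery, here is a toy model, not Xen's or Linux's actual code (`evtchn_model`, `raise_event`, `drain_events` and `log_handler` are hypothetical): raising an event only sets a pending bit, and a single loop drains the bits and runs each handler to completion, so in this model handlers never nest the way directly delivered APIC vectors can.

```c
#include <stdint.h>

#define NR_EVENTS 64

typedef void (*event_handler_fn)(unsigned int port);

struct evtchn_model {
    uint64_t pending;                   /* one bit per event port */
    event_handler_fn handler[NR_EVENTS];
};

/* Raising an event never calls a handler directly; it only marks it pending. */
static void raise_event(struct evtchn_model *m, unsigned int port)
{
    m->pending |= (uint64_t)1 << port;
}

/* Drain pending events lowest port first, one handler at a time.
 * An event raised while a handler runs is picked up by a later pass
 * of this loop instead of interrupting the running handler. */
static unsigned int drain_events(struct evtchn_model *m)
{
    unsigned int handled = 0;

    while (m->pending) {
        unsigned int port = 0;

        while (!(m->pending & ((uint64_t)1 << port)))
            port++;
        m->pending &= ~((uint64_t)1 << port);
        if (m->handler[port])
            m->handler[port](port);
        handled++;
    }
    return handled;
}

/* Hypothetical handler that records the order in which ports were serviced. */
static unsigned int service_log[NR_EVENTS];
static unsigned int service_len;

static void log_handler(unsigned int port)
{
    service_log[service_len++] = port;
}
```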

~Andrew

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-04 18:32                                       ` Thimo E.
  2013-09-04 18:55                                         ` Andrew Cooper
  2013-09-05  1:15                                         ` Zhang, Yang Z
@ 2013-09-17  2:09                                         ` Zhang, Yang Z
  2013-09-17  7:39                                           ` Thimo E.
  2 siblings, 1 reply; 63+ messages in thread
From: Zhang, Yang Z @ 2013-09-17  2:09 UTC (permalink / raw)
  To: Thimo E.
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]

Zhang, Yang Z wrote on 2013-09-05:
> Thimo E. wrote on 2013-09-05:
>> Hello again,
>> 
>> For the last two weeks there was no crash with dom0_vcpus_pin and
>> dom0 restricted to 1 CPU. But yesterday it crashed again, so I changed
>> the command line again to:
>> 
>> iommu=no-intremap noirqbalance com1=115200,8n1,0xe050,0
>> console=com1,vga mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M
>> watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M
>> cpuid_mask_xsave_eax=0
>> 
>> And today the server crashed again and produced a lot of debugging
>> messages, see attached. The "..." in the log files means that the
>> message above the dots was repeated many times.
>> 
>> My summary so far:
>> - With only 1 CPU attached to dom0, the server was stable for 2
>>   weeks; the crash there did not really show any IRQ problems, see
>>   crash20130903.txt. You can find Andrew's ideas on this in
>>   http://forums.citrix.com/thread.jspa?messageID=1760771#1760771
>> - With more than 1 CPU and irqbalance, the server produced the
>>   crashes I've already posted before.
>> - Without irqbalance, it crashed with some other fancy output, see
>>   crash20130904.txt.
>> 
>> The next step is to change the network card.
>> 
>> Zhang, any update from your side? Or does anyone else have an idea?
> Our hardware team says they are not aware of such an issue with this CPU.
> We are trying to find the same platform to reproduce it now.
Hi, Thimo,

I cannot reproduce this issue on my box after running for about two weeks.
I started four guests (two PV guests and two HVM guests), each running a simple workload (pinging a remote machine). After two weeks, the machine still works; no crash or panic has happened. Is any special workload required to reproduce this issue?

Attached are the cpuinfo and PCI info from my box. Please compare them with yours to see whether they are the same, especially the microcode version.

Best regards,
Yang



[-- Attachment #2: cpuinfo --]
[-- Type: application/octet-stream, Size: 3312 bytes --]

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
stepping	: 3
microcode	: 0x7
cpu MHz		: 3398.082
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc eagerfpu pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm ida arat epb xsaveopt pln pts dtherm fsgsbase bmi1 hle avx2 bmi2 erms rtm
bogomips	: 6796.16
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
stepping	: 3
microcode	: 0x7
cpu MHz		: 3398.082
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 4
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc eagerfpu pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm ida arat epb xsaveopt pln pts dtherm fsgsbase bmi1 hle avx2 bmi2 erms rtm
bogomips	: 6796.16
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
stepping	: 3
microcode	: 0x7
cpu MHz		: 3398.082
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 2
cpu cores	: 4
apicid		: 4
initial apicid	: 4
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc eagerfpu pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm ida arat epb xsaveopt pln pts dtherm fsgsbase bmi1 hle avx2 bmi2 erms rtm
bogomips	: 6796.16
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
stepping	: 3
microcode	: 0x7
cpu MHz		: 3398.082
cache size	: 6144 KB
physical id	: 0
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 6
initial apicid	: 6
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc eagerfpu pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm ida arat epb xsaveopt pln pts dtherm fsgsbase bmi1 hle avx2 bmi2 erms rtm
bogomips	: 6796.16
clflush size	: 64
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:


[-- Attachment #3: lspci --]
[-- Type: application/octet-stream, Size: 1873 bytes --]

00:00.0 Host bridge: Intel Corporation Device 0c00 (rev 06)
00:01.0 PCI bridge: Intel Corporation Device 0c01 (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Device 0412 (rev 06)
00:03.0 Audio device: Intel Corporation Device 0c0c (rev 06)
00:16.0 Communication controller: Intel Corporation Device 8c3a (rev 04)
00:19.0 Ethernet controller: Intel Corporation Device 153b (rev 04)
00:1a.0 USB controller: Intel Corporation Device 8c2d (rev 04)
00:1b.0 Audio device: Intel Corporation Device 8c20 (rev 04)
00:1c.0 PCI bridge: Intel Corporation Device 8c10 (rev d4)
00:1c.1 PCI bridge: Intel Corporation Device 8c12 (rev d4)
00:1c.3 PCI bridge: Intel Corporation Device 8c16 (rev d4)
00:1c.5 PCI bridge: Intel Corporation Device 8c1a (rev d4)
00:1d.0 USB controller: Intel Corporation Device 8c26 (rev 04)
00:1f.0 ISA bridge: Intel Corporation Device 8c44 (rev 04)
00:1f.2 SATA controller: Intel Corporation Device 8c02 (rev 04)
00:1f.3 SMBus: Intel Corporation Device 8c22 (rev 04)
01:00.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI Express Switch (rev 0c)
02:02.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI Express Switch (rev 0c)
02:04.0 PCI bridge: Integrated Device Technology, Inc. PES12N3A PCI Express Switch (rev 0c)
03:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
03:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
04:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
06:00.0 Network controller: Atheros Communications Inc. Device 0034 (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
08:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)

[-- Attachment #4: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-17  2:09                                         ` Zhang, Yang Z
@ 2013-09-17  7:39                                           ` Thimo E.
  2013-09-17  7:43                                             ` Zhang, Yang Z
  0 siblings, 1 reply; 63+ messages in thread
From: Thimo E. @ 2013-09-17  7:39 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

Hello Yang,

The problem that we are discussing here seems to come from the
interaction of my RAID controller (Areca ARC-1212) with the hardware.
When I enable MSI on that controller, the server crashes with the known
error messages within 1 to 7 days. When I disable MSI, I don't see
these crashes anymore.

Andrew doesn't have this controller, but he has also observed this type
of crash with another MSI-enabled card (the network card, I think),
though less often: I think twice in the last 4 months.

Neither my cpuinfo nor dmesg shows any microcode information; perhaps
Xen hides that info?

Best regards
   Thimo

Am 17.09.2013 04:09, schrieb Zhang, Yang Z:
> Hi, Thimo,
>
> I cannot reproduce this issue on my box after running for about two weeks.
> I started four guests (two PV guests and two HVM guests), each running a simple workload (pinging a remote machine). After two weeks, the machine still works; no crash or panic has happened. Is any special workload required to reproduce this issue?
>
> Attached are the cpuinfo and PCI info from my box. Please compare them with yours to see whether they are the same, especially the microcode version.
>
> Best regards,
> Yang
>
>

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-17  7:39                                           ` Thimo E.
@ 2013-09-17  7:43                                             ` Zhang, Yang Z
  2013-09-17 21:04                                               ` Thimo E.
  0 siblings, 1 reply; 63+ messages in thread
From: Zhang, Yang Z @ 2013-09-17  7:43 UTC (permalink / raw)
  To: Thimo E.
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

Thimo E. wrote on 2013-09-17:
> Hello Yang,
> 
> The problem that we are discussing here seems to come from the
> interaction of my RAID controller (Areca ARC-1212) with the hardware.
> When I enable MSI on that controller, the server crashes with the known
> error messages within 1 to 7 days. When I disable MSI, I don't see
> these crashes anymore.
> 
> Andrew doesn't have this controller, but he has also observed this type
> of crash with another MSI-enabled card (the network card, I think),
> though less often: I think twice in the last 4 months.
> 
> Neither my cpuinfo nor dmesg shows any microcode information; perhaps
> Xen hides that info?
'cat /proc/cpuinfo' will show the microcode version.
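For reference, a small sketch of pulling that field out of /proc/cpuinfo-style text. `find_microcode` is a hypothetical helper, not an existing tool; it returns NULL when no `microcode` line is present, which is what a kernel that does not populate the field would give you.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of the first "microcode : ..." line
 * of /proc/cpuinfo-style text into out (outlen must be at least 1).
 * Returns out on success, or NULL if the text has no microcode line. */
static const char *find_microcode(const char *cpuinfo, char *out, size_t outlen)
{
    const char *line = cpuinfo;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *colon = memchr(line, ':', len);

        if (colon != NULL && strncmp(line, "microcode", 9) == 0) {
            const char *val = colon + 1;
            size_t vlen;

            while (*val == ' ' || *val == '\t')   /* skip padding after ':' */
                val++;
            vlen = (size_t)(line + len - val);
            if (vlen >= outlen)                  /* truncate to fit */
                vlen = outlen - 1;
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return out;
        }
        line = eol ? eol + 1 : NULL;
    }
    return NULL;
}
```

Fed the cpuinfo attached earlier in this thread, it yields "0x7".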

> 
> Best regards
>    Thimo
> Am 17.09.2013 04:09, schrieb Zhang, Yang Z:
>> Hi, Thimo,
>> 
>> I cannot reproduce this issue on my box after running for about two weeks.
>> I started four guests (two PV guests and two HVM guests), each running
>> a simple workload (pinging a remote machine). After two weeks, the
>> machine still works; no crash or panic has happened. Is any special
>> workload required to reproduce this issue?
>> 
>> Attached are the cpuinfo and PCI info from my box. Please compare them
>> with yours to see whether they are the same, especially the microcode
>> version.
>> 
>> Best regards,
>> Yang
>> 
>>


Best regards,
Yang

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-17  7:43                                             ` Zhang, Yang Z
@ 2013-09-17 21:04                                               ` Thimo E.
  2013-09-18  1:18                                                 ` Zhang, Xiantao
  2013-09-18 12:06                                                 ` Andrew Cooper
  0 siblings, 2 replies; 63+ messages in thread
From: Thimo E. @ 2013-09-17 21:04 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun, Zhang, Xiantao

Hello,

Unfortunately, the XenServer kernel does not seem to support reading the
microcode version; at least, it is not populated in /proc/cpuinfo.
Andrew, are there any special tricks to get the version out of the
XenServer kernel?

Best regards
   Thimo


Am 17.09.2013 09:43, schrieb Zhang, Yang Z:
> Thimo E. wrote on 2013-09-17:
>> Hello Yang,
>>
>> The problem that we are discussing here seems to come from the
>> interaction of my RAID controller (Areca ARC-1212) with the hardware.
>> When I enable MSI on that controller, the server crashes with the known
>> error messages within 1 to 7 days. When I disable MSI, I don't see
>> these crashes anymore.
>>
>> Andrew doesn't have this controller, but he has also observed this type
>> of crash with another MSI-enabled card (the network card, I think),
>> though less often: I think twice in the last 4 months.
>>
>> Neither my cpuinfo nor dmesg shows any microcode information; perhaps
>> Xen hides that info?
> 'cat /proc/cpuinfo' will show the microcode version.
> Best regards,
> Yang
>

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-17 21:04                                               ` Thimo E.
@ 2013-09-18  1:18                                                 ` Zhang, Xiantao
  2013-09-18 17:24                                                   ` Thimo E.
  2013-09-18 12:06                                                 ` Andrew Cooper
  1 sibling, 1 reply; 63+ messages in thread
From: Zhang, Xiantao @ 2013-09-18  1:18 UTC (permalink / raw)
  To: Thimo E., Zhang, Yang Z
  Cc: Keir Fraser, Jan Beulich, Andrew Cooper, Dong, Eddie,
	Xen-develList, Nakajima, Jun

You could probably boot the system with a non-Xen kernel; you can then get the microcode version from cpuinfo.  Thanks!
Xiantao

-----Original Message-----
From: Thimo E. [mailto:abc@digithi.de] 
Sent: Wednesday, September 18, 2013 5:05 AM
To: Zhang, Yang Z
Cc: Keir Fraser; Jan Beulich; Andrew Cooper; Dong, Eddie; Xen-develList; Nakajima, Jun; Zhang, Xiantao
Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic

Hello,

Unfortunately, the XenServer kernel does not seem to support reading the microcode version; at least, it is not populated in /proc/cpuinfo.
Andrew, are there any special tricks to get the version out of the XenServer kernel?

Best regards
   Thimo


Am 17.09.2013 09:43, schrieb Zhang, Yang Z:
> Thimo E. wrote on 2013-09-17:
>> Hello Yang,
>>
>> The problem that we are discussing here seems to come from the
>> interaction of my RAID controller (Areca ARC-1212) with the hardware.
>> When I enable MSI on that controller, the server crashes with the
>> known error messages within 1 to 7 days. When I disable MSI, I
>> don't see these crashes anymore.
>>
>> Andrew doesn't have this controller, but he has also observed this
>> type of crash with another MSI-enabled card (the network card, I
>> think), though less often: I think twice in the last 4 months.
>>
>> Neither my cpuinfo nor dmesg shows any microcode information; perhaps
>> Xen hides that info?
> 'cat /proc/cpuinfo' will show the microcode version.
> Best regards,
> Yang
>

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-17 21:04                                               ` Thimo E.
  2013-09-18  1:18                                                 ` Zhang, Xiantao
@ 2013-09-18 12:06                                                 ` Andrew Cooper
  1 sibling, 0 replies; 63+ messages in thread
From: Andrew Cooper @ 2013-09-18 12:06 UTC (permalink / raw)
  To: Thimo E.
  Cc: Keir Fraser, Jan Beulich, Dong, Eddie, Xen-develList, Nakajima,
	Jun, Zhang, Yang Z, Zhang, Xiantao

On 17/09/2013 22:04, Thimo E. wrote:
> Hello,
>
> Unfortunately, the XenServer kernel does not seem to support reading the
> microcode version; at least, it is not populated in /proc/cpuinfo.
> Andrew, are there any special tricks to get the version out of the
> XenServer kernel?
>
> Best regards
>   Thimo

Sadly not.  In Xen 4.1, the microcode detail printing was
unconditionally compiled out in all cases.  This behaviour changed in
4.2 (or possibly 4.3).  Your best chance is probably to boot a recent
Linux LiveCD.

~Andrew

* Re: cpuidle and un-eoid interrupts at the local apic
  2013-09-18  1:18                                                 ` Zhang, Xiantao
@ 2013-09-18 17:24                                                   ` Thimo E.
  0 siblings, 0 replies; 63+ messages in thread
From: Thimo E. @ 2013-09-18 17:24 UTC (permalink / raw)
  To: Zhang, Xiantao
  Cc: Keir Fraser, Nakajima, Jun, Andrew Cooper, Dong, Eddie,
	Xen-develList, Jan Beulich, Zhang, Yang Z

Hello,

I've looked it up in the BIOS: the microcode version is 9.

Best regards
   Thimo

Am 18.09.2013 03:18, schrieb Zhang, Xiantao:
> You could probably boot the system with a non-Xen kernel; you can then get the microcode version from cpuinfo.  Thanks!
> Xiantao
>
> -----Original Message-----
> From: Thimo E. [mailto:abc@digithi.de]
> Sent: Wednesday, September 18, 2013 5:05 AM
> To: Zhang, Yang Z
> Cc: Keir Fraser; Jan Beulich; Andrew Cooper; Dong, Eddie; Xen-develList; Nakajima, Jun; Zhang, Xiantao
> Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic
>
> Hello,
>
> Unfortunately, the XenServer kernel does not seem to support reading the microcode version; at least, it is not populated in /proc/cpuinfo.
> Andrew, are there any special tricks to get the version out of the XenServer kernel?
>
> Best regards
>     Thimo
>
>
> Am 17.09.2013 09:43, schrieb Zhang, Yang Z:
>> Thimo E. wrote on 2013-09-17:
>>> Hello Yang,
>>>
>>> The problem that we are discussing here seems to come from the
>>> interaction of my RAID controller (Areca ARC-1212) with the hardware.
>>> When I enable MSI on that controller, the server crashes with the
>>> known error messages within 1 to 7 days. When I disable MSI, I
>>> don't see these crashes anymore.
>>>
>>> Andrew doesn't have this controller, but he has also observed this
>>> type of crash with another MSI-enabled card (the network card, I
>>> think), though less often: I think twice in the last 4 months.
>>>
>>> Neither my cpuinfo nor dmesg shows any microcode information; perhaps
>>> Xen hides that info?
>> 'cat /proc/cpuinfo' will show the microcode version.
>> Best regards,
>> Yang
>>

Thread overview: 63+ messages
2013-05-31 20:32 cpuidle and un-eoid interrupts at the local apic Andrew Cooper
2013-06-03 14:30 ` Jan Beulich
2013-07-31  8:30 ` Thimo E.
2013-07-31  9:47   ` Andrew Cooper
2013-08-02 22:50     ` Thimo E.
2013-08-02 23:32       ` Andrew Cooper
2013-08-05 12:45         ` Jan Beulich
2013-08-05 14:51           ` Andrew Cooper
2013-08-09 21:27             ` Thimo E.
2013-08-09 21:40               ` Andrew Cooper
2013-08-09 21:44                 ` Andrew Cooper
2013-08-11 17:46                   ` Thimo E.
2013-08-12  6:02                     ` Zhang, Yang Z
2013-08-12  8:49                     ` Zhang, Yang Z
2013-08-12  8:57                       ` Jan Beulich
2013-08-12 11:52                       ` Thimo E
2013-08-12 12:04                         ` Andrew Cooper
2013-08-19 15:14                           ` Thimo E.
2013-08-20  5:43                             ` Thimo Eichstädt
2013-08-20  8:40                               ` Jan Beulich
2013-08-20  8:50                                 ` Zhang, Yang Z
2013-08-23  7:22                                   ` Thimo Eichstädt
2013-08-23  7:30                                     ` Zhang, Yang Z
2013-08-27  1:03                                     ` Zhang, Yang Z
2013-09-04 18:32                                       ` Thimo E.
2013-09-04 18:55                                         ` Andrew Cooper
2013-09-04 19:56                                           ` Thimo E.
2013-09-04 20:54                                             ` Andrew Cooper
2013-09-05  1:45                                               ` Zhang, Yang Z
2013-09-05  7:20                                                 ` Thimo E.
2013-09-05  1:15                                         ` Zhang, Yang Z
2013-09-17  2:09                                         ` Zhang, Yang Z
2013-09-17  7:39                                           ` Thimo E.
2013-09-17  7:43                                             ` Zhang, Yang Z
2013-09-17 21:04                                               ` Thimo E.
2013-09-18  1:18                                                 ` Zhang, Xiantao
2013-09-18 17:24                                                   ` Thimo E.
2013-09-18 12:06                                                 ` Andrew Cooper
2013-08-12 13:54                       ` Thimo E
2013-08-12 14:06                         ` Andrew Cooper
2013-08-13  1:43                           ` Zhang, Yang Z
2013-08-13  6:39                             ` Thimo E.
2013-08-13 11:39                         ` Wu, Feng
2013-08-13 12:46                           ` Andrew Cooper
2013-08-12  9:10                     ` Andrew Cooper
2013-08-12  5:50                 ` Zhang, Yang Z
2013-08-12  8:20               ` Jan Beulich
2013-08-12  9:28                 ` Andrew Cooper
2013-08-12 10:05                   ` Jan Beulich
2013-08-12 10:27                     ` Andrew Cooper
2013-08-14  2:53                       ` Zhang, Yang Z
2013-08-14  7:51                         ` Thimo E.
2013-08-14  9:52                         ` Andrew Cooper
2013-09-07 13:27                           ` Thimo E.
2013-09-07 17:02                             ` Andrew Cooper
2013-09-07 23:37                               ` Thimo E.
2013-09-08  9:53                                 ` Andrew Cooper
2013-09-08 10:24                                   ` Thimo E.
2013-09-09 13:16                                     ` Andrew Cooper
2013-09-09 14:48                                       ` Thimo Eichstädt
2013-09-09 15:12                                         ` Andrew Cooper
2013-09-09  7:59                               ` Jan Beulich
2013-09-09 12:53                                 ` Andrew Cooper
