From: Andrew Cooper
Subject: Re: cpuidle and un-eoid interrupts at the local apic
Date: Mon, 5 Aug 2013 15:51:52 +0100
Message-ID: <51FFBC08.6070804@citrix.com>
References: <51A908CA.7050604@citrix.com> <51F8CB15.1070608@digithi.de> <51F8DD40.2090207@citrix.com> <51FC37A9.9090809@digithi.de> <51FC418D.8020708@citrix.com> <51FFBA8502000078000E9462@nat28.tlf.novell.com>
In-Reply-To: <51FFBA8502000078000E9462@nat28.tlf.novell.com>
To: Jan Beulich
Cc: "Thimo E.", Keir Fraser, Xen-develList
List-Id: xen-devel@lists.xenproject.org

On 05/08/13 13:45, Jan Beulich wrote:
>>>> On 03.08.13 at 01:32, Andrew Cooper wrote:
>> Adjusted from my "interesting" idea of printk formatting,
>>
>> (XEN) **Pending EOI error
>> (XEN) irq 29, vector 0x2e
>> (XEN) s[0] irq 29, vec 0x2e, ready 0, ISR 1, TMR 0, IRR 0
>> (XEN) All LAPIC state:
>> (XEN) [vector] ISR TMR IRR
>> (XEN) [1f:01] 00000000 00000000 00000000
>> (XEN) [3f:20] 00016384 4095716568 00000000
>> (XEN) [5f:40] 00000000 4041382474 00000000
>> (XEN) [7f:60] 00000000 3967325758 00000000
>> (XEN) [9f:80] 00000000 2123395250 00000000
>> (XEN) [bf:a0] 00000000 1502837374 00000000
>> (XEN) [df:c0] 00000000 4270415335 00000000
>> (XEN) [ff:e0] 00000000 00000000 00000000
>>
>> So Xen has been interrupted by an interrupt which it believes it has
>> already seen, and is outstanding on the PendingEOI stack, waiting for
>> Dom0 to actually deal with.
> And which hence should be masked. Is this perhaps a non-maskable
> MSI, and the device (erroneously?) issues a new interrupts before
> the old one was really finished with?
>
> Jan
>

All of these crashes are coming out of mwait_idle, so the CPU in question has literally just been in a lower power state.

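The quoted dump splits each 256-bit LAPIC register (ISR, TMR, IRR) into eight 32-bit rows, one per 32 vectors. The values appear to be printed as zero-padded decimal rather than hex; that is an assumption, inferred from the ISR word 00016384 being exactly 1 << 14, which is vector 0x20 + 14 = 0x2e and agrees with "vec 0x2e, ISR 1". A minimal sketch for decoding one row under that assumption follows; `decode_lapic_word` and `print_isr_row` are hypothetical helpers, not part of Xen.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Collect the vectors whose bits are set in one 32-bit LAPIC register word.
 * Each dump row, e.g. [3f:20], covers the 32 vectors starting at base_vector.
 * Returns the number of set bits; at most max_out vectors are stored. */
static int decode_lapic_word(uint32_t word, unsigned int base_vector,
                             unsigned int *out, int max_out)
{
    assert(base_vector % 32 == 0 && base_vector <= 0xe0);
    int n = 0;
    for (unsigned int bit = 0; bit < 32; bit++) {
        if ((word >> bit) & 1u) {
            if (n < max_out)
                out[n] = base_vector + bit;
            n++;
        }
    }
    return n;
}

/* Print the in-service vectors for one ISR row of the dump. */
static void print_isr_row(uint32_t word, unsigned int base_vector)
{
    unsigned int vecs[32];
    int n = decode_lapic_word(word, base_vector, vecs, 32);
    for (int i = 0; i < n; i++)
        printf("vector 0x%02x in service\n", vecs[i]);
}
```

For example, `print_isr_row(16384u, 0x20);` prints `vector 0x2e in service`, i.e. the only in-service vector is the one the Pending EOI check complained about.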
I am wondering whether there is some caching issue where an update to the Pending EOI stack pointer got "lost", but this seems a little too specific to be reasonably explained as a caching issue.

A new debugging patch is on its way (sorry, it has been a very busy few days).

~Andrew
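To make the "lost stack pointer update" hypothesis concrete, here is a minimal, hypothetical model of a per-CPU pending-EOI stack. It is not Xen's actual implementation: the names, `STACK_DEPTH`, and the `lose_sp_update` switch are invented for illustration, and only the irq/vector/ready fields mirror what the debug output prints. It shows why a single missing write to the stack pointer would be enough to produce exactly this symptom: the stale entry stays on the stack, and the next arrival of the same vector is reported as already pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_DEPTH 16  /* hypothetical depth, chosen for illustration */

/* Hypothetical, simplified pending-EOI entry. */
struct pending_entry {
    unsigned int irq;
    uint8_t vector;
    bool ready;         /* would be set once the guest has finished with it */
};

struct pending_stack {
    struct pending_entry s[STACK_DEPTH];
    unsigned int sp;    /* number of valid entries */
};

/* Push a newly arrived interrupt. Returns false (the "Pending EOI error"
 * case) if the vector is already on the stack. */
static bool pending_push(struct pending_stack *st, unsigned int irq,
                         uint8_t vector)
{
    for (unsigned int i = 0; i < st->sp; i++)
        if (st->s[i].vector == vector)
            return false;
    assert(st->sp < STACK_DEPTH);
    st->s[st->sp++] = (struct pending_entry){
        .irq = irq, .vector = vector, .ready = false };
    return true;
}

/* Pop the top entry. When lose_sp_update is true, the write of the new
 * stack pointer is skipped, simulating an update that went missing. */
static void pending_pop(struct pending_stack *st, bool lose_sp_update)
{
    assert(st->sp > 0);
    unsigned int new_sp = st->sp - 1;
    if (!lose_sp_update)
        st->sp = new_sp;
}
```

In this model, a push of irq 29 / vector 0x2e followed by a normal pop leaves the stack empty, whereas a pop with the lost update leaves the entry behind, so the next push of vector 0x2e fails.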