From: Andrew Cooper <andrew.cooper3@citrix.com>
To: "Thimo E." <abc@digithi.de>
Cc: Keir Fraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
	Xen-devel List <xen-devel@lists.xen.org>
Subject: Re: cpuidle and un-eoid interrupts at the local apic
Date: Wed, 31 Jul 2013 10:47:44 +0100	[thread overview]
Message-ID: <51F8DD40.2090207@citrix.com> (raw)
In-Reply-To: <51F8CB15.1070608@digithi.de>

[-- Attachment #1: Type: text/plain, Size: 9270 bytes --]

On 31/07/13 09:30, Thimo E. wrote:
> Hello all,
>
> I also have a Haswell system. I am running XenServer 6.2 (with Xen
> 4.1.5) on it and I am experiencing the same issue. Do you already have
> a solution for this problem?
>
> Best regards
>   Thimo

Hi,

We are still none the wiser on this issue.  I have a debugging patch to
get more information, but the problem hasn't recurred since.  I have now
seen two crashes on Xen 4.1 and one on Xen 4.2.

For the benefit of anyone else who runs into this issue in the meantime,
the patch (against Xen 4.3) is attached.

Thimo: I shall put a new version of the XenServer 6.2 Xen with the
debugging patch on the forum thread.

~Andrew

>
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
> (XEN) ----[ Xen-4.1.5.debug  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    1
> (XEN) RIP:    e008:[<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
> (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
> (XEN) rax: 0000000000000001   rbx: ffff83081f080f00   rcx: ffff83081f05b340
> (XEN) rdx: 0000000000000001   rsi: 000000000000002b   rdi: 0000000000000001
> (XEN) rbp: ffff83081f057d88   rsp: ffff83081f057d18   r8:  ffff83081f05b63c
> (XEN) r9:  000070044fb97100   r10: ffff8300b858c060   r11: 000020f3f5a4dea5
> (XEN) r12: 000000000000002b   r13: ffff83081f004e80   r14: 000000000000001d
> (XEN) r15: 0000000000000002   cr0: 000000008005003b   cr4: 00000000001026f0
> (XEN) cr3: 000000045915f000   cr2: 0000000000150008
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83081f057d18:
> (XEN)    000000000000001d 000000000000001d ffff83081f080f00 0000000000000000
> (XEN)    00000000ffffffea ffff83081f080f00 0000000000000000 0000000000000000
> (XEN)    ffffffffffffffff ffff83081f057f18 ffff83081f06bb00 ffff83081f06bb90
> (XEN)    ffff8300b858c000 0000000000000002 00007cf7e0fa8247 ffff82c480161a66
> (XEN)    0000000000000002 ffff8300b858c000 ffff83081f06bb90 ffff83081f06bb00
> (XEN)    ffff83081f057ef0 ffff83081f057f18 000020f3f5a4dea5 ffff8300b858c060
> (XEN)    000070044fb97100 ffff83081f05bb80 0000000000007f40 0000000000000001
> (XEN)    0000000000000000 000020f3c755a972 ffff83081f06bb90 0000002b00000000
> (XEN)    ffff82c4801a21f0 000000000000e008 0000000000000246 ffff83081f057e48
> (XEN)    000000000000e010 ffff83081f057ef0 ffff82c4801a3dc4 000020f3f595c09c
> (XEN)    000020f3f596987e ffff8306383e3010 ffff83081f05b100 ffffffffffffffff
> (XEN)    0000000000000001 0000000000000001 ffffffffffffffff ffff83081f057f18
> (XEN)    00000000802d4680 0000000000000000 0000000000000000 ffff82c4802d4680
> (XEN)    000002a80000024b ffff8300b8586000 ffff83081f057f18 ffff8300b8586000
> (XEN)    ffff8300b858c000 ffff8300b858c000 0000000000000002 ffff83081f057f10
> (XEN)    ffff82c48015a261 ffff82c480126ccd 0000000000000001 ffff83081f057d18
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000246 ffff88001a8093a0
> (XEN)    0000000100885e0f 000000000000000f 0000000000000000 ffffffff802063aa
> (XEN)    0000000000000001 00000000deadbeef 00000000deadbeef 0000010000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
> (XEN)    [<ffff82c480161a66>] common_interrupt+0x26/0x30
> (XEN)    [<ffff82c4801a21f0>] lapic_timer_nop+0x0/0x6
> (XEN)    [<ffff82c48015a261>] idle_loop+0x48/0x59
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
> On 31.05.2013 22:32, Andrew Cooper wrote:
>> Recently our automated testing system has caught a curious assertion
>> while testing Xen 4.1.5 on a HaswellDT system.
>>
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
>> (XEN) ----[ Xen-4.1.5  x86_64  debug=n  Not tainted ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[<ffff82c48016b2b4>] do_IRQ+0x514/0x750
>> (XEN) RFLAGS: 0000000000010093   CONTEXT: hypervisor
>> (XEN) rax: 000000000000002f   rbx: ffff830249841e80   rcx: ffff82c4803127c0
>> (XEN) rdx: 0000000000000004   rsi: 0000000000000027   rdi: 0000000000000001
>> (XEN) rbp: 0000000000001e00   rsp: ffff82c4802bfd48   r8:  ffff82c480312abc
>> (XEN) r9:  ffff8302498a5948   r10: 0000000000000009   r11: ffff8302498c6c80
>> (XEN) r12: ffff830243b07f50   r13: ffff8300a24f8000   r14: 00000af8373788e3
>> (XEN) r15: ffff830249841e80   cr0: 000000008005003b   cr4: 00000000001026f0
>> (XEN) cr3: 00000002479e6000   cr2: 00000000e6d3c090
>> (XEN) ds: 007b   es: 007b   fs: 00d8   gs: 0000   ss: 0000   cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c4802bfd48:
>> (XEN)    ffff830249841eb4 ffff82c480312ec0 000000000000001e 0000001e00000000
>> (XEN)    0000000000000000 00000000498a5670 ffff830249841d80 ffff830249840080
>> (XEN)    ffff830249841db4 0000000000000000 ffff8302498a55e0 ffff8302498a5670
>> (XEN)    ffff8300a24f8000 00000af8373788e3 00000af83736b8ed ffff82c480162ca0
>> (XEN)    00000af83736b8ed 00000af8373788e3 ffff8300a24f8000 ffff8302498a5670
>> (XEN)    ffff8302498a55e0 0000000000000000 ffff8302498c6c80 0000000000000009
>> (XEN)    ffff8302498a5948 ffff82c480313000 0000000000007f40 0000000000000001
>> (XEN)    0000000000000000 0000000000000000 00000af80db652fd 0000002700000000
>> (XEN)    ffff82c4801a50a0 000000000000e008 0000000000000246 ffff82c4802bfe78
>> (XEN)    0000000000000000 ffff8302498a5670 ffff82c4801a6a56 ffffffffffffffff
>> (XEN)    ffff830249818000 0000000000000000 ffff8300a24f8000 ffff82c480122c11
>> (XEN)    00000af839021119 0000000000000000 0000000000000000 00000000802bff18
>> (XEN)    0000025c0000013b ffff82c4802e7580 ffff82c4802bff18 ffff8300a2838000
>> (XEN)    ffff82c4802f61a0 ffff8300a24f8000 0000000000000002 00000af837304b45
>> (XEN)    ffff82c48015b67a 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 00000000ee8a3f8c 0000000000000001
>> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 00000000ee8a3f74 0000000000000af8
>> (XEN)    0000000000000001 0000010000000000 00000000c01013a7 0000000000000061
>> (XEN)    0000000000000246 00000000ee8a3f70 0000000000000069 0000000000000000
>> (XEN) Xen call trace:
>> (XEN)       [<ffff82c48016b2b4>] do_IRQ+0x514/0x750
>> (XEN)     15[<ffff82c480162ca0>] common_interrupt+0x20/0x30
>> (XEN)     32[<ffff82c4801a50a0>] lapic_timer_nop+0x0/0x10
>> (XEN)     38[<ffff82c4801a6a56>] acpi_processor_idle+0x376/0x740
>> (XEN)     43[<ffff82c480122c11>] do_block+0x71/0xd0
>> (XEN)     56[<ffff82c48015b67a>] idle_loop+0x1a/0x50
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
>> (XEN) ****************************************
>>
>> And the disassembly before the assertion:
>>
>> ffff82c48016b29f:       48 8d 14 85 00 00 00    lea    0x0(,%rax,4),%rdx
>> ffff82c48016b2a6:       00
>> ffff82c48016b2a7:       0f b6 44 11 ff          movzbl -0x1(%rcx,%rdx,1),%eax
>> ffff82c48016b2ac:       39 c6                   cmp    %eax,%esi
>> ffff82c48016b2ae:       0f 8f 5c ff ff ff       jg     ffff82c48016b210 <do_IRQ+0x470>
>> ffff82c48016b2b4:       0f 0b                   ud2
>>
>>
>> Xen has been woken up by an interrupt of vector 0x27, but has a vector
>> 0x2f on the top of the pending EOI stack for the local APIC.
>>
>> I have put in more debugging to dump the LAPIC state of the two
>> interesting vectors and the IOAPIC state, but I have no idea if/when the
>> problem might reoccur.
>>
>> My understanding of LAPIC priority leads me to think that Xen really
>> shouldn't be woken up by a lower priority vector if a higher priority
>> one is still un-eoi'd.  There is not yet sufficient information to tell
>> whether this is truly the case, or whether Xen has simply gotten confused
>> about which vectors it has eoi'd.
>>
>> Having said that, we do keep line level interrupts un-eoi'd for extended
>> periods while guests service the interrupt.  Given that vectors are
>> chosen at random, we could get into a situation where a line interrupt
>> has vector 0xdf and stays pending for 150ms (which I measured as a
>> not-uncommon mean time until EOI for line level interrupts).  Since the
>> LAPIC will not deliver a vector whose priority class is not above that
>> of the highest in-service vector, this would starve any other guest
>> interrupts for an extended period.
>>
>> Given directed EOI support in the past few generations of processors,
>> the requirement for the pending EOI stack has, as far as I am aware,
>> disappeared.  Would it be a sensible idea in general to use the pending
>> EOI stack only when directed EOI is unavailable or not in use?
>>
>> ~Andrew
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>
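Both assertions above enforce the invariant behind Xen's pending EOI stack: vectors still awaiting a LAPIC EOI are pushed in strictly increasing order, mirroring the way the LAPIC only lets an interrupt of a higher priority class (vector >> 4) preempt one that is in service. The C sketch below models this under stated assumptions: the struct pending_eoi layout (ready:1, irq:23, vector:8, with the stack pointer kept in the last slot's vector field) is recalled from Xen 4.x's irq.c rather than copied from the tree, and NR_SLOTS, push_pending() and can_preempt() are hypothetical names. With that layout under GCC on x86-64, each entry is 4 bytes and the vector is its top byte, which is consistent with the lea 0x0(,%rax,4) / movzbl -0x1(%rcx,%rdx,1) pair in the disassembly above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED layout, modelled on Xen 4.x xen/arch/x86/irq.c (not copied). */
struct pending_eoi {
    uint32_t ready:1;   /* guest has EOI'd; a LAPIC EOI may now be issued */
    uint32_t irq:23;    /* Xen irq number */
    uint32_t vector:8;  /* vector awaiting a LAPIC EOI */
};

/* Hypothetical size; Xen sizes the stack with NR_DYNAMIC_VECTORS. */
#define NR_SLOTS 8

/* As assumed for Xen, the stack pointer lives in the last slot's vector. */
#define stack_sp(p) ((p)[NR_SLOTS - 1].vector)

/* LAPIC priority class: the upper nibble of the vector. */
static unsigned int prio_class(unsigned int vector)
{
    return (vector >> 4) & 0xf;
}

/*
 * Would `incoming` be delivered while `highest_isr` is in service?
 * Pass -1 when nothing is in service.  The TPR is assumed to be 0.
 * Hardware compares priority classes; Xen's ASSERT compares raw
 * vectors, which is a weaker condition implied by this one.
 */
static int can_preempt(unsigned int incoming, int highest_isr)
{
    if (highest_isr < 0)
        return 1;
    return prio_class(incoming) > prio_class((unsigned int)highest_isr);
}

/*
 * Push an un-EOI'd vector.  Returns 0 on success, or -1 in the cases
 * where Xen would instead trip one of its ASSERTs.
 */
static int push_pending(struct pending_eoi *p, unsigned int irq,
                        unsigned int vector)
{
    unsigned int sp = stack_sp(p);

    /* Mirrors ASSERT((sp == 0) || (peoi[sp-1].vector < vector)). */
    if (sp != 0 && !(p[sp - 1].vector < vector))
        return -1;
    /* Mirrors ASSERT(sp < (NR_DYNAMIC_VECTORS-1)): last slot holds sp. */
    if (sp >= NR_SLOTS - 1)
        return -1;

    p[sp].irq = irq;
    p[sp].vector = vector;
    p[sp].ready = 0;
    stack_sp(p) = sp + 1;
    return 0;
}
```

Replaying the 31 May crash through this model, pushing 0x2f and then 0x27 makes push_pending() fail, and can_preempt(0x27, 0x2f) is false because both vectors are in priority class 2. That is why a wakeup on 0x27 with 0x2f still un-EOI'd points either at unexpected LAPIC behaviour or at Xen's bookkeeping having diverged from the real ISR.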


[-- Attachment #2: ca-107844-debug.patch --]
[-- Type: text/x-patch, Size: 2919 bytes --]

# HG changeset patch
# Parent 3bc8894f281f3ee68406a565beb2f811c67c6b5e

diff -r 3bc8894f281f xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -1100,7 +1100,7 @@ static inline void UNEXPECTED_IO_APIC(vo
 {
 }
 
-static void /*__init*/ __print_IO_APIC(void)
+void /*__init*/ __print_IO_APIC(void)
 {
     int apic, i;
     union IO_APIC_reg_00 reg_00;
diff -r 3bc8894f281f xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -1115,6 +1115,8 @@ static void irq_guest_eoi_timer_fn(void 
     spin_unlock_irqrestore(&desc->lock, flags);
 }
 
+static void dump_irqs(unsigned char key);
+void __print_IO_APIC(void);
 static void __do_IRQ_guest(int irq)
 {
     struct irq_desc         *desc = irq_to_desc(irq);
@@ -1137,7 +1139,36 @@ static void __do_IRQ_guest(int irq)
     if ( action->ack_type == ACKTYPE_EOI )
     {
         sp = pending_eoi_sp(peoi);
-        ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
+        if ( unlikely( !((sp == 0) || (peoi[sp-1].vector < vector)) ))
+        {
+            printk("**Pending EOI error\n");
+            printk("  irq %d, vector 0x%x\n", irq, vector);
+
+            for ( i = sp-1; i >= 0; --i )
+            {
+                printk("  s[%d] irq %d, vec 0x%x, ready %u, "
+                       "ISR %u, TMR %u, IRR %u\n",
+                       i, peoi[i].irq, peoi[i].vector, peoi[i].ready,
+                       apic_isr_read(peoi[i].vector),
+                       apic_tmr_read(peoi[i].vector),
+                       apic_irr_read(peoi[i].vector) );
+            }
+
+            printk("All LAPIC state:\n");
+            printk("[vector] %8s %8s %8s\n", "ISR", "TMR", "IRR");
+            for ( i = 0; i < APIC_ISR_NR; ++i )
+                printk("[%02x:%02x] %08"PRIx32" %08"PRIx32" %08"PRIx32"\n",
+                       (i * 32)+31, i*32,
+                       apic_read(APIC_ISR + i*0x10),
+                       apic_read(APIC_TMR + i*0x10),
+                       apic_read(APIC_IRR + i*0x10) );
+
+            spin_unlock(&desc->lock);
+            dump_irqs('i');
+            __print_IO_APIC();
+
+            panic("CA-107844");
+        }
         ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
         peoi[sp].irq = irq;
         peoi[sp].vector = vector;
diff -r 3bc8894f281f xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -152,6 +152,18 @@ static __inline bool_t apic_isr_read(u8 
             (vector & 0x1f)) & 1;
 }
 
+static __inline bool_t apic_tmr_read(u8 vector)
+{
+    return (apic_read(APIC_TMR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
+static __inline bool_t apic_irr_read(u8 vector)
+{
+    return (apic_read(APIC_IRR + ((vector & ~0x1f) >> 1)) >>
+            (vector & 0x1f)) & 1;
+}
+
 static __inline u32 get_apic_id(void) /* Get the physical APIC id */
 {
     u32 id = apic_read(APIC_ID);
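
The new apic_tmr_read() and apic_irr_read() helpers, like the existing apic_isr_read(), rely on each 32-vector chunk of the ISR, TMR and IRR bitmaps living in its own 32-bit register, with registers 0x10 bytes apart. The register for vector v is therefore at base + (v / 32) * 0x10, which equals base + ((v & ~0x1f) >> 1). The sketch below checks that arithmetic against a simulated register page; fake_apic, vec_reg(), vec_bit(), set_vec_bit() and dump_lapic_bitmaps() are hypothetical stand-ins for real MMIO access, and the offsets are the xAPIC ones from the Intel SDM (ISR at 0x100, TMR at 0x180, IRR at 0x200). Its dump prints each bitmap word in hex, which is presumably what the patch's "All LAPIC state" loop intends.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* xAPIC register offsets (Intel SDM); each bitmap spans 8 registers. */
#define APIC_ISR    0x100
#define APIC_TMR    0x180
#define APIC_IRR    0x200
#define APIC_ISR_NR 8

/* Hypothetical simulated register page standing in for the LAPIC. */
static uint32_t fake_apic[0x400 / 4];

static uint32_t fake_apic_read(unsigned int reg)
{
    return fake_apic[reg / 4];
}

static void fake_apic_write(unsigned int reg, uint32_t val)
{
    fake_apic[reg / 4] = val;
}

/* Register holding `vector`: base + (vector / 32) * 0x10. */
static unsigned int vec_reg(unsigned int base, uint8_t vector)
{
    return base + ((vector & ~0x1fu) >> 1);
}

/* Same shape as apic_isr_read() and the patch's new helpers. */
static int vec_bit(unsigned int base, uint8_t vector)
{
    return (fake_apic_read(vec_reg(base, vector)) >> (vector & 0x1f)) & 1;
}

static void set_vec_bit(unsigned int base, uint8_t vector)
{
    unsigned int reg = vec_reg(base, vector);

    fake_apic_write(reg, fake_apic_read(reg) |
                         (UINT32_C(1) << (vector & 0x1f)));
}

/* One row per 32-vector chunk, bitmaps printed as hex words. */
static void dump_lapic_bitmaps(void)
{
    printf("[vector] %8s %8s %8s\n", "ISR", "TMR", "IRR");
    for (unsigned int i = 0; i < APIC_ISR_NR; ++i)
        printf("[%02x:%02x] %08" PRIx32 " %08" PRIx32 " %08" PRIx32 "\n",
               i * 32 + 31, i * 32,
               fake_apic_read(APIC_ISR + i * 0x10),
               fake_apic_read(APIC_TMR + i * 0x10),
               fake_apic_read(APIC_IRR + i * 0x10));
}
```

For example, setting vector 0x2f in the ISR sets bit 15 of the register at APIC_ISR + 0x10, which the dump shows on its [3f:20] row.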

