From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Guthro
Subject: S3 crash with VTD Queue Invalidation enabled
Date: Mon, 3 Jun 2013 14:29:54 -0400
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: xen-devel
List-Id: xen-devel@lists.xenproject.org

I am seeing a crash on some vPro systems in the S3 path - specifically a
Lenovo ThinkPad x220t (Sandybridge)

Once I managed to not suspend the console, I got a panic in
queue_invalidate_wait()
(I added a dump_execution_state() here, to get some more info)

(XEN) Entering ACPI S3 state.
(XEN) ----[ Xen-4.2.2 x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[] invalidate_sync+0x258/0x291
(XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830137a665c0 rcx: 0000000000000000
(XEN) rdx: ffff82c48030a0a0 rsi: 000000000000000a rdi: ffff82c4802766e0
(XEN) rbp: ffff82c4802bfd30 rsp: ffff82c4802bfce0 r8: 0000000000000004
(XEN) r9: 0000000000000002 r10: 0000000000000020 r11: 0000000000000010
(XEN) r12: 0000000bf34a77bc r13: 0000000000000000 r14: ffff830137a665f8
(XEN) r15: 0000000137a5c002 cr0: 000000008005003b cr4: 00000000000426f0
(XEN) cr3: 00000000ba2cd000 cr2: ffff880024181ff0
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802bfce0:
(XEN) 0000000000000002 0000000000000002 0101010000000002 0000000000000082
(XEN) 00000001802bfd30 ffff830137a665c0 0000000000000000 0000000000000000
(XEN) 0000000000000000 1000000000000000 ffff82c4802bfd90 ffff82c48014919d
(XEN) ffff82c400000000 0000000000000000 ffff82c4802bfd60 0000000000000000
(XEN) ffff82c4802bfd90 ffff830137a665c0 ffff830137a66540 0000000000000000
(XEN) ffff830137a66670 ffff82c4802679e0 ffff82c4802bfde0 ffff82c480145a60
(XEN) 0000000000000000 ffff82c4802bfdc0 ffff82c480125d36 ffff82c3ffd7a00c
(XEN) 0000000000000000 0000000000000003 0000000000000003 ffff82c48030a100
(XEN) ffff82c4802bfe20 ffff82c480145b08 ffff830137a4e620 ffff82c3ffd7a00c
(XEN) 0000000000000000 0000000000000003 0000000000000003 ffff82c48030a100
(XEN) ffff82c4802bfe30 ffff82c480141e12 ffff82c4802bfe80 ffff82c48019f315
(XEN) ffff82c4802bfe60 0000000000000282 0000000000000003 ffff83010cc0a010
(XEN) ffff8300ba0fd000 0000000000000000 0000000000000003 ffff82c48030a100
(XEN) ffff82c4802bfea0 ffff82c480105ed4 ffff8300ba0fd188 ffff82c48030a170
(XEN) ffff82c4802bfec0 ffff82c480127a1e ffff82c480125b8a ffff82c48030a190
(XEN) ffff82c4802bfef0 ffff82c480127d89 ffff82c4802bff18 ffff82c4802bff18
(XEN) ffff82c4802bff18 00000000ffffffff ffff82c4802bff10 ffff82c48015a42f
(XEN) ffff8300ba59a000 ffff8300ba0fd000 ffff82c4802bfda8 0000000000001403
(XEN) 0000000000000003 0000000000003403 ffffffff81a6b278 ffff8800049f3d28
(XEN) 0000000000000000 0000000000000246 0000000000000404 0000000000000003
(XEN) Xen call trace:
(XEN) [] invalidate_sync+0x258/0x291
(XEN) [] flush_iotlb_qi+0xd3/0xef
(XEN) [] iommu_flush_all+0xb5/0xde
(XEN) [] vtd_suspend+0x23/0xf1
(XEN) [] iommu_suspend+0x3c/0x3e
(XEN) [] enter_state_helper+0x1a0/0x3cb
(XEN) [] continue_hypercall_tasklet_handler+0x51/0xbf
(XEN) [] do_tasklet_work+0x8d/0xc7
(XEN) [] do_tasklet+0x6b/0x9b
(XEN) [] idle_loop+0x67/0x6f
(XEN)
(XEN)
(XEN) DMAR_IQA_REG = 137a5c002
(XEN) DMAR_IQH_REG = 120
(XEN) DMAR_IQT_REG = 140
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) queue invalidate wait descriptor was not executed
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
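
For reference, the three DMAR_IQ* values in the dump can be decoded by hand. The sketch below is only an illustration and rests on assumptions not stated in the report: that the values are printed in hex, that the queue uses 128-bit (16-byte) descriptors, and that the field layout follows the VT-d specification (queue size in IQA bits 2:0, head/tail offsets in IQH/IQT bits 18:4). The helper names are made up for this example and are not Xen functions.

```python
# Decode the VT-d invalidation queue registers from the panic output.
# Assumptions: hex values, 16-byte descriptors, VT-d spec field layout.
# All helper names here are hypothetical, not taken from the Xen source.

PAGE_SIZE = 4096   # the queue is sized in 4 KiB pages
DESC_SIZE = 16     # bytes per 128-bit invalidation descriptor


def decode_iqa(iqa):
    """Return (queue base address, number of descriptor slots)."""
    base = iqa & ~(PAGE_SIZE - 1)   # bits 63:12 hold the base address
    qs = iqa & 0x7                  # bits 2:0: queue is 2**qs pages long
    entries = (PAGE_SIZE << qs) // DESC_SIZE
    return base, entries


def queue_index(reg):
    """Convert an IQH/IQT register value to a descriptor index."""
    return (reg >> 4) & 0x7FFF      # bits 18:4 hold the index


def pending(head, tail, entries):
    """Descriptors queued by software but not yet fetched by hardware."""
    return (tail - head) % entries  # modulo handles ring wrap-around


if __name__ == "__main__":
    base, entries = decode_iqa(0x137a5c002)
    head = queue_index(0x120)
    tail = queue_index(0x140)
    print(f"queue base  : {base:#x}")
    print(f"queue slots : {entries}")
    print(f"head index  : {head}")
    print(f"tail index  : {tail}")
    print(f"pending     : {pending(head, tail, entries)}")
```

Under those assumptions the head sits two slots behind the tail, so two descriptors were queued and never fetched. One plausible reading is that these are the IOTLB flush and the wait descriptor that follows it, but the dump alone does not confirm that.
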
This particular dump was taken with Xen 4.2.2 and Linux 3.8.8.

I have tested the following other combinations, with no difference in
behavior:

Xen-unstable git cs da3bca931fbcf0cbdfec971aca234e7ec0f41e16, with Linux
3.10-rc3 cs 58f8bbd2e39c3732c55698494338ee19a92c53a0
Xen-4.2.2 / linux-3.8.8
Xen-4.2.2 / linux-3.8.13
Xen-4.2.3-pre / linux-3.8.13

Booting with iommu=no-qinval or iommu=off works around the problem, but I
was wondering whether there is a more elegant solution, possibly detecting
and disabling this feature when it is not working properly.

Thanks in advance for any insight.

Ben