* S3 crash with VTD Queue Invalidation enabled
@ 2013-06-03 18:29 Ben Guthro
  2013-06-03 19:22 ` Andrew Cooper
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-03 18:29 UTC (permalink / raw)
  To: xen-devel

I am seeing a crash on some vPro systems in the S3 path -
specifically a Lenovo ThinkPad X220t (Sandy Bridge).

Once I managed to keep the console from being suspended, I got a panic
in queue_invalidate_wait().
(I added a dump_execution_state() call there to get some more info.)
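
For reference, the dump call went into the timeout path of the
status-write poll loop - roughly this shape (a sketch of the 4.2 code
in xen/drivers/passthrough/vtd/qinval.c, not a verbatim diff):

    /* queue_invalidate_wait(): poll the status write until the wait
     * descriptor completes, or give up after DMAR_OPERATION_TIMEOUT. */
    start_time = NOW();
    while ( poll_slot != QINVAL_STAT_DONE )
    {
        if ( NOW() > (start_time + DMAR_OPERATION_TIMEOUT) )
        {
            dump_execution_state();   /* <- the call I added */
            print_qi_regs(iommu);
            panic("queue invalidate wait descriptor was not executed\n");
        }
        cpu_relax();
    }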

(XEN) Entering ACPI S3 state.
(XEN) ----[ Xen-4.2.2  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c480149091>] invalidate_sync+0x258/0x291
(XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830137a665c0   rcx: 0000000000000000
(XEN) rdx: ffff82c48030a0a0   rsi: 000000000000000a   rdi: ffff82c4802766e0
(XEN) rbp: ffff82c4802bfd30   rsp: ffff82c4802bfce0   r8:  0000000000000004
(XEN) r9:  0000000000000002   r10: 0000000000000020   r11: 0000000000000010
(XEN) r12: 0000000bf34a77bc   r13: 0000000000000000   r14: ffff830137a665f8
(XEN) r15: 0000000137a5c002   cr0: 000000008005003b   cr4: 00000000000426f0
(XEN) cr3: 00000000ba2cd000   cr2: ffff880024181ff0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802bfce0:
(XEN)    0000000000000002 0000000000000002 0101010000000002 0000000000000082
(XEN)    00000001802bfd30 ffff830137a665c0 0000000000000000 0000000000000000
(XEN)    0000000000000000 1000000000000000 ffff82c4802bfd90 ffff82c48014919d
(XEN)    ffff82c400000000 0000000000000000 ffff82c4802bfd60 0000000000000000
(XEN)    ffff82c4802bfd90 ffff830137a665c0 ffff830137a66540 0000000000000000
(XEN)    ffff830137a66670 ffff82c4802679e0 ffff82c4802bfde0 ffff82c480145a60
(XEN)    0000000000000000 ffff82c4802bfdc0 ffff82c480125d36 ffff82c3ffd7a00c
(XEN)    0000000000000000 0000000000000003 0000000000000003 ffff82c48030a100
(XEN)    ffff82c4802bfe20 ffff82c480145b08 ffff830137a4e620 ffff82c3ffd7a00c
(XEN)    0000000000000000 0000000000000003 0000000000000003 ffff82c48030a100
(XEN)    ffff82c4802bfe30 ffff82c480141e12 ffff82c4802bfe80 ffff82c48019f315
(XEN)    ffff82c4802bfe60 0000000000000282 0000000000000003 ffff83010cc0a010
(XEN)    ffff8300ba0fd000 0000000000000000 0000000000000003 ffff82c48030a100
(XEN)    ffff82c4802bfea0 ffff82c480105ed4 ffff8300ba0fd188 ffff82c48030a170
(XEN)    ffff82c4802bfec0 ffff82c480127a1e ffff82c480125b8a ffff82c48030a190
(XEN)    ffff82c4802bfef0 ffff82c480127d89 ffff82c4802bff18 ffff82c4802bff18
(XEN)    ffff82c4802bff18 00000000ffffffff ffff82c4802bff10 ffff82c48015a42f
(XEN)    ffff8300ba59a000 ffff8300ba0fd000 ffff82c4802bfda8 0000000000001403
(XEN)    0000000000000003 0000000000003403 ffffffff81a6b278 ffff8800049f3d28
(XEN)    0000000000000000 0000000000000246 0000000000000404 0000000000000003
(XEN) Xen call trace:
(XEN)    [<ffff82c480149091>] invalidate_sync+0x258/0x291
(XEN)    [<ffff82c48014919d>] flush_iotlb_qi+0xd3/0xef
(XEN)    [<ffff82c480145a60>] iommu_flush_all+0xb5/0xde
(XEN)    [<ffff82c480145b08>] vtd_suspend+0x23/0xf1
(XEN)    [<ffff82c480141e12>] iommu_suspend+0x3c/0x3e
(XEN)    [<ffff82c48019f315>] enter_state_helper+0x1a0/0x3cb
(XEN)    [<ffff82c480105ed4>] continue_hypercall_tasklet_handler+0x51/0xbf
(XEN)    [<ffff82c480127a1e>] do_tasklet_work+0x8d/0xc7
(XEN)    [<ffff82c480127d89>] do_tasklet+0x6b/0x9b
(XEN)    [<ffff82c48015a42f>] idle_loop+0x67/0x6f
(XEN)
(XEN)
(XEN) DMAR_IQA_REG = 137a5c002
(XEN) DMAR_IQH_REG = 120
(XEN) DMAR_IQT_REG = 140
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) queue invalidate wait descriptor was not executed
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
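
(Decoding those IQ registers per the VT-d spec: invalidation
descriptors are 16 bytes, so IQH=0x120 is descriptor index 18 and
IQT=0x140 is index 20 - the hardware head is stuck two descriptors
short of the tail, presumably the IOTLB flush plus its wait
descriptor. A standalone sketch of the arithmetic, with the values
hard-coded from the dump above:)

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t iqa = 0x137a5c002, iqh = 0x120, iqt = 0x140;

        /* IQA: bits 63:12 queue base, bits 2:0 size (2^QS 4K pages) */
        printf("queue base = %#llx, size = %llu pages\n",
               (unsigned long long)(iqa & ~0xfffULL),
               1ULL << (iqa & 0x7));
        /* IQH/IQT hold byte offsets of the 16-byte descriptors */
        printf("head = %llu, tail = %llu\n",
               (unsigned long long)(iqh >> 4),
               (unsigned long long)(iqt >> 4));
        return 0;
    }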



This particular dump was with Xen 4.2.2 and Linux 3.8.8.
I have tested the following other combinations, with no difference in behavior:

Xen-unstable git cs da3bca931fbcf0cbdfec971aca234e7ec0f41e16, with
Linux 3.10-rc3 cs 58f8bbd2e39c3732c55698494338ee19a92c53a0

Xen-4.2.2 / linux-3.8.8
Xen-4.2.2 / linux-3.8.13
Xen-4.2.3-pre / linux-3.8.13

Booting with iommu=no-qinval or iommu=off works around the problem,
but I was wondering if there is a more elegant solution - possibly
detecting this feature and disabling it if it is not working properly?


Thanks in advance for any insight.

Ben


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-03 18:29 S3 crash with VTD Queue Invalidation enabled Ben Guthro
@ 2013-06-03 19:22 ` Andrew Cooper
  2013-06-04  8:54   ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Cooper @ 2013-06-03 19:22 UTC (permalink / raw)
  To: Ben Guthro; +Cc: xen-devel

On 03/06/13 19:29, Ben Guthro wrote:
> I am seeing a crash on some vPro systems in the S3 path -
> specifically a Lenovo ThinkPad X220t (Sandy Bridge).
>
> Once I managed to keep the console from being suspended, I got a panic
> in queue_invalidate_wait().
> (I added a dump_execution_state() call there to get some more info.)
>
> (XEN) Entering ACPI S3 state.
> (XEN) ----[ Xen-4.2.2  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c480149091>] invalidate_sync+0x258/0x291
> (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: ffff830137a665c0   rcx: 0000000000000000
> (XEN) rdx: ffff82c48030a0a0   rsi: 000000000000000a   rdi: ffff82c4802766e0
> (XEN) rbp: ffff82c4802bfd30   rsp: ffff82c4802bfce0   r8:  0000000000000004
> (XEN) r9:  0000000000000002   r10: 0000000000000020   r11: 0000000000000010
> (XEN) r12: 0000000bf34a77bc   r13: 0000000000000000   r14: ffff830137a665f8
> (XEN) r15: 0000000137a5c002   cr0: 000000008005003b   cr4: 00000000000426f0
> (XEN) cr3: 00000000ba2cd000   cr2: ffff880024181ff0
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802bfce0:
> (XEN)    0000000000000002 0000000000000002 0101010000000002 0000000000000082
> (XEN)    00000001802bfd30 ffff830137a665c0 0000000000000000 0000000000000000
> (XEN)    0000000000000000 1000000000000000 ffff82c4802bfd90 ffff82c48014919d
> (XEN)    ffff82c400000000 0000000000000000 ffff82c4802bfd60 0000000000000000
> (XEN)    ffff82c4802bfd90 ffff830137a665c0 ffff830137a66540 0000000000000000
> (XEN)    ffff830137a66670 ffff82c4802679e0 ffff82c4802bfde0 ffff82c480145a60
> (XEN)    0000000000000000 ffff82c4802bfdc0 ffff82c480125d36 ffff82c3ffd7a00c
> (XEN)    0000000000000000 0000000000000003 0000000000000003 ffff82c48030a100
> (XEN)    ffff82c4802bfe20 ffff82c480145b08 ffff830137a4e620 ffff82c3ffd7a00c
> (XEN)    0000000000000000 0000000000000003 0000000000000003 ffff82c48030a100
> (XEN)    ffff82c4802bfe30 ffff82c480141e12 ffff82c4802bfe80 ffff82c48019f315
> (XEN)    ffff82c4802bfe60 0000000000000282 0000000000000003 ffff83010cc0a010
> (XEN)    ffff8300ba0fd000 0000000000000000 0000000000000003 ffff82c48030a100
> (XEN)    ffff82c4802bfea0 ffff82c480105ed4 ffff8300ba0fd188 ffff82c48030a170
> (XEN)    ffff82c4802bfec0 ffff82c480127a1e ffff82c480125b8a ffff82c48030a190
> (XEN)    ffff82c4802bfef0 ffff82c480127d89 ffff82c4802bff18 ffff82c4802bff18
> (XEN)    ffff82c4802bff18 00000000ffffffff ffff82c4802bff10 ffff82c48015a42f
> (XEN)    ffff8300ba59a000 ffff8300ba0fd000 ffff82c4802bfda8 0000000000001403
> (XEN)    0000000000000003 0000000000003403 ffffffff81a6b278 ffff8800049f3d28
> (XEN)    0000000000000000 0000000000000246 0000000000000404 0000000000000003
> (XEN) Xen call trace:
> (XEN)    [<ffff82c480149091>] invalidate_sync+0x258/0x291
> (XEN)    [<ffff82c48014919d>] flush_iotlb_qi+0xd3/0xef
> (XEN)    [<ffff82c480145a60>] iommu_flush_all+0xb5/0xde
> (XEN)    [<ffff82c480145b08>] vtd_suspend+0x23/0xf1
> (XEN)    [<ffff82c480141e12>] iommu_suspend+0x3c/0x3e
> (XEN)    [<ffff82c48019f315>] enter_state_helper+0x1a0/0x3cb
> (XEN)    [<ffff82c480105ed4>] continue_hypercall_tasklet_handler+0x51/0xbf
> (XEN)    [<ffff82c480127a1e>] do_tasklet_work+0x8d/0xc7
> (XEN)    [<ffff82c480127d89>] do_tasklet+0x6b/0x9b
> (XEN)    [<ffff82c48015a42f>] idle_loop+0x67/0x6f
> (XEN)
> (XEN)
> (XEN) DMAR_IQA_REG = 137a5c002
> (XEN) DMAR_IQH_REG = 120
> (XEN) DMAR_IQT_REG = 140
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) queue invalidate wait descriptor was not executed
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
>
>
> This particular dump was with Xen 4.2.2 and Linux 3.8.8.
> I have tested the following other combinations, with no difference in behavior:
>
> Xen-unstable git cs da3bca931fbcf0cbdfec971aca234e7ec0f41e16, with
> Linux 3.10-rc3 cs 58f8bbd2e39c3732c55698494338ee19a92c53a0
>
> Xen-4.2.2 / linux-3.8.8
> Xen-4.2.2 / linux-3.8.13
> Xen-4.2.3-pre / linux-3.8.13
>
> Booting with iommu=no-qinval or iommu=off works around the problem,
> but I was wondering if there is a more elegant solution - possibly
> detecting this feature and disabling it if it is not working properly?
>
>
> Thanks in advance for any insight.
>
> Ben

This was likely broken by XSA-36

My fix for the crash path is:
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=53fd1d8458de01169dfb56feb315f02c2b521a86

You want to inspect the use of iommu_enabled and iommu_intremap.

~Andrew



* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-03 19:22 ` Andrew Cooper
@ 2013-06-04  8:54   ` Jan Beulich
  2013-06-04 12:25     ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-06-04  8:54 UTC (permalink / raw)
  To: Andrew Cooper, Ben Guthro; +Cc: xen-devel

>>> On 03.06.13 at 21:22, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 03/06/13 19:29, Ben Guthro wrote:
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82c480149091>] invalidate_sync+0x258/0x291
>> (XEN)    [<ffff82c48014919d>] flush_iotlb_qi+0xd3/0xef
>> (XEN)    [<ffff82c480145a60>] iommu_flush_all+0xb5/0xde
>> (XEN)    [<ffff82c480145b08>] vtd_suspend+0x23/0xf1
>> (XEN)    [<ffff82c480141e12>] iommu_suspend+0x3c/0x3e
>> (XEN)    [<ffff82c48019f315>] enter_state_helper+0x1a0/0x3cb
>> (XEN)    [<ffff82c480105ed4>] continue_hypercall_tasklet_handler+0x51/0xbf
>> (XEN)    [<ffff82c480127a1e>] do_tasklet_work+0x8d/0xc7
>> (XEN)    [<ffff82c480127d89>] do_tasklet+0x6b/0x9b
>> (XEN)    [<ffff82c48015a42f>] idle_loop+0x67/0x6f
> 
> This was likely broken by XSA-36
> 
> My fix for the crash path is:
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=53fd1d8458de01169dfb56feb315f02c2b521a86
> 
> You want to inspect the use of iommu_enabled and iommu_intremap.

According to the comment in vtd_suspend(),
iommu_disable_x2apic_IR() is supposed to run after
iommu_suspend() (and indeed lapic_suspend() gets called
immediately after iommu_suspend() by device_power_down()),
and hence that shouldn't be the reason here. But, Ben, to be
sure, dumping the state of the various IOMMU related enabling
variables would be a good idea.
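
A few printk()s of those flags, e.g. from a debug key handler or
right before the flush, would suffice - a sketch only, using the
variables declared at the top of drivers/passthrough/iommu.c:

    printk("iommu_enabled = %d\n", iommu_enabled);
    printk("iommu_qinval = %d\n", iommu_qinval);
    printk("iommu_intremap = %d\n", iommu_intremap);
    printk("iommu_snoop = %d\n", iommu_snoop);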

Is this perhaps having some similarity with
http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html?
We're clearly running single-CPU only here and there...

Jan


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-04  8:54   ` Jan Beulich
@ 2013-06-04 12:25     ` Ben Guthro
  2013-06-04 14:01       ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-04 12:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 03.06.13 at 21:22, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 03/06/13 19:29, Ben Guthro wrote:
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82c480149091>] invalidate_sync+0x258/0x291
>>> (XEN)    [<ffff82c48014919d>] flush_iotlb_qi+0xd3/0xef
>>> (XEN)    [<ffff82c480145a60>] iommu_flush_all+0xb5/0xde
>>> (XEN)    [<ffff82c480145b08>] vtd_suspend+0x23/0xf1
>>> (XEN)    [<ffff82c480141e12>] iommu_suspend+0x3c/0x3e
>>> (XEN)    [<ffff82c48019f315>] enter_state_helper+0x1a0/0x3cb
>>> (XEN)    [<ffff82c480105ed4>] continue_hypercall_tasklet_handler+0x51/0xbf
>>> (XEN)    [<ffff82c480127a1e>] do_tasklet_work+0x8d/0xc7
>>> (XEN)    [<ffff82c480127d89>] do_tasklet+0x6b/0x9b
>>> (XEN)    [<ffff82c48015a42f>] idle_loop+0x67/0x6f
>>
>> This was likely broken by XSA-36
>>
>> My fix for the crash path is:
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=53fd1d8458de01169dfb56feb315f02c2b521a86
>>
>> You want to inspect the use of iommu_enabled and iommu_intremap.
>
> According to the comment in vtd_suspend(),
> iommu_disable_x2apic_IR() is supposed to run after
> iommu_suspend() (and indeed lapic_suspend() gets called
> immediately after iommu_suspend() by device_power_down()),
> and hence that shouldn't be the reason here. But, Ben, to be
> sure, dumping the state of the various IOMMU related enabling
> variables would be a good idea.

I assume you are referring to the variables below, defined at the top of iommu.c
At the time of the crash, they look like this:

(XEN) iommu_enabled = 1
(XEN) force_iommu; = 0
(XEN) iommu_verbose; = 0
(XEN) iommu_workaround_bios_bug; = 0
(XEN) iommu_passthrough; = 0
(XEN) iommu_snoop = 0
(XEN) iommu_qinval = 1
(XEN) iommu_intremap = 1
(XEN) iommu_hap_pt_share = 0
(XEN) iommu_debug; = 0
(XEN) amd_iommu_perdev_intremap = 1

If that gives any additional insight, please let me know.
I'm not sure I gleaned anything particularly significant from it though.

Or - perhaps you are referring to other enabling variables?

>
> Is this perhaps having some similarity with
> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html?
> We're clearly running single-CPU only here and there...

We certainly should be, as we have gone through the
disable_nonboot_cpus() by this point - and I can verify that from the
logs.

Ben


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-04 12:25     ` Ben Guthro
@ 2013-06-04 14:01       ` Jan Beulich
  2013-06-04 19:20         ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-06-04 14:01 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, xen-devel

>>> On 04.06.13 at 14:25, Ben Guthro <ben@guthro.net> wrote:
> On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 03.06.13 at 21:22, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 03/06/13 19:29, Ben Guthro wrote:
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82c480149091>] invalidate_sync+0x258/0x291
>>>> (XEN)    [<ffff82c48014919d>] flush_iotlb_qi+0xd3/0xef
>>>> (XEN)    [<ffff82c480145a60>] iommu_flush_all+0xb5/0xde
>>>> (XEN)    [<ffff82c480145b08>] vtd_suspend+0x23/0xf1
>>>> (XEN)    [<ffff82c480141e12>] iommu_suspend+0x3c/0x3e
>>>> (XEN)    [<ffff82c48019f315>] enter_state_helper+0x1a0/0x3cb
>>>> (XEN)    [<ffff82c480105ed4>] continue_hypercall_tasklet_handler+0x51/0xbf
>>>> (XEN)    [<ffff82c480127a1e>] do_tasklet_work+0x8d/0xc7
>>>> (XEN)    [<ffff82c480127d89>] do_tasklet+0x6b/0x9b
>>>> (XEN)    [<ffff82c48015a42f>] idle_loop+0x67/0x6f
>>>
>>> This was likely broken by XSA-36
>>>
>>> My fix for the crash path is:
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=53fd1d8458de01169dfb56feb315f02c2b521a86
>>>
>>> You want to inspect the use of iommu_enabled and iommu_intremap.
>>
>> According to the comment in vtd_suspend(),
>> iommu_disable_x2apic_IR() is supposed to run after
>> iommu_suspend() (and indeed lapic_suspend() gets called
>> immediately after iommu_suspend() by device_power_down()),
>> and hence that shouldn't be the reason here. But, Ben, to be
>> sure, dumping the state of the various IOMMU related enabling
>> variables would be a good idea.
> 
> I assume you are referring to the variables below, defined at the top of 
> iommu.c
> At the time of the crash, they look like this:
> 
> (XEN) iommu_enabled = 1
> (XEN) force_iommu; = 0
> (XEN) iommu_verbose; = 0
> (XEN) iommu_workaround_bios_bug; = 0
> (XEN) iommu_passthrough; = 0
> (XEN) iommu_snoop = 0
> (XEN) iommu_qinval = 1
> (XEN) iommu_intremap = 1
> (XEN) iommu_hap_pt_share = 0
> (XEN) iommu_debug; = 0
> (XEN) amd_iommu_perdev_intremap = 1
> 
> If that gives any additional insight, please let me know.
> I'm not sure I gleaned anything particularly significant from it though.
> 
> Or - perhaps you are referring to other enabling variables?

These were exactly the ones (or really you picked a superset of
what I wanted to know the state of). To me this pretty clearly
means that Andrew's original thought here is not applicable, as
at this point we can't possibly have shut down qinval yet.

>> Is this perhaps having some similarity with
>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html? 
>> We're clearly running single-CPU only here and there...
> 
> We certainly should be, as we have gone through the
> disable_nonboot_cpus() by this point - and I can verify that from the
> logs.

I'm much more tending towards the connection here, noting that
Andrew's original thread didn't really lead anywhere (i.e. we still
don't know what the panic he saw was actually caused by).

Jan


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-04 14:01       ` Jan Beulich
@ 2013-06-04 19:20         ` Ben Guthro
  2013-06-04 19:49           ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-04 19:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Tue, Jun 4, 2013 at 10:01 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 04.06.13 at 14:25, Ben Guthro <ben@guthro.net> wrote:
>> On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 03.06.13 at 21:22, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> On 03/06/13 19:29, Ben Guthro wrote:
>>>>> (XEN) Xen call trace:
>>>>> (XEN)    [<ffff82c480149091>] invalidate_sync+0x258/0x291
>>>>> (XEN)    [<ffff82c48014919d>] flush_iotlb_qi+0xd3/0xef
>>>>> (XEN)    [<ffff82c480145a60>] iommu_flush_all+0xb5/0xde
>>>>> (XEN)    [<ffff82c480145b08>] vtd_suspend+0x23/0xf1
>>>>> (XEN)    [<ffff82c480141e12>] iommu_suspend+0x3c/0x3e
>>>>> (XEN)    [<ffff82c48019f315>] enter_state_helper+0x1a0/0x3cb
>>>>> (XEN)    [<ffff82c480105ed4>] continue_hypercall_tasklet_handler+0x51/0xbf
>>>>> (XEN)    [<ffff82c480127a1e>] do_tasklet_work+0x8d/0xc7
>>>>> (XEN)    [<ffff82c480127d89>] do_tasklet+0x6b/0x9b
>>>>> (XEN)    [<ffff82c48015a42f>] idle_loop+0x67/0x6f
>>>>
>>>> This was likely broken by XSA-36
>>>>
>>>> My fix for the crash path is:
>>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=53fd1d8458de01169dfb56feb315f02c2b521a86
>>>>
>>>> You want to inspect the use of iommu_enabled and iommu_intremap.
>>>
>>> According to the comment in vtd_suspend(),
>>> iommu_disable_x2apic_IR() is supposed to run after
>>> iommu_suspend() (and indeed lapic_suspend() gets called
>>> immediately after iommu_suspend() by device_power_down()),
>>> and hence that shouldn't be the reason here. But, Ben, to be
>>> sure, dumping the state of the various IOMMU related enabling
>>> variables would be a good idea.
>>
>> I assume you are referring to the variables below, defined at the top of
>> iommu.c
>> At the time of the crash, they look like this:
>>
>> (XEN) iommu_enabled = 1
>> (XEN) force_iommu; = 0
>> (XEN) iommu_verbose; = 0
>> (XEN) iommu_workaround_bios_bug; = 0
>> (XEN) iommu_passthrough; = 0
>> (XEN) iommu_snoop = 0
>> (XEN) iommu_qinval = 1
>> (XEN) iommu_intremap = 1
>> (XEN) iommu_hap_pt_share = 0
>> (XEN) iommu_debug; = 0
>> (XEN) amd_iommu_perdev_intremap = 1
>>
>> If that gives any additional insight, please let me know.
>> I'm not sure I gleaned anything particularly significant from it though.
>>
>> Or - perhaps you are referring to other enabling variables?
>
> These were exactly the ones (or really you picked a superset of
> what I wanted to know the state of). To me this pretty clearly
> means that Andrew's original thought here is not applicable, as
> at this point we can't possibly have shut down qinval yet.
>
>>> Is this perhaps having some similarity with
>>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html?
>>> We're clearly running single-CPU only here and there...
>>
>> We certainly should be, as we have gone through the
>> disable_nonboot_cpus() by this point - and I can verify that from the
>> logs.
>
> I'm much more tending towards the connection here, noting that
> Andrew's original thread didn't really lead anywhere (i.e. we still
> don't know what the panic he saw was actually caused by).
>

I'm starting to think you're on to something here.
I've put a bunch of trace throughout the functions in qinval.c

It seems that everything is functioning properly, up until we go
through the disable_nonboot_cpus() path.
Prior to this, I see the qinval.c functions being executed on all
CPUs, and on both DRHD units.
Afterward, it gets stuck in queue_invalidate_wait() on the first DRHD
unit... and eventually panics.

I'm not exactly sure what to make of this yet.


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-04 19:20         ` Ben Guthro
@ 2013-06-04 19:49           ` Ben Guthro
  2013-06-04 21:09             ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-04 19:49 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Tue, Jun 4, 2013 at 3:20 PM, Ben Guthro <ben@guthro.net> wrote:
> On Tue, Jun 4, 2013 at 10:01 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 04.06.13 at 14:25, Ben Guthro <ben@guthro.net> wrote:
>>> On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 03.06.13 at 21:22, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>> On 03/06/13 19:29, Ben Guthro wrote:
>>>>>> (XEN) Xen call trace:
>>>>>> (XEN)    [<ffff82c480149091>] invalidate_sync+0x258/0x291
>>>>>> (XEN)    [<ffff82c48014919d>] flush_iotlb_qi+0xd3/0xef
>>>>>> (XEN)    [<ffff82c480145a60>] iommu_flush_all+0xb5/0xde
>>>>>> (XEN)    [<ffff82c480145b08>] vtd_suspend+0x23/0xf1
>>>>>> (XEN)    [<ffff82c480141e12>] iommu_suspend+0x3c/0x3e
>>>>>> (XEN)    [<ffff82c48019f315>] enter_state_helper+0x1a0/0x3cb
>>>>>> (XEN)    [<ffff82c480105ed4>] continue_hypercall_tasklet_handler+0x51/0xbf
>>>>>> (XEN)    [<ffff82c480127a1e>] do_tasklet_work+0x8d/0xc7
>>>>>> (XEN)    [<ffff82c480127d89>] do_tasklet+0x6b/0x9b
>>>>>> (XEN)    [<ffff82c48015a42f>] idle_loop+0x67/0x6f
>>>>>
>>>>> This was likely broken by XSA-36
>>>>>
>>>>> My fix for the crash path is:
>>>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=53fd1d8458de01169dfb56feb315f02c2b521a86
>>>>>
>>>>> You want to inspect the use of iommu_enabled and iommu_intremap.
>>>>
>>>> According to the comment in vtd_suspend(),
>>>> iommu_disable_x2apic_IR() is supposed to run after
>>>> iommu_suspend() (and indeed lapic_suspend() gets called
>>>> immediately after iommu_suspend() by device_power_down()),
>>>> and hence that shouldn't be the reason here. But, Ben, to be
>>>> sure, dumping the state of the various IOMMU related enabling
>>>> variables would be a good idea.
>>>
>>> I assume you are referring to the variables below, defined at the top of
>>> iommu.c
>>> At the time of the crash, they look like this:
>>>
>>> (XEN) iommu_enabled = 1
>>> (XEN) force_iommu; = 0
>>> (XEN) iommu_verbose; = 0
>>> (XEN) iommu_workaround_bios_bug; = 0
>>> (XEN) iommu_passthrough; = 0
>>> (XEN) iommu_snoop = 0
>>> (XEN) iommu_qinval = 1
>>> (XEN) iommu_intremap = 1
>>> (XEN) iommu_hap_pt_share = 0
>>> (XEN) iommu_debug; = 0
>>> (XEN) amd_iommu_perdev_intremap = 1
>>>
>>> If that gives any additional insight, please let me know.
>>> I'm not sure I gleaned anything particularly significant from it though.
>>>
>>> Or - perhaps you are referring to other enabling variables?
>>
>> These were exactly the ones (or really you picked a superset of
>> what I wanted to know the state of). To me this pretty clearly
>> means that Andrew's original thought here is not applicable, as
>> at this point we can't possibly have shut down qinval yet.
>>
>>>> Is this perhaps having some similarity with
>>>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html?
>>>> We're clearly running single-CPU only here and there...
>>>
>>> We certainly should be, as we have gone through the
>>> disable_nonboot_cpus() by this point - and I can verify that from the
>>> logs.
>>
>> I'm much more tending towards the connection here, noting that
>> Andrew's original thread didn't really lead anywhere (i.e. we still
>> don't know what the panic he saw was actually caused by).
>>
>
> I'm starting to think you're on to something here.

hmm - maybe not.
I get the same crash with "maxcpus=1"



> I've put a bunch of trace throughout the functions in qinval.c
>
> It seems that everything is functioning properly, up until we go
> through the disable_nonboot_cpus() path.
> Prior to this, I see the qinval.c functions being executed on all
> cpus, and both drhd units
> Afterward, it gets stuck in queue_invalidate_wait on the first drhd
> unit.. and eventually panics.
>
> I'm not exactly sure what to make of this yet.


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-04 19:49           ` Ben Guthro
@ 2013-06-04 21:09             ` Ben Guthro
  2013-06-05  8:24               ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-04 21:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen-devel

On Tue, Jun 4, 2013 at 3:49 PM, Ben Guthro <ben@guthro.net> wrote:
> On Tue, Jun 4, 2013 at 3:20 PM, Ben Guthro <ben@guthro.net> wrote:
>> On Tue, Jun 4, 2013 at 10:01 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 04.06.13 at 14:25, Ben Guthro <ben@guthro.net> wrote:
>>>> On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 03.06.13 at 21:22, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>> On 03/06/13 19:29, Ben Guthro wrote:
>>>>>>> (XEN) Xen call trace:
>>>>>>> (XEN)    [<ffff82c480149091>] invalidate_sync+0x258/0x291
>>>>>>> (XEN)    [<ffff82c48014919d>] flush_iotlb_qi+0xd3/0xef
>>>>>>> (XEN)    [<ffff82c480145a60>] iommu_flush_all+0xb5/0xde
>>>>>>> (XEN)    [<ffff82c480145b08>] vtd_suspend+0x23/0xf1
>>>>>>> (XEN)    [<ffff82c480141e12>] iommu_suspend+0x3c/0x3e
>>>>>>> (XEN)    [<ffff82c48019f315>] enter_state_helper+0x1a0/0x3cb
>>>>>>> (XEN)    [<ffff82c480105ed4>] continue_hypercall_tasklet_handler+0x51/0xbf
>>>>>>> (XEN)    [<ffff82c480127a1e>] do_tasklet_work+0x8d/0xc7
>>>>>>> (XEN)    [<ffff82c480127d89>] do_tasklet+0x6b/0x9b
>>>>>>> (XEN)    [<ffff82c48015a42f>] idle_loop+0x67/0x6f
>>>>>>
>>>>>> This was likely broken by XSA-36
>>>>>>
>>>>>> My fix for the crash path is:
>>>>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=53fd1d8458de01169dfb56feb315f02c2b521a86
>>>>>>
>>>>>> You want to inspect the use of iommu_enabled and iommu_intremap.
>>>>>
>>>>> According to the comment in vtd_suspend(),
>>>>> iommu_disable_x2apic_IR() is supposed to run after
>>>>> iommu_suspend() (and indeed lapic_suspend() gets called
>>>>> immediately after iommu_suspend() by device_power_down()),
>>>>> and hence that shouldn't be the reason here. But, Ben, to be
>>>>> sure, dumping the state of the various IOMMU related enabling
>>>>> variables would be a good idea.
>>>>
>>>> I assume you are referring to the variables below, defined at the top of
>>>> iommu.c
>>>> At the time of the crash, they look like this:
>>>>
>>>> (XEN) iommu_enabled = 1
>>>> (XEN) force_iommu; = 0
>>>> (XEN) iommu_verbose; = 0
>>>> (XEN) iommu_workaround_bios_bug; = 0
>>>> (XEN) iommu_passthrough; = 0
>>>> (XEN) iommu_snoop = 0
>>>> (XEN) iommu_qinval = 1
>>>> (XEN) iommu_intremap = 1
>>>> (XEN) iommu_hap_pt_share = 0
>>>> (XEN) iommu_debug; = 0
>>>> (XEN) amd_iommu_perdev_intremap = 1
>>>>
>>>> If that gives any additional insight, please let me know.
>>>> I'm not sure I gleaned anything particularly significant from it though.
>>>>
>>>> Or - perhaps you are referring to other enabling variables?
>>>
>>> These were exactly the ones (or really you picked a superset of
>>> what I wanted to know the state of). To me this pretty clearly
>>> means that Andrew's original thought here is not applicable, as
>>> at this point we can't possibly have shut down qinval yet.
>>>
>>>>> Is this perhaps having some similarity with
>>>>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html?
>>>>> We're clearly running single-CPU only here and there...
>>>>
>>>> We certainly should be, as we have gone through the
>>>> disable_nonboot_cpus() by this point - and I can verify that from the
>>>> logs.
>>>
>>> I'm much more tending towards the connection here, noting that
>>> Andrew's original thread didn't really lead anywhere (i.e. we still
>>> don't know what the panic he saw was actually caused by).
>>>
>>
>> I'm starting to think you're on to something here.
>
> hmm - maybe not.
> I get the same crash with "maxcpus=1"
>
>
>
>> I've put a bunch of trace throughout the functions in qinval.c
>>
>> It seems that everything is functioning properly, up until we go
>> through the disable_nonboot_cpus() path.
>> Prior to this, I see the qinval.c functions being executed on all
>> cpus, and both drhd units
>> Afterward, it gets stuck in queue_invalidate_wait on the first drhd
>> unit.. and eventually panics.
>>
>> I'm not exactly sure what to make of this yet.

Querying the status of the hardware all seems to be working correctly...
the poll just never sees the QINVAL_STAT_DONE status, as far
as I can tell.

Other register state is:

(XEN)  VER = 10
(XEN)  CAP = c0000020e60262
(XEN)  n_fault_reg = 1
(XEN)  fault_recording_offset = 200
(XEN)  fault_recording_reg_l = 0
(XEN)  fault_recording_reg_h = 0
(XEN)  ECAP = f0101a
(XEN)  GCMD = 0
(XEN)  GSTS = c7000000
(XEN)  RTADDR = 137a31000
(XEN)  CCMD = 800000000000000
(XEN)  FSTS = 0
(XEN)  FECTL = 0
(XEN)  FEDATA = 4128
(XEN)  FEADDR = fee0000c
(XEN)  FEUADDR = 0

(with code lifted from print_iommu_regs() )
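
i.e. roughly the following, using the dmar_readl()/dmar_readq()
accessors and the DMAR_*_REG offsets from vtd/iommu.h (a sketch, not
the exact lifted code):

    printk(" VER = %x\n", dmar_readl(iommu->reg, DMAR_VER_REG));
    printk(" CAP = %"PRIx64"\n", dmar_readq(iommu->reg, DMAR_CAP_REG));
    printk(" ECAP = %"PRIx64"\n", dmar_readq(iommu->reg, DMAR_ECAP_REG));
    printk(" GCMD = %x\n", dmar_readl(iommu->reg, DMAR_GCMD_REG));
    printk(" GSTS = %x\n", dmar_readl(iommu->reg, DMAR_GSTS_REG));
    printk(" FSTS = %x\n", dmar_readl(iommu->reg, DMAR_FSTS_REG));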


None of this looks suspicious to my untrained eye - but I'm including
it here in case someone else sees something I don't.


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-04 21:09             ` Ben Guthro
@ 2013-06-05  8:24               ` Jan Beulich
  2013-06-05 13:54                 ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-06-05  8:24 UTC (permalink / raw)
  To: Ben Guthro, xiantao.zhang; +Cc: Andrew Cooper, xen-devel

>>> On 04.06.13 at 23:09, Ben Guthro <ben@guthro.net> wrote:
> On Tue, Jun 4, 2013 at 3:49 PM, Ben Guthro <ben@guthro.net> wrote:
>> On Tue, Jun 4, 2013 at 3:20 PM, Ben Guthro <ben@guthro.net> wrote:
>>> On Tue, Jun 4, 2013 at 10:01 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 04.06.13 at 14:25, Ben Guthro <ben@guthro.net> wrote:
>>>>> On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> Is this perhaps having some similarity with
>>>>>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html? 
>>>>>> We're clearly running single-CPU only here and there...
>>>>>
>>>>> We certainly should be, as we have gone through the
>>>>> disable_nonboot_cpus() by this point - and I can verify that from the
>>>>> logs.
>>>>
>>>> I'm much more tending towards the connection here, noting that
>>>> Andrew's original thread didn't really lead anywhere (i.e. we still
>>>> don't know what the panic he saw was actually caused by).
>>>>
>>>
>>> I'm starting to think you're on to something here.
>>
>> hmm - maybe not.
>> I get the same crash with "maxcpus=1"
>>
>>
>>
>>> I've put a bunch of trace throughout the functions in qinval.c
>>>
>>> It seems that everything is functioning properly, up until we go
>>> through the disable_nonboot_cpus() path.
>>> Prior to this, I see the qinval.c functions being executed on all
>>> cpus, and both drhd units
>>> Afterward, it gets stuck in queue_invalidate_wait on the first drhd
>>> unit.. and eventually panics.
>>>
>>> I'm not exactly sure what to make of this yet.
> 
> querying status of the hardware all seems to be working correctly...
> it just doesn't work with querying the QINVAL_STAT_DONE state, as far
> as I can tell.
> 
> Other register state is:
> 
> (XEN)  VER = 10
> (XEN)  CAP = c0000020e60262
> (XEN)  n_fault_reg = 1
> (XEN)  fault_recording_offset = 200
> (XEN)  fault_recording_reg_l = 0
> (XEN)  fault_recording_reg_h = 0
> (XEN)  ECAP = f0101a
> (XEN)  GCMD = 0
> (XEN)  GSTS = c7000000
> (XEN)  RTADDR = 137a31000
> (XEN)  CCMD = 800000000000000
> (XEN)  FSTS = 0
> (XEN)  FECTL = 0
> (XEN)  FEDATA = 4128
> (XEN)  FEADDR = fee0000c
> (XEN)  FEUADDR = 0
> 
> (with code lifted from print_iommu_regs() )
> 
> 
> None of this looks suspicious to my untrained eye - but I'm including
> it here in case someone else sees something I don't.

Xiantao, you certainly will want to give some advice here. I won't
be able to look into this more deeply right away.

Jan


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-05  8:24               ` Jan Beulich
@ 2013-06-05 13:54                 ` Ben Guthro
  2013-06-05 15:14                   ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-05 13:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xiantao.zhang, xen-devel

On Wed, Jun 5, 2013 at 4:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 04.06.13 at 23:09, Ben Guthro <ben@guthro.net> wrote:
>> On Tue, Jun 4, 2013 at 3:49 PM, Ben Guthro <ben@guthro.net> wrote:
>>> On Tue, Jun 4, 2013 at 3:20 PM, Ben Guthro <ben@guthro.net> wrote:
>>>> On Tue, Jun 4, 2013 at 10:01 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 04.06.13 at 14:25, Ben Guthro <ben@guthro.net> wrote:
>>>>>> On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> Is this perhaps having some similarity with
>>>>>>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html?
>>>>>>> We're clearly running single-CPU only here and there...
>>>>>>
>>>>>> We certainly should be, as we have gone through the
>>>>>> disable_nonboot_cpus() by this point - and I can verify that from the
>>>>>> logs.
>>>>>
>>>>> I'm much more tending towards the connection here, noting that
>>>>> Andrew's original thread didn't really lead anywhere (i.e. we still
>>>>> don't know what the panic he saw was actually caused by).
>>>>>
>>>>
>>>> I'm starting to think you're on to something here.
>>>
>>> hmm - maybe not.
>>> I get the same crash with "maxcpus=1"
>>>
>>>
>>>
>>>> I've put a bunch of trace throughout the functions in qinval.c
>>>>
>>>> It seems that everything is functioning properly, up until we go
>>>> through the disable_nonboot_cpus() path.
>>>> Prior to this, I see the qinval.c functions being executed on all
>>>> cpus, and both drhd units
>>>> Afterward, it gets stuck in queue_invalidate_wait on the first drhd
>>>> unit.. and eventually panics.
>>>>
>>>> I'm not exactly sure what to make of this yet.
>>
>> querying status of the hardware all seems to be working correctly...
>> it just doesn't work with querying the QINVAL_STAT_DONE state, as far
>> as I can tell.
>>
>> Other register state is:
>>
>> (XEN)  VER = 10
>> (XEN)  CAP = c0000020e60262
>> (XEN)  n_fault_reg = 1
>> (XEN)  fault_recording_offset = 200
>> (XEN)  fault_recording_reg_l = 0
>> (XEN)  fault_recording_reg_h = 0
>> (XEN)  ECAP = f0101a
>> (XEN)  GCMD = 0
>> (XEN)  GSTS = c7000000
>> (XEN)  RTADDR = 137a31000
>> (XEN)  CCMD = 800000000000000
>> (XEN)  FSTS = 0
>> (XEN)  FECTL = 0
>> (XEN)  FEDATA = 4128
>> (XEN)  FEADDR = fee0000c
>> (XEN)  FEUADDR = 0
>>
>> (with code lifted from print_iommu_regs() )
>>
>>
>> None of this looks suspicious to my untrained eye - but I'm including
>> it here in case someone else sees something I don't.
>
> Xiantao, you certainly will want to give some advice here. I won't
> be able to look into this more deeply right away.

Thanks Jan. Xiantao - I'd appreciate any insight you may have.

One curious thing I have found, that seems buggy to me, is that
{dis,en}able_qinval() is being called prior to the platform quirks
being executed.
It appears they are being called through iommu_{en,dis}able_x2apic_IR().

However, when I tried to put a BUG() or dump_execution_state() in that
code, it would not dump a stack.

I was going to put a platform quirk in, to detect and disable qinval
on this platform, but it seems that may be too late in the process.


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-05 13:54                 ` Ben Guthro
@ 2013-06-05 15:14                   ` Jan Beulich
  2013-06-05 15:25                     ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-06-05 15:14 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, xiantao.zhang, xen-devel

>>> On 05.06.13 at 15:54, Ben Guthro <ben@guthro.net> wrote:
> On Wed, Jun 5, 2013 at 4:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 04.06.13 at 23:09, Ben Guthro <ben@guthro.net> wrote:
>>> On Tue, Jun 4, 2013 at 3:49 PM, Ben Guthro <ben@guthro.net> wrote:
>>>> On Tue, Jun 4, 2013 at 3:20 PM, Ben Guthro <ben@guthro.net> wrote:
>>>>> On Tue, Jun 4, 2013 at 10:01 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>> On 04.06.13 at 14:25, Ben Guthro <ben@guthro.net> wrote:
>>>>>>> On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> Is this perhaps having some similarity with
>>>>>>>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html? 
>>>>>>>> We're clearly running single-CPU only here and there...
>>>>>>>
>>>>>>> We certainly should be, as we have gone through the
>>>>>>> disable_nonboot_cpus() by this point - and I can verify that from the
>>>>>>> logs.
>>>>>>
>>>>>> I'm much more tending towards the connection here, noting that
>>>>>> Andrew's original thread didn't really lead anywhere (i.e. we still
>>>>>> don't know what the panic he saw was actually caused by).
>>>>>>
>>>>>
>>>>> I'm starting to think you're on to something here.
>>>>
>>>> hmm - maybe not.
>>>> I get the same crash with "maxcpus=1"
>>>>
>>>>
>>>>
>>>>> I've put a bunch of trace throughout the functions in qinval.c
>>>>>
>>>>> It seems that everything is functioning properly, up until we go
>>>>> through the disable_nonboot_cpus() path.
>>>>> Prior to this, I see the qinval.c functions being executed on all
>>>>> cpus, and both drhd units
>>>>> Afterward, it gets stuck in queue_invalidate_wait on the first drhd
>>>>> unit.. and eventually panics.
>>>>>
>>>>> I'm not exactly sure what to make of this yet.
>>>
>>> querying status of the hardware all seems to be working correctly...
>>> it just doesn't work with querying the QINVAL_STAT_DONE state, as far
>>> as I can tell.
>>>
>>> Other register state is:
>>>
>>> (XEN)  VER = 10
>>> (XEN)  CAP = c0000020e60262
>>> (XEN)  n_fault_reg = 1
>>> (XEN)  fault_recording_offset = 200
>>> (XEN)  fault_recording_reg_l = 0
>>> (XEN)  fault_recording_reg_h = 0
>>> (XEN)  ECAP = f0101a
>>> (XEN)  GCMD = 0
>>> (XEN)  GSTS = c7000000

With

#define DMA_GCMD_QIE    (((u64)1) << 26)

and

#define DMA_GSTS_QIES   (((u64)1) <<26)

this means qinval is still enabled at this point.
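
Decoding your whole GSTS value along those lines (a standalone
sketch; bit names per the VT-d spec):

    #include <stdio.h>
    #include <stdint.h>

    #define DMA_GSTS_TES    (1u << 31)  /* translation enabled */
    #define DMA_GSTS_RTPS   (1u << 30)  /* root table pointer set */
    #define DMA_GSTS_QIES   (1u << 26)  /* queued invalidation enabled */
    #define DMA_GSTS_IRES   (1u << 25)  /* interrupt remapping enabled */
    #define DMA_GSTS_IRTPS  (1u << 24)  /* remap table pointer set */

    int main(void)
    {
        uint32_t gsts = 0xc7000000;  /* the value from your dump */

        printf("TES=%d RTPS=%d QIES=%d IRES=%d IRTPS=%d\n",
               !!(gsts & DMA_GSTS_TES), !!(gsts & DMA_GSTS_RTPS),
               !!(gsts & DMA_GSTS_QIES), !!(gsts & DMA_GSTS_IRES),
               !!(gsts & DMA_GSTS_IRTPS));
        return 0;
    }

This prints all five bits as 1 - translation, qinval, and interrupt
remapping all still live, consistent with the iommu_qinval /
iommu_intremap state you dumped earlier.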

>>> (XEN)  RTADDR = 137a31000
>>> (XEN)  CCMD = 800000000000000
>>> (XEN)  FSTS = 0
>>> (XEN)  FECTL = 0
>>> (XEN)  FEDATA = 4128
>>> (XEN)  FEADDR = fee0000c
>>> (XEN)  FEUADDR = 0
>>>
>>> (with code lifted from print_iommu_regs() )
>>>
>>>
>>> None of this looks suspicious to my untrained eye - but I'm including
>>> it here in case someone else sees something I don't.
>>
>> Xiantao, you certainly will want to give some advice here. I won't
>> be able to look into this more deeply right away.
> 
> Thanks Jan. Xiantao - I'd appreciate any insight you may have.
> 
> One curious thing I have found, that seems buggy to me, is that
> {dis,en}able_qinval() is being called prior to the platform quirks
> being executed.
> It appears they are being called through iommu_{en,dis}able_x2apic_IR()

That's because this setup needs to happen when interrupt (i.e.
APIC) initialization is happening, not when the IOMMUs get set
up (which is a process that assumes interrupts can already be
requested).

In effect we have

lapic_suspend() -> iommu_disable_x2apic_IR() ->
disable_intremap()/disable_qinval()

after

iommu_suspend() -> vtd_suspend() -> disable_qinval()

but the latter tail call happens only when !iommu_intremap, and you
have been running with interrupt remapping enabled, so only
the former code path would result in qinval getting disabled,
which is after the point of the hang.

Depending on whether ATS is in use, more than one invalidation
can be done in the processing here - could you therefore check
whether there's any sign of ATS use ("iommu=verbose" should
make you see respective messages), and if so see whether
disabling it ("ats=off") makes a difference?

Jan


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-05 15:14                   ` Jan Beulich
@ 2013-06-05 15:25                     ` Ben Guthro
  2013-06-05 15:38                       ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-05 15:25 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xiantao.zhang, xen-devel

On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 05.06.13 at 15:54, Ben Guthro <ben@guthro.net> wrote:
>> On Wed, Jun 5, 2013 at 4:24 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 04.06.13 at 23:09, Ben Guthro <ben@guthro.net> wrote:
>>>> On Tue, Jun 4, 2013 at 3:49 PM, Ben Guthro <ben@guthro.net> wrote:
>>>>> On Tue, Jun 4, 2013 at 3:20 PM, Ben Guthro <ben@guthro.net> wrote:
>>>>>> On Tue, Jun 4, 2013 at 10:01 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>>> On 04.06.13 at 14:25, Ben Guthro <ben@guthro.net> wrote:
>>>>>>>> On Tue, Jun 4, 2013 at 4:54 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>>> Is this perhaps having some similarity with
>>>>>>>>> http://lists.xen.org/archives/html/xen-devel/2013-04/msg00343.html?
>>>>>>>>> We're clearly running single-CPU only here and there...
>>>>>>>>
>>>>>>>> We certainly should be, as we have gone through the
>>>>>>>> disable_nonboot_cpus() by this point - and I can verify that from the
>>>>>>>> logs.
>>>>>>>
>>>>>>> I'm much more tending towards the connection here, noting that
>>>>>>> Andrew's original thread didn't really lead anywhere (i.e. we still
>>>>>>> don't know what the panic he saw was actually caused by).
>>>>>>>
>>>>>>
>>>>>> I'm starting to think you're on to something here.
>>>>>
>>>>> hmm - maybe not.
>>>>> I get the same crash with "maxcpus=1"
>>>>>
>>>>>
>>>>>
>>>>>> I've put a bunch of trace throughout the functions in qinval.c
>>>>>>
>>>>>> It seems that everything is functioning properly, up until we go
>>>>>> through the disable_nonboot_cpus() path.
>>>>>> Prior to this, I see the qinval.c functions being executed on all
>>>>>> cpus, and both drhd units
>>>>>> Afterward, it gets stuck in queue_invalidate_wait on the first drhd
>>>>>> unit.. and eventually panics.
>>>>>>
>>>>>> I'm not exactly sure what to make of this yet.
>>>>
>>>> querying status of the hardware all seems to be working correctly...
>>>> it just doesn't work with querying the QINVAL_STAT_DONE state, as far
>>>> as I can tell.
>>>>
>>>> Other register state is:
>>>>
>>>> (XEN)  VER = 10
>>>> (XEN)  CAP = c0000020e60262
>>>> (XEN)  n_fault_reg = 1
>>>> (XEN)  fault_recording_offset = 200
>>>> (XEN)  fault_recording_reg_l = 0
>>>> (XEN)  fault_recording_reg_h = 0
>>>> (XEN)  ECAP = f0101a
>>>> (XEN)  GCMD = 0
>>>> (XEN)  GSTS = c7000000
>
> With
>
> #define DMA_GCMD_QIE    (((u64)1) << 26)
>
> and
>
> #define DMA_GSTS_QIES   (((u64)1) <<26)
>
> this means qinval is still enabled at this point.
>
>>>> (XEN)  RTADDR = 137a31000
>>>> (XEN)  CCMD = 800000000000000
>>>> (XEN)  FSTS = 0
>>>> (XEN)  FECTL = 0
>>>> (XEN)  FEDATA = 4128
>>>> (XEN)  FEADDR = fee0000c
>>>> (XEN)  FEUADDR = 0
>>>>
>>>> (with code lifted from print_iommu_regs() )
>>>>
>>>>
>>>> None of this looks suspicious to my untrained eye - but I'm including
>>>> it here in case someone else sees something I don't.
>>>
>>> Xiantao, you certainly will want to give some advice here. I won't
>>> be able to look into this more deeply right away.
>>
>> Thanks Jan. Xiantao - I'd appreciate any insight you may have.
>>
>> One curious thing I have found, that seems buggy to me, is that
>> {dis,en}able_qinval() is being called prior to the platform quirks
>> being executed.
>> It appears they are being called through iommu_{en,dis}able_x2apic_IR()
>
> That's because this setup needs to happen when interrupt (i.e.
> APIC) initialization is happening, not when the IOMMUs get set
> up (which is a process that assumes interrupts can already be
> requested).
>
> In effect we have
>
> lapic_suspend() -> iommu_disable_x2apic_IR() ->
> disable_intremap()/disable_qinval()
>
> after
>
> iommu_suspend() -> vtd_suspend() -> disable_qinval()
>
> but the latter tail call happens only when !iommu_intremap, and you
> have been running with interrupt remapping enabled, so only
> the former code path would result in qinval getting disabled,
> which is after the point of the hang.
>
> Depending on whether ATS is in use, more than one invalidation
> can be done in the processing here - could you therefore check
> whether there's any sign of ATS use ("iommu=verbose" should
> make you see respective messages), and if so see whether
> disabling it ("ats=off") makes a difference?

ATS does not appear to be running:

(XEN) [VT-D]dmar.c:737: Host address width 36
(XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
(XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg = ffff82c3ffd57000
(XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
(XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
(XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
(XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg = ffff82c3ffd56000
(XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
(XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
(XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
(XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
(XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
(XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
(XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
(XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
(XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
(XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
(XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
(XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
(XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
(XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000 end_address ba8ebfff
(XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
(XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000 end_address bf9fffff

I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
was found.


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-05 15:25                     ` Ben Guthro
@ 2013-06-05 15:38                       ` Jan Beulich
  2013-06-05 20:27                         ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-06-05 15:38 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, xiantao.zhang, xen-devel

>>> On 05.06.13 at 17:25, Ben Guthro <ben@guthro.net> wrote:
> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
>> Depending on whether ATS is in use, more than one invalidation
>> can be done in the processing here - could you therefore check
>> whether there's any sign of ATS use ("iommu=verbose" should
>> make you see respective messages), and if so see whether
>> disabling it ("ats=off") makes a difference?
> 
> ATS does not appear to be running:
> 
> (XEN) [VT-D]dmar.c:737: Host address width 36
> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg = ffff82c3ffd57000
> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg = ffff82c3ffd56000
> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000 end_address ba8ebfff
> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000 end_address bf9fffff
> 
> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
> was found.

Right. So one less variable.

Jan


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-05 15:38                       ` Jan Beulich
@ 2013-06-05 20:27                         ` Ben Guthro
  2013-06-05 23:53                           ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-05 20:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xiantao.zhang, xen-devel

On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 05.06.13 at 17:25, Ben Guthro <ben@guthro.net> wrote:
>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>> Depending on whether ATS is in use, more than one invalidation
>>> can be done in the processing here - could you therefore check
>>> whether there's any sign of ATS use ("iommu=verbose" should
>>> make you see respective messages), and if so see whether
>>> disabling it ("ats=off") makes a difference?
>>
>> ATS does not appear to be running:
>>
>> (XEN) [VT-D]dmar.c:737: Host address width 36
>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg = ffff82c3ffd57000
>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg = ffff82c3ffd56000
>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
>> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
>> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000 end_address ba8ebfff
>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000 end_address bf9fffff
>>
>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
>> was found.
>
> Right. So one less variable.

Some more info.
Ross Philipson provided me with a handy utility to dump a bunch more
info about the DMAR tables, and with some more trace, this appears to
be tied to the IGD.

Early in the boot process, I see queue_invalidate_wait() called for
DRHD units 0 and 1
(unit 0 is wired up to the IGD, unit 1 is everything else).

Up until i915 does the following, I see that unit being flushed with
queue_invalidate_wait():

[    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
(XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
(XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
[    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
bit banging on pin 5
[    2.253551] fbcon: inteldrmfb (fb0) is primary device
[    3.111838] Console: switching to colour frame buffer device 170x48
[    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    3.171634] i915 0000:00:02.0: registered panic notifier
[    3.173339] acpi device:00: registered as cooling_device1
[    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
[    3.173962] input: Video Bus as
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
[    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[    3.174258] ahci 0000:00:1f.2: version 3.0
[    3.174270] xen: registering gsi 19 triggering 0 polarity 1
[    3.174274] Already setup the GSI :19


After that - the unit never seems to be flushed.

...until we enter the S3 hypercall, which loops over all DRHD
units, and explicitly flushes all of them via iommu_flush_all().

It is at that point that it hangs up when talking to the device that
the IGD is plumbed up to.


Does this point to something in the i915 driver doing something that
is incompatible with Xen?


* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-05 20:27                         ` Ben Guthro
@ 2013-06-05 23:53                           ` Ben Guthro
  2013-06-06  6:58                             ` Jan Beulich
  2013-06-14  8:38                             ` Jan Beulich
  0 siblings, 2 replies; 26+ messages in thread
From: Ben Guthro @ 2013-06-05 23:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xiantao.zhang, xen-devel

On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro <ben@guthro.net> wrote:
> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 05.06.13 at 17:25, Ben Guthro <ben@guthro.net> wrote:
>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> Depending on whether ATS is in use, more than one invalidation
>>>> can be done in the processing here - could you therefore check
>>>> whether there's any sign of ATS use ("iommu=verbose" should
>>>> make you see respective messages), and if so see whether
>>>> disabling it ("ats=off") makes a difference?
>>>
>>> ATS does not appear to be running:
>>>
>>> (XEN) [VT-D]dmar.c:737: Host address width 36
>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg = ffff82c3ffd57000
>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg = ffff82c3ffd56000
>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
>>> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
>>> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000 end_address
>>> ba8ebfff
>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000 end_address
>>> bf9fffff
>>>
>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
>>> was found.
>>
>> Right. So one less variable.
>
> Some more info.
> Ross Philipson provided me with a handy utility to dump a bunch more
> info about the DMAR tables, and with some more trace, this appears to
> be tied to the IGD.
>
> Early in the boot process, I see queue_invalidate_wait() called for
> DRHD unit 0, and 1
> (unit 0 is wired up to the IGD, unit 1 is everything else)
>
> Up until i915 does the following, I see that unit being flushed with
> queue_invalidate_wait() :
>
> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
> bit banging on pin 5
> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
> [    3.111838] Console: switching to colour frame buffer device 170x48
> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
> [    3.171634] i915 0000:00:02.0: registered panic notifier
> [    3.173339] acpi device:00: registered as cooling_device1
> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
> [    3.173962] input: Video Bus as
> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
> [    3.174258] ahci 0000:00:1f.2: version 3.0
> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
> [    3.174274] Already setup the GSI :19
>
>
> After that - the unit never seems to be flushed.
>
> ...until we enter into the S3 hypercall, which loops over all DRHD
> units, and explicitly flushes all of them via iommu_flush_all()
>
> It is at that point that it hangs up when talking to the device that
> the IGD is plumbed up to.
>
>
> Does this point to something in the i915 driver doing something that
> is incompatible with Xen?

I actually separated it from the S3 hypercall by adding a new debug
key, 'F', that just calls iommu_flush_all().
I can crash the machine on demand with this.
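
A minimal sketch of that debug key, assuming the Xen 4.2 keyhandler
interface and that iommu_flush_all() is exported from the VT-d code
(the actual patch may differ):

    #include <xen/keyhandler.h>
    #include <xen/iommu.h>

    /* On keypress, force a full flush of every IOMMU. */
    static void iommu_flush_all_key(unsigned char key)
    {
        printk("'%c' pressed -> flushing all IOMMUs\n", key);
        iommu_flush_all();
    }

    static struct keyhandler iommu_flush_all_keyhandler = {
        .u.fn = iommu_flush_all_key,
        .desc = "flush all VT-d IOMMUs"
    };

    /* registered from an __init path: */
    register_keyhandler('F', &iommu_flush_all_keyhandler);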

Booting with "i915.modeset=0 single" (to prevent both KMS and Xorg),
it does not occur.
So that pretty much narrows it down to the IGD, in my mind.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-05 23:53                           ` Ben Guthro
@ 2013-06-06  6:58                             ` Jan Beulich
  2013-06-06 15:06                               ` Zhang, Xiantao
  2013-06-14  8:38                             ` Jan Beulich
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-06-06  6:58 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, xiantao.zhang, xen-devel

>>> On 06.06.13 at 01:53, Ben Guthro <ben@guthro.net> wrote:
> On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro <ben@guthro.net> wrote:
>> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 05.06.13 at 17:25, Ben Guthro <ben@guthro.net> wrote:
>>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> Depending on whether ATS is in use, more than one invalidation
>>>>> can be done in the processing here - could you therefore check
>>>>> whether there's any sign of ATS use ("iommu=verbose" should
>>>>> make you see respective messages), and if so see whether
>>>>> disabling it ("ats=off") makes a difference?
>>>>
>>>> ATS does not appear to be running:
>>>>
>>>> (XEN) [VT-D]dmar.c:737: Host address width 36
>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg = ffff82c3ffd57000
>>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg = ffff82c3ffd56000
>>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
>>>> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
>>>> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000 end_address
>>>> ba8ebfff
>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000 end_address
>>>> bf9fffff
>>>>
>>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
>>>> was found.
>>>
>>> Right. So one less variable.
>>
>> Some more info.
>> Ross Philipson provided me with a handy utility to dump a bunch more
>> info about the DMAR tables, and with some more trace, this appears to
>> be tied to the IGD.
>>
>> Early in the boot process, I see queue_invalidate_wait() called for
>> DRHD unit 0, and 1
>> (unit 0 is wired up to the IGD, unit 1 is everything else)
>>
>> Up until i915 does the following, I see that unit being flushed with
>> queue_invalidate_wait() :
>>
>> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
>> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
>> bit banging on pin 5
>> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
>> [    3.111838] Console: switching to colour frame buffer device 170x48
>> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
>> [    3.171634] i915 0000:00:02.0: registered panic notifier
>> [    3.173339] acpi device:00: registered as cooling_device1
>> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
>> [    3.173962] input: Video Bus as
>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on 
> minor 0
>> [    3.174258] ahci 0000:00:1f.2: version 3.0
>> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
>> [    3.174274] Already setup the GSI :19
>>
>>
>> After that - the unit never seems to be flushed.
>>
>> ...until we enter into the S3 hypercall, which loops over all DRHD
>> units, and explicitly flushes all of them via iommu_flush_all()
>>
>> It is at that point that it hangs up when talking to the device that
>> the IGD is plumbed up to.
>>
>>
>> Does this point to something in the i915 driver doing something that
>> is incompatible with Xen?
> 
> I actually separated it from the S3 hypercall, adding a new debug key
> 'F' - to just call iommu_flush_all()
> I can crash it on demand with this.
> 
> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
> it does not occur.
> So, that pretty much narrows it down to the IGD, in my mind.

Indeed, I agree. Yet I can't in any way comment on what or why.
Xiantao (perhaps some graphics person would be good to Cc
here too)?

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-06  6:58                             ` Jan Beulich
@ 2013-06-06 15:06                               ` Zhang, Xiantao
  2013-06-06 15:07                                 ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Zhang, Xiantao @ 2013-06-06 15:06 UTC (permalink / raw)
  To: Jan Beulich, Ben Guthro; +Cc: Andrew Cooper, Zhang, Xiantao, xen-devel



> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, June 06, 2013 2:59 PM
> To: Ben Guthro
> Cc: Andrew Cooper; Zhang, Xiantao; xen-devel
> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
> 
> >>> On 06.06.13 at 01:53, Ben Guthro <ben@guthro.net> wrote:
> > On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro <ben@guthro.net> wrote:
> >> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
> >>>>>> On 05.06.13 at 17:25, Ben Guthro <ben@guthro.net> wrote:
> >>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com>
> wrote:
> >>>>> Depending on whether ATS is in use, more than one invalidation
> >>>>> can be done in the processing here - could you therefore check
> >>>>> whether there's any sign of ATS use ("iommu=verbose" should
> >>>>> make you see respective messages), and if so see whether
> >>>>> disabling it ("ats=off") makes a difference?
> >>>>
> >>>> ATS does not appear to be running:
> >>>>
> >>>> (XEN) [VT-D]dmar.c:737: Host address width 36
> >>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
> >>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
> >>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg =
> ffff82c3ffd57000
> >>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
> >>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
> >>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
> >>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
> >>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg =
> ffff82c3ffd56000
> >>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
> >>>> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
> >>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
> >>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
> >>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
> >>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
> >>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
> >>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
> >>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
> >>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
> >>>> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
> >>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
> >>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
> >>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
> >>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000
> end_address
> >>>> ba8ebfff
> >>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
> >>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
> >>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000
> end_address
> >>>> bf9fffff
> >>>>
> >>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
> >>>> was found.
> >>>
> >>> Right. So one less variable.
> >>
> >> Some more info.
> >> Ross Philipson provided me with a handy utility to dump a bunch more
> >> info about the DMAR tables, and with some more trace, this appears to
> >> be tied to the IGD.
> >>
> >> Early in the boot process, I see queue_invalidate_wait() called for
> >> DRHD unit 0, and 1
> >> (unit 0 is wired up to the IGD, unit 1 is everything else)
> >>
> >> Up until i915 does the following, I see that unit being flushed with
> >> queue_invalidate_wait() :
> >>
> >> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> >> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
> >> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> >> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> >> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
> >> bit banging on pin 5
> >> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
> >> [    3.111838] Console: switching to colour frame buffer device 170x48
> >> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
> >> [    3.171634] i915 0000:00:02.0: registered panic notifier
> >> [    3.173339] acpi device:00: registered as cooling_device1
> >> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
> >> [    3.173962] input: Video Bus as
> >>
> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> >> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on
> > minor 0
> >> [    3.174258] ahci 0000:00:1f.2: version 3.0
> >> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
> >> [    3.174274] Already setup the GSI :19
> >>
> >>
> >> After that - the unit never seems to be flushed.
> >>
> >> ...until we enter into the S3 hypercall, which loops over all DRHD
> >> units, and explicitly flushes all of them via iommu_flush_all()
> >>
> >> It is at that point that it hangs up when talking to the device that
> >> the IGD is plumbed up to.
> >>
> >>
> >> Does this point to something in the i915 driver doing something that
> >> is incompatible with Xen?
> >
> > I actually separated it from the S3 hypercall, adding a new debug key
> > 'F' - to just call iommu_flush_all()
> > I can crash it on demand with this.
> >
> > Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
> > it does not occur.
> > So, that pretty much narrows it down to the IGD, in my mind.
> 
> Indeed, I agree. Yet I can't in any way comment on what or why.
> Xiantao (perhaps some graphics person would good to be Cc-ed
> here too)?
Hi Jan/Ben,
Thanks for your analysis! Could you try enabling "snb_igd_quirk" and see whether it helps? Thanks!
Xiantao

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-06 15:06                               ` Zhang, Xiantao
@ 2013-06-06 15:07                                 ` Ben Guthro
  2013-06-06 15:13                                   ` Zhang, Xiantao
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-06 15:07 UTC (permalink / raw)
  To: Zhang, Xiantao; +Cc: Andrew Cooper, Ben Guthro, Jan Beulich, xen-devel

On Jun 6, 2013, at 11:06 AM, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

>
>
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Thursday, June 06, 2013 2:59 PM
>> To: Ben Guthro
>> Cc: Andrew Cooper; Zhang, Xiantao; xen-devel
>> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
>>
>>>>> On 06.06.13 at 01:53, Ben Guthro <ben@guthro.net> wrote:
>>> On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro <ben@guthro.net> wrote:
>>>> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>>> On 05.06.13 at 17:25, Ben Guthro <ben@guthro.net> wrote:
>>>>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com>
>> wrote:
>>>>>>> Depending on whether ATS is in use, more than one invalidation
>>>>>>> can be done in the processing here - could you therefore check
>>>>>>> whether there's any sign of ATS use ("iommu=verbose" should
>>>>>>> make you see respective messages), and if so see whether
>>>>>>> disabling it ("ats=off") makes a difference?
>>>>>>
>>>>>> ATS does not appear to be running:
>>>>>>
>>>>>> (XEN) [VT-D]dmar.c:737: Host address width 36
>>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
>>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg =
>> ffff82c3ffd57000
>>>>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
>>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg =
>> ffff82c3ffd56000
>>>>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
>>>>>> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
>>>>>> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
>>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
>>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000
>> end_address
>>>>>> ba8ebfff
>>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000
>> end_address
>>>>>> bf9fffff
>>>>>>
>>>>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
>>>>>> was found.
>>>>>
>>>>> Right. So one less variable.
>>>>
>>>> Some more info.
>>>> Ross Philipson provided me with a handy utility to dump a bunch more
>>>> info about the DMAR tables, and with some more trace, this appears to
>>>> be tied to the IGD.
>>>>
>>>> Early in the boot process, I see queue_invalidate_wait() called for
>>>> DRHD unit 0, and 1
>>>> (unit 0 is wired up to the IGD, unit 1 is everything else)
>>>>
>>>> Up until i915 does the following, I see that unit being flushed with
>>>> queue_invalidate_wait() :
>>>>
>>>> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
>>>> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
>>>> bit banging on pin 5
>>>> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
>>>> [    3.111838] Console: switching to colour frame buffer device 170x48
>>>> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
>>>> [    3.171634] i915 0000:00:02.0: registered panic notifier
>>>> [    3.173339] acpi device:00: registered as cooling_device1
>>>> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
>>>> [    3.173962] input: Video Bus as
>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>>>> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on
>>> minor 0
>>>> [    3.174258] ahci 0000:00:1f.2: version 3.0
>>>> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
>>>> [    3.174274] Already setup the GSI :19
>>>>
>>>>
>>>> After that - the unit never seems to be flushed.
>>>>
>>>> ...until we enter into the S3 hypercall, which loops over all DRHD
>>>> units, and explicitly flushes all of them via iommu_flush_all()
>>>>
>>>> It is at that point that it hangs up when talking to the device that
>>>> the IGD is plumbed up to.
>>>>
>>>>
>>>> Does this point to something in the i915 driver doing something that
>>>> is incompatible with Xen?
>>>
>>> I actually separated it from the S3 hypercall, adding a new debug key
>>> 'F' - to just call iommu_flush_all()
>>> I can crash it on demand with this.
>>>
>>> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
>>> it does not occur.
>>> So, that pretty much narrows it down to the IGD, in my mind.
>>
>> Indeed, I agree. Yet I can't in any way comment on what or why.
>> Xiantao (perhaps some graphics person would good to be Cc-ed
>> here too)?
> Hi, Jan/Ben
> Thanks for your analysis! Could you try to enable  "snb_igd_quirk"  to have a try ?  thanks!
> Xiantao
>


Thanks for your reply. I tried this parameter yesterday, but it did not
change the behavior.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-06 15:07                                 ` Ben Guthro
@ 2013-06-06 15:13                                   ` Zhang, Xiantao
  2013-06-06 15:17                                     ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Zhang, Xiantao @ 2013-06-06 15:13 UTC (permalink / raw)
  To: Ben Guthro
  Cc: Andrew Cooper, Zhang, Xiantao, Ben Guthro, Jan Beulich, xen-devel



> -----Original Message-----
> From: Ben Guthro [mailto:ben.guthro@gmail.com]
> Sent: Thursday, June 06, 2013 11:08 PM
> To: Zhang, Xiantao
> Cc: Jan Beulich; Ben Guthro; Andrew Cooper; xen-devel
> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
> 
> On Jun 6, 2013, at 11:06 AM, "Zhang, Xiantao" <xiantao.zhang@intel.com>
> wrote:
> 
> >
> >
> >> -----Original Message-----
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Thursday, June 06, 2013 2:59 PM
> >> To: Ben Guthro
> >> Cc: Andrew Cooper; Zhang, Xiantao; xen-devel
> >> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
> >>
> >>>>> On 06.06.13 at 01:53, Ben Guthro <ben@guthro.net> wrote:
> >>> On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro <ben@guthro.net> wrote:
> >>>> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@suse.com>
> wrote:
> >>>>>>>> On 05.06.13 at 17:25, Ben Guthro <ben@guthro.net> wrote:
> >>>>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com>
> >> wrote:
> >>>>>>> Depending on whether ATS is in use, more than one invalidation
> >>>>>>> can be done in the processing here - could you therefore check
> >>>>>>> whether there's any sign of ATS use ("iommu=verbose" should
> >>>>>>> make you see respective messages), and if so see whether
> >>>>>>> disabling it ("ats=off") makes a difference?
> >>>>>>
> >>>>>> ATS does not appear to be running:
> >>>>>>
> >>>>>> (XEN) [VT-D]dmar.c:737: Host address width 36
> >>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
> >>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
> >>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg =
> >> ffff82c3ffd57000
> >>>>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
> >>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
> >>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
> >>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
> >>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg =
> >> ffff82c3ffd56000
> >>>>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
> >>>>>> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
> >>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
> >>>>>> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
> >>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
> >>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
> >>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
> >>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000
> >> end_address
> >>>>>> ba8ebfff
> >>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
> >>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
> >>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000
> >> end_address
> >>>>>> bf9fffff
> >>>>>>
> >>>>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
> >>>>>> was found.
> >>>>>
> >>>>> Right. So one less variable.
> >>>>
> >>>> Some more info.
> >>>> Ross Philipson provided me with a handy utility to dump a bunch more
> >>>> info about the DMAR tables, and with some more trace, this appears to
> >>>> be tied to the IGD.
> >>>>
> >>>> Early in the boot process, I see queue_invalidate_wait() called for
> >>>> DRHD unit 0, and 1
> >>>> (unit 0 is wired up to the IGD, unit 1 is everything else)
> >>>>
> >>>> Up until i915 does the following, I see that unit being flushed with
> >>>> queue_invalidate_wait() :
> >>>>
> >>>> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
> >>>> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
> >>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> >>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
> >>>> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
> >>>> bit banging on pin 5
> >>>> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
> >>>> [    3.111838] Console: switching to colour frame buffer device 170x48
> >>>> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
> >>>> [    3.171634] i915 0000:00:02.0: registered panic notifier
> >>>> [    3.173339] acpi device:00: registered as cooling_device1
> >>>> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
> >>>> [    3.173962] input: Video Bus as
> >>
> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
> >>>> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on
> >>> minor 0
> >>>> [    3.174258] ahci 0000:00:1f.2: version 3.0
> >>>> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
> >>>> [    3.174274] Already setup the GSI :19
> >>>>
> >>>>
> >>>> After that - the unit never seems to be flushed.
> >>>>
> >>>> ...until we enter into the S3 hypercall, which loops over all DRHD
> >>>> units, and explicitly flushes all of them via iommu_flush_all()
> >>>>
> >>>> It is at that point that it hangs up when talking to the device that
> >>>> the IGD is plumbed up to.
> >>>>
> >>>>
> >>>> Does this point to something in the i915 driver doing something that
> >>>> is incompatible with Xen?
> >>>
> >>> I actually separated it from the S3 hypercall, adding a new debug key
> >>> 'F' - to just call iommu_flush_all()
> >>> I can crash it on demand with this.
> >>>
> >>> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
> >>> it does not occur.
> >>> So, that pretty much narrows it down to the IGD, in my mind.
> >>
> >> Indeed, I agree. Yet I can't in any way comment on what or why.
> >> Xiantao (perhaps some graphics person would good to be Cc-ed
> >> here too)?
> > Hi, Jan/Ben
> > Thanks for your analysis! Could you try to enable  "snb_igd_quirk"  to have a
> try ?  thanks!
> > Xiantao
> >
> 
> 
> Thanks for your reply. I tried this param yesterday, but it did not
> change the behavior.
Okay, I recalled that a bug in the IGD i915 driver was found recently; it may introduce errors into VT-d, and should be fixed in the latest kernel. Could you try the latest kernel, 3.9.4 or 3.10-rcX?
Xiantao

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-06 15:13                                   ` Zhang, Xiantao
@ 2013-06-06 15:17                                     ` Ben Guthro
  2013-06-07  1:33                                       ` Zhang, Xiantao
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-06 15:17 UTC (permalink / raw)
  To: Zhang, Xiantao; +Cc: Andrew Cooper, Ben Guthro, Jan Beulich, xen-devel

On Jun 6, 2013, at 11:13 AM, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

>
>
>> -----Original Message-----
>> From: Ben Guthro [mailto:ben.guthro@gmail.com]
>> Sent: Thursday, June 06, 2013 11:08 PM
>> To: Zhang, Xiantao
>> Cc: Jan Beulich; Ben Guthro; Andrew Cooper; xen-devel
>> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
>>
>> On Jun 6, 2013, at 11:06 AM, "Zhang, Xiantao" <xiantao.zhang@intel.com>
>> wrote:
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>>> Sent: Thursday, June 06, 2013 2:59 PM
>>>> To: Ben Guthro
>>>> Cc: Andrew Cooper; Zhang, Xiantao; xen-devel
>>>> Subject: Re: [Xen-devel] S3 crash with VTD Queue Invalidation enabled
>>>>
>>>>>>> On 06.06.13 at 01:53, Ben Guthro <ben@guthro.net> wrote:
>>>>> On Wed, Jun 5, 2013 at 4:27 PM, Ben Guthro <ben@guthro.net> wrote:
>>>>>> On Wed, Jun 5, 2013 at 11:38 AM, Jan Beulich <JBeulich@suse.com>
>> wrote:
>>>>>>>>>> On 05.06.13 at 17:25, Ben Guthro <ben@guthro.net> wrote:
>>>>>>>> On Wed, Jun 5, 2013 at 11:14 AM, Jan Beulich <JBeulich@suse.com>
>>>> wrote:
>>>>>>>>> Depending on whether ATS is in use, more than one invalidation
>>>>>>>>> can be done in the processing here - could you therefore check
>>>>>>>>> whether there's any sign of ATS use ("iommu=verbose" should
>>>>>>>>> make you see respective messages), and if so see whether
>>>>>>>>> disabling it ("ats=off") makes a difference?
>>>>>>>>
>>>>>>>> ATS does not appear to be running:
>>>>>>>>
>>>>>>>> (XEN) [VT-D]dmar.c:737: Host address width 36
>>>>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed90000
>>>>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed90000 iommu->reg =
>>>> ffff82c3ffd57000
>>>>>>>> (XEN) [VT-D]iommu.c:1199: cap = c0000020e60262 ecap = f0101a
>>>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>>>>>>> (XEN) [VT-D]dmar.c:751: found ACPI_DMAR_DRHD:
>>>>>>>> (XEN) [VT-D]dmar.c:412:   dmaru->address = fed91000
>>>>>>>> (XEN) [VT-D]iommu.c:1197: drhd->address = fed91000 iommu->reg =
>>>> ffff82c3ffd56000
>>>>>>>> (XEN) [VT-D]iommu.c:1199: cap = c9008020660262 ecap = f0105a
>>>>>>>> (XEN) [VT-D]dmar.c:354:  IOAPIC: 0000:f0:1f.0
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.0
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.1
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.2
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.3
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.4
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.5
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.6
>>>>>>>> (XEN) [VT-D]dmar.c:332:  MSI HPET: 0000:00:0f.7
>>>>>>>> (XEN) [VT-D]dmar.c:426:   flags: INCLUDE_ALL
>>>>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1d.0
>>>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:1a.0
>>>>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr ba8d5000
>>>> end_address
>>>>>>>> ba8ebfff
>>>>>>>> (XEN) [VT-D]dmar.c:756: found ACPI_DMAR_RMRR:
>>>>>>>> (XEN) [VT-D]dmar.c:338:  endpoint: 0000:00:02.0
>>>>>>>> (XEN) [VT-D]dmar.c:625:   RMRR region: base_addr bb800000
>>>> end_address
>>>>>>>> bf9fffff
>>>>>>>>
>>>>>>>> I would expect a line with "found ACPI_DMAR_ATSR" to be printed, if it
>>>>>>>> was found.
>>>>>>>
>>>>>>> Right. So one less variable.
>>>>>>
>>>>>> Some more info.
>>>>>> Ross Philipson provided me with a handy utility to dump a bunch more
>>>>>> info about the DMAR tables, and with some more trace, this appears to
>>>>>> be tied to the IGD.
>>>>>>
>>>>>> Early in the boot process, I see queue_invalidate_wait() called for
>>>>>> DRHD unit 0, and 1
>>>>>> (unit 0 is wired up to the IGD, unit 1 is everything else)
>>>>>>
>>>>>> Up until i915 does the following, I see that unit being flushed with
>>>>>> queue_invalidate_wait() :
>>>>>>
>>>>>> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
>>>>>> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
>>>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>>>> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
>>>>>> bit banging on pin 5
>>>>>> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
>>>>>> [    3.111838] Console: switching to colour frame buffer device 170x48
>>>>>> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
>>>>>> [    3.171634] i915 0000:00:02.0: registered panic notifier
>>>>>> [    3.173339] acpi device:00: registered as cooling_device1
>>>>>> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
>>>>>> [    3.173962] input: Video Bus as
>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>>>>>> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on
>>>>> minor 0
>>>>>> [    3.174258] ahci 0000:00:1f.2: version 3.0
>>>>>> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
>>>>>> [    3.174274] Already setup the GSI :19
>>>>>>
>>>>>>
>>>>>> After that - the unit never seems to be flushed.
>>>>>>
>>>>>> ...until we enter into the S3 hypercall, which loops over all DRHD
>>>>>> units, and explicitly flushes all of them via iommu_flush_all()
>>>>>>
>>>>>> It is at that point that it hangs up when talking to the device that
>>>>>> the IGD is plumbed up to.
>>>>>>
>>>>>>
>>>>>> Does this point to something in the i915 driver doing something that
>>>>>> is incompatible with Xen?
>>>>>
>>>>> I actually separated it from the S3 hypercall, adding a new debug key
>>>>> 'F' - to just call iommu_flush_all()
>>>>> I can crash it on demand with this.
>>>>>
>>>>> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
>>>>> it does not occur.
>>>>> So, that pretty much narrows it down to the IGD, in my mind.
>>>>
>>>> Indeed, I agree. Yet I can't in any way comment on what or why.
>>>> Xiantao (perhaps some graphics person would good to be Cc-ed
>>>> here too)?
>>> Hi, Jan/Ben
>>> Thanks for your analysis! Could you try to enable  "snb_igd_quirk"  to have a
>> try ?  thanks!
>>> Xiantao
>>
>>
>> Thanks for your reply. I tried this param yesterday, but it did not
>> change the behavior.
> Okay, I recalled one bug in IGD i915 driver is found recently, and it may bring some errors  to VT-d,  and should be fixed in latest kernel.  Could you try latest kernel 3.9.4 or 3.10-rcx ?
> Xiantao

It may have been dropped from the top of this thread, but I sent out
the list of what I have tested with, and this was one of them.

Testing 3.10 did not change this behavior.

Did you have a particular changeset in mind?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-06 15:17                                     ` Ben Guthro
@ 2013-06-07  1:33                                       ` Zhang, Xiantao
  2013-06-07 15:52                                         ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Zhang, Xiantao @ 2013-06-07  1:33 UTC (permalink / raw)
  To: Ben Guthro
  Cc: Andrew Cooper, Zhang, Xiantao, Ben Guthro, Jan Beulich, xen-devel

> >>>>> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
> >>>>> it does not occur.
> >>>>> So, that pretty much narrows it down to the IGD, in my mind.
> >>>>
> >>>> Indeed, I agree. Yet I can't in any way comment on what or why.
> >>>> Xiantao (perhaps some graphics person would good to be Cc-ed
> >>>> here too)?
> >>> Hi, Jan/Ben
> >>> Thanks for your analysis! Could you try to enable  "snb_igd_quirk"  to have
> a
> >> try ?  thanks!
> >>> Xiantao
> >>
> >>
> >> Thanks for your reply. I tried this param yesterday, but it did not
> >> change the behavior.
> > Okay, I recalled one bug in IGD i915 driver is found recently, and it may bring
> some errors  to VT-d,  and should be fixed in latest kernel.  Could you try latest
> kernel 3.9.4 or 3.10-rcx ?
> > Xiantao
> 
> It may have been dropped off of the top of this thread, but i sent out
> what i have tested with, and this was one of them.
> 
> Testing 3.10 did not change this behavior.
> 
> Did you have a particular changeset in mind?
I will try to find it, and I am wondering whether your code base includes this quirk? See: http://www.gossamer-threads.com/lists/xen/devel/275623
Xiantao

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-07  1:33                                       ` Zhang, Xiantao
@ 2013-06-07 15:52                                         ` Ben Guthro
  0 siblings, 0 replies; 26+ messages in thread
From: Ben Guthro @ 2013-06-07 15:52 UTC (permalink / raw)
  To: Zhang, Xiantao; +Cc: Andrew Cooper, Jan Beulich, xen-devel

On Thu, Jun 6, 2013 at 9:33 PM, Zhang, Xiantao <xiantao.zhang@intel.com> wrote:
>> >>>>> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
>> >>>>> it does not occur.
>> >>>>> So, that pretty much narrows it down to the IGD, in my mind.
>> >>>>
>> >>>> Indeed, I agree. Yet I can't in any way comment on what or why.
>> >>>> Xiantao (perhaps some graphics person would good to be Cc-ed
>> >>>> here too)?
>> >>> Hi, Jan/Ben
>> >>> Thanks for your analysis! Could you try to enable  "snb_igd_quirk"  to have
>> a
>> >> try ?  thanks!
>> >>> Xiantao
>> >>
>> >>
>> >> Thanks for your reply. I tried this param yesterday, but it did not
>> >> change the behavior.
>> > Okay, I recalled one bug in IGD i915 driver is found recently, and it may bring
>> some errors  to VT-d,  and should be fixed in latest kernel.  Could you try latest
>> kernel 3.9.4 or 3.10-rcx ?
>> > Xiantao
>>
>> It may have been dropped off of the top of this thread, but i sent out
>> what i have tested with, and this was one of them.
>>
>> Testing 3.10 did not change this behavior.
>>
>> Did you have a particular changeset in mind?
> I will try to find it out, and I am wondering whether your code base includes this quirk ? See: http://www.gossamer-threads.com/lists/xen/devel/275623
> Xiantao

Yes, it does.

I've worked around the issue for now by disabling queue invalidation
for SNB systems.

Not ideal... but better than crashing.
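
The workaround is roughly along these lines (a sketch only: it assumes
Xen's global iommu_qinval flag, and the Sandybridge device-ID check is
illustrative, not the exact change):

    /* Force qinval off when a Sandybridge IGD sits at 0000:00:02.0;
     * called from VT-d setup, before qinval would be enabled. */
    static void __init snb_qinval_quirk(void)
    {
        u16 vendor = pci_conf_read16(0, 0, 2, 0, PCI_VENDOR_ID);
        u16 device = pci_conf_read16(0, 0, 2, 0, PCI_DEVICE_ID);

        /* 0x01xx covers the SNB GT1/GT2 IGD IDs (illustrative mask). */
        if ( vendor == 0x8086 && (device & 0xff00) == 0x0100 )
        {
            printk(XENLOG_WARNING
                   "SNB IGD found: disabling VT-d queue invalidation\n");
            iommu_qinval = 0;
        }
    }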

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-05 23:53                           ` Ben Guthro
  2013-06-06  6:58                             ` Jan Beulich
@ 2013-06-14  8:38                             ` Jan Beulich
  2013-06-14 17:01                               ` Ben Guthro
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2013-06-14  8:38 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, xiantao.zhang, xen-devel

>>> On 06.06.13 at 01:53, Ben Guthro <ben@guthro.net> wrote:
>> Early in the boot process, I see queue_invalidate_wait() called for
>> DRHD unit 0, and 1
>> (unit 0 is wired up to the IGD, unit 1 is everything else)
>>
>> Up until i915 does the following, I see that unit being flushed with
>> queue_invalidate_wait() :
>>
>> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
>> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
>> bit banging on pin 5
>> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
>> [    3.111838] Console: switching to colour frame buffer device 170x48
>> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
>> [    3.171634] i915 0000:00:02.0: registered panic notifier
>> [    3.173339] acpi device:00: registered as cooling_device1
>> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
>> [    3.173962] input: Video Bus as
>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>> [    3.174258] ahci 0000:00:1f.2: version 3.0
>> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
>> [    3.174274] Already setup the GSI :19
>>
>>
>> After that - the unit never seems to be flushed.

With queue_invalidate_wait() having a single caller,
invalidate_sync(), and with invalidate_sync() being called from
all interrupt setup (IO-APIC as well as MSI), it's quite odd for
that to be the case. At least upon network driver load or
interface-up, this should be getting called.

>> ...until we enter into the S3 hypercall, which loops over all DRHD
>> units, and explicitly flushes all of them via iommu_flush_all()
>>
>> It is at that point that it hangs up when talking to the device that
>> the IGD is plumbed up to.
>>
>>
>> Does this point to something in the i915 driver doing something that
>> is incompatible with Xen?
> 
> I actually separated it from the S3 hypercall, adding a new debug key
> 'F' - to just call iommu_flush_all()
> I can crash it on demand with this.
> 
> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
> it does not occur.
> So, that pretty much narrows it down to the IGD, in my mind.

Which reminds me of a change I did several weeks back to our kernel,
but which isn't as easily done with pv-ops: There are a number of
cases in the AGP and DRM code that qualify upon CONFIG_INTEL_IOMMU
and use intel_iommu_gfx_mapped. As you certainly know, Linux when
running on Xen doesn't see any IOMMU, and hence the config option
being enabled or disabled is completely unrelated to whether the
driver actually runs on top of an enabled IOMMU. Similarly the setting
of intel_iommu_gfx_mapped cannot possibly happen when running on
top of Xen, as it sits in code that never gets used in this case.

A possibly simple, but rather hacky solution might be to always set
that variable when running on Xen. But that wouldn't cover the case
of a kernel being built without CONFIG_INTEL_IOMMU, yet in that
case the driver might still run with an IOMMU enabled underneath.
(In our case I can simply always #define intel_iommu_gfx_mapped
to 1, with the INTEL_IOMMU option getting forcibly disabled for the
Xen kernel flavors anyway. Whether that's entirely correct when
not running on an enabled IOMMU I can't tell yet, and don't know
whom to ask.)
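
For a pv-ops kernel the hack might look something like this (an
untested sketch; the hook point and the xen_domain() guard are my
assumptions):

    #ifdef CONFIG_INTEL_IOMMU
    extern int intel_iommu_gfx_mapped;
    #endif

    /* Called from some early Xen setup path (assumed). */
    static void __init xen_assume_gfx_iommu(void)
    {
    #ifdef CONFIG_INTEL_IOMMU
        /* Dom0 can't see the IOMMU, yet Xen may drive the IGD
         * through VT-d underneath us. */
        if (xen_domain())
            intel_iommu_gfx_mapped = 1;
    #endif
    }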

And that wouldn't cover the IGD getting passed through to a DomU
at all - obviously Xen's ability to properly drive all IOMMU operations
(including qinval) must not depend on the owning guest's driver code.

I have to admit though that it entirely escapes me why a graphics
driver needs to peek into IOMMU code/state in the first place. This
very much smells of bad design.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-14  8:38                             ` Jan Beulich
@ 2013-06-14 17:01                               ` Ben Guthro
  2013-06-14 18:27                                 ` Ben Guthro
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-14 17:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xiantao.zhang, xen-devel

On Fri, Jun 14, 2013 at 4:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 06.06.13 at 01:53, Ben Guthro <ben@guthro.net> wrote:
>>> Early in the boot process, I see queue_invalidate_wait() called for
>>> DRHD unit 0, and 1
>>> (unit 0 is wired up to the IGD, unit 1 is everything else)
>>>
>>> Up until i915 does the following, I see that unit being flushed with
>>> queue_invalidate_wait() :
>>>
>>> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
>>> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
>>> bit banging on pin 5
>>> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
>>> [    3.111838] Console: switching to colour frame buffer device 170x48
>>> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
>>> [    3.171634] i915 0000:00:02.0: registered panic notifier
>>> [    3.173339] acpi device:00: registered as cooling_device1
>>> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
>>> [    3.173962] input: Video Bus as
>>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>>> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>> [    3.174258] ahci 0000:00:1f.2: version 3.0
>>> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
>>> [    3.174274] Already setup the GSI :19
>>>
>>>
>>> After that - the unit never seems to be flushed.
>
> With queue_invalidate_wait() having a single caller -
> invalidate_sync() -, and with invalidate_sync() being called from
> all interrupt setup (IO-APIC as well as MSI), that's quite odd to be
> the case. At least upon network driver load or interface-up, this
> should be getting called.
>
>>> ...until we enter into the S3 hypercall, which loops over all DRHD
>>> units, and explicitly flushes all of them via iommu_flush_all()
>>>
>>> It is at that point that it hangs up when talking to the device that
>>> the IGD is plumbed up to.
>>>
>>>
>>> Does this point to something in the i915 driver doing something that
>>> is incompatible with Xen?
>>
>> I actually separated it from the S3 hypercall, adding a new debug key
>> 'F' - to just call iommu_flush_all()
>> I can crash it on demand with this.
>>
>> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
>> it does not occur.
>> So, that pretty much narrows it down to the IGD, in my mind.
>
> Which reminds me of a change I did several weeks back to our kernel,
> but which isn't as easily done with pv-ops: There are a number of
> cases in the AGP and DRM code that qualify upon CONFIG_INTEL_IOMMU
> and use intel_iommu_gfx_mapped. As you certainly know, Linux when
> running on Xen doesn't see any IOMMU, and hence the config option
> being enabled or disabled is completely unrelated to whether the
> driver actually runs on top of an enabled IOMMU. Similarly the setting
> of intel_iommu_gfx_mapped cannot possibly happen when running on
> top of Xen, as it sits in code that never gets used in this case.
>
> A possibly simple, but rather hacky solution might be to always set
> that variable when running on Xen. But that wouldn't cover the case
> of a kernel being built without CONFIG_INTEL_IOMMU, yet in that
> case the driver might still run with an IOMMU enabled underneath.
> (In our case I can simply always #define intel_iommu_gfx_mapped
> to 1, with the INTEL_IOMMU option getting forcibly disabled for the
> Xen kernel flavors anyway. Whether that's entirely correct when
> not running on an enabled IOMMU I can't tell yet, and don't know
> whom to ask.)
>
> And that wouldn't cover the IGD getting passed through to a DomU
> at all - obviously Xen's ability to properly drive all IOMMU operations
> (including qinval) must not depend on the owning guest's driver code.
>
> I have to admit though that it entirely escapes me why a graphics
> driver needs to peek into IOMMU code/state in the first place. This
> very much smells of bad design.


This all makes sense, and I agree with your assessment.

Unfortunately, I went and got the machine back from our QA department
to do some tests on this, and now I am unable to reproduce the issue
to prove your analysis correct.
It was 100% reproducible a week ago, and now I can't make it happen
using the same code base & build.

It is all very strange, and smells of a race condition or an
uninitialized variable.
I blame alpha particles.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-14 17:01                               ` Ben Guthro
@ 2013-06-14 18:27                                 ` Ben Guthro
  2013-06-17  7:23                                   ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Guthro @ 2013-06-14 18:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xiantao.zhang, xen-devel

On Fri, Jun 14, 2013 at 1:01 PM, Ben Guthro <ben@guthro.net> wrote:
> On Fri, Jun 14, 2013 at 4:38 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 06.06.13 at 01:53, Ben Guthro <ben@guthro.net> wrote:
>>>> Early in the boot process, I see queue_invalidate_wait() called for
>>>> DRHD unit 0, and 1
>>>> (unit 0 is wired up to the IGD, unit 1 is everything else)
>>>>
>>>> Up until i915 does the following, I see that unit being flushed with
>>>> queue_invalidate_wait() :
>>>>
>>>> [    0.704537] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
>>>> [    0.704537] ENERGY_PERF_BIAS: View and update with x86_energy_p
>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>> (XEN) XXX queue_invalidate_wait:282 CPU0 DRHD0 ret=0
>>>> [    1.983028] [drm] GMBUS [i915 gmbus dpb] timed out, falling back to
>>>> bit banging on pin 5
>>>> [    2.253551] fbcon: inteldrmfb (fb0) is primary device
>>>> [    3.111838] Console: switching to colour frame buffer device 170x48
>>>> [    3.171631] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
>>>> [    3.171634] i915 0000:00:02.0: registered panic notifier
>>>> [    3.173339] acpi device:00: registered as cooling_device1
>>>> [    3.173401] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
>>>> [    3.173962] input: Video Bus as
>>>> /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input4
>>>> [    3.174232] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>>> [    3.174258] ahci 0000:00:1f.2: version 3.0
>>>> [    3.174270] xen: registering gsi 19 triggering 0 polarity 1
>>>> [    3.174274] Already setup the GSI :19
>>>>
>>>>
>>>> After that - the unit never seems to be flushed.
>>
>> With queue_invalidate_wait() having a single caller -
>> invalidate_sync() -, and with invalidate_sync() being called from
>> all interrupt setup (IO-APIC as well as MSI), that's quite odd to be
>> the case. At least upon network driver load or interface-up, this
>> should be getting called.
>>
>>>> ...until we enter into the S3 hypercall, which loops over all DRHD
>>>> units, and explicitly flushes all of them via iommu_flush_all()
>>>>
>>>> It is at that point that it hangs up when talking to the device that
>>>> the IGD is plumbed up to.
>>>>
>>>>
>>>> Does this point to something in the i915 driver doing something that
>>>> is incompatible with Xen?
>>>
>>> I actually separated it from the S3 hypercall, adding a new debug key
>>> 'F' - to just call iommu_flush_all()
>>> I can crash it on demand with this.
>>>
>>> Booting with "i915.modeset=0 single" (to prevent both KMS, and Xorg) -
>>> it does not occur.
>>> So, that pretty much narrows it down to the IGD, in my mind.
>>
>> Which reminds me of a change I did several weeks back to our kernel,
>> but which isn't as easily done with pv-ops: There are a number of
>> cases in the AGP and DRM code that qualify upon CONFIG_INTEL_IOMMU
>> and use intel_iommu_gfx_mapped. As you certainly know, Linux when
>> running on Xen doesn't see any IOMMU, and hence the config option
>> being enabled or disabled is completely unrelated to whether the
>> driver actually runs on top of an enabled IOMMU. Similarly the setting
>> of intel_iommu_gfx_mapped cannot possibly happen when running on
>> top of Xen, as it sits in code that never gets used in this case.
>>
>> A possibly simple, but rather hacky solution might be to always set
>> that variable when running on Xen. But that wouldn't cover the case
>> of a kernel being built without CONFIG_INTEL_IOMMU, yet in that
>> case the driver might still run with an IOMMU enabled underneath.
>> (In our case I can simply always #define intel_iommu_gfx_mapped
>> to 1, with the INTEL_IOMMU option getting forcibly disabled for the
>> Xen kernel flavors anyway. Whether that's entirely correct when
>> not running on an enabled IOMMU I can't tell yet, and don't know
>> whom to ask.)
>>
>> And that wouldn't cover the IGD getting passed through to a DomU
>> at all - obviously Xen's ability to properly drive all IOMMU operations
>> (including qinval) must not depend on the owning guest's driver code.
>>
>> I have to admit though that it entirely escapes me why a graphics
>> driver needs to peek into IOMMU code/state in the first place. This
>> very much smells of bad design.
>
>
> This all makes sense, and I agree with your assessment.
>
> Unfortunately, I went and got the machine back from our QA department,
> to do some tests on this, and now I am unable to reproduce the issue,
> to prove your analysis is correct.
> It was 100% reproducible a week ago, and now I can't make it happen,
> using the same code base & build.
>
> It is all very strange, and smells of a race condition, or
> uninitialized variable.
> I blame Alpha particles.

I did a little more bisecting of our builds, and it appears I was not
actually testing the version that I thought I was. Once I did
some bisection, I found the issue had been inadvertently fixed by
another change someone else committed to solve an unrelated problem.

The following changes fix it:

Revert:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c79c49826270b8b0061b2fca840fc3f013c8a78a

Apply:
https://lkml.org/lkml/2012/2/10/229

I don't have a good explanation as to why re-enabling PAT would change
the behavior of this IOMMU feature... but I have a very reproducible
test case showing that it, in fact, does.

Konrad, do you have any theories that would explain this one?
Or would we rather leave this one as "Here be Dragons" and look the other way?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: S3 crash with VTD Queue Invalidation enabled
  2013-06-14 18:27                                 ` Ben Guthro
@ 2013-06-17  7:23                                   ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2013-06-17  7:23 UTC (permalink / raw)
  To: Ben Guthro, xiantao.zhang; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel

>>> On 14.06.13 at 20:27, Ben Guthro <ben@guthro.net> wrote:
> I did a little more bisecting of our builds, and it appears I was not
> actually testing the version that I thought I was here, and once I did
> some bisection, I found it got inadvertently fixed by another change
> someone else committed to solve an unrelated problem.
> 
> The following changes
> 
> Revert:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c7 
> 9c49826270b8b0061b2fca840fc3f013c8a78a
> 
> Apply:
> https://lkml.org/lkml/2012/2/10/229 
> 
> I don't have a good explanation as to why re-enabling PAT would change
> the behavior of this IOMMU feature...but I have a very reproducible
> test case showing that it, in fact does.

Now, while this is good news in terms of knowing at least something
that addresses (or more likely works around) the issue, this still
leaves Xen at the mercy of the kernel running in the domain owning
the IGD. I.e. still a latent security issue. We really need to find a
solution that's independent of the guest kernel.

Xiantao, we certainly will need your (Intel's) help with this, and a
first step might be understanding how the above-mentioned kernel-side
changes can result in masking the observed problem.

Jan

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2013-06-17  7:23 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-03 18:29 S3 crash with VTD Queue Invalidation enabled Ben Guthro
2013-06-03 19:22 ` Andrew Cooper
2013-06-04  8:54   ` Jan Beulich
2013-06-04 12:25     ` Ben Guthro
2013-06-04 14:01       ` Jan Beulich
2013-06-04 19:20         ` Ben Guthro
2013-06-04 19:49           ` Ben Guthro
2013-06-04 21:09             ` Ben Guthro
2013-06-05  8:24               ` Jan Beulich
2013-06-05 13:54                 ` Ben Guthro
2013-06-05 15:14                   ` Jan Beulich
2013-06-05 15:25                     ` Ben Guthro
2013-06-05 15:38                       ` Jan Beulich
2013-06-05 20:27                         ` Ben Guthro
2013-06-05 23:53                           ` Ben Guthro
2013-06-06  6:58                             ` Jan Beulich
2013-06-06 15:06                               ` Zhang, Xiantao
2013-06-06 15:07                                 ` Ben Guthro
2013-06-06 15:13                                   ` Zhang, Xiantao
2013-06-06 15:17                                     ` Ben Guthro
2013-06-07  1:33                                       ` Zhang, Xiantao
2013-06-07 15:52                                         ` Ben Guthro
2013-06-14  8:38                             ` Jan Beulich
2013-06-14 17:01                               ` Ben Guthro
2013-06-14 18:27                                 ` Ben Guthro
2013-06-17  7:23                                   ` Jan Beulich
