* Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
@ 2016-07-18 10:21 linux
2016-07-18 17:48 ` Andrew Cooper
2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
0 siblings, 2 replies; 13+ messages in thread
From: linux @ 2016-07-18 10:21 UTC (permalink / raw)
To: Xen-devel; +Cc: Jan Beulich
Hi Jan,
It seems that since your patch series starting with commit:
2016-06-22 x86/vMSI-X: defer intercept handler registration
74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
the shutdown of a guest which has a PCI device passed through that uses
MSI-X interrupts causes a host crash; see the splat below. Somehow the
host also doesn't reboot in 5 seconds as it is supposed to (I don't have
no-reboot on the command line).
--
Sander
(XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable x86_64 debug=y
Not tainted ]----
(XEN) [2016-07-16 16:03:17.069] CPU: 0
(XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>]
msixtbl_pt_unregister+0x7b/0xd9
(XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082 CONTEXT:
hypervisor (d0v0)
(XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40 rbx:
ffff83055c685500 rcx: 0000000000000001
(XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000 rsi:
0000000000001ab0 rdi: ffff8305313b85a0
(XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78 rsp:
ffff83009fd07c68 r8: ffff8305356dfff0
(XEN) [2016-07-16 16:03:17.069] r9: ffff8305356df480 r10:
ffff830503420c50 r11: 0000000000000282
(XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000 r13:
ffff83009fd07e48 r14: ffff8305313b8000
(XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8 cr0:
0000000080050033 cr4: 00000000000006e0
(XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000 cr2:
0000000000000000
(XEN) [2016-07-16 16:03:17.069] ds: 0000 es: 0000 fs: 0000 gs:
0000 ss: e010 cs: e008
(XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de>
(msixtbl_pt_unregister+0x7b/0xd9):
(XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b 0a 0f
18 09 48 39 fa 75 ec 48 8d 7b 24 e8
(XEN) [2016-07-16 16:03:17.069] Xen stack trace from
rsp=ffff83009fd07c68:
(XEN) [2016-07-16 16:03:17.069] 0000000000000000 ffff8305356df480
ffff83009fd07ce8 ffff82d08014c394
(XEN) [2016-07-16 16:03:17.069] 0000000000000001 ffff8305356df480
0000000000000293 ffff8305313b80cc
(XEN) [2016-07-16 16:03:17.069] 000000568012ffe5 ffff8305313b8000
ffff83009fd07cd8 ffff83009fd07e38
(XEN) [2016-07-16 16:03:17.070] 0000000000000000 ffff83054e5fc000
00007fc25a33e004 ffff8305313b8000
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07da8 ffff82d0801629c8
0000000000000000 ffff83053b1191f0
(XEN) [2016-07-16 16:03:17.070] 0000000000000246 ffff83009fd07d28
ffff82d0801300ae 000000000000000e
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d080171497
ffff83009fd07d78 000000020001d17b
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07d68 0000000000000000
ffff83009fd07d68 ffff82d080130280
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d08014d0aa
0000000000000202 0000000000000000
(XEN) [2016-07-16 16:03:17.070] ffff8305313b8000 ffff88005716d320
0000000000305000 00007fc25a33e004
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07ef8 ffff82d080104b2c
0000000000000206 0000000000000002
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07df8 ffff82d08018c9db
0000000000000cfe 0000000000000002
(XEN) [2016-07-16 16:03:17.070] 0000000000000002 ffff83054e5fc000
ffff83009fd07e48 ffff82d08019c119
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07e38 0000000080121177
ffff83009fd07e38 0000000000000cfe
(XEN) [2016-07-16 16:03:17.070] ffff83009fd07f18 0000000000000206
0000000c00000030 000056082bb90013
(XEN) [2016-07-16 16:03:17.070] 0000000200000056 00007fc200000013
0000305600000000 000056082b87465d
(XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 00007fc25606b31f
0000000000000000 000056082b8746cf
(XEN) [2016-07-16 16:03:17.070] 0000000000001000 fee5600026820730
00007ffe26820740 000056082b8797be
(XEN) [2016-07-16 16:03:17.070] 00000000fee56000 0000430026820772
00007ffe26820740 0000000000003056
(XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 ffff83009ff8a000
00007ffe26820580 ffff88005716d320
(XEN) [2016-07-16 16:03:17.070] Xen call trace:
(XEN) [2016-07-16 16:03:17.070] [<ffff82d0801e39de>]
msixtbl_pt_unregister+0x7b/0xd9
(XEN) [2016-07-16 16:03:17.070] [<ffff82d08014c394>]
pt_irq_destroy_bind+0x2be/0x3f0
(XEN) [2016-07-16 16:03:17.070] [<ffff82d0801629c8>]
arch_do_domctl+0xc77/0x2414
(XEN) [2016-07-16 16:03:17.070] [<ffff82d080104b2c>]
do_domctl+0x19db/0x1d26
(XEN) [2016-07-16 16:03:17.070] [<ffff82d0802426bd>]
lstar_enter+0xdd/0x137
(XEN) [2016-07-16 16:03:17.070]
(XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
(XEN) [2016-07-16 16:03:17.070] L4[0x000] = 0000000000000000
ffffffffffffffff
(XEN) [2016-07-16 16:03:18.147]
(XEN) [2016-07-16 16:03:18.155] ****************************************
(XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
(XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
(XEN) [2016-07-16 16:03:18.200] [error_code=0000]
(XEN) [2016-07-16 16:03:18.214] Faulting linear address:
0000000000000000
(XEN) [2016-07-16 16:03:18.233] ****************************************
(XEN) [2016-07-16 16:03:18.252]
(XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
2016-07-18 10:21 Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts linux
@ 2016-07-18 17:48 ` Andrew Cooper
2016-07-18 19:26 ` Sander Eikelenboom
2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
1 sibling, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-07-18 17:48 UTC (permalink / raw)
To: linux, Xen-devel; +Cc: Jan Beulich
On 18/07/16 11:21, linux@eikelenboom.it wrote:
> Hi Jan,
>
> It seems that since your patch series starting with commit:
> 2016-06-22 x86/vMSI-X: defer intercept handler registration
> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>
> The shutdown of a guest which has a PCI device passed through which
> uses MSI-X interrupts causes
> a host crash, see the splat below. Somehow it also doesn't reboot in 5
> seconds as it is supposed to (i don't have no-reboot on the command
> line).
>
> --
> Sander
>
>
>
Can you paste the disassembly of msixtbl_pt_unregister() please? That
is a dereference of %rdx which is NULL at this point, but I need to
figure out which pointer it is supposed to be.
Thanks,
~Andrew
* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
2016-07-18 17:48 ` Andrew Cooper
@ 2016-07-18 19:26 ` Sander Eikelenboom
2016-07-18 20:57 ` Andrew Cooper
0 siblings, 1 reply; 13+ messages in thread
From: Sander Eikelenboom @ 2016-07-18 19:26 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel
Monday, July 18, 2016, 7:48:20 PM, you wrote:
> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>> Hi Jan,
>>
>> It seems that since your patch series starting with commit:
>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>
>> The shutdown of a guest which has a PCI device passed through which
>> uses MSI-X interrupts causes
>> a host crash, see the splat below. Somehow it also doesn't reboot in 5
>> seconds as it is supposed to (i don't have no-reboot on the command
>> line).
>>
>> --
>> Sander
>>
>>
>>
> Can you paste the disassembly of msixtbl_pt_unregister() please? That
> is a dereference of %rdx which is NULL at this point, but I need to
> figure out which pointer it is supposed to be.
Hi Andrew,
# addr2line -e xen-syms ffff82d0801e3e7e
/usr/src/new/xen-unstable/xen/arch/x86/hvm/vmsi.c:535 (discriminator 1)
So the RIP points to:
void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
{
    struct irq_desc *irq_desc;
    struct msi_desc *msi_desc;
    struct pci_dev *pdev;
    struct msixtbl_entry *entry;

    ASSERT(pcidevs_locked());
    ASSERT(spin_is_locked(&d->event_lock));

    if ( !has_vlapic(d) )
        return;

    irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
    if ( !irq_desc )
        return;

    msi_desc = irq_desc->msi_desc;
    if ( !msi_desc )
        goto out;

    pdev = msi_desc->dev;

    list_for_each_entry( entry, &d->arch.hvm_domain.msixtbl_list, list )   <--- HERE
        if ( pdev == entry->pdev )
            goto found;

 out:
    spin_unlock_irq(&irq_desc->lock);
    return;

 found:
    if ( !atomic_dec_and_test(&entry->refcnt) )
        del_msixtbl_entry(entry);

    spin_unlock_irq(&irq_desc->lock);
}
Disassembly:
(gdb) info line msixtbl_pt_unregister
Line 513 of "vmsi.c" starts at address 0xffff82d0801e3e03 <msixtbl_pt_unregister> and ends at 0xffff82d0801e3e10 <msixtbl_pt_unregister+13>.
(gdb) disas 0xffff82d0801e3e03
Dump of assembler code for function msixtbl_pt_unregister:
0xffff82d0801e3e03 <+0>: push %rbp
0xffff82d0801e3e04 <+1>: mov %rsp,%rbp
0xffff82d0801e3e07 <+4>: push %r12
0xffff82d0801e3e09 <+6>: push %rbx
0xffff82d0801e3e0a <+7>: mov %rdi,%r12
0xffff82d0801e3e0d <+10>: mov %rsi,%rbx
0xffff82d0801e3e10 <+13>: callq 0xffff82d08014d585 <pcidevs_locked>
0xffff82d0801e3e15 <+18>: test %al,%al
0xffff82d0801e3e17 <+20>: jne 0xffff82d0801e3e1b <msixtbl_pt_unregister+24>
0xffff82d0801e3e19 <+22>: ud2
0xffff82d0801e3e1b <+24>: lea 0xcc(%r12),%rdi
0xffff82d0801e3e23 <+32>: callq 0xffff82d080130544 <_spin_is_locked>
0xffff82d0801e3e28 <+37>: test %eax,%eax
0xffff82d0801e3e2a <+39>: jne 0xffff82d0801e3e2e <msixtbl_pt_unregister+43>
0xffff82d0801e3e2c <+41>: ud2
0xffff82d0801e3e2e <+43>: testb $0x1,0x9dc(%r12)
0xffff82d0801e3e37 <+52>: je 0xffff82d0801e3ed7 <msixtbl_pt_unregister+212>
0xffff82d0801e3e3d <+58>: mov $0x0,%esi
0xffff82d0801e3e42 <+63>: mov %rbx,%rdi
0xffff82d0801e3e45 <+66>: callq 0xffff82d0801743a4 <pirq_spin_lock_irq_desc>
0xffff82d0801e3e4a <+71>: mov %rax,%rbx
0xffff82d0801e3e4d <+74>: test %rax,%rax
0xffff82d0801e3e50 <+77>: je 0xffff82d0801e3ed7 <msixtbl_pt_unregister+212>
0xffff82d0801e3e56 <+83>: mov 0x10(%rax),%rax
0xffff82d0801e3e5a <+87>: test %rax,%rax
0xffff82d0801e3e5d <+90>: je 0xffff82d0801e3e89 <msixtbl_pt_unregister+134>
0xffff82d0801e3e5f <+92>: mov 0x20(%rax),%rax
0xffff82d0801e3e63 <+96>: mov 0x5a0(%r12),%rdx
0xffff82d0801e3e6b <+104>: lea 0x5a0(%r12),%rdi
0xffff82d0801e3e73 <+112>: jmp 0xffff82d0801e3e7e <msixtbl_pt_unregister+123>
0xffff82d0801e3e75 <+114>: cmp %rax,0x18(%rdx)
0xffff82d0801e3e79 <+118>: je 0xffff82d0801e3e94 <msixtbl_pt_unregister+145>
0xffff82d0801e3e7b <+120>: mov %rcx,%rdx
0xffff82d0801e3e7e <+123>: mov (%rdx),%rcx
0xffff82d0801e3e81 <+126>: prefetcht0 (%rcx)
0xffff82d0801e3e84 <+129>: cmp %rdi,%rdx
0xffff82d0801e3e87 <+132>: jne 0xffff82d0801e3e75 <msixtbl_pt_unregister+114>
0xffff82d0801e3e89 <+134>: lea 0x24(%rbx),%rdi
0xffff82d0801e3e8d <+138>: callq 0xffff82d080130514 <_spin_unlock_irq>
0xffff82d0801e3e92 <+143>: jmp 0xffff82d0801e3ed7 <msixtbl_pt_unregister+212>
0xffff82d0801e3e94 <+145>: lock decl 0x10(%rdx)
0xffff82d0801e3e98 <+149>: sete %al
0xffff82d0801e3e9b <+152>: test %al,%al
0xffff82d0801e3e9d <+154>: jne 0xffff82d0801e3ece <msixtbl_pt_unregister+203>
0xffff82d0801e3e9f <+156>: mov (%rdx),%rcx
0xffff82d0801e3ea2 <+159>: mov 0x8(%rdx),%rax
0xffff82d0801e3ea6 <+163>: mov %rax,0x8(%rcx)
0xffff82d0801e3eaa <+167>: mov %rcx,(%rax)
0xffff82d0801e3ead <+170>: movabs $0x200200200200200,%rax
0xffff82d0801e3eb7 <+180>: mov %rax,0x8(%rdx)
0xffff82d0801e3ebb <+184>: lea 0x158(%rdx),%rdi
0xffff82d0801e3ec2 <+191>: lea -0xac1(%rip),%rsi # 0xffff82d0801e3408 <free_msixtbl_entry>
0xffff82d0801e3ec9 <+198>: callq 0xffff82d080122be0 <call_rcu>
0xffff82d0801e3ece <+203>: lea 0x24(%rbx),%rdi
0xffff82d0801e3ed2 <+207>: callq 0xffff82d080130514 <_spin_unlock_irq>
0xffff82d0801e3ed7 <+212>: pop %rbx
0xffff82d0801e3ed8 <+213>: pop %r12
0xffff82d0801e3eda <+215>: pop %rbp
0xffff82d0801e3edb <+216>: retq
End of assembler dump.
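Note that the address given to addr2line above is not the RIP printed in the crash log. It appears to be the same offset, msixtbl_pt_unregister+0x7b, resolved against the symbol address reported by gdb for the xen-syms at hand (presumably a different build from the one that crashed; that is an assumption, not something stated in the thread). A minimal sketch of the arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Resolve "symbol+offset" from a crash log against a symbol's start address. */
static uint64_t resolve(uint64_t symbol_start, uint64_t offset)
{
    return symbol_start + offset;
}

/* Recover the symbol start implied by a logged RIP and its offset. */
static uint64_t symbol_start_of(uint64_t rip, uint64_t offset)
{
    return rip - offset;
}

/* Self-check using the addresses quoted in this thread. */
static void check_addresses(void)
{
    /* gdb: msixtbl_pt_unregister starts at ...3e03, so +0x7b is ...3e7e. */
    assert(resolve(0xffff82d0801e3e03ULL, 0x7b) == 0xffff82d0801e3e7eULL);
    /* Crash log: RIP ...39de at +0x7b implies a start of ...3963. */
    assert(symbol_start_of(0xffff82d0801e39deULL, 0x7b) == 0xffff82d0801e3963ULL);
}
```

In both cases the instruction at +0x7b is the `mov (%rdx),%rcx` shown at <+123>, which matches the faulting bytes `48 8b 0a` in the splat.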
--
Sander
> Thanks,
> ~Andrew
* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
2016-07-18 19:26 ` Sander Eikelenboom
@ 2016-07-18 20:57 ` Andrew Cooper
2016-07-18 22:03 ` linux
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-07-18 20:57 UTC (permalink / raw)
To: Sander Eikelenboom; +Cc: Jan Beulich, Xen-devel
On 18/07/2016 20:26, Sander Eikelenboom wrote:
> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>
>> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>>> Hi Jan,
>>>
>>> It seems that since your patch series starting with commit:
>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>
>>> The shutdown of a guest which has a PCI device passed through which
>>> uses MSI-X interrupts causes
>>> a host crash, see the splat below. Somehow it also doesn't reboot in 5
>>> seconds as it is supposed to (i don't have no-reboot on the command
>>> line).
>>>
>>> --
>>> Sander
>>>
>>>
>>>
>> Can you paste the disassembly of msixtbl_pt_unregister() please? That
>> is a dereference of %rdx which is NULL at this point, but I need to
>> figure out which pointer it is supposed to be.
> Hi Andrew,
<snip>
Thanks. What has happened is that the msixtbl linked list is still
uninitialised at this point. The only way I can see for this to happen
is that msixtbl_init() hasn't been called, or hasn't passed its first if
condition. The INIT_LIST_HEAD() visible in the context of the 2nd hunk
of the identified changeset is the line of code which changes the list
from zeroed to initialised, and I don't see anywhere which re-zeros it later.
This alone suggests that the VM in question isn't actually using MSI-X
interrupts, even if the device passed through is capable.
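
To illustrate the failure mode described above, here is a simplified stand-alone sketch. It is not Xen's actual list.h; struct list_head is cut down, and init_list_head(), list_is_initialised() and count_entries() are illustrative names. A zero-filled list head has next == NULL, so a list_for_each_entry()-style walk starts at NULL and faults as soon as it reads through that pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a Linux/Xen-style circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialised, empty list points back at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Hypothetical helper: a zero-filled head was never initialised. */
static int list_is_initialised(const struct list_head *head)
{
    return head->next != NULL;
}

/*
 * Count the entries, refusing to walk an uninitialised head.  Without the
 * guard, "pos = head->next" yields NULL and the later read of "pos->next"
 * faults, which is the pattern seen in the crash.
 */
static int count_entries(const struct list_head *head)
{
    const struct list_head *pos;
    int n = 0;

    if ( !list_is_initialised(head) )
        return -1;

    for ( pos = head->next; pos != head; pos = pos->next )
        n++;

    return n;
}

/* Small self-check exercising both states of the head. */
static void demo(void)
{
    struct list_head head = { NULL, NULL };  /* as left by a zeroed allocation */

    assert(count_entries(&head) == -1);      /* walk refused, no fault */
    init_list_head(&head);
    assert(count_entries(&head) == 0);       /* empty but valid list */
}
```
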
Following the style of the identified changeset,
andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index e418b98..c533719 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     ASSERT(pcidevs_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
-    if ( !has_vlapic(d) )
+    if ( !d->arch.hvm_domain.msixtbl_list.next )
         return;
 
     irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
should resolve your issue, although I am very tempted to replace the
open-coded list check with a msixtbl_initialised() predicate instead.
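
One possible shape for such a predicate, as a stand-alone sketch only: the thread names msixtbl_initialised() but gives neither a signature nor a body, so both are assumptions here, and the structures below are cut-down stand-ins carrying just the one field path used in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Cut-down stand-ins for the real Xen structures (assumption). */
struct hvm_domain  { struct list_head msixtbl_list; };
struct arch_domain { struct hvm_domain hvm_domain; };
struct domain      { struct arch_domain arch; };

/*
 * Hypothetical predicate: the list head starts out zeroed with the rest of
 * the domain structure and only becomes non-NULL once it has been
 * initialised, so a NULL "next" means the setup never ran.
 */
static bool msixtbl_initialised(const struct domain *d)
{
    return d->arch.hvm_domain.msixtbl_list.next != NULL;
}

/* Self-check: zeroed domain first, then an initialised empty list. */
static void demo_predicate(void)
{
    struct domain d = { 0 };
    struct list_head *head = &d.arch.hvm_domain.msixtbl_list;

    assert(!msixtbl_initialised(&d));

    head->next = head;   /* what an INIT_LIST_HEAD()-style setup would do */
    head->prev = head;
    assert(msixtbl_initialised(&d));
}
```

With a helper like this, the early exit in msixtbl_pt_unregister() would read `if ( !msixtbl_initialised(d) ) return;`, keeping the knowledge of how "uninitialised" is encoded in one place.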
~Andrew
* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
2016-07-18 20:57 ` Andrew Cooper
@ 2016-07-18 22:03 ` linux
2016-07-18 22:07 ` Andrew Cooper
0 siblings, 1 reply; 13+ messages in thread
From: linux @ 2016-07-18 22:03 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Andrew Cooper, Jan Beulich, Xen-devel
On 2016-07-18 22:57, Andrew Cooper wrote:
> On 18/07/2016 20:26, Sander Eikelenboom wrote:
>> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>>
>>> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>>>> Hi Jan,
>>>>
>>>> It seems that since your patch series starting with commit:
>>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>>
>>>> The shutdown of a guest which has a PCI device passed through which
>>>> uses MSI-X interrupts causes
>>>> a host crash, see the splat below. Somehow it also doesn't reboot in
>>>> 5
>>>> seconds as it is supposed to (i don't have no-reboot on the command
>>>> line).
>>>>
>>>> --
>>>> Sander
>>>>
>>>>
>>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable x86_64
>>>> debug=y Not tainted ]----
>>>> (XEN) [2016-07-16 16:03:17.069] CPU: 0
>>>> (XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>]
>>>> msixtbl_pt_unregister+0x7b/0xd9
>>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082 CONTEXT:
>>>> hypervisor (d0v0)
>>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40 rbx:
>>>> ffff83055c685500 rcx: 0000000000000001
>>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000 rsi:
>>>> 0000000000001ab0 rdi: ffff8305313b85a0
>>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78 rsp:
>>>> ffff83009fd07c68 r8: ffff8305356dfff0
>>>> (XEN) [2016-07-16 16:03:17.069] r9: ffff8305356df480 r10:
>>>> ffff830503420c50 r11: 0000000000000282
>>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000 r13:
>>>> ffff83009fd07e48 r14: ffff8305313b8000
>>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8 cr0:
>>>> 0000000080050033 cr4: 00000000000006e0
>>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000 cr2:
>>>> 0000000000000000
>>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000 es: 0000 fs: 0000 gs:
>>>> 0000 ss: e010 cs: e008
>>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de>
>>>> (msixtbl_pt_unregister+0x7b/0xd9):
>>>> (XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b 0a
>>>> 0f
>>>> 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from
>>>> rsp=ffff83009fd07c68:
>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000000 ffff8305356df480
>>>> ffff83009fd07ce8 ffff82d08014c394
>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000001 ffff8305356df480
>>>> 0000000000000293 ffff8305313b80cc
>>>> (XEN) [2016-07-16 16:03:17.069] 000000568012ffe5 ffff8305313b8000
>>>> ffff83009fd07cd8 ffff83009fd07e38
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000000 ffff83054e5fc000
>>>> 00007fc25a33e004 ffff8305313b8000
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07da8 ffff82d0801629c8
>>>> 0000000000000000 ffff83053b1191f0
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000246 ffff83009fd07d28
>>>> ffff82d0801300ae 000000000000000e
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d080171497
>>>> ffff83009fd07d78 000000020001d17b
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d68 0000000000000000
>>>> ffff83009fd07d68 ffff82d080130280
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d08014d0aa
>>>> 0000000000000202 0000000000000000
>>>> (XEN) [2016-07-16 16:03:17.070] ffff8305313b8000 ffff88005716d320
>>>> 0000000000305000 00007fc25a33e004
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07ef8 ffff82d080104b2c
>>>> 0000000000000206 0000000000000002
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07df8 ffff82d08018c9db
>>>> 0000000000000cfe 0000000000000002
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000002 ffff83054e5fc000
>>>> ffff83009fd07e48 ffff82d08019c119
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07e38 0000000080121177
>>>> ffff83009fd07e38 0000000000000cfe
>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07f18 0000000000000206
>>>> 0000000c00000030 000056082bb90013
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000200000056 00007fc200000013
>>>> 0000305600000000 000056082b87465d
>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 00007fc25606b31f
>>>> 0000000000000000 000056082b8746cf
>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000001000 fee5600026820730
>>>> 00007ffe26820740 000056082b8797be
>>>> (XEN) [2016-07-16 16:03:17.070] 00000000fee56000 0000430026820772
>>>> 00007ffe26820740 0000000000003056
>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 ffff83009ff8a000
>>>> 00007ffe26820580 ffff88005716d320
>>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801e39de>]
>>>> msixtbl_pt_unregister+0x7b/0xd9
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d08014c394>]
>>>> pt_irq_destroy_bind+0x2be/0x3f0
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801629c8>]
>>>> arch_do_domctl+0xc77/0x2414
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d080104b2c>]
>>>> do_domctl+0x19db/0x1d26
>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0802426bd>]
>>>> lstar_enter+0xdd/0x137
>>>> (XEN) [2016-07-16 16:03:17.070]
>>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from
>>>> 0000000000000000:
>>>> (XEN) [2016-07-16 16:03:17.070] L4[0x000] = 0000000000000000
>>>> ffffffffffffffff
>>>> (XEN) [2016-07-16 16:03:18.147]
>>>> (XEN) [2016-07-16 16:03:18.155]
>>>> ****************************************
>>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address:
>>>> 0000000000000000
>>>> (XEN) [2016-07-16 16:03:18.233]
>>>> ****************************************
>>>> (XEN) [2016-07-16 16:03:18.252]
>>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>>
>>> Can you paste the disassembly of msixtbl_pt_unregister() please? That
>>> is a dereference of %rdx which is NULL at this point, but I need to
>>> figure out which pointer it is supposed to be.
>> Hi Andrew,
>
> <snip>
>
> Thanks. What has happened is that the msixtbl linked list is still
> uninitialised at this point. The only way I can see for this to happen
> is that msixtbl_init() hasn't been called, or hasn't passed its first if
> condition. The INIT_LIST_HEAD() visible in the context of the 2nd hunk
> of identified changeset is the line of code which changes the list from
> 0 to initialised, and I don't see anywhere which re-zeros it later.
>
> This alone suggests that the VM in question isn't actually using MSI-X
> interrupts, even if the device passed through is capable.
Hmm didn't actually check this before, but you seem to be right
(below is the lspci output from within the guest).
> Following the style of the identified changeset,
>
> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index e418b98..c533719 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct
> pirq *pirq)
> ASSERT(pcidevs_locked());
> ASSERT(spin_is_locked(&d->event_lock));
>
> - if ( !has_vlapic(d) )
> + if ( !d->arch.hvm_domain.msixtbl_list.next )
> return;
>
> irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
>
> should resolve your issue, although I am very tempted to replace the
> opencoded list logic with a msixtbl_initialised() predicate instead.
>
> ~Andrew
It does resolve the issue, thanks !
--
Sander
00:05.0 VGA compatible controller: Advanced Micro Devices, Inc.
[AMD/ATI] Turks PRO [Radeon HD 6570/7570/8550] (prog-if 00 [VGA
controller])
Subsystem: PC Partner Limited / Sapphire Technology Turks PRO [Radeon
HD 6570/7570/8550]
Physical Slot: 5
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 68
Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at f3060000 (64-bit, non-prefetchable) [size=128K]
Region 4: I/O ports at c100 [size=256]
Expansion ROM at f3080000 [disabled] [size=128K]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1
unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s
<64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF
Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF
Disabled
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-,
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee57000 Data: 4300
Kernel driver in use: radeon
00:06.0 Audio device: Advanced Micro Devices, Inc. [AMD/ATI]
Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
Subsystem: PC Partner Limited / Sapphire Technology Turks/Whistler HDMI
Audio [Radeon HD 6000 Series]
Physical Slot: 6
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 79
Region 0: Memory at f30b0000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1
unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s
<64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive-
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF
Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF
Disabled
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-,
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee56000 Data: 4300
Kernel driver in use: snd_hda_intel
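
Both listings above show an "MSI:" capability at [a0] and no "MSI-X:"
capability. A minimal sketch of checking that mechanically, assuming
"lspci -vv" style text as input; the helper name lspci_has_cap() is
hypothetical and not part of any existing tool:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if the lspci text lists the named capability, e.g. "MSI" or
 * "MSI-X".  The search string has the form "] NAME:", as seen on the
 * "Capabilities: [a0] MSI: ..." lines, so the "MSI 00" field of the Express
 * capability is not mistaken for a match.
 */
static bool lspci_has_cap(const char *lspci_text, const char *name)
{
    char needle[32];
    int n;

    assert(lspci_text != NULL && name != NULL);

    n = snprintf(needle, sizeof(needle), "] %s:", name);
    if ( n < 0 || (size_t)n >= sizeof(needle) )
        return false;   /* capability name too long for the buffer */

    return strstr(lspci_text, needle) != NULL;
}
```

Fed either listing above, lspci_has_cap(text, "MSI") would be expected to
match and lspci_has_cap(text, "MSI-X") would not, which is consistent with
the guest never having used MSI-X.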
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
2016-07-18 22:03 ` linux
@ 2016-07-18 22:07 ` Andrew Cooper
0 siblings, 0 replies; 13+ messages in thread
From: Andrew Cooper @ 2016-07-18 22:07 UTC (permalink / raw)
To: linux; +Cc: Xen-devel, Jan Beulich, Andrew Cooper
On 18/07/2016 23:03, linux@eikelenboom.it wrote:
> On 2016-07-18 22:57, Andrew Cooper wrote:
>> On 18/07/2016 20:26, Sander Eikelenboom wrote:
>>> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>>>
>>>> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>>>>> Hi Jan,
>>>>>
>>>>> It seems that since your patch series starting with commit:
>>>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>>>
>>>>> The shutdown of a guest which has a PCI device passed through which
>>>>> uses MSI-X interrupts causes a host crash, see the splat below.
>>>>> Somehow it also doesn't reboot in 5 seconds as it is supposed to
>>>>> (i don't have no-reboot on the command line).
>>>>>
>>>>> --
>>>>> Sander
>>>>>
>>>>>
>>>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable x86_64
>>>>> debug=y Not tainted ]----
>>>>> (XEN) [2016-07-16 16:03:17.069] CPU: 0
>>>>> (XEN) [2016-07-16 16:03:17.069] RIP: e008:[<ffff82d0801e39de>]
>>>>> msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082 CONTEXT:
>>>>> hypervisor (d0v0)
>>>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40 rbx:
>>>>> ffff83055c685500 rcx: 0000000000000001
>>>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000 rsi:
>>>>> 0000000000001ab0 rdi: ffff8305313b85a0
>>>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78 rsp:
>>>>> ffff83009fd07c68 r8: ffff8305356dfff0
>>>>> (XEN) [2016-07-16 16:03:17.069] r9: ffff8305356df480 r10:
>>>>> ffff830503420c50 r11: 0000000000000282
>>>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000 r13:
>>>>> ffff83009fd07e48 r14: ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8 cr0:
>>>>> 0000000080050033 cr4: 00000000000006e0
>>>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000 cr2:
>>>>> 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000 es: 0000 fs: 0000 gs:
>>>>> 0000 ss: e010 cs: e008
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de>
>>>>> (msixtbl_pt_unregister+0x7b/0xd9):
>>>>> (XEN) [2016-07-16 16:03:17.069] 39 42 18 74 19 48 89 ca <48> 8b
>>>>> 0a 0f
>>>>> 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from
>>>>> rsp=ffff83009fd07c68:
>>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000000 ffff8305356df480
>>>>> ffff83009fd07ce8 ffff82d08014c394
>>>>> (XEN) [2016-07-16 16:03:17.069] 0000000000000001 ffff8305356df480
>>>>> 0000000000000293 ffff8305313b80cc
>>>>> (XEN) [2016-07-16 16:03:17.069] 000000568012ffe5 ffff8305313b8000
>>>>> ffff83009fd07cd8 ffff83009fd07e38
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000000 ffff83054e5fc000
>>>>> 00007fc25a33e004 ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07da8 ffff82d0801629c8
>>>>> 0000000000000000 ffff83053b1191f0
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000246 ffff83009fd07d28
>>>>> ffff82d0801300ae 000000000000000e
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d080171497
>>>>> ffff83009fd07d78 000000020001d17b
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d68 0000000000000000
>>>>> ffff83009fd07d68 ffff82d080130280
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07d78 ffff82d08014d0aa
>>>>> 0000000000000202 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff8305313b8000 ffff88005716d320
>>>>> 0000000000305000 00007fc25a33e004
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07ef8 ffff82d080104b2c
>>>>> 0000000000000206 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07df8 ffff82d08018c9db
>>>>> 0000000000000cfe 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000000002 ffff83054e5fc000
>>>>> ffff83009fd07e48 ffff82d08019c119
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07e38 0000000080121177
>>>>> ffff83009fd07e38 0000000000000cfe
>>>>> (XEN) [2016-07-16 16:03:17.070] ffff83009fd07f18 0000000000000206
>>>>> 0000000c00000030 000056082bb90013
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000200000056 00007fc200000013
>>>>> 0000305600000000 000056082b87465d
>>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 00007fc25606b31f
>>>>> 0000000000000000 000056082b8746cf
>>>>> (XEN) [2016-07-16 16:03:17.070] 0000000000001000 fee5600026820730
>>>>> 00007ffe26820740 000056082b8797be
>>>>> (XEN) [2016-07-16 16:03:17.070] 00000000fee56000 0000430026820772
>>>>> 00007ffe26820740 0000000000003056
>>>>> (XEN) [2016-07-16 16:03:17.070] 00007ffe268206e0 ffff83009ff8a000
>>>>> 00007ffe26820580 ffff88005716d320
>>>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801e39de>]
>>>>> msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d08014c394>]
>>>>> pt_irq_destroy_bind+0x2be/0x3f0
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0801629c8>]
>>>>> arch_do_domctl+0xc77/0x2414
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d080104b2c>]
>>>>> do_domctl+0x19db/0x1d26
>>>>> (XEN) [2016-07-16 16:03:17.070] [<ffff82d0802426bd>]
>>>>> lstar_enter+0xdd/0x137
>>>>> (XEN) [2016-07-16 16:03:17.070]
>>>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
>>>>> (XEN) [2016-07-16 16:03:17.070] L4[0x000] = 0000000000000000
>>>>> ffffffffffffffff
>>>>> (XEN) [2016-07-16 16:03:18.147]
>>>>> (XEN) [2016-07-16 16:03:18.155]
>>>>> ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address:
>>>>> 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:18.233]
>>>>> ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.252]
>>>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>>>
>>>> Can you paste the disassembly of msixtbl_pt_unregister() please? That
>>>> is a dereference of %rdx which is NULL at this point, but I need to
>>>> figure out which pointer it is supposed to be.
>>> Hi Andrew,
>>
>> <snip>
>>
>> Thanks. What has happened is that the msixtbl linked list is still
>> uninitialised at this point. The only way I can see for this to happen
>> is that msixtbl_init() hasn't been called, or hasn't passed its first if
>> condition. The INIT_LIST_HEAD() visible in the context of the 2nd hunk
>> of identified changeset is the line of code which changes the list from
>> 0 to initialised, and I don't see anywhere which re-zeros it later.
>>
>> This alone suggests that the VM in question isn't actually using MSI-X
>> interrupts, even if the device passed through is capable.
>
> Hmm didn't actually check this before, but you seem to be right
> (below is the lspci output from within the guest).
Both of those devices are using MSI interrupts - they don't even support
MSI-X.
>
>
>> Following the style of the identified changeset,
>>
>> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index e418b98..c533719 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct
>> pirq *pirq)
>> ASSERT(pcidevs_locked());
>> ASSERT(spin_is_locked(&d->event_lock));
>>
>> - if ( !has_vlapic(d) )
>> + if ( !d->arch.hvm_domain.msixtbl_list.next )
>> return;
>>
>> irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
>>
>> should resolve your issue, although I am very tempted to replace the
>> opencoded list logic with a msixtbl_initialised() predicate instead.
>>
>> ~Andrew
>
> It does resolve the issue, thanks !
Right - I will clean up the patch tomorrow using a more logical predicate.
~Andrew
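
The uninitialised-list explanation quoted above can be illustrated outside
Xen. This is a sketch under stated assumptions, not the hypervisor's code:
struct list_head here is a minimal stand-in, and the helper names
(init_list_head, list_initialised, list_add_entry, count_entries) are made
up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for a doubly linked list head (illustrative only). */
struct list_head {
    struct list_head *next, *prev;
};

/* Same effect as INIT_LIST_HEAD(): an empty list points at itself. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* A zero-filled head has next == NULL, i.e. it was never initialised. */
static bool list_initialised(const struct list_head *head)
{
    return head->next != NULL;
}

/* Insert a new entry right after the head; the head must be initialised. */
static void list_add_entry(struct list_head *entry, struct list_head *head)
{
    assert(list_initialised(head));
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Count entries the way a list_for_each() style loop walks them.  On a
 * zero-filled head the first step would load from NULL, so the guard
 * returns early instead of walking.
 */
static size_t count_entries(const struct list_head *head)
{
    size_t n = 0;

    if ( !list_initialised(head) )
        return 0;

    for ( const struct list_head *pos = head->next; pos != head;
          pos = pos->next )
        n++;

    return n;
}
```

Without the early return in count_entries(), a zero-filled head would start
the walk at NULL and fault on the first pos->next load, which matches the
shape of the NULL %rdx dereference in the splat.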
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
2016-07-18 10:21 Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts linux
2016-07-18 17:48 ` Andrew Cooper
@ 2016-07-21 10:18 ` Andrew Cooper
2016-07-21 10:37 ` Sander Eikelenboom
` (2 more replies)
1 sibling, 3 replies; 13+ messages in thread
From: Andrew Cooper @ 2016-07-21 10:18 UTC (permalink / raw)
To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Sander Eikelenboom
c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
table infrastructure not to always be initialised, but it missed one path
which needed an is-initialised check.
If a device is passed through to a domain which is MSI capable but not MSI-X
capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
hypercall still calls into msixtbl_pt_unregister(). This follows the linked
list pointer which is still NULL.
Introduce an is-initialised check to msixtbl_pt_unregister().
Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
subtle. Introduce an msixtbl_initialised() predicate instead, which makes its
purpose far more obvious.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Sander Eikelenboom <linux@eikelenboom.it>
Sander - would you mind double checking this patch?
---
xen/arch/x86/hvm/vmsi.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index e418b98..ef1dfff 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -166,6 +166,16 @@ struct msixtbl_entry
static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
+/*
+ * MSI-X table infrastructure is dynamically initialised when an MSI-X capable
+ * device is passed through to a domain, rather than unconditionally for all
+ * domains.
+ */
+static bool msixtbl_initialised(const struct domain *d)
+{
+ return !!d->arch.hvm_domain.msixtbl_list.next;
+}
+
static struct msixtbl_entry *msixtbl_find_entry(
struct vcpu *v, unsigned long addr)
{
@@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
ASSERT(pcidevs_locked());
ASSERT(spin_is_locked(&d->event_lock));
- if ( !has_vlapic(d) )
+ if ( !msixtbl_initialised(d) )
return;
irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
@@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d)
struct hvm_io_handler *handler;
if ( !has_hvm_container_domain(d) || !has_vlapic(d) ||
- d->arch.hvm_domain.msixtbl_list.next )
+ msixtbl_initialised(d) )
return;
INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
@@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d)
{
struct msixtbl_entry *entry, *temp;
- if ( !d->arch.hvm_domain.msixtbl_list.next )
+ if ( !msixtbl_initialised(d) )
return;
spin_lock(&d->event_lock);
--
2.1.4
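
To illustrate the call path the commit message describes, here is a reduced
model. It assumes a hypothetical toy_domain structure and toy_* helpers that
only mimic the shape of the real functions; locking, RCU and the actual
entry matching are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, heavily reduced stand-in for the per-domain state. */
struct toy_domain {
    struct list_head msixtbl_list;  /* zero-filled until toy_msixtbl_init() */
    unsigned int unregister_walks;  /* how often the list was really walked */
};

static bool toy_msixtbl_initialised(const struct toy_domain *d)
{
    return d->msixtbl_list.next != NULL;
}

/* Called only when an MSI-X capable device is assigned; safe to repeat. */
static void toy_msixtbl_init(struct toy_domain *d)
{
    if ( toy_msixtbl_initialised(d) )
        return;

    d->msixtbl_list.next = d->msixtbl_list.prev = &d->msixtbl_list;
    assert(toy_msixtbl_initialised(d));
}

/* Reached on every unbind, whether or not MSI-X was ever set up. */
static void toy_msixtbl_pt_unregister(struct toy_domain *d)
{
    if ( !toy_msixtbl_initialised(d) )
        return;                     /* nothing registered, nothing to walk */

    for ( struct list_head *pos = d->msixtbl_list.next;
          pos != &d->msixtbl_list; pos = pos->next )
    {
        /* real code would look for the entry matching the pirq here */
    }

    d->unregister_walks++;
}
```

A domain with only MSI devices never calls toy_msixtbl_init(), so its
unregister call returns at the guard; a domain that did initialise the list
walks it as before.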
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
@ 2016-07-21 10:37 ` Sander Eikelenboom
2016-07-22 8:50 ` Sander Eikelenboom
2016-07-25 10:26 ` George Dunlap
2 siblings, 0 replies; 13+ messages in thread
From: Sander Eikelenboom @ 2016-07-21 10:37 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich
On July 21, 2016 12:18:37 PM GMT+02:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused
>MSI-X
>table infrastructure not to always be initialised, but it missed one
>path
>which needed an is-initialised check.
>
>If a device is passed through to a domain which is MSI capable but not
>MSI-X
>capable, the call to msixtbl_init() is omitted, but a
>XEN_DOMCTL_unbind_pt_irq
>hypercall still calls into msixtbl_pt_unregister(). This follows the
>linked
>list pointer which is still NULL.
>
>Introduce an is-initialised check to msixtbl_pt_unregister().
>
>Furthermore, the purpose of the open-coded msixtbl_list.next check is
>rather
>subtle. Introduce an msixtbl_initialised() predicate instead, which
>makes its
>purpose far more obvious.
>
>Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>---
>CC: Jan Beulich <JBeulich@suse.com>
>CC: Sander Eikelenboom <linux@eikelenboom.it>
>
>Sander - would you mind double checking this patch?
>---
Sure, will report back tomorrow.
--
Sander
> xen/arch/x86/hvm/vmsi.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
>diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>index e418b98..ef1dfff 100644
>--- a/xen/arch/x86/hvm/vmsi.c
>+++ b/xen/arch/x86/hvm/vmsi.c
>@@ -166,6 +166,16 @@ struct msixtbl_entry
>
> static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
>
>+/*
>+ * MSI-X table infrastructure is dynamically initialised when an MSI-X
>capable
>+ * device is passed through to a domain, rather than unconditionally
>for all
>+ * domains.
>+ */
>+static bool msixtbl_initialised(const struct domain *d)
>+{
>+ return !!d->arch.hvm_domain.msixtbl_list.next;
>+}
>+
> static struct msixtbl_entry *msixtbl_find_entry(
> struct vcpu *v, unsigned long addr)
> {
>@@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct
>pirq *pirq)
> ASSERT(pcidevs_locked());
> ASSERT(spin_is_locked(&d->event_lock));
>
>- if ( !has_vlapic(d) )
>+ if ( !msixtbl_initialised(d) )
> return;
>
> irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
>@@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d)
> struct hvm_io_handler *handler;
>
> if ( !has_hvm_container_domain(d) || !has_vlapic(d) ||
>- d->arch.hvm_domain.msixtbl_list.next )
>+ msixtbl_initialised(d) )
> return;
>
> INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
>@@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d)
> {
> struct msixtbl_entry *entry, *temp;
>
>- if ( !d->arch.hvm_domain.msixtbl_list.next )
>+ if ( !msixtbl_initialised(d) )
> return;
>
> spin_lock(&d->event_lock);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
2016-07-21 10:37 ` Sander Eikelenboom
@ 2016-07-22 8:50 ` Sander Eikelenboom
2016-07-25 10:16 ` Andrew Cooper
2016-07-25 10:26 ` George Dunlap
2 siblings, 1 reply; 13+ messages in thread
From: Sander Eikelenboom @ 2016-07-22 8:50 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel
Thursday, July 21, 2016, 12:18:37 PM, you wrote:
> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
> table infrastructure not to always be initialised, but it missed one path
> which needed an is-initialised check.
> If a device is passed through to a domain which is MSI capable but not MSI-X
> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
> hypercall still calls into msixtbl_pt_unregister(). This follows the linked
> list pointer which is still NULL.
> Introduce an is-initialised check to msixtbl_pt_unregister().
> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
> subtle. Introduce an msixtbl_initialised() predicate instead, which makes its
> purpose far more obvious.
> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Sander Eikelenboom <linux@eikelenboom.it>
> Sander - would you mind double checking this patch?
> ---
Hi Andrew,
Just got the chance to test and it works for me !
Thanks,
Sander
> xen/arch/x86/hvm/vmsi.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index e418b98..ef1dfff 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -166,6 +166,16 @@ struct msixtbl_entry
>
> static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
>
> +/*
> + * MSI-X table infrastructure is dynamically initialised when an MSI-X capable
> + * device is passed through to a domain, rather than unconditionally for all
> + * domains.
> + */
> +static bool msixtbl_initialised(const struct domain *d)
> +{
> + return !!d->arch.hvm_domain.msixtbl_list.next;
> +}
> +
> static struct msixtbl_entry *msixtbl_find_entry(
> struct vcpu *v, unsigned long addr)
> {
> @@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
> ASSERT(pcidevs_locked());
> ASSERT(spin_is_locked(&d->event_lock));
>
> - if ( !has_vlapic(d) )
> + if ( !msixtbl_initialised(d) )
> return;
>
> irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
> @@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d)
> struct hvm_io_handler *handler;
>
> if ( !has_hvm_container_domain(d) || !has_vlapic(d) ||
> - d->arch.hvm_domain.msixtbl_list.next )
> + msixtbl_initialised(d) )
> return;
>
> INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
> @@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d)
> {
> struct msixtbl_entry *entry, *temp;
>
> - if ( !d->arch.hvm_domain.msixtbl_list.next )
> + if ( !msixtbl_initialised(d) )
> return;
>
> spin_lock(&d->event_lock);
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
2016-07-22 8:50 ` Sander Eikelenboom
@ 2016-07-25 10:16 ` Andrew Cooper
2016-07-25 10:19 ` Andrew Cooper
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-07-25 10:16 UTC (permalink / raw)
To: Sander Eikelenboom; +Cc: Jan Beulich, Xen-devel
On 22/07/16 09:50, Sander Eikelenboom wrote:
> Thursday, July 21, 2016, 12:18:37 PM, you wrote:
>
>> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
>> table infrastructure not to always be initialised, but it missed one path
>> which needed an is-initialised check.
>> If a device is passed through to a domain which is MSI capable but not MSI-X
>> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
>> hypercall still calls into msixtbl_pt_unregister(). This follows the linked
>> list pointer which is still NULL.
>> Introduce an is-initialised check to msixtbl_pt_unregister().
>> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
>> subtle. Introduce an msixtbl_initialised() predicate instead, which makes its
>> purpose far more obvious.
>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Sander Eikelenboom <linux@eikelenboom.it>
>> Sander - would you mind double checking this patch?
>> ---
> Hi Andrew,
>
> Just got the chance to test and it works for me !
>
> Thanks,
May I take that as a Test-by: then please?
~Andrew
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
2016-07-25 10:16 ` Andrew Cooper
@ 2016-07-25 10:19 ` Andrew Cooper
2016-07-25 10:23 ` Sander Eikelenboom
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-07-25 10:19 UTC (permalink / raw)
To: Sander Eikelenboom; +Cc: Jan Beulich, Xen-devel
On 25/07/16 11:16, Andrew Cooper wrote:
> On 22/07/16 09:50, Sander Eikelenboom wrote:
>> Thursday, July 21, 2016, 12:18:37 PM, you wrote:
>>
>>> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
>>> table infrastructure not to always be initialised, but it missed one path
>>> which needed an is-initialised check.
>>> If a device is passed through to a domain which is MSI capable but not MSI-X
>>> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
>>> hypercall still calls into msixtbl_pt_unregister(). This follows the linked
>>> list pointer which is still NULL.
>>> Introduce an is-initialised check to msixtbl_pt_unregister().
>>> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
>>> subtle. Introduce an msixtbl_initialised() predicate instead, which makes its
>>> purpose far more obvious.
>>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: Sander Eikelenboom <linux@eikelenboom.it>
>>> Sander - would you mind double checking this patch?
>>> ---
>> Hi Andrew,
>>
>> Just got the chance to test and it works for me !
>>
>> Thanks,
> May I take that as a Test-by: then please?
And of course, I meant Tested-by:
~Andrew
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
2016-07-25 10:19 ` Andrew Cooper
@ 2016-07-25 10:23 ` Sander Eikelenboom
0 siblings, 0 replies; 13+ messages in thread
From: Sander Eikelenboom @ 2016-07-25 10:23 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel
Monday, July 25, 2016, 12:19:55 PM, you wrote:
> On 25/07/16 11:16, Andrew Cooper wrote:
>> On 22/07/16 09:50, Sander Eikelenboom wrote:
>>> Thursday, July 21, 2016, 12:18:37 PM, you wrote:
>>>
>>>> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
>>>> table infrastructure not to always be initialised, but it missed one path
>>>> which needed an is-initialised check.
>>>> If a device is passed through to a domain which is MSI capable but not MSI-X
>>>> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
>>>> hypercall still calls into msixtbl_pt_unregister(). This follows the linked
>>>> list pointer which is still NULL.
>>>> Introduce an is-initialised check to msixtbl_pt_unregister().
>>>> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
>>>> subtle. Introduce an msixtbl_initialised() predicate instead, which makes its
>>>> purpose far more obvious.
>>>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> ---
>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>> CC: Sander Eikelenboom <linux@eikelenboom.it>
>>>> Sander - would you mind double checking this patch?
>>>> ---
>>> Hi Andrew,
>>>
>>> Just got the chance to test and it works for me !
>>>
>>> Thanks,
>> May I take that as a Test-by: then please?
> And of course, I meant Tested-by:
Yes, thanks for the quick fix !
--
Sander
> ~Andrew
* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
2016-07-21 10:37 ` Sander Eikelenboom
2016-07-22 8:50 ` Sander Eikelenboom
@ 2016-07-25 10:26 ` George Dunlap
2 siblings, 0 replies; 13+ messages in thread
From: George Dunlap @ 2016-07-25 10:26 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Sander Eikelenboom, Jan Beulich, Xen-devel
On Thu, Jul 21, 2016 at 11:18 AM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
> table infrastructure not to always be initialised, but it missed one path
> which needed an is-initialised check.
>
> If a device is passed through to a domain which is MSI capable but not MSI-X
> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
> hypercall still calls into msixtbl_pt_unregister(). This follows the linked
> list pointer which is still NULL.
>
> Introduce an is-initialised check to msixtbl_pt_unregister().
>
> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
> subtle. Introduce an msixtbl_initialised() predicate instead, which makes its
> purpose far more obvious.
Thanks for this bit.
> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
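
For readers following the thread, the pattern described in the commit
message can be sketched in standalone C. This is only an illustration
under stated assumptions, not the patch itself: struct demo_domain and
every demo_* function are hypothetical stand-ins, and the list head is a
simplified copy of the usual intrusive list idiom. It shows a named
predicate testing whether the lazily initialised list head was ever set
up, and the unregister path bailing out early instead of following a
NULL next pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an intrusive doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, cut-down domain: only the field relevant to the sketch. */
struct demo_domain {
    struct list_head msixtbl_list; /* all-zero until demo_msixtbl_init() runs */
};

/* Lazy initialisation: an empty list is a head that points at itself. */
static void demo_msixtbl_init(struct demo_domain *d)
{
    assert(d != NULL);
    d->msixtbl_list.next = &d->msixtbl_list;
    d->msixtbl_list.prev = &d->msixtbl_list;
}

/* Named predicate in place of an open-coded "list.next != NULL" test. */
static bool demo_msixtbl_initialised(const struct demo_domain *d)
{
    return d->msixtbl_list.next != NULL;
}

/*
 * Walk the list on the unregister path.
 * Returns the number of entries visited, or -1 if the list was never
 * initialised (the case that would otherwise dereference NULL).
 */
static int demo_msixtbl_pt_unregister(struct demo_domain *d)
{
    int visited = 0;

    if (!demo_msixtbl_initialised(d))
        return -1; /* nothing was ever registered; do not touch the list */

    for (struct list_head *p = d->msixtbl_list.next;
         p != &d->msixtbl_list; p = p->next)
        visited++;

    return visited;
}
```

Calling demo_msixtbl_pt_unregister() on a zero-filled demo_domain takes
the early return; after demo_msixtbl_init() the same call walks an empty
list and visits no entries.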
Thread overview: 13+ messages
2016-07-18 10:21 Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts linux
2016-07-18 17:48 ` Andrew Cooper
2016-07-18 19:26 ` Sander Eikelenboom
2016-07-18 20:57 ` Andrew Cooper
2016-07-18 22:03 ` linux
2016-07-18 22:07 ` Andrew Cooper
2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
2016-07-21 10:37 ` Sander Eikelenboom
2016-07-22 8:50 ` Sander Eikelenboom
2016-07-25 10:16 ` Andrew Cooper
2016-07-25 10:19 ` Andrew Cooper
2016-07-25 10:23 ` Sander Eikelenboom
2016-07-25 10:26 ` George Dunlap