* Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
@ 2016-07-18 10:21 linux
  2016-07-18 17:48 ` Andrew Cooper
  2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
  0 siblings, 2 replies; 13+ messages in thread
From: linux @ 2016-07-18 10:21 UTC (permalink / raw)
  To: Xen-devel; +Cc: Jan Beulich

Hi Jan,

It seems that since your patch series starting with commit:
2016-06-22 x86/vMSI-X: defer intercept handler registration
74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798

shutting down a guest that has a PCI device passed through using MSI-X
interrupts causes a host crash; see the splat below. The host also
doesn't reboot after 5 seconds as it is supposed to (I don't have
no-reboot on the command line).

--
Sander


(XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable  x86_64  debug=y  Not tainted ]----
(XEN) [2016-07-16 16:03:17.069] CPU:    0
(XEN) [2016-07-16 16:03:17.069] RIP:    e008:[<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
(XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082   CONTEXT: hypervisor (d0v0)
(XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40   rbx: ffff83055c685500   rcx: 0000000000000001
(XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000   rsi: 0000000000001ab0   rdi: ffff8305313b85a0
(XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78   rsp: ffff83009fd07c68   r8:  ffff8305356dfff0
(XEN) [2016-07-16 16:03:17.069] r9:  ffff8305356df480   r10: ffff830503420c50   r11: 0000000000000282
(XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000   r13: ffff83009fd07e48   r14: ffff8305313b8000
(XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8   cr0: 0000000080050033   cr4: 00000000000006e0
(XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000   cr2: 0000000000000000
(XEN) [2016-07-16 16:03:17.069] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de> (msixtbl_pt_unregister+0x7b/0xd9):
(XEN) [2016-07-16 16:03:17.069]  39 42 18 74 19 48 89 ca <48> 8b 0a 0f 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
(XEN) [2016-07-16 16:03:17.069] Xen stack trace from rsp=ffff83009fd07c68:
(XEN) [2016-07-16 16:03:17.069]    0000000000000000 ffff8305356df480 ffff83009fd07ce8 ffff82d08014c394
(XEN) [2016-07-16 16:03:17.069]    0000000000000001 ffff8305356df480 0000000000000293 ffff8305313b80cc
(XEN) [2016-07-16 16:03:17.069]    000000568012ffe5 ffff8305313b8000 ffff83009fd07cd8 ffff83009fd07e38
(XEN) [2016-07-16 16:03:17.070]    0000000000000000 ffff83054e5fc000 00007fc25a33e004 ffff8305313b8000
(XEN) [2016-07-16 16:03:17.070]    ffff83009fd07da8 ffff82d0801629c8 0000000000000000 ffff83053b1191f0
(XEN) [2016-07-16 16:03:17.070]    0000000000000246 ffff83009fd07d28 ffff82d0801300ae 000000000000000e
(XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d080171497 ffff83009fd07d78 000000020001d17b
(XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d68 0000000000000000 ffff83009fd07d68 ffff82d080130280
(XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d08014d0aa 0000000000000202 0000000000000000
(XEN) [2016-07-16 16:03:17.070]    ffff8305313b8000 ffff88005716d320 0000000000305000 00007fc25a33e004
(XEN) [2016-07-16 16:03:17.070]    ffff83009fd07ef8 ffff82d080104b2c 0000000000000206 0000000000000002
(XEN) [2016-07-16 16:03:17.070]    ffff83009fd07df8 ffff82d08018c9db 0000000000000cfe 0000000000000002
(XEN) [2016-07-16 16:03:17.070]    0000000000000002 ffff83054e5fc000 ffff83009fd07e48 ffff82d08019c119
(XEN) [2016-07-16 16:03:17.070]    ffff83009fd07e38 0000000080121177 ffff83009fd07e38 0000000000000cfe
(XEN) [2016-07-16 16:03:17.070]    ffff83009fd07f18 0000000000000206 0000000c00000030 000056082bb90013
(XEN) [2016-07-16 16:03:17.070]    0000000200000056 00007fc200000013 0000305600000000 000056082b87465d
(XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 00007fc25606b31f 0000000000000000 000056082b8746cf
(XEN) [2016-07-16 16:03:17.070]    0000000000001000 fee5600026820730 00007ffe26820740 000056082b8797be
(XEN) [2016-07-16 16:03:17.070]    00000000fee56000 0000430026820772 00007ffe26820740 0000000000003056
(XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 ffff83009ff8a000 00007ffe26820580 ffff88005716d320
(XEN) [2016-07-16 16:03:17.070] Xen call trace:
(XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801e39de>] msixtbl_pt_unregister+0x7b/0xd9
(XEN) [2016-07-16 16:03:17.070]    [<ffff82d08014c394>] pt_irq_destroy_bind+0x2be/0x3f0
(XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801629c8>] arch_do_domctl+0xc77/0x2414
(XEN) [2016-07-16 16:03:17.070]    [<ffff82d080104b2c>] do_domctl+0x19db/0x1d26
(XEN) [2016-07-16 16:03:17.070]    [<ffff82d0802426bd>] lstar_enter+0xdd/0x137
(XEN) [2016-07-16 16:03:17.070]
(XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
(XEN) [2016-07-16 16:03:17.070]  L4[0x000] = 0000000000000000 ffffffffffffffff
(XEN) [2016-07-16 16:03:18.147]
(XEN) [2016-07-16 16:03:18.155] ****************************************
(XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
(XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
(XEN) [2016-07-16 16:03:18.200] [error_code=0000]
(XEN) [2016-07-16 16:03:18.214] Faulting linear address: 0000000000000000
(XEN) [2016-07-16 16:03:18.233] ****************************************
(XEN) [2016-07-16 16:03:18.252]
(XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
  2016-07-18 10:21 Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts linux
@ 2016-07-18 17:48 ` Andrew Cooper
  2016-07-18 19:26   ` Sander Eikelenboom
  2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-07-18 17:48 UTC (permalink / raw)
  To: linux, Xen-devel; +Cc: Jan Beulich

On 18/07/16 11:21, linux@eikelenboom.it wrote:
> Hi Jan,
>
> It seems that since your patch series starting with commit:
> 2016-06-22 x86/vMSI-X: defer intercept handler registration
> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>
> The shutdown of a guest which has a PCI device passed through which
> uses MSI-X interrupts causes
> a host crash, see the splat below. Somehow it also doesn't reboot in 5
> seconds as it is supposed to (i don't have no-reboot on the command
> line).
>
> -- 
> Sander
>

Can you paste the disassembly of msixtbl_pt_unregister() please?  That
is a dereference of %rdx which is NULL at this point, but I need to
figure out which pointer it is supposed to be.

Thanks,

~Andrew


* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
  2016-07-18 17:48 ` Andrew Cooper
@ 2016-07-18 19:26   ` Sander Eikelenboom
  2016-07-18 20:57     ` Andrew Cooper
  0 siblings, 1 reply; 13+ messages in thread
From: Sander Eikelenboom @ 2016-07-18 19:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel


Monday, July 18, 2016, 7:48:20 PM, you wrote:

> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>> Hi Jan,
>>
>> It seems that since your patch series starting with commit:
>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>
>> The shutdown of a guest which has a PCI device passed through which
>> uses MSI-X interrupts causes
>> a host crash, see the splat below. Somehow it also doesn't reboot in 5
>> seconds as it is supposed to (i don't have no-reboot on the command
>> line).
>>
>> -- 
>> Sander
>>

> Can you paste the disassembly of msixtbl_pt_unregister() please?  That
> is a dereference of %rdx which is NULL at this point, but I need to
> figure out which pointer it is supposed to be.

Hi Andrew,

# addr2line -e xen-syms ffff82d0801e3e7e
/usr/src/new/xen-unstable/xen/arch/x86/hvm/vmsi.c:535 (discriminator 1)

So the RIP points to:
void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
{
    struct irq_desc *irq_desc;
    struct msi_desc *msi_desc;
    struct pci_dev *pdev;
    struct msixtbl_entry *entry;

    ASSERT(pcidevs_locked());
    ASSERT(spin_is_locked(&d->event_lock));

    if ( !has_vlapic(d) )
        return;

    irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
    if ( !irq_desc )
        return;

    msi_desc = irq_desc->msi_desc;
    if ( !msi_desc )
        goto out;

    pdev = msi_desc->dev;

    list_for_each_entry( entry, &d->arch.hvm_domain.msixtbl_list, list )   <--- HERE
        if ( pdev == entry->pdev )
            goto found;

out:
    spin_unlock_irq(&irq_desc->lock);
    return;

found:
    if ( !atomic_dec_and_test(&entry->refcnt) )
        del_msixtbl_entry(entry);

    spin_unlock_irq(&irq_desc->lock);
}


Disassembly:

(gdb) info line msixtbl_pt_unregister
Line 513 of "vmsi.c" starts at address 0xffff82d0801e3e03 <msixtbl_pt_unregister> and ends at 0xffff82d0801e3e10 <msixtbl_pt_unregister+13>.
(gdb) disas 0xffff82d0801e3e03
Dump of assembler code for function msixtbl_pt_unregister:
   0xffff82d0801e3e03 <+0>:     push   %rbp
   0xffff82d0801e3e04 <+1>:     mov    %rsp,%rbp
   0xffff82d0801e3e07 <+4>:     push   %r12
   0xffff82d0801e3e09 <+6>:     push   %rbx
   0xffff82d0801e3e0a <+7>:     mov    %rdi,%r12
   0xffff82d0801e3e0d <+10>:    mov    %rsi,%rbx
   0xffff82d0801e3e10 <+13>:    callq  0xffff82d08014d585 <pcidevs_locked>
   0xffff82d0801e3e15 <+18>:    test   %al,%al
   0xffff82d0801e3e17 <+20>:    jne    0xffff82d0801e3e1b <msixtbl_pt_unregister+24>
   0xffff82d0801e3e19 <+22>:    ud2
   0xffff82d0801e3e1b <+24>:    lea    0xcc(%r12),%rdi
   0xffff82d0801e3e23 <+32>:    callq  0xffff82d080130544 <_spin_is_locked>
   0xffff82d0801e3e28 <+37>:    test   %eax,%eax
   0xffff82d0801e3e2a <+39>:    jne    0xffff82d0801e3e2e <msixtbl_pt_unregister+43>
   0xffff82d0801e3e2c <+41>:    ud2
   0xffff82d0801e3e2e <+43>:    testb  $0x1,0x9dc(%r12)
   0xffff82d0801e3e37 <+52>:    je     0xffff82d0801e3ed7 <msixtbl_pt_unregister+212>
   0xffff82d0801e3e3d <+58>:    mov    $0x0,%esi
   0xffff82d0801e3e42 <+63>:    mov    %rbx,%rdi
   0xffff82d0801e3e45 <+66>:    callq  0xffff82d0801743a4 <pirq_spin_lock_irq_desc>
   0xffff82d0801e3e4a <+71>:    mov    %rax,%rbx
   0xffff82d0801e3e4d <+74>:    test   %rax,%rax
   0xffff82d0801e3e50 <+77>:    je     0xffff82d0801e3ed7 <msixtbl_pt_unregister+212>
   0xffff82d0801e3e56 <+83>:    mov    0x10(%rax),%rax
   0xffff82d0801e3e5a <+87>:    test   %rax,%rax
   0xffff82d0801e3e5d <+90>:    je     0xffff82d0801e3e89 <msixtbl_pt_unregister+134>
   0xffff82d0801e3e5f <+92>:    mov    0x20(%rax),%rax
   0xffff82d0801e3e63 <+96>:    mov    0x5a0(%r12),%rdx
   0xffff82d0801e3e6b <+104>:   lea    0x5a0(%r12),%rdi
   0xffff82d0801e3e73 <+112>:   jmp    0xffff82d0801e3e7e <msixtbl_pt_unregister+123>
   0xffff82d0801e3e75 <+114>:   cmp    %rax,0x18(%rdx)
   0xffff82d0801e3e79 <+118>:   je     0xffff82d0801e3e94 <msixtbl_pt_unregister+145>
   0xffff82d0801e3e7b <+120>:   mov    %rcx,%rdx
   0xffff82d0801e3e7e <+123>:   mov    (%rdx),%rcx
   0xffff82d0801e3e81 <+126>:   prefetcht0 (%rcx)
   0xffff82d0801e3e84 <+129>:   cmp    %rdi,%rdx
   0xffff82d0801e3e87 <+132>:   jne    0xffff82d0801e3e75 <msixtbl_pt_unregister+114>
   0xffff82d0801e3e89 <+134>:   lea    0x24(%rbx),%rdi
   0xffff82d0801e3e8d <+138>:   callq  0xffff82d080130514 <_spin_unlock_irq>
   0xffff82d0801e3e92 <+143>:   jmp    0xffff82d0801e3ed7 <msixtbl_pt_unregister+212>
   0xffff82d0801e3e94 <+145>:   lock decl 0x10(%rdx)
   0xffff82d0801e3e98 <+149>:   sete   %al
   0xffff82d0801e3e9b <+152>:   test   %al,%al
   0xffff82d0801e3e9d <+154>:   jne    0xffff82d0801e3ece <msixtbl_pt_unregister+203>
   0xffff82d0801e3e9f <+156>:   mov    (%rdx),%rcx
   0xffff82d0801e3ea2 <+159>:   mov    0x8(%rdx),%rax
   0xffff82d0801e3ea6 <+163>:   mov    %rax,0x8(%rcx)
   0xffff82d0801e3eaa <+167>:   mov    %rcx,(%rax)
   0xffff82d0801e3ead <+170>:   movabs $0x200200200200200,%rax
   0xffff82d0801e3eb7 <+180>:   mov    %rax,0x8(%rdx)
   0xffff82d0801e3ebb <+184>:   lea    0x158(%rdx),%rdi
   0xffff82d0801e3ec2 <+191>:   lea    -0xac1(%rip),%rsi        # 0xffff82d0801e3408 <free_msixtbl_entry>
   0xffff82d0801e3ec9 <+198>:   callq  0xffff82d080122be0 <call_rcu>
   0xffff82d0801e3ece <+203>:   lea    0x24(%rbx),%rdi
   0xffff82d0801e3ed2 <+207>:   callq  0xffff82d080130514 <_spin_unlock_irq>
   0xffff82d0801e3ed7 <+212>:   pop    %rbx
   0xffff82d0801e3ed8 <+213>:   pop    %r12
   0xffff82d0801e3eda <+215>:   pop    %rbp
   0xffff82d0801e3edb <+216>:   retq
End of assembler dump.
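
Note that the address given to addr2line (ffff82d0801e3e7e) is not the RIP
from the splat (ffff82d0801e39de). Both correspond to
msixtbl_pt_unregister+0x7b, so presumably the xen-syms inspected here comes
from a different build than the one that crashed, and the symbol+offset was
used to find the equivalent instruction. That is an inference from the
numbers in this thread, not something stated in it. A small sketch of the
arithmetic, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Address of the instruction at symbol+offset in a given build. */
uint64_t symbol_plus_offset(uint64_t symbol_start, uint64_t offset)
{
    assert(offset <= UINT64_MAX - symbol_start);  /* reject wrap-around */
    return symbol_start + offset;
}

/* Start of a symbol, recovered from a reported RIP and its offset. */
uint64_t symbol_start_from_rip(uint64_t rip, uint64_t offset)
{
    assert(offset <= rip);
    return rip - offset;
}
```

With the function start gdb reports above (0xffff82d0801e3e03) and the
offset 0x7b, symbol_plus_offset() gives 0xffff82d0801e3e7e, which is the
<+123> line in the dump: mov (%rdx),%rcx.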

--
Sander

> Thanks,

> ~Andrew



* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
  2016-07-18 19:26   ` Sander Eikelenboom
@ 2016-07-18 20:57     ` Andrew Cooper
  2016-07-18 22:03       ` linux
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-07-18 20:57 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: Jan Beulich, Xen-devel

On 18/07/2016 20:26, Sander Eikelenboom wrote:
> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>
>> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>>> Hi Jan,
>>>
>>> It seems that since your patch series starting with commit:
>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>
>>> The shutdown of a guest which has a PCI device passed through which
>>> uses MSI-X interrupts causes
>>> a host crash, see the splat below. Somehow it also doesn't reboot in 5
>>> seconds as it is supposed to (i don't have no-reboot on the command
>>> line).
>>>
>>> -- 
>>> Sander
>>>
>> Can you paste the disassembly of msixtbl_pt_unregister() please?  That
>> is a dereference of %rdx which is NULL at this point, but I need to
>> figure out which pointer it is supposed to be.
> Hi Andrew,

<snip>

Thanks.  What has happened is that the msixtbl linked list is still
uninitialised at this point.  The only way I can see for this to happen
is that msixtbl_init() hasn't been called, or hasn't passed its first if
condition.  The INIT_LIST_HEAD() visible in the context of the 2nd hunk
of the identified changeset is the line of code which changes the list
from zeroed to initialised, and I don't see anything which re-zeros it
later.

This alone suggests that the VM in question isn't actually using MSI-X
interrupts, even if the device passed through is capable.
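
To illustrate the failure mode, here is a minimal, self-contained sketch.
It assumes a simplified stand-in for Xen's struct list_head;
init_list_head() and count_entries() are hypothetical names for this
example, not Xen functions. An initialised empty list points at itself, so
a walk stops immediately. A zero-filled head has next == NULL, which is not
equal to the head, so an unguarded walk treats NULL as the first entry and
dereferences it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a circular doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialised empty list points at itself. */
void init_list_head(struct list_head *head)
{
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/*
 * Count the entries on a list.  Returns -1 for a head that was only
 * zero-filled and never initialised, instead of following the NULL
 * next pointer as an unguarded walk would.
 */
int count_entries(const struct list_head *head)
{
    const struct list_head *pos;
    int n = 0;

    assert(head != NULL);

    if ( head->next == NULL )   /* zeroed, never initialised */
        return -1;

    for ( pos = head->next; pos != head; pos = pos->next )
        n++;

    return n;
}
```

Without the NULL check, the loop condition pos != head holds on the first
iteration for a zeroed head, and pos->next is then a read from address 0,
which would match the faulting linear address in the splat.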

Following the style of the identified changeset,

andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index e418b98..c533719 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     ASSERT(pcidevs_locked());
     ASSERT(spin_is_locked(&d->event_lock));

-    if ( !has_vlapic(d) )
+    if ( !d->arch.hvm_domain.msixtbl_list.next )
         return;

     irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);

should resolve your issue, although I am very tempted to replace the
open-coded list logic with a msixtbl_initialised() predicate instead.
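
The predicate mentioned above might look roughly like the following
sketch. The struct definitions are trimmed stand-ins that only mirror the
field path used in the diff (d->arch.hvm_domain.msixtbl_list.next);
msixtbl_initialised() is the name floated above, and its body here is an
assumption, not committed code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical, heavily trimmed stand-ins for the real structures. */
struct hvm_domain  { struct list_head msixtbl_list; };
struct arch_domain { struct hvm_domain hvm_domain; };
struct domain      { struct arch_domain arch; };

/*
 * True once the list head has been initialised.  Relies on the domain
 * structure starting out zero-filled, so that next == NULL can only mean
 * "never initialised".
 */
bool msixtbl_initialised(const struct domain *d)
{
    assert(d != NULL);
    return d->arch.hvm_domain.msixtbl_list.next != NULL;
}
```

The early return in the diff would then read
if ( !msixtbl_initialised(d) ) return;, keeping the list internals out of
the callers.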

~Andrew


* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
  2016-07-18 20:57     ` Andrew Cooper
@ 2016-07-18 22:03       ` linux
  2016-07-18 22:07         ` Andrew Cooper
  0 siblings, 1 reply; 13+ messages in thread
From: linux @ 2016-07-18 22:03 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Andrew Cooper, Jan Beulich, Xen-devel

On 2016-07-18 22:57, Andrew Cooper wrote:
> On 18/07/2016 20:26, Sander Eikelenboom wrote:
>> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>> 
>>> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>>>> Hi Jan,
>>>> 
>>>> It seems that since your patch series starting with commit:
>>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>> 
>>>> The shutdown of a guest which has a PCI device passed through which
>>>> uses MSI-X interrupts causes
>>>> a host crash, see the splat below. Somehow it also doesn't reboot in 
>>>> 5
>>>> seconds as it is supposed to (i don't have no-reboot on the command
>>>> line).
>>>> 
>>>> --
>>>> Sander
>>>> 
>>>> 
>>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable  x86_64
>>>> debug=y  Not tainted ]----
>>>> (XEN) [2016-07-16 16:03:17.069] CPU:    0
>>>> (XEN) [2016-07-16 16:03:17.069] RIP:    e008:[<ffff82d0801e39de>]
>>>> msixtbl_pt_unregister+0x7b/0xd9
>>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082   CONTEXT:
>>>> hypervisor (d0v0)
>>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40   rbx:
>>>> ffff83055c685500   rcx: 0000000000000001
>>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000   rsi:
>>>> 0000000000001ab0   rdi: ffff8305313b85a0
>>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78   rsp:
>>>> ffff83009fd07c68   r8:  ffff8305356dfff0
>>>> (XEN) [2016-07-16 16:03:17.069] r9:  ffff8305356df480   r10:
>>>> ffff830503420c50   r11: 0000000000000282
>>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000   r13:
>>>> ffff83009fd07e48   r14: ffff8305313b8000
>>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8   cr0:
>>>> 0000000080050033   cr4: 00000000000006e0
>>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000   cr2:
>>>> 0000000000000000
>>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000   es: 0000   fs: 0000   gs:
>>>> 0000   ss: e010   cs: e008
>>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de>
>>>> (msixtbl_pt_unregister+0x7b/0xd9):
>>>> (XEN) [2016-07-16 16:03:17.069]  39 42 18 74 19 48 89 ca <48> 8b 0a 
>>>> 0f
>>>> 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from
>>>> rsp=ffff83009fd07c68:
>>>> (XEN) [2016-07-16 16:03:17.069]    0000000000000000 ffff8305356df480
>>>> ffff83009fd07ce8 ffff82d08014c394
>>>> (XEN) [2016-07-16 16:03:17.069]    0000000000000001 ffff8305356df480
>>>> 0000000000000293 ffff8305313b80cc
>>>> (XEN) [2016-07-16 16:03:17.069]    000000568012ffe5 ffff8305313b8000
>>>> ffff83009fd07cd8 ffff83009fd07e38
>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000000 ffff83054e5fc000
>>>> 00007fc25a33e004 ffff8305313b8000
>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07da8 ffff82d0801629c8
>>>> 0000000000000000 ffff83053b1191f0
>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000246 ffff83009fd07d28
>>>> ffff82d0801300ae 000000000000000e
>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d080171497
>>>> ffff83009fd07d78 000000020001d17b
>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d68 0000000000000000
>>>> ffff83009fd07d68 ffff82d080130280
>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d08014d0aa
>>>> 0000000000000202 0000000000000000
>>>> (XEN) [2016-07-16 16:03:17.070]    ffff8305313b8000 ffff88005716d320
>>>> 0000000000305000 00007fc25a33e004
>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07ef8 ffff82d080104b2c
>>>> 0000000000000206 0000000000000002
>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07df8 ffff82d08018c9db
>>>> 0000000000000cfe 0000000000000002
>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000002 ffff83054e5fc000
>>>> ffff83009fd07e48 ffff82d08019c119
>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07e38 0000000080121177
>>>> ffff83009fd07e38 0000000000000cfe
>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07f18 0000000000000206
>>>> 0000000c00000030 000056082bb90013
>>>> (XEN) [2016-07-16 16:03:17.070]    0000000200000056 00007fc200000013
>>>> 0000305600000000 000056082b87465d
>>>> (XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 00007fc25606b31f
>>>> 0000000000000000 000056082b8746cf
>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000001000 fee5600026820730
>>>> 00007ffe26820740 000056082b8797be
>>>> (XEN) [2016-07-16 16:03:17.070]    00000000fee56000 0000430026820772
>>>> 00007ffe26820740 0000000000003056
>>>> (XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 ffff83009ff8a000
>>>> 00007ffe26820580 ffff88005716d320
>>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801e39de>]
>>>> msixtbl_pt_unregister+0x7b/0xd9
>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d08014c394>]
>>>> pt_irq_destroy_bind+0x2be/0x3f0
>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801629c8>]
>>>> arch_do_domctl+0xc77/0x2414
>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d080104b2c>]
>>>> do_domctl+0x19db/0x1d26
>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0802426bd>]
>>>> lstar_enter+0xdd/0x137
>>>> (XEN) [2016-07-16 16:03:17.070]
>>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 
>>>> 0000000000000000:
>>>> (XEN) [2016-07-16 16:03:17.070]  L4[0x000] = 0000000000000000
>>>> ffffffffffffffff
>>>> (XEN) [2016-07-16 16:03:18.147]
>>>> (XEN) [2016-07-16 16:03:18.155] 
>>>> ****************************************
>>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address: 
>>>> 0000000000000000
>>>> (XEN) [2016-07-16 16:03:18.233] 
>>>> ****************************************
>>>> (XEN) [2016-07-16 16:03:18.252]
>>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>> 
>>> Can you paste the disassembly of msixtbl_pt_unregister() please?  
>>> That
>>> is a dereference of %rdx which is NULL at this point, but I need to
>>> figure out which pointer it is supposed to be.
>> Hi Andrew,
> 
> <snip>
> 
> Thanks.  What has happened is that the msixtbl linked list is still
> uninitialised at this point.  The only way I can see for this to happen
> is that msixtbl_init() hasn't been called, or hasn't passed its first 
> if
> condition.  The INIT_LIST_HEAD() visible in the context of the 2nd hunk
> of identified changeset is the line of code which changes the list from
> 0 to initialised, and I don't see anywhere which re-zeros it later.
> 
> This alone suggests that the VM in question isn't actually using MSI-X
> interrupts, even if the device passed through is capable.

Hmm didn't actually check this before, but you seem to be right
(below is the lspci output from within the guest).


> Following the style of the identified changeset,
> 
> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index e418b98..c533719 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
>      ASSERT(pcidevs_locked());
>      ASSERT(spin_is_locked(&d->event_lock));
> 
> -    if ( !has_vlapic(d) )
> +    if ( !d->arch.hvm_domain.msixtbl_list.next )
>          return;
> 
>      irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
> 
> should resolve your issue, although I am very tempted to replace the
> opencoded list logic with a msixtbl_initialised() predicate instead.
> 
> ~Andrew

It does resolve the issue, thanks !

--
Sander

00:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 6570/7570/8550] (prog-if 00 [VGA controller])
	Subsystem: PC Partner Limited / Sapphire Technology Turks PRO [Radeon HD 6570/7570/8550]
	Physical Slot: 5
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 68
	Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at f3060000 (64-bit, non-prefetchable) [size=128K]
	Region 4: I/O ports at c100 [size=256]
	Expansion ROM at f3080000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee57000  Data: 4300
	Kernel driver in use: radeon

00:06.0 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
	Subsystem: PC Partner Limited / Sapphire Technology Turks/Whistler HDMI Audio [Radeon HD 6000 Series]
	Physical Slot: 6
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin B routed to IRQ 79
	Region 0: Memory at f30b0000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee56000  Data: 4300
	Kernel driver in use: snd_hda_intel


* Re: Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts.
  2016-07-18 22:03       ` linux
@ 2016-07-18 22:07         ` Andrew Cooper
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Cooper @ 2016-07-18 22:07 UTC (permalink / raw)
  To: linux; +Cc: Xen-devel, Jan Beulich, Andrew Cooper

On 18/07/2016 23:03, linux@eikelenboom.it wrote:
> On 2016-07-18 22:57, Andrew Cooper wrote:
>> On 18/07/2016 20:26, Sander Eikelenboom wrote:
>>> Monday, July 18, 2016, 7:48:20 PM, you wrote:
>>>
>>>> On 18/07/16 11:21, linux@eikelenboom.it wrote:
>>>>> Hi Jan,
>>>>>
>>>>> It seems that since your patch series starting with commit:
>>>>> 2016-06-22 x86/vMSI-X: defer intercept handler registration
>>>>> 74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798
>>>>>
>>>>> The shutdown of a guest which has a PCI device passed through which
>>>>> uses MSI-X interrupts causes
>>>>> a host crash, see the splat below. Somehow it also doesn't reboot
>>>>> in 5
>>>>> seconds as it is supposed to (i don't have no-reboot on the command
>>>>> line).
>>>>>
>>>>> -- 
>>>>> Sander
>>>>>
>>>>>
>>>>> (XEN) [2016-07-16 16:03:17.069] ----[ Xen-4.8-unstable  x86_64
>>>>> debug=y  Not tainted ]----
>>>>> (XEN) [2016-07-16 16:03:17.069] CPU:    0
>>>>> (XEN) [2016-07-16 16:03:17.069] RIP:    e008:[<ffff82d0801e39de>]
>>>>> msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.069] RFLAGS: 0000000000010082   CONTEXT:
>>>>> hypervisor (d0v0)
>>>>> (XEN) [2016-07-16 16:03:17.069] rax: ffff83055c678e40   rbx:
>>>>> ffff83055c685500   rcx: 0000000000000001
>>>>> (XEN) [2016-07-16 16:03:17.069] rdx: 0000000000000000   rsi:
>>>>> 0000000000001ab0   rdi: ffff8305313b85a0
>>>>> (XEN) [2016-07-16 16:03:17.069] rbp: ffff83009fd07c78   rsp:
>>>>> ffff83009fd07c68   r8:  ffff8305356dfff0
>>>>> (XEN) [2016-07-16 16:03:17.069] r9:  ffff8305356df480   r10:
>>>>> ffff830503420c50   r11: 0000000000000282
>>>>> (XEN) [2016-07-16 16:03:17.069] r12: ffff8305313b8000   r13:
>>>>> ffff83009fd07e48   r14: ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.069] r15: ffff8305356df4a8   cr0:
>>>>> 0000000080050033   cr4: 00000000000006e0
>>>>> (XEN) [2016-07-16 16:03:17.069] cr3: 000000053639f000   cr2:
>>>>> 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.069] ds: 0000   es: 0000   fs: 0000   gs:
>>>>> 0000   ss: e010   cs: e008
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen code around <ffff82d0801e39de>
>>>>> (msixtbl_pt_unregister+0x7b/0xd9):
>>>>> (XEN) [2016-07-16 16:03:17.069]  39 42 18 74 19 48 89 ca <48> 8b
>>>>> 0a 0f
>>>>> 18 09 48 39 fa 75 ec 48 8d 7b 24 e8
>>>>> (XEN) [2016-07-16 16:03:17.069] Xen stack trace from
>>>>> rsp=ffff83009fd07c68:
>>>>> (XEN) [2016-07-16 16:03:17.069]    0000000000000000 ffff8305356df480
>>>>> ffff83009fd07ce8 ffff82d08014c394
>>>>> (XEN) [2016-07-16 16:03:17.069]    0000000000000001 ffff8305356df480
>>>>> 0000000000000293 ffff8305313b80cc
>>>>> (XEN) [2016-07-16 16:03:17.069]    000000568012ffe5 ffff8305313b8000
>>>>> ffff83009fd07cd8 ffff83009fd07e38
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000000 ffff83054e5fc000
>>>>> 00007fc25a33e004 ffff8305313b8000
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07da8 ffff82d0801629c8
>>>>> 0000000000000000 ffff83053b1191f0
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000246 ffff83009fd07d28
>>>>> ffff82d0801300ae 000000000000000e
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d080171497
>>>>> ffff83009fd07d78 000000020001d17b
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d68 0000000000000000
>>>>> ffff83009fd07d68 ffff82d080130280
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07d78 ffff82d08014d0aa
>>>>> 0000000000000202 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff8305313b8000 ffff88005716d320
>>>>> 0000000000305000 00007fc25a33e004
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07ef8 ffff82d080104b2c
>>>>> 0000000000000206 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07df8 ffff82d08018c9db
>>>>> 0000000000000cfe 0000000000000002
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000000002 ffff83054e5fc000
>>>>> ffff83009fd07e48 ffff82d08019c119
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07e38 0000000080121177
>>>>> ffff83009fd07e38 0000000000000cfe
>>>>> (XEN) [2016-07-16 16:03:17.070]    ffff83009fd07f18 0000000000000206
>>>>> 0000000c00000030 000056082bb90013
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000200000056 00007fc200000013
>>>>> 0000305600000000 000056082b87465d
>>>>> (XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 00007fc25606b31f
>>>>> 0000000000000000 000056082b8746cf
>>>>> (XEN) [2016-07-16 16:03:17.070]    0000000000001000 fee5600026820730
>>>>> 00007ffe26820740 000056082b8797be
>>>>> (XEN) [2016-07-16 16:03:17.070]    00000000fee56000 0000430026820772
>>>>> 00007ffe26820740 0000000000003056
>>>>> (XEN) [2016-07-16 16:03:17.070]    00007ffe268206e0 ffff83009ff8a000
>>>>> 00007ffe26820580 ffff88005716d320
>>>>> (XEN) [2016-07-16 16:03:17.070] Xen call trace:
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801e39de>]
>>>>> msixtbl_pt_unregister+0x7b/0xd9
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d08014c394>]
>>>>> pt_irq_destroy_bind+0x2be/0x3f0
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0801629c8>]
>>>>> arch_do_domctl+0xc77/0x2414
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d080104b2c>]
>>>>> do_domctl+0x19db/0x1d26
>>>>> (XEN) [2016-07-16 16:03:17.070]    [<ffff82d0802426bd>]
>>>>> lstar_enter+0xdd/0x137
>>>>> (XEN) [2016-07-16 16:03:17.070]
>>>>> (XEN) [2016-07-16 16:03:17.070] Pagetable walk from 0000000000000000:
>>>>> (XEN) [2016-07-16 16:03:17.070]  L4[0x000] = 0000000000000000
>>>>> ffffffffffffffff
>>>>> (XEN) [2016-07-16 16:03:18.147]
>>>>> (XEN) [2016-07-16 16:03:18.155]
>>>>> ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.175] Panic on CPU 0:
>>>>> (XEN) [2016-07-16 16:03:18.187] FATAL PAGE FAULT
>>>>> (XEN) [2016-07-16 16:03:18.200] [error_code=0000]
>>>>> (XEN) [2016-07-16 16:03:18.214] Faulting linear address:
>>>>> 0000000000000000
>>>>> (XEN) [2016-07-16 16:03:18.233]
>>>>> ****************************************
>>>>> (XEN) [2016-07-16 16:03:18.252]
>>>>> (XEN) [2016-07-16 16:03:18.261] Reboot in five seconds...
>>>>>
>>>> Can you paste the disassembly of msixtbl_pt_unregister() please?  That
>>>> is a dereference of %rdx which is NULL at this point, but I need to
>>>> figure out which pointer it is supposed to be.
>>> Hi Andrew,
>>
>> <snip>
>>
>> Thanks.  What has happened is that the msixtbl linked list is still
>> uninitialised at this point.  The only way I can see for this to happen
>> is that msixtbl_init() hasn't been called, or hasn't passed its first if
>> condition.  The INIT_LIST_HEAD() visible in the context of the 2nd hunk
>> of identified changeset is the line of code which changes the list from
>> 0 to initialised, and I don't see anywhere which re-zeros it later.
>>
>> This alone suggests that the VM in question isn't actually using MSI-X
>> interrupts, even if the device passed through is capable.
>
> Hmm didn't actually check this before, but you seem to be right
> (below is the lspci output from within the guest).

Both of those devices are using MSI interrupts - they don't even support
MSI-X.
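
The lspci output lists capabilities at [50] (Power Management), [58]
(Express) and [a0] (MSI), with no MSI-X entry.  As a rough illustration of
how that conclusion falls out of the capability list, here is a minimal
standalone sketch in C.  The capability IDs and config space offsets are the
standard PCI ones; find_cap() and build_example() are hypothetical helpers
written for this example and are not Xen or pciutils code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI configuration space offsets and capability IDs. */
#define PCI_STATUS           0x06
#define PCI_STATUS_CAP_LIST  0x10
#define PCI_CAPABILITY_LIST  0x34
#define PCI_CAP_ID_PM        0x01
#define PCI_CAP_ID_MSI       0x05
#define PCI_CAP_ID_EXP       0x10
#define PCI_CAP_ID_MSIX      0x11

/*
 * Hypothetical helper: return the offset of capability 'id' in a 256-byte
 * configuration space image, or 0 if the device does not advertise it.
 */
static uint8_t find_cap(const uint8_t cfg[256], uint8_t id)
{
    uint8_t pos;
    int ttl = 48; /* guard against malformed, looping lists */

    assert(cfg != NULL);

    if ( !(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST) )
        return 0;

    pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
    while ( pos >= 0x40 && ttl-- )
    {
        if ( cfg[pos] == id )
            return pos;
        pos = cfg[pos + 1] & 0xfc; /* byte 1 of each entry is the next pointer */
    }

    return 0;
}

/*
 * Hypothetical helper: build a config space image whose capability chain
 * mirrors the lspci output above: [50] PM -> [58] Express -> [a0] MSI -> end.
 */
static void build_example(uint8_t cfg[256])
{
    memset(cfg, 0, 256);
    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST;
    cfg[PCI_CAPABILITY_LIST] = 0x50;
    cfg[0x50] = PCI_CAP_ID_PM;   cfg[0x51] = 0x58;
    cfg[0x58] = PCI_CAP_ID_EXP;  cfg[0x59] = 0xa0;
    cfg[0xa0] = PCI_CAP_ID_MSI;  cfg[0xa1] = 0x00;
}
```

With an image built by build_example(), looking up PCI_CAP_ID_MSI finds the
entry at 0xa0, while looking up PCI_CAP_ID_MSIX walks off the end of the
chain and returns 0.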

>
>
>> Following the style of the identified changeset,
>>
>> andrewcoop@andrewcoop:/local/xen.git/xen$ git diff
>> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>> index e418b98..c533719 100644
>> --- a/xen/arch/x86/hvm/vmsi.c
>> +++ b/xen/arch/x86/hvm/vmsi.c
>> @@ -519,7 +519,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
>>      ASSERT(pcidevs_locked());
>>      ASSERT(spin_is_locked(&d->event_lock));
>>
>> -    if ( !has_vlapic(d) )
>> +    if ( !d->arch.hvm_domain.msixtbl_list.next )
>>          return;
>>
>>      irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
>>
>> should resolve your issue, although I am very tempted to replace the
>> opencoded list logic with a msixtbl_initialised() predicate instead.
>>
>> ~Andrew
>
> It does resolve the issue, thanks !

Right - I will clean up the patch tomorrow using a more logical predicate.

~Andrew


* [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
  2016-07-18 10:21 Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts linux
  2016-07-18 17:48 ` Andrew Cooper
@ 2016-07-21 10:18 ` Andrew Cooper
  2016-07-21 10:37   ` Sander Eikelenboom
                     ` (2 more replies)
  1 sibling, 3 replies; 13+ messages in thread
From: Andrew Cooper @ 2016-07-21 10:18 UTC (permalink / raw)
  To: Xen-devel; +Cc: Andrew Cooper, Jan Beulich, Sander Eikelenboom

c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
table infrastructure not to always be initialised, but it missed one path
which needed an is-initialised check.

If a device is passed through to a domain which is MSI capable but not MSI-X
capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
hypercall still calls into msixtbl_pt_unregister().  This follows the linked
list pointer which is still NULL.

Introduce an is-initialised check to msixtbl_pt_unregister().

Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
subtle.  Introduce an msixtbl_initialised() predicate instead, which makes its
purpose far more obvious.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Sander Eikelenboom <linux@eikelenboom.it>

Sander - would you mind double checking this patch?
---
 xen/arch/x86/hvm/vmsi.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index e418b98..ef1dfff 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -166,6 +166,16 @@ struct msixtbl_entry
 
 static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
 
+/*
+ * MSI-X table infrastructure is dynamically initialised when an MSI-X capable
+ * device is passed through to a domain, rather than unconditionally for all
+ * domains.
+ */
+static bool msixtbl_initialised(const struct domain *d)
+{
+    return !!d->arch.hvm_domain.msixtbl_list.next;
+}
+
 static struct msixtbl_entry *msixtbl_find_entry(
     struct vcpu *v, unsigned long addr)
 {
@@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
     ASSERT(pcidevs_locked());
     ASSERT(spin_is_locked(&d->event_lock));
 
-    if ( !has_vlapic(d) )
+    if ( !msixtbl_initialised(d) )
         return;
 
     irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
@@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d)
     struct hvm_io_handler *handler;
 
     if ( !has_hvm_container_domain(d) || !has_vlapic(d) ||
-         d->arch.hvm_domain.msixtbl_list.next )
+         msixtbl_initialised(d) )
         return;
 
     INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
@@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d)
 {
     struct msixtbl_entry *entry, *temp;
 
-    if ( !d->arch.hvm_domain.msixtbl_list.next )
+    if ( !msixtbl_initialised(d) )
         return;
 
     spin_lock(&d->event_lock);
-- 
2.1.4
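
For readers who want to see the failure mode outside of Xen, the following
is a minimal standalone sketch of the idea behind the patch, not the
hypervisor code itself.  It assumes a domain structure that starts out
zero-filled, so the list head's next pointer is NULL until the init
function runs.  All toy_* names and the simplified list type are
hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Hypothetical domain: msixtbl_list stays all-zero until toy_msixtbl_init(). */
struct toy_domain {
    struct list_head msixtbl_list;
};

/* An initialised (even empty) list head never has a NULL next pointer. */
static bool toy_msixtbl_initialised(const struct toy_domain *d)
{
    assert(d != NULL);
    return d->msixtbl_list.next != NULL;
}

/* Only reached when an MSI-X capable device is assigned to the domain. */
static void toy_msixtbl_init(struct toy_domain *d)
{
    if ( toy_msixtbl_initialised(d) )
        return;

    init_list_head(&d->msixtbl_list);
}

/*
 * Walk the list as an unregister path would.  Returns the number of entries
 * visited, or -1 if the walk was skipped because the list was never set up.
 * Without the guard, the first step of the loop would read through a NULL
 * next pointer.
 */
static int toy_msixtbl_pt_unregister(struct toy_domain *d)
{
    const struct list_head *pos;
    int visited = 0;

    if ( !toy_msixtbl_initialised(d) )
        return -1;

    for ( pos = d->msixtbl_list.next; pos != &d->msixtbl_list; pos = pos->next )
        visited++;

    return visited;
}
```

A zero-filled toy_domain takes the early return, which corresponds to the
MSI-only guest case reported in this thread; after toy_msixtbl_init() the
same call walks an empty list and visits nothing.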



* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
  2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
@ 2016-07-21 10:37   ` Sander Eikelenboom
  2016-07-22  8:50   ` Sander Eikelenboom
  2016-07-25 10:26   ` George Dunlap
  2 siblings, 0 replies; 13+ messages in thread
From: Sander Eikelenboom @ 2016-07-21 10:37 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel; +Cc: Jan Beulich



On July 21, 2016 12:18:37 PM GMT+02:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
>table infrastructure not to always be initialised, but it missed one path
>which needed an is-initialised check.
>
>If a device is passed through to a domain which is MSI capable but not MSI-X
>capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
>hypercall still calls into msixtbl_pt_unregister().  This follows the linked
>list pointer which is still NULL.
>
>Introduce an is-initialised check to msixtbl_pt_unregister().
>
>Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
>subtle.  Introduce an msixtbl_initialised() predicate instead, which makes its
>purpose far more obvious.
>
>Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>---
>CC: Jan Beulich <JBeulich@suse.com>
>CC: Sander Eikelenboom <linux@eikelenboom.it>
>
>Sander - would you mind double checking this patch?
>---

Sure, will report back tomorrow.

--
Sander

> xen/arch/x86/hvm/vmsi.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
>diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
>index e418b98..ef1dfff 100644
>--- a/xen/arch/x86/hvm/vmsi.c
>+++ b/xen/arch/x86/hvm/vmsi.c
>@@ -166,6 +166,16 @@ struct msixtbl_entry
> 
> static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
> 
>+/*
>+ * MSI-X table infrastructure is dynamically initialised when an MSI-X capable
>+ * device is passed through to a domain, rather than unconditionally for all
>+ * domains.
>+ */
>+static bool msixtbl_initialised(const struct domain *d)
>+{
>+    return !!d->arch.hvm_domain.msixtbl_list.next;
>+}
>+
> static struct msixtbl_entry *msixtbl_find_entry(
>     struct vcpu *v, unsigned long addr)
> {
>@@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
>     ASSERT(pcidevs_locked());
>     ASSERT(spin_is_locked(&d->event_lock));
> 
>-    if ( !has_vlapic(d) )
>+    if ( !msixtbl_initialised(d) )
>         return;
> 
>     irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
>@@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d)
>     struct hvm_io_handler *handler;
> 
>     if ( !has_hvm_container_domain(d) || !has_vlapic(d) ||
>-         d->arch.hvm_domain.msixtbl_list.next )
>+         msixtbl_initialised(d) )
>         return;
> 
>     INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
>@@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d)
> {
>     struct msixtbl_entry *entry, *temp;
> 
>-    if ( !d->arch.hvm_domain.msixtbl_list.next )
>+    if ( !msixtbl_initialised(d) )
>         return;
> 
>     spin_lock(&d->event_lock);



* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
  2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
  2016-07-21 10:37   ` Sander Eikelenboom
@ 2016-07-22  8:50   ` Sander Eikelenboom
  2016-07-25 10:16     ` Andrew Cooper
  2016-07-25 10:26   ` George Dunlap
  2 siblings, 1 reply; 13+ messages in thread
From: Sander Eikelenboom @ 2016-07-22  8:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel


Thursday, July 21, 2016, 12:18:37 PM, you wrote:

> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
> table infrastructure not to always be initialised, but it missed one path
> which needed an is-initialised check.

> If a device is passed through to a domain which is MSI capable but not MSI-X
> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
> hypercall still calls into msixtbl_pt_unregister().  This follows the linked
> list pointer which is still NULL.

> Introduce an is-initialised check to msixtbl_pt_unregister().

> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
> subtle.  Introduce an msixtbl_initialised() predicate instead, which makes its
> purpose far more obvious.

> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Sander Eikelenboom <linux@eikelenboom.it>

> Sander - would you mind double checking this patch?
> ---

Hi Andrew,

Just got the chance to test and it works for me !

Thanks,

Sander


>  xen/arch/x86/hvm/vmsi.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)

> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index e418b98..ef1dfff 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -166,6 +166,16 @@ struct msixtbl_entry
>  
>  static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
>  
> +/*
> + * MSI-X table infrastructure is dynamically initialised when an MSI-X capable
> + * device is passed through to a domain, rather than unconditionally for all
> + * domains.
> + */
> +static bool msixtbl_initialised(const struct domain *d)
> +{
> +    return !!d->arch.hvm_domain.msixtbl_list.next;
> +}
> +
>  static struct msixtbl_entry *msixtbl_find_entry(
>      struct vcpu *v, unsigned long addr)
>  {
> @@ -519,7 +529,7 @@ void msixtbl_pt_unregister(struct domain *d, struct pirq *pirq)
>      ASSERT(pcidevs_locked());
>      ASSERT(spin_is_locked(&d->event_lock));
>  
> -    if ( !has_vlapic(d) )
> +    if ( !msixtbl_initialised(d) )
>          return;
>  
>      irq_desc = pirq_spin_lock_irq_desc(pirq, NULL);
> @@ -552,7 +562,7 @@ void msixtbl_init(struct domain *d)
>      struct hvm_io_handler *handler;
>  
>      if ( !has_hvm_container_domain(d) || !has_vlapic(d) ||
> -         d->arch.hvm_domain.msixtbl_list.next )
> +         msixtbl_initialised(d) )
>          return;
>  
>      INIT_LIST_HEAD(&d->arch.hvm_domain.msixtbl_list);
> @@ -569,7 +579,7 @@ void msixtbl_pt_cleanup(struct domain *d)
>  {
>      struct msixtbl_entry *entry, *temp;
>  
> -    if ( !d->arch.hvm_domain.msixtbl_list.next )
> +    if ( !msixtbl_initialised(d) )
>          return;
>  
>      spin_lock(&d->event_lock);



* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
  2016-07-22  8:50   ` Sander Eikelenboom
@ 2016-07-25 10:16     ` Andrew Cooper
  2016-07-25 10:19       ` Andrew Cooper
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-07-25 10:16 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: Jan Beulich, Xen-devel

On 22/07/16 09:50, Sander Eikelenboom wrote:
> Thursday, July 21, 2016, 12:18:37 PM, you wrote:
>
>> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
>> table infrastructure not to always be initialised, but it missed one path
>> which needed an is-initialised check.
>> If a device is passed through to a domain which is MSI capable but not MSI-X
>> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
>> hypercall still calls into msixtbl_pt_unregister().  This follows the linked
>> list pointer which is still NULL.
>> Introduce an is-initialised check to msixtbl_pt_unregister().
>> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
>> subtle.  Introduce an msixtbl_initialised() predicate instead, which makes its
>> purpose far more obvious.
>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Sander Eikelenboom <linux@eikelenboom.it>
>> Sander - would you mind double checking this patch?
>> ---
> Hi Andrew,
>
> Just got the chance to test and it works for me !
>
> Thanks,

May I take that as a Test-by: then please?

~Andrew


* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
  2016-07-25 10:16     ` Andrew Cooper
@ 2016-07-25 10:19       ` Andrew Cooper
  2016-07-25 10:23         ` Sander Eikelenboom
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2016-07-25 10:19 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: Jan Beulich, Xen-devel

On 25/07/16 11:16, Andrew Cooper wrote:
> On 22/07/16 09:50, Sander Eikelenboom wrote:
>> Thursday, July 21, 2016, 12:18:37 PM, you wrote:
>>
>>> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
>>> table infrastructure not to always be initialised, but it missed one path
>>> which needed an is-initialised check.
>>> If a device is passed through to a domain which is MSI capable but not MSI-X
>>> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
>>> hypercall still calls into msixtbl_pt_unregister().  This follows the linked
>>> list pointer which is still NULL.
>>> Introduce an is-initialised check to msixtbl_pt_unregister().
>>> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
>>> subtle.  Introduce an msixtbl_initialised() predicate instead, which makes its
>>> purpose far more obvious.
>>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: Sander Eikelenboom <linux@eikelenboom.it>
>>> Sander - would you mind double checking this patch?
>>> ---
>> Hi Andrew,
>>
>> Just got the chance to test and it works for me !
>>
>> Thanks,
> May I take that as a Test-by: then please?

And of course, I meant Tested-by:

~Andrew



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
  2016-07-25 10:19       ` Andrew Cooper
@ 2016-07-25 10:23         ` Sander Eikelenboom
  0 siblings, 0 replies; 13+ messages in thread
From: Sander Eikelenboom @ 2016-07-25 10:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, Xen-devel


Monday, July 25, 2016, 12:19:55 PM, you wrote:

> On 25/07/16 11:16, Andrew Cooper wrote:
>> On 22/07/16 09:50, Sander Eikelenboom wrote:
>>> Thursday, July 21, 2016, 12:18:37 PM, you wrote:
>>>
>>>> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
>>>> table infrastructure not to always be initialised, but it missed one path
>>>> which needed an is-initialised check.
>>>> If a device is passed through to a domain which is MSI capable but not MSI-X
>>>> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
>>>> hypercall still calls into msixtbl_pt_unregister().  This follows the linked
>>>> list pointer which is still NULL.
>>>> Introduce an is-initialised check to msixtbl_pt_unregister().
>>>> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
>>>> subtle.  Introduce an msixtbl_initialised() predicate instead, which makes its
>>>> purpose far more obvious.
>>>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> ---
>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>> CC: Sander Eikelenboom <linux@eikelenboom.it>
>>>> Sander - would you mind double checking this patch?
>>>> ---
>>> Hi Andrew,
>>>
>>> Just got the chance to test and it works for me !
>>>
>>> Thanks,
>> May I take that as a Test-by: then please?

> And of course, I meant Tested-by:

Yes, thanks for the quick fix !

--
Sander

> ~Andrew




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
  2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
  2016-07-21 10:37   ` Sander Eikelenboom
  2016-07-22  8:50   ` Sander Eikelenboom
@ 2016-07-25 10:26   ` George Dunlap
  2 siblings, 0 replies; 13+ messages in thread
From: George Dunlap @ 2016-07-25 10:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Sander Eikelenboom, Jan Beulich, Xen-devel

On Thu, Jul 21, 2016 at 11:18 AM, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
> table infrastructure not to always be initialised, but it missed one path
> which needed an is-initialised check.
>
> If a device is passed through to a domain which is MSI capable but not MSI-X
> capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
> hypercall still calls into msixtbl_pt_unregister().  This follows the linked
> list pointer which is still NULL.
>
> Introduce an is-initialised check to msixtbl_pt_unregister().
>
> Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
> subtle.  Introduce an msixtbl_initialised() predicate instead, which makes its
> purpose far more obvious.

Thanks for this bit.

> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: George Dunlap <george.dunlap@citrix.com>
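
For readers following the thread, here is a minimal standalone sketch of the
pattern the commit message describes. It is not the Xen source: the struct
layouts, the return values and the body of the list walk are assumptions made
for illustration. Only the names msixtbl_init(), msixtbl_initialised(),
msixtbl_pt_unregister() and the msixtbl_list.next check come from the patch
description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Xen's list head and domain structures. */
struct list_head { struct list_head *next, *prev; };

struct domain {
    /* Stays zero-filled (next == NULL) until msixtbl_init() has run. */
    struct list_head msixtbl_list;
};

/* Models msixtbl_init(): an empty circular list points back at itself. */
static void msixtbl_init(struct domain *d)
{
    d->msixtbl_list.next = &d->msixtbl_list;
    d->msixtbl_list.prev = &d->msixtbl_list;
}

/* Named predicate replacing an open-coded msixtbl_list.next check. */
static bool msixtbl_initialised(const struct domain *d)
{
    return d->msixtbl_list.next != NULL;
}

/*
 * Models the unregister path.  Returns the number of list entries walked,
 * or -1 if the MSI-X table infrastructure was never set up (an assumed
 * convention for this sketch only).
 */
static int msixtbl_pt_unregister(struct domain *d)
{
    const struct list_head *pos;
    int walked = 0;

    /* Without this check the loop below would dereference a NULL next. */
    if ( !msixtbl_initialised(d) )
        return -1;

    for ( pos = d->msixtbl_list.next; pos != &d->msixtbl_list; pos = pos->next )
        walked++;

    return walked;
}
```

In this model, a domain that only ever had an MSI (not MSI-X) capable device
never calls msixtbl_init(), so msixtbl_pt_unregister() returns early instead of
following the NULL pointer.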


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-07-25 10:26 UTC | newest]

Thread overview: 13+ messages
2016-07-18 10:21 Xen-unstable 4.8: Host crash when shutting down guest with pci device passed through using MSI-X interrupts linux
2016-07-18 17:48 ` Andrew Cooper
2016-07-18 19:26   ` Sander Eikelenboom
2016-07-18 20:57     ` Andrew Cooper
2016-07-18 22:03       ` linux
2016-07-18 22:07         ` Andrew Cooper
2016-07-21 10:18 ` [PATCH] x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices Andrew Cooper
2016-07-21 10:37   ` Sander Eikelenboom
2016-07-22  8:50   ` Sander Eikelenboom
2016-07-25 10:16     ` Andrew Cooper
2016-07-25 10:19       ` Andrew Cooper
2016-07-25 10:23         ` Sander Eikelenboom
2016-07-25 10:26   ` George Dunlap
