xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* Deadlock in stdvga_mem_accept() with emulation
@ 2015-07-13  7:48 Razvan Cojocaru
  2015-07-13  8:10 ` Andrew Cooper
  0 siblings, 1 reply; 17+ messages in thread
From: Razvan Cojocaru @ 2015-07-13  7:48 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich

Hello,

I'm battling the following hypervisor crash with current staging:

(d2) Invoking ROMBIOS ...
(XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
(d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
(XEN) Watchdog timer detects that CPU7 is stuck!
(XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    7
(XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
(XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
(XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx: 000000000000c11e
(XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi: ffff83041e687978
(XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11: 0000000000000000
(XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14: ffff83040eb37278
(XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4: 00000000001526e0
(XEN) cr3: 00000004054a0000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff83040eb37200:
(XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6 0000000000000282
(XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000 ffff83040eb37268
(XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000 0000000000000001
(XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410 00000000000c253f
(XEN)    0000000000000000 0000000100000001 0100000000000000 ffff83040eb37328
(XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000 0000000000000000
(XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000 000000000000053f
(XEN)    0000000000000002 0000000000000000 ffff83040eb373f4 00000000000000c2
(XEN)    ffff83040eb373d8 0000000000000000 0000000000000000 ffff82d08028c620
(XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d ffff83040eb37398
(XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0 ffff830403791bf0
(XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0 0000000000000001
(XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f ffff83040eb373c8
(XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000 0000000000000001
(XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440 00000000000a0001
(XEN)    00000000000c253f 0000000100000001 0111000000000000 ffff83040eb37478
(XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000001
(XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9 0000000000000001
(XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001 0100000000000000
(XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000 ffff83040eb374c8
(XEN) Xen call trace:
(XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
(XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
(XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
(XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
(XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
(XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
(XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
(XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
(XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
(XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
(XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
(XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
(XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
(XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
(XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
(XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
(XEN)    [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
(XEN)    [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
(XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
(XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 7:
(XEN) FATAL TRAP: vector = 2 (nmi)
(XEN) [error_code=0000]
(XEN) ****************************************

At first I thought it was caused by V5 of the vm_event-based
introspection series, but I've rolled it back enough to apply V4 on top
of it (which has been thoroughly tested on Thursday), and it still
happens, so this would at least appear to be unrelated at this point
(other than the fact that our use case is maybe somewhat unusual with
heavy emulation).

I'll keep digging, but since this is a busy time for Xen I thought I'd
issue a heads-up here as soon as possible, in case the problem is
obvious for somebody and it helps getting it fixed sooner.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  7:48 Deadlock in stdvga_mem_accept() with emulation Razvan Cojocaru
@ 2015-07-13  8:10 ` Andrew Cooper
  2015-07-13  8:50   ` Razvan Cojocaru
  2015-07-13  8:51   ` Paul Durrant
  0 siblings, 2 replies; 17+ messages in thread
From: Andrew Cooper @ 2015-07-13  8:10 UTC (permalink / raw)
  To: Razvan Cojocaru, xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

On 13/07/2015 08:48, Razvan Cojocaru wrote:
> Hello,
>
> I'm battling the following hypervisor crash with current staging:
>
> (d2) Invoking ROMBIOS ...
> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
> (XEN) Watchdog timer detects that CPU7 is stuck!
> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    7
> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx: 000000000000c11e
> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi: ffff83041e687978
> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:  0000000000000000
> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11: 0000000000000000
> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14: ffff83040eb37278
> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4: 00000000001526e0
> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff83040eb37200:
> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6 0000000000000282
> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000 ffff83040eb37268
> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000 0000000000000001
> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410 00000000000c253f
> (XEN)    0000000000000000 0000000100000001 0100000000000000 ffff83040eb37328
> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000 0000000000000000
> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000 000000000000053f
> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4 00000000000000c2
> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000 ffff82d08028c620
> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d ffff83040eb37398
> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0 ffff830403791bf0
> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0 0000000000000001
> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f ffff83040eb373c8
> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000 0000000000000001
> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440 00000000000a0001
> (XEN)    00000000000c253f 0000000100000001 0111000000000000 ffff83040eb37478
> (XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000001
> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9 0000000000000001
> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001 0100000000000000
> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000 ffff83040eb374c8
> (XEN) Xen call trace:
> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
> (XEN)    [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
> (XEN)    [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 7:
> (XEN) FATAL TRAP: vector = 2 (nmi)
> (XEN) [error_code=0000]
> (XEN) ****************************************
>
> At first I thought it was caused by V5 of the vm_event-based
> introspection series, but I've rolled it back enough to apply V4 on top
> of it (which has been thoroughly tested on Thursday), and it still
> happens, so this would at least appear to be unrelated at this point
> (other than the fact that our use case is maybe somewhat unusual with
> heavy emulation).
>
> I'll keep digging, but since this is a busy time for Xen I thought I'd
> issue a heads-up here as soon as possible, in case the problem is
> obvious for somebody and it helps getting it fixed sooner.

In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
deliberate split between stdvga_mem_accept() and stdvga_mem_complete()
about locking and unlocking the stdvga lock.

At a guess, the previous chain of execution accidentally omitted the
stdvga_mem_complete() call.

~Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  8:10 ` Andrew Cooper
@ 2015-07-13  8:50   ` Razvan Cojocaru
  2015-07-13  9:00     ` Andrew Cooper
  2015-07-13  9:01     ` Paul Durrant
  2015-07-13  8:51   ` Paul Durrant
  1 sibling, 2 replies; 17+ messages in thread
From: Razvan Cojocaru @ 2015-07-13  8:50 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

On 07/13/2015 11:10 AM, Andrew Cooper wrote:
> On 13/07/2015 08:48, Razvan Cojocaru wrote:
>> Hello,
>>
>> I'm battling the following hypervisor crash with current staging:
>>
>> (d2) Invoking ROMBIOS ...
>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
>> (XEN) Watchdog timer detects that CPU7 is stuck!
>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
>> (XEN) CPU:    7
>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx: 000000000000c11e
>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi: ffff83041e687978
>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:  0000000000000000
>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11: 0000000000000000
>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14: ffff83040eb37278
>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4: 00000000001526e0
>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6 0000000000000282
>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000 ffff83040eb37268
>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000 0000000000000001
>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410 00000000000c253f
>> (XEN)    0000000000000000 0000000100000001 0100000000000000 ffff83040eb37328
>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000 0000000000000000
>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000 000000000000053f
>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4 00000000000000c2
>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000 ffff82d08028c620
>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d ffff83040eb37398
>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0 ffff830403791bf0
>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0 0000000000000001
>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f ffff83040eb373c8
>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000 0000000000000001
>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440 00000000000a0001
>> (XEN)    00000000000c253f 0000000100000001 0111000000000000 ffff83040eb37478
>> (XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000001
>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9 0000000000000001
>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001 0100000000000000
>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000 ffff83040eb374c8
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
>> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
>> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
>> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
>> (XEN)    [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
>> (XEN)    [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
>> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
>> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 7:
>> (XEN) FATAL TRAP: vector = 2 (nmi)
>> (XEN) [error_code=0000]
>> (XEN) ****************************************
>>
>> At first I thought it was caused by V5 of the vm_event-based
>> introspection series, but I've rolled it back enough to apply V4 on top
>> of it (which has been thoroughly tested on Thursday), and it still
>> happens, so this would at least appear to be unrelated at this point
>> (other than the fact that our use case is maybe somewhat unusual with
>> heavy emulation).
>>
>> I'll keep digging, but since this is a busy time for Xen I thought I'd
>> issue a heads-up here as soon as possible, in case the problem is
>> obvious for somebody and it helps getting it fixed sooner.
> 
> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
> deliberate split between stdvga_mem_accept() and stdvga_mem_complete()
> about locking and unlocking the stdvga lock.
> 
> At a guess, the previous chain of execution accidentally omitted the
> stdvga_mem_complete() call.

Thanks, I've reverted that patch and the crash is gone. I'll be happy to
test a fix if one is provided, but I don't know enough about that code
to go mess with it...


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  8:10 ` Andrew Cooper
  2015-07-13  8:50   ` Razvan Cojocaru
@ 2015-07-13  8:51   ` Paul Durrant
  1 sibling, 0 replies; 17+ messages in thread
From: Paul Durrant @ 2015-07-13  8:51 UTC (permalink / raw)
  To: Andrew Cooper, Razvan Cojocaru, xen-devel; +Cc: Keir (Xen.org), Jan Beulich

> -----Original Message-----
> From: Andrew Cooper [mailto:amc96@hermes.cam.ac.uk] On Behalf Of
> Andrew Cooper
> Sent: 13 July 2015 09:11
> To: Razvan Cojocaru; xen-devel@lists.xen.org
> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
> 
> On 13/07/2015 08:48, Razvan Cojocaru wrote:
> > Hello,
> >
> > I'm battling the following hypervisor crash with current staging:
> >
> > (d2) Invoking ROMBIOS ...
> > (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
> > (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
> > (XEN) Watchdog timer detects that CPU7 is stuck!
> > (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
> > (XEN) CPU:    7
> > (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> > (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
> > (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
> 000000000000c11e
> > (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi: ffff83041e687978
> > (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:  0000000000000000
> > (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11: 0000000000000000
> > (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14: ffff83040eb37278
> > (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
> 00000000001526e0
> > (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen stack trace from rsp=ffff83040eb37200:
> > (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
> 0000000000000282
> > (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
> ffff83040eb37268
> > (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
> 0000000000000001
> > (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
> 00000000000c253f
> > (XEN)    0000000000000000 0000000100000001 0100000000000000
> ffff83040eb37328
> > (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
> 0000000000000000
> > (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
> 000000000000053f
> > (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
> 00000000000000c2
> > (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
> ffff82d08028c620
> > (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
> ffff83040eb37398
> > (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
> ffff830403791bf0
> > (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
> 0000000000000001
> > (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
> ffff83040eb373c8
> > (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
> 0000000000000001
> > (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
> 00000000000a0001
> > (XEN)    00000000000c253f 0000000100000001 0111000000000000
> ffff83040eb37478
> > (XEN)    0000000000000001 0000000000000000 0000000000000000
> 0000000000000001
> > (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
> 0000000000000001
> > (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
> 0100000000000000
> > (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
> ffff83040eb374c8
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> > (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> > (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> > (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> > (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> > (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
> > (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
> > (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> > (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
> > (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
> > (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
> > (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
> > (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
> > (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
> > (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
> > (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
> > (XEN)    [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
> > (XEN)    [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
> > (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
> > (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 7:
> > (XEN) FATAL TRAP: vector = 2 (nmi)
> > (XEN) [error_code=0000]
> > (XEN) ****************************************
> >
> > At first I thought it was caused by V5 of the vm_event-based
> > introspection series, but I've rolled it back enough to apply V4 on top
> > of it (which has been thoroughly tested on Thursday), and it still
> > happens, so this would at least appear to be unrelated at this point
> > (other than the fact that our use case is maybe somewhat unusual with
> > heavy emulation).
> >
> > I'll keep digging, but since this is a busy time for Xen I thought I'd
> > issue a heads-up here as soon as possible, in case the problem is
> > obvious for somebody and it helps getting it fixed sooner.
> 
> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
> deliberate split between stdvga_mem_accept() and
> stdvga_mem_complete()
> about locking and unlocking the stdvga lock.
> 
> At a guess, the previous chain of execution accidentally omitted the
> stdvga_mem_complete() call.
> 

I can't see a way for that to happen. The only time the accept() call is made is in hvm_find_io_handler(). That function is only called from hvm_io_intercept() or hvm_mmio_internal(). In either case the handler's complete() call is made before returning.

I think the problem here is indicated by the stack:

> > (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> > (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> > (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> > (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> > (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
> > (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
> > (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> > (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1

What seems to be happening is that the loop in hvm_process_io_intercept() is testing for mmio space (in this case VRAM), which is then re-entering the I/O code (via a check for an internal I/O range) and trying to re-acquire the stdvga lock. I am not sure why that would be happening, since the check in hvmemul_do_io_addr() makes sure that the target of the copy_to/from_guest_phys is most definitely RAM.

  Paul
 
> ~Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  8:50   ` Razvan Cojocaru
@ 2015-07-13  9:00     ` Andrew Cooper
  2015-07-13  9:02       ` Razvan Cojocaru
  2015-07-13  9:01     ` Paul Durrant
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Cooper @ 2015-07-13  9:00 UTC (permalink / raw)
  To: Razvan Cojocaru, xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

On 13/07/15 09:50, Razvan Cojocaru wrote:
> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
>>> Hello,
>>>
>>> I'm battling the following hypervisor crash with current staging:
>>>
>>> (d2) Invoking ROMBIOS ...
>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
>>> (XEN) Watchdog timer detects that CPU7 is stuck!
>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
>>> (XEN) CPU:    7
>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx: 000000000000c11e
>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi: ffff83041e687978
>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:  0000000000000000
>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11: 0000000000000000
>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14: ffff83040eb37278
>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4: 00000000001526e0
>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6 0000000000000282
>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000 ffff83040eb37268
>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000 0000000000000001
>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410 00000000000c253f
>>> (XEN)    0000000000000000 0000000100000001 0100000000000000 ffff83040eb37328
>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000 0000000000000000
>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000 000000000000053f
>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4 00000000000000c2
>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000 ffff82d08028c620
>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d ffff83040eb37398
>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0 ffff830403791bf0
>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0 0000000000000001
>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f ffff83040eb373c8
>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000 0000000000000001
>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440 00000000000a0001
>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000 ffff83040eb37478
>>> (XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000001
>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9 0000000000000001
>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001 0100000000000000
>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000 ffff83040eb374c8
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
>>> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
>>> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
>>> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
>>> (XEN)    [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
>>> (XEN)    [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
>>> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
>>> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 7:
>>> (XEN) FATAL TRAP: vector = 2 (nmi)
>>> (XEN) [error_code=0000]
>>> (XEN) ****************************************
>>>
>>> At first I thought it was caused by V5 of the vm_event-based
>>> introspection series, but I've rolled it back enough to apply V4 on top
>>> of it (which has been thoroughly tested on Thursday), and it still
>>> happens, so this would at least appear to be unrelated at this point
>>> (other than the fact that our use case is maybe somewhat unusual with
>>> heavy emulation).
>>>
>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
>>> issue a heads-up here as soon as possible, in case the problem is
>>> obvious for somebody and it helps getting it fixed sooner.
>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
>> deliberate split between stdvga_mem_accept() and stdvga_mem_complete()
>> about locking and unlocking the stdvga lock.
>>
>> At a guess, the previous chain of execution accidentally omitted the
>> stdvga_mem_complete() call.
> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
> test a fix if one is provided, but I don't know enough about that code
> to go mess with it...

Thanks - that is a useful datapoint.

What exactly are you attempting to boot in this VM?

~Andrew

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  8:50   ` Razvan Cojocaru
  2015-07-13  9:00     ` Andrew Cooper
@ 2015-07-13  9:01     ` Paul Durrant
  2015-07-13  9:03       ` Razvan Cojocaru
  1 sibling, 1 reply; 17+ messages in thread
From: Paul Durrant @ 2015-07-13  9:01 UTC (permalink / raw)
  To: Razvan Cojocaru, Andrew Cooper, xen-devel; +Cc: Keir (Xen.org), Jan Beulich

> -----Original Message-----
> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> Sent: 13 July 2015 09:50
> To: Andrew Cooper; xen-devel@lists.xen.org
> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
> 
> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
> > On 13/07/2015 08:48, Razvan Cojocaru wrote:
> >> Hello,
> >>
> >> I'm battling the following hypervisor crash with current staging:
> >>
> >> (d2) Invoking ROMBIOS ...
> >> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
> >> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
> >> (XEN) Watchdog timer detects that CPU7 is stuck!
> >> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
> >> (XEN) CPU:    7
> >> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> >> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
> >> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
> 000000000000c11e
> >> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi: ffff83041e687978
> >> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
> 0000000000000000
> >> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
> 0000000000000000
> >> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14: ffff83040eb37278
> >> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
> 00000000001526e0
> >> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
> >> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> >> (XEN) Xen stack trace from rsp=ffff83040eb37200:
> >> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
> 0000000000000282
> >> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
> ffff83040eb37268
> >> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
> 0000000000000001
> >> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
> 00000000000c253f
> >> (XEN)    0000000000000000 0000000100000001 0100000000000000
> ffff83040eb37328
> >> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
> 0000000000000000
> >> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
> 000000000000053f
> >> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
> 00000000000000c2
> >> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
> ffff82d08028c620
> >> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
> ffff83040eb37398
> >> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
> ffff830403791bf0
> >> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
> 0000000000000001
> >> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
> ffff83040eb373c8
> >> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
> 0000000000000001
> >> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
> 00000000000a0001
> >> (XEN)    00000000000c253f 0000000100000001 0111000000000000
> ffff83040eb37478
> >> (XEN)    0000000000000001 0000000000000000 0000000000000000
> 0000000000000001
> >> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
> 0000000000000001
> >> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
> 0100000000000000
> >> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
> ffff83040eb374c8
> >> (XEN) Xen call trace:
> >> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> >> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> >> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> >> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> >> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> >> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
> >> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
> >> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> >> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
> >> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
> >> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
> >> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
> >> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
> >> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
> >> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
> >> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
> >> (XEN)    [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
> >> (XEN)    [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
> >> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
> >> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
> >> (XEN)
> >> (XEN)
> >> (XEN) ****************************************
> >> (XEN) Panic on CPU 7:
> >> (XEN) FATAL TRAP: vector = 2 (nmi)
> >> (XEN) [error_code=0000]
> >> (XEN) ****************************************
> >>
> >> At first I thought it was caused by V5 of the vm_event-based
> >> introspection series, but I've rolled it back enough to apply V4 on top
> >> of it (which has been thoroughly tested on Thursday), and it still
> >> happens, so this would at least appear to be unrelated at this point
> >> (other than the fact that our use case is maybe somewhat unusual with
> >> heavy emulation).
> >>
> >> I'll keep digging, but since this is a busy time for Xen I thought I'd
> >> issue a heads-up here as soon as possible, in case the problem is
> >> obvious for somebody and it helps getting it fixed sooner.
> >
> > In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
> > deliberate split between stdvga_mem_accept() and
> stdvga_mem_complete()
> > about locking and unlocking the stdvga lock.
> >
> > At a guess, the previous chain of execution accidentally omitted the
> > stdvga_mem_complete() call.
> 
> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
> test a fix if one is provided, but I don't know enough about that code
> to go mess with it...
> 

The problem appears to be that __hvm_copy() is calling this code before establishing the type of the page:

        /*
         * No need to do the P2M lookup for internally handled MMIO, benefiting
         * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
         * - newer Windows (like Server 2012) for HPET accesses.
         */
        if ( !nestedhvm_vcpu_in_guestmode(curr)
             && is_hvm_vcpu(curr)
             && hvm_mmio_internal(gpa) )
            return HVMCOPY_bad_gfn_to_mfn;

...and that call to hvm_mmio_internal is trying to re-acquire the lock. The big question is why on earth was I not hitting this every time in testing too... This patch series has gone through some pretty rigorous tests. Anyway, I will post a fix very shortly (since I know what needs to be done).

  Paul

> 
> Thanks,
> Razvan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  9:00     ` Andrew Cooper
@ 2015-07-13  9:02       ` Razvan Cojocaru
  0 siblings, 0 replies; 17+ messages in thread
From: Razvan Cojocaru @ 2015-07-13  9:02 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Paul Durrant, Keir Fraser, Jan Beulich

On 07/13/2015 12:00 PM, Andrew Cooper wrote:
> On 13/07/15 09:50, Razvan Cojocaru wrote:
>> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
>>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
>>>> Hello,
>>>>
>>>> I'm battling the following hypervisor crash with current staging:
>>>>
>>>> (d2) Invoking ROMBIOS ...
>>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
>>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
>>>> (XEN) Watchdog timer detects that CPU7 is stuck!
>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
>>>> (XEN) CPU:    7
>>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
>>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx: 000000000000c11e
>>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi: ffff83041e687978
>>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:  0000000000000000
>>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11: 0000000000000000
>>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14: ffff83040eb37278
>>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4: 00000000001526e0
>>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
>>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6 0000000000000282
>>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000 ffff83040eb37268
>>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000 0000000000000001
>>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410 00000000000c253f
>>>> (XEN)    0000000000000000 0000000100000001 0100000000000000 ffff83040eb37328
>>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000 0000000000000000
>>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000 000000000000053f
>>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4 00000000000000c2
>>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000 ffff82d08028c620
>>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d ffff83040eb37398
>>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0 ffff830403791bf0
>>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0 0000000000000001
>>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f ffff83040eb373c8
>>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000 0000000000000001
>>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440 00000000000a0001
>>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000 ffff83040eb37478
>>>> (XEN)    0000000000000001 0000000000000000 0000000000000000 0000000000000001
>>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9 0000000000000001
>>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001 0100000000000000
>>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000 ffff83040eb374c8
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
>>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
>>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
>>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
>>>> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
>>>> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
>>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
>>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
>>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
>>>> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
>>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
>>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
>>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
>>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
>>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
>>>> (XEN)    [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
>>>> (XEN)    [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
>>>> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
>>>> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
>>>> (XEN)
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 7:
>>>> (XEN) FATAL TRAP: vector = 2 (nmi)
>>>> (XEN) [error_code=0000]
>>>> (XEN) ****************************************
>>>>
>>>> At first I thought it was caused by V5 of the vm_event-based
>>>> introspection series, but I've rolled it back enough to apply V4 on top
>>>> of it (which has been thoroughly tested on Thursday), and it still
>>>> happens, so this would at least appear to be unrelated at this point
>>>> (other than the fact that our use case is maybe somewhat unusual with
>>>> heavy emulation).
>>>>
>>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
>>>> issue a heads-up here as soon as possible, in case the problem is
>>>> obvious for somebody and it helps getting it fixed sooner.
>>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
>>> deliberate split between stdvga_mem_accept() and stdvga_mem_complete()
>>> about locking and unlocking the stdvga lock.
>>>
>>> At a guess, the previous chain of execution accidentally omitted the
>>> stdvga_mem_complete() call.
>> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
>> test a fix if one is provided, but I don't know enough about that code
>> to go mess with it...
> 
> Thanks - that is a useful datapoint.
> 
> What exactly are you attempting to boot in this VM?

It's just a plain HVM Windows 7 64-bit guest. There's really nothing
special about it.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  9:01     ` Paul Durrant
@ 2015-07-13  9:03       ` Razvan Cojocaru
  2015-07-13  9:05         ` Paul Durrant
  0 siblings, 1 reply; 17+ messages in thread
From: Razvan Cojocaru @ 2015-07-13  9:03 UTC (permalink / raw)
  To: Paul Durrant, Andrew Cooper, xen-devel; +Cc: Keir (Xen.org), Jan Beulich

On 07/13/2015 12:01 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
>> Sent: 13 July 2015 09:50
>> To: Andrew Cooper; xen-devel@lists.xen.org
>> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
>> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
>>
>> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
>>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
>>>> Hello,
>>>>
>>>> I'm battling the following hypervisor crash with current staging:
>>>>
>>>> (d2) Invoking ROMBIOS ...
>>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
>>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
>>>> (XEN) Watchdog timer detects that CPU7 is stuck!
>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
>>>> (XEN) CPU:    7
>>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
>>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
>> 000000000000c11e
>>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi: ffff83041e687978
>>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
>> 0000000000000000
>>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
>> 0000000000000000
>>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14: ffff83040eb37278
>>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
>> 00000000001526e0
>>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
>>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
>> 0000000000000282
>>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
>> ffff83040eb37268
>>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
>> 0000000000000001
>>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
>> 00000000000c253f
>>>> (XEN)    0000000000000000 0000000100000001 0100000000000000
>> ffff83040eb37328
>>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
>> 0000000000000000
>>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
>> 000000000000053f
>>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
>> 00000000000000c2
>>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
>> ffff82d08028c620
>>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
>> ffff83040eb37398
>>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
>> ffff830403791bf0
>>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
>> 0000000000000001
>>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
>> ffff83040eb373c8
>>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
>> 0000000000000001
>>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
>> 00000000000a0001
>>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000
>> ffff83040eb37478
>>>> (XEN)    0000000000000001 0000000000000000 0000000000000000
>> 0000000000000001
>>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
>> 0000000000000001
>>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
>> 0100000000000000
>>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
>> ffff83040eb374c8
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
>>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
>>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
>>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
>>>> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
>>>> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
>>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
>>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
>>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
>>>> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
>>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
>>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
>>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
>>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
>>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
>>>> (XEN)    [<ffff82d0801cb700>] handle_mmio_with_translation+0x44/0x46
>>>> (XEN)    [<ffff82d0801c27f6>] hvm_hap_nested_page_fault+0x15f/0x589
>>>> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
>>>> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
>>>> (XEN)
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 7:
>>>> (XEN) FATAL TRAP: vector = 2 (nmi)
>>>> (XEN) [error_code=0000]
>>>> (XEN) ****************************************
>>>>
>>>> At first I thought it was caused by V5 of the vm_event-based
>>>> introspection series, but I've rolled it back enough to apply V4 on top
>>>> of it (which has been thoroughly tested on Thursday), and it still
>>>> happens, so this would at least appear to be unrelated at this point
>>>> (other than the fact that our use case is maybe somewhat unusual with
>>>> heavy emulation).
>>>>
>>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
>>>> issue a heads-up here as soon as possible, in case the problem is
>>>> obvious for somebody and it helps getting it fixed sooner.
>>>
>>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
>>> deliberate split between stdvga_mem_accept() and
>> stdvga_mem_complete()
>>> about locking and unlocking the stdvga lock.
>>>
>>> At a guess, the previous chain of execution accidentally omitted the
>>> stdvga_mem_complete() call.
>>
>> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
>> test a fix if one is provided, but I don't know enough about that code
>> to go mess with it...
>>
> 
> The problem appears to be that __hvm_copy() is calling this code before establishing the type of the page:
> 
>         /*
>          * No need to do the P2M lookup for internally handled MMIO, benefiting
>          * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
>          * - newer Windows (like Server 2012) for HPET accesses.
>          */
>         if ( !nestedhvm_vcpu_in_guestmode(curr)
>              && is_hvm_vcpu(curr)
>              && hvm_mmio_internal(gpa) )
>             return HVMCOPY_bad_gfn_to_mfn;
> 
> ...and that call to hvm_mmio_internal is trying to re-acquire the lock. The big question is why on earth was I not hitting this every time in testing too... This patch series has gone through some pretty rigorous tests. Anyway, I will post a fix very shortly (since I know what needs to be done).
> 
>   Paul

Thanks Paul! I appreciate the quick response.


Cheers,
Razvan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  9:03       ` Razvan Cojocaru
@ 2015-07-13  9:05         ` Paul Durrant
  2015-07-13  9:27           ` Jan Beulich
  2015-07-13  9:42           ` Razvan Cojocaru
  0 siblings, 2 replies; 17+ messages in thread
From: Paul Durrant @ 2015-07-13  9:05 UTC (permalink / raw)
  To: Razvan Cojocaru, Andrew Cooper, xen-devel; +Cc: Keir (Xen.org), Jan Beulich

> -----Original Message-----
> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> Sent: 13 July 2015 10:03
> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
> Cc: Keir (Xen.org); Jan Beulich
> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
> 
> On 07/13/2015 12:01 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> >> Sent: 13 July 2015 09:50
> >> To: Andrew Cooper; xen-devel@lists.xen.org
> >> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
> >> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
> >>
> >> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
> >>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
> >>>> Hello,
> >>>>
> >>>> I'm battling the following hypervisor crash with current staging:
> >>>>
> >>>> (d2) Invoking ROMBIOS ...
> >>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
> >>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
> >>>> (XEN) Watchdog timer detects that CPU7 is stuck!
> >>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
> >>>> (XEN) CPU:    7
> >>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> >>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
> >>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
> >> 000000000000c11e
> >>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi:
> ffff83041e687978
> >>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
> >> 0000000000000000
> >>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
> >> 0000000000000000
> >>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14:
> ffff83040eb37278
> >>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
> >> 00000000001526e0
> >>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
> >>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> >>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
> >>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
> >> 0000000000000282
> >>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
> >> ffff83040eb37268
> >>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
> >> 0000000000000001
> >>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
> >> 00000000000c253f
> >>>> (XEN)    0000000000000000 0000000100000001 0100000000000000
> >> ffff83040eb37328
> >>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
> >> 0000000000000000
> >>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
> >> 000000000000053f
> >>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
> >> 00000000000000c2
> >>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
> >> ffff82d08028c620
> >>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
> >> ffff83040eb37398
> >>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
> >> ffff830403791bf0
> >>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
> >> 0000000000000001
> >>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
> >> ffff83040eb373c8
> >>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
> >> 0000000000000001
> >>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
> >> 00000000000a0001
> >>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000
> >> ffff83040eb37478
> >>>> (XEN)    0000000000000001 0000000000000000 0000000000000000
> >> 0000000000000001
> >>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
> >> 0000000000000001
> >>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
> >> 0100000000000000
> >>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
> >> ffff83040eb374c8
> >>>> (XEN) Xen call trace:
> >>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> >>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> >>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> >>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> >>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> >>>> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
> >>>> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
> >>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> >>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
> >>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
> >>>> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
> >>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
> >>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
> >>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
> >>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
> >>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
> >>>> (XEN)    [<ffff82d0801cb700>]
> handle_mmio_with_translation+0x44/0x46
> >>>> (XEN)    [<ffff82d0801c27f6>]
> hvm_hap_nested_page_fault+0x15f/0x589
> >>>> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
> >>>> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
> >>>> (XEN)
> >>>> (XEN)
> >>>> (XEN) ****************************************
> >>>> (XEN) Panic on CPU 7:
> >>>> (XEN) FATAL TRAP: vector = 2 (nmi)
> >>>> (XEN) [error_code=0000]
> >>>> (XEN) ****************************************
> >>>>
> >>>> At first I thought it was caused by V5 of the vm_event-based
> >>>> introspection series, but I've rolled it back enough to apply V4 on top
> >>>> of it (which has been thoroughly tested on Thursday), and it still
> >>>> happens, so this would at least appear to be unrelated at this point
> >>>> (other than the fact that our use case is maybe somewhat unusual with
> >>>> heavy emulation).
> >>>>
> >>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
> >>>> issue a heads-up here as soon as possible, in case the problem is
> >>>> obvious for somebody and it helps getting it fixed sooner.
> >>>
> >>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
> >>> deliberate split between stdvga_mem_accept() and
> >> stdvga_mem_complete()
> >>> about locking and unlocking the stdvga lock.
> >>>
> >>> At a guess, the previous chain of execution accidentally omitted the
> >>> stdvga_mem_complete() call.
> >>
> >> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
> >> test a fix if one is provided, but I don't know enough about that code
> >> to go mess with it...
> >>
> >
> > The problem appears to be that __hvm_copy() is calling this code before
> establishing the type of the page:
> >
> >         /*
> >          * No need to do the P2M lookup for internally handled MMIO,
> benefiting
> >          * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
> >          * - newer Windows (like Server 2012) for HPET accesses.
> >          */
> >         if ( !nestedhvm_vcpu_in_guestmode(curr)
> >              && is_hvm_vcpu(curr)
> >              && hvm_mmio_internal(gpa) )
> >             return HVMCOPY_bad_gfn_to_mfn;
> >
> > ...and that call to hvm_mmio_internal is trying to re-acquire the lock. The
> big question is why on earth was I not hitting this every time in testing too...
> This patch series has gone through some pretty rigorous tests. Anyway, I will
> post a fix very shortly (since I know what needs to be done).
> >
> >   Paul
> 
> Thanks Paul! I appreciate the quick response.
> 

While I'm prepping, try this:

diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index ebb3b42..08c797c 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const struct hvm_io_handle
 {
     struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;

+    /*
+     * The range check must be done without taking any locks, to avoid
+     * deadlock when hvm_mmio_internal() is called from
+     * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
+     */
+    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
+         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
+        return 0;
+
     spin_lock(&s->lock);

-    if ( !s->stdvga ||
-         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
-         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
+    if ( !s->stdvga )
         goto reject;

     if ( p->dir == IOREQ_WRITE && p->count > 1 )

  Paul

> 
> Cheers,
> Razvan

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  9:05         ` Paul Durrant
@ 2015-07-13  9:27           ` Jan Beulich
  2015-07-13  9:30             ` Paul Durrant
  2015-07-13  9:42           ` Razvan Cojocaru
  1 sibling, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2015-07-13  9:27 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir (Xen.org), Razvan Cojocaru, xen-devel

>>> On 13.07.15 at 11:05, <Paul.Durrant@citrix.com> wrote:
> --- a/xen/arch/x86/hvm/stdvga.c
> +++ b/xen/arch/x86/hvm/stdvga.c
> @@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const struct 
> hvm_io_handle
>  {
>      struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
> 
> +    /*
> +     * The range check must be done without taking any locks, to avoid
> +     * deadlock when hvm_mmio_internal() is called from
> +     * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
> +     */
> +    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> +         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
> +        return 0;
> +
>      spin_lock(&s->lock);
> 
> -    if ( !s->stdvga ||
> -         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> -         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
> +    if ( !s->stdvga )
>          goto reject;
> 
>      if ( p->dir == IOREQ_WRITE && p->count > 1 )

But won't the problem continue to exist if the address falls within the
VGA range? I.e. isn't the problem that the two uses of
hvm_mmio_internal() are quite different - while
hvm_hap_nested_page_fault() immediately afterwards calls a
handle_mmio() variant (which would even seem to call for the lock not
getting dropped between them), __hvm_copy() uses it as just a check.

I.e. perhaps better to convert the lock to a recursive one?

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  9:27           ` Jan Beulich
@ 2015-07-13  9:30             ` Paul Durrant
  2015-07-13  9:41               ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Durrant @ 2015-07-13  9:30 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Keir (Xen.org), Razvan Cojocaru, xen-devel

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 13 July 2015 10:28
> To: Paul Durrant
> Cc: Razvan Cojocaru; Andrew Cooper; xen-devel@lists.xen.org; Keir
> (Xen.org)
> Subject: RE: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
> 
> >>> On 13.07.15 at 11:05, <Paul.Durrant@citrix.com> wrote:
> > --- a/xen/arch/x86/hvm/stdvga.c
> > +++ b/xen/arch/x86/hvm/stdvga.c
> > @@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const struct
> > hvm_io_handle
> >  {
> >      struct hvm_hw_stdvga *s = &current->domain-
> >arch.hvm_domain.stdvga;
> >
> > +    /*
> > +     * The range check must be done without taking any locks, to avoid
> > +     * deadlock when hvm_mmio_internal() is called from
> > +     * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
> > +     */
> > +    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> > +         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE))
> )
> > +        return 0;
> > +
> >      spin_lock(&s->lock);
> >
> > -    if ( !s->stdvga ||
> > -         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> > -         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
> > +    if ( !s->stdvga )
> >          goto reject;
> >
> >      if ( p->dir == IOREQ_WRITE && p->count > 1 )
> 
> But won't the problem continue to exist if the address falls within the
> VGA range? I.e. isn't the problem that the two uses of
> hvm_mmio_internal() are quite different - while
> hvm_hap_nested_page_fault() immediately afterwards calls a
> handle_mmio() variant (which would even seem to call for the lock not
> getting dropped between them), __hvm_copy() uses it as just a check.
> 
> I.e. perhaps better to convert the lock to a recursive one?
> 

I think we are ok because the stdvga model will never actually accept the I/O since MMIO <-> MMIO rep mov is explicitly disallowed.

  Paul

> Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  9:30             ` Paul Durrant
@ 2015-07-13  9:41               ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2015-07-13  9:41 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Andrew Cooper, Keir (Xen.org), Razvan Cojocaru, xen-devel

>>> On 13.07.15 at 11:30, <Paul.Durrant@citrix.com> wrote:
>>  -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: 13 July 2015 10:28
>> To: Paul Durrant
>> Cc: Razvan Cojocaru; Andrew Cooper; xen-devel@lists.xen.org; Keir
>> (Xen.org)
>> Subject: RE: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
>> 
>> >>> On 13.07.15 at 11:05, <Paul.Durrant@citrix.com> wrote:
>> > --- a/xen/arch/x86/hvm/stdvga.c
>> > +++ b/xen/arch/x86/hvm/stdvga.c
>> > @@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const struct
>> > hvm_io_handle
>> >  {
>> >      struct hvm_hw_stdvga *s = &current->domain-
>> >arch.hvm_domain.stdvga;
>> >
>> > +    /*
>> > +     * The range check must be done without taking any locks, to avoid
>> > +     * deadlock when hvm_mmio_internal() is called from
>> > +     * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
>> > +     */
>> > +    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
>> > +         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE))
>> )
>> > +        return 0;
>> > +
>> >      spin_lock(&s->lock);
>> >
>> > -    if ( !s->stdvga ||
>> > -         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
>> > -         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
>> > +    if ( !s->stdvga )
>> >          goto reject;
>> >
>> >      if ( p->dir == IOREQ_WRITE && p->count > 1 )
>> 
>> But won't the problem continue to exist if the address falls within the
>> VGA range? I.e. isn't the problem that the two uses of
>> hvm_mmio_internal() are quite different - while
>> hvm_hap_nested_page_fault() immediately afterwards calls a
>> handle_mmio() variant (which would even seem to call for the lock not
>> getting dropped between them), __hvm_copy() uses it as just a check.
>> 
>> I.e. perhaps better to convert the lock to a recursive one?
> 
> I think we are ok because the stdvga model will never actually accept the 
> I/O since MMIO <-> MMIO rep mov is explicitly disallowed.

True, for now at least.

Jan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  9:05         ` Paul Durrant
  2015-07-13  9:27           ` Jan Beulich
@ 2015-07-13  9:42           ` Razvan Cojocaru
  2015-07-13 10:11             ` Paul Durrant
  1 sibling, 1 reply; 17+ messages in thread
From: Razvan Cojocaru @ 2015-07-13  9:42 UTC (permalink / raw)
  To: Paul Durrant, Andrew Cooper, xen-devel; +Cc: Keir (Xen.org), Jan Beulich

On 07/13/2015 12:05 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
>> Sent: 13 July 2015 10:03
>> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
>> Cc: Keir (Xen.org); Jan Beulich
>> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
>>
>> On 07/13/2015 12:01 PM, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
>>>> Sent: 13 July 2015 09:50
>>>> To: Andrew Cooper; xen-devel@lists.xen.org
>>>> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
>>>> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
>>>>
>>>> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
>>>>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I'm battling the following hypervisor crash with current staging:
>>>>>>
>>>>>> (d2) Invoking ROMBIOS ...
>>>>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
>>>>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $
>>>>>> (XEN) Watchdog timer detects that CPU7 is stuck!
>>>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
>>>>>> (XEN) CPU:    7
>>>>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
>>>>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
>>>> 000000000000c11e
>>>>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi:
>> ffff83041e687978
>>>>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
>>>> 0000000000000000
>>>>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
>>>> 0000000000000000
>>>>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14:
>> ffff83040eb37278
>>>>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
>>>> 00000000001526e0
>>>>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
>>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
>>>>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
>>>> 0000000000000282
>>>>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
>>>> ffff83040eb37268
>>>>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
>>>> 0000000000000001
>>>>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
>>>> 00000000000c253f
>>>>>> (XEN)    0000000000000000 0000000100000001 0100000000000000
>>>> ffff83040eb37328
>>>>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
>>>> 0000000000000000
>>>>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
>>>> 000000000000053f
>>>>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
>>>> 00000000000000c2
>>>>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
>>>> ffff82d08028c620
>>>>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
>>>> ffff83040eb37398
>>>>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
>>>> ffff830403791bf0
>>>>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
>>>> 0000000000000001
>>>>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
>>>> ffff83040eb373c8
>>>>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
>>>> 0000000000000001
>>>>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
>>>> 00000000000a0001
>>>>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000
>>>> ffff83040eb37478
>>>>>> (XEN)    0000000000000001 0000000000000000 0000000000000000
>>>> 0000000000000001
>>>>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
>>>> 0000000000000001
>>>>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
>>>> 0100000000000000
>>>>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
>>>> ffff83040eb374c8
>>>>>> (XEN) Xen call trace:
>>>>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
>>>>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
>>>>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
>>>>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
>>>>>> (XEN)    [<ffff82d0801c3e5d>] hvm_copy_from_guest_phys+0x14/0x16
>>>>>> (XEN)    [<ffff82d0801cb107>] hvm_process_io_intercept+0x10b/0x1d6
>>>>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
>>>>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
>>>>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
>>>>>> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
>>>>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
>>>>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
>>>>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
>>>>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
>>>>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
>>>>>> (XEN)    [<ffff82d0801cb700>]
>> handle_mmio_with_translation+0x44/0x46
>>>>>> (XEN)    [<ffff82d0801c27f6>]
>> hvm_hap_nested_page_fault+0x15f/0x589
>>>>>> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
>>>>>> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
>>>>>> (XEN)
>>>>>> (XEN)
>>>>>> (XEN) ****************************************
>>>>>> (XEN) Panic on CPU 7:
>>>>>> (XEN) FATAL TRAP: vector = 2 (nmi)
>>>>>> (XEN) [error_code=0000]
>>>>>> (XEN) ****************************************
>>>>>>
>>>>>> At first I thought it was caused by V5 of the vm_event-based
>>>>>> introspection series, but I've rolled it back enough to apply V4 on top
>>>>>> of it (which has been thoroughly tested on Thursday), and it still
>>>>>> happens, so this would at least appear to be unrelated at this point
>>>>>> (other than the fact that our use case is maybe somewhat unusual with
>>>>>> heavy emulation).
>>>>>>
>>>>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
>>>>>> issue a heads-up here as soon as possible, in case the problem is
>>>>>> obvious for somebody and it helps getting it fixed sooner.
>>>>>
>>>>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
>>>>> deliberate split between stdvga_mem_accept() and
>>>> stdvga_mem_complete()
>>>>> about locking and unlocking the stdvga lock.
>>>>>
>>>>> At a guess, the previous chain of execution accidentally omitted the
>>>>> stdvga_mem_complete() call.
>>>>
>>>> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
>>>> test a fix if one is provided, but I don't know enough about that code
>>>> to go mess with it...
>>>>
>>>
>>> The problem appears to be that __hvm_copy() is calling this code before
>> establishing the type of the page:
>>>
>>>         /*
>>>          * No need to do the P2M lookup for internally handled MMIO,
>> benefiting
>>>          * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC accesses,
>>>          * - newer Windows (like Server 2012) for HPET accesses.
>>>          */
>>>         if ( !nestedhvm_vcpu_in_guestmode(curr)
>>>              && is_hvm_vcpu(curr)
>>>              && hvm_mmio_internal(gpa) )
>>>             return HVMCOPY_bad_gfn_to_mfn;
>>>
>>> ...and that call to hvm_mmio_internal is trying to re-acquire the lock. The
>> big question is why on earth was I not hitting this every time in testing too...
>> This patch series has gone through some pretty rigorous tests. Anyway, I will
>> post a fix very shortly (since I know what needs to be done).
>>>
>>>   Paul
>>
>> Thanks Paul! I appreciate the quick response.
>>
> 
> While I'm prepping, try this:
> 
> diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
> index ebb3b42..08c797c 100644
> --- a/xen/arch/x86/hvm/stdvga.c
> +++ b/xen/arch/x86/hvm/stdvga.c
> @@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const struct hvm_io_handle
>  {
>      struct hvm_hw_stdvga *s = &current->domain->arch.hvm_domain.stdvga;
> 
> +    /*
> +     * The range check must be done without taking any locks, to avoid
> +     * deadlock when hvm_mmio_internal() is called from
> +     * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
> +     */
> +    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> +         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
> +        return 0;
> +
>      spin_lock(&s->lock);
> 
> -    if ( !s->stdvga ||
> -         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> -         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
> +    if ( !s->stdvga )
>          goto reject;
> 
>      if ( p->dir == IOREQ_WRITE && p->count > 1 )

The crash is gone, so in that respect:

Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>

But my guest seems stuck at boot for some reason, the last line of xl
dmesg is:

(XEN) stdvga.c:151:d3v0 leaving stdvga mode

and this is the guest state:

# ./xenctx -a 3
rip: fffff80001497de0
flags: 00000002 nz
rsp: fffff80001479538
rax: 0000000000000000   rcx: 00000000000003fe   rdx: 00000000000003fe
rbx: fffff8000149a110   rsi: fffff80001479601   rdi: 0000000000009860
rbp: 0000000000000000    r8: fffff8000149a174    r9: 0000000000000000
r10: 00000000000027d7   r11: fffff800014795a0   r12: 0000000000000000
r13: 0000000000000004   r14: 0000000000000002   r15: 0000000000000000
 cs: 0010        ss: 0018        ds: 002b        es: 002b
 fs: 0053 @ 0000000000000000
 gs: 002b @ fffff80001805d00/fffff80001805d00

cr0: 0000000080050031
cr2: 0000000000000030
cr3: 0000000000187000
cr4: 00000000000006b8

dr0: 0000000000000000
dr1: 0000000000000000
dr2: 0000000000000000
dr3: 0000000000000000
dr6: 00000000fffe0ff0
dr7: 0000000000000400
Code (instr addr fffff80001497de0)
90 8a c2 0f b7 d1 ee c3 90 90 90 90 90 90 90 90 90 0f b7 d1 ec <c3> 90
90 90 90 90 90 90 88 11 f0

But it's too early to tell that this isn't something my V5 code is now
doing, so I'll come back if it turns out to be unrelated.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13  9:42           ` Razvan Cojocaru
@ 2015-07-13 10:11             ` Paul Durrant
  2015-07-13 12:04               ` Paul Durrant
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Durrant @ 2015-07-13 10:11 UTC (permalink / raw)
  To: Razvan Cojocaru, Andrew Cooper, xen-devel; +Cc: Keir (Xen.org), Jan Beulich

> -----Original Message-----
> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> Sent: 13 July 2015 10:42
> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
> Cc: Keir (Xen.org); Jan Beulich
> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
> 
> On 07/13/2015 12:05 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> >> Sent: 13 July 2015 10:03
> >> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
> >> Cc: Keir (Xen.org); Jan Beulich
> >> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with
> emulation
> >>
> >> On 07/13/2015 12:01 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> >>>> Sent: 13 July 2015 09:50
> >>>> To: Andrew Cooper; xen-devel@lists.xen.org
> >>>> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
> >>>> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
> >>>>
> >>>> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
> >>>>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
> >>>>>> Hello,
> >>>>>>
> >>>>>> I'm battling the following hypervisor crash with current staging:
> >>>>>>
> >>>>>> (d2) Invoking ROMBIOS ...
> >>>>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
> >>>>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp
> $
> >>>>>> (XEN) Watchdog timer detects that CPU7 is stuck!
> >>>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
> >>>>>> (XEN) CPU:    7
> >>>>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> >>>>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
> >>>>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
> >>>> 000000000000c11e
> >>>>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi:
> >> ffff83041e687978
> >>>>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
> >>>> 0000000000000000
> >>>>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
> >>>> 0000000000000000
> >>>>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14:
> >> ffff83040eb37278
> >>>>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
> >>>> 00000000001526e0
> >>>>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
> >>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> >>>>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
> >>>>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
> >>>> 0000000000000282
> >>>>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
> >>>> ffff83040eb37268
> >>>>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
> >>>> 0000000000000001
> >>>>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
> >>>> 00000000000c253f
> >>>>>> (XEN)    0000000000000000 0000000100000001 0100000000000000
> >>>> ffff83040eb37328
> >>>>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
> >>>> 0000000000000000
> >>>>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
> >>>> 000000000000053f
> >>>>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
> >>>> 00000000000000c2
> >>>>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
> >>>> ffff82d08028c620
> >>>>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
> >>>> ffff83040eb37398
> >>>>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
> >>>> ffff830403791bf0
> >>>>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
> >>>> 0000000000000001
> >>>>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
> >>>> ffff83040eb373c8
> >>>>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
> >>>> 0000000000000001
> >>>>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
> >>>> 00000000000a0001
> >>>>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000
> >>>> ffff83040eb37478
> >>>>>> (XEN)    0000000000000001 0000000000000000 0000000000000000
> >>>> 0000000000000001
> >>>>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
> >>>> 0000000000000001
> >>>>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
> >>>> 0100000000000000
> >>>>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
> >>>> ffff83040eb374c8
> >>>>>> (XEN) Xen call trace:
> >>>>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> >>>>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> >>>>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> >>>>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> >>>>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> >>>>>> (XEN)    [<ffff82d0801c3e5d>]
> hvm_copy_from_guest_phys+0x14/0x16
> >>>>>> (XEN)    [<ffff82d0801cb107>]
> hvm_process_io_intercept+0x10b/0x1d6
> >>>>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> >>>>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
> >>>>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
> >>>>>> (XEN)    [<ffff82d0801bc129>] hvmemul_do_mmio_addr+0x24/0x26
> >>>>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
> >>>>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
> >>>>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
> >>>>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
> >>>>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
> >>>>>> (XEN)    [<ffff82d0801cb700>]
> >> handle_mmio_with_translation+0x44/0x46
> >>>>>> (XEN)    [<ffff82d0801c27f6>]
> >> hvm_hap_nested_page_fault+0x15f/0x589
> >>>>>> (XEN)    [<ffff82d0801e9741>] vmx_vmexit_handler+0x150e/0x188d
> >>>>>> (XEN)    [<ffff82d0801ee7d1>] vmx_asm_vmexit_handler+0x41/0xc0
> >>>>>> (XEN)
> >>>>>> (XEN)
> >>>>>> (XEN) ****************************************
> >>>>>> (XEN) Panic on CPU 7:
> >>>>>> (XEN) FATAL TRAP: vector = 2 (nmi)
> >>>>>> (XEN) [error_code=0000]
> >>>>>> (XEN) ****************************************
> >>>>>>
> >>>>>> At first I thought it was caused by V5 of the vm_event-based
> >>>>>> introspection series, but I've rolled it back enough to apply V4 on top
> >>>>>> of it (which has been thoroughly tested on Thursday), and it still
> >>>>>> happens, so this would at least appear to be unrelated at this point
> >>>>>> (other than the fact that our use case is maybe somewhat unusual
> with
> >>>>>> heavy emulation).
> >>>>>>
> >>>>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
> >>>>>> issue a heads-up here as soon as possible, in case the problem is
> >>>>>> obvious for somebody and it helps getting it fixed sooner.
> >>>>>
> >>>>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
> >>>>> deliberate split between stdvga_mem_accept() and
> >>>> stdvga_mem_complete()
> >>>>> about locking and unlocking the stdvga lock.
> >>>>>
> >>>>> At a guess, the previous chain of execution accidentally omitted the
> >>>>> stdvga_mem_complete() call.
> >>>>
> >>>> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
> >>>> test a fix if one is provided, but I don't know enough about that code
> >>>> to go mess with it...
> >>>>
> >>>
> >>> The problem appears to be that __hvm_copy() is calling this code before
> >> establishing the type of the page:
> >>>
> >>>         /*
> >>>          * No need to do the P2M lookup for internally handled MMIO,
> >> benefiting
> >>>          * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC
> accesses,
> >>>          * - newer Windows (like Server 2012) for HPET accesses.
> >>>          */
> >>>         if ( !nestedhvm_vcpu_in_guestmode(curr)
> >>>              && is_hvm_vcpu(curr)
> >>>              && hvm_mmio_internal(gpa) )
> >>>             return HVMCOPY_bad_gfn_to_mfn;
> >>>
> >>> ...and that call to hvm_mmio_internal is trying to re-acquire the lock. The
> >> big question is why on earth was I not hitting this every time in testing
> too...
> >> This patch series has gone through some pretty rigorous tests. Anyway, I
> will
> >> post a fix very shortly (since I know what needs to be done).
> >>>
> >>>   Paul
> >>
> >> Thanks Paul! I appreciate the quick response.
> >>
> >
> > While I'm prepping, try this:
> >
> > diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
> > index ebb3b42..08c797c 100644
> > --- a/xen/arch/x86/hvm/stdvga.c
> > +++ b/xen/arch/x86/hvm/stdvga.c
> > @@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const struct
> hvm_io_handle
> >  {
> >      struct hvm_hw_stdvga *s = &current->domain-
> >arch.hvm_domain.stdvga;
> >
> > +    /*
> > +     * The range check must be done without taking any locks, to avoid
> > +     * deadlock when hvm_mmio_internal() is called from
> > +     * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
> > +     */
> > +    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> > +         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE))
> )
> > +        return 0;
> > +
> >      spin_lock(&s->lock);
> >
> > -    if ( !s->stdvga ||
> > -         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> > -         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE + VGA_MEM_SIZE)) )
> > +    if ( !s->stdvga )
> >          goto reject;
> >
> >      if ( p->dir == IOREQ_WRITE && p->count > 1 )
> 
> The crash is gone, so in that respect:
> 
> Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> 
> But my guest seems stuck at boot for some reason, the last line of xl
> dmesg is:
> 
> (XEN) stdvga.c:151:d3v0 leaving stdvga mode
> 
> and this is the guest state:
> 
> # ./xenctx -a 3
> rip: fffff80001497de0
> flags: 00000002 nz
> rsp: fffff80001479538
> rax: 0000000000000000   rcx: 00000000000003fe   rdx: 00000000000003fe
> rbx: fffff8000149a110   rsi: fffff80001479601   rdi: 0000000000009860
> rbp: 0000000000000000    r8: fffff8000149a174    r9: 0000000000000000
> r10: 00000000000027d7   r11: fffff800014795a0   r12: 0000000000000000
> r13: 0000000000000004   r14: 0000000000000002   r15: 0000000000000000
>  cs: 0010        ss: 0018        ds: 002b        es: 002b
>  fs: 0053 @ 0000000000000000
>  gs: 002b @ fffff80001805d00/fffff80001805d00
> 
> cr0: 0000000080050031
> cr2: 0000000000000030
> cr3: 0000000000187000
> cr4: 00000000000006b8
> 
> dr0: 0000000000000000
> dr1: 0000000000000000
> dr2: 0000000000000000
> dr3: 0000000000000000
> dr6: 00000000fffe0ff0
> dr7: 0000000000000400
> Code (instr addr fffff80001497de0)
> 90 8a c2 0f b7 d1 ee c3 90 90 90 90 90 90 90 90 90 0f b7 d1 ec <c3> 90
> 90 90 90 90 90 90 88 11 f0
> 
> But it's too early to tell that this isn't something my V5 code is now
> doing, so I'll come back if it turns out to be unrelated.
> 

I'm setting up my test rig again now.

  Paul

> 
> Thanks,
> Razvan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13 10:11             ` Paul Durrant
@ 2015-07-13 12:04               ` Paul Durrant
  2015-07-13 12:07                 ` Razvan Cojocaru
  0 siblings, 1 reply; 17+ messages in thread
From: Paul Durrant @ 2015-07-13 12:04 UTC (permalink / raw)
  To: Paul Durrant, Razvan Cojocaru, Andrew Cooper, xen-devel
  Cc: Keir (Xen.org), Jan Beulich

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Paul Durrant
> Sent: 13 July 2015 11:12
> To: Razvan Cojocaru; Andrew Cooper; xen-devel@lists.xen.org
> Cc: Keir (Xen.org); Jan Beulich
> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
> 
> > -----Original Message-----
> > From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> > Sent: 13 July 2015 10:42
> > To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
> > Cc: Keir (Xen.org); Jan Beulich
> > Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
> >
> > On 07/13/2015 12:05 PM, Paul Durrant wrote:
> > >> -----Original Message-----
> > >> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> > >> Sent: 13 July 2015 10:03
> > >> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
> > >> Cc: Keir (Xen.org); Jan Beulich
> > >> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with
> > emulation
> > >>
> > >> On 07/13/2015 12:01 PM, Paul Durrant wrote:
> > >>>> -----Original Message-----
> > >>>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> > >>>> Sent: 13 July 2015 09:50
> > >>>> To: Andrew Cooper; xen-devel@lists.xen.org
> > >>>> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
> > >>>> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
> > >>>>
> > >>>> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
> > >>>>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
> > >>>>>> Hello,
> > >>>>>>
> > >>>>>> I'm battling the following hypervisor crash with current staging:
> > >>>>>>
> > >>>>>> (d2) Invoking ROMBIOS ...
> > >>>>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
> > >>>>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert
> Exp
> > $
> > >>>>>> (XEN) Watchdog timer detects that CPU7 is stuck!
> > >>>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
> > >>>>>> (XEN) CPU:    7
> > >>>>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> > >>>>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
> > >>>>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
> > >>>> 000000000000c11e
> > >>>>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi:
> > >> ffff83041e687978
> > >>>>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
> > >>>> 0000000000000000
> > >>>>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
> > >>>> 0000000000000000
> > >>>>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14:
> > >> ffff83040eb37278
> > >>>>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
> > >>>> 00000000001526e0
> > >>>>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
> > >>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > >>>>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
> > >>>>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
> > >>>> 0000000000000282
> > >>>>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
> > >>>> ffff83040eb37268
> > >>>>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
> > >>>> 0000000000000001
> > >>>>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
> > >>>> 00000000000c253f
> > >>>>>> (XEN)    0000000000000000 0000000100000001 0100000000000000
> > >>>> ffff83040eb37328
> > >>>>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
> > >>>> 0000000000000000
> > >>>>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
> > >>>> 000000000000053f
> > >>>>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
> > >>>> 00000000000000c2
> > >>>>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
> > >>>> ffff82d08028c620
> > >>>>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
> > >>>> ffff83040eb37398
> > >>>>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
> > >>>> ffff830403791bf0
> > >>>>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
> > >>>> 0000000000000001
> > >>>>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
> > >>>> ffff83040eb373c8
> > >>>>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
> > >>>> 0000000000000001
> > >>>>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
> > >>>> 00000000000a0001
> > >>>>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000
> > >>>> ffff83040eb37478
> > >>>>>> (XEN)    0000000000000001 0000000000000000 0000000000000000
> > >>>> 0000000000000001
> > >>>>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
> > >>>> 0000000000000001
> > >>>>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
> > >>>> 0100000000000000
> > >>>>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
> > >>>> ffff83040eb374c8
> > >>>>>> (XEN) Xen call trace:
> > >>>>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> > >>>>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> > >>>>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> > >>>>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> > >>>>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> > >>>>>> (XEN)    [<ffff82d0801c3e5d>]
> > hvm_copy_from_guest_phys+0x14/0x16
> > >>>>>> (XEN)    [<ffff82d0801cb107>]
> > hvm_process_io_intercept+0x10b/0x1d6
> > >>>>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> > >>>>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
> > >>>>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
> > >>>>>> (XEN)    [<ffff82d0801bc129>]
> hvmemul_do_mmio_addr+0x24/0x26
> > >>>>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
> > >>>>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
> > >>>>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
> > >>>>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
> > >>>>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
> > >>>>>> (XEN)    [<ffff82d0801cb700>]
> > >> handle_mmio_with_translation+0x44/0x46
> > >>>>>> (XEN)    [<ffff82d0801c27f6>]
> > >> hvm_hap_nested_page_fault+0x15f/0x589
> > >>>>>> (XEN)    [<ffff82d0801e9741>]
> vmx_vmexit_handler+0x150e/0x188d
> > >>>>>> (XEN)    [<ffff82d0801ee7d1>]
> vmx_asm_vmexit_handler+0x41/0xc0
> > >>>>>> (XEN)
> > >>>>>> (XEN)
> > >>>>>> (XEN) ****************************************
> > >>>>>> (XEN) Panic on CPU 7:
> > >>>>>> (XEN) FATAL TRAP: vector = 2 (nmi)
> > >>>>>> (XEN) [error_code=0000]
> > >>>>>> (XEN) ****************************************
> > >>>>>>
> > >>>>>> At first I thought it was caused by V5 of the vm_event-based
> > >>>>>> introspection series, but I've rolled it back enough to apply V4 on
> top
> > >>>>>> of it (which has been thoroughly tested on Thursday), and it still
> > >>>>>> happens, so this would at least appear to be unrelated at this point
> > >>>>>> (other than the fact that our use case is maybe somewhat unusual
> > with
> > >>>>>> heavy emulation).
> > >>>>>>
> > >>>>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
> > >>>>>> issue a heads-up here as soon as possible, in case the problem is
> > >>>>>> obvious for somebody and it helps getting it fixed sooner.
> > >>>>>
> > >>>>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
> > >>>>> deliberate split between stdvga_mem_accept() and
> > >>>> stdvga_mem_complete()
> > >>>>> about locking and unlocking the stdvga lock.
> > >>>>>
> > >>>>> At a guess, the previous chain of execution accidentally omitted the
> > >>>>> stdvga_mem_complete() call.
> > >>>>
> > >>>> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
> > >>>> test a fix if one is provided, but I don't know enough about that code
> > >>>> to go mess with it...
> > >>>>
> > >>>
> > >>> The problem appears to be that __hvm_copy() is calling this code
> before
> > >> establishing the type of the page:
> > >>>
> > >>>         /*
> > >>>          * No need to do the P2M lookup for internally handled MMIO,
> > >> benefiting
> > >>>          * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC
> > accesses,
> > >>>          * - newer Windows (like Server 2012) for HPET accesses.
> > >>>          */
> > >>>         if ( !nestedhvm_vcpu_in_guestmode(curr)
> > >>>              && is_hvm_vcpu(curr)
> > >>>              && hvm_mmio_internal(gpa) )
> > >>>             return HVMCOPY_bad_gfn_to_mfn;
> > >>>
> > >>> ...and that call to hvm_mmio_internal is trying to re-acquire the lock.
> The
> > >> big question is why on earth was I not hitting this every time in testing
> > too...
> > >> This patch series has gone through some pretty rigorous tests. Anyway,
> I
> > will
> > >> post a fix very shortly (since I know what needs to be done).
> > >>>
> > >>>   Paul
> > >>
> > >> Thanks Paul! I appreciate the quick response.
> > >>
> > >
> > > While I'm prepping, try this:
> > >
> > > diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
> > > index ebb3b42..08c797c 100644
> > > --- a/xen/arch/x86/hvm/stdvga.c
> > > +++ b/xen/arch/x86/hvm/stdvga.c
> > > @@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const struct
> > hvm_io_handle
> > >  {
> > >      struct hvm_hw_stdvga *s = &current->domain-
> > >arch.hvm_domain.stdvga;
> > >
> > > +    /*
> > > +     * The range check must be done without taking any locks, to avoid
> > > +     * deadlock when hvm_mmio_internal() is called from
> > > +     * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
> > > +     */
> > > +    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> > > +         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE +
> VGA_MEM_SIZE))
> > )
> > > +        return 0;
> > > +
> > >      spin_lock(&s->lock);
> > >
> > > -    if ( !s->stdvga ||
> > > -         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> > > -         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE +
> VGA_MEM_SIZE)) )
> > > +    if ( !s->stdvga )
> > >          goto reject;
> > >
> > >      if ( p->dir == IOREQ_WRITE && p->count > 1 )
> >
> > The crash is gone, so in that respect:
> >
> > Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> >
> > But my guest seems stuck at boot for some reason, the last line of xl
> > dmesg is:
> >
> > (XEN) stdvga.c:151:d3v0 leaving stdvga mode
> >
> > and this is the guest state:
> >
> > # ./xenctx -a 3
> > rip: fffff80001497de0
> > flags: 00000002 nz
> > rsp: fffff80001479538
> > rax: 0000000000000000   rcx: 00000000000003fe   rdx: 00000000000003fe
> > rbx: fffff8000149a110   rsi: fffff80001479601   rdi: 0000000000009860
> > rbp: 0000000000000000    r8: fffff8000149a174    r9: 0000000000000000
> > r10: 00000000000027d7   r11: fffff800014795a0   r12: 0000000000000000
> > r13: 0000000000000004   r14: 0000000000000002   r15: 0000000000000000
> >  cs: 0010        ss: 0018        ds: 002b        es: 002b
> >  fs: 0053 @ 0000000000000000
> >  gs: 002b @ fffff80001805d00/fffff80001805d00
> >
> > cr0: 0000000080050031
> > cr2: 0000000000000030
> > cr3: 0000000000187000
> > cr4: 00000000000006b8
> >
> > dr0: 0000000000000000
> > dr1: 0000000000000000
> > dr2: 0000000000000000
> > dr3: 0000000000000000
> > dr6: 00000000fffe0ff0
> > dr7: 0000000000000400
> > Code (instr addr fffff80001497de0)
> > 90 8a c2 0f b7 d1 ee c3 90 90 90 90 90 90 90 90 90 0f b7 d1 ec <c3> 90
> > 90 90 90 90 90 90 88 11 f0
> >
> > But it's too early to tell that this isn't something my V5 code is now
> > doing, so I'll come back if it turns out to be unrelated.
> >
> 
> I'm setting up my test rig again now.

I've not been able to repro so far. I'm using Windows 7 32-bit installation from ISO + cirrus VGA + ROMBIOS + shadow paging, which usually hits emulation pretty hard.

  Paul

> 
>   Paul
> 
> >
> > Thanks,
> > Razvan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13 12:04               ` Paul Durrant
@ 2015-07-13 12:07                 ` Razvan Cojocaru
  2015-07-13 12:08                   ` Paul Durrant
  0 siblings, 1 reply; 17+ messages in thread
From: Razvan Cojocaru @ 2015-07-13 12:07 UTC (permalink / raw)
  To: Paul Durrant, Andrew Cooper, xen-devel; +Cc: Keir (Xen.org), Jan Beulich

On 07/13/2015 03:04 PM, Paul Durrant wrote:
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>> bounces@lists.xen.org] On Behalf Of Paul Durrant
>> Sent: 13 July 2015 11:12
>> To: Razvan Cojocaru; Andrew Cooper; xen-devel@lists.xen.org
>> Cc: Keir (Xen.org); Jan Beulich
>> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
>>
>>> -----Original Message-----
>>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
>>> Sent: 13 July 2015 10:42
>>> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
>>> Cc: Keir (Xen.org); Jan Beulich
>>> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
>>>
>>> On 07/13/2015 12:05 PM, Paul Durrant wrote:
>>>>> -----Original Message-----
>>>>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
>>>>> Sent: 13 July 2015 10:03
>>>>> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
>>>>> Cc: Keir (Xen.org); Jan Beulich
>>>>> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with
>>> emulation
>>>>>
>>>>> On 07/13/2015 12:01 PM, Paul Durrant wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
>>>>>>> Sent: 13 July 2015 09:50
>>>>>>> To: Andrew Cooper; xen-devel@lists.xen.org
>>>>>>> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
>>>>>>> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
>>>>>>>
>>>>>>> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
>>>>>>>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I'm battling the following hypervisor crash with current staging:
>>>>>>>>>
>>>>>>>>> (d2) Invoking ROMBIOS ...
>>>>>>>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
>>>>>>>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert
>> Exp
>>> $
>>>>>>>>> (XEN) Watchdog timer detects that CPU7 is stuck!
>>>>>>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
>>>>>>>>> (XEN) CPU:    7
>>>>>>>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>>>>>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
>>>>>>>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
>>>>>>> 000000000000c11e
>>>>>>>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi:
>>>>> ffff83041e687978
>>>>>>>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
>>>>>>> 0000000000000000
>>>>>>>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
>>>>>>> 0000000000000000
>>>>>>>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14:
>>>>> ffff83040eb37278
>>>>>>>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
>>>>>>> 00000000001526e0
>>>>>>>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
>>>>>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>>>>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
>>>>>>>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
>>>>>>> 0000000000000282
>>>>>>>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
>>>>>>> ffff83040eb37268
>>>>>>>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
>>>>>>> 0000000000000001
>>>>>>>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
>>>>>>> 00000000000c253f
>>>>>>>>> (XEN)    0000000000000000 0000000100000001 0100000000000000
>>>>>>> ffff83040eb37328
>>>>>>>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
>>>>>>> 0000000000000000
>>>>>>>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
>>>>>>> 000000000000053f
>>>>>>>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
>>>>>>> 00000000000000c2
>>>>>>>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
>>>>>>> ffff82d08028c620
>>>>>>>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
>>>>>>> ffff83040eb37398
>>>>>>>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
>>>>>>> ffff830403791bf0
>>>>>>>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
>>>>>>> 0000000000000001
>>>>>>>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
>>>>>>> ffff83040eb373c8
>>>>>>>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
>>>>>>> 0000000000000001
>>>>>>>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
>>>>>>> 00000000000a0001
>>>>>>>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000
>>>>>>> ffff83040eb37478
>>>>>>>>> (XEN)    0000000000000001 0000000000000000 0000000000000000
>>>>>>> 0000000000000001
>>>>>>>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
>>>>>>> 0000000000000001
>>>>>>>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
>>>>>>> 0100000000000000
>>>>>>>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
>>>>>>> ffff83040eb374c8
>>>>>>>>> (XEN) Xen call trace:
>>>>>>>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
>>>>>>>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
>>>>>>>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
>>>>>>>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
>>>>>>>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
>>>>>>>>> (XEN)    [<ffff82d0801c3e5d>]
>>> hvm_copy_from_guest_phys+0x14/0x16
>>>>>>>>> (XEN)    [<ffff82d0801cb107>]
>>> hvm_process_io_intercept+0x10b/0x1d6
>>>>>>>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
>>>>>>>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
>>>>>>>>> (XEN)    [<ffff82d0801bc0b9>] hvmemul_do_io_addr+0x117/0x163
>>>>>>>>> (XEN)    [<ffff82d0801bc129>]
>> hvmemul_do_mmio_addr+0x24/0x26
>>>>>>>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
>>>>>>>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
>>>>>>>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
>>>>>>>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
>>>>>>>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
>>>>>>>>> (XEN)    [<ffff82d0801cb700>]
>>>>> handle_mmio_with_translation+0x44/0x46
>>>>>>>>> (XEN)    [<ffff82d0801c27f6>]
>>>>> hvm_hap_nested_page_fault+0x15f/0x589
>>>>>>>>> (XEN)    [<ffff82d0801e9741>]
>> vmx_vmexit_handler+0x150e/0x188d
>>>>>>>>> (XEN)    [<ffff82d0801ee7d1>]
>> vmx_asm_vmexit_handler+0x41/0xc0
>>>>>>>>> (XEN)
>>>>>>>>> (XEN)
>>>>>>>>> (XEN) ****************************************
>>>>>>>>> (XEN) Panic on CPU 7:
>>>>>>>>> (XEN) FATAL TRAP: vector = 2 (nmi)
>>>>>>>>> (XEN) [error_code=0000]
>>>>>>>>> (XEN) ****************************************
>>>>>>>>>
>>>>>>>>> At first I thought it was caused by V5 of the vm_event-based
>>>>>>>>> introspection series, but I've rolled it back enough to apply V4 on
>> top
>>>>>>>>> of it (which has been thoroughly tested on Thursday), and it still
>>>>>>>>> happens, so this would at least appear to be unrelated at this point
>>>>>>>>> (other than the fact that our use case is maybe somewhat unusual
>>> with
>>>>>>>>> heavy emulation).
>>>>>>>>>
>>>>>>>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
>>>>>>>>> issue a heads-up here as soon as possible, in case the problem is
>>>>>>>>> obvious for somebody and it helps getting it fixed sooner.
>>>>>>>>
>>>>>>>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a
>>>>>>>> deliberate split between stdvga_mem_accept() and
>>>>>>> stdvga_mem_complete()
>>>>>>>> about locking and unlocking the stdvga lock.
>>>>>>>>
>>>>>>>> At a guess, the previous chain of execution accidentally omitted the
>>>>>>>> stdvga_mem_complete() call.
>>>>>>>
>>>>>>> Thanks, I've reverted that patch and the crash is gone. I'll be happy to
>>>>>>> test a fix if one is provided, but I don't know enough about that code
>>>>>>> to go mess with it...
>>>>>>>
>>>>>>
>>>>>> The problem appears to be that __hvm_copy() is calling this code
>> before
>>>>> establishing the type of the page:
>>>>>>
>>>>>>         /*
>>>>>>          * No need to do the P2M lookup for internally handled MMIO,
>>>>> benefiting
>>>>>>          * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC
>>> accesses,
>>>>>>          * - newer Windows (like Server 2012) for HPET accesses.
>>>>>>          */
>>>>>>         if ( !nestedhvm_vcpu_in_guestmode(curr)
>>>>>>              && is_hvm_vcpu(curr)
>>>>>>              && hvm_mmio_internal(gpa) )
>>>>>>             return HVMCOPY_bad_gfn_to_mfn;
>>>>>>
>>>>>> ...and that call to hvm_mmio_internal is trying to re-acquire the lock.
>> The
>>>>> big question is why on earth was I not hitting this every time in testing
>>> too...
>>>>> This patch series has gone through some pretty rigorous tests. Anyway,
>> I
>>> will
>>>>> post a fix very shortly (since I know what needs to be done).
>>>>>>
>>>>>>   Paul
>>>>>
>>>>> Thanks Paul! I appreciate the quick response.
>>>>>
>>>>
>>>> While I'm prepping, try this:
>>>>
>>>> diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
>>>> index ebb3b42..08c797c 100644
>>>> --- a/xen/arch/x86/hvm/stdvga.c
>>>> +++ b/xen/arch/x86/hvm/stdvga.c
>>>> @@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const struct
>>> hvm_io_handle
>>>>  {
>>>>      struct hvm_hw_stdvga *s = &current->domain-
>>>> arch.hvm_domain.stdvga;
>>>>
>>>> +    /*
>>>> +     * The range check must be done without taking any locks, to avoid
>>>> +     * deadlock when hvm_mmio_internal() is called from
>>>> +     * hvm_copy_to/from_guest_phys() in hvm_process_io_intercept().
>>>> +     */
>>>> +    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
>>>> +         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE +
>> VGA_MEM_SIZE))
>>> )
>>>> +        return 0;
>>>> +
>>>>      spin_lock(&s->lock);
>>>>
>>>> -    if ( !s->stdvga ||
>>>> -         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
>>>> -         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE +
>> VGA_MEM_SIZE)) )
>>>> +    if ( !s->stdvga )
>>>>          goto reject;
>>>>
>>>>      if ( p->dir == IOREQ_WRITE && p->count > 1 )
>>>
>>> The crash is gone, so in that respect:
>>>
>>> Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
>>>
>>> But my guest seems stuck at boot for some reason, the last line of xl
>>> dmesg is:
>>>
>>> (XEN) stdvga.c:151:d3v0 leaving stdvga mode
>>>
>>> and this is the guest state:
>>>
>>> # ./xenctx -a 3
>>> rip: fffff80001497de0
>>> flags: 00000002 nz
>>> rsp: fffff80001479538
>>> rax: 0000000000000000   rcx: 00000000000003fe   rdx: 00000000000003fe
>>> rbx: fffff8000149a110   rsi: fffff80001479601   rdi: 0000000000009860
>>> rbp: 0000000000000000    r8: fffff8000149a174    r9: 0000000000000000
>>> r10: 00000000000027d7   r11: fffff800014795a0   r12: 0000000000000000
>>> r13: 0000000000000004   r14: 0000000000000002   r15: 0000000000000000
>>>  cs: 0010        ss: 0018        ds: 002b        es: 002b
>>>  fs: 0053 @ 0000000000000000
>>>  gs: 002b @ fffff80001805d00/fffff80001805d00
>>>
>>> cr0: 0000000080050031
>>> cr2: 0000000000000030
>>> cr3: 0000000000187000
>>> cr4: 00000000000006b8
>>>
>>> dr0: 0000000000000000
>>> dr1: 0000000000000000
>>> dr2: 0000000000000000
>>> dr3: 0000000000000000
>>> dr6: 00000000fffe0ff0
>>> dr7: 0000000000000400
>>> Code (instr addr fffff80001497de0)
>>> 90 8a c2 0f b7 d1 ee c3 90 90 90 90 90 90 90 90 90 0f b7 d1 ec <c3> 90
>>> 90 90 90 90 90 90 88 11 f0
>>>
>>> But it's too early to tell that this isn't something my V5 code is now
>>> doing, so I'll come back if it turns out to be unrelated.
>>>
>>
>> I'm setting up my test rig again now.
> 
> I've not been able to repro so far. I'm using Windows 7 32-bit installation from ISO + cirrus VGA + ROMBIOS + shadow paging, which usually hits emulation pretty hard.

Thanks Paul! I don't want to waste your time since this could very well
be something on my end. I'll investigate more and come back when I have
something more concrete.


Thanks,
Razvan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Deadlock in stdvga_mem_accept() with emulation
  2015-07-13 12:07                 ` Razvan Cojocaru
@ 2015-07-13 12:08                   ` Paul Durrant
  0 siblings, 0 replies; 17+ messages in thread
From: Paul Durrant @ 2015-07-13 12:08 UTC (permalink / raw)
  To: Razvan Cojocaru, Andrew Cooper, xen-devel; +Cc: Keir (Xen.org), Jan Beulich

> -----Original Message-----
> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> Sent: 13 July 2015 13:08
> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
> Cc: Keir (Xen.org); Jan Beulich
> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with emulation
> 
> On 07/13/2015 03:04 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> >> bounces@lists.xen.org] On Behalf Of Paul Durrant
> >> Sent: 13 July 2015 11:12
> >> To: Razvan Cojocaru; Andrew Cooper; xen-devel@lists.xen.org
> >> Cc: Keir (Xen.org); Jan Beulich
> >> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with
> emulation
> >>
> >>> -----Original Message-----
> >>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> >>> Sent: 13 July 2015 10:42
> >>> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
> >>> Cc: Keir (Xen.org); Jan Beulich
> >>> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with
> emulation
> >>>
> >>> On 07/13/2015 12:05 PM, Paul Durrant wrote:
> >>>>> -----Original Message-----
> >>>>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> >>>>> Sent: 13 July 2015 10:03
> >>>>> To: Paul Durrant; Andrew Cooper; xen-devel@lists.xen.org
> >>>>> Cc: Keir (Xen.org); Jan Beulich
> >>>>> Subject: Re: [Xen-devel] Deadlock in stdvga_mem_accept() with
> >>> emulation
> >>>>>
> >>>>> On 07/13/2015 12:01 PM, Paul Durrant wrote:
> >>>>>>> -----Original Message-----
> >>>>>>> From: Razvan Cojocaru [mailto:rcojocaru@bitdefender.com]
> >>>>>>> Sent: 13 July 2015 09:50
> >>>>>>> To: Andrew Cooper; xen-devel@lists.xen.org
> >>>>>>> Cc: Keir (Xen.org); Jan Beulich; Paul Durrant
> >>>>>>> Subject: Re: Deadlock in stdvga_mem_accept() with emulation
> >>>>>>>
> >>>>>>> On 07/13/2015 11:10 AM, Andrew Cooper wrote:
> >>>>>>>> On 13/07/2015 08:48, Razvan Cojocaru wrote:
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> I'm battling the following hypervisor crash with current staging:
> >>>>>>>>>
> >>>>>>>>> (d2) Invoking ROMBIOS ...
> >>>>>>>>> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes
> >>>>>>>>> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert
> >> Exp
> >>> $
> >>>>>>>>> (XEN) Watchdog timer detects that CPU7 is stuck!
> >>>>>>>>> (XEN) ----[ Xen-4.6-unstable  x86_64  debug=y  Not tainted ]----
> >>>>>>>>> (XEN) CPU:    7
> >>>>>>>>> (XEN) RIP:    e008:[<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> >>>>>>>>> (XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor (d2v0)
> >>>>>>>>> (XEN) rax: 000000000000c11d   rbx: ffff83041e687970   rcx:
> >>>>>>> 000000000000c11e
> >>>>>>>>> (XEN) rdx: ffff83041e687970   rsi: 000000000000c11e   rdi:
> >>>>> ffff83041e687978
> >>>>>>>>> (XEN) rbp: ffff83040eb37208   rsp: ffff83040eb37200   r8:
> >>>>>>> 0000000000000000
> >>>>>>>>> (XEN) r9:  0000000000000000   r10: ffff82d08028c3c0   r11:
> >>>>>>> 0000000000000000
> >>>>>>>>> (XEN) r12: ffff83041e687000   r13: ffff83041e687970   r14:
> >>>>> ffff83040eb37278
> >>>>>>>>> (XEN) r15: 00000000000c253f   cr0: 000000008005003b   cr4:
> >>>>>>> 00000000001526e0
> >>>>>>>>> (XEN) cr3: 00000004054a0000   cr2: 0000000000000000
> >>>>>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> >>>>>>>>> (XEN) Xen stack trace from rsp=ffff83040eb37200:
> >>>>>>>>> (XEN)    ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6
> >>>>>>> 0000000000000282
> >>>>>>>>> (XEN)    0000000000000008 ffff830403791bf0 ffff83041e687000
> >>>>>>> ffff83040eb37268
> >>>>>>>>> (XEN)    ffff82d0801cb23a 00000000000c253f ffff8300d85fc000
> >>>>>>> 0000000000000001
> >>>>>>>>> (XEN)    00000000000000c2 ffff83040eb37298 ffff82d0801cb410
> >>>>>>> 00000000000c253f
> >>>>>>>>> (XEN)    0000000000000000 0000000100000001 0100000000000000
> >>>>>>> ffff83040eb37328
> >>>>>>>>> (XEN)    ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000
> >>>>>>> 0000000000000000
> >>>>>>>>> (XEN)    ffff83040eb37360 00000000000000c2 ffff8304054cb000
> >>>>>>> 000000000000053f
> >>>>>>>>> (XEN)    0000000000000002 0000000000000000 ffff83040eb373f4
> >>>>>>> 00000000000000c2
> >>>>>>>>> (XEN)    ffff83040eb373d8 0000000000000000 0000000000000000
> >>>>>>> ffff82d08028c620
> >>>>>>>>> (XEN)    0000000000000000 ffff83040eb37338 ffff82d0801c3e5d
> >>>>>>> ffff83040eb37398
> >>>>>>>>> (XEN)    ffff82d0801cb107 000000010eb37394 ffff830403791bf0
> >>>>>>> ffff830403791bf0
> >>>>>>>>> (XEN)    ffff83041e687000 ffff83040eb37398 ffff830403791bf0
> >>>>>>> 0000000000000001
> >>>>>>>>> (XEN)    ffff83040eb373d8 0000000000000001 00000000000c253f
> >>>>>>> ffff83040eb373c8
> >>>>>>>>> (XEN)    ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000
> >>>>>>> 0000000000000001
> >>>>>>>>> (XEN)    0000000000000000 ffff83040eb37428 ffff82d0801bb440
> >>>>>>> 00000000000a0001
> >>>>>>>>> (XEN)    00000000000c253f 0000000100000001 0111000000000000
> >>>>>>> ffff83040eb37478
> >>>>>>>>> (XEN)    0000000000000001 0000000000000000 0000000000000000
> >>>>>>> 0000000000000001
> >>>>>>>>> (XEN)    0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9
> >>>>>>> 0000000000000001
> >>>>>>>>> (XEN)    00000000000c253f ffff8300d85fc000 00000000000a0001
> >>>>>>> 0100000000000000
> >>>>>>>>> (XEN)    ffff83040eb37728 ffff82e00819dc60 0000000000000000
> >>>>>>> ffff83040eb374c8
> >>>>>>>>> (XEN) Xen call trace:
> >>>>>>>>> (XEN)    [<ffff82d08012c3f1>] _spin_lock+0x31/0x54
> >>>>>>>>> (XEN)    [<ffff82d0801d09b6>] stdvga_mem_accept+0x3b/0x125
> >>>>>>>>> (XEN)    [<ffff82d0801cb23a>] hvm_find_io_handler+0x68/0x8a
> >>>>>>>>> (XEN)    [<ffff82d0801cb410>] hvm_mmio_internal+0x37/0x67
> >>>>>>>>> (XEN)    [<ffff82d0801c2403>] __hvm_copy+0xe9/0x37d
> >>>>>>>>> (XEN)    [<ffff82d0801c3e5d>]
> >>> hvm_copy_from_guest_phys+0x14/0x16
> >>>>>>>>> (XEN)    [<ffff82d0801cb107>]
> >>> hvm_process_io_intercept+0x10b/0x1d6
> >>>>>>>>> (XEN)    [<ffff82d0801cb291>] hvm_io_intercept+0x35/0x5b
> >>>>>>>>> (XEN)    [<ffff82d0801bb440>] hvmemul_do_io+0x1ff/0x2c1
> >>>>>>>>> (XEN)    [<ffff82d0801bc0b9>]
> hvmemul_do_io_addr+0x117/0x163
> >>>>>>>>> (XEN)    [<ffff82d0801bc129>]
> >> hvmemul_do_mmio_addr+0x24/0x26
> >>>>>>>>> (XEN)    [<ffff82d0801bcbb5>] hvmemul_rep_movs+0x1ef/0x335
> >>>>>>>>> (XEN)    [<ffff82d080198b49>] x86_emulate+0x56c9/0x13088
> >>>>>>>>> (XEN)    [<ffff82d0801bbd26>] _hvm_emulate_one+0x186/0x281
> >>>>>>>>> (XEN)    [<ffff82d0801bc1e8>] hvm_emulate_one+0x10/0x12
> >>>>>>>>> (XEN)    [<ffff82d0801cb63e>] handle_mmio+0x54/0xd2
> >>>>>>>>> (XEN)    [<ffff82d0801cb700>]
> >>>>> handle_mmio_with_translation+0x44/0x46
> >>>>>>>>> (XEN)    [<ffff82d0801c27f6>]
> >>>>> hvm_hap_nested_page_fault+0x15f/0x589
> >>>>>>>>> (XEN)    [<ffff82d0801e9741>]
> >> vmx_vmexit_handler+0x150e/0x188d
> >>>>>>>>> (XEN)    [<ffff82d0801ee7d1>]
> >> vmx_asm_vmexit_handler+0x41/0xc0
> >>>>>>>>> (XEN)
> >>>>>>>>> (XEN)
> >>>>>>>>> (XEN) ****************************************
> >>>>>>>>> (XEN) Panic on CPU 7:
> >>>>>>>>> (XEN) FATAL TRAP: vector = 2 (nmi)
> >>>>>>>>> (XEN) [error_code=0000]
> >>>>>>>>> (XEN) ****************************************
> >>>>>>>>>
> >>>>>>>>> At first I thought it was caused by V5 of the vm_event-based
> >>>>>>>>> introspection series, but I've rolled it back enough to apply V4 on
> >> top
> >>>>>>>>> of it (which has been thoroughly tested on Thursday), and it still
> >>>>>>>>> happens, so this would at least appear to be unrelated at this
> point
> >>>>>>>>> (other than the fact that our use case is maybe somewhat
> unusual
> >>> with
> >>>>>>>>> heavy emulation).
> >>>>>>>>>
> >>>>>>>>> I'll keep digging, but since this is a busy time for Xen I thought I'd
> >>>>>>>>> issue a heads-up here as soon as possible, in case the problem is
> >>>>>>>>> obvious for somebody and it helps getting it fixed sooner.
> >>>>>>>>
> >>>>>>>> In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now
> a
> >>>>>>>> deliberate split between stdvga_mem_accept() and
> >>>>>>> stdvga_mem_complete()
> >>>>>>>> about locking and unlocking the stdvga lock.
> >>>>>>>>
> >>>>>>>> At a guess, the previous chain of execution accidentally omitted
> the
> >>>>>>>> stdvga_mem_complete() call.
> >>>>>>>
> >>>>>>> Thanks, I've reverted that patch and the crash is gone. I'll be happy
> to
> >>>>>>> test a fix if one is provided, but I don't know enough about that
> code
> >>>>>>> to go mess with it...
> >>>>>>>
> >>>>>>
> >>>>>> The problem appears to be that __hvm_copy() is calling this code
> >> before
> >>>>> establishing the type of the page:
> >>>>>>
> >>>>>>         /*
> >>>>>>          * No need to do the P2M lookup for internally handled MMIO,
> >>>>> benefiting
> >>>>>>          * - 32-bit WinXP (& older Windows) on AMD CPUs for LAPIC
> >>> accesses,
> >>>>>>          * - newer Windows (like Server 2012) for HPET accesses.
> >>>>>>          */
> >>>>>>         if ( !nestedhvm_vcpu_in_guestmode(curr)
> >>>>>>              && is_hvm_vcpu(curr)
> >>>>>>              && hvm_mmio_internal(gpa) )
> >>>>>>             return HVMCOPY_bad_gfn_to_mfn;
> >>>>>>
> >>>>>> ...and that call to hvm_mmio_internal is trying to re-acquire the lock.
> >> The
> >>>>> big question is why on earth was I not hitting this every time in testing
> >>> too...
> >>>>> This patch series has gone through some pretty rigorous tests.
> Anyway,
> >> I
> >>> will
> >>>>> post a fix very shortly (since I know what needs to be done).
> >>>>>>
> >>>>>>   Paul
> >>>>>
> >>>>> Thanks Paul! I appreciate the quick response.
> >>>>>
> >>>>
> >>>> While I'm prepping, try this:
> >>>>
> >>>> diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
> >>>> index ebb3b42..08c797c 100644
> >>>> --- a/xen/arch/x86/hvm/stdvga.c
> >>>> +++ b/xen/arch/x86/hvm/stdvga.c
> >>>> @@ -490,11 +490,18 @@ static bool_t stdvga_mem_accept(const
> struct
> >>> hvm_io_handle
> >>>>  {
> >>>>      struct hvm_hw_stdvga *s = &current->domain-
> >>>> arch.hvm_domain.stdvga;
> >>>>
> >>>> +    /*
> >>>> +     * The range check must be done without taking any locks, to avoid
> >>>> +     * deadlock when hvm_mmio_internal() is called from
> >>>> +     * hvm_copy_to/from_guest_phys() in
> hvm_process_io_intercept().
> >>>> +     */
> >>>> +    if ( (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> >>>> +         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE +
> >> VGA_MEM_SIZE))
> >>> )
> >>>> +        return 0;
> >>>> +
> >>>>      spin_lock(&s->lock);
> >>>>
> >>>> -    if ( !s->stdvga ||
> >>>> -         (hvm_mmio_first_byte(p) < VGA_MEM_BASE) ||
> >>>> -         (hvm_mmio_last_byte(p) >= (VGA_MEM_BASE +
> >> VGA_MEM_SIZE)) )
> >>>> +    if ( !s->stdvga )
> >>>>          goto reject;
> >>>>
> >>>>      if ( p->dir == IOREQ_WRITE && p->count > 1 )
> >>>
> >>> The crash is gone, so in that respect:
> >>>
> >>> Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> >>>
> >>> But my guest seems stuck at boot for some reason, the last line of xl
> >>> dmesg is:
> >>>
> >>> (XEN) stdvga.c:151:d3v0 leaving stdvga mode
> >>>
> >>> and this is the guest state:
> >>>
> >>> # ./xenctx -a 3
> >>> rip: fffff80001497de0
> >>> flags: 00000002 nz
> >>> rsp: fffff80001479538
> >>> rax: 0000000000000000   rcx: 00000000000003fe   rdx: 00000000000003fe
> >>> rbx: fffff8000149a110   rsi: fffff80001479601   rdi: 0000000000009860
> >>> rbp: 0000000000000000    r8: fffff8000149a174    r9: 0000000000000000
> >>> r10: 00000000000027d7   r11: fffff800014795a0   r12: 0000000000000000
> >>> r13: 0000000000000004   r14: 0000000000000002   r15: 0000000000000000
> >>>  cs: 0010        ss: 0018        ds: 002b        es: 002b
> >>>  fs: 0053 @ 0000000000000000
> >>>  gs: 002b @ fffff80001805d00/fffff80001805d00
> >>>
> >>> cr0: 0000000080050031
> >>> cr2: 0000000000000030
> >>> cr3: 0000000000187000
> >>> cr4: 00000000000006b8
> >>>
> >>> dr0: 0000000000000000
> >>> dr1: 0000000000000000
> >>> dr2: 0000000000000000
> >>> dr3: 0000000000000000
> >>> dr6: 00000000fffe0ff0
> >>> dr7: 0000000000000400
> >>> Code (instr addr fffff80001497de0)
> >>> 90 8a c2 0f b7 d1 ee c3 90 90 90 90 90 90 90 90 90 0f b7 d1 ec <c3> 90
> >>> 90 90 90 90 90 90 88 11 f0
> >>>
> >>> But it's too early to tell that this isn't something my V5 code is now
> >>> doing, so I'll come back if it turns out to be unrelated.
> >>>
> >>
> >> I'm setting up my test rig again now.
> >
> > I've not been able to repro so far. I'm using Windows 7 32-bit installation
> from ISO + cirrus VGA + ROMBIOS + shadow paging, which usually hits
> emulation pretty hard.
> 
> Thanks Paul! I don't want to waste your time since this could very well
> be something on my end. I'll investigate more and come back when I have
> something more concrete.
> 

No problem. Thanks for finding the original issue!

  Paul

> 
> Thanks,
> Razvan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-07-13 12:08 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-13  7:48 Deadlock in stdvga_mem_accept() with emulation Razvan Cojocaru
2015-07-13  8:10 ` Andrew Cooper
2015-07-13  8:50   ` Razvan Cojocaru
2015-07-13  9:00     ` Andrew Cooper
2015-07-13  9:02       ` Razvan Cojocaru
2015-07-13  9:01     ` Paul Durrant
2015-07-13  9:03       ` Razvan Cojocaru
2015-07-13  9:05         ` Paul Durrant
2015-07-13  9:27           ` Jan Beulich
2015-07-13  9:30             ` Paul Durrant
2015-07-13  9:41               ` Jan Beulich
2015-07-13  9:42           ` Razvan Cojocaru
2015-07-13 10:11             ` Paul Durrant
2015-07-13 12:04               ` Paul Durrant
2015-07-13 12:07                 ` Razvan Cojocaru
2015-07-13 12:08                   ` Paul Durrant
2015-07-13  8:51   ` Paul Durrant

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).