From mboxrd@z Thu Jan 1 00:00:00 1970 From: Razvan Cojocaru Subject: Re: Deadlock in stdvga_mem_accept() with emulation Date: Mon, 13 Jul 2015 11:50:29 +0300 Message-ID: <55A37BD5.2070508@bitdefender.com> References: <55A36D67.3050203@bitdefender.com> <55A37285.3050304@citrix.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <55A37285.3050304@citrix.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Andrew Cooper , "xen-devel@lists.xen.org" Cc: Paul Durrant , Keir Fraser , Jan Beulich List-Id: xen-devel@lists.xenproject.org On 07/13/2015 11:10 AM, Andrew Cooper wrote: > On 13/07/2015 08:48, Razvan Cojocaru wrote: >> Hello, >> >> I'm battling the following hypervisor crash with current staging: >> >> (d2) Invoking ROMBIOS ... >> (XEN) stdvga.c:147:d2v0 entering stdvga and caching modes >> (d2) VGABios $Id: vgabios.c,v 1.67 2008/01/27 09:44:12 vruppert Exp $ >> (XEN) Watchdog timer detects that CPU7 is stuck! >> (XEN) ----[ Xen-4.6-unstable x86_64 debug=y Not tainted ]---- >> (XEN) CPU: 7 >> (XEN) RIP: e008:[] _spin_lock+0x31/0x54 >> (XEN) RFLAGS: 0000000000000202 CONTEXT: hypervisor (d2v0) >> (XEN) rax: 000000000000c11d rbx: ffff83041e687970 rcx: 000000000000c11e >> (XEN) rdx: ffff83041e687970 rsi: 000000000000c11e rdi: ffff83041e687978 >> (XEN) rbp: ffff83040eb37208 rsp: ffff83040eb37200 r8: 0000000000000000 >> (XEN) r9: 0000000000000000 r10: ffff82d08028c3c0 r11: 0000000000000000 >> (XEN) r12: ffff83041e687000 r13: ffff83041e687970 r14: ffff83040eb37278 >> (XEN) r15: 00000000000c253f cr0: 000000008005003b cr4: 00000000001526e0 >> (XEN) cr3: 00000004054a0000 cr2: 0000000000000000 >> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 >> (XEN) Xen stack trace from rsp=ffff83040eb37200: >> (XEN) ffff83040eb37278 ffff83040eb37238 ffff82d0801d09b6 0000000000000282 >> (XEN) 0000000000000008 ffff830403791bf0 ffff83041e687000 ffff83040eb37268 >> (XEN) ffff82d0801cb23a 00000000000c253f ffff8300d85fc000 0000000000000001 >> (XEN) 00000000000000c2 ffff83040eb37298 ffff82d0801cb410 00000000000c253f >> (XEN) 0000000000000000 0000000100000001 0100000000000000 ffff83040eb37328 >> (XEN) ffff82d0801c2403 ffff83040eb37394 ffff83040eb30000 0000000000000000 >> (XEN) ffff83040eb37360 00000000000000c2 ffff8304054cb000 000000000000053f >> (XEN) 0000000000000002 0000000000000000 ffff83040eb373f4 00000000000000c2 >> (XEN) ffff83040eb373d8 0000000000000000 0000000000000000 ffff82d08028c620 >> (XEN) 0000000000000000 ffff83040eb37338 ffff82d0801c3e5d ffff83040eb37398 >> (XEN) ffff82d0801cb107 000000010eb37394 ffff830403791bf0 ffff830403791bf0 >> (XEN) ffff83041e687000 ffff83040eb37398 ffff830403791bf0 0000000000000001 >> (XEN) ffff83040eb373d8 0000000000000001 00000000000c253f ffff83040eb373c8 >> (XEN) ffff82d0801cb291 ffff83040eb37b30 ffff8300d85fc000 0000000000000001 >> (XEN) 0000000000000000 ffff83040eb37428 ffff82d0801bb440 00000000000a0001 >> (XEN) 00000000000c253f 0000000100000001 0111000000000000 ffff83040eb37478 >> (XEN) 0000000000000001 0000000000000000 0000000000000000 0000000000000001 >> (XEN) 0000000000000001 ffff83040eb374a8 ffff82d0801bc0b9 0000000000000001 >> (XEN) 00000000000c253f ffff8300d85fc000 00000000000a0001 0100000000000000 >> (XEN) ffff83040eb37728 ffff82e00819dc60 0000000000000000 ffff83040eb374c8 >> (XEN) Xen call trace: >> (XEN) [] _spin_lock+0x31/0x54 >> (XEN) [] stdvga_mem_accept+0x3b/0x125 >> (XEN) [] hvm_find_io_handler+0x68/0x8a >> (XEN) [] hvm_mmio_internal+0x37/0x67 >> (XEN) [] __hvm_copy+0xe9/0x37d >> (XEN) [] hvm_copy_from_guest_phys+0x14/0x16 >> (XEN) [] hvm_process_io_intercept+0x10b/0x1d6 >> (XEN) [] hvm_io_intercept+0x35/0x5b >> (XEN) [] hvmemul_do_io+0x1ff/0x2c1 >> (XEN) [] hvmemul_do_io_addr+0x117/0x163 >> (XEN) [] hvmemul_do_mmio_addr+0x24/0x26 >> (XEN) [] hvmemul_rep_movs+0x1ef/0x335 >> (XEN) [] x86_emulate+0x56c9/0x13088 >> (XEN) [] _hvm_emulate_one+0x186/0x281 >> (XEN) [] hvm_emulate_one+0x10/0x12 >> (XEN) [] handle_mmio+0x54/0xd2 >> (XEN) [] handle_mmio_with_translation+0x44/0x46 >> (XEN) [] hvm_hap_nested_page_fault+0x15f/0x589 >> (XEN) [] vmx_vmexit_handler+0x150e/0x188d >> (XEN) [] vmx_asm_vmexit_handler+0x41/0xc0 >> (XEN) >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 7: >> (XEN) FATAL TRAP: vector = 2 (nmi) >> (XEN) [error_code=0000] >> (XEN) **************************************** >> >> At first I thought it was caused by V5 of the vm_event-based >> introspection series, but I've rolled it back enough to apply V4 on top >> of it (which has been thoroughly tested on Thursday), and it still >> happens, so this would at least appear to be unrelated at this point >> (other than the fact that our use case is maybe somewhat unusual with >> heavy emulation). >> >> I'll keep digging, but since this is a busy time for Xen I thought I'd >> issue a heads-up here as soon as possible, in case the problem is >> obvious for somebody and it helps getting it fixed sooner. > > In c/s 3bbaaec09b1b942f5624dee176da6e416d31f982 there is now a > deliberate split between stdvga_mem_accept() and stdvga_mem_complete() > about locking and unlocking the stdvga lock. > > At a guess, the previous chain of execution accidentally omitted the > stdvga_mem_complete() call. Thanks, I've reverted that patch and the crash is gone. I'll be happy to test a fix if one is provided, but I don't know enough about that code to go mess with it... Thanks, Razvan