From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59340) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WzGbn-00006n-CY for qemu-devel@nongnu.org; Mon, 23 Jun 2014 22:37:25 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WzGbh-0005Hi-6x for qemu-devel@nongnu.org; Mon, 23 Jun 2014 22:37:19 -0400 Received: from [203.217.0.115] (port=28998 helo=bom.nom.co) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WzGbf-0005GG-Fs for qemu-devel@nongnu.org; Mon, 23 Jun 2014 22:37:13 -0400 Date: Tue, 24 Jun 2014 09:41:03 +0800 From: "Michael W. Bombardieri" Message-ID: <20140624014103.GA606@bom.nom.co> References: <20131023073949.GA4527@bom.nom.co> <52679025.3000106@redhat.com> <526830E6.6070904@weilnetz.de> <5268F8B4.7030105@redhat.com> <52694CC1.6040301@weilnetz.de> <5269958A.50400@redhat.com> <526B908B.20104@weilnetz.de> <53A83C22.6090208@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <53A83C22.6090208@redhat.com> Subject: Re: [Qemu-devel] broken win32 coroutines (was Re: qemu 1.6.1) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini Cc: Stefan Weil , Prashant Satish , qemu-devel@nongnu.org, Stefan Hajnoczi Hi, Thanks for adding me to this thread. I am not familiar with the qemu source code but I am aware of the coroutine crash and I can test a patch if you have one. qemu 1.5.1 was the last version I was able to build and use on win32. Later versions build without error but exhibit the coroutine crash. qemu binaries built with the coroutine feature disabled are to slow to use. - Michael On Mon, Jun 23, 2014 at 04:39:30PM +0200, Paolo Bonzini wrote: > Il 26/10/2013 11:51, Stefan Weil ha scritto: > > unpatched QEMU, crash with assertion > > > > 00448670 <_qemu_coroutine_switch>: > > 448670: 53 push %ebx > > 448671: 83 ec 18 sub $0x18,%esp > > 448674: c7 04 24 a8 62 6d 00 movl $0x6d62a8,(%esp) > > 44867b: 8b 5c 24 24 mov 0x24(%esp),%ebx > > 44867f: e8 ec 9e 27 00 call 6c2570 > > <___emutls_get_address> > > 448684: 89 18 mov %ebx,(%eax) > > 448686: 8b 44 24 28 mov 0x28(%esp),%eax > > 44868a: 89 43 24 mov %eax,0x24(%ebx) > > 44868d: 8b 43 20 mov 0x20(%ebx),%eax > > 448690: 89 04 24 mov %eax,(%esp) > > 448693: ff 15 c0 5f 83 00 call *0x835fc0 > > 448699: 83 ec 04 sub $0x4,%esp > > 44869c: 8b 44 24 20 mov 0x20(%esp),%eax > > 4486a0: 8b 40 24 mov 0x24(%eax),%eax > > 4486a3: 83 c4 18 add $0x18,%esp > > 4486a6: 5b pop %ebx > > 4486a7: c3 ret > > > > > > patched, works > > > > 00448620 <_qemu_coroutine_switch>: > > 448620: 83 ec 1c sub $0x1c,%esp > > 448623: c7 04 24 a8 62 6d 00 movl $0x6d62a8,(%esp) > > 44862a: 89 5c 24 14 mov %ebx,0x14(%esp) > > 44862e: 8b 5c 24 24 mov 0x24(%esp),%ebx > > 448632: 89 74 24 18 mov %esi,0x18(%esp) > > 448636: e8 25 9f 27 00 call 6c2560 > > <___emutls_get_address> > > 44863b: 8b 30 mov (%eax),%esi > > 44863d: 89 18 mov %ebx,(%eax) > > 44863f: 8b 44 24 28 mov 0x28(%esp),%eax > > 448643: 89 43 24 mov %eax,0x24(%ebx) > > 448646: 8b 43 20 mov 0x20(%ebx),%eax > > 448649: 89 04 24 mov %eax,(%esp) > > 44864c: ff 15 c0 5f 83 00 call *0x835fc0 > > 448652: 8b 46 24 mov 0x24(%esi),%eax > > 448655: 83 ec 04 sub $0x4,%esp > > 448658: 8b 5c 24 14 mov 0x14(%esp),%ebx > > 44865c: 8b 74 24 18 mov 0x18(%esp),%esi > > 448660: 83 c4 1c add $0x1c,%esp > > 448663: c3 ret > > The only difference here is basically that "from" is being saved in > %esi across the calls to __emutls_get_address and SwitchToFiber. But > as Peter found out, the fix really happens because now the compiler > will not inline qemu_coroutine_switch anymore. > > Peter provided another dump, this time from Win64. It is the > coroutine_trampoline with inlined qemu_coroutine_switch: > > 0: push %rdi > 1: push %rsi > 2: push %rbx > 3: sub $0x30,%rsp > 7: mov %rcx,%rbx > a: lea ...(%rip),%rcx > 11: mov ...(%rip),%rax > 18: mov %rax,0x28(%rsp) > 1d: xor %eax,%eax > 1f: callq ___emutls_get_address > 24: mov ...(%rip),%rsi > 2b: mov %rax,%rdi > 2e: nop2 > 30: mov 0x8(%rbx),%rcx # ecx = co->entry_arg > 34: callq *(%rbx) # co->entry(co->entry_arg); > 36: mov 0x10(%rbx),%rax # load co->caller > 3a: mov %rax,(%rdi) # "current" = co->caller; (wrong current!) > 3d: movl $0x2,0x48(%rax) # co->caller->action = COROUTINE_TERMINATE; > 44: mov 0x40(%rax),%rcx # load co->caller->fiber > 48: callq *%rsi > 4a: jmp 30 > 4c: nopl 0x0(%rax) > > Some offsets are incomplete because this is from a .o, so not > linked, but it's already enough to see that ___emutls_get_address > (from "current = to_;" in qemu_coroutine_switch) is being hoisted > outside the loop, which becomes: > > Coroutine *co = co_; > Coroutine **p_current = ¤t; > > while (true) { > co->entry(co->entry_arg); > > CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, co); > CoroutineWin32 *to = DO_UPCAST(CoroutineWin32, base, co->caller); > > *p_current = to_; > > to->action = action; > SwitchToFiber(to->fiber); > return from->action; > } > > This is wrong if co->entry yields out of the coroutine which is then > restarted on another thread. This can happen quite often. Typically. > a coroutine is started when a VCPU thread does bdrv_aio_readv: > > VCPU thread > > main VCPU thread coroutine I/O coroutine > bdrv_aio_readv -----> > start I/O operation > thread_pool_submit_co > <------------ yields > back to emulation > > Then I/O finishes and the thread-pool.c event notifier triggers in > the I/O thread. event_notifier_ready calls thread_pool_co_cb, and > the I/O coroutine now restarts *in another thread*: > > iothread > > main iothread coroutine I/O coroutine (formerly in VCPU thread) > event_notifier_ready > thread_pool_co_cb -----> current = I/O coroutine; > call AIO callback > > But on Win32, because of the bug, the "current" being set here the > current coroutine of the VCPU thread, not the iothread. > > noinline is a good-enough workaround, and quite unlikely to break in > the future. > > Paolo