All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] qemu 1.6.1
@ 2013-10-23  7:39 Michael W. Bombardieri
  2013-10-23  9:00 ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Michael W. Bombardieri @ 2013-10-23  7:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: mb

Hi,

My newly built qemu/win32 binary (v1.6.1) crashes in qemu-system-i386 and qemu-system-x86_64 when
booting from an install CD.

	C:\Program Files\qemu>qemu-system-x86_64 -boot d -vnc 0.0.0.0:20 -cdrom NetBSD-6.1.2-amd64.iso
	Assertion failed: qemu_in_coroutine(), file qemu-coroutine-lock.c, line 99

	This application has requested the Runtime to terminate it in an unusual way.
	Please contact the application's support team for more information.

I noticed that qemu-system-sparc still booted OpenBSD/sparc 5.3 install CD correctly.
No further info at this stage.
Any ideas?

- Michael

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] qemu 1.6.1
  2013-10-23  7:39 [Qemu-devel] qemu 1.6.1 Michael W. Bombardieri
@ 2013-10-23  9:00 ` Paolo Bonzini
  2013-10-23 20:26   ` Stefan Weil
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2013-10-23  9:00 UTC (permalink / raw)
  To: Michael W. Bombardieri; +Cc: qemu-devel

Il 23/10/2013 08:39, Michael W. Bombardieri ha scritto:
> Hi,
> 
> My newly built qemu/win32 binary (v1.6.1) crashes in qemu-system-i386 and qemu-system-x86_64 when
> booting from an install CD.
> 
> 	C:\Program Files\qemu>qemu-system-x86_64 -boot d -vnc 0.0.0.0:20 -cdrom NetBSD-6.1.2-amd64.iso
> 	Assertion failed: qemu_in_coroutine(), file qemu-coroutine-lock.c, line 99
> 
> 	This application has requested the Runtime to terminate it in an unusual way.
> 	Please contact the application's support team for more information.
> 
> I noticed that qemu-system-sparc still booted OpenBSD/sparc 5.3 install CD correctly.
> No further info at this stage.
> Any ideas?

It's a known problem that not everyone can reproduce.  Please compile
with --disable-coroutine-pool on the configure command line.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] qemu 1.6.1
  2013-10-23  9:00 ` Paolo Bonzini
@ 2013-10-23 20:26   ` Stefan Weil
  2013-10-24 10:38     ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Weil @ 2013-10-23 20:26 UTC (permalink / raw)
  To: Paolo Bonzini, Michael W. Bombardieri; +Cc: qemu-devel

Am 23.10.2013 11:00, schrieb Paolo Bonzini:
> Il 23/10/2013 08:39, Michael W. Bombardieri ha scritto:
>> Hi,
>>
>> My newly built qemu/win32 binary (v1.6.1) crashes in qemu-system-i386 and qemu-system-x86_64 when
>> booting from an install CD.
>>
>> 	C:\Program Files\qemu>qemu-system-x86_64 -boot d -vnc 0.0.0.0:20 -cdrom NetBSD-6.1.2-amd64.iso
>> 	Assertion failed: qemu_in_coroutine(), file qemu-coroutine-lock.c, line 99
>>
>> 	This application has requested the Runtime to terminate it in an unusual way.
>> 	Please contact the application's support team for more information.
>>
>> I noticed that qemu-system-sparc still booted OpenBSD/sparc 5.3 install CD correctly.
>> No further info at this stage.
>> Any ideas?
> It's a known problem that not everyone can reproduce.  Please compile
> with --disable-coroutine-pool on the configure command line.
>
> Paolo

This patch also helps (at least for me, tested native and on Linux / Wine):
http://repo.or.cz/w/qemu/ar7.git/commit/c777d5d62a729fd8b19847aaa0aad3d7a1f73f47

It looks like a compiler problem related to thread local storage
(variable "current").

I recently got several bug reports from a Windows user and included
patches to fix them in
my personal tree http://repo.or.cz/w/qemu/ar7.git. The binaries on
qemu.weilnetz.de
are based on that tree.

Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] qemu 1.6.1
  2013-10-23 20:26   ` Stefan Weil
@ 2013-10-24 10:38     ` Paolo Bonzini
  2013-10-24 16:37       ` Stefan Weil
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2013-10-24 10:38 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Michael W. Bombardieri, qemu-devel

Il 23/10/2013 21:26, Stefan Weil ha scritto:
> Am 23.10.2013 11:00, schrieb Paolo Bonzini:
>> Il 23/10/2013 08:39, Michael W. Bombardieri ha scritto:
>>> Hi,
>>>
>>> My newly built qemu/win32 binary (v1.6.1) crashes in qemu-system-i386 and qemu-system-x86_64 when
>>> booting from an install CD.
>>>
>>> 	C:\Program Files\qemu>qemu-system-x86_64 -boot d -vnc 0.0.0.0:20 -cdrom NetBSD-6.1.2-amd64.iso
>>> 	Assertion failed: qemu_in_coroutine(), file qemu-coroutine-lock.c, line 99
>>>
>>> 	This application has requested the Runtime to terminate it in an unusual way.
>>> 	Please contact the application's support team for more information.
>>>
>>> I noticed that qemu-system-sparc still booted OpenBSD/sparc 5.3 install CD correctly.
>>> No further info at this stage.
>>> Any ideas?
>> It's a known problem that not everyone can reproduce.  Please compile
>> with --disable-coroutine-pool on the configure command line.
>>
>> Paolo
> 
> This patch also helps (at least for me, tested native and on Linux / Wine):
> http://repo.or.cz/w/qemu/ar7.git/commit/c777d5d62a729fd8b19847aaa0aad3d7a1f73f47
> 
> It looks like a compiler problem related to thread local storage
> (variable "current").

Ugh.

> I recently got several bug reports from a Windows user and included
> patches to fix them in
> my personal tree http://repo.or.cz/w/qemu/ar7.git. The binaries on
> qemu.weilnetz.de
> are based on that tree.

Does something like

     CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, current);

also work?  Then we can just remove from_.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] qemu 1.6.1
  2013-10-24 10:38     ` Paolo Bonzini
@ 2013-10-24 16:37       ` Stefan Weil
  2013-10-24 21:47         ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Weil @ 2013-10-24 16:37 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael W. Bombardieri, qemu-devel, Stefan Hajnoczi

Am 24.10.2013 12:38, schrieb Paolo Bonzini:
> Il 23/10/2013 21:26, Stefan Weil ha scritto:
>> Am 23.10.2013 11:00, schrieb Paolo Bonzini:
>>> Il 23/10/2013 08:39, Michael W. Bombardieri ha scritto:
>>>> Hi,
>>>>
>>>> My newly built qemu/win32 binary (v1.6.1) crashes in qemu-system-i386 and qemu-system-x86_64 when
>>>> booting from an install CD.
>>>>
>>>> 	C:\Program Files\qemu>qemu-system-x86_64 -boot d -vnc 0.0.0.0:20 -cdrom NetBSD-6.1.2-amd64.iso
>>>> 	Assertion failed: qemu_in_coroutine(), file qemu-coroutine-lock.c, line 99
>>>>
>>>> 	This application has requested the Runtime to terminate it in an unusual way.
>>>> 	Please contact the application's support team for more information.
>>>>
>>>> I noticed that qemu-system-sparc still booted OpenBSD/sparc 5.3 install CD correctly.
>>>> No further info at this stage.
>>>> Any ideas?
>>> It's a known problem that not everyone can reproduce.  Please compile
>>> with --disable-coroutine-pool on the configure command line.
>>>
>>> Paolo
>> This patch also helps (at least for me, tested native and on Linux / Wine):
>> http://repo.or.cz/w/qemu/ar7.git/commit/c777d5d62a729fd8b19847aaa0aad3d7a1f73f47
>>
>> It looks like a compiler problem related to thread local storage
>> (variable "current").
> Ugh.
>
>> I recently got several bug reports from a Windows user and included
>> patches to fix them in
>> my personal tree http://repo.or.cz/w/qemu/ar7.git. The binaries on
>> qemu.weilnetz.de
>> are based on that tree.
> Does something like
>
>      CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, current);
>
> also work?  Then we can just remove from_.
>
> Paolo

Yes, that works, too. It also fixes the problem with the assertion
(tested with Wine).

No, we cannot remove from_, because the same interface is also used
for Linux and other hosts which don't have a 'current' variable.
Or we would have to call qemu_coroutine_self() to get the current
coroutine.

Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] qemu 1.6.1
  2013-10-24 16:37       ` Stefan Weil
@ 2013-10-24 21:47         ` Paolo Bonzini
  2013-10-26  9:51           ` Stefan Weil
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2013-10-24 21:47 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Michael W. Bombardieri, qemu-devel, Stefan Hajnoczi

Il 24/10/2013 17:37, Stefan Weil ha scritto:
> Yes, that works, too. It also fixes the problem with the assertion
> (tested with Wine).
> 
> No, we cannot remove from_, because the same interface is also used
> for Linux and other hosts which don't have a 'current' variable.
> Or we would have to call qemu_coroutine_self() to get the current
> coroutine.

Yes, I was thinking of using qemu_coroutine_self().

By the way, can you post the two assembly language outputs for just

-     CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, from_);
+     CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, current);

which AIUI works and is enough to fix the bug?

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] qemu 1.6.1
  2013-10-24 21:47         ` Paolo Bonzini
@ 2013-10-26  9:51           ` Stefan Weil
  2013-10-27  6:54             ` Paolo Bonzini
  2014-06-23 14:39             ` [Qemu-devel] broken win32 coroutines (was Re: qemu 1.6.1) Paolo Bonzini
  0 siblings, 2 replies; 14+ messages in thread
From: Stefan Weil @ 2013-10-26  9:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael W. Bombardieri, qemu-devel, Stefan Hajnoczi

Am 24.10.2013 23:47, schrieb Paolo Bonzini:
> Il 24/10/2013 17:37, Stefan Weil ha scritto:
>> Yes, that works, too. It also fixes the problem with the assertion
>> (tested with Wine).
>>
>> No, we cannot remove from_, because the same interface is also used
>> for Linux and other hosts which don't have a 'current' variable.
>> Or we would have to call qemu_coroutine_self() to get the current
>> coroutine.
> Yes, I was thinking of using qemu_coroutine_self().
>
> By the way, can you post the two assembly language outputs for just
>
> -     CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, from_);
> +     CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, current);
>
> which AIUI works and is enough to fix the bug?
>
> Paolo

See disassembled code below. I removed compiler option -fstack-protector-all
to simplify the assembler code and tested that the result was not affected
by this removal.

The C and assembler code from the test is also available at
http://qemu.weilnetz.de/test/coroutine-win32/.

Stefan

unpatched QEMU, crash with assertion

00448670 <_qemu_coroutine_switch>:
  448670:       53                      push   %ebx
  448671:       83 ec 18                sub    $0x18,%esp
  448674:       c7 04 24 a8 62 6d 00    movl   $0x6d62a8,(%esp)
  44867b:       8b 5c 24 24             mov    0x24(%esp),%ebx
  44867f:       e8 ec 9e 27 00          call   6c2570
<___emutls_get_address>
  448684:       89 18                   mov    %ebx,(%eax)
  448686:       8b 44 24 28             mov    0x28(%esp),%eax
  44868a:       89 43 24                mov    %eax,0x24(%ebx)
  44868d:       8b 43 20                mov    0x20(%ebx),%eax
  448690:       89 04 24                mov    %eax,(%esp)
  448693:       ff 15 c0 5f 83 00       call   *0x835fc0
  448699:       83 ec 04                sub    $0x4,%esp
  44869c:       8b 44 24 20             mov    0x20(%esp),%eax
  4486a0:       8b 40 24                mov    0x24(%eax),%eax
  4486a3:       83 c4 18                add    $0x18,%esp
  4486a6:       5b                      pop    %ebx
  4486a7:       c3                      ret   


patched, works

00448620 <_qemu_coroutine_switch>:
  448620:       83 ec 1c                sub    $0x1c,%esp
  448623:       c7 04 24 a8 62 6d 00    movl   $0x6d62a8,(%esp)
  44862a:       89 5c 24 14             mov    %ebx,0x14(%esp)
  44862e:       8b 5c 24 24             mov    0x24(%esp),%ebx
  448632:       89 74 24 18             mov    %esi,0x18(%esp)
  448636:       e8 25 9f 27 00          call   6c2560
<___emutls_get_address>
  44863b:       8b 30                   mov    (%eax),%esi
  44863d:       89 18                   mov    %ebx,(%eax)
  44863f:       8b 44 24 28             mov    0x28(%esp),%eax
  448643:       89 43 24                mov    %eax,0x24(%ebx)
  448646:       8b 43 20                mov    0x20(%ebx),%eax
  448649:       89 04 24                mov    %eax,(%esp)
  44864c:       ff 15 c0 5f 83 00       call   *0x835fc0
  448652:       8b 46 24                mov    0x24(%esi),%eax
  448655:       83 ec 04                sub    $0x4,%esp
  448658:       8b 5c 24 14             mov    0x14(%esp),%ebx
  44865c:       8b 74 24 18             mov    0x18(%esp),%esi
  448660:       83 c4 1c                add    $0x1c,%esp
  448663:       c3                      ret   

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] qemu 1.6.1
  2013-10-26  9:51           ` Stefan Weil
@ 2013-10-27  6:54             ` Paolo Bonzini
  2013-10-27 10:44               ` Stefan Weil
  2013-10-27 15:38               ` Stefan Weil
  2014-06-23 14:39             ` [Qemu-devel] broken win32 coroutines (was Re: qemu 1.6.1) Paolo Bonzini
  1 sibling, 2 replies; 14+ messages in thread
From: Paolo Bonzini @ 2013-10-27  6:54 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Michael W. Bombardieri, qemu-devel, Stefan Hajnoczi

Il 26/10/2013 11:51, Stefan Weil ha scritto:
> Am 24.10.2013 23:47, schrieb Paolo Bonzini:
>> Il 24/10/2013 17:37, Stefan Weil ha scritto:
>>> Yes, that works, too. It also fixes the problem with the assertion
>>> (tested with Wine).
>>>
>>> No, we cannot remove from_, because the same interface is also used
>>> for Linux and other hosts which don't have a 'current' variable.
>>> Or we would have to call qemu_coroutine_self() to get the current
>>> coroutine.
>> Yes, I was thinking of using qemu_coroutine_self().
>>
>> By the way, can you post the two assembly language outputs for just
>>
>> -     CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, from_);
>> +     CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, current);
>>
>> which AIUI works and is enough to fix the bug?
>>
>> Paolo
> 
> See disassembled code below. I removed compiler option -fstack-protector-all
> to simplify the assembler code and tested that the result was not affected
> by this removal.
> 
> The C and assembler code from the test is also available at
> http://qemu.weilnetz.de/test/coroutine-win32/.

Here is the code with annotations

     broken                                   works
  -------------------------------------------------------------------------
     push   %ebx
     sub    $0x18,%esp                        sub    $0x1c,%esp               
                                              mov    %ebx,0x14(%esp)          
                                              mov    %esi,0x18(%esp)          
                            
     movl   $0x6d62a8,(%esp)                  movl   $0x6d62a8,(%esp)         
     mov    0x24(%esp),%ebx                   mov    0x24(%esp),%ebx              ebx = to;
     call   ___emutls_get_address             call   ___emutls_get_address        eax = &current;
                            
                                              mov    (%eax),%esi                  esi = current;
                            
     mov    %ebx,(%eax)                       mov    %ebx,(%eax)                  current = to;

     mov    0x28(%esp),%eax                   mov    0x28(%esp),%eax              eax = action
     mov    %eax,0x24(%ebx)                   mov    %eax,0x24(%ebx)              to->action = action
     mov    0x20(%ebx),%eax                   mov    0x20(%ebx),%eax              eax = to->fiber
     mov    %eax,(%esp)                       mov    %eax,(%esp)                  "push" to->fiber
     call   *0x835fc0                         call   *0x835fc0                    SwitchToFiber(to->fiber)
     sub    $0x4,%esp                         sub    $0x4,%esp                    undo PASCAL calling convention
                            
**   mov    0x20(%esp),%eax                                                       eax = from
     mov    0x24(%eax),%eax                   mov    0x24(%esi),%eax              eax = from->action
                            
                                              mov    0x14(%esp),%ebx          
                                              mov    0x18(%esp),%esi          
     add    $0x18,%esp                        add    $0x1c,%esp               
     pop    %ebx                                                              
     ret                                      ret                             


I think the problem is that 0x20(%esp) gets somehow corrupted at the
instruction I highlighted with **.

The simplest fix then would be to add a barrier() before and after
SwitchToFiber.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] qemu 1.6.1
  2013-10-27  6:54             ` Paolo Bonzini
@ 2013-10-27 10:44               ` Stefan Weil
  2013-10-27 15:38               ` Stefan Weil
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Weil @ 2013-10-27 10:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael W. Bombardieri, qemu-devel, Stefan Hajnoczi

Am 27.10.2013 07:54, schrieb Paolo Bonzini:
> Here is the code with annotations
>
>      broken                                   works
>   -------------------------------------------------------------------------
>      push   %ebx
>      sub    $0x18,%esp                        sub    $0x1c,%esp               
>                                               mov    %ebx,0x14(%esp)          
>                                               mov    %esi,0x18(%esp)          
>                             
>      movl   $0x6d62a8,(%esp)                  movl   $0x6d62a8,(%esp)         
>      mov    0x24(%esp),%ebx                   mov    0x24(%esp),%ebx              ebx = to;
>      call   ___emutls_get_address             call   ___emutls_get_address        eax = &current;
>                             
>                                               mov    (%eax),%esi                  esi = current;
>                             
>      mov    %ebx,(%eax)                       mov    %ebx,(%eax)                  current = to;
>
>      mov    0x28(%esp),%eax                   mov    0x28(%esp),%eax              eax = action
>      mov    %eax,0x24(%ebx)                   mov    %eax,0x24(%ebx)              to->action = action
>      mov    0x20(%ebx),%eax                   mov    0x20(%ebx),%eax              eax = to->fiber
>      mov    %eax,(%esp)                       mov    %eax,(%esp)                  "push" to->fiber
>      call   *0x835fc0                         call   *0x835fc0                    SwitchToFiber(to->fiber)
>      sub    $0x4,%esp                         sub    $0x4,%esp                    undo PASCAL calling convention
>                             
> **   mov    0x20(%esp),%eax                                                       eax = from
>      mov    0x24(%eax),%eax                   mov    0x24(%esi),%eax              eax = from->action
>                             
>                                               mov    0x14(%esp),%ebx          
>                                               mov    0x18(%esp),%esi          
>      add    $0x18,%esp                        add    $0x1c,%esp               
>      pop    %ebx                                                              
>      ret                                      ret                             
>
>
> I think the problem is that 0x20(%esp) gets somehow corrupted at the
> instruction I highlighted with **.
>
> The simplest fix then would be to add a barrier() before and after
> SwitchToFiber.
>
> Paolo

I tried adding two barrier() statements around SwitchToFiber().
That change did not result in different assembler code (=> unchanged
behaviour, QEMU still raises an assertion).

Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] qemu 1.6.1
  2013-10-27  6:54             ` Paolo Bonzini
  2013-10-27 10:44               ` Stefan Weil
@ 2013-10-27 15:38               ` Stefan Weil
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Weil @ 2013-10-27 15:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael W. Bombardieri, qemu-devel, Stefan Hajnoczi

Am 27.10.2013 07:54, schrieb Paolo Bonzini:
>
> I think the problem is that 0x20(%esp) gets somehow corrupted at the
> instruction I highlighted with **.
>
> The simplest fix then would be to add a barrier() before and after
> SwitchToFiber.
>
> Paolo

I added some debugging output (see code at
http://qemu.weilnetz.de/test/coroutine-win32/). The result with pointer
values replaced by function names is shown below.

There are two threads (WinMain, qemu_tcg_cpu_thread_fn) and one
coroutine (bdrv_rw_co_entry).

Stefan


qemu_coroutine_self:WinMain
qemu_in_coroutine:WinMain,null
qemu_coroutine_new:bdrv_rw_co_entry
qemu_coroutine_create,co=bdrv_rw_co_entry
qemu_coroutine_enter(bdrv_rw_co_entry,...),self=WinMain
coroutine_swap(WinMain,bdrv_rw_co_entry)
qemu_coroutine_yield,self=bdrv_rw_co_entry,to=WinMain
coroutine_swap(bdrv_rw_co_entry,WinMain)
qemu_coroutine_enter(bdrv_rw_co_entry,...),self=WinMain
coroutine_swap(WinMain,bdrv_rw_co_entry)
qemu_in_coroutine:bdrv_rw_co_entry,WinMain
# active coroutine bdrv_rw_co_entry is deleted
coroutine_delete(bdrv_rw_co_entry)
qemu_coroutine_self:qemu_tcg_cpu_thread_fn
qemu_in_coroutine:qemu_tcg_cpu_thread_fn,null
qemu_coroutine_create,co=bdrv_rw_co_entry
qemu_coroutine_enter(bdrv_rw_co_entry,...),self=qemu_tcg_cpu_thread_fn
coroutine_swap(qemu_tcg_cpu_thread_fn,bdrv_rw_co_entry)
qemu_coroutine_yield,self=bdrv_rw_co_entry,to=qemu_tcg_cpu_thread_fn
coroutine_swap(bdrv_rw_co_entry,qemu_tcg_cpu_thread_fn)
qemu_coroutine_enter(bdrv_rw_co_entry,...),self=qemu_tcg_cpu_thread_fn
coroutine_swap(qemu_tcg_cpu_thread_fn,bdrv_rw_co_entry)
qemu_in_coroutine:bdrv_rw_co_entry,qemu_tcg_cpu_thread_fn
qemu_in_coroutine:bdrv_rw_co_entry,qemu_tcg_cpu_thread_fn
# active coroutine bdrv_rw_co_entry is deleted
coroutine_delete(bdrv_rw_co_entry)
# now we are still in deleted coroutine bdrv_rw_co_entry
qemu_in_coroutine:bdrv_rw_co_entry,null
qemu_coroutine_create,co=bdrv_rw_co_entry
qemu_coroutine_enter(bdrv_rw_co_entry,...),self=bdrv_rw_co_entry
coroutine_swap(bdrv_rw_co_entry,bdrv_rw_co_entry)
qemu_coroutine_yield,self=bdrv_rw_co_entry,to=bdrv_rw_co_entry
coroutine_swap(bdrv_rw_co_entry,bdrv_rw_co_entry)
qemu_in_coroutine:bdrv_rw_co_entry,null

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] broken win32 coroutines (was Re: qemu 1.6.1)
  2013-10-26  9:51           ` Stefan Weil
  2013-10-27  6:54             ` Paolo Bonzini
@ 2014-06-23 14:39             ` Paolo Bonzini
  2014-06-24  1:41               ` Michael W. Bombardieri
  1 sibling, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2014-06-23 14:39 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Michael W. Bombardieri, qemu-devel, Stefan Hajnoczi

Il 26/10/2013 11:51, Stefan Weil ha scritto:
> unpatched QEMU, crash with assertion
> 
> 00448670 <_qemu_coroutine_switch>:
>   448670:       53                      push   %ebx
>   448671:       83 ec 18                sub    $0x18,%esp
>   448674:       c7 04 24 a8 62 6d 00    movl   $0x6d62a8,(%esp)
>   44867b:       8b 5c 24 24             mov    0x24(%esp),%ebx
>   44867f:       e8 ec 9e 27 00          call   6c2570
> <___emutls_get_address>
>   448684:       89 18                   mov    %ebx,(%eax)
>   448686:       8b 44 24 28             mov    0x28(%esp),%eax
>   44868a:       89 43 24                mov    %eax,0x24(%ebx)
>   44868d:       8b 43 20                mov    0x20(%ebx),%eax
>   448690:       89 04 24                mov    %eax,(%esp)
>   448693:       ff 15 c0 5f 83 00       call   *0x835fc0
>   448699:       83 ec 04                sub    $0x4,%esp
>   44869c:       8b 44 24 20             mov    0x20(%esp),%eax
>   4486a0:       8b 40 24                mov    0x24(%eax),%eax
>   4486a3:       83 c4 18                add    $0x18,%esp
>   4486a6:       5b                      pop    %ebx
>   4486a7:       c3                      ret   
> 
> 
> patched, works
> 
> 00448620 <_qemu_coroutine_switch>:
>   448620:       83 ec 1c                sub    $0x1c,%esp
>   448623:       c7 04 24 a8 62 6d 00    movl   $0x6d62a8,(%esp)
>   44862a:       89 5c 24 14             mov    %ebx,0x14(%esp)
>   44862e:       8b 5c 24 24             mov    0x24(%esp),%ebx
>   448632:       89 74 24 18             mov    %esi,0x18(%esp)
>   448636:       e8 25 9f 27 00          call   6c2560
> <___emutls_get_address>
>   44863b:       8b 30                   mov    (%eax),%esi
>   44863d:       89 18                   mov    %ebx,(%eax)
>   44863f:       8b 44 24 28             mov    0x28(%esp),%eax
>   448643:       89 43 24                mov    %eax,0x24(%ebx)
>   448646:       8b 43 20                mov    0x20(%ebx),%eax
>   448649:       89 04 24                mov    %eax,(%esp)
>   44864c:       ff 15 c0 5f 83 00       call   *0x835fc0
>   448652:       8b 46 24                mov    0x24(%esi),%eax
>   448655:       83 ec 04                sub    $0x4,%esp
>   448658:       8b 5c 24 14             mov    0x14(%esp),%ebx
>   44865c:       8b 74 24 18             mov    0x18(%esp),%esi
>   448660:       83 c4 1c                add    $0x1c,%esp
>   448663:       c3                      ret   

The only difference here is basically that "from" is being saved in 
%esi across the calls to __emutls_get_address and SwitchToFiber.  But 
as Peter found out, the fix really happens because now the compiler 
will not inline qemu_coroutine_switch anymore.

Peter provided another dump, this time from Win64.  It is the
coroutine_trampoline with inlined qemu_coroutine_switch:

   0:   push   %rdi
   1:   push   %rsi
   2:   push   %rbx
   3:   sub    $0x30,%rsp
   7:   mov    %rcx,%rbx
   a:   lea    ...(%rip),%rcx        
  11:   mov    ...(%rip),%rax
  18:   mov    %rax,0x28(%rsp)
  1d:   xor    %eax,%eax
  1f:   callq  ___emutls_get_address
  24:   mov    ...(%rip),%rsi
  2b:   mov    %rax,%rdi
  2e:   nop2
  30:   mov    0x8(%rbx),%rcx        # ecx = co->entry_arg
  34:   callq  *(%rbx)               # co->entry(co->entry_arg);
  36:   mov    0x10(%rbx),%rax       # load co->caller
  3a:   mov    %rax,(%rdi)           # "current" = co->caller; (wrong current!)
  3d:   movl   $0x2,0x48(%rax)       # co->caller->action = COROUTINE_TERMINATE;
  44:   mov    0x40(%rax),%rcx       # load co->caller->fiber
  48:   callq  *%rsi
  4a:   jmp    30 <coroutine_trampoline+0x30>
  4c:   nopl   0x0(%rax)

Some offsets are incomplete because this is from a .o, so not
linked, but it's already enough to see that ___emutls_get_address
(from "current = to_;" in qemu_coroutine_switch) is being hoisted
outside the loop, which becomes:

    Coroutine *co = co_;
    Coroutine **p_current = &current;

    while (true) {
        co->entry(co->entry_arg);

        CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, co);
        CoroutineWin32 *to = DO_UPCAST(CoroutineWin32, base, co->caller);

        *p_current = to_;

        to->action = action;
        SwitchToFiber(to->fiber);
        return from->action;
    }

This is wrong if co->entry yields out of the coroutine which is then
restarted on another thread.  This can happen quite often.  Typically.
a coroutine is started when a VCPU thread does bdrv_aio_readv:

     VCPU thread

     main VCPU thread coroutine      I/O coroutine
        bdrv_aio_readv ----->
                                     start I/O operation
                                       thread_pool_submit_co
                       <------------ yields
        back to emulation

Then I/O finishes and the thread-pool.c event notifier triggers in
the I/O thread.  event_notifier_ready calls thread_pool_co_cb, and
the I/O coroutine now restarts *in another thread*:

     iothread

     main iothread coroutine         I/O coroutine (formerly in VCPU thread)      
        event_notifier_ready
          thread_pool_co_cb ----->   current = I/O coroutine;
                                     call AIO callback

But on Win32, because of the bug, the "current" being set here the
current coroutine of the VCPU thread, not the iothread.

noinline is a good-enough workaround, and quite unlikely to break in
the future.

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] broken win32 coroutines (was Re: qemu 1.6.1)
  2014-06-23 14:39             ` [Qemu-devel] broken win32 coroutines (was Re: qemu 1.6.1) Paolo Bonzini
@ 2014-06-24  1:41               ` Michael W. Bombardieri
  2014-06-24  5:22                 ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Michael W. Bombardieri @ 2014-06-24  1:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stefan Weil, Prashant Satish, qemu-devel, Stefan Hajnoczi

Hi,

Thanks for adding me to this thread. 
I am not familiar with the qemu source code but I am aware
of the coroutine crash and I can test a patch if you have
one.

qemu 1.5.1 was the last version I was able to build and use
on win32. Later versions build without error but exhibit the
coroutine crash. 
qemu binaries built with the coroutine feature disabled are
to slow to use.

- Michael


On Mon, Jun 23, 2014 at 04:39:30PM +0200, Paolo Bonzini wrote:
> Il 26/10/2013 11:51, Stefan Weil ha scritto:
> > unpatched QEMU, crash with assertion
> > 
> > 00448670 <_qemu_coroutine_switch>:
> >   448670:       53                      push   %ebx
> >   448671:       83 ec 18                sub    $0x18,%esp
> >   448674:       c7 04 24 a8 62 6d 00    movl   $0x6d62a8,(%esp)
> >   44867b:       8b 5c 24 24             mov    0x24(%esp),%ebx
> >   44867f:       e8 ec 9e 27 00          call   6c2570
> > <___emutls_get_address>
> >   448684:       89 18                   mov    %ebx,(%eax)
> >   448686:       8b 44 24 28             mov    0x28(%esp),%eax
> >   44868a:       89 43 24                mov    %eax,0x24(%ebx)
> >   44868d:       8b 43 20                mov    0x20(%ebx),%eax
> >   448690:       89 04 24                mov    %eax,(%esp)
> >   448693:       ff 15 c0 5f 83 00       call   *0x835fc0
> >   448699:       83 ec 04                sub    $0x4,%esp
> >   44869c:       8b 44 24 20             mov    0x20(%esp),%eax
> >   4486a0:       8b 40 24                mov    0x24(%eax),%eax
> >   4486a3:       83 c4 18                add    $0x18,%esp
> >   4486a6:       5b                      pop    %ebx
> >   4486a7:       c3                      ret   
> > 
> > 
> > patched, works
> > 
> > 00448620 <_qemu_coroutine_switch>:
> >   448620:       83 ec 1c                sub    $0x1c,%esp
> >   448623:       c7 04 24 a8 62 6d 00    movl   $0x6d62a8,(%esp)
> >   44862a:       89 5c 24 14             mov    %ebx,0x14(%esp)
> >   44862e:       8b 5c 24 24             mov    0x24(%esp),%ebx
> >   448632:       89 74 24 18             mov    %esi,0x18(%esp)
> >   448636:       e8 25 9f 27 00          call   6c2560
> > <___emutls_get_address>
> >   44863b:       8b 30                   mov    (%eax),%esi
> >   44863d:       89 18                   mov    %ebx,(%eax)
> >   44863f:       8b 44 24 28             mov    0x28(%esp),%eax
> >   448643:       89 43 24                mov    %eax,0x24(%ebx)
> >   448646:       8b 43 20                mov    0x20(%ebx),%eax
> >   448649:       89 04 24                mov    %eax,(%esp)
> >   44864c:       ff 15 c0 5f 83 00       call   *0x835fc0
> >   448652:       8b 46 24                mov    0x24(%esi),%eax
> >   448655:       83 ec 04                sub    $0x4,%esp
> >   448658:       8b 5c 24 14             mov    0x14(%esp),%ebx
> >   44865c:       8b 74 24 18             mov    0x18(%esp),%esi
> >   448660:       83 c4 1c                add    $0x1c,%esp
> >   448663:       c3                      ret   
> 
> The only difference here is basically that "from" is being saved in 
> %esi across the calls to __emutls_get_address and SwitchToFiber.  But 
> as Peter found out, the fix really happens because now the compiler 
> will not inline qemu_coroutine_switch anymore.
> 
> Peter provided another dump, this time from Win64.  It is the
> coroutine_trampoline with inlined qemu_coroutine_switch:
> 
>    0:   push   %rdi
>    1:   push   %rsi
>    2:   push   %rbx
>    3:   sub    $0x30,%rsp
>    7:   mov    %rcx,%rbx
>    a:   lea    ...(%rip),%rcx        
>   11:   mov    ...(%rip),%rax
>   18:   mov    %rax,0x28(%rsp)
>   1d:   xor    %eax,%eax
>   1f:   callq  ___emutls_get_address
>   24:   mov    ...(%rip),%rsi
>   2b:   mov    %rax,%rdi
>   2e:   nop2
>   30:   mov    0x8(%rbx),%rcx        # ecx = co->entry_arg
>   34:   callq  *(%rbx)               # co->entry(co->entry_arg);
>   36:   mov    0x10(%rbx),%rax       # load co->caller
>   3a:   mov    %rax,(%rdi)           # "current" = co->caller; (wrong current!)
>   3d:   movl   $0x2,0x48(%rax)       # co->caller->action = COROUTINE_TERMINATE;
>   44:   mov    0x40(%rax),%rcx       # load co->caller->fiber
>   48:   callq  *%rsi
>   4a:   jmp    30 <coroutine_trampoline+0x30>
>   4c:   nopl   0x0(%rax)
> 
> Some offsets are incomplete because this is from a .o, so not
> linked, but it's already enough to see that ___emutls_get_address
> (from "current = to_;" in qemu_coroutine_switch) is being hoisted
> outside the loop, which becomes:
> 
>     Coroutine *co = co_;
>     Coroutine **p_current = &current;
> 
>     while (true) {
>         co->entry(co->entry_arg);
> 
>         CoroutineWin32 *from = DO_UPCAST(CoroutineWin32, base, co);
>         CoroutineWin32 *to = DO_UPCAST(CoroutineWin32, base, co->caller);
> 
>         *p_current = to_;
> 
>         to->action = action;
>         SwitchToFiber(to->fiber);
>         return from->action;
>     }
> 
> This is wrong if co->entry yields out of the coroutine which is then
> restarted on another thread.  This can happen quite often.  Typically.
> a coroutine is started when a VCPU thread does bdrv_aio_readv:
> 
>      VCPU thread
> 
>      main VCPU thread coroutine      I/O coroutine
>         bdrv_aio_readv ----->
>                                      start I/O operation
>                                        thread_pool_submit_co
>                        <------------ yields
>         back to emulation
> 
> Then I/O finishes and the thread-pool.c event notifier triggers in
> the I/O thread.  event_notifier_ready calls thread_pool_co_cb, and
> the I/O coroutine now restarts *in another thread*:
> 
>      iothread
> 
>      main iothread coroutine         I/O coroutine (formerly in VCPU thread)      
>         event_notifier_ready
>           thread_pool_co_cb ----->   current = I/O coroutine;
>                                      call AIO callback
> 
> But on Win32, because of the bug, the "current" being set here the
> current coroutine of the VCPU thread, not the iothread.
> 
> noinline is a good-enough workaround, and quite unlikely to break in
> the future.
> 
> Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] broken win32 coroutines (was Re: qemu 1.6.1)
  2014-06-24  1:41               ` Michael W. Bombardieri
@ 2014-06-24  5:22                 ` Paolo Bonzini
  2014-06-25  6:48                   ` Michael W. Bombardieri
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2014-06-24  5:22 UTC (permalink / raw)
  To: Michael W. Bombardieri
  Cc: Stefan Weil, Prashant Satish, qemu-devel, Stefan Hajnoczi

Il 24/06/2014 03:41, Michael W. Bombardieri ha scritto:
> Hi,
>
> Thanks for adding me to this thread.
> I am not familiar with the qemu source code but I am aware
> of the coroutine crash and I can test a patch if you have
> one.
>
> qemu 1.5.1 was the last version I was able to build and use
> on win32. Later versions build without error but exhibit the
> coroutine crash.
> qemu binaries built with the coroutine feature disabled are
> to slow to use.

Peter posted one here:

http://article.gmane.org/gmane.comp.emulators.qemu/282189/raw

Paolo

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] broken win32 coroutines (was Re: qemu 1.6.1)
  2014-06-24  5:22                 ` Paolo Bonzini
@ 2014-06-25  6:48                   ` Michael W. Bombardieri
  0 siblings, 0 replies; 14+ messages in thread
From: Michael W. Bombardieri @ 2014-06-25  6:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stefan Weil, Prashant Satish, qemu-devel, Stefan Hajnoczi

I applied the patch manually on qemu releases 1.7.1 and 2.0.0.

Configuration

* Host system is Lenovo ThinkPad T520 (windows 7)
* qemu built with --disable-sdl and --disable-gtk
* GCC version is 4.6.1 (MinGW win32)

Test (same steps/results for both 1.7.1 and 2.0.0)

* Created qcow2 disk image using qemu-img
* Started qemu-system-i386 with new disk image and OpenBSD/i386 5.5 install ISO
* Connected VM display using VNC
* VM booted successfully and install script started
* Install process too slow; gave up after >60 minutes

Following the same testing steps against a qemu 1.5.1 binary,
the OpenBSD 5.5 installation finished in 3 minutes on my system.

I haven't tried to profile the code.

- Michael


On Tue, Jun 24, 2014 at 07:22:59AM +0200, Paolo Bonzini wrote:
> Il 24/06/2014 03:41, Michael W. Bombardieri ha scritto:
> >Hi,
> >
> >Thanks for adding me to this thread.
> >I am not familiar with the qemu source code but I am aware
> >of the coroutine crash and I can test a patch if you have
> >one.
> >
> >qemu 1.5.1 was the last version I was able to build and use
> >on win32. Later versions build without error but exhibit the
> >coroutine crash.
> >qemu binaries built with the coroutine feature disabled are
> >to slow to use.
> 
> Peter posted one here:
> 
> http://article.gmane.org/gmane.comp.emulators.qemu/282189/raw
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-06-25  7:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-23  7:39 [Qemu-devel] qemu 1.6.1 Michael W. Bombardieri
2013-10-23  9:00 ` Paolo Bonzini
2013-10-23 20:26   ` Stefan Weil
2013-10-24 10:38     ` Paolo Bonzini
2013-10-24 16:37       ` Stefan Weil
2013-10-24 21:47         ` Paolo Bonzini
2013-10-26  9:51           ` Stefan Weil
2013-10-27  6:54             ` Paolo Bonzini
2013-10-27 10:44               ` Stefan Weil
2013-10-27 15:38               ` Stefan Weil
2014-06-23 14:39             ` [Qemu-devel] broken win32 coroutines (was Re: qemu 1.6.1) Paolo Bonzini
2014-06-24  1:41               ` Michael W. Bombardieri
2014-06-24  5:22                 ` Paolo Bonzini
2014-06-25  6:48                   ` Michael W. Bombardieri

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.