All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] qemu-2.8-rc4 is broken
@ 2016-12-19  9:47 Pavel Dovgalyuk
  2016-12-20  7:45 ` Pavel Dovgalyuk
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Dovgalyuk @ 2016-12-19  9:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, peter.maydell, 'Pavel Dovgalyuk'

Hi!

 

I encountered the following bug with the latest version of QEMU.

I use windows host and start qemu with the following command line:

qemu-system-i386.exe -soundhw ac97 -snapshot -hda disk.qcow2 -net none

 

Guest system is Windows XP 32-bit. It founds new hardware (including audio controller)

and I start playing mp3 file.

After seconds of playing qemu fails with an exception.

 

I tried to bisect between 2.7 and 2.8, but bug is not stable.

It manifested itself at commits "68701de1362b29fd6941a2021e9393ddbe60edd8" and
"6a928d25b6d8bc3729c3d28326c6db13b9481059".

 

Pavel Dovgalyuk

 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-19  9:47 [Qemu-devel] qemu-2.8-rc4 is broken Pavel Dovgalyuk
@ 2016-12-20  7:45 ` Pavel Dovgalyuk
  2016-12-20 10:21   ` Stefan Hajnoczi
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Dovgalyuk @ 2016-12-20  7:45 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, peter.maydell, 'Pavel Dovgalyuk'

It also fails much earlier when I enable logs with "-d int -D log".

Here is backtrace for this failure:

 

#0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()

   from /c/Windows/SYSTEM32/ntdll.dll

#1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()

   from /c/Windows/SYSTEM32/ntdll.dll

#2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()

   from /c/Windows/SYSTEM32/ntdll.dll

#3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()

   from /c/Windows/SYSTEM32/ntdll.dll

#4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll

#5  0x000000000040b6b4 in invalidate_page_bitmap (p=0x10c33498, p=0x10c33498)

    at D:/Projects/QEMU/qemu/translate-all.c:880

#6  page_flush_tb_1 (level=level@entry=0, lp=0x54f4fb0)

    at D:/Projects/QEMU/qemu/translate-all.c:899

#7  0x000000000040b6ee in page_flush_tb_1 (level=1, lp=0xac8ac0 <l1_map>)

    at D:/Projects/QEMU/qemu/translate-all.c:905

#8  0x000000000040b7b3 in page_flush_tb ()

    at D:/Projects/QEMU/qemu/translate-all.c:915

#9  do_tb_flush (cpu=<optimized out>, tb_flush_count=...)

    at D:/Projects/QEMU/qemu/translate-all.c:953

#10 0x0000000000519ac1 in process_queued_cpu_work (cpu=0x5412fd0)

    at cpus-common.c:338

#11 0x0000000000439761 in qemu_wait_io_event_common (cpu=0x5412fd0)

    at D:/Projects/QEMU/qemu/cpus.c:942

#12 qemu_tcg_wait_io_event (cpu=<optimized out>)

    at D:/Projects/QEMU/qemu/cpus.c:957

#13 qemu_tcg_cpu_thread_fn (arg=arg@entry=0x5412fd0)

    at D:/Projects/QEMU/qemu/cpus.c:1216

#14 0x000000000072c285 in win32_start_routine (arg=0x543ba70)

    at util/qemu-thread-win32.c:406

#15 0x000007fefdc8415f in srand () from /c/Windows/system32/msvcrt.dll

#16 0x000007fefdc86ebd in msvcrt!_ftime64_s ()

   from /c/Windows/system32/msvcrt.dll

#17 0x0000000076cc59cd in KERNEL32!BaseThreadInitThunk ()

   from /c/Windows/system32/kernel32.dll

#18 0x0000000076dfa561 in ntdll!RtlUserThreadStart ()

   from /c/Windows/SYSTEM32/ntdll.dll

#19 0x0000000000000000 in ?? ()

 

 

 

Another example of backtrace is the following:

 

#0  0x0000000076e8f3b0 in ntdll!RtlUnhandledExceptionFilter ()

   from /c/Windows/SYSTEM32/ntdll.dll

#1  0x0000000076e8f9c6 in ntdll!EtwEnumerateProcessRegGuids ()

   from /c/Windows/SYSTEM32/ntdll.dll

#2  0x0000000076e90592 in ntdll!RtlQueryProcessLockInformation ()

   from /c/Windows/SYSTEM32/ntdll.dll

#3  0x0000000076e92204 in ntdll!RtlLogStackBackTrace ()

   from /c/Windows/SYSTEM32/ntdll.dll

#4  0x0000000076e2d21c in ntdll!RtlIsDosDeviceName_U ()

   from /c/Windows/SYSTEM32/ntdll.dll

#5  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll

#6  0x000000000040c57d in invalidate_page_bitmap (p=<optimized out>,

    p=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:880

#7  tb_invalidate_phys_page_range (start=826113, end=end@entry=826116,

    is_cpu_write_access=is_cpu_write_access@entry=0)

    at D:/Projects/QEMU/qemu/translate-all.c:1526

#8  0x000000000040c5ed in tb_invalidate_phys_range_1 (end=826116,

    start=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:1413

#9  tb_invalidate_phys_range (start=start@entry=826113, end=end@entry=826116)

    at D:/Projects/QEMU/qemu/translate-all.c:1423

#10 0x0000000000402e5f in invalidate_and_set_dirty (mr=mr@entry=0x53fe980,

    addr=<optimized out>, length=<optimized out>)

    at D:/Projects/QEMU/qemu/exec.c:2511

#11 0x0000000000406af7 in cpu_physical_memory_write_rom_internal (

    type=WRITE_DATA, len=3, buf=0x22f141 "", addr=826113,

    as=0xab4280 <address_space_memory>) at D:/Projects/QEMU/qemu/exec.c:2795

#12 cpu_physical_memory_write_rom (as=0xab4280 <address_space_memory>,

    addr=<optimized out>, buf=<optimized out>, len=<optimized out>)

    at D:/Projects/QEMU/qemu/exec.c:2813

#13 0x0000000000470a35 in apic_sync_vapic (s=s@entry=0x507f0a0,

    sync_type=sync_type@entry=4) at D:/Projects/QEMU/qemu/hw/intc/apic.c:125

#14 0x000000000047163e in apic_set_irq (s=0x507f0a0,

    vector_num=<optimized out>, trigger_mode=0)

    at D:/Projects/QEMU/qemu/hw/intc/apic.c:396

#15 0x0000000000471aa3 in apic_bus_deliver (deliver_bitmask=<optimized out>,

    delivery_mode=<optimized out>, vector_num=<optimized out>,

    trigger_mode=<optimized out>) at D:/Projects/QEMU/qemu/hw/intc/apic.c:234

#16 0x0000000000471b1e in apic_deliver_irq (dest=1 '\001',

    dest_mode=1 '\001', delivery_mode=1 '\001', vector_num=163 '\243',

    trigger_mode=0 '\000') at D:/Projects/QEMU/qemu/hw/intc/apic.c:284

#17 0x0000000000471bf2 in apic_send_msi (msi=msi@entry=0x22f320)

    at D:/Projects/QEMU/qemu/hw/intc/apic.c:753

#18 0x0000000000471f76 in apic_mem_writel (opaque=<optimized out>, addr=4100,

    val=419) at D:/Projects/QEMU/qemu/hw/intc/apic.c:768

#19 0x000000000044bcbd in memory_region_oldmmio_write_accessor (mr=0x507f110,

    addr=4100, value=<optimized out>, size=4, shift=0, mask=4294967295,

    attrs=...) at D:/Projects/QEMU/qemu/memory.c:500

#20 0x0000000000448576 in access_with_adjusted_size (addr=addr@entry=4100,

    value=value@entry=0x22f620, size=size@entry=4,

    access_size_min=access_size_min@entry=1,

    access_size_max=access_size_max@entry=4,

    access=access@entry=0x44bc20 <memory_region_oldmmio_write_accessor>,

    mr=mr@entry=0x507f110, attrs=attrs@entry=...)

    at D:/Projects/QEMU/qemu/memory.c:592

#21 0x000000000044cdae in memory_region_dispatch_write (mr=<optimized out>,

    mr@entry=0x507f110, addr=4100, data=data@entry=419, size=<optimized out>,

    size@entry=4, attrs=attrs@entry=...)

    at D:/Projects/QEMU/qemu/memory.c:1336

#22 0x0000000000409f63 in address_space_stl_internal (

    endian=DEVICE_LITTLE_ENDIAN, result=0x0, attrs=..., val=419,

    addr=1756135440, as=0x0) at D:/Projects/QEMU/qemu/exec.c:3433

#23 address_space_stl_le (result=0x0, attrs=..., val=419, addr=1756135440,

    as=0x0) at D:/Projects/QEMU/qemu/exec.c:3470

#24 stl_le_phys (as=as@entry=0xab4280 <address_space_memory>,

    addr=addr@entry=4276097028, val=419) at D:/Projects/QEMU/qemu/exec.c:3488

#25 0x0000000000473941 in ioapic_service (s=0x1182e1d0)

    at D:/Projects/QEMU/qemu/hw/intc/ioapic.c:144

#26 0x000000000059062a in ps2_queue (b=24, opaque=0x11c809d0)

    at hw/input/ps2.c:549

#27 ps2_mouse_send_packet (s=s@entry=0x11c809d0) at hw/input/ps2.c:839

#28 0x0000000000590b51 in ps2_mouse_sync (dev=0x11c809d0)

    at hw/input/ps2.c:927

#29 0x000000000066515a in qemu_input_event_sync_impl () at ui/input.c:351

#30 0x0000000000666917 in sdl_send_mouse_event (dx=<optimized out>,

    dy=<optimized out>, x=<optimized out>, y=<optimized out>, state=0,

    scon=<optimized out>, scon=<optimized out>) at ui/sdl2.c:315

#31 0x0000000000667112 in handle_mousemotion (ev=0x22f970) at ui/sdl2.c:482

#32 sdl2_poll_events (scon=0x1230c260) at ui/sdl2.c:619

#33 0x000000000065f622 in dpy_refresh (s=0x119ba030) at ui/console.c:1560

#34 gui_update (opaque=opaque@entry=0x119ba030) at ui/console.c:200

#35 0x000000000068d60c in timerlist_run_timers (timer_list=0x5022d40)

    at qemu-timer.c:528

#36 0x000000000068d823 in qemu_clock_run_timers (type=<optimized out>)

    at qemu-timer.c:539

#37 qemu_clock_run_all_timers () at qemu-timer.c:653

#38 0x000000000068c94e in main_loop_wait (nonblocking=<optimized out>)

    at main-loop.c:516

#39 0x00000000005023b0 in main_loop () at vl.c:1966

#40 qemu_main (argc=argc@entry=12, argv=argv@entry=0x3a0130,

    envp=envp@entry=0x0) at vl.c:4684

#41 0x00000000005033c8 in SDL_main (argc=argc@entry=12,

    argv=argv@entry=0x3a0130) at vl.c:45

#42 0x000000000074088a in main_utf8 (argv=0x3a0130, argc=<optimized out>)

    at ../src/main/windows/SDL_windows_main.c:126

#43 WinMain (hInst=<optimized out>, hPrev=hPrev@entry=0x0,

    szCmdLine=<optimized out>, sw=<optimized out>)

    at ../src/main/windows/SDL_windows_main.c:189

#44 0x0000000000754862 in main (flags=<optimized out>,

    cmdline=<optimized out>, inst=<optimized out>)

    at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crt0_c.c:18

#45 0x00000000004013ed in __tmainCRTStartup ()

    at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:334

#46 0x00000000004014fb in WinMainCRTStartup ()

    at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:184

 

Pavel Dovgalyuk

 

From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru] 
Sent: Monday, December 19, 2016 12:48 PM
To: qemu-devel@nongnu.org
Cc: pbonzini@redhat.com; peter.maydell@linaro.org; 'Pavel Dovgalyuk'
Subject: qemu-2.8-rc4 is broken

 

Hi!

 

I encountered the following bug with the latest version of QEMU.

I use windows host and start qemu with the following command line:

qemu-system-i386.exe -soundhw ac97 -snapshot -hda disk.qcow2 -net none

 

Guest system is Windows XP 32-bit. It founds new hardware (including audio controller)

and I start playing mp3 file.

After seconds of playing qemu fails with an exception.

 

I tried to bisect between 2.7 and 2.8, but bug is not stable.

It manifested itself at commits "68701de1362b29fd6941a2021e9393ddbe60edd8" and
"6a928d25b6d8bc3729c3d28326c6db13b9481059".

 

Pavel Dovgalyuk

 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-20  7:45 ` Pavel Dovgalyuk
@ 2016-12-20 10:21   ` Stefan Hajnoczi
  2016-12-20 11:10     ` Pavel Dovgalyuk
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2016-12-20 10:21 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: qemu-devel, pbonzini, 'Pavel Dovgalyuk', peter.maydell

[-- Attachment #1: Type: text/plain, Size: 10602 bytes --]

On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
> It also fails much earlier when I enable logs with "-d int -D log".
> 
> Here is backtrace for this failure:
> 
>  
> 
> #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll

Looks like a heap corruption bug since free() is failing.

QEMU 2.8.0 is scheduled for release today.  I have checked that
qemu-system-i386.exe works but without playing an MP3 file in Windows
XP.

I plan to go ahead with the release unless information becomes available
that suggests it affects more than just this one scenario.

> 
> #5  0x000000000040b6b4 in invalidate_page_bitmap (p=0x10c33498, p=0x10c33498)
> 
>     at D:/Projects/QEMU/qemu/translate-all.c:880
> 
> #6  page_flush_tb_1 (level=level@entry=0, lp=0x54f4fb0)
> 
>     at D:/Projects/QEMU/qemu/translate-all.c:899
> 
> #7  0x000000000040b6ee in page_flush_tb_1 (level=1, lp=0xac8ac0 <l1_map>)
> 
>     at D:/Projects/QEMU/qemu/translate-all.c:905
> 
> #8  0x000000000040b7b3 in page_flush_tb ()
> 
>     at D:/Projects/QEMU/qemu/translate-all.c:915
> 
> #9  do_tb_flush (cpu=<optimized out>, tb_flush_count=...)
> 
>     at D:/Projects/QEMU/qemu/translate-all.c:953
> 
> #10 0x0000000000519ac1 in process_queued_cpu_work (cpu=0x5412fd0)
> 
>     at cpus-common.c:338
> 
> #11 0x0000000000439761 in qemu_wait_io_event_common (cpu=0x5412fd0)
> 
>     at D:/Projects/QEMU/qemu/cpus.c:942
> 
> #12 qemu_tcg_wait_io_event (cpu=<optimized out>)
> 
>     at D:/Projects/QEMU/qemu/cpus.c:957
> 
> #13 qemu_tcg_cpu_thread_fn (arg=arg@entry=0x5412fd0)
> 
>     at D:/Projects/QEMU/qemu/cpus.c:1216
> 
> #14 0x000000000072c285 in win32_start_routine (arg=0x543ba70)
> 
>     at util/qemu-thread-win32.c:406
> 
> #15 0x000007fefdc8415f in srand () from /c/Windows/system32/msvcrt.dll
> 
> #16 0x000007fefdc86ebd in msvcrt!_ftime64_s ()
> 
>    from /c/Windows/system32/msvcrt.dll
> 
> #17 0x0000000076cc59cd in KERNEL32!BaseThreadInitThunk ()
> 
>    from /c/Windows/system32/kernel32.dll
> 
> #18 0x0000000076dfa561 in ntdll!RtlUserThreadStart ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #19 0x0000000000000000 in ?? ()
> 
>  
> 
>  
> 
>  
> 
> Another example of backtrace is the following:
> 
>  
> 
> #0  0x0000000076e8f3b0 in ntdll!RtlUnhandledExceptionFilter ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #1  0x0000000076e8f9c6 in ntdll!EtwEnumerateProcessRegGuids ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #2  0x0000000076e90592 in ntdll!RtlQueryProcessLockInformation ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #3  0x0000000076e92204 in ntdll!RtlLogStackBackTrace ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #4  0x0000000076e2d21c in ntdll!RtlIsDosDeviceName_U ()
> 
>    from /c/Windows/SYSTEM32/ntdll.dll
> 
> #5  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
> 
> #6  0x000000000040c57d in invalidate_page_bitmap (p=<optimized out>,
> 
>     p=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:880
> 
> #7  tb_invalidate_phys_page_range (start=826113, end=end@entry=826116,
> 
>     is_cpu_write_access=is_cpu_write_access@entry=0)
> 
>     at D:/Projects/QEMU/qemu/translate-all.c:1526
> 
> #8  0x000000000040c5ed in tb_invalidate_phys_range_1 (end=826116,
> 
>     start=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:1413
> 
> #9  tb_invalidate_phys_range (start=start@entry=826113, end=end@entry=826116)
> 
>     at D:/Projects/QEMU/qemu/translate-all.c:1423
> 
> #10 0x0000000000402e5f in invalidate_and_set_dirty (mr=mr@entry=0x53fe980,
> 
>     addr=<optimized out>, length=<optimized out>)
> 
>     at D:/Projects/QEMU/qemu/exec.c:2511
> 
> #11 0x0000000000406af7 in cpu_physical_memory_write_rom_internal (
> 
>     type=WRITE_DATA, len=3, buf=0x22f141 "", addr=826113,
> 
>     as=0xab4280 <address_space_memory>) at D:/Projects/QEMU/qemu/exec.c:2795
> 
> #12 cpu_physical_memory_write_rom (as=0xab4280 <address_space_memory>,
> 
>     addr=<optimized out>, buf=<optimized out>, len=<optimized out>)
> 
>     at D:/Projects/QEMU/qemu/exec.c:2813
> 
> #13 0x0000000000470a35 in apic_sync_vapic (s=s@entry=0x507f0a0,
> 
>     sync_type=sync_type@entry=4) at D:/Projects/QEMU/qemu/hw/intc/apic.c:125
> 
> #14 0x000000000047163e in apic_set_irq (s=0x507f0a0,
> 
>     vector_num=<optimized out>, trigger_mode=0)
> 
>     at D:/Projects/QEMU/qemu/hw/intc/apic.c:396
> 
> #15 0x0000000000471aa3 in apic_bus_deliver (deliver_bitmask=<optimized out>,
> 
>     delivery_mode=<optimized out>, vector_num=<optimized out>,
> 
>     trigger_mode=<optimized out>) at D:/Projects/QEMU/qemu/hw/intc/apic.c:234
> 
> #16 0x0000000000471b1e in apic_deliver_irq (dest=1 '\001',
> 
>     dest_mode=1 '\001', delivery_mode=1 '\001', vector_num=163 '\243',
> 
>     trigger_mode=0 '\000') at D:/Projects/QEMU/qemu/hw/intc/apic.c:284
> 
> #17 0x0000000000471bf2 in apic_send_msi (msi=msi@entry=0x22f320)
> 
>     at D:/Projects/QEMU/qemu/hw/intc/apic.c:753
> 
> #18 0x0000000000471f76 in apic_mem_writel (opaque=<optimized out>, addr=4100,
> 
>     val=419) at D:/Projects/QEMU/qemu/hw/intc/apic.c:768
> 
> #19 0x000000000044bcbd in memory_region_oldmmio_write_accessor (mr=0x507f110,
> 
>     addr=4100, value=<optimized out>, size=4, shift=0, mask=4294967295,
> 
>     attrs=...) at D:/Projects/QEMU/qemu/memory.c:500
> 
> #20 0x0000000000448576 in access_with_adjusted_size (addr=addr@entry=4100,
> 
>     value=value@entry=0x22f620, size=size@entry=4,
> 
>     access_size_min=access_size_min@entry=1,
> 
>     access_size_max=access_size_max@entry=4,
> 
>     access=access@entry=0x44bc20 <memory_region_oldmmio_write_accessor>,
> 
>     mr=mr@entry=0x507f110, attrs=attrs@entry=...)
> 
>     at D:/Projects/QEMU/qemu/memory.c:592
> 
> #21 0x000000000044cdae in memory_region_dispatch_write (mr=<optimized out>,
> 
>     mr@entry=0x507f110, addr=4100, data=data@entry=419, size=<optimized out>,
> 
>     size@entry=4, attrs=attrs@entry=...)
> 
>     at D:/Projects/QEMU/qemu/memory.c:1336
> 
> #22 0x0000000000409f63 in address_space_stl_internal (
> 
>     endian=DEVICE_LITTLE_ENDIAN, result=0x0, attrs=..., val=419,
> 
>     addr=1756135440, as=0x0) at D:/Projects/QEMU/qemu/exec.c:3433
> 
> #23 address_space_stl_le (result=0x0, attrs=..., val=419, addr=1756135440,
> 
>     as=0x0) at D:/Projects/QEMU/qemu/exec.c:3470
> 
> #24 stl_le_phys (as=as@entry=0xab4280 <address_space_memory>,
> 
>     addr=addr@entry=4276097028, val=419) at D:/Projects/QEMU/qemu/exec.c:3488
> 
> #25 0x0000000000473941 in ioapic_service (s=0x1182e1d0)
> 
>     at D:/Projects/QEMU/qemu/hw/intc/ioapic.c:144
> 
> #26 0x000000000059062a in ps2_queue (b=24, opaque=0x11c809d0)
> 
>     at hw/input/ps2.c:549
> 
> #27 ps2_mouse_send_packet (s=s@entry=0x11c809d0) at hw/input/ps2.c:839
> 
> #28 0x0000000000590b51 in ps2_mouse_sync (dev=0x11c809d0)
> 
>     at hw/input/ps2.c:927
> 
> #29 0x000000000066515a in qemu_input_event_sync_impl () at ui/input.c:351
> 
> #30 0x0000000000666917 in sdl_send_mouse_event (dx=<optimized out>,
> 
>     dy=<optimized out>, x=<optimized out>, y=<optimized out>, state=0,
> 
>     scon=<optimized out>, scon=<optimized out>) at ui/sdl2.c:315
> 
> #31 0x0000000000667112 in handle_mousemotion (ev=0x22f970) at ui/sdl2.c:482
> 
> #32 sdl2_poll_events (scon=0x1230c260) at ui/sdl2.c:619
> 
> #33 0x000000000065f622 in dpy_refresh (s=0x119ba030) at ui/console.c:1560
> 
> #34 gui_update (opaque=opaque@entry=0x119ba030) at ui/console.c:200
> 
> #35 0x000000000068d60c in timerlist_run_timers (timer_list=0x5022d40)
> 
>     at qemu-timer.c:528
> 
> #36 0x000000000068d823 in qemu_clock_run_timers (type=<optimized out>)
> 
>     at qemu-timer.c:539
> 
> #37 qemu_clock_run_all_timers () at qemu-timer.c:653
> 
> #38 0x000000000068c94e in main_loop_wait (nonblocking=<optimized out>)
> 
>     at main-loop.c:516
> 
> #39 0x00000000005023b0 in main_loop () at vl.c:1966
> 
> #40 qemu_main (argc=argc@entry=12, argv=argv@entry=0x3a0130,
> 
>     envp=envp@entry=0x0) at vl.c:4684
> 
> #41 0x00000000005033c8 in SDL_main (argc=argc@entry=12,
> 
>     argv=argv@entry=0x3a0130) at vl.c:45
> 
> #42 0x000000000074088a in main_utf8 (argv=0x3a0130, argc=<optimized out>)
> 
>     at ../src/main/windows/SDL_windows_main.c:126
> 
> #43 WinMain (hInst=<optimized out>, hPrev=hPrev@entry=0x0,
> 
>     szCmdLine=<optimized out>, sw=<optimized out>)
> 
>     at ../src/main/windows/SDL_windows_main.c:189
> 
> #44 0x0000000000754862 in main (flags=<optimized out>,
> 
>     cmdline=<optimized out>, inst=<optimized out>)
> 
>     at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crt0_c.c:18
> 
> #45 0x00000000004013ed in __tmainCRTStartup ()
> 
>     at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:334
> 
> #46 0x00000000004014fb in WinMainCRTStartup ()
> 
>     at C:/repo/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:184
> 
>  
> 
> Pavel Dovgalyuk
> 
>  
> 
> From: Pavel Dovgalyuk [mailto:dovgaluk@ispras.ru] 
> Sent: Monday, December 19, 2016 12:48 PM
> To: qemu-devel@nongnu.org
> Cc: pbonzini@redhat.com; peter.maydell@linaro.org; 'Pavel Dovgalyuk'
> Subject: qemu-2.8-rc4 is broken
> 
>  
> 
> Hi!
> 
>  
> 
> I encountered the following bug with the latest version of QEMU.
> 
> I use windows host and start qemu with the following command line:
> 
> qemu-system-i386.exe -soundhw ac97 -snapshot -hda disk.qcow2 -net none
> 
>  
> 
> Guest system is Windows XP 32-bit. It founds new hardware (including audio controller)
> 
> and I start playing mp3 file.
> 
> After seconds of playing qemu fails with an exception.
> 
>  
> 
> I tried to bisect between 2.7 and 2.8, but bug is not stable.
> 
> It manifested itself at commits "68701de1362b29fd6941a2021e9393ddbe60edd8" and
> "6a928d25b6d8bc3729c3d28326c6db13b9481059".
> 
>  
> 
> Pavel Dovgalyuk
> 
>  
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-20 10:21   ` Stefan Hajnoczi
@ 2016-12-20 11:10     ` Pavel Dovgalyuk
  2016-12-20 13:20       ` Stefan Hajnoczi
  2016-12-20 14:02       ` Stefan Hajnoczi
  0 siblings, 2 replies; 16+ messages in thread
From: Pavel Dovgalyuk @ 2016-12-20 11:10 UTC (permalink / raw)
  To: 'Stefan Hajnoczi'
  Cc: qemu-devel, pbonzini, 'Pavel Dovgalyuk', peter.maydell

> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
> > It also fails much earlier when I enable logs with "-d int -D log".
> >
> > Here is backtrace for this failure:
> >
> >
> >
> > #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
> >
> >    from /c/Windows/SYSTEM32/ntdll.dll
> >
> > #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
> >
> >    from /c/Windows/SYSTEM32/ntdll.dll
> >
> > #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
> >
> >    from /c/Windows/SYSTEM32/ntdll.dll
> >
> > #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
> >
> >    from /c/Windows/SYSTEM32/ntdll.dll
> >
> > #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
> 
> Looks like a heap corruption bug since free() is failing.

Seems to be a race condition.
When I add logs into invalidate_page_bitmap, the bug disappears.
It seems that someone tries to free the same page bitmap twice and simultaneously.

Here is the stack trace for two threads at the moment when qemu fails:
Failed thread:
#0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
   from /c/Windows/SYSTEM32/ntdll.dll
#1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
   from /c/Windows/SYSTEM32/ntdll.dll
#2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
   from /c/Windows/SYSTEM32/ntdll.dll
#3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
   from /c/Windows/SYSTEM32/ntdll.dll
#4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
#5  0x000000000040c57d in invalidate_page_bitmap (p=<optimized out>,
    p=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:881
#6  tb_invalidate_phys_page_range (start=826113, end=end@entry=826116,
    is_cpu_write_access=is_cpu_write_access@entry=0)
    at D:/Projects/QEMU/qemu/translate-all.c:1527
#7  0x000000000040c5ed in tb_invalidate_phys_range_1 (end=826116,
    start=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:1414
#8  tb_invalidate_phys_range (start=start@entry=826113, end=end@entry=826116)
    at D:/Projects/QEMU/qemu/translate-all.c:1424
#9  0x0000000000402e5f in invalidate_and_set_dirty (mr=mr@entry=0x569e980,
    addr=<optimized out>, length=<optimized out>)
    at D:/Projects/QEMU/qemu/exec.c:2511
#10 0x0000000000406af7 in cpu_physical_memory_write_rom_internal (
    type=WRITE_DATA, len=3, buf=0x22f411 "", addr=826113,
    as=0xab4280 <address_space_memory>) at D:/Projects/QEMU/qemu/exec.c:2795
<and more>

Another thread:
#0  0x00000000007411b0 in g_free ()
#1  0x000000000040b6b4 in invalidate_page_bitmap (p=0x1143f738, p=0x1143f738)
    at D:/Projects/QEMU/qemu/translate-all.c:881
#2  page_flush_tb_1 (level=level@entry=0, lp=0x5715058)
    at D:/Projects/QEMU/qemu/translate-all.c:900
#3  0x000000000040b6ee in page_flush_tb_1 (level=1, lp=0xac8ac0 <l1_map>)
    at D:/Projects/QEMU/qemu/translate-all.c:906
#4  0x000000000040b7b3 in page_flush_tb ()
    at D:/Projects/QEMU/qemu/translate-all.c:916
#5  do_tb_flush (cpu=<optimized out>, tb_flush_count=...)
    at D:/Projects/QEMU/qemu/translate-all.c:954
#6  0x0000000000519ac1 in process_queued_cpu_work (cpu=0x5632fd0)
    at cpus-common.c:338
#7  0x0000000000439761 in qemu_wait_io_event_common (cpu=0x5632fd0)
    at D:/Projects/QEMU/qemu/cpus.c:942
#8  qemu_tcg_wait_io_event (cpu=<optimized out>)
    at D:/Projects/QEMU/qemu/cpus.c:957
#9  qemu_tcg_cpu_thread_fn (arg=arg@entry=0x5632fd0)
    at D:/Projects/QEMU/qemu/cpus.c:1216
#10 0x000000000072c285 in win32_start_routine (arg=0x565ba40)
    at util/qemu-thread-win32.c:406
#11 0x000007fefdc8415f in srand () from /c/Windows/system32/msvcrt.dll
#12 0x000007fefdc86ebd in msvcrt!_ftime64_s ()
<and more>

> 
> QEMU 2.8.0 is scheduled for release today.  I have checked that
> qemu-system-i386.exe works but without playing an MP3 file in Windows
> XP.
> 
> I plan to go ahead with the release unless information becomes available
> that suggests it affects more than just this one scenario.

> >
> >
> > I encountered the following bug with the latest version of QEMU.
> >
> > I use windows host and start qemu with the following command line:
> >
> > qemu-system-i386.exe -soundhw ac97 -snapshot -hda disk.qcow2 -net none
> >
> >
> >
> > Guest system is Windows XP 32-bit. It founds new hardware (including audio controller)
> >
> > and I start playing mp3 file.
> >
> > After seconds of playing qemu fails with an exception.
> >
> >
> >
> > I tried to bisect between 2.7 and 2.8, but bug is not stable.
> >
> > It manifested itself at commits "68701de1362b29fd6941a2021e9393ddbe60edd8" and
> > "6a928d25b6d8bc3729c3d28326c6db13b9481059".


Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-20 11:10     ` Pavel Dovgalyuk
@ 2016-12-20 13:20       ` Stefan Hajnoczi
  2016-12-20 14:02       ` Stefan Hajnoczi
  1 sibling, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2016-12-20 13:20 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: qemu-devel, Paolo Bonzini, Pavel Dovgalyuk, Peter Maydell,
	Alex Bennée

On Tue, Dec 20, 2016 at 11:10 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
>> > It also fails much earlier when I enable logs with "-d int -D log".
>> >
>> > Here is backtrace for this failure:
>> >
>> >
>> >
>> > #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
>> >
>> >    from /c/Windows/SYSTEM32/ntdll.dll
>> >
>> > #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
>> >
>> >    from /c/Windows/SYSTEM32/ntdll.dll
>> >
>> > #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
>> >
>> >    from /c/Windows/SYSTEM32/ntdll.dll
>> >
>> > #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
>> >
>> >    from /c/Windows/SYSTEM32/ntdll.dll
>> >
>> > #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
>>
>> Looks like a heap corruption bug since free() is failing.
>
> Seems to be a race condition.
> When I add logs into invalidate_page_bitmap, the bug disappears.
> It seems that someone tries to free the same page bitmap twice and simultaneously.

It definitely looks like the locking is incomplete in this code path.
tb_lock() does nothing in a softmmu build.

Alex: Is this bug due to missing MTTCG patches in QEMU 2.8?

>
> Here is the stack trace for two threads at the moment when qemu fails:
> Failed thread:
> #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
>    from /c/Windows/SYSTEM32/ntdll.dll
> #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
>    from /c/Windows/SYSTEM32/ntdll.dll
> #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
>    from /c/Windows/SYSTEM32/ntdll.dll
> #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
>    from /c/Windows/SYSTEM32/ntdll.dll
> #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
> #5  0x000000000040c57d in invalidate_page_bitmap (p=<optimized out>,
>     p=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:881
> #6  tb_invalidate_phys_page_range (start=826113, end=end@entry=826116,
>     is_cpu_write_access=is_cpu_write_access@entry=0)
>     at D:/Projects/QEMU/qemu/translate-all.c:1527
> #7  0x000000000040c5ed in tb_invalidate_phys_range_1 (end=826116,
>     start=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:1414
> #8  tb_invalidate_phys_range (start=start@entry=826113, end=end@entry=826116)
>     at D:/Projects/QEMU/qemu/translate-all.c:1424
> #9  0x0000000000402e5f in invalidate_and_set_dirty (mr=mr@entry=0x569e980,
>     addr=<optimized out>, length=<optimized out>)
>     at D:/Projects/QEMU/qemu/exec.c:2511
> #10 0x0000000000406af7 in cpu_physical_memory_write_rom_internal (
>     type=WRITE_DATA, len=3, buf=0x22f411 "", addr=826113,
>     as=0xab4280 <address_space_memory>) at D:/Projects/QEMU/qemu/exec.c:2795
> <and more>
>
> Another thread:
> #0  0x00000000007411b0 in g_free ()
> #1  0x000000000040b6b4 in invalidate_page_bitmap (p=0x1143f738, p=0x1143f738)
>     at D:/Projects/QEMU/qemu/translate-all.c:881
> #2  page_flush_tb_1 (level=level@entry=0, lp=0x5715058)
>     at D:/Projects/QEMU/qemu/translate-all.c:900
> #3  0x000000000040b6ee in page_flush_tb_1 (level=1, lp=0xac8ac0 <l1_map>)
>     at D:/Projects/QEMU/qemu/translate-all.c:906
> #4  0x000000000040b7b3 in page_flush_tb ()
>     at D:/Projects/QEMU/qemu/translate-all.c:916
> #5  do_tb_flush (cpu=<optimized out>, tb_flush_count=...)
>     at D:/Projects/QEMU/qemu/translate-all.c:954
> #6  0x0000000000519ac1 in process_queued_cpu_work (cpu=0x5632fd0)
>     at cpus-common.c:338
> #7  0x0000000000439761 in qemu_wait_io_event_common (cpu=0x5632fd0)
>     at D:/Projects/QEMU/qemu/cpus.c:942
> #8  qemu_tcg_wait_io_event (cpu=<optimized out>)
>     at D:/Projects/QEMU/qemu/cpus.c:957
> #9  qemu_tcg_cpu_thread_fn (arg=arg@entry=0x5632fd0)
>     at D:/Projects/QEMU/qemu/cpus.c:1216
> #10 0x000000000072c285 in win32_start_routine (arg=0x565ba40)
>     at util/qemu-thread-win32.c:406
> #11 0x000007fefdc8415f in srand () from /c/Windows/system32/msvcrt.dll
> #12 0x000007fefdc86ebd in msvcrt!_ftime64_s ()
> <and more>
>
>>
>> QEMU 2.8.0 is scheduled for release today.  I have checked that
>> qemu-system-i386.exe works but without playing an MP3 file in Windows
>> XP.
>>
>> I plan to go ahead with the release unless information becomes available
>> that suggests it affects more than just this one scenario.
>
>> >
>> >
>> > I encountered the following bug with the latest version of QEMU.
>> >
>> > I use windows host and start qemu with the following command line:
>> >
>> > qemu-system-i386.exe -soundhw ac97 -snapshot -hda disk.qcow2 -net none
>> >
>> >
>> >
>> > Guest system is Windows XP 32-bit. It founds new hardware (including audio controller)
>> >
>> > and I start playing mp3 file.
>> >
>> > After seconds of playing qemu fails with an exception.
>> >
>> >
>> >
>> > I tried to bisect between 2.7 and 2.8, but bug is not stable.
>> >
>> > It manifested itself at commits "68701de1362b29fd6941a2021e9393ddbe60edd8" and
>> > "6a928d25b6d8bc3729c3d28326c6db13b9481059".
>
>
> Pavel Dovgalyuk
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-20 11:10     ` Pavel Dovgalyuk
  2016-12-20 13:20       ` Stefan Hajnoczi
@ 2016-12-20 14:02       ` Stefan Hajnoczi
  2016-12-20 16:02         ` Alex Bennée
  2016-12-21  5:57         ` Pavel Dovgalyuk
  1 sibling, 2 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2016-12-20 14:02 UTC (permalink / raw)
  To: Pavel Dovgalyuk; +Cc: qemu-devel, Paolo Bonzini, Pavel Dovgalyuk, Peter Maydell

On Tue, Dec 20, 2016 at 11:10 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
>> > It also fails much earlier when I enable logs with "-d int -D log".
>> >
>> > Here is backtrace for this failure:
>> >
>> >
>> >
>> > #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
>> >
>> >    from /c/Windows/SYSTEM32/ntdll.dll
>> >
>> > #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
>> >
>> >    from /c/Windows/SYSTEM32/ntdll.dll
>> >
>> > #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
>> >
>> >    from /c/Windows/SYSTEM32/ntdll.dll
>> >
>> > #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
>> >
>> >    from /c/Windows/SYSTEM32/ntdll.dll
>> >
>> > #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
>>
>> Looks like a heap corruption bug since free() is failing.
>
> Seems to be a race condition.
> When I add logs into invalidate_page_bitmap, the bug disappears.
> It seems that someone tries to free the same page bitmap twice and simultaneously.

Does the following workaround prevent the crashes?

-global apic-common.vapic=off

> Here is the stack trace for two threads at the moment when qemu fails:
> Failed thread:
> #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
>    from /c/Windows/SYSTEM32/ntdll.dll
> #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
>    from /c/Windows/SYSTEM32/ntdll.dll
> #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
>    from /c/Windows/SYSTEM32/ntdll.dll
> #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
>    from /c/Windows/SYSTEM32/ntdll.dll
> #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
> #5  0x000000000040c57d in invalidate_page_bitmap (p=<optimized out>,
>     p=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:881
> #6  tb_invalidate_phys_page_range (start=826113, end=end@entry=826116,
>     is_cpu_write_access=is_cpu_write_access@entry=0)
>     at D:/Projects/QEMU/qemu/translate-all.c:1527
> #7  0x000000000040c5ed in tb_invalidate_phys_range_1 (end=826116,
>     start=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:1414
> #8  tb_invalidate_phys_range (start=start@entry=826113, end=end@entry=826116)
>     at D:/Projects/QEMU/qemu/translate-all.c:1424
> #9  0x0000000000402e5f in invalidate_and_set_dirty (mr=mr@entry=0x569e980,
>     addr=<optimized out>, length=<optimized out>)
>     at D:/Projects/QEMU/qemu/exec.c:2511
> #10 0x0000000000406af7 in cpu_physical_memory_write_rom_internal (
>     type=WRITE_DATA, len=3, buf=0x22f411 "", addr=826113,
>     as=0xab4280 <address_space_memory>) at D:/Projects/QEMU/qemu/exec.c:2795
> <and more>
>
> Another thread:
> #0  0x00000000007411b0 in g_free ()
> #1  0x000000000040b6b4 in invalidate_page_bitmap (p=0x1143f738, p=0x1143f738)
>     at D:/Projects/QEMU/qemu/translate-all.c:881
> #2  page_flush_tb_1 (level=level@entry=0, lp=0x5715058)
>     at D:/Projects/QEMU/qemu/translate-all.c:900
> #3  0x000000000040b6ee in page_flush_tb_1 (level=1, lp=0xac8ac0 <l1_map>)
>     at D:/Projects/QEMU/qemu/translate-all.c:906
> #4  0x000000000040b7b3 in page_flush_tb ()
>     at D:/Projects/QEMU/qemu/translate-all.c:916
> #5  do_tb_flush (cpu=<optimized out>, tb_flush_count=...)
>     at D:/Projects/QEMU/qemu/translate-all.c:954
> #6  0x0000000000519ac1 in process_queued_cpu_work (cpu=0x5632fd0)
>     at cpus-common.c:338
> #7  0x0000000000439761 in qemu_wait_io_event_common (cpu=0x5632fd0)
>     at D:/Projects/QEMU/qemu/cpus.c:942
> #8  qemu_tcg_wait_io_event (cpu=<optimized out>)
>     at D:/Projects/QEMU/qemu/cpus.c:957
> #9  qemu_tcg_cpu_thread_fn (arg=arg@entry=0x5632fd0)
>     at D:/Projects/QEMU/qemu/cpus.c:1216
> #10 0x000000000072c285 in win32_start_routine (arg=0x565ba40)
>     at util/qemu-thread-win32.c:406
> #11 0x000007fefdc8415f in srand () from /c/Windows/system32/msvcrt.dll
> #12 0x000007fefdc86ebd in msvcrt!_ftime64_s ()
> <and more>
>
>>
>> QEMU 2.8.0 is scheduled for release today.  I have checked that
>> qemu-system-i386.exe works but without playing an MP3 file in Windows
>> XP.
>>
>> I plan to go ahead with the release unless information becomes available
>> that suggests it affects more than just this one scenario.
>
>> >
>> >
>> > I encountered the following bug with the latest version of QEMU.
>> >
>> > I use windows host and start qemu with the following command line:
>> >
>> > qemu-system-i386.exe -soundhw ac97 -snapshot -hda disk.qcow2 -net none
>> >
>> >
>> >
>> > Guest system is Windows XP 32-bit. It founds new hardware (including audio controller)
>> >
>> > and I start playing mp3 file.
>> >
>> > After seconds of playing qemu fails with an exception.
>> >
>> >
>> >
>> > I tried to bisect between 2.7 and 2.8, but bug is not stable.
>> >
>> > It manifested itself at commits "68701de1362b29fd6941a2021e9393ddbe60edd8" and
>> > "6a928d25b6d8bc3729c3d28326c6db13b9481059".
>
>
> Pavel Dovgalyuk
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-20 14:02       ` Stefan Hajnoczi
@ 2016-12-20 16:02         ` Alex Bennée
  2016-12-20 16:13           ` Stefan Hajnoczi
  2016-12-21  5:57         ` Pavel Dovgalyuk
  1 sibling, 1 reply; 16+ messages in thread
From: Alex Bennée @ 2016-12-20 16:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Pavel Dovgalyuk, Paolo Bonzini, qemu-devel, Pavel Dovgalyuk,
	Peter Maydell

So having a quick review on #qemu with Stefan I think this is an
existing race condition that may have been made more likely with
recent changes. AFAICT what is happening is mouse movement in the GUI
is translated into PS2 events which are injected as interrupts into
the system by GUI thread. For whatever weird x86 reason the IRQ's are
triggered by writing to a ROM area of the system. As a result
invalidation of translated ROM code is triggered that clashes with a
tb_flush which is occurring at the same time.

I suspect you could make the race more likely to hit by starting the
system with --tb-size 1 which will make the tb_flush()'s more frequent
at the same time as wiggling the mouse a lot. If you could test this
on 2.7 I suspect you'll find it's just as crash-able.

Once the rest of MTTCG gets merged the tb_lock() will become active
for the SoftMMU which will prevent the race (as one vCPU will be able
to trigger invalidations in another vCPUs code).

As far as the problem of injecting events into the vCPU is concerned I
think the correct solution would be queue the work up using the
async_run_on_cpu functions which in the current case would ensure only
one thing was happening at a time. Certainly the design of QEMU isn't
expecting non-vCPU threads to invalidate TBs. The gdbstub does this as
well but currently gets away with it as generally the vCPUs are in a
halted state when changes are made.

I'm currently on holiday but I'll keep an eye out for this thread if
anyone has any follow up questions. It would be nice if one of the x86
guys could explain exactly what this weird ROM/IRQ setup is and why it
exists.

On 20 December 2016 at 14:02, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, Dec 20, 2016 at 11:10 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>>> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>>> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
>>> > It also fails much earlier when I enable logs with "-d int -D log".
>>> >
>>> > Here is backtrace for this failure:
>>> >
>>> >
>>> >
>>> > #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
>>> >
>>> >    from /c/Windows/SYSTEM32/ntdll.dll
>>> >
>>> > #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
>>> >
>>> >    from /c/Windows/SYSTEM32/ntdll.dll
>>> >
>>> > #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
>>> >
>>> >    from /c/Windows/SYSTEM32/ntdll.dll
>>> >
>>> > #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
>>> >
>>> >    from /c/Windows/SYSTEM32/ntdll.dll
>>> >
>>> > #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
>>>
>>> Looks like a heap corruption bug since free() is failing.
>>
>> Seems to be a race condition.
>> When I add logs into invalidate_page_bitmap, the bug disappears.
>> It seems that someone tries to free the same page bitmap twice and simultaneously.
>
> Does the following workaround prevent the crashes?
>
> -global apic-common.vapic=off
>
>> Here is the stack trace for two threads at the moment when qemu fails:
>> Failed thread:
>> #0  0x0000000076e79e52 in ntdll!EtwpCreateEtwThread ()
>>    from /c/Windows/SYSTEM32/ntdll.dll
>> #1  0x0000000076e56965 in ntdll!EtwEventSetInformation ()
>>    from /c/Windows/SYSTEM32/ntdll.dll
>> #2  0x0000000076e942d9 in ntdll!RtlLogStackBackTrace ()
>>    from /c/Windows/SYSTEM32/ntdll.dll
>> #3  0x0000000076e3797c in ntdll!TpAlpcRegisterCompletionList ()
>>    from /c/Windows/SYSTEM32/ntdll.dll
>> #4  0x000007fefdc810c8 in msvcrt!free () from /c/Windows/system32/msvcrt.dll
>> #5  0x000000000040c57d in invalidate_page_bitmap (p=<optimized out>,
>>     p=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:881
>> #6  tb_invalidate_phys_page_range (start=826113, end=end@entry=826116,
>>     is_cpu_write_access=is_cpu_write_access@entry=0)
>>     at D:/Projects/QEMU/qemu/translate-all.c:1527
>> #7  0x000000000040c5ed in tb_invalidate_phys_range_1 (end=826116,
>>     start=<optimized out>) at D:/Projects/QEMU/qemu/translate-all.c:1414
>> #8  tb_invalidate_phys_range (start=start@entry=826113, end=end@entry=826116)
>>     at D:/Projects/QEMU/qemu/translate-all.c:1424
>> #9  0x0000000000402e5f in invalidate_and_set_dirty (mr=mr@entry=0x569e980,
>>     addr=<optimized out>, length=<optimized out>)
>>     at D:/Projects/QEMU/qemu/exec.c:2511
>> #10 0x0000000000406af7 in cpu_physical_memory_write_rom_internal (
>>     type=WRITE_DATA, len=3, buf=0x22f411 "", addr=826113,
>>     as=0xab4280 <address_space_memory>) at D:/Projects/QEMU/qemu/exec.c:2795
>> <and more>
>>
>> Another thread:
>> #0  0x00000000007411b0 in g_free ()
>> #1  0x000000000040b6b4 in invalidate_page_bitmap (p=0x1143f738, p=0x1143f738)
>>     at D:/Projects/QEMU/qemu/translate-all.c:881
>> #2  page_flush_tb_1 (level=level@entry=0, lp=0x5715058)
>>     at D:/Projects/QEMU/qemu/translate-all.c:900
>> #3  0x000000000040b6ee in page_flush_tb_1 (level=1, lp=0xac8ac0 <l1_map>)
>>     at D:/Projects/QEMU/qemu/translate-all.c:906
>> #4  0x000000000040b7b3 in page_flush_tb ()
>>     at D:/Projects/QEMU/qemu/translate-all.c:916
>> #5  do_tb_flush (cpu=<optimized out>, tb_flush_count=...)
>>     at D:/Projects/QEMU/qemu/translate-all.c:954
>> #6  0x0000000000519ac1 in process_queued_cpu_work (cpu=0x5632fd0)
>>     at cpus-common.c:338
>> #7  0x0000000000439761 in qemu_wait_io_event_common (cpu=0x5632fd0)
>>     at D:/Projects/QEMU/qemu/cpus.c:942
>> #8  qemu_tcg_wait_io_event (cpu=<optimized out>)
>>     at D:/Projects/QEMU/qemu/cpus.c:957
>> #9  qemu_tcg_cpu_thread_fn (arg=arg@entry=0x5632fd0)
>>     at D:/Projects/QEMU/qemu/cpus.c:1216
>> #10 0x000000000072c285 in win32_start_routine (arg=0x565ba40)
>>     at util/qemu-thread-win32.c:406
>> #11 0x000007fefdc8415f in srand () from /c/Windows/system32/msvcrt.dll
>> #12 0x000007fefdc86ebd in msvcrt!_ftime64_s ()
>> <and more>
>>
>>>
>>> QEMU 2.8.0 is scheduled for release today.  I have checked that
>>> qemu-system-i386.exe works but without playing an MP3 file in Windows
>>> XP.
>>>
>>> I plan to go ahead with the release unless information becomes available
>>> that suggests it affects more than just this one scenario.
>>
>>> >
>>> >
>>> > I encountered the following bug with the latest version of QEMU.
>>> >
>>> > I use windows host and start qemu with the following command line:
>>> >
>>> > qemu-system-i386.exe -soundhw ac97 -snapshot -hda disk.qcow2 -net none
>>> >
>>> >
>>> >
>>> > Guest system is Windows XP 32-bit. It founds new hardware (including audio controller)
>>> >
>>> > and I start playing mp3 file.
>>> >
>>> > After seconds of playing qemu fails with an exception.
>>> >
>>> >
>>> >
>>> > I tried to bisect between 2.7 and 2.8, but bug is not stable.
>>> >
>>> > It manifested itself at commits "68701de1362b29fd6941a2021e9393ddbe60edd8" and
>>> > "6a928d25b6d8bc3729c3d28326c6db13b9481059".
>>
>>
>> Pavel Dovgalyuk
>>
>



-- 
Alex Bennée
KVM/QEMU Hacker for Linaro

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-20 16:02         ` Alex Bennée
@ 2016-12-20 16:13           ` Stefan Hajnoczi
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2016-12-20 16:13 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Pavel Dovgalyuk, Paolo Bonzini, qemu-devel, Pavel Dovgalyuk,
	Peter Maydell

Thanks for looking into this bug.

The QEMU 2.8 release will go ahead as planned today since this is an
existing race and there is a potential workaround (i.e. disabling
vapic on x86).

Stefan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-20 14:02       ` Stefan Hajnoczi
  2016-12-20 16:02         ` Alex Bennée
@ 2016-12-21  5:57         ` Pavel Dovgalyuk
  2016-12-21 11:05           ` Stefan Hajnoczi
  1 sibling, 1 reply; 16+ messages in thread
From: Pavel Dovgalyuk @ 2016-12-21  5:57 UTC (permalink / raw)
  To: 'Stefan Hajnoczi'
  Cc: 'qemu-devel', 'Paolo Bonzini',
	'Pavel Dovgalyuk', 'Peter Maydell'


> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> On Tue, Dec 20, 2016 at 11:10 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
> >> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> >> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
> >> > It also fails much earlier when I enable logs with "-d int -D log".
> >> >
> >> Looks like a heap corruption bug since free() is failing.
> >
> > Seems to be a race condition.
> > When I add logs into invalidate_page_bitmap, the bug disappears.
> > It seems that someone tries to free the same page bitmap twice and simultaneously.
> 
> Does the following workaround prevent the crashes?
> 
> -global apic-common.vapic=off

Yes, this option helps.
Thank you.

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-21  5:57         ` Pavel Dovgalyuk
@ 2016-12-21 11:05           ` Stefan Hajnoczi
  2017-01-11  7:10             ` Pavel Dovgalyuk
  0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2016-12-21 11:05 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: qemu-devel, Paolo Bonzini, Pavel Dovgalyuk, Peter Maydell,
	Alex Bennée

On Wed, Dec 21, 2016 at 5:57 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>> -----Original Message-----
>> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> On Tue, Dec 20, 2016 at 11:10 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>> >> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> >> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
>> >> > It also fails much earlier when I enable logs with "-d int -D log".
>> >> >
>> >> Looks like a heap corruption bug since free() is failing.
>> >
>> > Seems to be a race condition.
>> > When I add logs into invalidate_page_bitmap, the bug disappears.
>> > It seems that someone tries to free the same page bitmap twice and simultaneously.
>>
>> Does the following workaround prevent the crashes?
>>
>> -global apic-common.vapic=off
>
> Yes, this option helps.
> Thank you.

Good news.  This can be fixed in 2.8.1 once someone finds a solution.

Stefan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2016-12-21 11:05           ` Stefan Hajnoczi
@ 2017-01-11  7:10             ` Pavel Dovgalyuk
  2017-01-11 11:41               ` Alex Bennée
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Dovgalyuk @ 2017-01-11  7:10 UTC (permalink / raw)
  To: 'Stefan Hajnoczi'
  Cc: 'qemu-devel', 'Paolo Bonzini',
	'Pavel Dovgalyuk', 'Peter Maydell',
	'Alex Bennée'

> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> On Wed, Dec 21, 2016 at 5:57 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
> >> -----Original Message-----
> >> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> >> On Tue, Dec 20, 2016 at 11:10 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
> >> >> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> >> >> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
> >> >> > It also fails much earlier when I enable logs with "-d int -D log".
> >> >> >
> >> >> Looks like a heap corruption bug since free() is failing.
> >> >
> >> > Seems to be a race condition.
> >> > When I add logs into invalidate_page_bitmap, the bug disappears.
> >> > It seems that someone tries to free the same page bitmap twice and simultaneously.
> >>
> >> Does the following workaround prevent the crashes?
> >>
> >> -global apic-common.vapic=off
> >
> > Yes, this option helps.
> > Thank you.
> 
> Good news.  This can be fixed in 2.8.1 once someone finds a solution.

It seems that something still goes wrong.
I'm using this workaround, but there is a kind of deadlock in translation.
call_rcu_thread hangs at some moment in qemu_event_wait.

As far as I understand, it is used by QHT in translate-all.c.
I can't get more information yet, because logging makes everything too slow.

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2017-01-11  7:10             ` Pavel Dovgalyuk
@ 2017-01-11 11:41               ` Alex Bennée
  2017-01-12  8:07                 ` Pavel Dovgalyuk
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bennée @ 2017-01-11 11:41 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: 'Stefan Hajnoczi', 'qemu-devel',
	'Paolo Bonzini', 'Pavel Dovgalyuk',
	'Peter Maydell'


Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> On Wed, Dec 21, 2016 at 5:57 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>> >> -----Original Message-----
>> >> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> >> On Tue, Dec 20, 2016 at 11:10 AM, Pavel Dovgalyuk <dovgaluk@ispras.ru> wrote:
>> >> >> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> >> >> On Tue, Dec 20, 2016 at 10:45:44AM +0300, Pavel Dovgalyuk wrote:
>> >> >> > It also fails much earlier when I enable logs with "-d int -D log".
>> >> >> >
>> >> >> Looks like a heap corruption bug since free() is failing.
>> >> >
>> >> > Seems to be a race condition.
>> >> > When I add logs into invalidate_page_bitmap, the bug disappears.
>> >> > It seems that someone tries to free the same page bitmap twice and simultaneously.
>> >>
>> >> Does the following workaround prevent the crashes?
>> >>
>> >> -global apic-common.vapic=off
>> >
>> > Yes, this option helps.
>> > Thank you.
>>
>> Good news.  This can be fixed in 2.8.1 once someone finds a solution.
>
> It seems that something still goes wrong.
> I'm using this workaround, but there is a kind of deadlock in translation.
> call_rcu_thread hangs at some moment in qemu_event_wait.
>
> As far as I understand, it is used by QHT in translate-all.c.
> I can't get more information yet, because logging makes everything too slow.

There are a number of users of RCU bit for QHT I think it only gets
activated when it needs to re-size its hash table on insertion of new
TranslationBlocks.

Can you get a backtrace of all threads when it deadlocks?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2017-01-11 11:41               ` Alex Bennée
@ 2017-01-12  8:07                 ` Pavel Dovgalyuk
  2017-01-20 17:33                   ` Alex Bennée
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Dovgalyuk @ 2017-01-12  8:07 UTC (permalink / raw)
  To: 'Alex Bennée'
  Cc: 'Stefan Hajnoczi', 'qemu-devel',
	'Paolo Bonzini', 'Pavel Dovgalyuk',
	'Peter Maydell'

> From: Alex Bennée [mailto:alex.bennee@linaro.org]
> >> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
> >> >
> >> > Yes, this option helps.
> >> > Thank you.
> >>
> >> Good news.  This can be fixed in 2.8.1 once someone finds a solution.
> >
> > It seems that something still goes wrong.
> > I'm using this workaround, but there is a kind of deadlock in translation.
> > call_rcu_thread hangs at some moment in qemu_event_wait.
> >
> > As far as I understand, it is used by QHT in translate-all.c.
> > I can't get more information yet, because logging makes everything too slow.
> 
> There are a number of users of RCU bit for QHT I think it only gets
> activated when it needs to re-size its hash table on insertion of new
> TranslationBlocks.
> 
> Can you get a backtrace of all threads when it deadlocks?

Sorry, this is another problem which occurs only in icount replay mode:
1. cpu_handle_exception tries to force exception when is cannot occur due to
   running out all the planned instructions:
    } else if (replay_has_exception()
               && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
        /* try to cause an exception pending in the log */
        cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
        *ret = -1;
        return true;

2. tb_find calls tb_gen_code, which cannot allocate new translation block 
   and calls tb_flush (which only queues the flushing) and cpu_loop_exit
3. cpu_loop_exit returns to infinite loop of cpu_exec and the condition
            if (cpu_handle_exception(cpu, &ret)) {
                break;
            }
   is checked again causing an infinite loop.

TB cache is not flushed because we never execute that break and real work of tb_flush
is made outside this loop.

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2017-01-12  8:07                 ` Pavel Dovgalyuk
@ 2017-01-20 17:33                   ` Alex Bennée
  2017-01-23  7:50                     ` Pavel Dovgalyuk
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Bennée @ 2017-01-20 17:33 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: 'Stefan Hajnoczi', 'qemu-devel',
	'Paolo Bonzini', 'Pavel Dovgalyuk',
	'Peter Maydell'


Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: Alex Bennée [mailto:alex.bennee@linaro.org]
>> >> From: Stefan Hajnoczi [mailto:stefanha@gmail.com]
>> >> >
>> >> > Yes, this option helps.
>> >> > Thank you.
>> >>
>> >> Good news.  This can be fixed in 2.8.1 once someone finds a solution.
>> >
>> > It seems that something still goes wrong.
>> > I'm using this workaround, but there is a kind of deadlock in translation.
>> > call_rcu_thread hangs at some moment in qemu_event_wait.
>> >
>> > As far as I understand, it is used by QHT in translate-all.c.
>> > I can't get more information yet, because logging makes everything too slow.
>>
>> There are a number of users of RCU bit for QHT I think it only gets
>> activated when it needs to re-size its hash table on insertion of new
>> TranslationBlocks.
>>
>> Can you get a backtrace of all threads when it deadlocks?
>
> Sorry, this is another problem which occurs only in icount replay mode:
> 1. cpu_handle_exception tries to force exception when is cannot occur due to
>    running out all the planned instructions:
>     } else if (replay_has_exception()
>                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>         /* try to cause an exception pending in the log */
>         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>         *ret = -1;
>         return true;
>
> 2. tb_find calls tb_gen_code, which cannot allocate new translation block
>    and calls tb_flush (which only queues the flushing) and cpu_loop_exit
> 3. cpu_loop_exit returns to infinite loop of cpu_exec and the condition
>             if (cpu_handle_exception(cpu, &ret)) {
>                 break;
>             }
>    is checked again causing an infinite loop.
>
> TB cache is not flushed because we never execute that break and real work of tb_flush
> is made outside this loop.

I think what we need is a:


  if (cpu->exit_request)
    break;

before the cpu_handle_exception() call to ensure any queued work gets
processed first. Can you give me you current command line so I can
reproduce this and check the fix works?

--
Alex Bennée

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2017-01-20 17:33                   ` Alex Bennée
@ 2017-01-23  7:50                     ` Pavel Dovgalyuk
  2017-01-23  9:38                       ` Alex Bennée
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Dovgalyuk @ 2017-01-23  7:50 UTC (permalink / raw)
  To: 'Alex Bennée'
  Cc: 'Stefan Hajnoczi', 'qemu-devel',
	'Paolo Bonzini', 'Pavel Dovgalyuk',
	'Peter Maydell'

> From: Alex Bennée [mailto:alex.bennee@linaro.org]
> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
> 
> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]
> >
> > Sorry, this is another problem which occurs only in icount replay mode:
> > 1. cpu_handle_exception tries to force exception when is cannot occur due to
> >    running out all the planned instructions:
> >     } else if (replay_has_exception()
> >                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> >         /* try to cause an exception pending in the log */
> >         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
> >         *ret = -1;
> >         return true;
> >
> > 2. tb_find calls tb_gen_code, which cannot allocate new translation block
> >    and calls tb_flush (which only queues the flushing) and cpu_loop_exit
> > 3. cpu_loop_exit returns to infinite loop of cpu_exec and the condition
> >             if (cpu_handle_exception(cpu, &ret)) {
> >                 break;
> >             }
> >    is checked again causing an infinite loop.
> >
> > TB cache is not flushed because we never execute that break and real work of tb_flush
> > is made outside this loop.
> 
> I think what we need is a:
> 
> 
>   if (cpu->exit_request)
>     break;

Where this exit_request is supposed to be set?

> 
> before the cpu_handle_exception() call to ensure any queued work gets
> processed first. Can you give me you current command line so I can
> reproduce this and check the fix works?

I solved the problem using following patch:

--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 #ifndef CONFIG_USER_ONLY
     } else if (replay_has_exception()
                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+        /* Break the execution loop in case of running out of TB cache.
+           This is needed to make flushing of the TB cache, because
+           real flush is queued to be executed outside the cpu loop. */
+        cpu->exception_index = EXCP_INTERRUPT;
         /* try to cause an exception pending in the log */
         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
         *ret = -1;

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] qemu-2.8-rc4 is broken
  2017-01-23  7:50                     ` Pavel Dovgalyuk
@ 2017-01-23  9:38                       ` Alex Bennée
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Bennée @ 2017-01-23  9:38 UTC (permalink / raw)
  To: Pavel Dovgalyuk
  Cc: 'Stefan Hajnoczi', 'qemu-devel',
	'Paolo Bonzini', 'Pavel Dovgalyuk',
	'Peter Maydell'


Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: Alex Bennée [mailto:alex.bennee@linaro.org]
>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:
>>
>> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]
>> >
>> > Sorry, this is another problem which occurs only in icount replay mode:
>> > 1. cpu_handle_exception tries to force exception when is cannot occur due to
>> >    running out all the planned instructions:
>> >     } else if (replay_has_exception()
>> >                && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>> >         /* try to cause an exception pending in the log */
>> >         cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>> >         *ret = -1;
>> >         return true;
>> >
>> > 2. tb_find calls tb_gen_code, which cannot allocate new translation block
>> >    and calls tb_flush (which only queues the flushing) and cpu_loop_exit
>> > 3. cpu_loop_exit returns to infinite loop of cpu_exec and the condition
>> >             if (cpu_handle_exception(cpu, &ret)) {
>> >                 break;
>> >             }
>> >    is checked again causing an infinite loop.
>> >
>> > TB cache is not flushed because we never execute that break and real work of tb_flush
>> > is made outside this loop.
>>
>> I think what we need is a:
>>
>>
>>   if (cpu->exit_request)
>>     break;
>
> Where this exit_request is supposed to be set?

Ahh my mistake. Currently it is a global exit_request (becoming a
per-cpu exit_request when MTTCG is merged). It's set by qemu_cpu_kick()
when work is queued up, in this case the tb_flush async work.


>> before the cpu_handle_exception() call to ensure any queued work gets
>> processed first. Can you give me you current command line so I can
>> reproduce this and check the fix works?
>
> I solved the problem using following patch:
>
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -451,6 +451,10 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>  #ifndef CONFIG_USER_ONLY
>      } else if (replay_has_exception()
>                 && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> +        /* Break the execution loop in case of running out of TB cache.
> +           This is needed to make flushing of the TB cache, because
> +           real flush is queued to be executed outside the cpu loop. */
> +        cpu->exception_index = EXCP_INTERRUPT;
>          /* try to cause an exception pending in the log */
>          cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
>          *ret = -1;

I wonder if it worth renaming EXCP_INTERRUPT? I always get it confused
with a guest interrupt. But the effect is the same as we set it on an
exit_request.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-01-23  9:38 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-19  9:47 [Qemu-devel] qemu-2.8-rc4 is broken Pavel Dovgalyuk
2016-12-20  7:45 ` Pavel Dovgalyuk
2016-12-20 10:21   ` Stefan Hajnoczi
2016-12-20 11:10     ` Pavel Dovgalyuk
2016-12-20 13:20       ` Stefan Hajnoczi
2016-12-20 14:02       ` Stefan Hajnoczi
2016-12-20 16:02         ` Alex Bennée
2016-12-20 16:13           ` Stefan Hajnoczi
2016-12-21  5:57         ` Pavel Dovgalyuk
2016-12-21 11:05           ` Stefan Hajnoczi
2017-01-11  7:10             ` Pavel Dovgalyuk
2017-01-11 11:41               ` Alex Bennée
2017-01-12  8:07                 ` Pavel Dovgalyuk
2017-01-20 17:33                   ` Alex Bennée
2017-01-23  7:50                     ` Pavel Dovgalyuk
2017-01-23  9:38                       ` Alex Bennée

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.