* [Qemu-devel] another locking issue in current dataplane code?
@ 2014-07-07 11:58 Christian Borntraeger
2014-07-08 7:19 ` Christian Borntraeger
2014-07-08 15:59 ` Stefan Hajnoczi
0 siblings, 2 replies; 13+ messages in thread
From: Christian Borntraeger @ 2014-07-07 11:58 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Cornelia Huck, Kevin Wolf, ming.lei, qemu-devel, Dominik Dingel
Folks,
with current 2.1-rc0 (
+ dataplane: do not free VirtQueueElement in vring_push()
+ virtio-blk: avoid dataplane VirtIOBlockReq early free
+ some not-ready yet s390 patches for migration
)
I am still having issues with dataplane during managedsave (without dataplane everything seems to work fine):
With 1 CPU and 1 disk (and some workload, e.g. a simple dd on the disk) I get:
Thread 3 (Thread 0x3fff90fd910 (LWP 27218)):
#0 0x000003fffcdb7ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x000003fffcdbac0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
#2 0x000003fffcdb399a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#3 0x00000000801fff06 in qemu_cond_wait (cond=<optimized out>, mutex=mutex@entry=0x8037f788 <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#4 0x00000000800472f4 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:843
#5 qemu_kvm_cpu_thread_fn (arg=0x809ad6b0) at /home/cborntra/REPOS/qemu/cpus.c:879
#6 0x000003fffcdaf412 in start_thread () from /lib64/libpthread.so.0
#7 0x000003fffba350ae in thread_start () from /lib64/libc.so.6
Thread 2 (Thread 0x3fff88fd910 (LWP 27219)):
#0 0x000003fffba2a8e0 in ppoll () from /lib64/libc.so.6
#1 0x00000000801af250 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2 qemu_poll_ns (fds=fds@entry=0x3fff40010c0, nfds=nfds@entry=3, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:314
#3 0x00000000801b0702 in aio_poll (ctx=0x807f2230, blocking=blocking@entry=true) at /home/cborntra/REPOS/qemu/aio-posix.c:221
#4 0x00000000800be3c4 in iothread_run (opaque=0x807f20d8) at /home/cborntra/REPOS/qemu/iothread.c:41
#5 0x000003fffcdaf412 in start_thread () from /lib64/libpthread.so.0
#6 0x000003fffba350ae in thread_start () from /lib64/libc.so.6
Thread 1 (Thread 0x3fff9c529b0 (LWP 27215)):
#0 0x000003fffcdb38f0 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000000801fff06 in qemu_cond_wait (cond=cond@entry=0x807f22c0, mutex=mutex@entry=0x807f2290) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
#2 0x0000000080212906 in rfifolock_lock (r=r@entry=0x807f2290) at /home/cborntra/REPOS/qemu/util/rfifolock.c:59
#3 0x000000008019e536 in aio_context_acquire (ctx=ctx@entry=0x807f2230) at /home/cborntra/REPOS/qemu/async.c:295
#4 0x00000000801a34e6 in bdrv_drain_all () at /home/cborntra/REPOS/qemu/block.c:1907
#5 0x0000000080048e24 in do_vm_stop (state=RUN_STATE_PAUSED) at /home/cborntra/REPOS/qemu/cpus.c:538
#6 vm_stop (state=state@entry=RUN_STATE_PAUSED) at /home/cborntra/REPOS/qemu/cpus.c:1221
#7 0x00000000800e6338 in qmp_stop (errp=errp@entry=0x3ffffa9dc00) at /home/cborntra/REPOS/qemu/qmp.c:98
#8 0x00000000800e1314 in qmp_marshal_input_stop (mon=<optimized out>, qdict=<optimized out>, ret=<optimized out>) at qmp-marshal.c:2806
#9 0x000000008004b91a in qmp_call_cmd (cmd=<optimized out>, params=0x8096cf50, mon=0x8080b8a0) at /home/cborntra/REPOS/qemu/monitor.c:5038
#10 handle_qmp_command (parser=<optimized out>, tokens=<optimized out>) at /home/cborntra/REPOS/qemu/monitor.c:5104
#11 0x00000000801faf16 in json_message_process_token (lexer=0x8080b7c0, token=0x808f2610, type=<optimized out>, x=<optimized out>, y=6) at /home/cborntra/REPOS/qemu/qobject/json-streamer.c:87
#12 0x0000000080212bac in json_lexer_feed_char (lexer=lexer@entry=0x8080b7c0, ch=<optimized out>, flush=flush@entry=false) at /home/cborntra/REPOS/qemu/qobject/json-lexer.c:303
#13 0x0000000080212cfe in json_lexer_feed (lexer=0x8080b7c0, buffer=<optimized out>, size=<optimized out>) at /home/cborntra/REPOS/qemu/qobject/json-lexer.c:356
#14 0x00000000801fb10e in json_message_parser_feed (parser=<optimized out>, buffer=<optimized out>, size=<optimized out>) at /home/cborntra/REPOS/qemu/qobject/json-streamer.c:110
#15 0x0000000080049f28 in monitor_control_read (opaque=<optimized out>, buf=<optimized out>, size=<optimized out>) at /home/cborntra/REPOS/qemu/monitor.c:5125
#16 0x00000000800c8636 in qemu_chr_be_write (len=1, buf=0x3ffffa9e010 "}[B\377\373\251\372\b", s=0x807f5af0) at /home/cborntra/REPOS/qemu/qemu-char.c:213
#17 tcp_chr_read (chan=<optimized out>, cond=<optimized out>, opaque=0x807f5af0) at /home/cborntra/REPOS/qemu/qemu-char.c:2690
#18 0x000003fffcc9f05a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#19 0x00000000801ae3e0 in glib_pollfds_poll () at /home/cborntra/REPOS/qemu/main-loop.c:190
#20 os_host_main_loop_wait (timeout=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:235
#21 main_loop_wait (nonblocking=<optimized out>) at /home/cborntra/REPOS/qemu/main-loop.c:484
#22 0x00000000800169e2 in main_loop () at /home/cborntra/REPOS/qemu/vl.c:2024
#23 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/cborntra/REPOS/qemu/vl.c:4551
Now, if aio_poll never returns, we have a deadlock here.
To me it looks like aio_poll can be called from iothread_run even if there are no outstanding requests.
Opinions?
Christian
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-07 11:58 [Qemu-devel] another locking issue in current dataplane code? Christian Borntraeger
@ 2014-07-08 7:19 ` Christian Borntraeger
2014-07-08 7:43 ` Ming Lei
2014-07-08 15:59 ` Stefan Hajnoczi
1 sibling, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2014-07-08 7:19 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Cornelia Huck, Kevin Wolf, ming.lei, qemu-devel, Dominik Dingel
Ping.
Has anyone seen a similar hang on x86?
On 07/07/14 13:58, Christian Borntraeger wrote:
> Folks,
>
> with current 2.1-rc0 (
> + dataplane: do not free VirtQueueElement in vring_push()
> + virtio-blk: avoid dataplane VirtIOBlockReq early free
> + some not-ready yet s390 patches for migration
> )
>
> I am still having issues with dataplane during managedsave (without dataplane everything seems to work fine):
>
> With 1 CPU and 1 disk (and some workload, e.g. a simple dd on the disk) I get:
>
>
> [...]
>
> Now, if aio_poll never returns, we have a deadlock here.
> To me it looks like aio_poll can be called from iothread_run even if there are no outstanding requests.
> Opinions?
>
> Christian
>
>
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 7:19 ` Christian Borntraeger
@ 2014-07-08 7:43 ` Ming Lei
2014-07-08 8:38 ` Christian Borntraeger
2014-07-08 9:09 ` Christian Borntraeger
0 siblings, 2 replies; 13+ messages in thread
From: Ming Lei @ 2014-07-08 7:43 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Cornelia Huck, Kevin Wolf, Dominik Dingel, qemu-devel, Stefan Hajnoczi
On Tue, Jul 8, 2014 at 3:19 PM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> Ping.
>
> has anyone seen a similar hang on x86?
>
>
>
> On 07/07/14 13:58, Christian Borntraeger wrote:
>> Folks,
>>
>> with current 2.1-rc0 (
>> + dataplane: do not free VirtQueueElement in vring_push()
>> + virtio-blk: avoid dataplane VirtIOBlockReq early free
>> + some not-ready yet s390 patches for migration
>> )
>>
>> I am still having issues with dataplane during managedsave (without dataplane everything seems to work fine):
>>
>> With 1 CPU and 1 disk (and some workload, e.g. a simple dd on the disk) I get:
>>
>>
>> [...]
>>
>> Now, if aio_poll never returns, we have a deadlock here.
>> To me it looks like aio_poll can be called from iothread_run even if there are no outstanding requests.
>> Opinions?
I have sent out a patch to fix the issue, titled
"virtio-blk: data-plane: fix save/set .complete_request in start".
Please try it and see whether it fixes your issue.
thanks
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 7:43 ` Ming Lei
@ 2014-07-08 8:38 ` Christian Borntraeger
2014-07-08 9:09 ` Christian Borntraeger
1 sibling, 0 replies; 13+ messages in thread
From: Christian Borntraeger @ 2014-07-08 8:38 UTC (permalink / raw)
To: Ming Lei
Cc: Cornelia Huck, Kevin Wolf, Dominik Dingel, qemu-devel, Stefan Hajnoczi
On 08/07/14 09:43, Ming Lei wrote:
> On Tue, Jul 8, 2014 at 3:19 PM, Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>> Ping.
>>
>> has anyone seen a similar hang on x86?
>>
>>
>>
>> On 07/07/14 13:58, Christian Borntraeger wrote:
>>> Folks,
>>>
>>> with current 2.1-rc0 (
>>> + dataplane: do not free VirtQueueElement in vring_push()
>>> + virtio-blk: avoid dataplane VirtIOBlockReq early free
>>> + some not-ready yet s390 patches for migration
>>> )
>>>
>>> I am still having issues with dataplane during managedsave (without dataplane everything seems to work fine):
>>>
>>> With 1 CPU and 1 disk (and some workload, e.g. a simple dd on the disk) I get:
>>>
>>>
>>> [...]
>>>
>>> Now, if aio_poll never returns, we have a deadlock here.
>>> To me it looks like aio_poll can be called from iothread_run even if there are no outstanding requests.
>>> Opinions?
>
> I have sent out a patch to fix the issue, titled
> "virtio-blk: data-plane: fix save/set .complete_request in start".
>
> Please try it and see whether it fixes your issue.
Yes, I have seen that patch. Unfortunately it does not make a difference for the managedsave case.
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 7:43 ` Ming Lei
2014-07-08 8:38 ` Christian Borntraeger
@ 2014-07-08 9:09 ` Christian Borntraeger
2014-07-08 10:12 ` Christian Borntraeger
1 sibling, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2014-07-08 9:09 UTC (permalink / raw)
To: Ming Lei
Cc: Cornelia Huck, Kevin Wolf, Dominik Dingel, qemu-devel, Stefan Hajnoczi
On 08/07/14 09:43, Ming Lei wrote:
> On Tue, Jul 8, 2014 at 3:19 PM, Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>> Ping.
>>
>> has anyone seen a similar hang on x86?
The problem seems to be that for managedsave we do a VM stop before we call the migration_state_notifier. To be verified.
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 9:09 ` Christian Borntraeger
@ 2014-07-08 10:12 ` Christian Borntraeger
2014-07-08 10:37 ` Christian Borntraeger
0 siblings, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2014-07-08 10:12 UTC (permalink / raw)
To: Ming Lei
Cc: Cornelia Huck, Kevin Wolf, Dominik Dingel, qemu-devel, Stefan Hajnoczi
On 08/07/14 11:09, Christian Borntraeger wrote:
> On 08/07/14 09:43, Ming Lei wrote:
>> On Tue, Jul 8, 2014 at 3:19 PM, Christian Borntraeger
>> <borntraeger@de.ibm.com> wrote:
>>> Ping.
>>>
>>> has anyone seen a similar hang on x86?
>
> The problem seems to be that for managedsave we do a VM stop before we call the migration_state_notifier. To be verified.
Yes. virsh suspend also hangs. Any ideas?
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 10:12 ` Christian Borntraeger
@ 2014-07-08 10:37 ` Christian Borntraeger
2014-07-08 11:03 ` Christian Borntraeger
0 siblings, 1 reply; 13+ messages in thread
From: Christian Borntraeger @ 2014-07-08 10:37 UTC (permalink / raw)
To: Ming Lei
Cc: Cornelia Huck, Kevin Wolf, Dominik Dingel, qemu-devel, Stefan Hajnoczi
On 08/07/14 12:12, Christian Borntraeger wrote:
> On 08/07/14 11:09, Christian Borntraeger wrote:
>> On 08/07/14 09:43, Ming Lei wrote:
>>> On Tue, Jul 8, 2014 at 3:19 PM, Christian Borntraeger
>>> <borntraeger@de.ibm.com> wrote:
>>>> Ping.
>>>>
>>>> has anyone seen a similar hang on x86?
>>
>> The problem seems to be that for managedsave we do a VM stop before we call the migration_state_notifier. To be verified.
>
> Yes. virsh suspend also hangs. Any ideas?
>
Finally found a solution. Merging the pull request below from upstream seems to fix my issues. I guess the plugging/unplugging fixed this implicitly, but I don't understand it yet.
Since the problem is gone, I will no longer investigate...
Bug fixes for QEMU 2.1-rc1.
The following changes since commit 9d9de254c2b81b68cd48f2324cc753a570a4cdd8:
MAINTAINERS: seccomp: change email contact for Eduardo Otubo (2014-07-03 12:36:15 +0100)
are available in the git repository at:
git://github.com/stefanha/qemu.git tags/block-pull-request
for you to fetch changes up to f4eb32b590bf58c1c67570775eb78beb09964fad:
qmp: show QOM properties in device-list-properties (2014-07-07 11:10:05 +0200)
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 10:37 ` Christian Borntraeger
@ 2014-07-08 11:03 ` Christian Borntraeger
0 siblings, 0 replies; 13+ messages in thread
From: Christian Borntraeger @ 2014-07-08 11:03 UTC (permalink / raw)
To: Ming Lei
Cc: Cornelia Huck, Kevin Wolf, Dominik Dingel, qemu-devel, Stefan Hajnoczi
On 08/07/14 12:37, Christian Borntraeger wrote:
> On 08/07/14 12:12, Christian Borntraeger wrote:
>> On 08/07/14 11:09, Christian Borntraeger wrote:
>>> On 08/07/14 09:43, Ming Lei wrote:
>>>> On Tue, Jul 8, 2014 at 3:19 PM, Christian Borntraeger
>>>> <borntraeger@de.ibm.com> wrote:
>>>>> Ping.
>>>>>
>>>>> has anyone seen a similar hang on x86?
>>>
>>> The problem seems to be that for managedsave we do a VM stop before we call the migration_state_notifier. To be verified.
>>
>> Yes. virsh suspend also hangs. Any ideas?
>>
>
> Finally found a solution. Merging the pull request below from upstream seems to fix my issues. I guess the plugging/unplugging fixed this implicitly, but I don't understand it yet.
> Since the problem is gone, I will no longer investigate...
Sigh. This merge just made the bug less likely to occur.
>
>
> Bug fixes for QEMU 2.1-rc1.
>
> The following changes since commit 9d9de254c2b81b68cd48f2324cc753a570a4cdd8:
>
> MAINTAINERS: seccomp: change email contact for Eduardo Otubo (2014-07-03 12:36:15 +0100)
>
> are available in the git repository at:
>
> git://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to f4eb32b590bf58c1c67570775eb78beb09964fad:
>
> qmp: show QOM properties in device-list-properties (2014-07-07 11:10:05 +0200)
>
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-07 11:58 [Qemu-devel] another locking issue in current dataplane code? Christian Borntraeger
2014-07-08 7:19 ` Christian Borntraeger
@ 2014-07-08 15:59 ` Stefan Hajnoczi
2014-07-08 17:08 ` Paolo Bonzini
1 sibling, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2014-07-08 15:59 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Cornelia Huck, Kevin Wolf, ming.lei, qemu-devel, Dominik Dingel
On Mon, Jul 07, 2014 at 01:58:01PM +0200, Christian Borntraeger wrote:
> Now, if aio_poll never returns, we have a deadlock here.
> To me it looks like aio_poll can be called from iothread_run even if there are no outstanding requests.
> Opinions?
Christian pointed out that iothread_run() can miss aio_notify() if a
file descriptor becomes readable/writeable at the same time as the
AioContext->notifier. aio_poll() will return true since progress was
made and we are left with a hung QEMU.
I sent Christian an initial patch to fix this but now both threads are
stuck in rfifolock_lock() inside cond wait. That's very strange and
should never happen.
Still trying to figure out what is going on...
Stefan
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 15:59 ` Stefan Hajnoczi
@ 2014-07-08 17:08 ` Paolo Bonzini
2014-07-08 19:07 ` Christian Borntraeger
0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2014-07-08 17:08 UTC (permalink / raw)
To: Stefan Hajnoczi, Christian Borntraeger
Cc: Cornelia Huck, Kevin Wolf, ming.lei, qemu-devel, Dominik Dingel
On 08/07/2014 17:59, Stefan Hajnoczi wrote:
> I sent Christian an initial patch to fix this but now both threads are
> stuck in rfifolock_lock() inside cond wait. That's very strange and
> should never happen.
I had this patch pending for 2.2:
commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Mon Jul 7 10:39:49 2014 +0200
iothread: do not rely on aio_poll(ctx, true) result to end a loop
Currently, whenever aio_poll(ctx, true) has completed all pending
work it returns true *and* the next call to aio_poll(ctx, true)
will not block.
This invariant has its roots in qemu_aio_flush()'s implementation
as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does
not exist anymore and bdrv_drain_all() is implemented differently;
and this invariant is complicated to maintain and subtly different
from the return value of GMainLoop's g_main_context_iteration.
All calls to aio_poll(ctx, true) except one are guarded by a
while() loop checking for a request to be incomplete, or a
BlockDriverState to be idle. Modify that one exception in
iothread.c.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/iothread.c b/iothread.c
index 1fbf9f1..d9403cf 100644
--- a/iothread.c
+++ b/iothread.c
@@ -30,6 +30,7 @@ typedef ObjectClass IOThreadClass;
static void *iothread_run(void *opaque)
{
IOThread *iothread = opaque;
+ bool blocking;
qemu_mutex_lock(&iothread->init_done_lock);
iothread->thread_id = qemu_get_thread_id();
@@ -38,8 +39,10 @@ static void *iothread_run(void *opaque)
while (!iothread->stopping) {
aio_context_acquire(iothread->ctx);
- while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
+ blocking = true;
+ while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
/* Progress was made, keep going */
+ blocking = false;
}
aio_context_release(iothread->ctx);
}
Christian, can you test it?
Paolo
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 17:08 ` Paolo Bonzini
@ 2014-07-08 19:07 ` Christian Borntraeger
2014-07-08 19:50 ` Paolo Bonzini
2014-07-09 7:56 ` Stefan Hajnoczi
0 siblings, 2 replies; 13+ messages in thread
From: Christian Borntraeger @ 2014-07-08 19:07 UTC (permalink / raw)
To: Paolo Bonzini, Stefan Hajnoczi
Cc: Cornelia Huck, Kevin Wolf, ming.lei, qemu-devel, Dominik Dingel
On 08/07/14 19:08, Paolo Bonzini wrote:
> Il 08/07/2014 17:59, Stefan Hajnoczi ha scritto:
>> I sent Christian an initial patch to fix this but now both threads are
>> stuck in rfifolock_lock() inside cond wait. That's very strange and
>> should never happen.
>
> I had this patch pending for 2.2:
>
> commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date: Mon Jul 7 10:39:49 2014 +0200
>
> iothread: do not rely on aio_poll(ctx, true) result to end a loop
>
> Currently, whenever aio_poll(ctx, true) has completed all pending
> work it returns true *and* the next call to aio_poll(ctx, true)
> will not block.
>
> This invariant has its roots in qemu_aio_flush()'s implementation
> as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does
> not exist anymore and bdrv_drain_all() is implemented differently;
> and this invariant is complicated to maintain and subtly different
> from the return value of GMainLoop's g_main_context_iteration.
>
> All calls to aio_poll(ctx, true) except one are guarded by a
> while() loop checking for a request to be incomplete, or a
> BlockDriverState to be idle. Modify that one exception in
> iothread.c.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The hangs are gone. Looks like 2.1 material now...
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
> diff --git a/iothread.c b/iothread.c
> index 1fbf9f1..d9403cf 100644
> --- a/iothread.c
> +++ b/iothread.c
> @@ -30,6 +30,7 @@ typedef ObjectClass IOThreadClass;
> static void *iothread_run(void *opaque)
> {
> IOThread *iothread = opaque;
> + bool blocking;
>
> qemu_mutex_lock(&iothread->init_done_lock);
> iothread->thread_id = qemu_get_thread_id();
> @@ -38,8 +39,10 @@ static void *iothread_run(void *opaque)
>
> while (!iothread->stopping) {
> aio_context_acquire(iothread->ctx);
> - while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
> + blocking = true;
> + while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
> /* Progress was made, keep going */
> + blocking = false;
> }
> aio_context_release(iothread->ctx);
> }
>
> Christian, can you test it?
>
> Paolo
>
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 19:07 ` Christian Borntraeger
@ 2014-07-08 19:50 ` Paolo Bonzini
2014-07-09 7:56 ` Stefan Hajnoczi
1 sibling, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2014-07-08 19:50 UTC (permalink / raw)
To: Christian Borntraeger, Stefan Hajnoczi
Cc: Cornelia Huck, Kevin Wolf, ming.lei, qemu-devel, Dominik Dingel
Il 08/07/2014 21:07, Christian Borntraeger ha scritto:
> On 08/07/14 19:08, Paolo Bonzini wrote:
>> Il 08/07/2014 17:59, Stefan Hajnoczi ha scritto:
>>> I sent Christian an initial patch to fix this but now both threads are
>>> stuck in rfifolock_lock() inside cond wait. That's very strange and
>>> should never happen.
>>
>> I had this patch pending for 2.2:
>>
>> commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f
>> Author: Paolo Bonzini <pbonzini@redhat.com>
>> Date: Mon Jul 7 10:39:49 2014 +0200
>>
>> iothread: do not rely on aio_poll(ctx, true) result to end a loop
>>
>> Currently, whenever aio_poll(ctx, true) has completed all pending
>> work it returns true *and* the next call to aio_poll(ctx, true)
>> will not block.
>>
>> This invariant has its roots in qemu_aio_flush()'s implementation
>> as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does
>> not exist anymore and bdrv_drain_all() is implemented differently;
>> and this invariant is complicated to maintain and subtly different
>> from the return value of GMainLoop's g_main_context_iteration.
>>
>> All calls to aio_poll(ctx, true) except one are guarded by a
>> while() loop checking for a request to be incomplete, or a
>> BlockDriverState to be idle. Modify that one exception in
>> iothread.c.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> The hangs are gone. Looks like 2.1 material now...
>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Great, I'll send it out tomorrow morning.
Paolo
* Re: [Qemu-devel] another locking issue in current dataplane code?
2014-07-08 19:07 ` Christian Borntraeger
2014-07-08 19:50 ` Paolo Bonzini
@ 2014-07-09 7:56 ` Stefan Hajnoczi
1 sibling, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2014-07-09 7:56 UTC (permalink / raw)
To: Christian Borntraeger
Cc: Kevin Wolf, Ming Lei, qemu-devel, Dominik Dingel,
Stefan Hajnoczi, Cornelia Huck, Paolo Bonzini
On Tue, Jul 8, 2014 at 9:07 PM, Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> On 08/07/14 19:08, Paolo Bonzini wrote:
>> Il 08/07/2014 17:59, Stefan Hajnoczi ha scritto:
>>> I sent Christian an initial patch to fix this but now both threads are
>>> stuck in rfifolock_lock() inside cond wait. That's very strange and
>>> should never happen.
>>
>> I had this patch pending for 2.2:
>>
>> commit 6c81e31615c3cda5ea981a998ba8b1b8ed17de6f
>> Author: Paolo Bonzini <pbonzini@redhat.com>
>> Date: Mon Jul 7 10:39:49 2014 +0200
>>
>> iothread: do not rely on aio_poll(ctx, true) result to end a loop
>>
>> Currently, whenever aio_poll(ctx, true) has completed all pending
>> work it returns true *and* the next call to aio_poll(ctx, true)
>> will not block.
>>
>> This invariant has its roots in qemu_aio_flush()'s implementation
>> as "while (qemu_aio_wait()) {}". However, qemu_aio_flush() does
>> not exist anymore and bdrv_drain_all() is implemented differently;
>> and this invariant is complicated to maintain and subtly different
>> from the return value of GMainLoop's g_main_context_iteration.
>>
>> All calls to aio_poll(ctx, true) except one are guarded by a
>> while() loop checking for a request to be incomplete, or a
>> BlockDriverState to be idle. Modify that one exception in
>> iothread.c.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> The hangs are gone. Looks like 2.1 material now...
>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
>
>>
>> diff --git a/iothread.c b/iothread.c
>> index 1fbf9f1..d9403cf 100644
>> --- a/iothread.c
>> +++ b/iothread.c
>> @@ -30,6 +30,7 @@ typedef ObjectClass IOThreadClass;
>> static void *iothread_run(void *opaque)
>> {
>> IOThread *iothread = opaque;
>> + bool blocking;
>>
>> qemu_mutex_lock(&iothread->init_done_lock);
>> iothread->thread_id = qemu_get_thread_id();
>> @@ -38,8 +39,10 @@ static void *iothread_run(void *opaque)
>>
>> while (!iothread->stopping) {
>> aio_context_acquire(iothread->ctx);
>> - while (!iothread->stopping && aio_poll(iothread->ctx, true)) {
>> + blocking = true;
>> + while (!iothread->stopping && aio_poll(iothread->ctx, blocking)) {
>> /* Progress was made, keep going */
>> + blocking = false;
>> }
>> aio_context_release(iothread->ctx);
>> }
>>
>> Christian, can you test it?
This could affect performance because of the extra non-blocking poll and the
release/acquire cycle on each outer iteration, but it is a clean solution for
the broken iothread_run() loop.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Good for 2.1
end of thread, other threads:[~2014-07-09 7:56 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-07 11:58 [Qemu-devel] another locking issue in current dataplane code? Christian Borntraeger
2014-07-08 7:19 ` Christian Borntraeger
2014-07-08 7:43 ` Ming Lei
2014-07-08 8:38 ` Christian Borntraeger
2014-07-08 9:09 ` Christian Borntraeger
2014-07-08 10:12 ` Christian Borntraeger
2014-07-08 10:37 ` Christian Borntraeger
2014-07-08 11:03 ` Christian Borntraeger
2014-07-08 15:59 ` Stefan Hajnoczi
2014-07-08 17:08 ` Paolo Bonzini
2014-07-08 19:07 ` Christian Borntraeger
2014-07-08 19:50 ` Paolo Bonzini
2014-07-09 7:56 ` Stefan Hajnoczi