On 9/12/19 1:37 AM, Sergio Lopez wrote:
>> I tried to test this patch, but even with it applied, I still got an
>> aio-context crasher by attempting an nbd-server-start, nbd-server-add,
>> nbd-server-stop (intentionally skipping the nbd-server-remove step) on a
>> domain using iothreads, with a backtrace of:
>>
>> #0 0x00007ff09d070e35 in raise () from target:/lib64/libc.so.6
>> #1 0x00007ff09d05b895 in abort () from target:/lib64/libc.so.6
>> #2 0x000055dd03b9ab86 in error_exit (err=1, msg=0x55dd03d59fb0
>> <__func__.15769> "qemu_mutex_unlock_impl")
>>     at util/qemu-thread-posix.c:36
>> #3 0x000055dd03b9adcf in qemu_mutex_unlock_impl (mutex=0x55dd062d5090,
>>     file=0x55dd03d59041 "util/async.c",
>>     line=523) at util/qemu-thread-posix.c:96
>> #4 0x000055dd03b93433 in aio_context_release (ctx=0x55dd062d5030) at
>> util/async.c:523
>> #14 0x000055dd03748845 in qmp_nbd_server_stop (errp=0x7ffcdf3cb4e8) at
>> blockdev-nbd.c:233
>> ...

Sorry for truncating the initial stack dump report. Here is the rest of
the trace (it is definitely in the main loop):

#15 0x0000560be491c910 in qmp_marshal_nbd_server_stop (args=0x560be54c4d00, ret=0x7ffdd832de38, errp=0x7ffdd832de30) at qapi/qapi-commands-block.c:318
#16 0x0000560be4a7a306 in do_qmp_dispatch (cmds=0x560be50dc1f0 , request=0x7fbcac009af0, allow_oob=false, errp=0x7ffdd832ded8) at qapi/qmp-dispatch.c:131
#17 0x0000560be4a7a507 in qmp_dispatch (cmds=0x560be50dc1f0 , request=0x7fbcac009af0, allow_oob=false) at qapi/qmp-dispatch.c:174
#18 0x0000560be48edd81 in monitor_qmp_dispatch (mon=0x560be55d6670, req=0x7fbcac009af0) at monitor/qmp.c:120
#19 0x0000560be48ee116 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:209
#20 0x0000560be4ad16a2 in aio_bh_call (bh=0x560be53dbe90) at util/async.c:89
#21 0x0000560be4ad173a in aio_bh_poll (ctx=0x560be53daba0) at util/async.c:117
#22 0x0000560be4ad6514 in aio_dispatch (ctx=0x560be53daba0) at util/aio-posix.c:459
#23 0x0000560be4ad1ad3 in aio_ctx_dispatch (source=0x560be53daba0, callback=0x0, user_data=0x0) at util/async.c:260
#24 0x00007fbcd7083ecd in g_main_context_dispatch () from target:/lib64/libglib-2.0.so.0
#25 0x0000560be4ad4e47 in glib_pollfds_poll () at util/main-loop.c:218
#26 0x0000560be4ad4ec1 in os_host_main_loop_wait (timeout=1000000000) at util/main-loop.c:241
#27 0x0000560be4ad4fc6 in main_loop_wait (nonblocking=0) at util/main-loop.c:517
#28 0x0000560be4691266 in main_loop () at vl.c:1806
#29 0x0000560be46988a9 in main (argc=112, argv=0x7ffdd832e4e8, envp=0x7ffdd832e870) at vl.c:4488

>>
>> Does that look similar to what you were seeing? Does it mean we
>> missed another spot where the context is set incorrectly?
>
> It looks like it was trying to release the AioContext while it was still
> held by some other thread. Is this stack trace from the main thread or an
> iothread? What was the other one doing?

Kevin had some ideas on what it might be; I'm experimenting with acquiring
the context in the spots he pointed out.

>
>> I'm happy to work with you on IRC for more real-time debugging of this
>> (I'm woefully behind on understanding how aio contexts are supposed to
>> work).
>
> I must be missing some step, because I can't reproduce this one
> here. I've tried both with an idle NBD server and one with a client
> generating I/O. Is it reproducible 100% of the time?

Yes, with iothreads.
I took some time today to boil it down to something that does not require
libvirt:

$ file myfile
myfile: QEMU QCOW2 Image (v3), 104857600 bytes
$ qemu-img create -f qcow2 -o backing_file=myfile,backing_fmt=qcow2 \
  myfile.wrap
Formatting 'myfile.wrap', fmt=qcow2 size=104857600 backing_file=myfile backing_fmt=qcow2 cluster_size=65536 lazy_refcounts=off refcount_bits=16
$ ./x86_64-softmmu/qemu-system-x86_64 -nodefaults \
  -name tmp,debug-threads=on -machine pc-q35-3.1,accel=kvm \
  -object iothread,id=iothread1 \
  -drive file=myfile,format=qcow2,if=none,id=drive-virtio-disk0,node-name=n \
  -device virtio-blk-pci,iothread=iothread1,drive=drive-virtio-disk0,id=virtio-disk0 \
  -qmp stdio -nographic
{'execute':'qmp_capabilities'}
{'execute':'nbd-server-start','arguments':{'addr':{'type':'inet',
  'data':{'host':'localhost','port':'10809'}}}}
{'execute':'blockdev-add','arguments':{'driver':'qcow2',
  'node-name':'t','file':{'driver':'file',
  'filename':'myfile.wrap'},'backing':'n'}}
{'execute':'blockdev-backup','arguments':{'device':'n',
  'target':'t','sync':'none','job-id':'b'}}
{'execute':'nbd-server-add','arguments':{'device':'t','name':'t'}}
{'execute':'nbd-server-remove','arguments':{'name':'t'}}
Aborted (core dumped)

I'm now experimenting with Kevin's idea of grabbing the AioContext around
the NBD export unref.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org