On 9/12/19 6:31 AM, Kevin Wolf wrote:
>>
>> Yes, I think locking the context during the "if (exp->blk) {" block at
>> nbd/server.c:1646 should do the trick.

That line number has moved over time; which function are you referring
to?

> We need to be careful to avoid locking things twice, so maybe
> nbd_export_put() is already too deep inside the NBD server.
>
> Its callers are:
>
> * qmp_nbd_server_add(). Like all other QMP handlers in blockdev-nbd.c it
>   neglects to lock the AioContext, but it should do so. The lock is not
>   only needed for the nbd_export_put() call, but even before.
>
> * nbd_export_close(), which in turn is called from:
>   * nbd_eject_notifier(): run in the main thread, not locked
>   * nbd_export_remove():
>     * qmp_nbd_server_remove(): see above
>   * nbd_export_close_all():
>     * bdrv_close_all()
>     * qmp_nbd_server_stop()

Even weirder: nbd_export_put() calls nbd_export_close(), and
nbd_export_close() calls nbd_export_put(). The mutual recursion is
mind-numbing, and the fact that we use get/put instead of ref/unref like
most other qemu code does not make it any easier to reason about.

> There are also calls from qemu-nbd, but these can be ignored because we
> don't have iothreads there.
>
> I think the cleanest would be to take the lock in the outermost callers,
> i.e. all QMP handlers that deal with a specific export, in the eject
> notifier and in nbd_export_close_all().

Okay, I'm trying that; a rough sketch of the eject-notifier piece is at
the end of this mail. (I already tried grabbing the aio_context in
nbd_export_close(), but as you predicted, that deadlocked when a nested
call encountered the lock already taken by an outer call.)

>> On the other hand, I wonder if there is any situation in which calling
>> blk_unref() without locking the context could be safe. If there isn't
>> any, perhaps we should assert that the lock is held if blk->ctx != NULL
>> to catch this kind of bug earlier?
>
> blk_unref() must be called from the main thread, and if the BlockBackend
> to be unreferenced is not in the main AioContext, the lock must be held.
>
> I'm not sure how to assert that locks are held, though. I once looked
> for a way to do this, but it involved either looking at the internal
> state of pthreads mutexes or hacking up QemuMutex with debug state.

Even if we can only test that in a debug build but not during normal
builds, could any of our CI builds set up that configuration? (A second
sketch of what such debug state might look like is below as well.)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
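P.S. The eject-notifier sketch mentioned above. Completely untested, and
it assumes exp->ctx is still the field that tracks the export's current
AioContext; the idea is just to bracket the outermost caller:

/* nbd/server.c -- untested sketch */
static void nbd_eject_notifier(Notifier *n, void *data)
{
    NBDExport *exp = container_of(n, NBDExport, eject_notifier);
    AioContext *aio_context = exp->ctx;

    /*
     * We run in the main thread, but the export's BlockBackend may live
     * in an iothread; take its context so that the blk_unref() reachable
     * via nbd_export_close() -> nbd_export_put() happens under the lock.
     */
    aio_context_acquire(aio_context);
    nbd_export_close(exp);
    aio_context_release(aio_context);
}

The QMP handlers in blockdev-nbd.c would presumably get the same
acquire/release bracket around their nbd_export_remove()/nbd_export_put()
calls, per your list above.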
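P.P.S. For the "assert the lock is held" idea, here is the kind of
debug-only state you mention hacking into QemuMutex. None of the names
below exist today (CONFIG_MUTEX_OWNER, the owner/owned fields,
qemu_mutex_assert_locked() are made up for illustration), and IIRC the
AioContext lock is actually a QemuRecMutex, which would need the same
treatment; but it shows the shape of the thing:

/* include/qemu/thread-posix.h -- sketch only */
struct QemuMutex {
    pthread_mutex_t lock;
#ifdef CONFIG_MUTEX_OWNER        /* hypothetical configure switch */
    QemuThread owner;            /* recorded by qemu_mutex_lock() */
    bool owned;                  /* cleared by qemu_mutex_unlock() */
#endif
    /* existing fields unchanged */
};

/* include/qemu/thread.h -- sketch only */
static inline void qemu_mutex_assert_locked(QemuMutex *mutex)
{
#ifdef CONFIG_MUTEX_OWNER
    assert(mutex->owned && qemu_thread_is_self(&mutex->owner));
#endif
}

blk_unref() (or a helper it calls) could then assert that the context
lock is held whenever the BlockBackend is not in the main AioContext, and
only a CI job built with that switch would pay the cost.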