On Tue, Dec 15, 2020 at 04:34:05PM +0100, Kevin Wolf wrote: > Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben: > > There's a cross-dependency between closing the block exports and > > draining the block layer. The latter needs that we close all export's > > client connections to ensure they won't queue more requests, but the > > exports may have coroutines yielding in the block layer, which implies > > they can't be fully closed until we drain it. > > A coroutine that yielded must have some way to be reentered. So I guess > the quesiton becomes why they aren't reentered until drain. We do > process events: > > AIO_WAIT_WHILE(NULL, blk_exp_has_type(type)); > > So in theory, anything that would finalise the block export closing > should still execute. > > What is the difference that drain makes compared to a simple > AIO_WAIT_WHILE, so that coroutine are reentered during drain, but not > during AIO_WAIT_WHILE? > > This is an even more interesting question because the NBD server isn't a > block node nor a BdrvChildClass implementation, so it shouldn't even > notice a drain operation. OK, took a deeper dive into the issue. While shutting down the guest, some co-routines from the NBD server are stuck here: nbd_trip nbd_handle_request nbd_do_cmd_read nbd_co_send_sparse_read blk_pread blk_prw blk_read_entry blk_do_preadv blk_wait_while_drained qemu_co_queue_wait This happens because bdrv_close_all() is called after bdrv_drain_all_begin(), so all block backends are quiesced. An alternative approach to this patch would be moving blk_exp_close_all() to vl.c:qemu_cleanup, before bdrv_drain_all_begin(). Do you have a preference for one of these options? Thanks, Sergio.