* [RFC] qemu_cleanup: do vm_shutdown() before bdrv_drain_all_begin()
From: Vladimir Sementsov-Ogievskiy @ 2021-07-30 14:29 UTC (permalink / raw)
To: qemu-block; +Cc: mreitz, kwolf, den, vsementsov, qemu-devel, pbonzini
It doesn't seem good to stop handling I/O while the guest is still
running. For example, it leads to the following:

After bdrv_drain_all_begin(), during vm_shutdown() scsi_dma_writev()
calls blk_aio_pwritev(). As we are in a drained section, the request
waits in blk_wait_while_drained().
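
For reference, the queuing logic looks roughly like this (a simplified
sketch of blk_wait_while_drained() from block/block-backend.c; exact
fields and helpers may differ in master):

/*
 * Sketch: a request arriving while the BlockBackend is quiesced is
 * parked on a queue and only woken when the drained section ends.
 */
void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
{
    assert(blk->in_flight > 0);

    if (blk->quiesce_counter && !blk->disable_request_queuing) {
        blk_dec_in_flight(blk);      /* stop counting as in flight */
        qemu_co_queue_wait(&blk->queued_requests, NULL); /* park here */
        blk_inc_in_flight(blk);      /* resumed after the drain ends */
    }
}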

Next, during bdrv_close_all() the bs is removed from the blk, and the
blk drain finishes. So the request is resumed and fails with
-ENOMEDIUM. The corresponding BLOCK_IO_ERROR event is sent and ends up
in the libvirt log. That doesn't seem good.
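
The -ENOMEDIUM comes from the availability check, which runs only
after the parked request resumes; by that point bdrv_close_all() has
already detached the BlockDriverState. Roughly (again a sketch, not
the exact code; blk_do_pwritev_sketch is a made-up name for
illustration):

/*
 * Sketch of the write path: the request is parked for the whole
 * drained section and re-checks the medium only after it resumes,
 * when the bs is already gone.
 */
static int coroutine_fn blk_do_pwritev_sketch(BlockBackend *blk,
                                              int64_t offset,
                                              int64_t bytes,
                                              QEMUIOVector *qiov)
{
    blk_wait_while_drained(blk);   /* parked until the blk drain ends */

    if (!blk_is_available(blk)) {  /* bs already removed from the blk */
        return -ENOMEDIUM;         /* surfaces as BLOCK_IO_ERROR */
    }

    return bdrv_co_pwritev(blk_bs(blk), offset, bytes, qiov, 0);
}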
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
Hi all!

In our product (v5.2 based) we faced -ENOMEDIUM BLOCK_IO_ERROR events
during qemu termination (by SIGTERM). I don't have a reproducer for
master, but the problem still seems possible.

Ideas on how to reproduce it are welcome.

Also, I thought that issuing blk_* requests from SCSI was not possible
during a drained section, and that the blk_wait_while_drained() logic
was introduced for IDE. Which code is responsible for not issuing SCSI
requests during drained sections? Maybe it is racy, or it may be our
downstream problem, I don't know :(
softmmu/runstate.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 10d9b7365a..1966d773f3 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -797,21 +797,18 @@ void qemu_cleanup(void)
      */
     blk_exp_close_all();
 
+    /* No more vcpu or device emulation activity beyond this point */
+    vm_shutdown();
+    replay_finish();
+
     /*
      * We must cancel all block jobs while the block layer is drained,
      * or cancelling will be affected by throttling and thus may block
      * for an extended period of time.
-     * vm_shutdown() will bdrv_drain_all(), so we may as well include
-     * it in the drained section.
      * We do not need to end this section, because we do not want any
      * requests happening from here on anyway.
      */
     bdrv_drain_all_begin();
-
-    /* No more vcpu or device emulation activity beyond this point */
-    vm_shutdown();
-    replay_finish();
-
     job_cancel_sync_all();
     bdrv_close_all();
 
--
2.29.2
* Re: [RFC] qemu_cleanup: do vm_shutdown() before bdrv_drain_all_begin()
From: Vladimir Sementsov-Ogievskiy @ 2021-09-02 9:32 UTC (permalink / raw)
To: qemu-block; +Cc: mreitz, kwolf, den, qemu-devel, pbonzini
ping
--
Best regards,
Vladimir