* Properly quitting qemu immediately after failing migration
From: Max Reitz @ 2020-06-29 13:48 UTC
  To: Dr. David Alan Gilbert, Vladimir Sementsov-Ogievskiy,
	Juan Quintela, qemu-devel



Hi,

In an iotest, I’m trying to quit qemu immediately after a migration has
failed.  Unfortunately, that doesn’t seem to be possible in a clean way:
migrate_fd_cleanup() runs only at some point after the migration state
is already “failed”, so if I just wait for that “failed” state and
immediately quit, some cleanup functions may not have been run yet.

This is a problem with dirty bitmap migration at least, because it
increases the refcount on all block devices that are to be migrated, so
if we don’t call the cleanup function before quitting, the refcount will
stay elevated and bdrv_close_all() will hit an assertion because those
block devices are still around after blk_remove_all_bs() and
blockdev_close_all_bdrv_states().

In practice this particular issue might not be that big of a problem,
because it just means qemu aborts when the user intended to let it quit
anyway.  But on one hand I could imagine that there are other clean-up
paths that should definitely run before qemu quits (although I don’t
know), and on the other, it’s a problem for my test.

I tried working around the problem for my test by waiting on “Unable to
write” appearing on stderr, because that indicates that
migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
hand, that isn’t really nice, and on the other, it doesn’t even work
when the failure is on the source side (because then there is no
s->error for migrate_fd_cleanup() to report).

In all, I’m asking:
(1) Is there a nice solution for me now to delay quitting qemu until the
failed migration has been fully resolved, including the clean-up?

(2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
the wrong time?  Like, maybe lingering subprocesses when using “exec”?


Thanks,

Max




* Re: Properly quitting qemu immediately after failing migration
From: Vladimir Sementsov-Ogievskiy @ 2020-06-29 14:18 UTC
  To: Max Reitz, Dr. David Alan Gilbert, Juan Quintela, qemu-devel

29.06.2020 16:48, Max Reitz wrote:
> Hi,
> 
> In an iotest, I’m trying to quit qemu immediately after a migration has
> failed.  Unfortunately, that doesn’t seem to be possible in a clean way:
> migrate_fd_cleanup() runs only at some point after the migration state
> is already “failed”, so if I just wait for that “failed” state and
> immediately quit, some cleanup functions may not have been run yet.
> 
> This is a problem with dirty bitmap migration at least, because it
> increases the refcount on all block devices that are to be migrated, so
> if we don’t call the cleanup function before quitting, the refcount will
> stay elevated and bdrv_close_all() will hit an assertion because those
> block devices are still around after blk_remove_all_bs() and
> blockdev_close_all_bdrv_states().
> 
> In practice this particular issue might not be that big of a problem,
> because it just means qemu aborts when the user intended to let it quit
> anyway.  But on one hand I could imagine that there are other clean-up
> paths that should definitely run before qemu quits (although I don’t
> know), and on the other, it’s a problem for my test.
> 
> I tried working around the problem for my test by waiting on “Unable to
> write” appearing on stderr, because that indicates that
> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
> hand, that isn’t really nice, and on the other, it doesn’t even work
> when the failure is on the source side (because then there is no
> s->error for migrate_fd_cleanup() to report).
> 
> In all, I’m asking:
> (1) Is there a nice solution for me now to delay quitting qemu until the
> failed migration has been fully resolved, including the clean-up?
> 
> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
> the wrong time?  Like, maybe lingering subprocesses when using “exec”?
> 
> 

I'll look more closely tomorrow, but as a short answer: try my series
"[PATCH v2 00/22] Fix error handling during bitmap postcopy"; it
handles various problems around migration failures and qemu shutdown,
so it will probably help.


-- 
Best regards,
Vladimir



* Re: Properly quitting qemu immediately after failing migration
From: Max Reitz @ 2020-06-29 15:00 UTC
  To: Vladimir Sementsov-Ogievskiy, Dr. David Alan Gilbert,
	Juan Quintela, qemu-devel



On 29.06.20 16:18, Vladimir Sementsov-Ogievskiy wrote:
> 29.06.2020 16:48, Max Reitz wrote:
>> Hi,
>>
>> In an iotest, I’m trying to quit qemu immediately after a migration has
>> failed.  Unfortunately, that doesn’t seem to be possible in a clean way:
>> migrate_fd_cleanup() runs only at some point after the migration state
>> is already “failed”, so if I just wait for that “failed” state and
>> immediately quit, some cleanup functions may not have been run yet.
>>
>> This is a problem with dirty bitmap migration at least, because it
>> increases the refcount on all block devices that are to be migrated, so
>> if we don’t call the cleanup function before quitting, the refcount will
>> stay elevated and bdrv_close_all() will hit an assertion because those
>> block devices are still around after blk_remove_all_bs() and
>> blockdev_close_all_bdrv_states().
>>
>> In practice this particular issue might not be that big of a problem,
>> because it just means qemu aborts when the user intended to let it quit
>> anyway.  But on one hand I could imagine that there are other clean-up
>> paths that should definitely run before qemu quits (although I don’t
>> know), and on the other, it’s a problem for my test.
>>
>> I tried working around the problem for my test by waiting on “Unable to
>> write” appearing on stderr, because that indicates that
>> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
>> hand, that isn’t really nice, and on the other, it doesn’t even work
>> when the failure is on the source side (because then there is no
>> s->error for migrate_fd_cleanup() to report).

(I’ve now managed to work around it by invoking blockdev-del on a node
affected by bitmap migration until it succeeds, because blockdev-del can
only succeed once the bitmap migration code has dropped its reference to
it.)
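
(A rough sketch of that polling workaround, in iotests-style Python --
just an illustration with assumed names: vm_a stands for the source VM
handle and 'foo' is the node from my reproducer below:

# Keep retrying blockdev-del; it returns an error as long as the bitmap
# migration code still holds its reference to the node, and succeeds
# once that reference has been dropped.
while 'error' in vm_a.qmp('blockdev-del', node_name='foo'):
    pass
)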

>> In all, I’m asking:
>> (1) Is there a nice solution for me now to delay quitting qemu until the
>> failed migration has been fully resolved, including the clean-up?
>>
>> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
>> the wrong time?  Like, maybe lingering subprocesses when using “exec”?
>>
>>
> 
> I'll look more closely tomorrow, but as a short answer: try my series
> "[PATCH v2 00/22] Fix error handling during bitmap postcopy"; it
> handles various problems around migration failures and qemu shutdown,
> so it will probably help.

No, it doesn’t seem to.

I’m not sure what exactly that series addresses, but FWIW I’m hitting
the problem in non-postcopy migration.  What my simplest reproducer does is:

On the source VM:

blockdev-add node-name='foo' driver='null-co'
block-dirty-bitmap-add node='foo' name='bmap0'

(Launch destination VM with some -incoming, e.g.
-incoming 'exec: cat /tmp/mig_file')

Both on source and destination:

migrate-set-capabilities capabilities=[
    {capability='events', state=true},
    {capability='dirty-bitmaps', state=true}
]

On source:

migrate uri='exec: cat > /tmp/mig_file'

Then wait for a MIGRATION event with data/status == 'failed', and then
issue 'quit'.
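
In iotests-style Python, the tail of that sequence looks roughly like
this (a sketch with assumed names; vm_a is the source iotests.VM set up
as above):

# Start the migration, wait for it to fail, then quit immediately.
vm_a.qmp('migrate', uri='exec: cat > /tmp/mig_file')
while vm_a.event_wait('MIGRATION')['data']['status'] != 'failed':
    pass
# Quitting right here is what can trip the bdrv_close_all() assertion.
vm_a.qmp('quit')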

Max




* Re: Properly quitting qemu immediately after failing migration
From: Dr. David Alan Gilbert @ 2020-06-29 15:41 UTC
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Juan Quintela

* Max Reitz (mreitz@redhat.com) wrote:
> Hi,
> 
> In an iotest, I’m trying to quit qemu immediately after a migration has
> failed.  Unfortunately, that doesn’t seem to be possible in a clean way:
> migrate_fd_cleanup() runs only at some point after the migration state
> is already “failed”, so if I just wait for that “failed” state and
> immediately quit, some cleanup functions may not have been run yet.

Yeh this is hard; I always take the end of migrate_fd_cleanup to be the
real end.
It always happens on the main thread I think (it's done as a bh in some
cases).

> This is a problem with dirty bitmap migration at least, because it
> increases the refcount on all block devices that are to be migrated, so
> if we don’t call the cleanup function before quitting, the refcount will
> stay elevated and bdrv_close_all() will hit an assertion because those
> block devices are still around after blk_remove_all_bs() and
> blockdev_close_all_bdrv_states().
> 
> In practice this particular issue might not be that big of a problem,
> because it just means qemu aborts when the user intended to let it quit
> anyway.  But on one hand I could imagine that there are other clean-up
> paths that should definitely run before qemu quits (although I don’t
> know), and on the other, it’s a problem for my test.

'quit' varies - there are a lot of incoming failures that just assert;
very few of them cause a clean exit (I think there are more clean ones
after Peter's work on restartable postcopy a year or two ago).

I do see that the end of migrate_fd_cleanup calls the notifier list; but
it's not clear to me that it's always going to see the first transition
to 'failed' at that point.

> I tried working around the problem for my test by waiting on “Unable to
> write” appearing on stderr, because that indicates that
> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
> hand, that isn’t really nice, and on the other, it doesn’t even work
> when the failure is on the source side (because then there is no
> s->error for migrate_fd_cleanup() to report).
> 
> In all, I’m asking:
> (1) Is there a nice solution for me now to delay quitting qemu until the
> failed migration has been fully resolved, including the clean-up?

In vl.c, I added a call to migration_shutdown in qemu_cleanup - although
that seems to be mostly about cleaning up the *outgoing* side; you could
add some incoming cleanup there.

> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
> the wrong time?  Like, maybe lingering subprocesses when using “exec”?

Yeh that should be cleaner, but isn't.

Dave

> 
> Thanks,
> 
> Max
> 



--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: Properly quitting qemu immediately after failing migration
From: Daniel P. Berrangé @ 2020-06-29 15:45 UTC
  To: Max Reitz
  Cc: qemu-devel, Vladimir Sementsov-Ogievskiy, Dr. David Alan Gilbert,
	Juan Quintela

On Mon, Jun 29, 2020 at 03:48:35PM +0200, Max Reitz wrote:
> In practice this particular issue might not be that big of a problem,
> because it just means qemu aborts when the user intended to let it quit
> anyway.  But on one hand I could imagine that there are other clean-up
> paths that should definitely run before qemu quits (although I don’t
> know), and on the other, it’s a problem for my test.

In general we can't assume any cleanup runs when incoming migration
fails, because when loading the migration stream, it often aborts with
asserts if the data doesn't match what's expected.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: Properly quitting qemu immediately after failing migration
From: Max Reitz @ 2020-06-29 16:00 UTC
  To: Daniel P. Berrangé
  Cc: qemu-devel, Vladimir Sementsov-Ogievskiy, Dr. David Alan Gilbert,
	Juan Quintela



On 29.06.20 17:45, Daniel P. Berrangé wrote:
> On Mon, Jun 29, 2020 at 03:48:35PM +0200, Max Reitz wrote:
>> In practice this particular issue might not be that big of a problem,
>> because it just means qemu aborts when the user intended to let it quit
>> anyway.  But on one hand I could imagine that there are other clean-up
>> paths that should definitely run before qemu quits (although I don’t
>> know), and on the other, it’s a problem for my test.
> 
> In general we can't assume any cleanup runs when incoming migration
> fails, because when loading the migration stream, it often aborts with
> asserts if the data doesn't match what's expected.

My problem is about the source VM, though.

Max




* Re: Properly quitting qemu immediately after failing migration
From: Max Reitz @ 2020-06-29 16:08 UTC
  To: Dr. David Alan Gilbert
  Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Juan Quintela



On 29.06.20 17:41, Dr. David Alan Gilbert wrote:
> * Max Reitz (mreitz@redhat.com) wrote:
>> Hi,
>>
>> In an iotest, I’m trying to quit qemu immediately after a migration has
>> failed.  Unfortunately, that doesn’t seem to be possible in a clean way:
>> migrate_fd_cleanup() runs only at some point after the migration state
>> is already “failed”, so if I just wait for that “failed” state and
>> immediately quit, some cleanup functions may not have been run yet.
> 
> Yeh this is hard; I always take the end of migrate_fd_cleanup to be the
> real end.

Yes, unfortunately I don’t seem to have a way to look for that end. :(

> It always happens on the main thread I think (it's done as a bh in some
> cases).
> 
>> This is a problem with dirty bitmap migration at least, because it
>> increases the refcount on all block devices that are to be migrated, so
>> if we don’t call the cleanup function before quitting, the refcount will
>> stay elevated and bdrv_close_all() will hit an assertion because those
>> block devices are still around after blk_remove_all_bs() and
>> blockdev_close_all_bdrv_states().
>>
>> In practice this particular issue might not be that big of a problem,
>> because it just means qemu aborts when the user intended to let it quit
>> anyway.  But on one hand I could imagine that there are other clean-up
>> paths that should definitely run before qemu quits (although I don’t
>> know), and on the other, it’s a problem for my test.
> 
> 'quit' varies - there are a lot of incoming failures that just assert;
> very few of them cause a clean exit (I think there are more clean ones
> after Peter's work on restartable postcopy a year or two ago).

Well, my problem is about the source side, where there is still a VM
running that I would expect to be in a sane state even after a failed
migration.

> I do see that the end of migrate_fd_cleanup calls the notifier list;
> but it's not clear to me that it's always going to see the first
> transition to 'failed' at that point.

What exactly do you mean?  It appears to me that both query-status and
the MIGRATION events signal the failed state before migrate_fd_cleanup()
is invoked.

If you mean I could add a notifier to that list to do something™, I’m
not sure what exactly it is I’d do.  My test can’t do it, because it’s
an iotest, and even if it could, I suppose I’d want to wait until even
after all notifiers have been invoked (which isn’t guaranteed if I’d add
a notifier myself).

>> I tried working around the problem for my test by waiting on “Unable to
>> write” appearing on stderr, because that indicates that
>> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
>> hand, that isn’t really nice, and on the other, it doesn’t even work
>> when the failure is on the source side (because then there is no
>> s->error for migrate_fd_cleanup() to report).
>>
>> In all, I’m asking:
>> (1) Is there a nice solution for me now to delay quitting qemu until the
>> failed migration has been fully resolved, including the clean-up?
> 
> In vl.c, I added a call to migration_shutdown in qemu_cleanup - although
> that seems to be mostly about cleaning up the *outgoing* side; you could
> add some incoming cleanup there.

So you mean waiting until migrate_fd_cleanup() has run?  Maybe I’ll try
that tomorrow, although I’d hoped I could get this done without having
to modify the code base...  (I.e., I’d hoped there would be some
QMP-queriable flag somewhere that could tell me whether the
migrate_fd_cleanup() has run)

>> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
>> the wrong time?  Like, maybe lingering subprocesses when using “exec”?
> 
> Yeh that should be cleaner, but isn't.

:(

OK then.  Thanks for your insights!

Max




* Re: Properly quitting qemu immediately after failing migration
From: Dr. David Alan Gilbert @ 2020-06-29 16:46 UTC
  To: Max Reitz; +Cc: Vladimir Sementsov-Ogievskiy, qemu-devel, Juan Quintela

* Max Reitz (mreitz@redhat.com) wrote:
> On 29.06.20 17:41, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> >> Hi,
> >>
> >> In an iotest, I’m trying to quit qemu immediately after a migration has
> >> failed.  Unfortunately, that doesn’t seem to be possible in a clean way:
> >> migrate_fd_cleanup() runs only at some point after the migration state
> >> is already “failed”, so if I just wait for that “failed” state and
> >> immediately quit, some cleanup functions may not have been run yet.
> > 
> > Yeh this is hard; I always take the end of migrate_fd_cleanup to be the
> > real end.
> 
> Yes, unfortunately I don’t seem to have a way to look for that end. :(
> 
> > It always happens on the main thread I think (it's done as a bh in some
> > cases).
> > 
> >> This is a problem with dirty bitmap migration at least, because it
> >> increases the refcount on all block devices that are to be migrated, so
> >> if we don’t call the cleanup function before quitting, the refcount will
> >> stay elevated and bdrv_close_all() will hit an assertion because those
> >> block devices are still around after blk_remove_all_bs() and
> >> blockdev_close_all_bdrv_states().
> >>
> >> In practice this particular issue might not be that big of a problem,
> >> because it just means qemu aborts when the user intended to let it quit
> >> anyway.  But on one hand I could imagine that there are other clean-up
> >> paths that should definitely run before qemu quits (although I don’t
> >> know), and on the other, it’s a problem for my test.
> > 
> > 'quit' varies - there are a lot of incoming failures that just assert;
> > very few of them cause a clean exit (I think there are more clean ones
> > after Peter's work on restartable postcopy a year or two ago).
> 
> Well, my problem is about the source side, where there is still a VM
> running that I would expect to be in a sane state even after a failed
> migration.

Oh! Source side; the source side I worry much more about; if the
destination implodes after a failed migration I'm not too worried - but
the source side *shall not* fail.

> > I do see that the end of migrate_fd_cleanup calls the notifier list;
> > but it's not clear to me that it's always going to see the first
> > transition to 'failed' at that point.
> 
> What exactly do you mean?  It appears to me that both query-status and
> the MIGRATION events signal the failed state before migrate_fd_cleanup()
> is invoked.

OK, I was getting confused between event sending and the notifiers; the
event happens a bit before.  The state gets set to failed and then we
call fd_cleanup in most cases; the state is what query-status is keyed
off, and is also what causes the event to be sent.

> If you mean I could add a notifier to that list to do something™, I’m
> not sure what exactly it is I’d do.

Yeh, that's why the notifiers are there....
However, if you need to do some cleanup at the end of a migration, then
I think adding a call inside migrate_fd_cleanup is perfectly fine.

>  My test can’t do it, because it’s
> an iotest, and even if it could, I suppose I’d want to wait until even
> after all notifiers have been invoked (which isn’t guaranteed if I’d add
> a notifier myself).
> 
> >> I tried working around the problem for my test by waiting on “Unable to
> >> write” appearing on stderr, because that indicates that
> >> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
> >> hand, that isn’t really nice, and on the other, it doesn’t even work
> >> when the failure is on the source side (because then there is no
> >> s->error for migrate_fd_cleanup() to report).
> >>
> >> In all, I’m asking:
> >> (1) Is there a nice solution for me now to delay quitting qemu until the
> >> failed migration has been fully resolved, including the clean-up?
> > 
> > In vl.c, I added a call to migration_shutdown in qemu_cleanup - although
> > that seems to be mostly about cleaning up the *outgoing* side; you could
> > add some incoming cleanup there.
> 
> So you mean waiting until migrate_fd_cleanup() has run?  Maybe I’ll try
> that tomorrow, although I’d hoped I could get this done without having
> to modify the code base...  (I.e., I’d hoped there would be some
> QMP-queriable flag somewhere that could tell me whether the
> migrate_fd_cleanup() has run)

Please be a little careful around there; i.e. try as far as possible not
to hang on dead block devices.

> >> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
> >> the wrong time?  Like, maybe lingering subprocesses when using “exec”?
> > 
> > Yeh that should be cleaner, but isn't.
> 
> :(

Randomly quitting is safer than it used to be (again that qemu_cleanup
code is what should make it so).

Dave

> OK then.  Thanks for your insights!
> 
> Max
> 



--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




* Re: Properly quitting qemu immediately after failing migration
From: Vladimir Sementsov-Ogievskiy @ 2020-07-01 16:16 UTC
  To: Max Reitz, Dr. David Alan Gilbert, Juan Quintela, qemu-devel

29.06.2020 18:00, Max Reitz wrote:
> On 29.06.20 16:18, Vladimir Sementsov-Ogievskiy wrote:
>> 29.06.2020 16:48, Max Reitz wrote:
>>> Hi,
>>>
>>> In an iotest, I’m trying to quit qemu immediately after a migration has
>>> failed.  Unfortunately, that doesn’t seem to be possible in a clean way:
>>> migrate_fd_cleanup() runs only at some point after the migration state
>>> is already “failed”, so if I just wait for that “failed” state and
>>> immediately quit, some cleanup functions may not have been run yet.
>>>
>>> This is a problem with dirty bitmap migration at least, because it
>>> increases the refcount on all block devices that are to be migrated, so
>>> if we don’t call the cleanup function before quitting, the refcount will
>>> stay elevated and bdrv_close_all() will hit an assertion because those
>>> block devices are still around after blk_remove_all_bs() and
>>> blockdev_close_all_bdrv_states().
>>>
>>> In practice this particular issue might not be that big of a problem,
>>> because it just means qemu aborts when the user intended to let it quit
>>> anyway.  But on one hand I could imagine that there are other clean-up
>>> paths that should definitely run before qemu quits (although I don’t
>>> know), and on the other, it’s a problem for my test.
>>>
>>> I tried working around the problem for my test by waiting on “Unable to
>>> write” appearing on stderr, because that indicates that
>>> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
>>> hand, that isn’t really nice, and on the other, it doesn’t even work
>>> when the failure is on the source side (because then there is no
>>> s->error for migrate_fd_cleanup() to report).
> 
> (I’ve now managed to work around it by invoking blockdev-del on a node
> affected by bitmap migration until it succeeds, because blockdev-del can
> only succeed once the bitmap migration code has dropped its reference to
> it.)
> 
>>> In all, I’m asking:
>>> (1) Is there a nice solution for me now to delay quitting qemu until the
>>> failed migration has been fully resolved, including the clean-up?
>>>
>>> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
>>> the wrong time?  Like, maybe lingering subprocesses when using “exec”?
>>>
>>>
>>
>> I'll look more closely tomorrow, but as a short answer: try my series
>> "[PATCH v2 00/22] Fix error handling during bitmap postcopy"; it
>> handles various problems around migration failures and qemu shutdown,
>> so it will probably help.
> 
> No, it doesn’t seem to.
> 
> I’m not sure what exactly that series addresses, but FWIW I’m hitting
> the problem in non-postcopy migration.  What my simplest reproducer does is:

Bitmap migration is postcopy by nature (it may not work without
migrate-start-postcopy, but it does work in most simple cases: when
there is only a little bitmap data to migrate, it can be transferred
during the migration downtime).  Most of the complicated parts of the
series were about postcopy, but it still fixes some other things.

It seems that this patch (see the second quoted paragraph below),
"[PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on shutdown",

>    If the target is turned off before postcopy has finished, the target
>    crashes because busy bitmaps are found at shutdown.
>    Canceling the incoming migration helps, as it removes all unfinished
>    (and therefore busy) bitmaps.

>    Similarly, on the source we crash in bdrv_close_all(), which asserts
>    that all bdrv states have been removed, because the bdrv states
>    involved in dirty bitmap migration are still referenced by it. So we
>    need to cancel the outgoing migration as well.

should address exactly your issue.

> 
> On the source VM:
> 
> blockdev-add node-name='foo' driver='null-co'
> block-dirty-bitmap-add node='foo' name='bmap0'
> 
> (Launch destination VM with some -incoming, e.g.
> -incoming 'exec: cat /tmp/mig_file')
> 
> Both on source and destination:
> 
> migrate-set-capabilities capabilities=[
>      {capability='events', state=true},
>      {capability='dirty-bitmaps', state=true}
> ]
> 
> On source:
> 
> migrate uri='exec: cat > /tmp/mig_file'
> 
> Then wait for a MIGRATION event with data/status == 'failed', and then
> issue 'quit'.
> 
> Max
> 

Can you post the exact reproducer iotest?

I've tried to reproduce it by applying this change:

diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
index 621a60e179..eeec47f97f 100755
--- a/tests/qemu-iotests/300
+++ b/tests/qemu-iotests/300
@@ -94,23 +94,6 @@ class TestDirtyBitmapMigration(iotests.QMPTestCase):
  
          self.assertEqual(status == 'completed', migration_success)
          if status == 'failed':
-            # Wait until the migration has been cleaned up
-            # (Otherwise, bdrv_close_all() will abort because the
-            # dirty bitmap migration code still holds a reference to
-            # the BDS)
-            # (Unfortunately, there does not appear to be a nicer way
-            # of waiting until a failed migration has been cleaned up)
-            timeout_msg = 'Timeout waiting for migration to be cleaned up'
-            with iotests.Timeout(30, timeout_msg):
-                while os.path.exists(mig_sock):
-                    pass
-
-                # Dropping src_node_name will only work once the
-                # bitmap migration code has released it
-                while 'error' in self.vm_a.qmp('blockdev-del',
-                                               node_name=self.src_node_name):
-                    pass
-
              return
  
          self.vm_a.wait_for_runstate('postmigrate')


to your iotest, but it doesn't reproduce for me.



-- 
Best regards,
Vladimir



* Re: Properly quitting qemu immediately after failing migration
From: Max Reitz @ 2020-07-02  7:23 UTC
  To: Vladimir Sementsov-Ogievskiy, Dr. David Alan Gilbert,
	Juan Quintela, qemu-devel



On 01.07.20 18:16, Vladimir Sementsov-Ogievskiy wrote:
> 29.06.2020 18:00, Max Reitz wrote:
>> On 29.06.20 16:18, Vladimir Sementsov-Ogievskiy wrote:
>>> 29.06.2020 16:48, Max Reitz wrote:
>>>> Hi,
>>>>
>>>> In an iotest, I’m trying to quit qemu immediately after a migration has
>>>> failed.  Unfortunately, that doesn’t seem to be possible in a clean
>>>> way:
>>>> migrate_fd_cleanup() runs only at some point after the migration state
>>>> is already “failed”, so if I just wait for that “failed” state and
>>>> immediately quit, some cleanup functions may not have been run yet.
>>>>
>>>> This is a problem with dirty bitmap migration at least, because it
>>>> increases the refcount on all block devices that are to be migrated, so
>>>> if we don’t call the cleanup function before quitting, the refcount
>>>> will
>>>> stay elevated and bdrv_close_all() will hit an assertion because those
>>>> block devices are still around after blk_remove_all_bs() and
>>>> blockdev_close_all_bdrv_states().
>>>>
>>>> In practice this particular issue might not be that big of a problem,
>>>> because it just means qemu aborts when the user intended to let it quit
>>>> anyway.  But on one hand I could imagine that there are other clean-up
>>>> paths that should definitely run before qemu quits (although I don’t
>>>> know), and on the other, it’s a problem for my test.
>>>>
>>>> I tried working around the problem for my test by waiting on “Unable to
>>>> write” appearing on stderr, because that indicates that
>>>> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
>>>> hand, that isn’t really nice, and on the other, it doesn’t even work
>>>> when the failure is on the source side (because then there is no
>>>> s->error for migrate_fd_cleanup() to report).
>>
>> (I’ve now managed to work around it by invoking blockdev-del on a node
>> affected by bitmap migration until it succeeds, because blockdev-del can
>> only succeed once the bitmap migration code has dropped its reference to
>> it.)
>>
>>>> In all, I’m asking:
>>>> (1) Is there a nice solution for me now to delay quitting qemu until
>>>> the
>>>> failed migration has been fully resolved, including the clean-up?
>>>>
>>>> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
>>>> the wrong time?  Like, maybe lingering subprocesses when using “exec”?
>>>>
>>>>
>>>
>>> I'll look more closely tomorrow, but as a short answer: try my series
>>> "[PATCH v2 00/22] Fix error handling during bitmap postcopy"; it
>>> handles various problems around migration failures and qemu shutdown,
>>> so it will probably help.
>>
>> No, it doesn’t seem to.
>>
>> I’m not sure what exactly that series addresses, but FWIW I’m hitting
>> the problem in non-postcopy migration.  What my simplest reproducer
>> does is:
> 
> Bitmap migration is postcopy by nature (it may not work without
> migrate-start-postcopy, but it does work in most simple cases: when
> there is only a little bitmap data to migrate, it can be transferred
> during the migration downtime).  Most of the complicated parts of the
> series were about postcopy, but it still fixes some other things.
> 
> It seems that this patch (see the second quoted paragraph below),
> "[PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on
> shutdown",
> 
>>    If the target is turned off before postcopy has finished, the target
>>    crashes because busy bitmaps are found at shutdown.
>>    Canceling the incoming migration helps, as it removes all unfinished
>>    (and therefore busy) bitmaps.
> 
>>    Similarly, on the source we crash in bdrv_close_all(), which asserts
>>    that all bdrv states have been removed, because the bdrv states
>>    involved in dirty bitmap migration are still referenced by it. So we
>>    need to cancel the outgoing migration as well.
> should address exactly your issue.

Hm.  I’ve tested your series and still hit the issue.

I could imagine that my problem lies with a failed migration that is
automatically “cancelled” by nature, so the problem isn’t that it isn’t
cancelled, but that the clean-up runs after accepting further QMP
commands (like quit).

>>
>> On the source VM:
>>
>> blockdev-add node-name='foo' driver='null-co'
>> block-dirty-bitmap-add node='foo' name='bmap0'
>>
>> (Launch destination VM with some -incoming, e.g.
>> -incoming 'exec: cat /tmp/mig_file')
>>
>> Both on source and destination:
>>
>> migrate-set-capabilities capabilities=[
>>      {capability='events', state=true},
>>      {capability='dirty-bitmaps', state=true}
>> ]
>>
>> On source:
>>
>> migrate uri='exec: cat > /tmp/mig_file'
>>
>> Then wait for a MIGRATION event with data/status == 'failed', and then
>> issue 'quit'.
>>
>> Max
>>
> 
> Can you post the exact reproducer iotest?

I didn’t have an iotest until now (because it was a simple test written
up in Ruby), but what I’ve attached should work.

Note that you need system load to trigger the problem (or the clean-up
code is scheduled too quickly), so I usually just run like three
instances concurrently.

(while TEST_DIR=/tmp/t$i ./check 400; do :; done)

Max

[-- Attachment #1.1.2: 0001-Quit-crash-reproducer.patch --]

From ce0cacd21058f27fcb18aa632bfd5bc4fb3feadf Mon Sep 17 00:00:00 2001
From: Max Reitz <mreitz@redhat.com>
Date: Thu, 2 Jul 2020 09:21:14 +0200
Subject: [PATCH] Quit crash reproducer

---
 tests/qemu-iotests/400     | 42 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/400.out |  5 +++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 48 insertions(+)
 create mode 100755 tests/qemu-iotests/400
 create mode 100644 tests/qemu-iotests/400.out

diff --git a/tests/qemu-iotests/400 b/tests/qemu-iotests/400
new file mode 100755
index 0000000000..a32b2c3064
--- /dev/null
+++ b/tests/qemu-iotests/400
@@ -0,0 +1,42 @@
+#!/usr/bin/env python3
+
+import os
+import iotests
+
+mig_sock = os.path.join(iotests.sock_dir, 'mig.sock')
+
+class TestMigQuit(iotests.QMPTestCase):
+    def setUp(self):
+        self.vm_a = iotests.VM(path_suffix='a')
+        self.vm_a.launch()
+
+        self.vm_a.qmp('blockdev-add', node_name='foo', driver='null-co')
+        self.vm_a.qmp('block-dirty-bitmap-add', node='foo', name='bmap0')
+
+        self.vm_b = iotests.VM(path_suffix='b')
+        self.vm_b.add_incoming(f'unix:{mig_sock}')
+        self.vm_b.launch()
+
+        for vm in (self.vm_a, self.vm_b):
+            vm.qmp('migrate-set-capabilities',
+                    capabilities=[{'capability': 'events', 'state': True},
+                                  {'capability': 'dirty-bitmaps',
+                                   'state': True}])
+
+    def tearDown(self):
+        self.vm_a.shutdown()
+        self.vm_b.shutdown()
+
+        try:
+            os.remove(mig_sock)
+        except OSError:
+            pass
+
+    def test_mig_quit(self):
+        self.vm_a.qmp('migrate', uri=f'unix:{mig_sock}')
+
+        while self.vm_a.event_wait('MIGRATION')['data']['status'] != 'failed':
+            pass
+
+if __name__ == '__main__':
+    iotests.main()
diff --git a/tests/qemu-iotests/400.out b/tests/qemu-iotests/400.out
new file mode 100644
index 0000000000..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/400.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index d886fa0cb3..cdb785b034 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -302,3 +302,4 @@
 291 rw quick
 292 rw auto quick
 297 meta
+400
-- 
2.26.2




* Re: Properly quitting qemu immediately after failing migration
From: Vladimir Sementsov-Ogievskiy @ 2020-07-02 11:44 UTC
  To: Max Reitz, Dr. David Alan Gilbert, Juan Quintela, qemu-devel

02.07.2020 10:23, Max Reitz wrote:
> On 01.07.20 18:16, Vladimir Sementsov-Ogievskiy wrote:
>> 29.06.2020 18:00, Max Reitz wrote:
>>> On 29.06.20 16:18, Vladimir Sementsov-Ogievskiy wrote:
>>>> 29.06.2020 16:48, Max Reitz wrote:
>>>>> Hi,
>>>>>
>>>>> In an iotest, I’m trying to quit qemu immediately after a migration has
>>>>> failed.  Unfortunately, that doesn’t seem to be possible in a clean
>>>>> way:
>>>>> migrate_fd_cleanup() runs only at some point after the migration state
>>>>> is already “failed”, so if I just wait for that “failed” state and
>>>>> immediately quit, some cleanup functions may not have been run yet.
>>>>>
>>>>> This is a problem with dirty bitmap migration at least, because it
>>>>> increases the refcount on all block devices that are to be migrated, so
>>>>> if we don’t call the cleanup function before quitting, the refcount
>>>>> will
>>>>> stay elevated and bdrv_close_all() will hit an assertion because those
>>>>> block devices are still around after blk_remove_all_bs() and
>>>>> blockdev_close_all_bdrv_states().
>>>>>
>>>>> In practice this particular issue might not be that big of a problem,
>>>>> because it just means qemu aborts when the user intended to let it quit
>>>>> anyway.  But on one hand I could imagine that there are other clean-up
>>>>> paths that should definitely run before qemu quits (although I don’t
>>>>> know), and on the other, it’s a problem for my test.
>>>>>
>>>>> I tried working around the problem for my test by waiting on “Unable to
>>>>> write” appearing on stderr, because that indicates that
>>>>> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
>>>>> hand, that isn’t really nice, and on the other, it doesn’t even work
>>>>> when the failure is on the source side (because then there is no
>>>>> s->error for migrate_fd_cleanup() to report).
>>>
>>> (I’ve now managed to work around it by invoking blockdev-del on a node
>>> affected by bitmap migration until it succeeds, because blockdev-del can
>>> only succeed once the bitmap migration code has dropped its reference to
>>> it.)
>>>
>>>>> In all, I’m asking:
>>>>> (1) Is there a nice solution for me now to delay quitting qemu until
>>>>> the
>>>>> failed migration has been fully resolved, including the clean-up?
>>>>>
>>>>> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
>>>>> the wrong time?  Like, maybe lingering subprocesses when using “exec”?
>>>>>
>>>>>
>>>>
>>>> I'll look more closely tomorrow, but as a short answer: try my series
>>>> "[PATCH v2 00/22] Fix error handling during bitmap postcopy"; it
>>>> handles various problems around migration failures and qemu shutdown,
>>>> so it will probably help.
>>>
>>> No, it doesn’t seem to.
>>>
>>> I’m not sure what exactly that series addresses, but FWIW I’m hitting
>>> the problem in non-postcopy migration.  What my simplest reproducer
>>> does is:
>>
>> Bitmap migration is postcopy by nature (it may not work without
>> migrate-start-postcopy, but it does work in most simple cases: when
>> there is only a little bitmap data to migrate, it can be transferred
>> during the migration downtime).  Most of the complicated parts of the
>> series were about postcopy, but it still fixes some other things.
>>
>> It seems that this patch (see the second quoted paragraph below),
>> "[PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on
>> shutdown",
>>
>>>     If the target is turned off before postcopy has finished, the
>>>     target crashes because busy bitmaps are found at shutdown.
>>>     Canceling the incoming migration helps, as it removes all
>>>     unfinished (and therefore busy) bitmaps.
>>
>>>     Similarly, on the source we crash in bdrv_close_all(), which
>>>     asserts that all bdrv states have been removed, because the bdrv
>>>     states involved in dirty bitmap migration are still referenced by
>>>     it. So we need to cancel the outgoing migration as well.
>> should address exactly your issue.
> 
> Hm.  I’ve tested your series and still hit the issue.
> 
> I could imagine that my problem lies with a failed migration that is
> automatically “cancelled” by nature, so the problem isn’t that it isn’t
> cancelled, but that the clean-up runs after accepting further QMP
> commands (like quit).

Looking at my patch I see

+void dirty_bitmap_mig_cancel_outgoing(void)
+{
+    dirty_bitmap_do_save_cleanup(&dbm_state.save);
+}
+

So maybe "cancel" is just a bad name. It should work, but it doesn't.

> 
>>>
>>> On the source VM:
>>>
>>> blockdev-add node-name='foo' driver='null-co'
>>> block-dirty-bitmap-add node='foo' name='bmap0'
>>>
>>> (Launch destination VM with some -incoming, e.g.
>>> -incoming 'exec: cat /tmp/mig_file')
>>>
>>> Both on source and destination:
>>>
>>> migrate-set-capabilities capabilities=[
>>>       {capability='events', state=true},
>>>       {capability='dirty-bitmaps', state=true}
>>> ]
>>>
>>> On source:
>>>
>>> migrate uri='exec: cat > /tmp/mig_file'
>>>
>>> Then wait for a MIGRATION event with data/status == 'failed', and then
>>> issue 'quit'.
>>>
>>> Max
>>>
>>
>> Can you post the exact reproducer iotest?
> 
> I didn’t have an iotest until now (because it was a simple test written
> up in Ruby), but what I’ve attached should work.
> 
> Note that you need system load to trigger the problem (or the clean-up
> code is scheduled too quickly), so I usually just run like three
> instances concurrently.
> 
> (while TEST_DIR=/tmp/t$i ./check 400; do :; done)
> 
> Max
> 

Thanks! Aha, reproduced on your branch: more than 500 runs, with several (5-6) instances.

Interestingly, if I drop the failure-waiting loop, it crashes without any race; it just crashes.

Moving on to my branch:

With the fail-waiting loop dropped, it crashes within about 17 tries, with several instances.

Ahahaha, and with the fail-waiting loop as-is, it crashes immediately, without any race.

So my patch makes it work vice versa. Magic.

To me this looks like my patch just doesn't do what it should. I'll work on this and
resend the series together with a new test case. Or maybe it would be better to split
the series, to address the different issues separately.

-- 
Best regards,
Vladimir



* Re: Properly quitting qemu immediately after failing migration
From: Vladimir Sementsov-Ogievskiy @ 2020-07-02 12:57 UTC
  To: Max Reitz, Dr. David Alan Gilbert, Juan Quintela, qemu-devel

02.07.2020 14:44, Vladimir Sementsov-Ogievskiy wrote:
> 02.07.2020 10:23, Max Reitz wrote:
>> On 01.07.20 18:16, Vladimir Sementsov-Ogievskiy wrote:
>>> 29.06.2020 18:00, Max Reitz wrote:
>>>> On 29.06.20 16:18, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 29.06.2020 16:48, Max Reitz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> In an iotest, I’m trying to quit qemu immediately after a migration has
>>>>>> failed.  Unfortunately, that doesn’t seem to be possible in a clean
>>>>>> way:
>>>>>> migrate_fd_cleanup() runs only at some point after the migration state
>>>>>> is already “failed”, so if I just wait for that “failed” state and
>>>>>> immediately quit, some cleanup functions may not have been run yet.
>>>>>>
>>>>>> This is a problem with dirty bitmap migration at least, because it
>>>>>> increases the refcount on all block devices that are to be migrated, so
>>>>>> if we don’t call the cleanup function before quitting, the refcount
>>>>>> will
>>>>>> stay elevated and bdrv_close_all() will hit an assertion because those
>>>>>> block devices are still around after blk_remove_all_bs() and
>>>>>> blockdev_close_all_bdrv_states().
>>>>>>
>>>>>> In practice this particular issue might not be that big of a problem,
>>>>>> because it just means qemu aborts when the user intended to let it quit
>>>>>> anyway.  But on one hand I could imagine that there are other clean-up
>>>>>> paths that should definitely run before qemu quits (although I don’t
>>>>>> know), and on the other, it’s a problem for my test.
>>>>>>
>>>>>> I tried working around the problem for my test by waiting on “Unable to
>>>>>> write” appearing on stderr, because that indicates that
>>>>>> migrate_fd_cleanup()’s error_report_err() has been reached.  But on one
>>>>>> hand, that isn’t really nice, and on the other, it doesn’t even work
>>>>>> when the failure is on the source side (because then there is no
>>>>>> s->error for migrate_fd_cleanup() to report).
>>>>
>>>> (I’ve now managed to work around it by invoking blockdev-del on a node
>>>> affected by bitmap migration until it succeeds, because blockdev-del can
>>>> only succeed once the bitmap migration code has dropped its reference to
>>>> it.)
>>>>
>>>>>> In all, I’m asking:
>>>>>> (1) Is there a nice solution for me now to delay quitting qemu until
>>>>>> the
>>>>>> failed migration has been fully resolved, including the clean-up?
>>>>>>
>>>>>> (2) Isn’t it a problem if qemu crashes when you issue “quit” via QMP at
>>>>>> the wrong time?  Like, maybe lingering subprocesses when using “exec”?
>>>>>>
>>>>>>
>>>>>
>>>>> I'll look more closely tomorrow, but as a short answer: try my series
>>>>> "[PATCH v2 00/22] Fix error handling during bitmap postcopy"; it
>>>>> handles various problems around migration failures and qemu shutdown,
>>>>> so it will probably help.
>>>>
>>>> No, it doesn’t seem to.
>>>>
>>>> I’m not sure what exactly that series addresses, but FWIW I’m hitting
>>>> the problem in non-postcopy migration.  What my simplest reproducer
>>>> does is:
>>>
>>> Bitmap migration is postcopy by nature (it may not work without
>>> migrate-start-postcopy, but it does work in most simple cases: when
>>> there is only a little bitmap data to migrate, it can be transferred
>>> during the migration downtime).  Most of the complicated parts of the
>>> series were about postcopy, but it still fixes some other things.
>>>
>>> It seems that this patch (see the second quoted paragraph below),
>>> "[PATCH v2 10/22] migration/block-dirty-bitmap: cancel migration on
>>> shutdown",
>>>
>>>>     If the target is turned off before postcopy has finished, the
>>>>     target crashes because busy bitmaps are found at shutdown.
>>>>     Canceling the incoming migration helps, as it removes all
>>>>     unfinished (and therefore busy) bitmaps.
>>>
>>>>     Similarly, on the source we crash in bdrv_close_all(), which
>>>>     asserts that all bdrv states have been removed, because the bdrv
>>>>     states involved in dirty bitmap migration are still referenced by
>>>>     it. So we need to cancel the outgoing migration as well.
>>> should address exactly your issue.
>>
>> Hm.  I’ve tested your series and still hit the issue.
>>
>> I could imagine that my problem lies with a failed migration that is
>> automatically “cancelled” by nature, so the problem isn’t that it isn’t
>> cancelled, but that the clean-up runs after accepting further QMP
>> commands (like quit).
> 
> Looking at my patch I see
> 
> +void dirty_bitmap_mig_cancel_outgoing(void)
> +{
> +    dirty_bitmap_do_save_cleanup(&dbm_state.save);
> +}
> +
> 
> So maybe "cancel" is just a bad name. It should work, but it doesn't.
> 
>>
>>>>
>>>> On the source VM:
>>>>
>>>> blockdev-add node-name='foo' driver='null-co'
>>>> block-dirty-bitmap-add node='foo' name='bmap0'
>>>>
>>>> (Launch destination VM with some -incoming, e.g.
>>>> -incoming 'exec: cat /tmp/mig_file')
>>>>
>>>> Both on source and destination:
>>>>
>>>> migrate-set-capabilities capabilities=[
>>>>       {capability='events', state=true},
>>>>       {capability='dirty-bitmaps', state=true}
>>>> ]
>>>>
>>>> On source:
>>>>
>>>> migrate uri='exec: cat > /tmp/mig_file'
>>>>
>>>> Then wait for a MIGRATION event with data/status == 'failed', and then
>>>> issue 'quit'.
>>>>
>>>> Max
>>>>
>>>
>>> Can you post the exact reproducer iotest?
>>
>> I didn’t have an iotest until now (because it was a simple test written
>> up in Ruby), but what I’ve attached should work.
>>
>> Note that you need system load to trigger the problem (or the clean-up
>> code is scheduled too quickly), so I usually just run like three
>> instances concurrently.
>>
>> (while TEST_DIR=/tmp/t$i ./check 400; do :; done)
>>
>> Max
>>
> 
> Thanks! Aha, reproduced on your branch: more than 500 runs, with several (5-6) instances.
> 
> Interestingly, if I drop the failure-waiting loop, it crashes without any race; it just crashes.
> 
> Moving on to my branch:
> 
> With the fail-waiting loop dropped, it crashes within about 17 tries, with several instances.

Very interesting: in the crashing case, dirty_bitmap_save_setup() is called after migration_shutdown().

So migration_shutdown() calls my dirty_bitmap_do_save_cleanup(), which cleans up
the still-empty migration state, and then the migration starts.

So it's a race: if we issue QMP 'migrate' and then QMP 'quit', the actual sequence of operations may be reordered. Not good.

I can hack around this by adding a DBMSaveState.shutdown bool, setting it to true in
dirty_bitmap_do_save_cleanup() (we probably need to distinguish shutdown from a usual
migration finish/cancel), and doing nothing in all the other bitmap migration
handlers if shutdown is true.

But the true fix is of course to make the generic migration code not initiate a migration if we are already shutting down.

Any ideas?

> 
> Ahahaha, and with the fail-waiting loop as-is, it crashes immediately, without any race.

Ok, this is another crash, simple to fix (and introduced by my series).

> 
> So my patch makes it work vice versa. Magic.
> 
> To me this looks like my patch just doesn't do what it should. I'll work on this and
> resend the series together with a new test case. Or maybe it would be better to split
> the series, to address the different issues separately.
> 


-- 
Best regards,
Vladimir


