[Qemu-devel] Live migration without bdrv_drain_all()

* [Qemu-devel] Live migration without bdrv_drain_all()
@ 2016-08-29 15:06 Stefan Hajnoczi
  2016-08-29 18:56 ` Felipe Franciosi
  2016-09-27  9:48 ` Daniel P. Berrange
  0 siblings, 2 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2016-08-29 15:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: cui, felipe, Kevin Wolf, Paolo Bonzini

At KVM Forum an interesting idea was proposed to avoid
bdrv_drain_all() during live migration.  Mike Cui and Felipe Franciosi
mentioned running at queue depth 1.  It needs more thought to make it
workable but I want to capture it here for discussion and to archive
it.

bdrv_drain_all() is synchronous and can cause VM downtime if I/O
requests hang.  We should find a better way of quiescing I/O that is
not synchronous.  Up until now I thought we should simply add a
timeout to bdrv_drain_all() so it can at least fail (and live
migration would fail) if I/O is stuck instead of hanging the VM.  But
the following approach is also interesting...

During the iteration phase of live migration we could limit the queue
depth so points with no I/O requests in-flight are identified.  At
these points the migration algorithm has the opportunity to move to
the next phase without requiring bdrv_drain_all() since no requests
are pending.

Unprocessed requests are left in the virtio-blk/virtio-scsi virtqueues
so that the destination QEMU can process them after migration
completes.

Unfortunately this approach makes convergence harder because the VM
might also be dirtying memory pages during the iteration phase.  Now
we need to reach a spot where no I/O is in-flight *and* dirty memory
is under the threshold.

Thoughts?

Stefan

^ permalink raw reply	[flat|nested] 10+ messages in thread