* reliable live migration of large and busy guests
@ 2012-11-06 20:28 Olaf Hering
  2012-11-06 20:45 ` Keir Fraser
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2012-11-06 20:28 UTC (permalink / raw)
  To: xen-devel


We got a customer report about long-running and eventually failing live
migration of busy guests.

The guest has 64G memory and is busy with its set of applications, so
there will always be dirty pages to transfer. While some of this
can be solved with a faster network connection, the underlying issue is
that tools/libxc/xc_domain_save.c:xc_domain_save will suspend a domain
after a given number of iterations to transfer the remaining dirty
pages. From what I understand, this pausing of the guest (I don't know
how long it is actually paused) is causing issues within the guest: the
applications start to fail (again, no details).

Their suggestion is to add some knob to the overall live migration
process to avoid the suspend. If the guest could not be transferred with
the parameters passed to xc_domain_save(), abort the migration and leave
it running on the old host.


My questions are:
Has such an issue ever been seen elsewhere?
Should 'xm migrate --live' and 'xl migrate' get something like a
--no-suspend option?
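
For illustration only - the option names are hypothetical, nothing like
this exists today - the invocation might look like:

    xl migrate --live --no-suspend --max-iters 5 domU targethost

with the migration failing cleanly, and the guest left running on the
source host, if the dirty set has not converged after the given number
of copy rounds.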


Olaf


* Re: reliable live migration of large and busy guests
  2012-11-06 20:28 reliable live migration of large and busy guests Olaf Hering
@ 2012-11-06 20:45 ` Keir Fraser
  2012-11-06 22:18   ` Olaf Hering
  0 siblings, 1 reply; 10+ messages in thread
From: Keir Fraser @ 2012-11-06 20:45 UTC (permalink / raw)
  To: Olaf Hering, xen-devel

On 06/11/2012 20:28, "Olaf Hering" <olaf@aepfle.de> wrote:

> We got a customer report about long-running and eventually failing live
> migration of busy guests.
> 
> The guest has 64G memory and is busy with its set of applications, so
> there will always be dirty pages to transfer. While some of this
> can be solved with a faster network connection, the underlying issue is
> that tools/libxc/xc_domain_save.c:xc_domain_save will suspend a domain
> after a given number of iterations to transfer the remaining dirty
> pages. From what I understand, this pausing of the guest (I don't know
> how long it is actually paused) is causing issues within the guest: the
> applications start to fail (again, no details).
> 
> Their suggestion is to add some knob to the overall live migration
> process to avoid the suspend. If the guest could not be transferred with
> the parameters passed to xc_domain_save(), abort the migration and leave
> it running on the old host.
> 
> 
> My questions are:
> Has such an issue ever been seen elsewhere?

It's known that if you have a workload that is dirtying lots of pages
quickly, the final stop-and-copy phase will necessarily be large. A VM that
is busy dirtying lots of pages can dirty pages much quicker than they can be
transferred over the LAN.
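
As a rough back-of-envelope illustration (my numbers, purely for scale):
a gigabit link moves at most ~125MB/s, i.e. roughly 30k 4kB pages/s,
while a guest writing to memory at even 1GB/s can in the worst case
dirty some 250k distinct pages/s, so the dirty set grows faster than any
copy round can drain it.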

> Should 'xm migrate --live' and 'xl migrate' get something like a
> --no-suspend option?

Well, it is not really possible to avoid the suspend altogether; there is
always going to be some minimal 'dirty working set'. But we could provide
parameters to require the dirty working set to be smaller than X pages
within Y rounds of dirty page copying.
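
Concretely, it could be something along these lines in the
xc_domain_save() send loop (the names are invented for the sake of
discussion; this is not an existing interface):

    /* Illustrative sketch only: max_rounds (Y) and max_final_pages (X)
     * would be new tunables, not part of libxc today. */
    if ( round >= max_rounds && dirty_this_round > max_final_pages )
    {
        /* The dirty working set did not drop below X pages within Y
         * rounds: fail the migration and leave the guest running on
         * the source host, rather than suspending it for an unbounded
         * final copy. */
        ERROR("Not converging: %lu dirty pages after %u rounds",
              dirty_this_round, round);
        rc = -1;
        goto out;
    }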

 -- Keir


* Re: reliable live migration of large and busy guests
  2012-11-06 20:45 ` Keir Fraser
@ 2012-11-06 22:18   ` Olaf Hering
  2012-11-06 23:18     ` Andrew Cooper
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2012-11-06 22:18 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Tue, Nov 06, Keir Fraser wrote:

> It's known that if you have a workload that is dirtying lots of pages
> quickly, the final stop-and-copy phase will necessarily be large. A VM that
> is busy dirtying lots of pages can dirty pages much quicker than they can be
> transferred over the LAN.

In my opinion, such migration should be done at the application level.

> > Should 'xm migrate --live' and 'xl migrate' get something like a
> > --no-suspend option?
> 
> Well, it is not really possible to avoid the suspend altogether; there is
> always going to be some minimal 'dirty working set'. But we could provide
> parameters to require the dirty working set to be smaller than X pages
> within Y rounds of dirty page copying.

Should such knobs be exposed in the tools, e.g. 'x[lm] migrate --knob1 val --knob2 val'?

Olaf


* Re: reliable live migration of large and busy guests
  2012-11-06 22:18   ` Olaf Hering
@ 2012-11-06 23:18     ` Andrew Cooper
  2012-11-06 23:41       ` Dan Magenheimer
  2012-11-07 14:13       ` Olaf Hering
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Cooper @ 2012-11-06 23:18 UTC (permalink / raw)
  To: xen-devel

On 06/11/12 22:18, Olaf Hering wrote:
> On Tue, Nov 06, Keir Fraser wrote:
>
>> It's known that if you have a workload that is dirtying lots of pages
>> quickly, the final stop-and-copy phase will necessarily be large. A VM that
>> is busy dirtying lots of pages can dirty pages much quicker than they can be
>> transferred over the LAN.
> In my opinion, such migration should be done at the application level.
>
>>> Should 'xm migrate --live' and 'xl migrate' get something like a
>>> --no-suspend option?
>> Well, it is not really possible to avoid the suspend altogether; there is
>> always going to be some minimal 'dirty working set'. But we could provide
>> parameters to require the dirty working set to be smaller than X pages
>> within Y rounds of dirty page copying.
> Should such knobs be exposed in the tools, e.g. 'x[lm] migrate --knob1 val --knob2 val'?
>
> Olaf

We (Citrix) are currently looking at some fairly serious performance
issues with migration on both classic and pvops dom0 kernels (patches
to follow in due course).

While that will make the situation better, it won't solve the problem you
have described.

As far as I understand (so please correct me if I am wrong), migration
works by transmitting pages until the number of dirty pages per round
approaches a constant, at which point the domain gets paused and all
remaining dirty pages are transmitted.  (With the proviso that currently
there is a maximum number of rounds before the automatic pause - this
becomes increasingly problematic with larger guest sizes.)  Having these
knobs tweakable by the admin/toolstack seems like a very sensible idea.

The application problem you described could possibly be something
crashing because of a sufficiently large jump in time?

As potential food for thought:

Is there wisdom in having a new kind of live migrate which, when pausing
the VM on the source host, resumes the VM on the destination host?  Xen
would have to track not-yet-sent pages, pause the guest on pagefault,
and request the required page as a matter of priority.
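
Very roughly, on the destination side (every name here is invented; this
is just to illustrate the shape of the mechanism, not a real interface):

    /* Sketch of servicing a post-copy pagefault on the destination
     * host.  All helpers are hypothetical: Xen pauses the faulting
     * vcpu, the toolstack fetches the page out of band with priority,
     * and the vcpu is unpaused once the page is in place. */
    static void handle_postcopy_fault(uint32_t domid, unsigned long gfn)
    {
        pause_vcpus_waiting_on(domid, gfn);     /* fault -> pause */
        request_page_from_source(domid, gfn);   /* jumps the send queue */
        wait_for_page_arrival(domid, gfn);
        map_page_into_guest(domid, gfn);        /* populate the gfn */
        unpause_vcpus_waiting_on(domid, gfn);   /* let the guest run */
    }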

The advantage of this approach would be that timing-sensitive
workloads would be paused for far less time.  Even if the guest was
frequently being paused for pagefaults, fetching a single page over the
LAN would be far quicker than transferring the entire dirty set, and on
resume the interrupt paths would fire again; the timing paths would
quickly become fully populated.  Further to that, a busy workload in the
guest dirtying a page which has already been sent will not result in any
further network traffic.

The disadvantages would be that Xen would need two-way communication with
the toolstack to prioritise which page is needed to resolve a pagefault,
and presumably the toolstack->toolstack protocol would be more
complicated.  In addition, it would be much harder to "roll back" the
migration; once you resume the guest on the destination host, you are
committed to completing it.

I presume there are other issues I have overlooked, but this idea has
literally just occurred to me while reading this thread.  Comments?

~Andrew


-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


* Re: reliable live migration of large and busy guests
  2012-11-06 23:18     ` Andrew Cooper
@ 2012-11-06 23:41       ` Dan Magenheimer
  2012-11-07 13:44         ` Andrew Cooper
  2012-11-07 14:13       ` Olaf Hering
  1 sibling, 1 reply; 10+ messages in thread
From: Dan Magenheimer @ 2012-11-06 23:41 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, November 06, 2012 4:19 PM
> To: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
> 
> As potential food for thought:
> 
> Is there wisdom in having a new kind of live migrate which, when pausing
> the VM on the source host, resumes the VM on the destination host?  Xen
> would have to track not-yet-sent pages, pause the guest on pagefault,
> and request the required page as a matter of priority.
> 
> The advantage of this approach would be that timing-sensitive
> workloads would be paused for far less time.  Even if the guest was
> frequently being paused for pagefaults, fetching a single page over the
> LAN would be far quicker than transferring the entire dirty set, and on
> resume the interrupt paths would fire again; the timing paths would
> quickly become fully populated.  Further to that, a busy workload in the
> guest dirtying a page which has already been sent will not result in any
> further network traffic.

Something like this?

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368 


* Re: reliable live migration of large and busy guests
  2012-11-06 23:41       ` Dan Magenheimer
@ 2012-11-07 13:44         ` Andrew Cooper
  2012-11-07 15:10           ` Dan Magenheimer
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2012-11-07 13:44 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: xen-devel

On 06/11/12 23:41, Dan Magenheimer wrote:
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Tuesday, November 06, 2012 4:19 PM
>> To: xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
>>
>> As potential food for thought:
>>
>> Is there wisdom in having a new kind of live migrate which, when pausing
>> the VM on the source host, resumes the VM on the destination host?  Xen
>> would have to track not-yet-sent pages, pause the guest on pagefault,
>> and request the required page as a matter of priority.
>>
>> The advantage of this approach would be that timing-sensitive
>> workloads would be paused for far less time.  Even if the guest was
>> frequently being paused for pagefaults, fetching a single page over the
>> LAN would be far quicker than transferring the entire dirty set, and on
>> resume the interrupt paths would fire again; the timing paths would
>> quickly become fully populated.  Further to that, a busy workload in the
>> guest dirtying a page which has already been sent will not result in any
>> further network traffic.
> Something like this?
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368 

Oh wow - something quite like that.  Thank you very much.  I will read
the paper in full when I get a free moment, but the abstract looks very
interesting.

From an idealistic point of view, it might be quite nice to have several
live migrate mechanisms, so the user can choose whether they value
minimum downtime, minimum network utilisation, or maximum safety.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com


* Re: reliable live migration of large and busy guests
  2012-11-06 23:18     ` Andrew Cooper
  2012-11-06 23:41       ` Dan Magenheimer
@ 2012-11-07 14:13       ` Olaf Hering
  1 sibling, 0 replies; 10+ messages in thread
From: Olaf Hering @ 2012-11-07 14:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Tue, Nov 06, Andrew Cooper wrote:

> Is there wisdom in having a new kind of live migrate which, when pausing
> the VM on the source host, resumes the VM on the destination host.  Xen
> would have to track not-yet-sent pages and pause the guest on pagefault,
> and request the required page as a matter of priority.

On the receiving side, all missing pages could be handled as "paged"
(just nominating a missing pfn should be enough). A pager can then
request them from the sending host.
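
Sketched very roughly (quoting the xc_mem_paging_* interface from
memory, so the exact calls and signatures may well be off):

    /* Receiver side: nominate every not-yet-received pfn as paged out,
     * then let a pager pull pages from the sender on demand.  The two
     * helpers marked "invented" do not exist anywhere. */
    for ( pfn = 0; pfn < p2m_size; pfn++ )
        if ( !test_bit(pfn, received_bitmap) )
            xc_mem_paging_nominate(xch, domid, pfn);

    while ( migration_in_progress )
    {
        pfn = wait_for_paging_event(domid);           /* invented */
        fetch_page_from_sender(sock, pfn, buffer);    /* invented */
        xc_mem_paging_load(xch, domid, pfn, buffer);  /* page it back in */
    }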

Olaf


* Re: reliable live migration of large and busy guests
  2012-11-07 13:44         ` Andrew Cooper
@ 2012-11-07 15:10           ` Dan Magenheimer
  2012-11-08 10:58             ` George Dunlap
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Magenheimer @ 2012-11-07 15:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
> 
> On 06/11/12 23:41, Dan Magenheimer wrote:
> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> Sent: Tuesday, November 06, 2012 4:19 PM
> >> To: xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
> >>
> >> As potential food for thought:
> >>
> >> Is there wisdom in having a new kind of live migrate which, when pausing
> >> the VM on the source host, resumes the VM on the destination host?  Xen
> >> would have to track not-yet-sent pages, pause the guest on pagefault,
> >> and request the required page as a matter of priority.
> >>
> >> The advantage of this approach would be that timing-sensitive
> >> workloads would be paused for far less time.  Even if the guest was
> >> frequently being paused for pagefaults, fetching a single page over the
> >> LAN would be far quicker than transferring the entire dirty set, and on
> >> resume the interrupt paths would fire again; the timing paths would
> >> quickly become fully populated.  Further to that, a busy workload in the
> >> guest dirtying a page which has already been sent will not result in any
> >> further network traffic.
> > Something like this?
> >
> > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368
> 
> Oh wow - something quite like that.  Thank you very much.  I will read
> the paper in full when I get a free moment, but the abstract looks very
> interesting.

Hi Andrew --

FYI, selfballooning is now built into the Linux kernel (since about
summer of 2011, so it may not be in many distros yet).  It is currently
tied to tmem (transcendent memory), which is not turned on by default,
but if you start developing something like post-copy migration,
let me know.  AFAIK, there is no way to do selfballooning in
Windows (not even in userspace, I think, since IIRC, unlike Linux
sysfs, there is no way to adjust the balloon size outside the
kernel... but I know nothing about Windows ;-)
 
> From an idealistic point of view, it might be quite nice to have several
> live migrate mechanisms, so the user can choose whether they value
> minimum downtime, minimum network utilisation, or maximum safety.

Agreed.  IIRC, when post-copy was suggested for Xen years ago,
Ian Pratt was against it, though I don't recall why, so Michael
Hines' work was never pursued (outside of academia).  Probably
worth asking IanP before investing too much time into it.

Dan


* Re: reliable live migration of large and busy guests
  2012-11-07 15:10           ` Dan Magenheimer
@ 2012-11-08 10:58             ` George Dunlap
  2012-11-12 17:12               ` Dan Magenheimer
  0 siblings, 1 reply; 10+ messages in thread
From: George Dunlap @ 2012-11-08 10:58 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: Andrew Cooper, xen-devel


On Wed, Nov 7, 2012 at 3:10 PM, Dan Magenheimer
<dan.magenheimer@oracle.com> wrote:

> > From an idealistic point of view, it might be quite nice to have several
> > live migrate mechanisms, so the user can choose whether they value
> > minimum downtime, minimum network utilisation, or maximum safety.
>
> Agreed.  IIRC, when post-copy was suggested for Xen years ago,
> Ian Pratt was against it, though I don't recall why, so Michael
> Hines' work was never pursued (outside of academia).  Probably
> worth asking IanP before investing too much time into it.
>


Was he against a hybrid approach, where you "push" things first, and then
"pull" things later?  Or just against a pure "pull" approach?  I'm pretty
sure a pure "pull" approach would result in lower performance during the
migration.

Just tossing another idea out there: What about throttling the VM if it's
dirtying too many pages?  You can use the "cap" feature of the credit1
scheduler to reduce the amount of cpu time a given VM gets, even if there
is free cpu time available.  You could play around with doing N iterations,
and then cranking down the cap on each iteration after that; then the
application wouldn't have a several-second pause, but would just be "running
slowly" for some period of time.

Overall doing a hybrid "send dirty pages for a while, then move and pull
the rest in" seems like the best approach in the long-run, but it's fairly
complicated.  A throttling approach is probably less optimal but simpler to
get working as a temporary measure.

 -George


* Re: reliable live migration of large and busy guests
  2012-11-08 10:58             ` George Dunlap
@ 2012-11-12 17:12               ` Dan Magenheimer
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Magenheimer @ 2012-11-12 17:12 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, xen-devel

> From: George Dunlap [mailto:George.Dunlap@eu.citrix.com]
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
> 
> On Wed, Nov 7, 2012 at 3:10 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
>
> > > From an idealistic point of view, it might be quite nice to have several
> > > live migrate mechanisms, so the user can choose whether they value
> > > minimum downtime, minimum network utilisation, or maximum safety.
> >
> > Agreed.  IIRC, when post-copy was suggested for Xen years ago,
> > Ian Pratt was against it, though I don't recall why, so Michael
> > Hines' work was never pursued (outside of academia).  Probably
> > worth asking IanP before investing too much time into it.
> 
> Was he against a hybrid approach, where you "push" things first, and then "pull" things later?  Or
> just against a pure "pull" approach?  I'm pretty sure a pure "pull" approach would result in lower
> performance during the migration.

Sorry, I don't recall.
 
> Just tossing another idea out there: What about throttling the VM if it's dirtying too many pages?
> You can use the "cap" feature of the credit1 scheduler to reduce the amount of cpu time a given VM
> gets, even if there is free cpu time available.  You could play around with doing N iterations, and
> then cranking down the cap on each iteration after that; then the application wouldn't have a several-
> second pause, but would just be "running slowly" for some period of time.
> 
> Overall doing a hybrid "send dirty pages for a while, then move and pull the rest in" seems like the
> best approach in the long-run, but it's fairly complicated.  A throttling approach is probably less
> optimal but simpler to get working as a temporary measure.

I agree there are lots of interesting hybrid possibilities worth exploring.
There are side-effects to be considered though... for example, the current
push approach is the only one that should be used when the goal of the
migration is to evacuate a physical machine so that it can be powered
off ASAP for maintenance or power management.

