All of lore.kernel.org
 help / color / mirror / Atom feed
* reliable live migration of large and busy guests
@ 2012-11-06 20:28 Olaf Hering
  2012-11-06 20:45 ` Keir Fraser
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2012-11-06 20:28 UTC (permalink / raw)
  To: xen-devel


We got a customer report about long-lasting and then failing live
migration of busy guests.

The guest has 64G memory, is busy with its set of applications and as a
result there will be always dirty pages to transfer. While some of this
can be solved with faster network connection, the underlying issue is
that tools/libxc/xc_domain_save.c:xc_domain_save will suspend a domain
after a given number of iterations to transfer the remaining dirty
pages. From what I understand this pausing of the guest (I dont know how
long it is actually paused) is causing issues within the guest, the
applications start to fail (again, no details).

Their suggestion is to add some knob to the overall live migration
process to avoid the suspend. If the guest could not be transfered with
the parameters passed to xc_domain_save(), abort the migration and let
it running on the old host.


My questions are:
Was such issue ever seen elsewhere?
Should 'xm migrate --live' and 'xl migrate' get something like a
--no-suspend option?


Olaf

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: reliable live migration of large and busy guests
  2012-11-06 20:28 reliable live migration of large and busy guests Olaf Hering
@ 2012-11-06 20:45 ` Keir Fraser
  2012-11-06 22:18   ` Olaf Hering
  0 siblings, 1 reply; 10+ messages in thread
From: Keir Fraser @ 2012-11-06 20:45 UTC (permalink / raw)
  To: Olaf Hering, xen-devel

On 06/11/2012 20:28, "Olaf Hering" <olaf@aepfle.de> wrote:

> We got a customer report about long-lasting and then failing live
> migration of busy guests.
> 
> The guest has 64G memory, is busy with its set of applications and as a
> result there will be always dirty pages to transfer. While some of this
> can be solved with faster network connection, the underlying issue is
> that tools/libxc/xc_domain_save.c:xc_domain_save will suspend a domain
> after a given number of iterations to transfer the remaining dirty
> pages. From what I understand this pausing of the guest (I dont know how
> long it is actually paused) is causing issues within the guest, the
> applications start to fail (again, no details).
> 
> Their suggestion is to add some knob to the overall live migration
> process to avoid the suspend. If the guest could not be transfered with
> the parameters passed to xc_domain_save(), abort the migration and let
> it running on the old host.
> 
> 
> My questions are:
> Was such issue ever seen elsewhere?

It's known that if you have a workload that is dirtying lots of pages
quickly, the final stop-and-copy phase will necessarily be large. A VM that
is busy dirtying lots of pages can dirty pages much quicker than they can be
transferred over the LAN.

> Should 'xm migrate --live' and 'xl migrate' get something like a
> --no-suspend option?

Well, it is not really possible to avoid the suspend altogether, there is
always going to be some minimal 'dirty working set'. But could provide
parameters to require the dirty working set to be smaller than X pages
within Y rounds of dirty page copying.

 -- Keir

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: reliable live migration of large and busy guests
  2012-11-06 20:45 ` Keir Fraser
@ 2012-11-06 22:18   ` Olaf Hering
  2012-11-06 23:18     ` Andrew Cooper
  0 siblings, 1 reply; 10+ messages in thread
From: Olaf Hering @ 2012-11-06 22:18 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Tue, Nov 06, Keir Fraser wrote:

> It's known that if you have a workload that is dirtying lots of pages
> quickly, the final stop-and-copy phase will necessarily be large. A VM that
> is busy dirtying lots of pages can dirty pages much quicker than they can be
> transferred over the LAN.

In my opinion such migration should be done at application level.

> > Should 'xm migrate --live' and 'xl migrate' get something like a
> > --no-suspend option?
> 
> Well, it is not really possible to avoid the suspend altogether, there is
> always going to be some minimal 'dirty working set'. But could provide
> parameters to require the dirty working set to be smaller than X pages
> within Y rounds of dirty page copying.

Should such knobs be exposed to the tools like x[lm] migrate --knob1 val --knob2 val?

Olaf

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: reliable live migration of large and busy guests
  2012-11-06 22:18   ` Olaf Hering
@ 2012-11-06 23:18     ` Andrew Cooper
  2012-11-06 23:41       ` Dan Magenheimer
  2012-11-07 14:13       ` Olaf Hering
  0 siblings, 2 replies; 10+ messages in thread
From: Andrew Cooper @ 2012-11-06 23:18 UTC (permalink / raw)
  To: xen-devel

On 06/11/12 22:18, Olaf Hering wrote:
> On Tue, Nov 06, Keir Fraser wrote:
>
>> It's known that if you have a workload that is dirtying lots of pages
>> quickly, the final stop-and-copy phase will necessarily be large. A VM that
>> is busy dirtying lots of pages can dirty pages much quicker than they can be
>> transferred over the LAN.
> In my opinion such migration should be done at application level.
>
>>> Should 'xm migrate --live' and 'xl migrate' get something like a
>>> --no-suspend option?
>> Well, it is not really possible to avoid the suspend altogether, there is
>> always going to be some minimal 'dirty working set'. But could provide
>> parameters to require the dirty working set to be smaller than X pages
>> within Y rounds of dirty page copying.
> Should such knobs be exposed to the tools like x[lm] migrate --knob1 val --knob2 val?
>
> Olaf

We (Citrix) are currently looking at some fairly serious performance
issues with migration with both classic and pvops dom0 kernels (Patches
to follow in due course)

While that will make the situation better, it wont solve the problem you
have described.

As far as I understand (so please correct me if I am wrong), migration
works by transmitting pages until the rate of dirty pages per round
approaches constant, at which point the domain gets paused, all
remaining dirty pages are transmitted.  (With the proviso that currently
there are a maximum number of rounds until an automatic pausing - this
is automatically problematic with increasing guest sizes.)  Having these
knobs tweakable by the admin/toolstack seems like a very sensible idea.

The application problem you described could possibly be something
crashing because of a sufficiently large jump in time?

As potential food for thought:

Is there wisdom in having a new kind of live migrate which, when pausing
the VM on the source host, resumes the VM on the destination host.  Xen
would have to track not-yet-sent pages and pause the guest on pagefault,
and request the required page as a matter of priority.

The advantages of this approach would be that a timing sensitive
workloads would be paused for far less time.  Even if it was frequently
being paused for pagefaults, the time to get a single page over the LAN
would be far quicker than the entire dirty set, at which point on
resume, the interrupt paths would fire again; The timing paths would
quickly become fully populated.  Further to that, a busy workload in the
guest dirtying a page which has already been sent will not result in any
further network traffic.

The disadvantages would be that Xen would need 2 way communication with
the toolstack to prioritise which page is needed to resolve a pagefault,
and presumably the toolstack->toolstack protocol would be more
complicated.  In addition, it would be much harder to "roll back" the
migrate; Once you resume the guest on the destination host, you are
committed to completing the migrate.

I presume there are other issues I have overlooked, but this idea has
literally just occurred to me upon reading this thread so far.  Comments?

~Andrew

>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: reliable live migration of large and busy guests
  2012-11-06 23:18     ` Andrew Cooper
@ 2012-11-06 23:41       ` Dan Magenheimer
  2012-11-07 13:44         ` Andrew Cooper
  2012-11-07 14:13       ` Olaf Hering
  1 sibling, 1 reply; 10+ messages in thread
From: Dan Magenheimer @ 2012-11-06 23:41 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Sent: Tuesday, November 06, 2012 4:19 PM
> To: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
> 
> As potential food for thought:
> 
> Is there wisdom in having a new kind of live migrate which, when pausing
> the VM on the source host, resumes the VM on the destination host.  Xen
> would have to track not-yet-sent pages and pause the guest on pagefault,
> and request the required page as a matter of priority.
> 
> The advantages of this approach would be that a timing sensitive
> workloads would be paused for far less time.  Even if it was frequently
> being paused for pagefaults, the time to get a single page over the LAN
> would be far quicker than the entire dirty set, at which point on
> resume, the interrupt paths would fire again; The timing paths would
> quickly become fully populated.  Further to that, a busy workload in the
> guest dirtying a page which has already been sent will not result in any
> further network traffic.

Something like this?

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: reliable live migration of large and busy guests
  2012-11-06 23:41       ` Dan Magenheimer
@ 2012-11-07 13:44         ` Andrew Cooper
  2012-11-07 15:10           ` Dan Magenheimer
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Cooper @ 2012-11-07 13:44 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: xen-devel

On 06/11/12 23:41, Dan Magenheimer wrote:
>> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
>> Sent: Tuesday, November 06, 2012 4:19 PM
>> To: xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
>>
>> As potential food for thought:
>>
>> Is there wisdom in having a new kind of live migrate which, when pausing
>> the VM on the source host, resumes the VM on the destination host.  Xen
>> would have to track not-yet-sent pages and pause the guest on pagefault,
>> and request the required page as a matter of priority.
>>
>> The advantages of this approach would be that a timing sensitive
>> workloads would be paused for far less time.  Even if it was frequently
>> being paused for pagefaults, the time to get a single page over the LAN
>> would be far quicker than the entire dirty set, at which point on
>> resume, the interrupt paths would fire again; The timing paths would
>> quickly become fully populated.  Further to that, a busy workload in the
>> guest dirtying a page which has already been sent will not result in any
>> further network traffic.
> Something like this?
>
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368 

Oh wow - something quite like that.  Thankyou very much.  I will read
the paper in full when I get a free moment, but the abstract looks very
interesting.

>From an idealistic point of view, it might be quite nice to have several
live migrate mechanisms, so the user can choose whether they value
minimum downtime, minimum network utilisation, or maximum safety.

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: reliable live migration of large and busy guests
  2012-11-06 23:18     ` Andrew Cooper
  2012-11-06 23:41       ` Dan Magenheimer
@ 2012-11-07 14:13       ` Olaf Hering
  1 sibling, 0 replies; 10+ messages in thread
From: Olaf Hering @ 2012-11-07 14:13 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

On Tue, Nov 06, Andrew Cooper wrote:

> Is there wisdom in having a new kind of live migrate which, when pausing
> the VM on the source host, resumes the VM on the destination host.  Xen
> would have to track not-yet-sent pages and pause the guest on pagefault,
> and request the required page as a matter of priority.

On the receiving side all missing pages could be handled as "paged"
(just nominating a missing pfn should be enough). A pager can then
request them from the sender host.

Olaf

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: reliable live migration of large and busy guests
  2012-11-07 13:44         ` Andrew Cooper
@ 2012-11-07 15:10           ` Dan Magenheimer
  2012-11-08 10:58             ` George Dunlap
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Magenheimer @ 2012-11-07 15:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel

> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
> 
> On 06/11/12 23:41, Dan Magenheimer wrote:
> >> From: Andrew Cooper [mailto:andrew.cooper3@citrix.com]
> >> Sent: Tuesday, November 06, 2012 4:19 PM
> >> To: xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
> >>
> >> As potential food for thought:
> >>
> >> Is there wisdom in having a new kind of live migrate which, when pausing
> >> the VM on the source host, resumes the VM on the destination host.  Xen
> >> would have to track not-yet-sent pages and pause the guest on pagefault,
> >> and request the required page as a matter of priority.
> >>
> >> The advantages of this approach would be that a timing sensitive
> >> workloads would be paused for far less time.  Even if it was frequently
> >> being paused for pagefaults, the time to get a single page over the LAN
> >> would be far quicker than the entire dirty set, at which point on
> >> resume, the interrupt paths would fire again; The timing paths would
> >> quickly become fully populated.  Further to that, a busy workload in the
> >> guest dirtying a page which has already been sent will not result in any
> >> further network traffic.
> > Something like this?
> >
> > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.2368
> 
> Oh wow - something quite like that.  Thankyou very much.  I will read
> the paper in full when I get a free moment, but the abstract looks very
> interesting.

Hi Andrew --

FYI, selfballooning is now built into the Linux kernel (since about
summer of 2011, so may not be in many distros yet).  It is currently
tied to tmem (transcendent memory), which is not turned on by
default but if you start developing something like post-copy migration,
let me know.  AFAIK, there is no way to do selfballooning in
Windows (not even in userspace I think, since IIRC, unlike Linux
sysfs, there is no way to adjust the balloon size outside the
kernel... but I know nothing about Windows ;-)
 
> From an idealistic point of view, it might be quite nice to have several
> live migrate mechanisms, so the user can choose whether they value
> minimum downtime, minimum network utilisation, or maximum safety.

Agreed.  IIRC, when post-copy was suggested for Xen years ago,
Ian Pratt was against it, though I don't recall why, so Michael
Hines work was never pursued (outside of academia).  Probably
worth asking IanP before investing too much time into it.

Dan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: reliable live migration of large and busy guests
  2012-11-07 15:10           ` Dan Magenheimer
@ 2012-11-08 10:58             ` George Dunlap
  2012-11-12 17:12               ` Dan Magenheimer
  0 siblings, 1 reply; 10+ messages in thread
From: George Dunlap @ 2012-11-08 10:58 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: Andrew Cooper, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1541 bytes --]

On Wed, Nov 7, 2012 at 3:10 PM, Dan Magenheimer
<dan.magenheimer@oracle.com>wrote:

> > From an idealistic point of view, it might be quite nice to have several
> > live migrate mechanisms, so the user can choose whether they value
> > minimum downtime, minimum network utilisation, or maximum safety.
>
> Agreed.  IIRC, when post-copy was suggested for Xen years ago,
> Ian Pratt was against it, though I don't recall why, so Michael
> Hines work was never pursued (outside of academia).  Probably
> worth asking IanP before investing too much time into it.
>


Was he against a hybrid approach, where you "push" things first, and then
"pull" things later?  Or just against a pure "pull" approach?  I'm pretty
sure a pure "pull" approach would result in lower performance during the
migration.

Just tossing another idea out there: What about throttling the VM if it's
dirtying too many pages?  You can use the "cap" feature of the credit1
scheduler to reduce the amount of cpu time a given VM gets, even if there
is free cpu time available.  You could play around with doing N iterations,
and then cranking down the cap on each iteration after that; then the
application wouldn't have a several-second pause, it would just be "running
slowly" for some period of time.

Overall doing a hybrid "send dirty pages for a while, then move and pull
the rest in" seems like the best approach in the long-run, but it's fairly
complicated.  A throttling approach is probably less optimal but simpler to
get working as a temporary measure.

 -George

[-- Attachment #1.2: Type: text/html, Size: 2009 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: reliable live migration of large and busy guests
  2012-11-08 10:58             ` George Dunlap
@ 2012-11-12 17:12               ` Dan Magenheimer
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Magenheimer @ 2012-11-12 17:12 UTC (permalink / raw)
  To: George Dunlap; +Cc: Andrew Cooper, xen-devel

> From: George Dunlap [mailto:George.Dunlap@eu.citrix.com]
> Subject: Re: [Xen-devel] reliable live migration of large and busy guests
> 
> On Wed, Nov 7, 2012 at 3:10 PM, Dan Magenheimer <dan.magenheimer@oracle.com> wrote:
> 
> 
> 	> From an idealistic point of view, it might be quite nice to have several
> 
> 	> live migrate mechanisms, so the user can choose whether they value
> 	> minimum downtime, minimum network utilisation, or maximum safety.
> 
> 	Agreed.  IIRC, when post-copy was suggested for Xen years ago,
> 	Ian Pratt was against it, though I don't recall why, so Michael
> 	Hines work was never pursued (outside of academia).  Probably
> 	worth asking IanP before investing too much time into it.
> 
> Was he against a hybrid approach, where you "push" things first, and then "pull" things later?  Or
> just against a pure "pull" approach?  I'm pretty sure a pure "pull" approach would result in lower
> performance during the migration.

Sorry, I don't recall.
 
> Just tossing another idea out there: What about throttling the VM if it's dirtying too many pages?
> You can use the "cap" feature of the credit1 scheduler to reduce the amount of cpu time a given VM
> gets, even if there is free cpu time available.  You could play around with doing N iterations, and
> then cranking down the cap on each iteration after that; then the application wouldn't have a several-
> second pause, it would just be "running slowly" for some period of time.
> 
> Overall doing a hybrid "send dirty pages for a while, then move and pull the rest in" seems like the
> best approach in the long-run, but it's fairly complicated.  A throttling approach is probably less
> optimal but simpler to get working as a temporary measure.

I agree there are lots of interesting hybrid possibilities worth exploring.
There are side-effects to be considered though... for example, the current
push approach is the only one that should be used when the goal of the
migration is to evacuate a physical machine so that it can be powered
off ASAP for maintenance or power management.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-11-12 17:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-06 20:28 reliable live migration of large and busy guests Olaf Hering
2012-11-06 20:45 ` Keir Fraser
2012-11-06 22:18   ` Olaf Hering
2012-11-06 23:18     ` Andrew Cooper
2012-11-06 23:41       ` Dan Magenheimer
2012-11-07 13:44         ` Andrew Cooper
2012-11-07 15:10           ` Dan Magenheimer
2012-11-08 10:58             ` George Dunlap
2012-11-12 17:12               ` Dan Magenheimer
2012-11-07 14:13       ` Olaf Hering

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.