From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [RFC Design Doc]Speed up live migration by skipping free pages
Date: Wed, 20 Apr 2016 09:10:34 +0100
Message-ID: <20160420081034.GA2263@work-vm>
References: <F2CBF3009FA73547804AE4C663CAB28E0415BD6F@shsmsx102.ccr.corp.intel.com>
 <20160324165530-mutt-send-email-mst@redhat.com>
 <F2CBF3009FA73547804AE4C663CAB28E0415C07D@shsmsx102.ccr.corp.intel.com>
 <20160324175503-mutt-send-email-mst@redhat.com>
 <F2CBF3009FA73547804AE4C663CAB28E0415C0F9@shsmsx102.ccr.corp.intel.com>
 <20160324181031-mutt-send-email-mst@redhat.com>
 <20160324174933.GA11662@work-vm>
 <F2CBF3009FA73547804AE4C663CAB28E04181432@shsmsx102.ccr.corp.intel.com>
 <20160419190542.GH2255@work-vm>
 <F2CBF3009FA73547804AE4C663CAB28E041834E8@shsmsx102.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	Wei Yang <richard.weiyang@huawei.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kenel.org" <linux-kernel@vger.kenel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"amit.shah@redhat.com" <amit.shah@redhat.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"mohan_parthasarathy@hpe.com" <mohan_parthasarathy@hpe.com>,
	"jitendra.kolhe@hpe.com" <jitendra.kolhe@hpe.com>,
	"simhan@hpe.com" <simhan@hpe.com>,
	"rkagan@virtuozzo.com" <rkagan@virtuozzo.com>,
	"riel@redhat.com" <riel@redhat.com>
To: "Li, Liang Z" <liang.z.li@intel.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:58919 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754614AbcDTIKl (ORCPT <rfc822;kvm@vger.kernel.org>);
	Wed, 20 Apr 2016 04:10:41 -0400
Content-Disposition: inline
In-Reply-To: <F2CBF3009FA73547804AE4C663CAB28E041834E8@shsmsx102.ccr.corp.intel.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

* Li, Liang Z (liang.z.li@intel.com) wrote:
> > Subject: Re: [RFC Design Doc]Speed up live migration by skipping free pages
> > 
> > * Li, Liang Z (liang.z.li@intel.com) wrote:
> > > Hi Dave,
> > >
> > > I am now working on how post-copy can benefit from skipping free
> > > pages, and I remember you said we should let the destination know
> > > which pages are free, so that it avoids requesting them from the
> > > source.
> > >
> > > We have two solutions:
> > >
> > > a. Send the migration dirty page bitmap to the destination before
> > > post-copy starts, so the destination can decide whether to request a
> > > page or place a zero page by checking the bitmap. The advantage is
> > > that we avoid sending the free pages. The disadvantage is that we
> > > have to send extra data to the destination.
> > >
> > > b. Check each page request on the source side; if the page is not
> > > dirty, send a zero page header to the destination.
> > >
> > > What's your opinion about them?
> > 
> > (b) is certainly simpler - and requires no changes on the destination side or
> > the protocol.
> > If you then decided to add stuff to send the dirty page bit map later you
> > could do.
> > 
> > However, there are some other problems to figure out:
> >    1) The source side quits when it thinks it's sent all pages; when is your
> >        source going to quit?  If it quits while the destination still has
> >        unfulfilled pages then the destination will fail.
> 
> The source quits the same way as before, but before quitting it tells the destination that it is quitting.
> After that, the destination doesn't need to request pages from the source; it just places zero pages. Would that work?

Yes, maybe. The destination side would somehow have to clean up once it has all
the zero pages, but it currently doesn't keep a count or map of which pages
still need to be received.
Actually, perhaps that's easy - when the destination receives the 'quit it's zero'
message from the source, maybe it just turns off userfault; any fresh accesses
would get a zero page.  However, I'm not sure what happens to pages that are
already blocked/waiting for a page - we'd need to check that with Andrea or test it.
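
To make the idea concrete, here is a rough Python model of the destination
side. It is not QEMU code: the class, the method names and the constants are
all made up for illustration, and the handling of already-blocked faults is
an assumption, since that is exactly the part that still needs checking.

```python
# Toy model of the destination side; not QEMU code.
# Every name here (Destination, on_fault, on_source_quit, ...) is hypothetical.

DATA = "data"            # page content arrived from the source
ZERO = "zero"            # page was filled locally with zeros
REQUESTED = "requested"  # page was requested from the source and is pending


class Destination:
    def __init__(self):
        self.placed = {}          # page index -> how the page was satisfied
        self.pending = set()      # faults still waiting on the source
        self.source_quit = False  # set once the 'quit' message has arrived

    def on_fault(self, page):
        """The guest touched `page`; return how the access is satisfied."""
        if page in self.placed:
            return self.placed[page]
        if self.source_quit:
            # Models "userfault turned off": a fresh access just gets zeros.
            self.placed[page] = ZERO
            return ZERO
        self.pending.add(page)
        return REQUESTED

    def on_page_from_source(self, page):
        """A page with real content arrived on the migration stream."""
        self.pending.discard(page)
        self.placed[page] = DATA

    def on_source_quit(self):
        """The source says everything it has not sent is zero."""
        self.source_quit = True
        # ASSUMPTION: faults that are already blocked get resolved with
        # zero pages. Whether that happens in practice is the open question.
        for page in self.pending:
            self.placed[page] = ZERO
        self.pending.clear()
```

In this model a fault before the quit message turns into a request, while a
fault after it is satisfied locally with a zero page and never touches the
source. Note that no count or map of outstanding pages is needed beyond the
set of faults that are already waiting.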

> >    2) I send a 'discard' bitmap of pages for the destination to unmap
> >       just at the change into postcopy; so I'm already sending one bitmap;
> >       this is for pages that have been changed since they were first sent
> >       but not yet resent.
> >       Be careful about how any changes you make interact with the generation
> >       of that bitmap.
> 
> Thanks for the reminder.
> 
> >    3) It's potentially very slow if the destination has to keep requesting
> >       blank pages.
> 
> Yes, really.
> 
> > Essentially what you're suggesting for (a) is a way to send a compressed set
> > of 'page is zero' messages based on a bitmap, and you're worried about the
> > time to send it - which I think is where we started the conversation about
> > time to deal with zeros :-).  Two ways to think of that are:
> 
> All my thoughts are in your words. :)
> 
> >    4) I already send one bitmap - so you're only doubling it in theory;
> >       I originally used a sparse bitmap but the suggestion was it was
> >       more complex than needed and it turned into more of a run-length
> >       encoding.
> >    5) You're worried it would increase the downtime as you send the bitmap;
> >       however if you implement (b) as well as (a) then you can send the
> >       data for (a) after the destination is running and not increase the
> >       downtime.
> 
> Downtime is the main reason I started to consider (b); for a VM with a huge amount of RAM,
> the downtime will become a big problem.  Obviously, (a) is more efficient than (b).

With your idea about sending a 'quit' message to tell the destination the remaining
pages are all zero, I'm not sure that's true - (b) + the quit message sounds like
a good combination.
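
As a rough illustration of that combination on the source side, here is a
small Python model. Again this is not QEMU code; the message names and the
Source class are hypothetical, and it assumes the destination only ever
requests pages it has not already received.

```python
# Toy model of the source side for (b) + the quit message; not QEMU code.
# Every name here (Source, handle_request, MSG_*) is hypothetical.

MSG_PAGE = "page"               # full page content follows
MSG_ZERO = "zero-page"          # header only, destination places zeros
MSG_QUIT = "quit-rest-is-zero"  # nothing left; remaining pages are zero


class Source:
    def __init__(self, to_send):
        # Pages that still hold data the destination needs. Free pages
        # are assumed never to have been put in this set.
        self.to_send = set(to_send)

    def handle_request(self, page):
        """The destination faulted on `page` and asked for it."""
        if page in self.to_send:
            self.to_send.discard(page)
            return (MSG_PAGE, page)
        # Not marked for sending: answer with just a zero-page header.
        return (MSG_ZERO, page)

    def background_send(self):
        """Send one remaining page, or the quit message if none are left."""
        if self.to_send:
            page = min(self.to_send)
            self.to_send.discard(page)
            return (MSG_PAGE, page)
        return (MSG_QUIT, None)
```

With Source(to_send={1, 4}), a request for page 4 gets the real page, a
request for page 9 gets only a zero-page header, and once page 1 has gone
out in the background the next call produces the quit message, after which
the destination no longer has to ask for anything.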

Dave

> 
> 
> > Dave
> > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
