From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [RFC Design Doc]Speed up live migration by skipping free pages
Date: Thu, 24 Mar 2016 18:25:24 +0200
Message-ID: <20160324181031-mutt-send-email-mst@redhat.com>
References: <20160324012424.GB14956@linux-gk3p>
 <20160324090004.GA2230@work-vm>
 <20160324102354.GB2230@work-vm>
 <20160324165530-mutt-send-email-mst@redhat.com>
 <20160324175503-mutt-send-email-mst@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
To: "Li, Liang Z"
Cc: "Dr. David Alan Gilbert", Wei Yang, qemu-devel@nongnu.org,
 kvm@vger.kernel.org, linux-kernel@vger.kernel.org, pbonzini@redhat.com,
 rth@twiddle.net, ehabkost@redhat.com, amit.shah@redhat.com,
 quintela@redhat.com, mohan_parthasarathy@hpe.com, jitendra.kolhe@hpe.com,
 simhan@hpe.com, rkagan@virtuozzo.com, riel@redhat.com

On Thu, Mar 24, 2016 at 04:05:16PM +0000, Li, Liang Z wrote:
>
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst@redhat.com]
> > Sent: Thursday, March 24, 2016 11:57 PM
> > To: Li, Liang Z
> > Cc: Dr. David Alan Gilbert; Wei Yang; qemu-devel@nongnu.org;
> > kvm@vger.kernel.org; linux-kernel@vger.kernel.org; pbonzini@redhat.com;
> > rth@twiddle.net; ehabkost@redhat.com; amit.shah@redhat.com;
> > quintela@redhat.com; mohan_parthasarathy@hpe.com;
> > jitendra.kolhe@hpe.com; simhan@hpe.com; rkagan@virtuozzo.com;
> > riel@redhat.com
> > Subject: Re: [RFC Design Doc]Speed up live migration by skipping free pages
> >
> > On Thu, Mar 24, 2016 at 03:53:25PM +0000, Li, Liang Z wrote:
> > > > > > > Not very complex, we can implement like this:
> > > > > > >
> > > > > > > 1. Set all the bits in the migration_bitmap_rcu->bmap to 1
> > > > > > > 2. Clear all the bits in ram_list.
> > > > > > > dirty_memory[DIRTY_MEMORY_MIGRATION]
> > > > > > > 3. Send the get_free_page_bitmap request
> > > > > > > 4. Start to send pages to destination and check if the
> > > > > > > free_page_bitmap is ready
> > > > > > >    if (is_ready) {
> > > > > > >        filter out the free pages from migration_bitmap_rcu->bmap;
> > > > > > >        migration_bitmap_sync();
> > > > > > >    }
> > > > > > > continue until live migration complete.
> > > > > > >
> > > > > > > Is that right?
> > > > > >
> > > > > > The order I'm trying to understand is something like:
> > > > > >
> > > > > > a) Send the get_free_page_bitmap request
> > > > > > b) Start sending pages
> > > > > > c) Reach the end of memory
> > > > > >    [ is_ready is false - guest hasn't made free map yet ]
> > > > > > d) normal migration_bitmap_sync() at end of first pass
> > > > > > e) Carry on sending dirty pages
> > > > > > f) is_ready is true
> > > > > >    f.1) filter out free pages?
> > > > > >    f.2) migration_bitmap_sync()
> > > > > >
> > > > > > It's f.1 I'm worried about.  If the guest started generating the
> > > > > > free bitmap before (d), then a page marked as 'free' in f.1
> > > > > > might have become dirty before (d), and so (f.2) doesn't set the
> > > > > > dirty bit again, and so we can't filter out pages in f.1.
> > > > > >
> > > > >
> > > > > As you described, the order is incorrect.
> > > > >
> > > > > Liang
> > > >
> > > > So to make it safe, what is required is to make sure no free list is
> > > > outstanding before calling migration_bitmap_sync.
> > > >
> > > > If one is outstanding, filter out pages before calling
> > > > migration_bitmap_sync.
> > > >
> > > > Of course, if we just do it like we normally do with migration, then
> > > > by the time we call migration_bitmap_sync the dirty bitmap is
> > > > completely empty, so there won't be anything to filter out.
> > > >
> > > > One way to address this is to call migration_bitmap_sync in the IO
> > > > handler, while the VCPU is stopped, then make sure to filter out
> > > > pages before the next migration_bitmap_sync.
> > > >
> > > > Another is to start filtering out pages in the IO handler, but make
> > > > sure to flush the queue before calling migration_bitmap_sync.
> > >
> > > It's really complex, maybe we should switch to a simple start: just
> > > skip the free pages in the ram bulk stage and make it asynchronous?
> > >
> > > Liang
> >
> > You mean like your patches do? No, blocking bulk migration until guest
> > response is basically a non-starter.
>
> No, don't wait anymore. Like below (copied from the previous thread)
> --------------------------------------------------------------
> 1. Set all the bits in the migration_bitmap_rcu->bmap to 1
> 2. Clear all the bits in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
> 3. Send the get_free_page_bitmap request
> 4. Start to send pages to destination and check if the free_page_bitmap
>    is ready
>    if (is_ready) {
>        filter out the free pages from migration_bitmap_rcu->bmap;
>        migration_bitmap_sync();
>    }
>    continue until live migration complete.
> ---------------------------------------------------------------
> Can this work?
>
> Liang

Not if you get the ready bit asynchronously like you wrote here, since
is_ready can get set while you are calling migration_bitmap_sync.

As I said previously, to make this work you need to filter out pages
synchronously while the VCPU is stopped, and while free pages from the
list are not yet being used.

Alternatively, prevent getting the free page list from the guest, and
filtering pages out, from racing with migration_bitmap_sync.  For
example, flush the VQ after migration_bitmap_sync.  So:

	lock
	migration_bitmap_sync();
	/* discard replies built against the pre-sync bitmap */
	while (elem = virtqueue_pop) {
		virtqueue_push(elem)
		g_free(elem)
	}
	unlock

while in handle_output:

	lock
	while (elem = virtqueue_pop) {
		list = get_free_list(elem)
		filter_out_free(list)
		virtqueue_push(elem)
		g_free(elem)
	}
	unlock

The lock prevents migration_bitmap_sync from racing against
handle_output.

This way you can actually use ioeventfd for this VQ, so the VCPU won't
be blocked.

I do not think this is so complex, and this way you can add requests for
the guest free bitmap at an arbitrary interval, either in host or in
guest.

For example, add a value that says how often the guest should update the
bitmap; set it to 0 to disable updates after migration is done.  Or,
make the guest resubmit a new bitmap when we consume the old one, and
run handle_output through a periodic timer on the host.
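To make the lock discipline above concrete, here is a minimal C sketch.
Everything in it is an illustrative stand-in, not the actual QEMU API: a
pthread mutex in place of QEMU's locking, vq_pop/vq_push stubs in place
of the real virtqueue calls, and hypothetical get_free_list /
filter_out_free helpers.  It shows only the ordering that matters: both
paths take the same lock, and the sync path drains stale replies inside
its own critical section.

	#include <pthread.h>
	#include <stdlib.h>

	/* Illustrative stand-ins -- not the real QEMU virtio API. */
	typedef struct VQElem VQElem;          /* opaque virtqueue element */
	typedef struct FreeList FreeList;      /* guest-provided free-page list */
	VQElem *vq_pop(void);                  /* returns NULL when queue is empty */
	void vq_push(VQElem *elem);            /* hand the buffer back to the guest */
	FreeList *get_free_list(VQElem *elem); /* decode the guest's reply */
	void filter_out_free(FreeList *list);  /* clear those pages in the bitmap */
	void migration_bitmap_sync(void);

	static pthread_mutex_t free_page_lock = PTHREAD_MUTEX_INITIALIZER;

	/* Migration thread: sync, then drain and discard any replies that
	 * were built against the pre-sync bitmap, all inside one critical
	 * section, so handle_output can never apply a stale list later. */
	void sync_and_flush(void)
	{
	    VQElem *elem;

	    pthread_mutex_lock(&free_page_lock);
	    migration_bitmap_sync();
	    while ((elem = vq_pop()) != NULL) {
	        vq_push(elem);                 /* return the buffer unused */
	        free(elem);
	    }
	    pthread_mutex_unlock(&free_page_lock);
	}

	/* VQ handler -- can be driven by ioeventfd or a periodic timer, so
	 * the VCPU is never blocked.  The same lock serializes it against
	 * the sync-and-flush path above. */
	void handle_output(void)
	{
	    VQElem *elem;

	    pthread_mutex_lock(&free_page_lock);
	    while ((elem = vq_pop()) != NULL) {
	        filter_out_free(get_free_list(elem));
	        vq_push(elem);
	        free(elem);
	    }
	    pthread_mutex_unlock(&free_page_lock);
	}

The key property is that a guest reply is either applied before the sync
(under the lock) or thrown away by the flush; it can never be applied to
a bitmap that was synced after the reply was generated.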
> > --
> > MST