From: "Li, Liang Z" <liang.z.li@intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Wei Yang <richard.weiyang@huawei.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kenel.org" <linux-kernel@vger.kenel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>,
	"amit.shah@redhat.com" <amit.shah@redhat.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"mohan_parthasarathy@hpe.com" <mohan_parthasarathy@hpe.com>,
	"jitendra.kolhe@hpe.com" <jitendra.kolhe@hpe.com>,
	"simhan@hpe.com" <simhan@hpe.com>,
	"rkagan@virtuozzo.com" <rkagan@virtuozzo.com>,
	"riel@redhat.com" <riel@redhat.com>
Subject: RE: [RFC Design Doc]Speed up live migration by skipping free pages
Date: Fri, 25 Mar 2016 01:59:21 +0000	[thread overview]
Message-ID: <F2CBF3009FA73547804AE4C663CAB28E0415C7E9@shsmsx102.ccr.corp.intel.com> (raw)
In-Reply-To: <20160324202738-mutt-send-email-mst@redhat.com>

> > > > > > > > > The order I'm trying to understand is something like:
> > > > > > > > >
> > > > > > > > >     a) Send the get_free_page_bitmap request
> > > > > > > > >     b) Start sending pages
> > > > > > > > >     c) Reach the end of memory
> > > > > > > > >       [ is_ready is false - guest hasn't made free map yet ]
> > > > > > > > >     d) normal migration_bitmap_sync() at end of first pass
> > > > > > > > >     e) Carry on sending dirty pages
> > > > > > > > >     f) is_ready is true
> > > > > > > > >       f.1) filter out free pages?
> > > > > > > > >       f.2) migration_bitmap_sync()
> > > > > > > > >
> > > > > > > > > It's f.1 I'm worried about.  If the guest started
> > > > > > > > > generating the free bitmap before (d), then a page
> > > > > > > > > marked as 'free' in f.1 might have become dirty before
> > > > > > > > > (d), and so (f.2) doesn't set the dirty bit again, which
> > > > > > > > > means we can't filter out pages in f.1.
> > > > > > > > >
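For illustration, the problematic f.1 step would be roughly the following
(a sketch only; test_bit/clear_bit stand in for whatever bitmap helpers are
actually used, and the surrounding variables are assumed to exist):

    /* UNSAFE if free_bitmap was generated before the sync in (d):
     * a page the guest reported as free and then wrote to again before
     * (d) already had its dirty bit folded into migration_bitmap by the
     * (d) sync.  Clearing the bit here loses that write, and the sync
     * in (f.2) will not report it again. */
    for (pfn = 0; pfn < max_pfn; pfn++) {
        if (test_bit(pfn, free_bitmap)) {
            clear_bit(pfn, migration_bitmap);   /* f.1 */
        }
    }
    migration_bitmap_sync();                    /* f.2 */
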
> > > > > > > >
> > > > > > > > As you described, the order is incorrect.
> > > > > > > >
> > > > > > > > Liang
> > > > > > >
> > > > > > >
> > > > > > > So to make it safe, what is required is to make sure no free
> > > > > > > list is outstanding before calling migration_bitmap_sync.
> > > > > > >
> > > > > > > If one is outstanding, filter out pages before calling
> > > > > > > migration_bitmap_sync.
> > > > > > >
> > > > > > > Of course, if we just do it like we normally do with
> > > > > > > migration, then by the time we call migration_bitmap_sync the
> > > > > > > dirty bitmap is completely empty, so there won't be anything to
> > > > > > > filter out.
> > > > > > >
> > > > > > > One way to address this is to call migration_bitmap_sync in the
> > > > > > > IO handler, while the VCPU is stopped, then make sure to filter
> > > > > > > out pages before the next migration_bitmap_sync.
> > > > > > >
> > > > > > > Another is to start filtering out pages in the IO handler, but
> > > > > > > make sure to flush the queue before calling
> > > > > > > migration_bitmap_sync.
> > > > > > >
> > > > > >
> > > > > > It's really complex; maybe we should switch to a simpler start:
> > > > > > just skip the free pages in the RAM bulk stage and make it
> > > > > > asynchronous?
> > > > > >
> > > > > > Liang
> > > > >
> > > > > You mean like your patches do? No, blocking bulk migration until
> > > > > guest response is basically a non-starter.
> > > > >
> > > >
> > > > No, don't wait anymore. Like below (copied from a previous thread)
> > > > --------------------------------------------------------------
> > > > 1. Set all the bits in migration_bitmap_rcu->bmap to 1
> > > > 2. Clear all the bits in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
> > > > 3. Send the get_free_page_bitmap request
> > > > 4. Start to send pages to the destination and check if the
> > > >    free_page_bitmap is ready
> > > >    if (is_ready) {
> > > >        filter out the free pages from migration_bitmap_rcu->bmap;
> > > >        migration_bitmap_sync();
> > > >    }
> > > >    continue until live migration completes.
> > > > ---------------------------------------------------------------
> > > > Can this work?
> > > >
> > > > Liang
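In code, that flow would look roughly like this (a sketch only; helper names
such as free_page_bitmap_ready() and filter_out_free() are placeholders, not
existing QEMU functions):

    bitmap_fill(migration_bitmap, max_pfn);          /* 1. mark all pages dirty   */
    clear_dirty_memory_migration_bits();             /* 2. placeholder            */
    request_free_page_bitmap();                      /* 3. async, do not wait     */

    while (!migration_is_complete()) {               /* 4. keep streaming pages   */
        send_next_dirty_pages();
        if (free_page_bitmap_ready()) {              /* response arrived?         */
            filter_out_free(migration_bitmap, free_page_bitmap);
            migration_bitmap_sync();
        }
    }
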
> > >
> > > Not if you get the ready bit asynchronously like you wrote here,
> > > since is_ready can get set while you are calling migration_bitmap_sync.
> > >
> > > As I said previously,
> > > to make this work you need to filter out synchronously while the VCPU
> > > is stopped and while free pages from the list are not being used.
> > >
> > > Alternatively, prevent getting the free page list from the guest and
> > > filtering it out from racing with migration_bitmap_sync.
> > >
> > > For example, flush the VQ after migration_bitmap_sync.
> > > So:
> > >
> > >     lock
> > >     migration_bitmap_sync();
> > >     while (elem = virtqueue_pop) {
> > >         virtqueue_push(elem)
> > >         g_free(elem)
> > >     }
> > >     unlock
> > >
> > >
> > > while in handle_output
> > >
> > >     lock
> > >     while (elem = virtqueue_pop) {
> > >         list = get_free_list(elem)
> > >         filter_out_free(list)
> > >         virtqueue_push(elem)
> > >         free(elem)
> > >     }
> > >     unlock
> > >
> > >
> > > lock prevents migration_bitmap_sync from racing against
> > > handle_output
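Concretely, the lock could just be a QemuMutex shared by the migration thread
and the virtqueue handler; a sketch only, with the virtqueue plumbing
simplified into placeholder helpers:

    static QemuMutex free_page_lock;    /* hypothetical; init once with qemu_mutex_init() */

    /* migration thread */
    qemu_mutex_lock(&free_page_lock);
    migration_bitmap_sync();
    drain_free_page_vq();               /* pop and push back any pending (now stale) elems */
    qemu_mutex_unlock(&free_page_lock);

    /* virtqueue handler (handle_output) */
    qemu_mutex_lock(&free_page_lock);
    while ((elem = pop_free_page_elem()) != NULL) {
        filter_out_free(get_free_list(elem));   /* clear free pages from the migration bitmap */
        push_and_free(elem);
    }
    qemu_mutex_unlock(&free_page_lock);
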
> >
> > I think the easier way is just to ignore the guest's free list response
> > if it comes back after the first pass.
> >
> > Dave
> 
> That's a subset of course - after the first pass == after
> migration_bitmap_sync.
> 
> But it's really nasty - for example, how do you know it's the response from
> this migration round and not a previous one?

It's easy: adding a request and response ID can solve this issue.
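For example, something like this on the host side (sketch only; the field and
helper names are made up):

    /* host side */
    static uint32_t free_page_req_id;

    static void send_free_page_request(void)
    {
        free_page_req_id++;                  /* new ID for this request   */
        push_request_to_guest(free_page_req_id);
    }

    static void handle_free_page_response(uint32_t id, unsigned long *bitmap)
    {
        if (id != free_page_req_id) {
            return;                          /* stale response, ignore it */
        }
        filter_out_free(bitmap);
    }
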

> It is really better to just keep things orthogonal and not introduce arbitrary
> limitations.
> 
> 
> For example, with post-copy there's no first pass, and it can still benefit from
> this optimization.
> 

Leave this to Dave ...

Liang

> 
> > >
> > >
> > > This way you can actually use ioeventfd for this VQ so VCPU won't be
> > > blocked.
> > >
> > > I do not think this is so complex, and this way you can add requests
> > > for the guest free bitmap at an arbitrary interval, either in the host
> > > or in the guest.
> > >
> > > For example, add a value that says how often the guest should update
> > > the bitmap; set it to 0 to disable updates after migration is done.
> > >
> > > Or, make the guest resubmit a new one when we consume the old one, and
> > > run handle_output through a periodic timer on the host.
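For the periodic-timer variant, a rough host-side sketch (the callback, the
interval, the migration check, and the opaque pointer are placeholders; the
timer API usage is schematic):

    static QEMUTimer *free_page_timer;
    static int64_t poll_interval_ms = 100;           /* made-up default */

    static void free_page_poll(void *opaque)
    {
        handle_free_page_output(opaque);             /* consume a new bitmap, if any */
        if (migration_still_running()) {
            timer_mod(free_page_timer,
                      qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + poll_interval_ms);
        }
    }

    /* at migration start */
    free_page_timer = timer_new_ms(QEMU_CLOCK_REALTIME, free_page_poll, dev);
    timer_mod(free_page_timer,
              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + poll_interval_ms);
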
> > >
> > >
> > > > > --
> > > > > MST
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Thread overview: 112+ messages
2016-03-22  7:43 [RFC Design Doc]Speed up live migration by skipping free pages Liang Li
2016-03-22  7:43 ` [Qemu-devel] " Liang Li
2016-03-22 10:11 ` Michael S. Tsirkin
2016-03-22 10:11   ` [Qemu-devel] " Michael S. Tsirkin
2016-03-23  6:05   ` Li, Liang Z
2016-03-23  6:05     ` [Qemu-devel] " Li, Liang Z
2016-03-23 14:08     ` Michael S. Tsirkin
2016-03-23 14:08       ` [Qemu-devel] " Michael S. Tsirkin
2016-03-24  1:19       ` Li, Liang Z
2016-03-24  1:19         ` [Qemu-devel] " Li, Liang Z
2016-03-24  9:48         ` Michael S. Tsirkin
2016-03-24  9:48           ` [Qemu-devel] " Michael S. Tsirkin
2016-03-24 10:16           ` Li, Liang Z
2016-03-24 10:16             ` [Qemu-devel] " Li, Liang Z
2016-03-24 10:29             ` Michael S. Tsirkin
2016-03-24 10:29               ` [Qemu-devel] " Michael S. Tsirkin
2016-03-24 14:33               ` Li, Liang Z
2016-03-24 14:33                 ` [Qemu-devel] " Li, Liang Z
2016-03-24 14:44                 ` Michael S. Tsirkin
2016-03-24 14:44                   ` [Qemu-devel] " Michael S. Tsirkin
2016-03-24 15:16                   ` Li, Liang Z
2016-03-24 15:16                     ` [Qemu-devel] " Li, Liang Z
2016-03-24 15:18                     ` Paolo Bonzini
2016-03-24 15:18                       ` [Qemu-devel] " Paolo Bonzini
2016-03-24 15:25                       ` Li, Liang Z
2016-03-24 15:25                         ` [Qemu-devel] " Li, Liang Z
2016-03-24 15:27                     ` Michael S. Tsirkin
2016-03-24 15:27                       ` [Qemu-devel] " Michael S. Tsirkin
2016-03-24 15:39                       ` Li, Liang Z
2016-03-24 15:39                         ` [Qemu-devel] " Li, Liang Z
2016-03-24 15:47                         ` Paolo Bonzini
2016-03-24 15:47                           ` [Qemu-devel] " Paolo Bonzini
2016-03-24 15:59                           ` Li, Liang Z
2016-03-24 15:59                             ` [Qemu-devel] " Li, Liang Z
2016-03-22 19:05 ` Dr. David Alan Gilbert
2016-03-22 19:05   ` [Qemu-devel] " Dr. David Alan Gilbert
2016-03-23  6:48   ` Li, Liang Z
2016-03-23  6:48     ` [Qemu-devel] " Li, Liang Z
2016-03-24  1:24     ` Wei Yang
2016-03-24  1:24       ` [Qemu-devel] " Wei Yang
2016-03-24  9:00       ` Dr. David Alan Gilbert
2016-03-24  9:00         ` [Qemu-devel] " Dr. David Alan Gilbert
2016-03-24 10:09         ` Li, Liang Z
2016-03-24 10:09           ` [Qemu-devel] " Li, Liang Z
2016-03-24 10:23           ` Dr. David Alan Gilbert
2016-03-24 10:23             ` [Qemu-devel] " Dr. David Alan Gilbert
2016-03-24 14:50             ` Li, Liang Z
2016-03-24 14:50               ` [Qemu-devel] " Li, Liang Z
2016-03-24 15:11               ` Michael S. Tsirkin
2016-03-24 15:11                 ` [Qemu-devel] " Michael S. Tsirkin
2016-03-24 15:53                 ` Li, Liang Z
2016-03-24 15:53                   ` [Qemu-devel] " Li, Liang Z
2016-03-24 15:56                   ` Michael S. Tsirkin
2016-03-24 15:56                     ` [Qemu-devel] " Michael S. Tsirkin
2016-03-24 16:05                     ` Li, Liang Z
2016-03-24 16:05                       ` [Qemu-devel] " Li, Liang Z
2016-03-24 16:25                       ` Michael S. Tsirkin
2016-03-24 16:25                         ` [Qemu-devel] " Michael S. Tsirkin
2016-03-24 17:49                         ` Dr. David Alan Gilbert
2016-03-24 17:49                           ` [Qemu-devel] " Dr. David Alan Gilbert
2016-03-24 22:16                           ` Michael S. Tsirkin
2016-03-24 22:16                             ` [Qemu-devel] " Michael S. Tsirkin
2016-03-25  1:59                             ` Li, Liang Z [this message]
2016-03-25  1:59                               ` Li, Liang Z
2016-03-25  1:32                           ` Li, Liang Z
2016-03-25  1:32                             ` [Qemu-devel] " Li, Liang Z
2016-04-18 11:08                           ` Li, Liang Z
2016-04-18 11:08                             ` [Qemu-devel] " Li, Liang Z
2016-04-18 11:29                             ` Michael S. Tsirkin
2016-04-18 11:29                               ` [Qemu-devel] " Michael S. Tsirkin
2016-04-18 14:36                               ` Li, Liang Z
2016-04-18 14:36                                 ` [Qemu-devel] " Li, Liang Z
2016-04-18 15:38                                 ` Michael S. Tsirkin
2016-04-18 15:38                                   ` [Qemu-devel] " Michael S. Tsirkin
2016-04-19  2:20                                   ` Li, Liang Z
2016-04-19  2:20                                     ` [Qemu-devel] " Li, Liang Z
2016-04-19 19:12                               ` Dr. David Alan Gilbert
2016-04-19 19:12                                 ` [Qemu-devel] " Dr. David Alan Gilbert
2016-04-25 10:56                                 ` Michael S. Tsirkin
2016-04-25 10:56                                   ` [Qemu-devel] " Michael S. Tsirkin
2016-04-19 19:05                             ` Dr. David Alan Gilbert
2016-04-19 19:05                               ` [Qemu-devel] " Dr. David Alan Gilbert
2016-04-20  3:22                               ` Li, Liang Z
2016-04-20  3:22                                 ` [Qemu-devel] " Li, Liang Z
2016-04-20  8:10                                 ` Dr. David Alan Gilbert
2016-04-20  8:10                                   ` [Qemu-devel] " Dr. David Alan Gilbert
2016-03-25  1:32                         ` Li, Liang Z
2016-03-25  1:32                           ` [Qemu-devel] " Li, Liang Z
2016-04-01 10:54   ` Amit Shah
2016-04-01 10:54     ` [Qemu-devel] " Amit Shah
2016-04-05  1:49     ` Li, Liang Z
2016-04-05  1:49       ` [Qemu-devel] " Li, Liang Z
2016-03-23  1:37 ` Wei Yang
2016-03-23  1:37   ` [Qemu-devel] " Wei Yang
2016-03-23  7:18   ` Li, Liang Z
2016-03-23  7:18     ` [Qemu-devel] " Li, Liang Z
2016-03-23  9:46     ` Wei Yang
2016-03-23  9:46       ` [Qemu-devel] " Wei Yang
2016-03-23 14:35       ` Li, Liang Z
2016-03-23 14:35         ` [Qemu-devel] " Li, Liang Z
2016-03-24  0:52         ` Wei Yang
2016-03-24  0:52           ` [Qemu-devel] " Wei Yang
2016-03-24  1:32           ` Li, Liang Z
2016-03-24  1:32             ` [Qemu-devel] " Li, Liang Z
2016-03-24  1:56             ` Wei Yang
2016-03-24  1:56               ` [Qemu-devel] " Wei Yang
2016-03-23 16:53     ` Eric Blake
2016-03-23 16:53       ` Eric Blake
2016-03-23 21:41       ` Wei Yang
2016-03-23 21:41         ` Wei Yang
2016-03-24  1:23       ` Li, Liang Z
2016-03-24  1:23         ` Li, Liang Z
