* [RFC qemu 0/4] A PV solution for live migration optimization
From: Liang Li @ 2016-03-03 10:44 UTC (permalink / raw)
  To: quintela, amit.shah, qemu-devel, linux-kernel
  Cc: ehabkost, kvm, mst, Liang Li, dgilbert, virtualization, linux-mm,
	pbonzini, akpm, rth

The current QEMU live migration implementation marks all of the
guest's RAM pages as dirty in the ram bulk stage; all of these pages
will be processed, and that takes quite a lot of CPU cycles.

From the guest's point of view, it doesn't care about the content of
free pages. We can make use of this fact and skip processing the free
pages in the ram bulk stage; this saves a lot of CPU cycles, reduces
the network traffic significantly, and noticeably speeds up the live
migration process.

This patch set is the QEMU side implementation.

The virtio-balloon is extended so that QEMU can get the free pages
information from the guest through virtio.

After getting the free pages information (a bitmap), QEMU can use it
to filter out the guest's free pages in the ram bulk stage. This makes
the live migration process much more efficient.
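
As a rough illustration of that filtering step (not the code from this
series; the helper name and the one-bit-per-page layout are assumptions),
the bulk-stage filter amounts to a bitwise AND-NOT over the two bitmaps:

#include <stddef.h>

/* Sketch only: clear migration-bitmap bits for pages the guest reports
 * as free, so the ram bulk stage never processes or sends them.
 * 'migration_bitmap' and 'free_pages_bitmap' hold one bit per guest
 * page; both names are illustrative, not identifiers from the series. */
static void filter_out_guest_free_pages(unsigned long *migration_bitmap,
                                        const unsigned long *free_pages_bitmap,
                                        size_t nr_pages)
{
    size_t bits_per_long = 8 * sizeof(unsigned long);
    size_t nr_longs = (nr_pages + bits_per_long - 1) / bits_per_long;
    size_t i;

    for (i = 0; i < nr_longs; i++) {
        /* A page is sent in the bulk stage only if the guest did not
         * report it as free. */
        migration_bitmap[i] &= ~free_pages_bitmap[i];
    }
}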

This RFC version doesn't take post-copy and RDMA into consideration;
maybe both of them can benefit from this PV solution with some extra
modifications.

Performance data
================

Test environment:

CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Host RAM: 64GB
Host Linux Kernel:  4.2.0           Host OS: CentOS 7.1
Guest Linux Kernel:  4.5-rc6         Guest OS: CentOS 6.6
Network:  X540-AT2 with 10 Gigabit connection
Guest RAM: 8GB

Case 1: Idle guest just boots:
============================================
                    | original  |    pv
--------------------------------------------
total time(ms)      |    1894   |   421
--------------------------------------------
transferred ram(KB) |   398017  |  353242
============================================


Case 2: The guest has run some memory-consuming workload, and the
workload was terminated just before live migration.
============================================
                    | original  |    pv
--------------------------------------------
total time(ms)      |   7436    |   552
--------------------------------------------
transferred ram(KB) |  8146291  |  361375
============================================

Liang Li (4):
  pc: Add code to get the lowmem from PCMachineState
  virtio-balloon: Add a new feature to balloon device
  migration: not set migration bitmap in setup stage
  migration: filter out guest's free pages in ram bulk stage

 balloon.c                                       | 30 ++++++++-
 hw/i386/pc.c                                    |  5 ++
 hw/i386/pc_piix.c                               |  1 +
 hw/i386/pc_q35.c                                |  1 +
 hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
 include/hw/i386/pc.h                            |  3 +-
 include/hw/virtio/virtio-balloon.h              | 17 +++++-
 include/standard-headers/linux/virtio_balloon.h |  1 +
 include/sysemu/balloon.h                        | 10 ++-
 migration/ram.c                                 | 64 +++++++++++++++----
 10 files changed, 195 insertions(+), 18 deletions(-)

-- 
1.8.3.1


* RE: [RFC qemu 0/4] A PV solution for live migration optimization
From: Li, Liang Z @ 2016-03-16  1:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Michael S. Tsirkin, Amit Shah, quintela, qemu-devel,
	linux-kernel, akpm, pbonzini, rth, ehabkost, linux-mm,
	virtualization, kvm, mohan_parthasarathy, jitendra.kolhe, simhan

> > > > > >   I'm just catching back up on this thread; so without
> > > > > > reference to any particular previous mail in the thread.
> > > > > >
> > > > > >   1) How many of the free pages do we tell the host about?
> > > > > >      Your main change is telling the host about all the
> > > > > >      free pages.
> > > > >
> > > > > Yes, all the guest's free pages.
> > > > >
> > > > > >      If we tell the host about all the free pages, then we might
> > > > > >      end up needing to allocate more pages and update the host
> > > > > >      with pages we now want to use; that would have to wait for the
> > > > > >      host to acknowledge that use of these pages, since if we don't
> > > > > >      wait for it then it might have skipped migrating a page we
> > > > > >      just started using (I don't understand how your series solves that).
> > > > > >      So the guest probably needs to keep some free pages - how
> many?
> > > > >
> > > > > Actually, there is no need to care about whether the free pages
> > > > > will be
> > > used by the host.
> > > > > We only care about some of the free pages we get reused by the
> > > > > guest,
> > > right?
> > > > >
> > > > > The dirty page logging can be used to solve this, starting the
> > > > > dirty page logging before getting the free pages informant from guest.
> > > > > Even some of the free pages are modified by the guest during the
> > > > > process of getting the free pages information, these modified
> > > > > pages will
> > > be traced by the dirty page logging mechanism. So in the following
> > > migration_bitmap_sync() function.
> > > > > The pages in the free pages bitmap, but latter was modified,
> > > > > will be reset to dirty. We won't omit any dirtied pages.
> > > > >
> > > > > So, guest doesn't need to keep any free pages.
> > > >
> > > > OK, yes, that works; so we do:
> > > >   * enable dirty logging
> > > >   * ask guest for free pages
> > > >   * initialise the migration bitmap as everything-free
> > > >   * then later we do the normal sync-dirty bitmap stuff and it all just
> works.
> > > >
> > > > That's nice and simple.
> > >
> > > This works once, sure. But there's an issue is that you have to
> > > defer migration until you get the free page list, and this only
> > > works once. So you end up with heuristics about how long to wait.
> > >
> > > Instead I propose:
> > >
> > > - mark all pages dirty as we do now.
> > >
> > > - at start of migration, start tracking dirty
> > >   pages in kvm, and tell guest to start tracking free pages
> > >
> > > we can now introduce any kind of delay, for example wait for ack
> > > from guest, or do whatever else, or even just start migrating pages
> > >
> > > - repeatedly:
> > > 	- get list of free pages from guest
> > > 	- clear them in migration bitmap
> > > 	- get dirty list from kvm
> > >
> > > - at end of migration, stop tracking writes in kvm,
> > >   and tell guest to stop tracking free pages
> >
> > I had thought of filtering out the free pages in each migration bitmap
> synchronization.
> > The advantage is we can skip process as many free pages as possible. Not
> just once.
> > The disadvantage is that we should change the current memory
> > management code to track the free pages, instead of traversing the free
> page list to construct the free pages bitmap, to reduce the overhead to get
> the free pages bitmap.
> > I am not sure the if the Kernel people would like it.
> >
> > If keeping the traversing mechanism, because of the overhead, maybe it's
> not worth to filter out the free pages repeatedly.
> 
> Well, Michael's idea of not waiting for the dirty bitmap to be filled does make
> that idea of constnatly using the free-bitmap better.
> 

Not waiting is a good idea.
Actually, we could shorten the waiting time by pre-allocating the free pages bitmap
and updating it when the guest allocates/frees pages. That requires modifying the
mm-related code; I don't know whether the kernel people would like this.
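
A rough sketch of that pre-allocated, incrementally updated bitmap,
assuming hypothetical hooks next to the buddy allocator's own
bookkeeping (none of these symbols exist in the kernel today):

#include <linux/bitmap.h>

/* Hypothetical sketch: keep a bit-per-pfn "free" bitmap up to date from
 * the allocator paths, so a migration request can hand it over without
 * walking the buddy free lists each time. */
static unsigned long *guest_free_page_bitmap;  /* 1 bit per pfn, pre-allocated */

static inline void mark_pages_free(unsigned long pfn, unsigned int order)
{
    bitmap_set(guest_free_page_bitmap, pfn, 1UL << order);
}

static inline void mark_pages_used(unsigned long pfn, unsigned int order)
{
    bitmap_clear(guest_free_page_bitmap, pfn, 1UL << order);
}

/* The calls would sit where pages enter and leave the free lists,
 * under zone->lock, which is the mm change being discussed here. */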

> In that case, is it easier if something (guest/host?) allocates some memory in
> the guests physical RAM space and just points the host to it, rather than
> having an explicit 'send'.
> 

Good idea too.

Liang
> Dave


* Re: [RFC qemu 0/4] A PV solution for live migration optimization
From: Dr. David Alan Gilbert @ 2016-03-15 19:55 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: Michael S. Tsirkin, Amit Shah, quintela, qemu-devel,
	linux-kernel, akpm, pbonzini, rth, ehabkost, linux-mm,
	virtualization, kvm, mohan_parthasarathy, jitendra.kolhe, simhan

* Li, Liang Z (liang.z.li@intel.com) wrote:
> > On Mon, Mar 14, 2016 at 05:03:34PM +0000, Dr. David Alan Gilbert wrote:
> > > * Li, Liang Z (liang.z.li@intel.com) wrote:
> > > > >
> > > > > Hi,
> > > > >   I'm just catching back up on this thread; so without reference
> > > > > to any particular previous mail in the thread.
> > > > >
> > > > >   1) How many of the free pages do we tell the host about?
> > > > >      Your main change is telling the host about all the
> > > > >      free pages.
> > > >
> > > > Yes, all the guest's free pages.
> > > >
> > > > >      If we tell the host about all the free pages, then we might
> > > > >      end up needing to allocate more pages and update the host
> > > > >      with pages we now want to use; that would have to wait for the
> > > > >      host to acknowledge that use of these pages, since if we don't
> > > > >      wait for it then it might have skipped migrating a page we
> > > > >      just started using (I don't understand how your series solves that).
> > > > >      So the guest probably needs to keep some free pages - how many?
> > > >
> > > > Actually, there is no need to care about whether the free pages will be
> > used by the host.
> > > > We only care about some of the free pages we get reused by the guest,
> > right?
> > > >
> > > > The dirty page logging can be used to solve this, starting the dirty
> > > > page logging before getting the free pages informant from guest.
> > > > Even some of the free pages are modified by the guest during the
> > > > process of getting the free pages information, these modified pages will
> > be traced by the dirty page logging mechanism. So in the following
> > migration_bitmap_sync() function.
> > > > The pages in the free pages bitmap, but latter was modified, will be
> > > > reset to dirty. We won't omit any dirtied pages.
> > > >
> > > > So, guest doesn't need to keep any free pages.
> > >
> > > OK, yes, that works; so we do:
> > >   * enable dirty logging
> > >   * ask guest for free pages
> > >   * initialise the migration bitmap as everything-free
> > >   * then later we do the normal sync-dirty bitmap stuff and it all just works.
> > >
> > > That's nice and simple.
> > 
> > This works once, sure. But there's an issue is that you have to defer migration
> > until you get the free page list, and this only works once. So you end up with
> > heuristics about how long to wait.
> > 
> > Instead I propose:
> > 
> > - mark all pages dirty as we do now.
> > 
> > - at start of migration, start tracking dirty
> >   pages in kvm, and tell guest to start tracking free pages
> > 
> > we can now introduce any kind of delay, for example wait for ack from guest,
> > or do whatever else, or even just start migrating pages
> > 
> > - repeatedly:
> > 	- get list of free pages from guest
> > 	- clear them in migration bitmap
> > 	- get dirty list from kvm
> > 
> > - at end of migration, stop tracking writes in kvm,
> >   and tell guest to stop tracking free pages
> 
> I had thought of filtering out the free pages in each migration bitmap synchronization. 
> The advantage is we can skip process as many free pages as possible. Not just once.
> The disadvantage is that we should change the current memory management code to track the free pages,
> instead of traversing the free page list to construct the free pages bitmap, to reduce the overhead to get the free pages bitmap.
> I am not sure the if the Kernel people would like it.
> 
> If keeping the traversing mechanism, because of the overhead, maybe it's not worth to filter out the free pages repeatedly.

Well, Michael's idea of not waiting for the dirty
bitmap to be filled does make that idea of constantly
using the free-bitmap better.

In that case, is it easier if something (guest/host?)
allocates some memory in the guest's physical RAM space
and just points the host to it, rather than having an
explicit 'send'?
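
A hedged sketch of what "pointing the host at guest memory" could look
like on the QEMU side, assuming the guest advertised the bitmap's
guest-physical address once (how that address is negotiated, e.g. via a
balloon config field, is an assumption; cpu_physical_memory_map() is an
existing QEMU helper):

#include "qemu/osdep.h"       /* hwaddr, DIV_ROUND_UP */
#include "exec/cpu-common.h"  /* cpu_physical_memory_map() */

/* Sketch: rather than an explicit 'send', the guest publishes the
 * guest-physical address of a bitmap it keeps up to date, and QEMU maps
 * that region and reads it directly whenever it wants fresh data. */
static unsigned long *map_guest_free_bitmap(hwaddr free_bitmap_gpa,
                                            size_t nr_pages)
{
    hwaddr len = DIV_ROUND_UP(nr_pages, 8);   /* one bit per guest page */

    /* Returns a host pointer into guest RAM, or NULL if the region is
     * not directly mappable; the caller filters the migration bitmap
     * against it and later calls cpu_physical_memory_unmap(). */
    return cpu_physical_memory_map(free_bitmap_gpa, &len, false);
}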

Dave

> Liang
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* RE: [RFC qemu 0/4] A PV solution for live migration optimization
From: Li, Liang Z @ 2016-03-15 11:11 UTC (permalink / raw)
  To: Michael S. Tsirkin, Dr. David Alan Gilbert
  Cc: Amit Shah, quintela, qemu-devel, linux-kernel, akpm, pbonzini,
	rth, ehabkost, linux-mm, virtualization, kvm,
	mohan_parthasarathy, jitendra.kolhe, simhan

> On Mon, Mar 14, 2016 at 05:03:34PM +0000, Dr. David Alan Gilbert wrote:
> > * Li, Liang Z (liang.z.li@intel.com) wrote:
> > > >
> > > > Hi,
> > > >   I'm just catching back up on this thread; so without reference
> > > > to any particular previous mail in the thread.
> > > >
> > > >   1) How many of the free pages do we tell the host about?
> > > >      Your main change is telling the host about all the
> > > >      free pages.
> > >
> > > Yes, all the guest's free pages.
> > >
> > > >      If we tell the host about all the free pages, then we might
> > > >      end up needing to allocate more pages and update the host
> > > >      with pages we now want to use; that would have to wait for the
> > > >      host to acknowledge that use of these pages, since if we don't
> > > >      wait for it then it might have skipped migrating a page we
> > > >      just started using (I don't understand how your series solves that).
> > > >      So the guest probably needs to keep some free pages - how many?
> > >
> > > Actually, there is no need to care about whether the free pages will be
> used by the host.
> > > We only care about some of the free pages we get reused by the guest,
> right?
> > >
> > > The dirty page logging can be used to solve this, starting the dirty
> > > page logging before getting the free pages informant from guest.
> > > Even some of the free pages are modified by the guest during the
> > > process of getting the free pages information, these modified pages will
> be traced by the dirty page logging mechanism. So in the following
> migration_bitmap_sync() function.
> > > The pages in the free pages bitmap, but latter was modified, will be
> > > reset to dirty. We won't omit any dirtied pages.
> > >
> > > So, guest doesn't need to keep any free pages.
> >
> > OK, yes, that works; so we do:
> >   * enable dirty logging
> >   * ask guest for free pages
> >   * initialise the migration bitmap as everything-free
> >   * then later we do the normal sync-dirty bitmap stuff and it all just works.
> >
> > That's nice and simple.
> 
> This works once, sure. But there's an issue is that you have to defer migration
> until you get the free page list, and this only works once. So you end up with
> heuristics about how long to wait.
> 
> Instead I propose:
> 
> - mark all pages dirty as we do now.
> 
> - at start of migration, start tracking dirty
>   pages in kvm, and tell guest to start tracking free pages
> 
> we can now introduce any kind of delay, for example wait for ack from guest,
> or do whatever else, or even just start migrating pages
> 
> - repeatedly:
> 	- get list of free pages from guest
> 	- clear them in migration bitmap
> 	- get dirty list from kvm
> 
> - at end of migration, stop tracking writes in kvm,
>   and tell guest to stop tracking free pages

I had thought of filtering out the free pages in each migration bitmap synchronization.
The advantage is that we can skip processing as many free pages as possible, not just once.
The disadvantage is that we would have to change the current memory management code to track the free pages,
instead of traversing the free page list to construct the free pages bitmap, in order to reduce the overhead of getting the free pages bitmap.
I am not sure whether the kernel people would like that.

If we keep the traversing mechanism, then because of the overhead, maybe it's not worth filtering out the free pages repeatedly.
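
For reference, a rough sketch of the traversing approach being weighed
here: walk the buddy free lists once per request and set a bit for every
free page. The structure follows the kernel's buddy allocator, but treat
it as a sketch rather than the actual guest-side patch:

#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/bitmap.h>

/* Sketch: one-shot construction of a free-page bitmap by walking the
 * buddy free lists.  Simplified; a real version has to worry about
 * memory holes and how long zone->lock is held. */
static void build_free_page_bitmap(unsigned long *bitmap)
{
    struct zone *zone;
    unsigned int order, mt;
    unsigned long flags;

    for_each_populated_zone(zone) {
        spin_lock_irqsave(&zone->lock, flags);
        for (order = 0; order < MAX_ORDER; order++) {
            for (mt = 0; mt < MIGRATE_TYPES; mt++) {
                struct page *page;

                /* free pages sit on the per-order, per-migratetype
                 * lists, linked through page->lru */
                list_for_each_entry(page,
                                    &zone->free_area[order].free_list[mt],
                                    lru) {
                    bitmap_set(bitmap, page_to_pfn(page), 1UL << order);
                }
            }
        }
        spin_unlock_irqrestore(&zone->lock, flags);
    }
}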

Liang


* Re: [RFC qemu 0/4] A PV solution for live migration optimization
From: Michael S. Tsirkin @ 2016-03-15 10:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Li, Liang Z, Amit Shah, quintela, qemu-devel, linux-kernel, akpm,
	pbonzini, rth, ehabkost, linux-mm, virtualization, kvm,
	mohan_parthasarathy, jitendra.kolhe, simhan

On Mon, Mar 14, 2016 at 05:03:34PM +0000, Dr. David Alan Gilbert wrote:
> * Li, Liang Z (liang.z.li@intel.com) wrote:
> > > 
> > > Hi,
> > >   I'm just catching back up on this thread; so without reference to any
> > > particular previous mail in the thread.
> > > 
> > >   1) How many of the free pages do we tell the host about?
> > >      Your main change is telling the host about all the
> > >      free pages.
> > 
> > Yes, all the guest's free pages.
> > 
> > >      If we tell the host about all the free pages, then we might
> > >      end up needing to allocate more pages and update the host
> > >      with pages we now want to use; that would have to wait for the
> > >      host to acknowledge that use of these pages, since if we don't
> > >      wait for it then it might have skipped migrating a page we
> > >      just started using (I don't understand how your series solves that).
> > >      So the guest probably needs to keep some free pages - how many?
> > 
> > Actually, there is no need to care about whether the free pages will be used by the host.
> > We only care about some of the free pages we get reused by the guest, right?
> > 
> > The dirty page logging can be used to solve this, starting the dirty page logging before getting
> > the free pages informant from guest. Even some of the free pages are modified by the guest
> > during the process of getting the free pages information, these modified pages will be traced
> > by the dirty page logging mechanism. So in the following migration_bitmap_sync() function.
> > The pages in the free pages bitmap, but latter was modified, will be reset to dirty. We won't
> > omit any dirtied pages.
> > 
> > So, guest doesn't need to keep any free pages.
> 
> OK, yes, that works; so we do:
>   * enable dirty logging
>   * ask guest for free pages
>   * initialise the migration bitmap as everything-free
>   * then later we do the normal sync-dirty bitmap stuff and it all just works.
> 
> That's nice and simple.

This works once, sure. But there's an issue: you have
to defer migration until you get the free page list,
and this only works once. So you end up with heuristics
about how long to wait.

Instead I propose:

- mark all pages dirty as we do now.

- at start of migration, start tracking dirty
  pages in kvm, and tell guest to start tracking free pages

we can now introduce any kind of delay, for
example wait for ack from guest, or do whatever else,
or even just start migrating pages

- repeatedly:
	- get list of free pages from guest
	- clear them in migration bitmap
	- get dirty list from kvm

- at end of migration, stop tracking writes in kvm,
  and tell guest to stop tracking free pages
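
A compact sketch of that loop from the migration thread's point of view;
request_free_pages_from_guest() and kvm_sync_dirty_log() stand in for
whatever guest/KVM interface ends up being used and are not existing
QEMU functions:

#include <stddef.h>

/* Placeholders for the guest and KVM interfaces: */
unsigned long *request_free_pages_from_guest(size_t nr_pages);
void kvm_sync_dirty_log(unsigned long *migration_bitmap, size_t nr_pages);

/* One iteration of the scheme above: keep refreshing the free-page
 * information while migration runs, instead of waiting for a single
 * up-front report. */
static void migration_iteration(unsigned long *migration_bitmap,
                                size_t nr_pages)
{
    size_t bits_per_long = 8 * sizeof(unsigned long);
    size_t nr_longs = (nr_pages + bits_per_long - 1) / bits_per_long;
    unsigned long *free_bitmap;
    size_t i;

    /* 1. ask the guest for its current free pages; the data may be
     *    slightly stale, which is fine */
    free_bitmap = request_free_pages_from_guest(nr_pages);

    /* 2. drop those pages from the migration bitmap */
    if (free_bitmap) {
        for (i = 0; i < nr_longs; i++) {
            migration_bitmap[i] &= ~free_bitmap[i];
        }
    }

    /* 3. fold in KVM's dirty log afterwards: any "free" page the guest
     *    has since written to becomes dirty again, so nothing is lost */
    kvm_sync_dirty_log(migration_bitmap, nr_pages);
}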



> > >   2) Clearing out caches
> > >      Does it make sense to clean caches?  They're apparently useful data
> > >      so if we clean them it's likely to slow the guest down; I guess
> > >      they're also likely to be fairly static data - so at least fairly
> > >      easy to migrate.
> > >      The answer here partially depends on what you want from your migration;
> > >      if you're after the fastest possible migration time it might make
> > >      sense to clean the caches and avoid migrating them; but that might
> > >      be at the cost of more disruption to the guest - there's a trade off
> > >      somewhere and it's not clear to me how you set that depending on your
> > >      guest/network/reqirements.
> > > 
> > 
> > Yes, clean the caches is an option.  Let the users decide using it or not.
> > 
> > >   3) Why is ballooning slow?
> > >      You've got a figure of 5s to balloon on an 8GB VM - but an
> > >      8GB VM isn't huge; so I worry about how long it would take
> > >      on a big VM.   We need to understand why it's slow
> > >        * is it due to the guest shuffling pages around?
> > >        * is it due to the virtio-balloon protocol sending one page
> > >          at a time?
> > >          + Do balloon pages normally clump in physical memory
> > >             - i.e. would a 'large balloon' message help
> > >             - or do we need a bitmap because it tends not to clump?
> > > 
> > 
> > I didn't do a comprehensive test. But I found most of the time spending
> > on allocating the pages and sending the PFNs to guest, I don't know that's
> > the most time consuming operation, allocating the pages or sending the PFNs.
> 
> It might be a good idea to analyse it a bit more to convince people where
> the problem is.
> 
> > >        * is it due to the madvise on the host?
> > >          If we were using the normal balloon messages, then we
> > >          could, during migration, just route those to the migration
> > >          code rather than bothering with the madvise.
> > >          If they're clumping together we could just turn that into
> > >          one big madvise; if they're not then would we benefit from
> > >          a call that lets us madvise lots of areas?
> > > 
> > 
> > My test showed madvise() is not the main reason for the long time, only taken
> > 10% of the total  inflating balloon operation time.
> > Big madvise can more or less improve the performance.
> 
> OK; 10% of the total is still pretty big even for your 8GB VM.
> 
> > >   4) Speeding up the migration of those free pages
> > >     You're using the bitmap to avoid migrating those free pages; HPe's
> > >     patchset is reconstructing a bitmap from the balloon data;  OK, so
> > >     this all makes sense to avoid migrating them - I'd also been thinking
> > >     of using pagemap to spot zero pages that would help find other zero'd
> > >     pages, but perhaps ballooned is enough?
> > > 
> > Could you describe your ideal with more details?
> 
> At the moment the migration code spends a fair amount of time checking if a page
> is zero; I was thinking perhaps the qemu could just open /proc/self/pagemap
> and check if the page was mapped; that would seem cheap if we're checking big
> ranges; and that would find all the balloon pages.
> 
> > >   5) Second-migrate
> > >     Given a VM where you've done all those tricks on, what happens when
> > >     you migrate it a second time?   I guess you're aiming for the guest
> > >     to update it's bitmap;  HPe's solution is to migrate it's balloon
> > >     bitmap along with the migration data.
> > 
> > Nothing is special in the second migration, QEMU will request the guest for free pages
> > Information, and the guest will traverse it's current free page list to construct a
> > new free page bitmap and send it to QEMU. Just like in the first migration.
> 
> Right.
> 
> Dave
> 
> > Liang
> > > 
> > > Dave
> > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
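
As an aside on the /proc/self/pagemap suggestion quoted above, a minimal
sketch of checking whether a virtual page is currently mapped; this is
the generic Linux pagemap interface (bit 63 of each 64-bit entry means
"present"), not code from this series:

#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns true if the page containing 'vaddr' is present in RAM according
 * to /proc/self/pagemap (one 64-bit entry per virtual page; bit 63 =
 * present, bit 62 = swapped).  Never-touched or madvise(MADV_DONTNEED)'d
 * pages - such as ballooned-out pages - read back as not present. */
static bool page_is_mapped(int pagemap_fd, uintptr_t vaddr)
{
    uint64_t entry;
    long page_size = sysconf(_SC_PAGESIZE);
    off_t offset = (off_t)(vaddr / page_size) * sizeof(entry);

    if (pread(pagemap_fd, &entry, sizeof(entry), offset) != sizeof(entry)) {
        return true;   /* be conservative: treat unknown as mapped */
    }
    return (entry >> 63) & 1;
}

/* Usage sketch:
 *   int fd = open("/proc/self/pagemap", O_RDONLY);
 *   ... call page_is_mapped(fd, addr) over the ranges being migrated ... */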

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC qemu 0/4] A PV solution for live migration optimization
@ 2016-03-15 10:29                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 58+ messages in thread
From: Michael S. Tsirkin @ 2016-03-15 10:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Li, Liang Z, Amit Shah, quintela, qemu-devel, linux-kernel, akpm,
	pbonzini, rth, ehabkost, linux-mm, virtualization, kvm,
	mohan_parthasarathy, jitendra.kolhe, simhan

On Mon, Mar 14, 2016 at 05:03:34PM +0000, Dr. David Alan Gilbert wrote:
> * Li, Liang Z (liang.z.li@intel.com) wrote:
> > > 
> > > Hi,
> > >   I'm just catching back up on this thread; so without reference to any
> > > particular previous mail in the thread.
> > > 
> > >   1) How many of the free pages do we tell the host about?
> > >      Your main change is telling the host about all the
> > >      free pages.
> > 
> > Yes, all the guest's free pages.
> > 
> > >      If we tell the host about all the free pages, then we might
> > >      end up needing to allocate more pages and update the host
> > >      with pages we now want to use; that would have to wait for the
> > >      host to acknowledge that use of these pages, since if we don't
> > >      wait for it then it might have skipped migrating a page we
> > >      just started using (I don't understand how your series solves that).
> > >      So the guest probably needs to keep some free pages - how many?
> > 
> > Actually, there is no need to care about whether the free pages will be used by the host.
> > We only care about some of the free pages we get reused by the guest, right?
> > 
> > The dirty page logging can be used to solve this, starting the dirty page logging before getting
> > the free pages informant from guest. Even some of the free pages are modified by the guest
> > during the process of getting the free pages information, these modified pages will be traced
> > by the dirty page logging mechanism. So in the following migration_bitmap_sync() function.
> > The pages in the free pages bitmap, but latter was modified, will be reset to dirty. We won't
> > omit any dirtied pages.
> > 
> > So, guest doesn't need to keep any free pages.
> 
> OK, yes, that works; so we do:
>   * enable dirty logging
>   * ask guest for free pages
>   * initialise the migration bitmap as everything-free
>   * then later we do the normal sync-dirty bitmap stuff and it all just works.
> 
> That's nice and simple.

This works once, sure. But there's an issue is that you have
to defer migration until you get the free page list,
and this only works once. So you end up with heuristics
about how long to wait.

Instead I propose:

- mark all pages dirty as we do now.

- at start of migration, start tracking dirty
  pages in kvm, and tell guest to start tracking free pages

we can now introduce any kind of delay, for
example wait for ack from guest, or do whatever else,
or even just start migrating pages

- repeatedly:
	- get list of free pages from guest
	- clear them in migration bitmap
	- get dirty list from kvm

- at end of migration, stop tracking writes in kvm,
  and tell guest to stop tracking free pages



> > >   2) Clearing out caches
> > >      Does it make sense to clean caches?  They're apparently useful data
> > >      so if we clean them it's likely to slow the guest down; I guess
> > >      they're also likely to be fairly static data - so at least fairly
> > >      easy to migrate.
> > >      The answer here partially depends on what you want from your migration;
> > >      if you're after the fastest possible migration time it might make
> > >      sense to clean the caches and avoid migrating them; but that might
> > >      be at the cost of more disruption to the guest - there's a trade off
> > >      somewhere and it's not clear to me how you set that depending on your
> > >      guest/network/reqirements.
> > > 
> > 
> > Yes, clean the caches is an option.  Let the users decide using it or not.
> > 
> > >   3) Why is ballooning slow?
> > >      You've got a figure of 5s to balloon on an 8GB VM - but an
> > >      8GB VM isn't huge; so I worry about how long it would take
> > >      on a big VM.   We need to understand why it's slow
> > >        * is it due to the guest shuffling pages around?
> > >        * is it due to the virtio-balloon protocol sending one page
> > >          at a time?
> > >          + Do balloon pages normally clump in physical memory
> > >             - i.e. would a 'large balloon' message help
> > >             - or do we need a bitmap because it tends not to clump?
> > > 
> > 
> > I didn't do a comprehensive test. But I found most of the time spending
> > on allocating the pages and sending the PFNs to guest, I don't know that's
> > the most time consuming operation, allocating the pages or sending the PFNs.
> 
> It might be a good idea to analyse it a bit more to convince people where
> the problem is.
> 
> > >        * is it due to the madvise on the host?
> > >          If we were using the normal balloon messages, then we
> > >          could, during migration, just route those to the migration
> > >          code rather than bothering with the madvise.
> > >          If they're clumping together we could just turn that into
> > >          one big madvise; if they're not then would we benefit from
> > >          a call that lets us madvise lots of areas?
> > > 
> > 
> > My test showed madvise() is not the main reason for the long time, only taken
> > 10% of the total  inflating balloon operation time.
> > Big madvise can more or less improve the performance.
> 
> OK; 10% of the total is still pretty big even for your 8GB VM.
> 
> > >   4) Speeding up the migration of those free pages
> > >     You're using the bitmap to avoid migrating those free pages; HPe's
> > >     patchset is reconstructing a bitmap from the balloon data;  OK, so
> > >     this all makes sense to avoid migrating them - I'd also been thinking
> > >     of using pagemap to spot zero pages that would help find other zero'd
> > >     pages, but perhaps ballooned is enough?
> > > 
> > Could you describe your idea in more detail?
> 
> At the moment the migration code spends a fair amount of time checking if a page
> is zero; I was thinking perhaps the qemu could just open /proc/self/pagemap
> and check if the page was mapped; that would seem cheap if we're checking big
> ranges; and that would find all the balloon pages.
> 
> > >   5) Second-migrate
> > >     Given a VM where you've done all those tricks on, what happens when
> > >     you migrate it a second time?   I guess you're aiming for the guest
> > >     to update its bitmap;  HPe's solution is to migrate its balloon
> > >     bitmap along with the migration data.
> > 
> > Nothing is special about the second migration: QEMU will request the free pages
> > information from the guest, and the guest will traverse its current free page list to construct a
> > new free page bitmap and send it to QEMU, just like in the first migration.
> 
> Right.
> 
> Dave
> 
> > Liang
> > > 
> > > Dave
> > > 
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-14 17:03               ` Dr. David Alan Gilbert
  (?)
@ 2016-03-15  3:31                 ` Li, Liang Z
  -1 siblings, 0 replies; 58+ messages in thread
From: Li, Liang Z @ 2016-03-15  3:31 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Amit Shah, quintela, qemu-devel, linux-kernel, mst, akpm,
	pbonzini, rth, ehabkost, linux-mm, virtualization, kvm,
	mohan_parthasarathy, jitendra.kolhe, simhan

> > > Hi,
> > >   I'm just catching back up on this thread; so without reference to
> > > any particular previous mail in the thread.
> > >
> > >   1) How many of the free pages do we tell the host about?
> > >      Your main change is telling the host about all the
> > >      free pages.
> >
> > Yes, all the guest's free pages.
> >
> > >      If we tell the host about all the free pages, then we might
> > >      end up needing to allocate more pages and update the host
> > >      with pages we now want to use; that would have to wait for the
> > >      host to acknowledge that use of these pages, since if we don't
> > >      wait for it then it might have skipped migrating a page we
> > >      just started using (I don't understand how your series solves that).
> > >      So the guest probably needs to keep some free pages - how many?
> >
> > Actually, there is no need to care about whether the free pages will be
> > used by the host.
> > We only care about the free pages that get reused by the guest, right?
> >
> > The dirty page logging can be used to solve this: start the dirty
> > page logging before getting the free pages information from the guest. Even
> > if some of the free pages are modified by the guest during the process of
> > getting the free pages information, these modified pages will be traced by
> > the dirty page logging mechanism, so in the following
> > migration_bitmap_sync() function the pages that are in the free pages
> > bitmap but were later modified will be reset to dirty. We won't
> > omit any dirtied pages.
> >
> > So, guest doesn't need to keep any free pages.
> 
> OK, yes, that works; so we do:
>   * enable dirty logging
>   * ask guest for free pages
>   * initialise the migration bitmap as everything-free
>   * then later we do the normal sync-dirty bitmap stuff and it all just works.
> 
> That's nice and simple.
> 
> > >   2) Clearing out caches
> > >      Does it make sense to clean caches?  They're apparently useful data
> > >      so if we clean them it's likely to slow the guest down; I guess
> > >      they're also likely to be fairly static data - so at least fairly
> > >      easy to migrate.
> > >      The answer here partially depends on what you want from your migration;
> > >      if you're after the fastest possible migration time it might make
> > >      sense to clean the caches and avoid migrating them; but that might
> > >      be at the cost of more disruption to the guest - there's a trade off
> > >      somewhere and it's not clear to me how you set that depending on
> > >      your guest/network/requirements.
> > >
> >
> > Yes, cleaning the caches is an option.  Let the users decide whether to use it or not.
> >
> > >   3) Why is ballooning slow?
> > >      You've got a figure of 5s to balloon on an 8GB VM - but an
> > >      8GB VM isn't huge; so I worry about how long it would take
> > >      on a big VM.   We need to understand why it's slow
> > >        * is it due to the guest shuffling pages around?
> > >        * is it due to the virtio-balloon protocol sending one page
> > >          at a time?
> > >          + Do balloon pages normally clump in physical memory
> > >             - i.e. would a 'large balloon' message help
> > >             - or do we need a bitmap because it tends not to clump?
> > >
> >
> > I didn't do a comprehensive test. But I found most of the time is spent
> > on allocating the pages and sending the PFNs to the guest; I
> > don't know which is the more time consuming operation, allocating the pages
> > or sending the PFNs.
> 
> It might be a good idea to analyse it a bit more to convince people where the
> problem is.
> 

Yes, I will try to measure the time spent on the different parts.
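
As a first cut, something along these lines could split the cost between page
allocation and PFN transfer; the two phase functions are only placeholders for
the real guest-side operations, and inside the virtio_balloon driver one would
use ktime_get() rather than clock_gettime():

/* Generic timing sketch; the phase functions are placeholders only. */
#include <stdio.h>
#include <time.h>

static void allocate_balloon_pages(void) { /* placeholder: guest page allocation */ }
static void send_pfns_to_host(void)      { /* placeholder: PFN transfer over virtio */ }

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    struct timespec t0, t1, t2;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    allocate_balloon_pages();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    send_pfns_to_host();
    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("alloc: %.1f ms, send PFNs: %.1f ms\n",
           elapsed_ms(t0, t1), elapsed_ms(t1, t2));
    return 0;
}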

> > >        * is it due to the madvise on the host?
> > >          If we were using the normal balloon messages, then we
> > >          could, during migration, just route those to the migration
> > >          code rather than bothering with the madvise.
> > >          If they're clumping together we could just turn that into
> > >          one big madvise; if they're not then would we benefit from
> > >          a call that lets us madvise lots of areas?
> > >
> >
> > My test showed madvise() is not the main reason for the long time;
> > it takes only 10% of the total balloon inflation time.
> > A big madvise() can more or less improve the performance.
> 
> OK; 10% of the total is still pretty big even for your 8GB VM.
> 
> > >   4) Speeding up the migration of those free pages
> > >     You're using the bitmap to avoid migrating those free pages; HPe's
> > >     patchset is reconstructing a bitmap from the balloon data;  OK, so
> > >     this all makes sense to avoid migrating them - I'd also been thinking
> > >     of using pagemap to spot zero pages that would help find other zero'd
> > >     pages, but perhaps ballooned is enough?
> > >
> > Could you describe your idea in more detail?
> 
> At the moment the migration code spends a fair amount of time checking if a
> page is zero; I was thinking perhaps the qemu could just open
> /proc/self/pagemap and check if the page was mapped; that would seem
> cheap if we're checking big ranges; and that would find all the balloon pages.
> 

Even if virtio-balloon is not enabled, it can be used to find the pages that were never used
by the guest.

> > >   5) Second-migrate
> > >     Given a VM where you've done all those tricks on, what happens when
> > >     you migrate it a second time?   I guess you're aiming for the guest
> > >     to update its bitmap;  HPe's solution is to migrate its balloon
> > >     bitmap along with the migration data.
> >
> > Nothing is special about the second migration: QEMU will request the
> > free pages information from the guest, and the guest will traverse its
> > current free page list to construct a new free page bitmap and send it to
> > QEMU, just like in the first migration.
> 
> Right.
> 
> Dave
> 
> > Liang
> > >
> > > Dave
> > >
> > > --
> > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-11  2:38             ` Li, Liang Z
  (?)
@ 2016-03-14 17:03               ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 58+ messages in thread
From: Dr. David Alan Gilbert @ 2016-03-14 17:03 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: Amit Shah, quintela, qemu-devel, linux-kernel, mst, akpm,
	pbonzini, rth, ehabkost, linux-mm, virtualization, kvm,
	mohan_parthasarathy, jitendra.kolhe, simhan

* Li, Liang Z (liang.z.li@intel.com) wrote:
> > 
> > Hi,
> >   I'm just catching back up on this thread; so without reference to any
> > particular previous mail in the thread.
> > 
> >   1) How many of the free pages do we tell the host about?
> >      Your main change is telling the host about all the
> >      free pages.
> 
> Yes, all the guest's free pages.
> 
> >      If we tell the host about all the free pages, then we might
> >      end up needing to allocate more pages and update the host
> >      with pages we now want to use; that would have to wait for the
> >      host to acknowledge that use of these pages, since if we don't
> >      wait for it then it might have skipped migrating a page we
> >      just started using (I don't understand how your series solves that).
> >      So the guest probably needs to keep some free pages - how many?
> 
> Actually, there is no need to care about whether the free pages will be used by the host.
> We only care about the free pages that get reused by the guest, right?
> 
> The dirty page logging can be used to solve this: start the dirty page logging before getting
> the free pages information from the guest. Even if some of the free pages are modified by the guest
> during the process of getting the free pages information, these modified pages will be traced
> by the dirty page logging mechanism, so in the following migration_bitmap_sync() function
> the pages that are in the free pages bitmap but were later modified will be reset to dirty. We won't
> omit any dirtied pages.
> 
> So, guest doesn't need to keep any free pages.

OK, yes, that works; so we do:
  * enable dirty logging
  * ask guest for free pages
  * initialise the migration bitmap as everything-free
  * then later we do the normal sync-dirty bitmap stuff and it all just works.

That's nice and simple.
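
A rough sketch of that setup order, assuming hypothetical helper names (not
the actual QEMU/KVM API) and reading "everything-free" as "all pages marked
for sending except the ones the guest reports free":

/* Illustrative only: function names are hypothetical, not QEMU's real API. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

extern void kvm_enable_dirty_logging(void);                              /* hypothetical */
extern bool request_guest_free_page_bitmap(uint64_t *bm, size_t pages);  /* hypothetical */

static void migration_bitmap_setup(uint64_t *migration_bitmap,
                                   uint64_t *free_bitmap, size_t nr_pages)
{
    size_t words = (nr_pages + 63) / 64;

    /* 1. Dirty logging goes on *before* the guest reports free pages, so any
     *    page the guest reuses afterwards shows up in the next dirty sync. */
    kvm_enable_dirty_logging();

    /* 2. Ask the guest which pages are currently free.
     * 3. Start from "send everything" and clear the free pages; the normal
     *    sync-dirty passes later add back whatever gets reused.            */
    memset(migration_bitmap, 0xff, words * sizeof(uint64_t));
    if (request_guest_free_page_bitmap(free_bitmap, nr_pages)) {
        for (size_t i = 0; i < words; i++) {
            migration_bitmap[i] &= ~free_bitmap[i];
        }
    }
}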

> >   2) Clearing out caches
> >      Does it make sense to clean caches?  They're apparently useful data
> >      so if we clean them it's likely to slow the guest down; I guess
> >      they're also likely to be fairly static data - so at least fairly
> >      easy to migrate.
> >      The answer here partially depends on what you want from your migration;
> >      if you're after the fastest possible migration time it might make
> >      sense to clean the caches and avoid migrating them; but that might
> >      be at the cost of more disruption to the guest - there's a trade off
> >      somewhere and it's not clear to me how you set that depending on your
> >      guest/network/requirements.
> > 
> 
> Yes, cleaning the caches is an option.  Let the users decide whether to use it or not.
> 
> >   3) Why is ballooning slow?
> >      You've got a figure of 5s to balloon on an 8GB VM - but an
> >      8GB VM isn't huge; so I worry about how long it would take
> >      on a big VM.   We need to understand why it's slow
> >        * is it due to the guest shuffling pages around?
> >        * is it due to the virtio-balloon protocol sending one page
> >          at a time?
> >          + Do balloon pages normally clump in physical memory
> >             - i.e. would a 'large balloon' message help
> >             - or do we need a bitmap because it tends not to clump?
> > 
> 
> I didn't do a comprehensive test. But I found most of the time is spent
> on allocating the pages and sending the PFNs to the guest; I don't know which
> is the more time consuming operation, allocating the pages or sending the PFNs.

It might be a good idea to analyse it a bit more to convince people where
the problem is.

> >        * is it due to the madvise on the host?
> >          If we were using the normal balloon messages, then we
> >          could, during migration, just route those to the migration
> >          code rather than bothering with the madvise.
> >          If they're clumping together we could just turn that into
> >          one big madvise; if they're not then would we benefit from
> >          a call that lets us madvise lots of areas?
> > 
> 
> My test showed madvise() is not the main reason for the long time; it takes only
> 10% of the total balloon inflation time.
> A big madvise() can more or less improve the performance.

OK; 10% of the total is still pretty big even for your 8GB VM.
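
For what it's worth, the "one big madvise" idea is just coalescing adjacent
pages before the call; a rough sketch, assuming a sorted array of page-aligned
host virtual addresses to discard:

/* Sketch of coalescing adjacent ballooned pages into a single madvise() call.
 * 'addrs' is assumed to be a sorted array of page-aligned host virtual addresses. */
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static void discard_pages_coalesced(uintptr_t *addrs, size_t count)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t i = 0;

    while (i < count) {
        uintptr_t start = addrs[i];
        size_t run = 1;

        /* Extend the run while the next page is virtually adjacent. */
        while (i + run < count &&
               addrs[i + run] == start + run * (uintptr_t)page_size) {
            run++;
        }
        /* One madvise() for the whole contiguous run instead of one per page. */
        madvise((void *)start, run * (size_t)page_size, MADV_DONTNEED);
        i += run;
    }
}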

> >   4) Speeding up the migration of those free pages
> >     You're using the bitmap to avoid migrating those free pages; HPe's
> >     patchset is reconstructing a bitmap from the balloon data;  OK, so
> >     this all makes sense to avoid migrating them - I'd also been thinking
> >     of using pagemap to spot zero pages that would help find other zero'd
> >     pages, but perhaps ballooned is enough?
> > 
> Could you describe your idea in more detail?

At the moment the migration code spends a fair amount of time checking if a page
is zero; I was thinking perhaps QEMU could just open /proc/self/pagemap
and check if the page was mapped; that would seem cheap if we're checking big
ranges, and that would find all the balloon pages.
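
A minimal sketch of that check, assuming the usual pagemap layout (one 64-bit
entry per virtual page, bit 63 = page present); a fuller version would also
treat swapped-out pages (bit 62) as still holding data:

/* Sketch: test whether a virtual page is backed by RAM via /proc/self/pagemap. */
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

static bool page_is_mapped(int pagemap_fd, uintptr_t vaddr, long page_size)
{
    uint64_t entry;
    off_t offset = (off_t)(vaddr / (uintptr_t)page_size) * sizeof(entry);

    if (pread(pagemap_fd, &entry, sizeof(entry), offset) != sizeof(entry)) {
        return true;                 /* on error, assume mapped and migrate it */
    }
    return entry & (1ULL << 63);     /* bit 63: page present in RAM */
}

/* Usage sketch:
 *   int fd = open("/proc/self/pagemap", O_RDONLY);
 *   if (!page_is_mapped(fd, (uintptr_t)host_addr, sysconf(_SC_PAGESIZE)))
 *       ... skip the zero-page scan for this page ...
 */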

> >   5) Second-migrate
> >     Given a VM where you've done all those tricks on, what happens when
> >     you migrate it a second time?   I guess you're aiming for the guest
> >     to update its bitmap;  HPe's solution is to migrate its balloon
> >     bitmap along with the migration data.
> 
> Nothing is special about the second migration: QEMU will request the free pages
> information from the guest, and the guest will traverse its current free page list to construct a
> new free page bitmap and send it to QEMU, just like in the first migration.

Right.

Dave

> Liang
> > 
> > Dave
> > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-11  2:38             ` Li, Liang Z
  (?)
@ 2016-03-14 17:03             ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 58+ messages in thread
From: Dr. David Alan Gilbert @ 2016-03-14 17:03 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: ehabkost, kvm, mst, simhan, linux-kernel, qemu-devel,
	jitendra.kolhe, linux-mm, mohan_parthasarathy, Amit Shah,
	pbonzini, akpm, virtualization, rth

* Li, Liang Z (liang.z.li@intel.com) wrote:
> > 
> > Hi,
> >   I'm just catching back up on this thread; so without reference to any
> > particular previous mail in the thread.
> > 
> >   1) How many of the free pages do we tell the host about?
> >      Your main change is telling the host about all the
> >      free pages.
> 
> Yes, all the guest's free pages.
> 
> >      If we tell the host about all the free pages, then we might
> >      end up needing to allocate more pages and update the host
> >      with pages we now want to use; that would have to wait for the
> >      host to acknowledge that use of these pages, since if we don't
> >      wait for it then it might have skipped migrating a page we
> >      just started using (I don't understand how your series solves that).
> >      So the guest probably needs to keep some free pages - how many?
> 
> Actually, there is no need to care about whether the free pages will be used by the host.
> We only care about some of the free pages we get reused by the guest, right?
> 
> The dirty page logging can be used to solve this, starting the dirty page logging before getting
> the free pages informant from guest. Even some of the free pages are modified by the guest
> during the process of getting the free pages information, these modified pages will be traced
> by the dirty page logging mechanism. So in the following migration_bitmap_sync() function.
> The pages in the free pages bitmap, but latter was modified, will be reset to dirty. We won't
> omit any dirtied pages.
> 
> So, guest doesn't need to keep any free pages.

OK, yes, that works; so we do:
  * enable dirty logging
  * ask guest for free pages
  * initialise the migration bitmap as everything-free
  * then later we do the normal sync-dirty bitmap stuff and it all just works.

That's nice and simple.
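
For illustration, a minimal sketch of that ordering on the QEMU side; the
helper names start_dirty_logging() and get_guest_free_page_bitmap() are
hypothetical placeholders, not the functions this series adds, and the
bitmap calls are only in the style of qemu's bitmap helpers:

    /* Hypothetical sketch only; the named helpers do not exist in QEMU. */
    static void ram_bulk_stage_setup(unsigned long *migration_bitmap,
                                     long ram_pages)
    {
        unsigned long *free_bitmap;

        /* 1. Enable dirty logging first, so any "free" page the guest
         *    starts using from now on is caught by the next sync.      */
        start_dirty_logging();

        /* 2. Ask the guest, via virtio, for its current free pages.    */
        free_bitmap = get_guest_free_page_bitmap();

        /* 3. Start from "send everything", then drop the reported free
         *    pages from the bulk stage.                                */
        bitmap_fill(migration_bitmap, ram_pages);
        bitmap_andnot(migration_bitmap, migration_bitmap,
                      free_bitmap, ram_pages);

        /* 4. Later, the normal migration_bitmap_sync() re-dirties any
         *    page written after step 1, including reused free pages.   */
    }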

> >   2) Clearing out caches
> >      Does it make sense to clean caches?  They're apparently useful data
> >      so if we clean them it's likely to slow the guest down; I guess
> >      they're also likely to be fairly static data - so at least fairly
> >      easy to migrate.
> >      The answer here partially depends on what you want from your migration;
> >      if you're after the fastest possible migration time it might make
> >      sense to clean the caches and avoid migrating them; but that might
> >      be at the cost of more disruption to the guest - there's a trade off
> >      somewhere and it's not clear to me how you set that depending on your
> >      guest/network/requirements.
> > 
> 
> Yes, cleaning the caches is an option.  Let the users decide whether to use it or not.
> 
> >   3) Why is ballooning slow?
> >      You've got a figure of 5s to balloon on an 8GB VM - but an
> >      8GB VM isn't huge; so I worry about how long it would take
> >      on a big VM.   We need to understand why it's slow
> >        * is it due to the guest shuffling pages around?
> >        * is it due to the virtio-balloon protocol sending one page
> >          at a time?
> >          + Do balloon pages normally clump in physical memory
> >             - i.e. would a 'large balloon' message help
> >             - or do we need a bitmap because it tends not to clump?
> > 
> 
> I didn't do a comprehensive test. But I found most of the time is spent
> on allocating the pages and sending the PFNs to the host; I don't know which is
> the more time consuming operation, allocating the pages or sending the PFNs.

It might be a good idea to analyse it a bit more to convince people where
the problem is.
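
For a rough sense of scale, assuming 4KB pages on the 8GB guest mentioned
above: 8GB is 2,097,152 pages.  A free page bitmap covering all of guest
RAM is therefore 2,097,152 bits, i.e. a single 256KB transfer.  Reporting
the same pages as individual 4-byte PFNs through the existing inflate
queue is 8MB of PFN data and, with the Linux driver's usual batching of
256 PFNs per request, on the order of 8,000 virtqueue round trips before
the host has done any madvise() work at all.  (These numbers are a
back-of-the-envelope estimate, not measurements from the series.)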

> >        * is it due to the madvise on the host?
> >          If we were using the normal balloon messages, then we
> >          could, during migration, just route those to the migration
> >          code rather than bothering with the madvise.
> >          If they're clumping together we could just turn that into
> >          one big madvise; if they're not then would we benefit from
> >          a call that lets us madvise lots of areas?
> > 
> 
> My test showed madvise() is not the main reason for the long time; it takes only
> about 10% of the total balloon inflating time.
> A big madvise can more or less improve the performance.

OK; 10% of the total is still pretty big even for your 8GB VM.
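
As an illustration of the "one big madvise" idea, a minimal sketch that
coalesces a sorted list of ballooned-out host virtual addresses into as
few madvise() calls as possible; the function and its arguments are made
up for the example, not taken from QEMU:

    #include <sys/mman.h>
    #include <stddef.h>

    /* 'addrs' holds page-aligned host addresses of ballooned-out pages,
     * sorted in ascending order. */
    static void discard_ballooned_pages(void **addrs, size_t count,
                                        size_t page_size)
    {
        size_t i = 0;

        while (i < count) {
            char *start = addrs[i];
            size_t len = page_size;

            /* Merge virtually contiguous pages into one run. */
            while (i + 1 < count &&
                   (char *)addrs[i + 1] == start + len) {
                len += page_size;
                i++;
            }

            /* One madvise() per contiguous run instead of one per page. */
            madvise(start, len, MADV_DONTNEED);
            i++;
        }
    }

Whether this helps depends on the clumping question above: if the pages
do not clump, it degenerates back to one call per page.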

> >   4) Speeding up the migration of those free pages
> >     You're using the bitmap to avoid migrating those free pages; HPe's
> >     patchset is reconstructing a bitmap from the balloon data;  OK, so
> >     this all makes sense to avoid migrating them - I'd also been thinking
> >     of using pagemap to spot zero pages that would help find other zero'd
> >     pages, but perhaps ballooned is enough?
> > 
> Could you describe your idea in more detail?

At the moment the migration code spends a fair amount of time checking if a page
is zero; I was thinking perhaps QEMU could just open /proc/self/pagemap
and check whether the page is mapped; that would seem cheap if we're checking big
ranges, and it would also find all the ballooned pages.
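
A minimal sketch of that check, based on the documented /proc/<pid>/pagemap
format (one 64-bit entry per virtual page, bit 63 = present in RAM,
bit 62 = swapped); error handling and caching of the fd are left out:

    #include <fcntl.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Returns false only for pages that have never been populated;
     * such pages cannot hold guest data and could be skipped. */
    static bool page_is_populated(int pagemap_fd, void *addr,
                                  size_t page_size)
    {
        uint64_t entry = 0;
        off_t off = (off_t)((uintptr_t)addr / page_size) * sizeof(entry);

        if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry)) {
            return true;    /* be conservative on error */
        }
        return (entry & (3ULL << 62)) != 0;    /* present or swapped */
    }

    /* Usage sketch:
     *   int fd = open("/proc/self/pagemap", O_RDONLY);
     *   if (!page_is_populated(fd, host_addr, page_size))
     *       skip this page in the bulk stage;
     */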

> >   5) Second-migrate
> >     Given a VM where you've done all those tricks on, what happens when
> >     you migrate it a second time?   I guess you're aiming for the guest
> >     to update its bitmap;  HPe's solution is to migrate its balloon
> >     bitmap along with the migration data.
> 
> Nothing is special about the second migration: QEMU will request the free pages
> information from the guest, and the guest will traverse its current free page list to construct a
> new free page bitmap and send it to QEMU, just like in the first migration.

Right.

Dave

> Liang
> > 
> > Dave
> > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-10 11:18           ` Dr. David Alan Gilbert
@ 2016-03-11  2:38             ` Li, Liang Z
  -1 siblings, 0 replies; 58+ messages in thread
From: Li, Liang Z @ 2016-03-11  2:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Amit Shah, quintela, qemu-devel, linux-kernel, mst, akpm,
	pbonzini, rth, ehabkost, linux-mm, virtualization, kvm,
	mohan_parthasarathy, jitendra.kolhe, simhan

> 
> Hi,
>   I'm just catching back up on this thread; so without reference to any
> particular previous mail in the thread.
> 
>   1) How many of the free pages do we tell the host about?
>      Your main change is telling the host about all the
>      free pages.

Yes, all the guest's free pages.

>      If we tell the host about all the free pages, then we might
>      end up needing to allocate more pages and update the host
>      with pages we now want to use; that would have to wait for the
>      host to acknowledge that use of these pages, since if we don't
>      wait for it then it might have skipped migrating a page we
>      just started using (I don't understand how your series solves that).
>      So the guest probably needs to keep some free pages - how many?

Actually, there is no need to care about whether the free pages will be used by the host.
We only care about the free pages that get reused by the guest, right?

The dirty page logging can be used to solve this: start the dirty page logging before getting
the free pages information from the guest. Even if some of the free pages are modified by the guest
during the process of getting the free pages information, these modified pages will be traced
by the dirty page logging mechanism. So in the following migration_bitmap_sync() call,
the pages that are in the free pages bitmap but were modified later will be reset to dirty. We won't
omit any dirtied pages.

So the guest doesn't need to keep any free pages.

>   2) Clearing out caches
>      Does it make sense to clean caches?  They're apparently useful data
>      so if we clean them it's likely to slow the guest down; I guess
>      they're also likely to be fairly static data - so at least fairly
>      easy to migrate.
>      The answer here partially depends on what you want from your migration;
>      if you're after the fastest possible migration time it might make
>      sense to clean the caches and avoid migrating them; but that might
>      be at the cost of more disruption to the guest - there's a trade off
>      somewhere and it's not clear to me how you set that depending on your
>      guest/network/requirements.
> 

Yes, cleaning the caches is an option.  Let the users decide whether to use it or not.

>   3) Why is ballooning slow?
>      You've got a figure of 5s to balloon on an 8GB VM - but an
>      8GB VM isn't huge; so I worry about how long it would take
>      on a big VM.   We need to understand why it's slow
>        * is it due to the guest shuffling pages around?
>        * is it due to the virtio-balloon protocol sending one page
>          at a time?
>          + Do balloon pages normally clump in physical memory
>             - i.e. would a 'large balloon' message help
>             - or do we need a bitmap because it tends not to clump?
> 

I didn't do a comprehensive test. But I found most of the time is spent
on allocating the pages and sending the PFNs to the host; I don't know which is
the more time consuming operation, allocating the pages or sending the PFNs.

>        * is it due to the madvise on the host?
>          If we were using the normal balloon messages, then we
>          could, during migration, just route those to the migration
>          code rather than bothering with the madvise.
>          If they're clumping together we could just turn that into
>          one big madvise; if they're not then would we benefit from
>          a call that lets us madvise lots of areas?
> 

My test showed madvise() is not the main reason for the long time; it takes only
about 10% of the total balloon inflating time.
A big madvise can more or less improve the performance.

>   4) Speeding up the migration of those free pages
>     You're using the bitmap to avoid migrating those free pages; HPe's
>     patchset is reconstructing a bitmap from the balloon data;  OK, so
>     this all makes sense to avoid migrating them - I'd also been thinking
>     of using pagemap to spot zero pages that would help find other zero'd
>     pages, but perhaps ballooned is enough?
> 
Could you describe your idea in more detail?

>   5) Second-migrate
>     Given a VM where you've done all those tricks on, what happens when
>     you migrate it a second time?   I guess you're aiming for the guest
>     to update its bitmap;  HPe's solution is to migrate its balloon
>     bitmap along with the migration data.

Nothing is special about the second migration: QEMU will request the free pages
information from the guest, and the guest will traverse its current free page list to construct a
new free page bitmap and send it to QEMU, just like in the first migration.

Liang
> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-10  8:36         ` Li, Liang Z
@ 2016-03-10 11:18           ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 58+ messages in thread
From: Dr. David Alan Gilbert @ 2016-03-10 11:18 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: Amit Shah, quintela, qemu-devel, linux-kernel, mst, akpm,
	pbonzini, rth, ehabkost, linux-mm, virtualization, kvm,
	mohan_parthasarathy, jitendra.kolhe, simhan

Hi,
  I'm just catching back up on this thread; so without reference to any
particular previous mail in the thread.

  1) How many of the free pages do we tell the host about?
     Your main change is telling the host about all the
     free pages.
     If we tell the host about all the free pages, then we might
     end up needing to allocate more pages and update the host
     with pages we now want to use; that would have to wait for the
     host to acknowledge that use of these pages, since if we don't
     wait for it then it might have skipped migrating a page we
     just started using (I don't understand how your series solves that).
     So the guest probably needs to keep some free pages - how many?

  2) Clearing out caches
     Does it make sense to clean caches?  They're apparently useful data
     so if we clean them it's likely to slow the guest down; I guess
     they're also likely to be fairly static data - so at least fairly
     easy to migrate.
     The answer here partially depends on what you want from your migration;
     if you're after the fastest possible migration time it might make
     sense to clean the caches and avoid migrating them; but that might
     be at the cost of more disruption to the guest - there's a trade off
     somewhere and it's not clear to me how you set that depending on your
     guest/network/requirements.

  3) Why is ballooning slow?
     You've got a figure of 5s to balloon on an 8GB VM - but an 
     8GB VM isn't huge; so I worry about how long it would take
     on a big VM.   We need to understand why it's slow 
       * is it due to the guest shuffling pages around? 
       * is it due to the virtio-balloon protocol sending one page
         at a time?
         + Do balloon pages normally clump in physical memory
            - i.e. would a 'large balloon' message help
            - or do we need a bitmap because it tends not to clump?

       * is it due to the madvise on the host?
         If we were using the normal balloon messages, then we
         could, during migration, just route those to the migration
         code rather than bothering with the madvise.
         If they're clumping together we could just turn that into
         one big madvise; if they're not then would we benefit from
         a call that lets us madvise lots of areas?

  4) Speeding up the migration of those free pages
    You're using the bitmap to avoid migrating those free pages; HPe's
    patchset is reconstructing a bitmap from the balloon data;  OK, so
    this all makes sense to avoid migrating them - I'd also been thinking
    of using pagemap to spot zero pages that would help find other zero'd
    pages, but perhaps ballooned is enough?

  5) Second-migrate
    Given a VM where you've done all those tricks on, what happens when
    you migrate it a second time?   I guess you're aiming for the guest
    to update its bitmap;  HPe's solution is to migrate its balloon
    bitmap along with the migration data.
     
Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-10  7:57       ` Amit Shah
@ 2016-03-10  8:36         ` Li, Liang Z
  -1 siblings, 0 replies; 58+ messages in thread
From: Li, Liang Z @ 2016-03-10  8:36 UTC (permalink / raw)
  To: Amit Shah
  Cc: quintela, qemu-devel, linux-kernel, mst, akpm, pbonzini, rth,
	ehabkost, linux-mm, virtualization, kvm, dgilbert

> >  Could you provide more information on how to use virtio-serial to exchange
> > data?  A thread, wiki or code are all OK.
> >  I have not found any useful information yet.
> 
> See this commit in the Linux sources:
> 
> 108fc82596e3b66b819df9d28c1ebbc9ab5de14c
> 
> that adds a way to send guest trace data over to the host.  I think that's the
> most relevant to your use-case.  However, you'll have to add an in-kernel
> user of virtio-serial (like the virtio-console code
> -- the code that deals with tty and hvc currently).  There's no other non-tty
> user right now, and this is the right kind of use-case to add one for!
> 
> For many other (userspace) use-cases, see the qemu-guest-agent in the
> qemu sources.
> 
> The API is documented in the wiki:
> 
> http://www.linux-kvm.org/page/Virtio-serial_API
> 
> and the feature pages have some information that may help as well:
> 
> https://fedoraproject.org/wiki/Features/VirtioSerial
> 
> There are some links in here too:
> 
> http://log.amitshah.net/2010/09/communication-between-guests-and-
> hosts/
> 
> Hope this helps.
> 
> 
> 		Amit

Thanks a lot !!

Liang

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-10  7:44     ` Li, Liang Z
@ 2016-03-10  7:57       ` Amit Shah
  -1 siblings, 0 replies; 58+ messages in thread
From: Amit Shah @ 2016-03-10  7:57 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: quintela, qemu-devel, linux-kernel, mst, akpm, pbonzini, rth,
	ehabkost, linux-mm, virtualization, kvm, dgilbert

On (Thu) 10 Mar 2016 [07:44:19], Li, Liang Z wrote:
> 
> Hi Amit,
> 
>  Could you provide more information on how to use virtio-serial to exchange data?  A thread, wiki or code are all OK.
>  I have not found any useful information yet.

See this commit in the Linux sources:

108fc82596e3b66b819df9d28c1ebbc9ab5de14c

that adds a way to send guest trace data over to the host.  I think
that's the most relevant to your use-case.  However, you'll have to
add an in-kernel user of virtio-serial (like the virtio-console code
-- the code that deals with tty and hvc currently).  There's no other
non-tty user right now, and this is the right kind of use-case to add
one for!

For many other (userspace) use-cases, see the qemu-guest-agent in the
qemu sources.

The API is documented in the wiki:

http://www.linux-kvm.org/page/Virtio-serial_API

and the feature pages have some information that may help as well:

https://fedoraproject.org/wiki/Features/VirtioSerial

There are some links in here too:

http://log.amitshah.net/2010/09/communication-between-guests-and-hosts/
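
For the userspace route, the guest side can be as small as opening the
named port and writing to it; a minimal sketch, where the port name
org.example.free_page_hint.0 and the payload are made up for the example:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* The device node appears once the host defines a virtserialport
         * with this name. */
        int fd = open("/dev/virtio-ports/org.example.free_page_hint.0",
                      O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Placeholder payload; a real protocol would carry the free page
         * bitmap agreed between the guest and QEMU. */
        const char msg[] = "free page info goes here";
        if (write(fd, msg, sizeof(msg)) < 0) {
            perror("write");
        }

        close(fd);
        return 0;
    }

On the host side, something along the lines of
"-device virtio-serial-pci
 -chardev socket,path=/tmp/fp.sock,server,nowait,id=fp0
 -device virtserialport,chardev=fp0,name=org.example.free_page_hint.0"
exposes the other end as a UNIX socket that host-side tooling can read.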

Hope this helps.


		Amit

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-08 11:13   ` Amit Shah
@ 2016-03-10  7:44     ` Li, Liang Z
  -1 siblings, 0 replies; 58+ messages in thread
From: Li, Liang Z @ 2016-03-10  7:44 UTC (permalink / raw)
  To: Amit Shah
  Cc: quintela, qemu-devel, linux-kernel, mst, akpm, pbonzini, rth,
	ehabkost, linux-mm, virtualization, kvm, dgilbert

> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This makes
> > the live migration process much more efficient.
> >
> > This RFC version doesn't take the post-copy and RDMA into
> > consideration; maybe both of them can benefit from this PV solution
> > with some extra modifications.
> 
> I like the idea, just have to prove (review) and test it a lot to ensure we don't
> end up skipping pages that matter.
> 
> However, there are a couple of points:
> 
> In my opinion, the information that's exchanged between the guest and the
> host should be exchanged over a virtio-serial channel rather than virtio-
> balloon.  First, there's nothing related to the balloon here.
> It just happens to be memory info.  Second, I would never enable balloon in
> a guest that I want to be performance-sensitive.  So even if you add this as
> part of balloon, you'll find no one is using this solution.
> 
> Secondly, I suggest virtio-serial, because it's meant exactly to exchange free-
> flowing information between a host and a guest, and you don't need to
> extend any part of the protocol for it (hence no changes necessary to the
> spec).  You can see how spice, vnc, etc., use virtio-serial to exchange data.
> 
> 
> 		Amit

Hi Amit,

 Could you provide more information on how to use virtio-serial to exchange data?  A thread, wiki or code are all OK.
 I have not found any useful information yet.

Thanks
Liang

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-08 11:13   ` Amit Shah
@ 2016-03-08 13:11     ` Li, Liang Z
  -1 siblings, 0 replies; 58+ messages in thread
From: Li, Liang Z @ 2016-03-08 13:11 UTC (permalink / raw)
  To: Amit Shah
  Cc: quintela, qemu-devel, linux-kernel, mst, akpm, pbonzini, rth,
	ehabkost, linux-mm, virtualization, kvm, dgilbert

> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
> 
> On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> > The current QEMU live migration implementation marks all the
> > guest's RAM pages as dirtied in the ram bulk stage; all these pages
> > will be processed and that takes quite a lot of CPU cycles.
> >
> > From the guest's point of view, it doesn't care about the content of free
> > pages. We can make use of this fact and skip processing the free pages
> > in the ram bulk stage; it can save a lot of CPU cycles and reduce the
> > network traffic significantly, while clearly speeding up the live migration
> > process.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This makes
> > the live migration process much more efficient.
> >
> > This RFC version doesn't take the post-copy and RDMA into
> > consideration; maybe both of them can benefit from this PV solution
> > with some extra modifications.
> 
> I like the idea, just have to prove (review) and test it a lot to ensure we don't
> end up skipping pages that matter.
> 
> However, there are a couple of points:
> 
> In my opinion, the information that's exchanged between the guest and the
> host should be exchanged over a virtio-serial channel rather than virtio-
> balloon.  First, there's nothing related to the balloon here.
> It just happens to be memory info.  Second, I would never enable balloon in
> a guest that I want to be performance-sensitive.  So even if you add this as
> part of balloon, you'll find no one is using this solution.
> 
> Secondly, I suggest virtio-serial, because it's meant exactly to exchange free-
> flowing information between a host and a guest, and you don't need to
> extend any part of the protocol for it (hence no changes necessary to the
> spec).  You can see how spice, vnc, etc., use virtio-serial to exchange data.
> 
> 
> 		Amit

I don't like using the virtio-balloon for this either, and it's confusing. 
It would be great if virtio-serial can be used; I will take a look at it.

Thanks for your suggestion!

Liang

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-04  9:32 Jitendra Kolhe
@ 2016-03-08 11:14   ` Amit Shah
  2016-03-08 11:14   ` Amit Shah
  2016-03-08 11:14 ` Amit Shah
  2 siblings, 0 replies; 58+ messages in thread
From: Amit Shah @ 2016-03-08 11:14 UTC (permalink / raw)
  To: Jitendra Kolhe
  Cc: liang.z.li, dgilbert, ehabkost, kvm, mst, quintela, linux-kernel,
	qemu-devel, linux-mm, pbonzini, akpm, virtualization, rth,
	mohan_parthasarathy, simhan

On (Fri) 04 Mar 2016 [15:02:47], Jitendra Kolhe wrote:
> > >
> > > * Liang Li (liang.z.li@intel.com) wrote:
> > > > The current QEMU live migration implementation marks all the
> > > > guest's RAM pages as dirtied in the ram bulk stage; all these pages
> > > > will be processed and that takes quite a lot of CPU cycles.
> > > >
> > > > From the guest's point of view, it doesn't care about the content of free
> > > > pages. We can make use of this fact and skip processing the free pages
> > > > in the ram bulk stage; it can save a lot of CPU cycles and reduce the
> > > > network traffic significantly, while clearly speeding up the live migration
> > > > process.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use it
> > > > to filter out the guest's free pages in the ram bulk stage. This make
> > > > the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been looking at
> > > how to speed up ballooned VM migration.
> > >
> >
> > Ooh, different solutions for the same purpose, and both based on the balloon.
> 
> We were also tying to address similar problem, without actually needing to modify
> the guest driver. Please find patch details under mail with subject.
> migration: skip sending ram pages released by virtio-balloon driver

The scope of this patch series seems to be wider: don't send free
pages to a dest at all, vs. don't send pages that are ballooned out.

		Amit

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-03 10:44 ` Liang Li
@ 2016-03-08 11:13   ` Amit Shah
  -1 siblings, 0 replies; 58+ messages in thread
From: Amit Shah @ 2016-03-08 11:13 UTC (permalink / raw)
  To: Liang Li
  Cc: quintela, qemu-devel, linux-kernel, mst, akpm, pbonzini, rth,
	ehabkost, linux-mm, virtualization, kvm, dgilbert

On (Thu) 03 Mar 2016 [18:44:24], Liang Li wrote:
> The current QEMU live migration implementation mark the all the
> guest's RAM pages as dirtied in the ram bulk stage, all these pages
> will be processed and that takes quit a lot of CPU cycles.
> 
> From guest's point of view, it doesn't care about the content in free
> pages. We can make use of this fact and skip processing the free
> pages in the ram bulk stage, it can save a lot CPU cycles and reduce
> the network traffic significantly while speed up the live migration
> process obviously.
> 
> This patch set is the QEMU side implementation.
> 
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
> 
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This make
> the live migration process much more efficient.
> 
> This RFC version doesn't take the post-copy and RDMA into
> consideration, maybe both of them can benefit from this PV solution
> by with some extra modifications.

I like the idea, just have to prove (review) and test it a lot to
ensure we don't end up skipping pages that matter.

However, there are a couple of points:

In my opinion, the information that's exchanged between the guest and
the host should be exchanged over a virtio-serial channel rather than
virtio-balloon.  First, there's nothing related to the balloon here.
It just happens to be memory info.  Second, I would never enable
balloon in a guest that I want to be performance-sensitive.  So even
if you add this as part of balloon, you'll find no one is using this
solution.

Secondly, I suggest virtio-serial, because it's meant exactly to
exchange free-flowing information between a host and a guest, and you
don't need to extend any part of the protocol for it (hence no changes
necessary to the spec).  You can see how spice, vnc, etc., use
virtio-serial to exchange data.
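
(For illustration only: a rough guest-side sketch of what pushing a free-page
bitmap over a virtio-serial port could look like. The port name and the simple
length-prefix framing are made up for this example and are not part of the RFC
patches.)

/* Hypothetical guest-side sketch: push a free-page bitmap to the host
 * over a virtio-serial port.  Port name and framing are assumptions
 * made for this example, not anything from the RFC patches. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define FREE_PAGE_PORT "/dev/virtio-ports/org.qemu.guest_free_pages"

int send_free_page_bitmap(const unsigned long *bitmap, uint64_t nr_pages)
{
    size_t bitmap_bytes = (nr_pages + 7) / 8;
    int fd = open(FREE_PAGE_PORT, O_WRONLY);

    if (fd < 0) {
        perror("open " FREE_PAGE_PORT);
        return -1;
    }

    /* Simple framing: page count first, then the raw bitmap bytes. */
    if (write(fd, &nr_pages, sizeof(nr_pages)) != (ssize_t)sizeof(nr_pages) ||
        write(fd, bitmap, bitmap_bytes) != (ssize_t)bitmap_bytes) {
        perror("write free-page bitmap");
        close(fd);
        return -1;
    }

    close(fd);
    return 0;
}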


		Amit

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-04  9:32 Jitendra Kolhe
  2016-03-04  9:36   ` Li, Liang Z
@ 2016-03-04  9:36   ` Li, Liang Z
  2016-03-08 11:14 ` Amit Shah
  2 siblings, 0 replies; 58+ messages in thread
From: Li, Liang Z @ 2016-03-04  9:36 UTC (permalink / raw)
  To: Jitendra Kolhe, dgilbert
  Cc: ehabkost, kvm, mst, quintela, linux-kernel, qemu-devel, linux-mm,
	amit.shah, pbonzini, akpm, virtualization, rth,
	mohan_parthasarathy, simhan

> > > * Liang Li (liang.z.li@intel.com) wrote:
> > > > The current QEMU live migration implementation mark the all the
> > > > guest's RAM pages as dirtied in the ram bulk stage, all these
> > > > pages will be processed and that takes quit a lot of CPU cycles.
> > > >
> > > > From guest's point of view, it doesn't care about the content in
> > > > free pages. We can make use of this fact and skip processing the
> > > > free pages in the ram bulk stage, it can save a lot CPU cycles and
> > > > reduce the network traffic significantly while speed up the live
> > > > migration process obviously.
> > > >
> > > > This patch set is the QEMU side implementation.
> > > >
> > > > The virtio-balloon is extended so that QEMU can get the free pages
> > > > information from the guest through virtio.
> > > >
> > > > After getting the free pages information (a bitmap), QEMU can use
> > > > it to filter out the guest's free pages in the ram bulk stage.
> > > > This make the live migration process much more efficient.
> > >
> > > Hi,
> > >   An interesting solution; I know a few different people have been
> > > looking at how to speed up ballooned VM migration.
> > >
> >
> > Ooh, different solutions for the same purpose, and both based on the
> balloon.
> 
> We were also tying to address similar problem, without actually needing to
> modify the guest driver. Please find patch details under mail with subject.
> migration: skip sending ram pages released by virtio-balloon driver
> 
> Thanks,
> - Jitendra
> 

Great! Thanks for the information.

Liang
> >
> > >   I wonder if it would be possible to avoid the kernel changes by
> > > parsing /proc/self/pagemap - if that can be used to detect
> > > unmapped/zero mapped pages in the guest ram, would it achieve the
> same result?
> > >

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [RFC qemu 0/4] A PV solution for live migration optimization
@ 2016-03-04  9:32 Jitendra Kolhe
  2016-03-04  9:36   ` Li, Liang Z
                   ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Jitendra Kolhe @ 2016-03-04  9:32 UTC (permalink / raw)
  To: liang.z.li, dgilbert
  Cc: ehabkost, kvm, mst, quintela, linux-kernel, qemu-devel, linux-mm,
	amit.shah, pbonzini, akpm, virtualization, rth,
	mohan_parthasarathy, jitendra.kolhe, simhan

> >
> > * Liang Li (liang.z.li@intel.com) wrote:
> > > The current QEMU live migration implementation mark the all the
> > > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > > will be processed and that takes quit a lot of CPU cycles.
> > >
> > > From guest's point of view, it doesn't care about the content in free
> > > pages. We can make use of this fact and skip processing the free pages
> > > in the ram bulk stage, it can save a lot CPU cycles and reduce the
> > > network traffic significantly while speed up the live migration
> > > process obviously.
> > >
> > > This patch set is the QEMU side implementation.
> > >
> > > The virtio-balloon is extended so that QEMU can get the free pages
> > > information from the guest through virtio.
> > >
> > > After getting the free pages information (a bitmap), QEMU can use it
> > > to filter out the guest's free pages in the ram bulk stage. This make
> > > the live migration process much more efficient.
> >
> > Hi,
> >   An interesting solution; I know a few different people have been looking at
> > how to speed up ballooned VM migration.
> >
>
> Ooh, different solutions for the same purpose, and both based on the balloon.

We were also trying to address a similar problem, without actually needing to modify
the guest driver. Please find the patch details in the mail with the subject:
migration: skip sending ram pages released by virtio-balloon driver

Thanks,
- Jitendra

>
> >   I wonder if it would be possible to avoid the kernel changes by parsing
> > /proc/self/pagemap - if that can be used to detect unmapped/zero mapped
> > pages in the guest ram, would it achieve the same result?
> >
>
> Only detect the unmapped/zero mapped pages is not enough. Consider the
> situation like case 2, it can't achieve the same result.
>
> > > This RFC version doesn't take the post-copy and RDMA into
> > > consideration, maybe both of them can benefit from this PV solution by
> > > with some extra modifications.
> >
> > For postcopy to be safe, you would still need to send a message to the
> > destination telling it that there were zero pages, otherwise the destination
> > can't tell if it's supposed to request the page from the source or treat the
> > page as zero.
> >
> > Dave
>
> I will consider this later, thanks, Dave.
>
> Liang
>
> >
> > >
> > > Performance data
> > > ================
> > >
> > > Test environment:
> > >
> > > CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz Host RAM: 64GB
> > > Host Linux Kernel:  4.2.0           Host OS: CentOS 7.1
> > > Guest Linux Kernel:  4.5.rc6        Guest OS: CentOS 6.6
> > > Network:  X540-AT2 with 10 Gigabit connection Guest RAM: 8GB
> > >
> > > Case 1: Idle guest just boots:
> > > ============================================
> > >                     | original  |    pv
> > > -------------------------------------------
> > > total time(ms)      |    1894   |   421
> > > --------------------------------------------
> > > transferred ram(KB) |   398017  |  353242
> > > ============================================
> > >
> > >
> > > Case 2: The guest has ever run some memory consuming workload, the
> > > workload is terminated just before live migration.
> > > ============================================
> > >                     | original  |    pv
> > > -------------------------------------------
> > > total time(ms)      |   7436    |   552
> > > --------------------------------------------
> > > transferred ram(KB) |  8146291  |  361375
> > > ============================================
> > >
>
>

^ permalink raw reply	[flat|nested] 58+ messages in thread

* RE: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-03 17:46   ` Dr. David Alan Gilbert
@ 2016-03-04  1:52     ` Li, Liang Z
  -1 siblings, 0 replies; 58+ messages in thread
From: Li, Liang Z @ 2016-03-04  1:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: quintela, amit.shah, qemu-devel, linux-kernel, mst, akpm,
	pbonzini, rth, ehabkost, linux-mm, virtualization, kvm

> Subject: Re: [RFC qemu 0/4] A PV solution for live migration optimization
> 
> * Liang Li (liang.z.li@intel.com) wrote:
> > The current QEMU live migration implementation mark the all the
> > guest's RAM pages as dirtied in the ram bulk stage, all these pages
> > will be processed and that takes quit a lot of CPU cycles.
> >
> > From guest's point of view, it doesn't care about the content in free
> > pages. We can make use of this fact and skip processing the free pages
> > in the ram bulk stage, it can save a lot CPU cycles and reduce the
> > network traffic significantly while speed up the live migration
> > process obviously.
> >
> > This patch set is the QEMU side implementation.
> >
> > The virtio-balloon is extended so that QEMU can get the free pages
> > information from the guest through virtio.
> >
> > After getting the free pages information (a bitmap), QEMU can use it
> > to filter out the guest's free pages in the ram bulk stage. This make
> > the live migration process much more efficient.
> 
> Hi,
>   An interesting solution; I know a few different people have been looking at
> how to speed up ballooned VM migration.
> 

Ooh, different solutions for the same purpose, and both based on the balloon.

>   I wonder if it would be possible to avoid the kernel changes by parsing
> /proc/self/pagemap - if that can be used to detect unmapped/zero mapped
> pages in the guest ram, would it achieve the same result?
> 

Only detecting the unmapped/zero-mapped pages is not enough. Consider a
situation like case 2: once the workload exits, the freed pages are still mapped
in QEMU and still hold stale, non-zero data, so a pagemap scan would not flag
them; it can't achieve the same result.

> > This RFC version doesn't take the post-copy and RDMA into
> > consideration, maybe both of them can benefit from this PV solution by
> > with some extra modifications.
> 
> For postcopy to be safe, you would still need to send a message to the
> destination telling it that there were zero pages, otherwise the destination
> can't tell if it's supposed to request the page from the source or treat the
> page as zero.
> 
> Dave

I will consider this later, thanks, Dave.

Liang

> 
> >
> > Performance data
> > ================
> >
> > Test environment:
> >
> > CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz Host RAM: 64GB
> > Host Linux Kernel:  4.2.0           Host OS: CentOS 7.1
> > Guest Linux Kernel:  4.5.rc6        Guest OS: CentOS 6.6
> > Network:  X540-AT2 with 10 Gigabit connection Guest RAM: 8GB
> >
> > Case 1: Idle guest just boots:
> > ============================================
> >                     | original  |    pv
> > -------------------------------------------
> > total time(ms)      |    1894   |   421
> > --------------------------------------------
> > transferred ram(KB) |   398017  |  353242
> > ============================================
> >
> >
> > Case 2: The guest has ever run some memory consuming workload, the
> > workload is terminated just before live migration.
> > ============================================
> >                     | original  |    pv
> > -------------------------------------------
> > total time(ms)      |   7436    |   552
> > --------------------------------------------
> > transferred ram(KB) |  8146291  |  361375
> > ============================================
> >

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC qemu 0/4] A PV solution for live migration optimization
  2016-03-03 10:44 ` Liang Li
@ 2016-03-03 17:46   ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 58+ messages in thread
From: Dr. David Alan Gilbert @ 2016-03-03 17:46 UTC (permalink / raw)
  To: Liang Li
  Cc: quintela, amit.shah, qemu-devel, linux-kernel, mst, akpm,
	pbonzini, rth, ehabkost, linux-mm, virtualization, kvm

* Liang Li (liang.z.li@intel.com) wrote:
> The current QEMU live migration implementation mark the all the
> guest's RAM pages as dirtied in the ram bulk stage, all these pages
> will be processed and that takes quit a lot of CPU cycles.
> 
> From guest's point of view, it doesn't care about the content in free
> pages. We can make use of this fact and skip processing the free
> pages in the ram bulk stage, it can save a lot CPU cycles and reduce
> the network traffic significantly while speed up the live migration
> process obviously.
> 
> This patch set is the QEMU side implementation.
> 
> The virtio-balloon is extended so that QEMU can get the free pages
> information from the guest through virtio.
> 
> After getting the free pages information (a bitmap), QEMU can use it
> to filter out the guest's free pages in the ram bulk stage. This make
> the live migration process much more efficient.

Hi,
  An interesting solution; I know a few different people have been looking
at how to speed up ballooned VM migration.

  I wonder if it would be possible to avoid the kernel changes by
parsing /proc/self/pagemap - if that can be used to detect unmapped/zero
mapped pages in the guest ram, would it achieve the same result?
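
(As a rough sketch of that idea -- assuming the documented pagemap layout where
each page has a 64-bit entry, bit 63 marks a present page and bit 62 a swapped
page; the region parameters below are placeholders, nothing from the patches:)

/* Walk /proc/self/pagemap for a region of guest RAM mapped into the
 * QEMU process and count pages that are neither present nor swapped,
 * i.e. pages that were never touched or have been discarded. */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT (1ULL << 63)
#define PM_SWAPPED (1ULL << 62)

long count_untouched_pages(uintptr_t start, size_t nr_pages, size_t page_size)
{
    int fd = open("/proc/self/pagemap", O_RDONLY);
    long untouched = 0;

    if (fd < 0) {
        return -1;
    }
    for (size_t i = 0; i < nr_pages; i++) {
        uint64_t entry;
        off_t off = (off_t)((start / page_size) + i) * sizeof(entry);

        if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry)) {
            break;
        }
        if (!(entry & (PM_PRESENT | PM_SWAPPED))) {
            untouched++;
        }
    }
    close(fd);
    return untouched;
}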

> This RFC version doesn't take the post-copy and RDMA into
> consideration, maybe both of them can benefit from this PV solution
> by with some extra modifications.

For postcopy to be safe, you would still need to send a message to the
destination telling it that there were zero pages, otherwise the destination
can't tell if it's supposed to request the page from the source or
treat the page as zero.
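
(Conceptually that notification can be as small as one run-length record per
range of zero/free pages; the structure below is purely illustrative and is not
QEMU's actual migration stream format.)

/* Illustrative only: a "these pages are zero/free" record the source
 * could emit so a postcopy destination knows it must not request the
 * pages from the source and can simply map them as zero. */
#include <stdint.h>

struct zero_page_run {
    uint64_t start_gfn;  /* first guest page frame number in the run */
    uint64_t nr_pages;   /* number of consecutive zero/free pages    */
};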

Dave

> 
> Performance data
> ================
> 
> Test environment:
> 
> CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz
> Host RAM: 64GB
> Host Linux Kernel:  4.2.0           Host OS: CentOS 7.1
> Guest Linux Kernel:  4.5.rc6        Guest OS: CentOS 6.6
> Network:  X540-AT2 with 10 Gigabit connection
> Guest RAM: 8GB
> 
> Case 1: Idle guest just boots:
> ============================================
>                     | original  |    pv    
> -------------------------------------------
> total time(ms)      |    1894   |   421
> --------------------------------------------
> transferred ram(KB) |   398017  |  353242
> ============================================
> 
> 
> Case 2: The guest has ever run some memory consuming workload, the
> workload is terminated just before live migration.
> ============================================
>                     | original  |    pv    
> -------------------------------------------
> total time(ms)      |   7436    |   552
> --------------------------------------------
> transferred ram(KB) |  8146291  |  361375
> ============================================
> 
> Liang Li (4):
>   pc: Add code to get the lowmem form PCMachineState
>   virtio-balloon: Add a new feature to balloon device
>   migration: not set migration bitmap in setup stage
>   migration: filter out guest's free pages in ram bulk stage
> 
>  balloon.c                                       | 30 ++++++++-
>  hw/i386/pc.c                                    |  5 ++
>  hw/i386/pc_piix.c                               |  1 +
>  hw/i386/pc_q35.c                                |  1 +
>  hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
>  include/hw/i386/pc.h                            |  3 +-
>  include/hw/virtio/virtio-balloon.h              | 17 +++++-
>  include/standard-headers/linux/virtio_balloon.h |  1 +
>  include/sysemu/balloon.h                        | 10 ++-
>  migration/ram.c                                 | 64 +++++++++++++++----
>  10 files changed, 195 insertions(+), 18 deletions(-)
> 
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [RFC qemu 0/4] A PV solution for live migration optimization
@ 2016-03-03 10:44 ` Liang Li
  0 siblings, 0 replies; 58+ messages in thread
From: Liang Li @ 2016-03-03 10:44 UTC (permalink / raw)
  To: quintela, amit.shah, qemu-devel, linux-kernel
  Cc: mst, akpm, pbonzini, rth, ehabkost, linux-mm, virtualization,
	kvm, dgilbert, Liang Li

The current QEMU live migration implementation mark the all the
guest's RAM pages as dirtied in the ram bulk stage, all these pages
will be processed and that takes quit a lot of CPU cycles.

>From guest's point of view, it doesn't care about the content in free
pages. We can make use of this fact and skip processing the free
pages in the ram bulk stage, it can save a lot CPU cycles and reduce
the network traffic significantly while speed up the live migration
process obviously.

This patch set is the QEMU side implementation.

The virtio-balloon is extended so that QEMU can get the free pages
information from the guest through virtio.

After getting the free pages information (a bitmap), QEMU can use it
to filter out the guest's free pages in the ram bulk stage. This make
the live migration process much more efficient.
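
(In rough terms, the filtering is an AND-NOT of the migration bitmap with the
guest-provided free-page bitmap; the helper below is only a sketch with made-up
names, not the actual migration/ram.c change.)

/* Sketch of the core filtering step: drop guest-reported free pages
 * from the migration bitmap before the ram bulk stage starts.
 * migration_bitmap: 1 = page still needs to be sent
 * free_bitmap:      1 = guest reports the page as free */
#include <stddef.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

static void filter_out_guest_free_pages(unsigned long *migration_bitmap,
                                        const unsigned long *free_bitmap,
                                        size_t nr_pages)
{
    size_t nr_longs = (nr_pages + BITS_PER_LONG - 1) / BITS_PER_LONG;

    for (size_t i = 0; i < nr_longs; i++) {
        migration_bitmap[i] &= ~free_bitmap[i];
    }
}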

This RFC version doesn't take the post-copy and RDMA into
consideration, maybe both of them can benefit from this PV solution
by with some extra modifications.

Performance data
================

Test environment:

CPU: Intel (R) Xeon(R) CPU ES-2699 v3 @ 2.30GHz
Host RAM: 64GB
Host Linux Kernel:  4.2.0           Host OS: CentOS 7.1
Guest Linux Kernel:  4.5.rc6        Guest OS: CentOS 6.6
Network:  X540-AT2 with 10 Gigabit connection
Guest RAM: 8GB

Case 1: Idle guest just boots:
============================================
                    | original  |    pv    
-------------------------------------------
total time(ms)      |    1894   |   421
--------------------------------------------
transferred ram(KB) |   398017  |  353242
============================================


Case 2: The guest has ever run some memory consuming workload, the
workload is terminated just before live migration.
============================================
                    | original  |    pv    
-------------------------------------------
total time(ms)      |   7436    |   552
--------------------------------------------
transferred ram(KB) |  8146291  |  361375
============================================

Liang Li (4):
  pc: Add code to get the lowmem form PCMachineState
  virtio-balloon: Add a new feature to balloon device
  migration: not set migration bitmap in setup stage
  migration: filter out guest's free pages in ram bulk stage

 balloon.c                                       | 30 ++++++++-
 hw/i386/pc.c                                    |  5 ++
 hw/i386/pc_piix.c                               |  1 +
 hw/i386/pc_q35.c                                |  1 +
 hw/virtio/virtio-balloon.c                      | 81 ++++++++++++++++++++++++-
 include/hw/i386/pc.h                            |  3 +-
 include/hw/virtio/virtio-balloon.h              | 17 +++++-
 include/standard-headers/linux/virtio_balloon.h |  1 +
 include/sysemu/balloon.h                        | 10 ++-
 migration/ram.c                                 | 64 +++++++++++++++----
 10 files changed, 195 insertions(+), 18 deletions(-)

-- 
1.8.3.1
