[PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-11-30  8:43 Liang Li
  0 siblings, 0 replies; 29+ messages in thread
From: Liang Li @ 2016-11-30  8:43 UTC (permalink / raw)
  To: kvm
  Cc: virtio-dev, mhocko, mst, dave.hansen, qemu-devel, linux-kernel,
	linux-mm, kirill.shutemov, pbonzini, akpm, virtualization,
	Liang Li, dgilbert

This patch set contains two parts of changes to the virtio-balloon.

One is the change for speeding up the inflating & deflating process,
the main idea of this optimization is to use bitmap to send the page
information to host instead of the PFNs, to reduce the overhead of
virtio data transmission, address translation and madvise(). This can
help to improve the performance by about 85%.

Another change is for speeding up live migration. By skipping process
guest's unused pages in the first round of data copy, to reduce needless
data processing, this can help to save quite a lot of CPU cycles and
network bandwidth. We put guest's unused page information in a bitmap
and send it to host with the virt queue of virtio-balloon. For an idle
guest with 8GB RAM, this can help to shorten the total live migration
time from 2Sec to about 500ms in 10Gbps network environment.

Changes from v4 to v5:
    * Drop the code to get the max_pfn, use another way instead.
    * Simplify the API to get the unused page information from mm. 

Changes from v3 to v4:
    * Use the new scheme suggested by Dave Hansen to encode the bitmap.
    * Add code which is missed in v3 to handle migrate page. 
    * Free the memory for bitmap intime once the operation is done.
    * Address some of the comments in v3.

Changes from v2 to v3:
    * Change the name of 'free page' to 'unused page'.
    * Use the scatter & gather bitmap instead of a 1MB page bitmap.
    * Fix overwriting the page bitmap after kicking.
    * Some of MST's comments for v2.

Changes from v1 to v2:
    * Abandon the patch for dropping page cache.
    * Put some structures to uapi head file.
    * Use a new way to determine the page bitmap size.
    * Use a unified way to send the free page information with the bitmap
    * Address the issues referred in MST's comments

Liang Li (5):
  virtio-balloon: rework deflate to add page to a list
  virtio-balloon: define new feature bit and head struct
  virtio-balloon: speed up inflate/deflate process
  virtio-balloon: define flags and head for host request vq
  virtio-balloon: tell host vm's unused page info

 drivers/virtio/virtio_balloon.c     | 539 ++++++++++++++++++++++++++++++++----
 include/linux/mm.h                  |   3 +-
 include/uapi/linux/virtio_balloon.h |  41 +++
 mm/page_alloc.c                     |  72 +++++
 4 files changed, 607 insertions(+), 48 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-11-30  8:43 ` Liang Li
  0 siblings, 0 replies; 29+ messages in thread
From: Liang Li @ 2016-11-30  8:43 UTC (permalink / raw)
  To: kvm
  Cc: virtualization, linux-kernel, linux-mm, virtio-dev, qemu-devel,
	quintela, dgilbert, dave.hansen, mst, jasowang, kirill.shutemov,
	akpm, mhocko, pbonzini, Liang Li

This patch set contains two parts of changes to the virtio-balloon.

One is the change for speeding up the inflating & deflating process,
the main idea of this optimization is to use bitmap to send the page
information to host instead of the PFNs, to reduce the overhead of
virtio data transmission, address translation and madvise(). This can
help to improve the performance by about 85%.

Another change is for speeding up live migration. By skipping process
guest's unused pages in the first round of data copy, to reduce needless
data processing, this can help to save quite a lot of CPU cycles and
network bandwidth. We put guest's unused page information in a bitmap
and send it to host with the virt queue of virtio-balloon. For an idle
guest with 8GB RAM, this can help to shorten the total live migration
time from 2Sec to about 500ms in 10Gbps network environment.

Changes from v4 to v5:
    * Drop the code to get the max_pfn, use another way instead.
    * Simplify the API to get the unused page information from mm. 

Changes from v3 to v4:
    * Use the new scheme suggested by Dave Hansen to encode the bitmap.
    * Add code which is missed in v3 to handle migrate page. 
    * Free the memory for bitmap intime once the operation is done.
    * Address some of the comments in v3.

Changes from v2 to v3:
    * Change the name of 'free page' to 'unused page'.
    * Use the scatter & gather bitmap instead of a 1MB page bitmap.
    * Fix overwriting the page bitmap after kicking.
    * Some of MST's comments for v2.

Changes from v1 to v2:
    * Abandon the patch for dropping page cache.
    * Put some structures to uapi head file.
    * Use a new way to determine the page bitmap size.
    * Use a unified way to send the free page information with the bitmap
    * Address the issues referred in MST's comments

Liang Li (5):
  virtio-balloon: rework deflate to add page to a list
  virtio-balloon: define new feature bit and head struct
  virtio-balloon: speed up inflate/deflate process
  virtio-balloon: define flags and head for host request vq
  virtio-balloon: tell host vm's unused page info

 drivers/virtio/virtio_balloon.c     | 539 ++++++++++++++++++++++++++++++++----
 include/linux/mm.h                  |   3 +-
 include/uapi/linux/virtio_balloon.h |  41 +++
 mm/page_alloc.c                     |  72 +++++
 4 files changed, 607 insertions(+), 48 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-11-30  8:43 ` Liang Li
  0 siblings, 0 replies; 29+ messages in thread
From: Liang Li @ 2016-11-30  8:43 UTC (permalink / raw)
  To: kvm
  Cc: virtualization, linux-kernel, linux-mm, virtio-dev, qemu-devel,
	quintela, dgilbert, dave.hansen, mst, jasowang, kirill.shutemov,
	akpm, mhocko, pbonzini, Liang Li

This patch set contains two parts of changes to the virtio-balloon.
 
One is the change for speeding up the inflating & deflating process,
the main idea of this optimization is to use bitmap to send the page
information to host instead of the PFNs, to reduce the overhead of
virtio data transmission, address translation and madvise(). This can
help to improve the performance by about 85%.
 
Another change is for speeding up live migration. By skipping process
guest's unused pages in the first round of data copy, to reduce needless
data processing, this can help to save quite a lot of CPU cycles and
network bandwidth. We put guest's unused page information in a bitmap
and send it to host with the virt queue of virtio-balloon. For an idle
guest with 8GB RAM, this can help to shorten the total live migration
time from 2Sec to about 500ms in 10Gbps network environment.
 
Changes from v4 to v5:
    * Drop the code to get the max_pfn, use another way instead.
    * Simplify the API to get the unused page information from mm. 

Changes from v3 to v4:
    * Use the new scheme suggested by Dave Hansen to encode the bitmap.
    * Add code which is missed in v3 to handle migrate page. 
    * Free the memory for bitmap intime once the operation is done.
    * Address some of the comments in v3.

Changes from v2 to v3:
    * Change the name of 'free page' to 'unused page'.
    * Use the scatter & gather bitmap instead of a 1MB page bitmap.
    * Fix overwriting the page bitmap after kicking.
    * Some of MST's comments for v2.
 
Changes from v1 to v2:
    * Abandon the patch for dropping page cache.
    * Put some structures to uapi head file.
    * Use a new way to determine the page bitmap size.
    * Use a unified way to send the free page information with the bitmap
    * Address the issues referred in MST's comments

Liang Li (5):
  virtio-balloon: rework deflate to add page to a list
  virtio-balloon: define new feature bit and head struct
  virtio-balloon: speed up inflate/deflate process
  virtio-balloon: define flags and head for host request vq
  virtio-balloon: tell host vm's unused page info

 drivers/virtio/virtio_balloon.c     | 539 ++++++++++++++++++++++++++++++++----
 include/linux/mm.h                  |   3 +-
 include/uapi/linux/virtio_balloon.h |  41 +++
 mm/page_alloc.c                     |  72 +++++
 4 files changed, 607 insertions(+), 48 deletions(-)

-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-11-30  8:43 ` Liang Li
  (?)
@ 2016-12-06  8:40   ` David Hildenbrand
  -1 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2016-12-06  8:40 UTC (permalink / raw)
  To: Liang Li, kvm
  Cc: virtio-dev, mhocko, mst, dave.hansen, qemu-devel, linux-kernel,
	linux-mm, kirill.shutemov, pbonzini, akpm, virtualization,
	dgilbert

Am 30.11.2016 um 09:43 schrieb Liang Li:
> This patch set contains two parts of changes to the virtio-balloon.
>
> One is the change for speeding up the inflating & deflating process,
> the main idea of this optimization is to use bitmap to send the page
> information to host instead of the PFNs, to reduce the overhead of
> virtio data transmission, address translation and madvise(). This can
> help to improve the performance by about 85%.

Do you have some statistics/some rough feeling how many consecutive bits 
are usually set in the bitmaps? Is it really just purely random or is 
there some granularity that is usually consecutive?

IOW in real examples, do we have really large consecutive areas or are 
all pages just completely distributed over our memory?

Thanks!

-- 

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-06  8:40   ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2016-12-06  8:40 UTC (permalink / raw)
  To: Liang Li, kvm
  Cc: virtio-dev, mhocko, mst, dave.hansen, qemu-devel, linux-kernel,
	linux-mm, kirill.shutemov, pbonzini, akpm, virtualization,
	dgilbert

Am 30.11.2016 um 09:43 schrieb Liang Li:
> This patch set contains two parts of changes to the virtio-balloon.
>
> One is the change for speeding up the inflating & deflating process,
> the main idea of this optimization is to use bitmap to send the page
> information to host instead of the PFNs, to reduce the overhead of
> virtio data transmission, address translation and madvise(). This can
> help to improve the performance by about 85%.

Do you have some statistics/some rough feeling how many consecutive bits 
are usually set in the bitmaps? Is it really just purely random or is 
there some granularity that is usually consecutive?

IOW in real examples, do we have really large consecutive areas or are 
all pages just completely distributed over our memory?

Thanks!

-- 

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-06  8:40   ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2016-12-06  8:40 UTC (permalink / raw)
  To: Liang Li, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	dave.hansen, dgilbert, pbonzini, akpm, virtualization,
	kirill.shutemov

Am 30.11.2016 um 09:43 schrieb Liang Li:
> This patch set contains two parts of changes to the virtio-balloon.
>
> One is the change for speeding up the inflating & deflating process,
> the main idea of this optimization is to use bitmap to send the page
> information to host instead of the PFNs, to reduce the overhead of
> virtio data transmission, address translation and madvise(). This can
> help to improve the performance by about 85%.

Do you have some statistics/some rough feeling how many consecutive bits 
are usually set in the bitmaps? Is it really just purely random or is 
there some granularity that is usually consecutive?

IOW in real examples, do we have really large consecutive areas or are 
all pages just completely distributed over our memory?

Thanks!

-- 

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-06  8:40   ` David Hildenbrand
  (?)
@ 2016-12-07 13:35     ` Li, Liang Z
  -1 siblings, 0 replies; 29+ messages in thread
From: Li, Liang Z @ 2016-12-07 13:35 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, Hansen, Dave, qemu-devel, linux-kernel,
	linux-mm, kirill.shutemov, pbonzini, akpm, virtualization,
	dgilbert

> Am 30.11.2016 um 09:43 schrieb Liang Li:
> > This patch set contains two parts of changes to the virtio-balloon.
> >
> > One is the change for speeding up the inflating & deflating process,
> > the main idea of this optimization is to use bitmap to send the page
> > information to host instead of the PFNs, to reduce the overhead of
> > virtio data transmission, address translation and madvise(). This can
> > help to improve the performance by about 85%.
> 
> Do you have some statistics/some rough feeling how many consecutive bits are
> usually set in the bitmaps? Is it really just purely random or is there some
> granularity that is usually consecutive?
> 

I did something similar. Filled the balloon with 15GB for a 16GB idle guest, by
using bitmap, the madvise count was reduced to 605. when using the PFNs, the madvise count
was 3932160. It means there are quite a lot consecutive bits in the bitmap.
I didn't test for a guest with heavy memory workload. 

> IOW in real examples, do we have really large consecutive areas or are all
> pages just completely distributed over our memory?
> 

The buddy system of Linux kernel memory management shows there should be quite a lot of
 consecutive pages as long as there are a portion of free memory in the guest.
If all pages just completely distributed over our memory, it means the memory 
fragmentation is very serious, the kernel has the mechanism to avoid this happened.
In the other hand, the inflating should not happen at this time because the guest is almost
'out of memory'.

Liang

> Thanks!
> 
> --
> 
> David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-07 13:35     ` Li, Liang Z
  0 siblings, 0 replies; 29+ messages in thread
From: Li, Liang Z @ 2016-12-07 13:35 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, Hansen, Dave, qemu-devel, linux-kernel,
	linux-mm, kirill.shutemov, pbonzini, akpm, virtualization,
	dgilbert

> Am 30.11.2016 um 09:43 schrieb Liang Li:
> > This patch set contains two parts of changes to the virtio-balloon.
> >
> > One is the change for speeding up the inflating & deflating process,
> > the main idea of this optimization is to use bitmap to send the page
> > information to host instead of the PFNs, to reduce the overhead of
> > virtio data transmission, address translation and madvise(). This can
> > help to improve the performance by about 85%.
> 
> Do you have some statistics/some rough feeling how many consecutive bits are
> usually set in the bitmaps? Is it really just purely random or is there some
> granularity that is usually consecutive?
> 

I did something similar. Filled the balloon with 15GB for a 16GB idle guest, by
using bitmap, the madvise count was reduced to 605. when using the PFNs, the madvise count
was 3932160. It means there are quite a lot consecutive bits in the bitmap.
I didn't test for a guest with heavy memory workload. 

> IOW in real examples, do we have really large consecutive areas or are all
> pages just completely distributed over our memory?
> 

The buddy system of Linux kernel memory management shows there should be quite a lot of
 consecutive pages as long as there are a portion of free memory in the guest.
If all pages just completely distributed over our memory, it means the memory 
fragmentation is very serious, the kernel has the mechanism to avoid this happened.
In the other hand, the inflating should not happen at this time because the guest is almost
'out of memory'.

Liang

> Thanks!
> 
> --
> 
> David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-07 13:35     ` Li, Liang Z
  0 siblings, 0 replies; 29+ messages in thread
From: Li, Liang Z @ 2016-12-07 13:35 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, Hansen, Dave, qemu-devel, linux-kernel,
	linux-mm, kirill.shutemov, pbonzini, akpm, virtualization,
	dgilbert

> Am 30.11.2016 um 09:43 schrieb Liang Li:
> > This patch set contains two parts of changes to the virtio-balloon.
> >
> > One is the change for speeding up the inflating & deflating process,
> > the main idea of this optimization is to use bitmap to send the page
> > information to host instead of the PFNs, to reduce the overhead of
> > virtio data transmission, address translation and madvise(). This can
> > help to improve the performance by about 85%.
> 
> Do you have some statistics/some rough feeling how many consecutive bits are
> usually set in the bitmaps? Is it really just purely random or is there some
> granularity that is usually consecutive?
> 

I did something similar. Filled the balloon with 15GB for a 16GB idle guest, by
using bitmap, the madvise count was reduced to 605. when using the PFNs, the madvise count
was 3932160. It means there are quite a lot consecutive bits in the bitmap.
I didn't test for a guest with heavy memory workload. 

> IOW in real examples, do we have really large consecutive areas or are all
> pages just completely distributed over our memory?
> 

The buddy system of Linux kernel memory management shows there should be quite a lot of
 consecutive pages as long as there are a portion of free memory in the guest.
If all pages just completely distributed over our memory, it means the memory 
fragmentation is very serious, the kernel has the mechanism to avoid this happened.
In the other hand, the inflating should not happen at this time because the guest is almost
'out of memory'.

Liang

> Thanks!
> 
> --
> 
> David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 13:35     ` Li, Liang Z
@ 2016-12-07 15:34       ` Dave Hansen
  -1 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2016-12-07 15:34 UTC (permalink / raw)
  To: Li, Liang Z, David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	kirill.shutemov, pbonzini, akpm, virtualization, dgilbert

On 12/07/2016 05:35 AM, Li, Liang Z wrote:
>> Am 30.11.2016 um 09:43 schrieb Liang Li:
>> IOW in real examples, do we have really large consecutive areas or are all
>> pages just completely distributed over our memory?
> 
> The buddy system of Linux kernel memory management shows there should
> be quite a lot of consecutive pages as long as there are a portion of
> free memory in the guest.
...
> If all pages just completely distributed over our memory, it means
> the memory fragmentation is very serious, the kernel has the
> mechanism to avoid this happened.

While it is correct that the kernel has anti-fragmentation mechanisms, I
don't think it invalidates the question as to whether a bitmap would be
too sparse to be effective.

> In the other hand, the inflating should not happen at this time because the guest is almost
> 'out of memory'.

I don't think this is correct.  Most systems try to run with relatively
little free memory all the time, using the bulk of it as page cache.  We
have no reason to expect that ballooning will only occur when there is
lots of actual free memory and that it will not occur when that same
memory is in use as page cache.

In these patches, you're effectively still sending pfns.  You're just
sending one pfn per high-order page which is giving a really nice
speedup.  IMNHO, you're avoiding doing a real bitmap because creating a
bitmap means either have a really big bitmap, or you would have to do
some sorting (or multiple passes) of the free lists before populating a
smaller bitmap.

Like David, I would still like to see some data on whether the choice
between bitmaps and pfn lists is ever clearly in favor of bitmaps.  You
haven't convinced me, at least, that the data isn't even worth collecting.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-07 15:34       ` Dave Hansen
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2016-12-07 15:34 UTC (permalink / raw)
  To: Li, Liang Z, David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	kirill.shutemov, pbonzini, akpm, virtualization, dgilbert

On 12/07/2016 05:35 AM, Li, Liang Z wrote:
>> Am 30.11.2016 um 09:43 schrieb Liang Li:
>> IOW in real examples, do we have really large consecutive areas or are all
>> pages just completely distributed over our memory?
> 
> The buddy system of Linux kernel memory management shows there should
> be quite a lot of consecutive pages as long as there are a portion of
> free memory in the guest.
...
> If all pages just completely distributed over our memory, it means
> the memory fragmentation is very serious, the kernel has the
> mechanism to avoid this happened.

While it is correct that the kernel has anti-fragmentation mechanisms, I
don't think it invalidates the question as to whether a bitmap would be
too sparse to be effective.

> In the other hand, the inflating should not happen at this time because the guest is almost
> 'out of memory'.

I don't think this is correct.  Most systems try to run with relatively
little free memory all the time, using the bulk of it as page cache.  We
have no reason to expect that ballooning will only occur when there is
lots of actual free memory and that it will not occur when that same
memory is in use as page cache.

In these patches, you're effectively still sending pfns.  You're just
sending one pfn per high-order page which is giving a really nice
speedup.  IMNHO, you're avoiding doing a real bitmap because creating a
bitmap means either have a really big bitmap, or you would have to do
some sorting (or multiple passes) of the free lists before populating a
smaller bitmap.

Like David, I would still like to see some data on whether the choice
between bitmaps and pfn lists is ever clearly in favor of bitmaps.  You
haven't convinced me, at least, that the data isn't even worth collecting.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 15:34       ` Dave Hansen
  (?)
@ 2016-12-09  3:09         ` Li, Liang Z
  -1 siblings, 0 replies; 29+ messages in thread
From: Li, Liang Z @ 2016-12-09  3:09 UTC (permalink / raw)
  To: Hansen, Dave, David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	kirill.shutemov, pbonzini, akpm, virtualization, dgilbert

> Subject: Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating
> & fast live migration
> 
> On 12/07/2016 05:35 AM, Li, Liang Z wrote:
> >> Am 30.11.2016 um 09:43 schrieb Liang Li:
> >> IOW in real examples, do we have really large consecutive areas or
> >> are all pages just completely distributed over our memory?
> >
> > The buddy system of Linux kernel memory management shows there
> should
> > be quite a lot of consecutive pages as long as there are a portion of
> > free memory in the guest.
> ...
> > If all pages just completely distributed over our memory, it means the
> > memory fragmentation is very serious, the kernel has the mechanism to
> > avoid this happened.
> 
> While it is correct that the kernel has anti-fragmentation mechanisms, I don't
> think it invalidates the question as to whether a bitmap would be too sparse
> to be effective.
> 
> > In the other hand, the inflating should not happen at this time
> > because the guest is almost 'out of memory'.
> 
> I don't think this is correct.  Most systems try to run with relatively little free
> memory all the time, using the bulk of it as page cache.  We have no reason
> to expect that ballooning will only occur when there is lots of actual free
> memory and that it will not occur when that same memory is in use as page
> cache.
> 

Yes.
> In these patches, you're effectively still sending pfns.  You're just sending
> one pfn per high-order page which is giving a really nice speedup.  IMNHO,
> you're avoiding doing a real bitmap because creating a bitmap means either
> have a really big bitmap, or you would have to do some sorting (or multiple
> passes) of the free lists before populating a smaller bitmap.
> 
> Like David, I would still like to see some data on whether the choice between
> bitmaps and pfn lists is ever clearly in favor of bitmaps.  You haven't
> convinced me, at least, that the data isn't even worth collecting.

I will try to get some data with the real workload and share it with your guys.

Thanks!
Liang

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-09  3:09         ` Li, Liang Z
  0 siblings, 0 replies; 29+ messages in thread
From: Li, Liang Z @ 2016-12-09  3:09 UTC (permalink / raw)
  To: Hansen, Dave, David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	kirill.shutemov, pbonzini, akpm, virtualization, dgilbert

> Subject: Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating
> & fast live migration
> 
> On 12/07/2016 05:35 AM, Li, Liang Z wrote:
> >> Am 30.11.2016 um 09:43 schrieb Liang Li:
> >> IOW in real examples, do we have really large consecutive areas or
> >> are all pages just completely distributed over our memory?
> >
> > The buddy system of Linux kernel memory management shows there
> should
> > be quite a lot of consecutive pages as long as there are a portion of
> > free memory in the guest.
> ...
> > If all pages just completely distributed over our memory, it means the
> > memory fragmentation is very serious, the kernel has the mechanism to
> > avoid this happened.
> 
> While it is correct that the kernel has anti-fragmentation mechanisms, I don't
> think it invalidates the question as to whether a bitmap would be too sparse
> to be effective.
> 
> > In the other hand, the inflating should not happen at this time
> > because the guest is almost 'out of memory'.
> 
> I don't think this is correct.  Most systems try to run with relatively little free
> memory all the time, using the bulk of it as page cache.  We have no reason
> to expect that ballooning will only occur when there is lots of actual free
> memory and that it will not occur when that same memory is in use as page
> cache.
> 

Yes.
> In these patches, you're effectively still sending pfns.  You're just sending
> one pfn per high-order page which is giving a really nice speedup.  IMNHO,
> you're avoiding doing a real bitmap because creating a bitmap means either
> have a really big bitmap, or you would have to do some sorting (or multiple
> passes) of the free lists before populating a smaller bitmap.
> 
> Like David, I would still like to see some data on whether the choice between
> bitmaps and pfn lists is ever clearly in favor of bitmaps.  You haven't
> convinced me, at least, that the data isn't even worth collecting.

I will try to get some data with the real workload and share it with your guys.

Thanks!
Liang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-09  3:09         ` Li, Liang Z
  0 siblings, 0 replies; 29+ messages in thread
From: Li, Liang Z @ 2016-12-09  3:09 UTC (permalink / raw)
  To: Hansen, Dave, David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	kirill.shutemov, pbonzini, akpm, virtualization, dgilbert

> Subject: Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating
> & fast live migration
> 
> On 12/07/2016 05:35 AM, Li, Liang Z wrote:
> >> Am 30.11.2016 um 09:43 schrieb Liang Li:
> >> IOW in real examples, do we have really large consecutive areas or
> >> are all pages just completely distributed over our memory?
> >
> > The buddy system of Linux kernel memory management shows there
> should
> > be quite a lot of consecutive pages as long as there are a portion of
> > free memory in the guest.
> ...
> > If all pages just completely distributed over our memory, it means the
> > memory fragmentation is very serious, the kernel has the mechanism to
> > avoid this happened.
> 
> While it is correct that the kernel has anti-fragmentation mechanisms, I don't
> think it invalidates the question as to whether a bitmap would be too sparse
> to be effective.
> 
> > In the other hand, the inflating should not happen at this time
> > because the guest is almost 'out of memory'.
> 
> I don't think this is correct.  Most systems try to run with relatively little free
> memory all the time, using the bulk of it as page cache.  We have no reason
> to expect that ballooning will only occur when there is lots of actual free
> memory and that it will not occur when that same memory is in use as page
> cache.
> 

Yes.
> In these patches, you're effectively still sending pfns.  You're just sending
> one pfn per high-order page which is giving a really nice speedup.  IMNHO,
> you're avoiding doing a real bitmap because creating a bitmap means either
> have a really big bitmap, or you would have to do some sorting (or multiple
> passes) of the free lists before populating a smaller bitmap.
> 
> Like David, I would still like to see some data on whether the choice between
> bitmaps and pfn lists is ever clearly in favor of bitmaps.  You haven't
> convinced me, at least, that the data isn't even worth collecting.

I will try to get some data with the real workload and share it with your guys.

Thanks!
Liang

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 15:34       ` Dave Hansen
  (?)
  (?)
@ 2016-12-09  3:09       ` Li, Liang Z
  -1 siblings, 0 replies; 29+ messages in thread
From: Li, Liang Z @ 2016-12-09  3:09 UTC (permalink / raw)
  To: Hansen, Dave, David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	dgilbert, pbonzini, akpm, virtualization, kirill.shutemov

> Subject: Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating
> & fast live migration
> 
> On 12/07/2016 05:35 AM, Li, Liang Z wrote:
> >> Am 30.11.2016 um 09:43 schrieb Liang Li:
> >> IOW in real examples, do we have really large consecutive areas or
> >> are all pages just completely distributed over our memory?
> >
> > The buddy system of Linux kernel memory management shows there
> should
> > be quite a lot of consecutive pages as long as there are a portion of
> > free memory in the guest.
> ...
> > If all pages just completely distributed over our memory, it means the
> > memory fragmentation is very serious, the kernel has the mechanism to
> > avoid this happened.
> 
> While it is correct that the kernel has anti-fragmentation mechanisms, I don't
> think it invalidates the question as to whether a bitmap would be too sparse
> to be effective.
> 
> > In the other hand, the inflating should not happen at this time
> > because the guest is almost 'out of memory'.
> 
> I don't think this is correct.  Most systems try to run with relatively little free
> memory all the time, using the bulk of it as page cache.  We have no reason
> to expect that ballooning will only occur when there is lots of actual free
> memory and that it will not occur when that same memory is in use as page
> cache.
> 

Yes.
> In these patches, you're effectively still sending pfns.  You're just sending
> one pfn per high-order page which is giving a really nice speedup.  IMNHO,
> you're avoiding doing a real bitmap because creating a bitmap means either
> have a really big bitmap, or you would have to do some sorting (or multiple
> passes) of the free lists before populating a smaller bitmap.
> 
> Like David, I would still like to see some data on whether the choice between
> bitmaps and pfn lists is ever clearly in favor of bitmaps.  You haven't
> convinced me, at least, that the data isn't even worth collecting.

I will try to get some data with the real workload and share it with your guys.

Thanks!
Liang

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 13:35     ` Li, Liang Z
                       ` (2 preceding siblings ...)
  (?)
@ 2016-12-07 15:34     ` Dave Hansen
  -1 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2016-12-07 15:34 UTC (permalink / raw)
  To: Li, Liang Z, David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	dgilbert, pbonzini, akpm, virtualization, kirill.shutemov

On 12/07/2016 05:35 AM, Li, Liang Z wrote:
>> Am 30.11.2016 um 09:43 schrieb Liang Li:
>> IOW in real examples, do we have really large consecutive areas or are all
>> pages just completely distributed over our memory?
> 
> The buddy system of Linux kernel memory management shows there should
> be quite a lot of consecutive pages as long as there are a portion of
> free memory in the guest.
...
> If all pages just completely distributed over our memory, it means
> the memory fragmentation is very serious, the kernel has the
> mechanism to avoid this happened.

While it is correct that the kernel has anti-fragmentation mechanisms, I
don't think it invalidates the question as to whether a bitmap would be
too sparse to be effective.

> In the other hand, the inflating should not happen at this time because the guest is almost
> 'out of memory'.

I don't think this is correct.  Most systems try to run with relatively
little free memory all the time, using the bulk of it as page cache.  We
have no reason to expect that ballooning will only occur when there is
lots of actual free memory and that it will not occur when that same
memory is in use as page cache.

In these patches, you're effectively still sending pfns.  You're just
sending one pfn per high-order page which is giving a really nice
speedup.  IMNHO, you're avoiding doing a real bitmap because creating a
bitmap means either have a really big bitmap, or you would have to do
some sorting (or multiple passes) of the free lists before populating a
smaller bitmap.

Like David, I would still like to see some data on whether the choice
between bitmaps and pfn lists is ever clearly in favor of bitmaps.  You
haven't convinced me, at least, that the data isn't even worth collecting.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 13:35     ` Li, Liang Z
@ 2016-12-07 15:42       ` David Hildenbrand
  -1 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2016-12-07 15:42 UTC (permalink / raw)
  To: Li, Liang Z, kvm
  Cc: virtio-dev, mhocko, mst, Hansen, Dave, qemu-devel, linux-kernel,
	linux-mm, kirill.shutemov, pbonzini, akpm, virtualization,
	dgilbert, Andrea Arcangeli

Am 07.12.2016 um 14:35 schrieb Li, Liang Z:
>> Am 30.11.2016 um 09:43 schrieb Liang Li:
>>> This patch set contains two parts of changes to the virtio-balloon.
>>>
>>> One is the change for speeding up the inflating & deflating process,
>>> the main idea of this optimization is to use bitmap to send the page
>>> information to host instead of the PFNs, to reduce the overhead of
>>> virtio data transmission, address translation and madvise(). This can
>>> help to improve the performance by about 85%.
>>
>> Do you have some statistics/some rough feeling how many consecutive bits are
>> usually set in the bitmaps? Is it really just purely random or is there some
>> granularity that is usually consecutive?
>>
>
> I did something similar. Filled the balloon with 15GB for a 16GB idle guest, by
> using bitmap, the madvise count was reduced to 605. when using the PFNs, the madvise count
> was 3932160. It means there are quite a lot consecutive bits in the bitmap.
> I didn't test for a guest with heavy memory workload.

Would it then even make sense to go one step further and report {pfn, 
length} combinations?

So simply send over an array of {pfn, length}?

This idea came up when talking to Andrea Arcangeli (put him on cc).

And it makes sense if you think about:

a) hugetlb backing: The host may only be able to free huge pages (we 
might want to communicate that to the guest later, that's another 
story). Still we would have to send bitmaps full of 4k frames (512 bits 
for 2mb frames). Of course, we could add a way to communicate that we 
are using a different bitmap-granularity.

b) if we really inflate huge memory regions (and it sounds like that 
according to your measurements), we can minimize the communication to 
the hypervisor and therefore the madvice calls.

c) we don't want to optimize for inflating guests with almost full 
memory (and therefore little consecutive memory areas) - my opinion :)

Thanks for the explanation!

-- 

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-07 15:42       ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2016-12-07 15:42 UTC (permalink / raw)
  To: Li, Liang Z, kvm
  Cc: virtio-dev, mhocko, mst, Hansen, Dave, qemu-devel, linux-kernel,
	linux-mm, kirill.shutemov, pbonzini, akpm, virtualization,
	dgilbert, Andrea Arcangeli

Am 07.12.2016 um 14:35 schrieb Li, Liang Z:
>> Am 30.11.2016 um 09:43 schrieb Liang Li:
>>> This patch set contains two parts of changes to the virtio-balloon.
>>>
>>> One is the change for speeding up the inflating & deflating process,
>>> the main idea of this optimization is to use bitmap to send the page
>>> information to host instead of the PFNs, to reduce the overhead of
>>> virtio data transmission, address translation and madvise(). This can
>>> help to improve the performance by about 85%.
>>
>> Do you have some statistics/some rough feeling how many consecutive bits are
>> usually set in the bitmaps? Is it really just purely random or is there some
>> granularity that is usually consecutive?
>>
>
> I did something similar. Filled the balloon with 15GB for a 16GB idle guest, by
> using bitmap, the madvise count was reduced to 605. when using the PFNs, the madvise count
> was 3932160. It means there are quite a lot consecutive bits in the bitmap.
> I didn't test for a guest with heavy memory workload.

Would it then even make sense to go one step further and report {pfn, 
length} combinations?

So simply send over an array of {pfn, length}?

This idea came up when talking to Andrea Arcangeli (put him on cc).

And it makes sense if you think about:

a) hugetlb backing: The host may only be able to free huge pages (we 
might want to communicate that to the guest later, that's another 
story). Still we would have to send bitmaps full of 4k frames (512 bits 
for 2mb frames). Of course, we could add a way to communicate that we 
are using a different bitmap-granularity.

b) if we really inflate huge memory regions (and it sounds like that 
according to your measurements), we can minimize the communication to 
the hypervisor and therefore the madvice calls.

c) we don't want to optimize for inflating guests with almost full 
memory (and therefore little consecutive memory areas) - my opinion :)


Thanks for the explanation!

-- 

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 15:42       ` David Hildenbrand
  (?)
@ 2016-12-07 15:45         ` Dave Hansen
  -1 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2016-12-07 15:45 UTC (permalink / raw)
  To: David Hildenbrand, Li, Liang Z, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	kirill.shutemov, pbonzini, akpm, virtualization, dgilbert,
	Andrea Arcangeli

On 12/07/2016 07:42 AM, David Hildenbrand wrote:
> Am 07.12.2016 um 14:35 schrieb Li, Liang Z:
>>> Am 30.11.2016 um 09:43 schrieb Liang Li:
>>>> This patch set contains two parts of changes to the virtio-balloon.
>>>>
>>>> One is the change for speeding up the inflating & deflating process,
>>>> the main idea of this optimization is to use bitmap to send the page
>>>> information to host instead of the PFNs, to reduce the overhead of
>>>> virtio data transmission, address translation and madvise(). This can
>>>> help to improve the performance by about 85%.
>>>
>>> Do you have some statistics/some rough feeling how many consecutive
>>> bits are
>>> usually set in the bitmaps? Is it really just purely random or is
>>> there some
>>> granularity that is usually consecutive?
>>>
>>
>> I did something similar. Filled the balloon with 15GB for a 16GB idle
>> guest, by
>> using bitmap, the madvise count was reduced to 605. when using the
>> PFNs, the madvise count
>> was 3932160. It means there are quite a lot consecutive bits in the
>> bitmap.
>> I didn't test for a guest with heavy memory workload.
> 
> Would it then even make sense to go one step further and report {pfn,
> length} combinations?
> 
> So simply send over an array of {pfn, length}?

Li's current patches do that.  Well, maybe not pfn/length, but they do
take a pfn and page-order, which fits perfectly with the kernel's
concept of high-order pages.

> And it makes sense if you think about:
> 
> a) hugetlb backing: The host may only be able to free huge pages (we
> might want to communicate that to the guest later, that's another
> story). Still we would have to send bitmaps full of 4k frames (512 bits
> for 2mb frames). Of course, we could add a way to communicate that we
> are using a different bitmap-granularity.

Yeah, please read the patches.  If they're not clear, then the
descriptions need work, but this is done already.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-07 15:45         ` Dave Hansen
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2016-12-07 15:45 UTC (permalink / raw)
  To: David Hildenbrand, Li, Liang Z, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	kirill.shutemov, pbonzini, akpm, virtualization, dgilbert,
	Andrea Arcangeli

On 12/07/2016 07:42 AM, David Hildenbrand wrote:
> Am 07.12.2016 um 14:35 schrieb Li, Liang Z:
>>> Am 30.11.2016 um 09:43 schrieb Liang Li:
>>>> This patch set contains two parts of changes to the virtio-balloon.
>>>>
>>>> One is the change for speeding up the inflating & deflating process,
>>>> the main idea of this optimization is to use bitmap to send the page
>>>> information to host instead of the PFNs, to reduce the overhead of
>>>> virtio data transmission, address translation and madvise(). This can
>>>> help to improve the performance by about 85%.
>>>
>>> Do you have some statistics/some rough feeling how many consecutive
>>> bits are
>>> usually set in the bitmaps? Is it really just purely random or is
>>> there some
>>> granularity that is usually consecutive?
>>>
>>
>> I did something similar. Filled the balloon with 15GB for a 16GB idle
>> guest, by
>> using bitmap, the madvise count was reduced to 605. when using the
>> PFNs, the madvise count
>> was 3932160. It means there are quite a lot consecutive bits in the
>> bitmap.
>> I didn't test for a guest with heavy memory workload.
> 
> Would it then even make sense to go one step further and report {pfn,
> length} combinations?
> 
> So simply send over an array of {pfn, length}?

Li's current patches do that.  Well, maybe not pfn/length, but they do
take a pfn and page-order, which fits perfectly with the kernel's
concept of high-order pages.

> And it makes sense if you think about:
> 
> a) hugetlb backing: The host may only be able to free huge pages (we
> might want to communicate that to the guest later, that's another
> story). Still we would have to send bitmaps full of 4k frames (512 bits
> for 2mb frames). Of course, we could add a way to communicate that we
> are using a different bitmap-granularity.

Yeah, please read the patches.  If they're not clear, then the
descriptions need work, but this is done already.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-07 15:45         ` Dave Hansen
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2016-12-07 15:45 UTC (permalink / raw)
  To: David Hildenbrand, Li, Liang Z, kvm
  Cc: Andrea Arcangeli, virtio-dev, mhocko, mst, qemu-devel,
	linux-kernel, linux-mm, dgilbert, pbonzini, akpm, virtualization,
	kirill.shutemov

On 12/07/2016 07:42 AM, David Hildenbrand wrote:
> Am 07.12.2016 um 14:35 schrieb Li, Liang Z:
>>> Am 30.11.2016 um 09:43 schrieb Liang Li:
>>>> This patch set contains two parts of changes to the virtio-balloon.
>>>>
>>>> One is the change for speeding up the inflating & deflating process,
>>>> the main idea of this optimization is to use bitmap to send the page
>>>> information to host instead of the PFNs, to reduce the overhead of
>>>> virtio data transmission, address translation and madvise(). This can
>>>> help to improve the performance by about 85%.
>>>
>>> Do you have some statistics/some rough feeling how many consecutive
>>> bits are
>>> usually set in the bitmaps? Is it really just purely random or is
>>> there some
>>> granularity that is usually consecutive?
>>>
>>
>> I did something similar. Filled the balloon with 15GB for a 16GB idle
>> guest, by
>> using bitmap, the madvise count was reduced to 605. when using the
>> PFNs, the madvise count
>> was 3932160. It means there are quite a lot consecutive bits in the
>> bitmap.
>> I didn't test for a guest with heavy memory workload.
> 
> Would it then even make sense to go one step further and report {pfn,
> length} combinations?
> 
> So simply send over an array of {pfn, length}?

Li's current patches do that.  Well, maybe not pfn/length, but they do
take a pfn and page-order, which fits perfectly with the kernel's
concept of high-order pages.

> And it makes sense if you think about:
> 
> a) hugetlb backing: The host may only be able to free huge pages (we
> might want to communicate that to the guest later, that's another
> story). Still we would have to send bitmaps full of 4k frames (512 bits
> for 2mb frames). Of course, we could add a way to communicate that we
> are using a different bitmap-granularity.

Yeah, please read the patches.  If they're not clear, then the
descriptions need work, but this is done already.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 15:45         ` Dave Hansen
  (?)
@ 2016-12-07 16:21           ` David Hildenbrand
  -1 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2016-12-07 16:21 UTC (permalink / raw)
  To: Dave Hansen, Li, Liang Z, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	kirill.shutemov, pbonzini, akpm, virtualization, dgilbert,
	Andrea Arcangeli

>>>
>>> I did something similar. Filled the balloon with 15GB for a 16GB idle
>>> guest, by
>>> using bitmap, the madvise count was reduced to 605. when using the
>>> PFNs, the madvise count
>>> was 3932160. It means there are quite a lot consecutive bits in the
>>> bitmap.
>>> I didn't test for a guest with heavy memory workload.
>>
>> Would it then even make sense to go one step further and report {pfn,
>> length} combinations?
>>
>> So simply send over an array of {pfn, length}?
>
> Li's current patches do that.  Well, maybe not pfn/length, but they do
> take a pfn and page-order, which fits perfectly with the kernel's
> concept of high-order pages.

So we can send length in powers of two. Still, I don't see any benefit
over a simple pfn/len schema. But I'll have a more detailed look at the
implementation first, maybe that will enlighten me :)

>
>> And it makes sense if you think about:
>>
>> a) hugetlb backing: The host may only be able to free huge pages (we
>> might want to communicate that to the guest later, that's another
>> story). Still we would have to send bitmaps full of 4k frames (512 bits
>> for 2mb frames). Of course, we could add a way to communicate that we
>> are using a different bitmap-granularity.
>
> Yeah, please read the patches.  If they're not clear, then the
> descriptions need work, but this is done already.
>

I missed the page_shift, thanks for the hint.

-- 

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-07 16:21           ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2016-12-07 16:21 UTC (permalink / raw)
  To: Dave Hansen, Li, Liang Z, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	kirill.shutemov, pbonzini, akpm, virtualization, dgilbert,
	Andrea Arcangeli

>>>
>>> I did something similar. Filled the balloon with 15GB for a 16GB idle
>>> guest, by
>>> using bitmap, the madvise count was reduced to 605. when using the
>>> PFNs, the madvise count
>>> was 3932160. It means there are quite a lot consecutive bits in the
>>> bitmap.
>>> I didn't test for a guest with heavy memory workload.
>>
>> Would it then even make sense to go one step further and report {pfn,
>> length} combinations?
>>
>> So simply send over an array of {pfn, length}?
>
> Li's current patches do that.  Well, maybe not pfn/length, but they do
> take a pfn and page-order, which fits perfectly with the kernel's
> concept of high-order pages.

So we can send length in powers of two. Still, I don't see any benefit
over a simple pfn/len schema. But I'll have a more detailed look at the
implementation first, maybe that will enlighten me :)

>
>> And it makes sense if you think about:
>>
>> a) hugetlb backing: The host may only be able to free huge pages (we
>> might want to communicate that to the guest later, that's another
>> story). Still we would have to send bitmaps full of 4k frames (512 bits
>> for 2mb frames). Of course, we could add a way to communicate that we
>> are using a different bitmap-granularity.
>
> Yeah, please read the patches.  If they're not clear, then the
> descriptions need work, but this is done already.
>

I missed the page_shift, thanks for the hint.

-- 

David

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-07 16:21           ` David Hildenbrand
  0 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2016-12-07 16:21 UTC (permalink / raw)
  To: Dave Hansen, Li, Liang Z, kvm
  Cc: Andrea Arcangeli, virtio-dev, mhocko, mst, qemu-devel,
	linux-kernel, linux-mm, dgilbert, pbonzini, akpm, virtualization,
	kirill.shutemov

>>>
>>> I did something similar. Filled the balloon with 15GB for a 16GB idle
>>> guest, by
>>> using bitmap, the madvise count was reduced to 605. when using the
>>> PFNs, the madvise count
>>> was 3932160. It means there are quite a lot consecutive bits in the
>>> bitmap.
>>> I didn't test for a guest with heavy memory workload.
>>
>> Would it then even make sense to go one step further and report {pfn,
>> length} combinations?
>>
>> So simply send over an array of {pfn, length}?
>
> Li's current patches do that.  Well, maybe not pfn/length, but they do
> take a pfn and page-order, which fits perfectly with the kernel's
> concept of high-order pages.

So we can send length in powers of two. Still, I don't see any benefit
over a simple pfn/len schema. But I'll have a more detailed look at the
implementation first, maybe that will enlighten me :)

>
>> And it makes sense if you think about:
>>
>> a) hugetlb backing: The host may only be able to free huge pages (we
>> might want to communicate that to the guest later, that's another
>> story). Still we would have to send bitmaps full of 4k frames (512 bits
>> for 2mb frames). Of course, we could add a way to communicate that we
>> are using a different bitmap-granularity.
>
> Yeah, please read the patches.  If they're not clear, then the
> descriptions need work, but this is done already.
>

I missed the page_shift, thanks for the hint.

-- 

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 16:21           ` David Hildenbrand
  (?)
  (?)
@ 2016-12-07 16:57           ` Dave Hansen
  -1 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2016-12-07 16:57 UTC (permalink / raw)
  To: David Hildenbrand, Li, Liang Z, kvm
  Cc: Andrea Arcangeli, mhocko, mst, linux-kernel, qemu-devel,
	linux-mm, dgilbert, pbonzini, akpm, virtualization,
	kirill.shutemov

Removing silly virtio-dev@ list because it's bouncing mail...

On 12/07/2016 08:21 AM, David Hildenbrand wrote:
>> Li's current patches do that.  Well, maybe not pfn/length, but they do
>> take a pfn and page-order, which fits perfectly with the kernel's
>> concept of high-order pages.
> 
> So we can send length in powers of two. Still, I don't see any benefit
> over a simple pfn/len schema. But I'll have a more detailed look at the
> implementation first, maybe that will enlighten me :)

It is more space-efficient.  We're fitting the order into 6 bits, which
would allows the full 2^64 address space to be represented in one entry,
and leaves room for the bitmap size to be encoded as well, if we decide
we need a bitmap in the future.

If that was purely a length, we'd be limited to 64*4k pages per entry,
which isn't even a full large page.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 16:21           ` David Hildenbrand
@ 2016-12-07 16:57             ` Dave Hansen
  -1 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2016-12-07 16:57 UTC (permalink / raw)
  To: David Hildenbrand, Li, Liang Z, kvm
  Cc: mhocko, mst, qemu-devel, linux-kernel, linux-mm, kirill.shutemov,
	pbonzini, akpm, virtualization, dgilbert, Andrea Arcangeli

Removing silly virtio-dev@ list because it's bouncing mail...

On 12/07/2016 08:21 AM, David Hildenbrand wrote:
>> Li's current patches do that.  Well, maybe not pfn/length, but they do
>> take a pfn and page-order, which fits perfectly with the kernel's
>> concept of high-order pages.
> 
> So we can send length in powers of two. Still, I don't see any benefit
> over a simple pfn/len schema. But I'll have a more detailed look at the
> implementation first, maybe that will enlighten me :)

It is more space-efficient.  We're fitting the order into 6 bits, which
would allows the full 2^64 address space to be represented in one entry,
and leaves room for the bitmap size to be encoded as well, if we decide
we need a bitmap in the future.

If that was purely a length, we'd be limited to 64*4k pages per entry,
which isn't even a full large page.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-12-07 16:57             ` Dave Hansen
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2016-12-07 16:57 UTC (permalink / raw)
  To: David Hildenbrand, Li, Liang Z, kvm
  Cc: mhocko, mst, qemu-devel, linux-kernel, linux-mm, kirill.shutemov,
	pbonzini, akpm, virtualization, dgilbert, Andrea Arcangeli

Removing silly virtio-dev@ list because it's bouncing mail...

On 12/07/2016 08:21 AM, David Hildenbrand wrote:
>> Li's current patches do that.  Well, maybe not pfn/length, but they do
>> take a pfn and page-order, which fits perfectly with the kernel's
>> concept of high-order pages.
> 
> So we can send length in powers of two. Still, I don't see any benefit
> over a simple pfn/len schema. But I'll have a more detailed look at the
> implementation first, maybe that will enlighten me :)

It is more space-efficient.  We're fitting the order into 6 bits, which
would allows the full 2^64 address space to be represented in one entry,
and leaves room for the bitmap size to be encoded as well, if we decide
we need a bitmap in the future.

If that was purely a length, we'd be limited to 64*4k pages per entry,
which isn't even a full large page.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-07 13:35     ` Li, Liang Z
                       ` (4 preceding siblings ...)
  (?)
@ 2016-12-07 15:42     ` David Hildenbrand
  -1 siblings, 0 replies; 29+ messages in thread
From: David Hildenbrand @ 2016-12-07 15:42 UTC (permalink / raw)
  To: Li, Liang Z, kvm
  Cc: Andrea Arcangeli, virtio-dev, mhocko, mst, qemu-devel,
	linux-kernel, linux-mm, Hansen, Dave, dgilbert, pbonzini, akpm,
	virtualization, kirill.shutemov

Am 07.12.2016 um 14:35 schrieb Li, Liang Z:
>> Am 30.11.2016 um 09:43 schrieb Liang Li:
>>> This patch set contains two parts of changes to the virtio-balloon.
>>>
>>> One is the change for speeding up the inflating & deflating process,
>>> the main idea of this optimization is to use bitmap to send the page
>>> information to host instead of the PFNs, to reduce the overhead of
>>> virtio data transmission, address translation and madvise(). This can
>>> help to improve the performance by about 85%.
>>
>> Do you have some statistics/some rough feeling how many consecutive bits are
>> usually set in the bitmaps? Is it really just purely random or is there some
>> granularity that is usually consecutive?
>>
>
> I did something similar. Filled the balloon with 15GB for a 16GB idle guest, by
> using bitmap, the madvise count was reduced to 605. when using the PFNs, the madvise count
> was 3932160. It means there are quite a lot consecutive bits in the bitmap.
> I didn't test for a guest with heavy memory workload.

Would it then even make sense to go one step further and report {pfn, 
length} combinations?

So simply send over an array of {pfn, length}?

This idea came up when talking to Andrea Arcangeli (put him on cc).

And it makes sense if you think about:

a) hugetlb backing: The host may only be able to free huge pages (we 
might want to communicate that to the guest later, that's another 
story). Still we would have to send bitmaps full of 4k frames (512 bits 
for 2mb frames). Of course, we could add a way to communicate that we 
are using a different bitmap-granularity.

b) if we really inflate huge memory regions (and it sounds like that 
according to your measurements), we can minimize the communication to 
the hypervisor and therefore the madvice calls.

c) we don't want to optimize for inflating guests with almost full 
memory (and therefore little consecutive memory areas) - my opinion :)

Thanks for the explanation!

-- 

David

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration
  2016-12-06  8:40   ` David Hildenbrand
                     ` (2 preceding siblings ...)
  (?)
@ 2016-12-07 13:35   ` Li, Liang Z
  -1 siblings, 0 replies; 29+ messages in thread
From: Li, Liang Z @ 2016-12-07 13:35 UTC (permalink / raw)
  To: David Hildenbrand, kvm
  Cc: virtio-dev, mhocko, mst, qemu-devel, linux-kernel, linux-mm,
	Hansen, Dave, dgilbert, pbonzini, akpm, virtualization,
	kirill.shutemov

> Am 30.11.2016 um 09:43 schrieb Liang Li:
> > This patch set contains two parts of changes to the virtio-balloon.
> >
> > One is the change for speeding up the inflating & deflating process,
> > the main idea of this optimization is to use bitmap to send the page
> > information to host instead of the PFNs, to reduce the overhead of
> > virtio data transmission, address translation and madvise(). This can
> > help to improve the performance by about 85%.
> 
> Do you have some statistics/some rough feeling how many consecutive bits are
> usually set in the bitmaps? Is it really just purely random or is there some
> granularity that is usually consecutive?
> 

I did something similar. Filled the balloon with 15GB for a 16GB idle guest, by
using bitmap, the madvise count was reduced to 605. when using the PFNs, the madvise count
was 3932160. It means there are quite a lot consecutive bits in the bitmap.
I didn't test for a guest with heavy memory workload. 

> IOW in real examples, do we have really large consecutive areas or are all
> pages just completely distributed over our memory?
> 

The buddy system of Linux kernel memory management shows there should be quite a lot of
 consecutive pages as long as there are a portion of free memory in the guest.
If all pages just completely distributed over our memory, it means the memory 
fragmentation is very serious, the kernel has the mechanism to avoid this happened.
In the other hand, the inflating should not happen at this time because the guest is almost
'out of memory'.

Liang

> Thanks!
> 
> --
> 
> David

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-12-09  3:09 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-30  8:43 [PATCH kernel v5 0/5] Extend virtio-balloon for fast (de)inflating & fast live migration Liang Li
2016-11-30  8:43 Liang Li
2016-11-30  8:43 ` Liang Li
2016-12-06  8:40 ` David Hildenbrand
2016-12-06  8:40   ` David Hildenbrand
2016-12-06  8:40   ` David Hildenbrand
2016-12-07 13:35   ` Li, Liang Z
2016-12-07 13:35     ` Li, Liang Z
2016-12-07 13:35     ` Li, Liang Z
2016-12-07 15:34     ` Dave Hansen
2016-12-07 15:34       ` Dave Hansen
2016-12-09  3:09       ` Li, Liang Z
2016-12-09  3:09         ` Li, Liang Z
2016-12-09  3:09         ` Li, Liang Z
2016-12-09  3:09       ` Li, Liang Z
2016-12-07 15:34     ` Dave Hansen
2016-12-07 15:42     ` David Hildenbrand
2016-12-07 15:42       ` David Hildenbrand
2016-12-07 15:45       ` Dave Hansen
2016-12-07 15:45         ` Dave Hansen
2016-12-07 15:45         ` Dave Hansen
2016-12-07 16:21         ` David Hildenbrand
2016-12-07 16:21           ` David Hildenbrand
2016-12-07 16:21           ` David Hildenbrand
2016-12-07 16:57           ` Dave Hansen
2016-12-07 16:57           ` Dave Hansen
2016-12-07 16:57             ` Dave Hansen
2016-12-07 15:42     ` David Hildenbrand
2016-12-07 13:35   ` Li, Liang Z

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.