* [PATCH v2 repost 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
@ 2016-07-27  1:23 ` Liang Li
  0 siblings, 0 replies; 171+ messages in thread
From: Liang Li @ 2016-07-27  1:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Liang Li

This patch set is for the kernel and contains two sets of changes to
virtio-balloon.

The first change speeds up the inflating & deflating process. The main
idea of this optimization is to use a bitmap instead of an array of
PFNs to send the page information to the host, which reduces the
overhead of virtio data transmission, address translation and
madvise(). This improves performance by about 85%.
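
As a back-of-the-envelope illustration (assumed numbers, not
measurements from this series): 1GB of 4KB pages is 262,144 pages.
Sent as 32-bit PFNs, that is 1MB of virtio payload; sent as a dense
bitmap covering the same range, it is 262,144 bits, i.e. 32KB, a 32x
reduction.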

The second change speeds up live migration. By skipping the guest's
free pages in the first round of data copy, we avoid needless data
processing, which saves a significant amount of CPU cycles and network
bandwidth. The guest's free page information is put in a bitmap and
sent to the host through a virtqueue of virtio-balloon. For an idle
8GB guest, this shortens the total live migration time from about 2s
to about 500ms on a 10Gbps network.
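
On the host side the bitmap would be consumed roughly as follows (a
minimal sketch with hypothetical helper names; this is not the actual
QEMU implementation):

	unsigned long pfn;

	/* First migration pass: don't copy pages the guest reports free. */
	for (pfn = start_pfn; pfn < start_pfn + nr_pfns; pfn++) {
		if (test_bit(pfn - start_pfn, free_page_bitmap))
			continue;	/* free in the guest, skip the copy */
		send_page_to_destination(pfn);	/* hypothetical helper */
	}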


Changes from v1 to v2:
    * Abandon the patch for dropping page cache.
    * Move some structures to the uapi header file.
    * Use a new way to determine the page bitmap size.
    * Use a unified way to send the free page information with the bitmap.
    * Address the issues raised in MST's comments.

Liang Li (7):
  virtio-balloon: rework deflate to add page to a list
  virtio-balloon: define new feature bit and page bitmap head
  mm: add a function to get the max pfn
  virtio-balloon: speed up inflate/deflate process
  virtio-balloon: define feature bit and head for misc virt queue
  mm: add the related functions to get free page info
  virtio-balloon: tell host vm's free page info

 drivers/virtio/virtio_balloon.c     | 306 +++++++++++++++++++++++++++++++-----
 include/uapi/linux/virtio_balloon.h |  41 +++++
 mm/page_alloc.c                     |  52 ++++++
 3 files changed, 359 insertions(+), 40 deletions(-)

-- 
1.9.1

* [PATCH v2 repost 1/7] virtio-balloon: rework deflate to add page to a list
  2016-07-27  1:23 ` Liang Li
@ 2016-07-27  1:23   ` Liang Li
  -1 siblings, 0 replies; 171+ messages in thread
From: Liang Li @ 2016-07-27  1:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Liang Li, Michael S. Tsirkin, Paolo Bonzini,
	Cornelia Huck, Amit Shah

Keep the pages to be deflated on a list instead of looking them up by
PFN when freeing; this will allow faster notifications using a bitmap
down the road. balloon_pfn_to_page() can be removed because it no
longer has any users.
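
The freeing loop uses the safe-iteration idiom because entries are
deleted while the list is walked; a minimal standalone sketch of the
pattern (the diff below shows the real code):

	LIST_HEAD(pages);
	struct page *page, *next;

	/* 'next' is fetched before the body runs, so list_del() is safe */
	list_for_each_entry_safe(page, next, &pages, lru) {
		list_del(&page->lru);
		put_page(page);
	}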

Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 476c0e3..8d649a2 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -98,12 +98,6 @@ static u32 page_to_balloon_pfn(struct page *page)
 	return pfn * VIRTIO_BALLOON_PAGES_PER_PAGE;
 }
 
-static struct page *balloon_pfn_to_page(u32 pfn)
-{
-	BUG_ON(pfn % VIRTIO_BALLOON_PAGES_PER_PAGE);
-	return pfn_to_page(pfn / VIRTIO_BALLOON_PAGES_PER_PAGE);
-}
-
 static void balloon_ack(struct virtqueue *vq)
 {
 	struct virtio_balloon *vb = vq->vdev->priv;
@@ -176,18 +170,16 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 	return num_allocated_pages;
 }
 
-static void release_pages_balloon(struct virtio_balloon *vb)
+static void release_pages_balloon(struct virtio_balloon *vb,
+				 struct list_head *pages)
 {
-	unsigned int i;
-	struct page *page;
+	struct page *page, *next;
 
-	/* Find pfns pointing at start of each page, get pages and free them. */
-	for (i = 0; i < vb->num_pfns; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		page = balloon_pfn_to_page(virtio32_to_cpu(vb->vdev,
-							   vb->pfns[i]));
+	list_for_each_entry_safe(page, next, pages, lru) {
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 			adjust_managed_page_count(page, 1);
+		list_del(&page->lru);
 		put_page(page); /* balloon reference */
 	}
 }
@@ -197,6 +189,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	unsigned num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+	LIST_HEAD(pages);
 
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
@@ -208,6 +201,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 		if (!page)
 			break;
 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		list_add(&page->lru, &pages);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
 
@@ -219,7 +213,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 */
 	if (vb->num_pfns != 0)
 		tell_host(vb, vb->deflate_vq);
-	release_pages_balloon(vb);
+	release_pages_balloon(vb, &pages);
 	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
 }
-- 
1.9.1

* [PATCH v2 repost 2/7] virtio-balloon: define new feature bit and page bitmap head
  2016-07-27  1:23 ` Liang Li
@ 2016-07-27  1:23   ` Liang Li
  -1 siblings, 0 replies; 171+ messages in thread
From: Liang Li @ 2016-07-27  1:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Liang Li, Michael S. Tsirkin, Paolo Bonzini,
	Cornelia Huck, Amit Shah

Add a new feature which supports sending the page information with a
bitmap. The current implementation uses a PFN array, which is not very
efficient. Using a bitmap can significantly improve the performance of
inflating/deflating.

The page bitmap header is used to tell the host some information about
the page bitmap, e.g. the page shift, the bitmap length and the start
pfn.
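
A host-side consumer would read the header first and then the bitmap
that immediately follows it in the buffer; a simplified kernel-style
sketch (assuming a little-endian virtio-1 device, with hypothetical
helper names; this is not the QEMU code):

	struct balloon_bmap_hdr *hdr = buf;
	unsigned long *bitmap = (unsigned long *)(hdr + 1);
	uint64_t start_pfn = le64_to_cpu(hdr->start_pfn);
	uint64_t bmap_len = le64_to_cpu(hdr->bmap_len);	/* in bytes */
	uint64_t i;

	for (i = 0; i < bmap_len * BITS_PER_BYTE; i++)
		if (test_bit(i, bitmap))
			process_pfn(start_pfn + i);	/* hypothetical */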

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
---
 include/uapi/linux/virtio_balloon.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 343d7dd..d3b182a 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -34,6 +34,7 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_PAGE_BITMAP	3 /* Send page info with bitmap */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -82,4 +83,22 @@ struct virtio_balloon_stat {
 	__virtio64 val;
 } __attribute__((packed));
 
+/* Page bitmap header structure */
+struct balloon_bmap_hdr {
+	/* Used to distinguish different requests */
+	__virtio16 cmd;
+	/* Shift width of page in the bitmap */
+	__virtio16 page_shift;
+	/* Flag used to identify different states */
+	__virtio16 flag;
+	/* Reserved */
+	__virtio16 reserved;
+	/* ID of the request */
+	__virtio64 req_id;
+	/* The PFN that bit 0 of the bitmap corresponds to */
+	__virtio64 start_pfn;
+	/* The length of the bitmap, in bytes */
+	__virtio64 bmap_len;
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.9.1

* [PATCH v2 repost 3/7] mm: add a function to get the max pfn
  2016-07-27  1:23 ` Liang Li
@ 2016-07-27  1:23   ` Liang Li
  -1 siblings, 0 replies; 171+ messages in thread
From: Liang Li @ 2016-07-27  1:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Liang Li, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Michael S. Tsirkin, Paolo Bonzini, Cornelia Huck, Amit Shah

Expose the function to get the max pfn, so it can be used in the
virtio-balloon device driver.
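
For context, patch 4 of this series uses it in virtballoon_probe() to
cap the size of the page bitmap, roughly:

	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());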

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
---
 mm/page_alloc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8b3e134..7da61ad 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4517,6 +4517,12 @@ void show_free_areas(unsigned int filter)
 	show_swap_cache_info();
 }
 
+unsigned long get_max_pfn(void)
+{
+	return max_pfn;
+}
+EXPORT_SYMBOL(get_max_pfn);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
1.9.1

* [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-27  1:23 ` Liang Li
@ 2016-07-27  1:23   ` Liang Li
  -1 siblings, 0 replies; 171+ messages in thread
From: Liang Li @ 2016-07-27  1:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Liang Li, Michael S. Tsirkin, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

The current implementation of virtio-balloon is not very efficient.
The time spent on the different stages of inflating the balloon to 7GB
of an 8GB idle guest breaks down as follows:

a. allocating pages (6.5%)
b. sending PFNs to host (68.3%)
c. address translation (6.1%)
d. madvise (19%)

It takes about 4126ms for the inflating process to complete.
Debugging shows that the bottlenecks are stages b and d.

Using a bitmap to send the page info instead of the PFNs reduces the
overhead of stage b quite a lot. Furthermore, the address translation
and the madvise() call can then operate on bulk ranges of RAM pages
instead of page by page, so the overhead of stages c and d is also
reduced considerably.

This patch is the kernel side implementation which is intended to
speed up the inflating & deflating process by adding a new feature
to the virtio-balloon device. With this new feature, inflating the
balloon to 7GB of an 8GB idle guest takes only 590ms; the
performance improvement is about 85%.

TODO: optimize stage a by allocating/freeing a chunk of pages
instead of a single page at a time.
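
To put VIRTIO_BALLOON_PFNS_LIMIT (defined below) in concrete terms,
assuming 4KB pages on a 64-bit guest: 32GB >> PAGE_SHIFT gives
8,388,608 PFNs, so the bitmap is 8,388,608 bits, i.e. 1MB, plus two
unsigned longs of slack from the bmap_len computation, roughly a 1MB
allocation per device. Guests with more RAM than that are handled by
scanning the page list in multiple passes.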

Signed-off-by: Liang Li <liang.z.li@intel.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 184 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 162 insertions(+), 22 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 8d649a2..2d18ff6 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -41,10 +41,28 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+/*
+ * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of the page bitmap
+ * and prevent it from becoming very large. There are two reasons for this:
+ * 1) to save memory.
+ * 2) allocating a large bitmap may fail.
+ *
+ * The actual pfn limit is determined by:
+ * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
+ *
+ * If the system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we scan
+ * the page list and send the PFNs in several passes. To reduce the
+ * overhead of scanning the page list, VIRTIO_BALLOON_PFNS_LIMIT should
+ * be set to a value which covers most cases.
+ */
+#define VIRTIO_BALLOON_PFNS_LIMIT ((32 * (1ULL << 30)) >> PAGE_SHIFT) /* 32GB */
+
 static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
+extern unsigned long get_max_pfn(void);
+
 struct virtio_balloon {
 	struct virtio_device *vdev;
 	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
@@ -62,6 +80,15 @@ struct virtio_balloon {
 
 	/* Number of balloon pages we've told the Host we're not using. */
 	unsigned int num_pages;
+	/* Pointer of the bitmap header. */
+	void *bmap_hdr;
+	/* Bitmap and length used to tell the host the pages */
+	unsigned long *page_bitmap;
+	unsigned long bmap_len;
+	/* Pfn limit */
+	unsigned long pfn_limit;
+	/* Used to record the processed pfn range */
+	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -105,12 +132,45 @@ static void balloon_ack(struct virtqueue *vq)
 	wake_up(&vb->acked);
 }
 
+static inline void init_pfn_range(struct virtio_balloon *vb)
+{
+	vb->min_pfn = ULONG_MAX;
+	vb->max_pfn = 0;
+}
+
+static inline void update_pfn_range(struct virtio_balloon *vb,
+				 struct page *page)
+{
+	unsigned long balloon_pfn = page_to_balloon_pfn(page);
+
+	if (balloon_pfn < vb->min_pfn)
+		vb->min_pfn = balloon_pfn;
+	if (balloon_pfn > vb->max_pfn)
+		vb->max_pfn = balloon_pfn;
+}
+
 static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
 {
 	struct scatterlist sg;
 	unsigned int len;
 
-	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP)) {
+		struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
+		unsigned long bmap_len;
+
+		/* cmd and req_id are not used here, set them to 0 */
+		hdr->cmd = cpu_to_virtio16(vb->vdev, 0);
+		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
+		hdr->reserved = cpu_to_virtio16(vb->vdev, 0);
+		hdr->req_id = cpu_to_virtio64(vb->vdev, 0);
+		hdr->start_pfn = cpu_to_virtio64(vb->vdev, vb->start_pfn);
+		bmap_len = min(vb->bmap_len,
+			(vb->end_pfn - vb->start_pfn) / BITS_PER_BYTE);
+		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
+		sg_init_one(&sg, hdr,
+			 sizeof(struct balloon_bmap_hdr) + bmap_len);
+	} else
+		sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
 
 	/* We should always be able to add one buffer to an empty queue. */
 	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
@@ -118,7 +178,6 @@ static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
 
 	/* When host has read buffer, this completes via balloon_ack */
 	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
-
 }
 
 static void set_page_pfns(struct virtio_balloon *vb,
@@ -133,13 +192,53 @@ static void set_page_pfns(struct virtio_balloon *vb,
 					  page_to_balloon_pfn(page) + i);
 }
 
-static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
+static void set_page_bitmap(struct virtio_balloon *vb,
+			 struct list_head *pages, struct virtqueue *vq)
+{
+	unsigned long pfn;
+	struct page *page;
+	bool found;
+
+	vb->min_pfn = rounddown(vb->min_pfn, BITS_PER_LONG);
+	vb->max_pfn = roundup(vb->max_pfn, BITS_PER_LONG);
+	for (pfn = vb->min_pfn; pfn < vb->max_pfn;
+			pfn += vb->pfn_limit) {
+		vb->start_pfn = pfn + vb->pfn_limit;
+		vb->end_pfn = pfn;
+		memset(vb->page_bitmap, 0, vb->bmap_len);
+		found = false;
+		list_for_each_entry(page, pages, lru) {
+			unsigned long balloon_pfn = page_to_balloon_pfn(page);
+
+			if (balloon_pfn < pfn ||
+				 balloon_pfn >= pfn + vb->pfn_limit)
+				continue;
+			set_bit(balloon_pfn - pfn, vb->page_bitmap);
+			if (balloon_pfn > vb->end_pfn)
+				vb->end_pfn = balloon_pfn;
+			if (balloon_pfn < vb->start_pfn)
+				vb->start_pfn = balloon_pfn;
+			found = true;
+		}
+		if (found) {
+			vb->start_pfn = rounddown(vb->start_pfn, BITS_PER_LONG);
+			vb->end_pfn = roundup(vb->end_pfn, BITS_PER_LONG);
+			tell_host(vb, vq);
+		}
+	}
+}
+
+static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num,
+				 bool use_bmap)
 {
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
-	unsigned num_allocated_pages;
+	unsigned int num_allocated_pages;
 
-	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	if (use_bmap)
+		init_pfn_range(vb);
+	else
+		/* We can only do one array worth at a time. */
+		num = min(num, ARRAY_SIZE(vb->pfns));
 
 	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -154,7 +253,10 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 			msleep(200);
 			break;
 		}
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		if (use_bmap)
+			update_pfn_range(vb, page);
+		else
+			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
 		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
@@ -163,8 +265,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 
 	num_allocated_pages = vb->num_pfns;
 	/* Did we get any? */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->inflate_vq);
+	if (vb->num_pfns != 0) {
+		if (use_bmap)
+			set_page_bitmap(vb, &vb_dev_info->pages,
+					vb->inflate_vq);
+		else
+			tell_host(vb, vb->inflate_vq);
+	}
 	mutex_unlock(&vb->balloon_lock);
 
 	return num_allocated_pages;
@@ -184,15 +291,19 @@ static void release_pages_balloon(struct virtio_balloon *vb,
 	}
 }
 
-static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned int leak_balloon(struct virtio_balloon *vb, size_t num,
+				bool use_bmap)
 {
-	unsigned num_freed_pages;
+	unsigned int num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
 	LIST_HEAD(pages);
 
-	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	if (use_bmap)
+		init_pfn_range(vb);
+	else
+		/* We can only do one array worth at a time. */
+		num = min(num, ARRAY_SIZE(vb->pfns));
 
 	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -200,7 +311,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 		page = balloon_page_dequeue(vb_dev_info);
 		if (!page)
 			break;
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		if (use_bmap)
+			update_pfn_range(vb, page);
+		else
+			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
 		list_add(&page->lru, &pages);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
@@ -211,9 +325,14 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 	 * is true, we *have* to do it in this order
 	 */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->deflate_vq);
-	release_pages_balloon(vb, &pages);
+	if (vb->num_pfns != 0) {
+		if (use_bmap)
+			set_page_bitmap(vb, &pages, vb->deflate_vq);
+		else
+			tell_host(vb, vb->deflate_vq);
+
+		release_pages_balloon(vb, &pages);
+	}
 	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
 }
@@ -347,13 +466,15 @@ static int virtballoon_oom_notify(struct notifier_block *self,
 	struct virtio_balloon *vb;
 	unsigned long *freed;
 	unsigned num_freed_pages;
+	bool use_bmap;
 
 	vb = container_of(self, struct virtio_balloon, nb);
 	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 		return NOTIFY_OK;
 
 	freed = parm;
-	num_freed_pages = leak_balloon(vb, oom_pages);
+	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+	num_freed_pages = leak_balloon(vb, oom_pages, use_bmap);
 	update_balloon_size(vb);
 	*freed += num_freed_pages;
 
@@ -373,15 +494,17 @@ static void update_balloon_size_func(struct work_struct *work)
 {
 	struct virtio_balloon *vb;
 	s64 diff;
+	bool use_bmap;
 
 	vb = container_of(work, struct virtio_balloon,
 			  update_balloon_size_work);
+	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
 	diff = towards_target(vb);
 
 	if (diff > 0)
-		diff -= fill_balloon(vb, diff);
+		diff -= fill_balloon(vb, diff, use_bmap);
 	else if (diff < 0)
-		diff += leak_balloon(vb, -diff);
+		diff += leak_balloon(vb, -diff, use_bmap);
 	update_balloon_size(vb);
 
 	if (diff)
@@ -489,7 +612,7 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 static int virtballoon_probe(struct virtio_device *vdev)
 {
 	struct virtio_balloon *vb;
-	int err;
+	int err, hdr_len;
 
 	if (!vdev->config->get) {
 		dev_err(&vdev->dev, "%s failure: config access disabled\n",
@@ -508,6 +631,18 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	spin_lock_init(&vb->stop_update_lock);
 	vb->stop_update = false;
 	vb->num_pages = 0;
+	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
+	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
+	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
+		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
+	hdr_len = sizeof(struct balloon_bmap_hdr);
+	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
+
+	/* Clear the feature bit if memory allocation fails */
+	if (!vb->bmap_hdr)
+		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+	else
+		vb->page_bitmap = vb->bmap_hdr + hdr_len;
 	mutex_init(&vb->balloon_lock);
 	init_waitqueue_head(&vb->acked);
 	vb->vdev = vdev;
@@ -541,9 +676,12 @@ out:
 
 static void remove_common(struct virtio_balloon *vb)
 {
+	bool use_bmap;
+
+	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
 	/* There might be pages left in the balloon: free them. */
 	while (vb->num_pages)
-		leak_balloon(vb, vb->num_pages);
+		leak_balloon(vb, vb->num_pages, use_bmap);
 	update_balloon_size(vb);
 
 	/* Now we reset the device so we can clean up the queues. */
@@ -565,6 +703,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 	cancel_work_sync(&vb->update_balloon_stats_work);
 
 	remove_common(vb);
+	kfree(vb->page_bitmap);
 	kfree(vb);
 }
 
@@ -603,6 +742,7 @@ static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
+	VIRTIO_BALLOON_F_PAGE_BITMAP,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
-- 
1.9.1

 	vb->stop_update = false;
 	vb->num_pages = 0;
+	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
+	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
+	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
+		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
+	hdr_len = sizeof(struct balloon_bmap_hdr);
+	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
+
+	/* Clear the feature bit if memory allocation fails */
+	if (!vb->bmap_hdr)
+		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+	else
+		vb->page_bitmap = vb->bmap_hdr + hdr_len;
 	mutex_init(&vb->balloon_lock);
 	init_waitqueue_head(&vb->acked);
 	vb->vdev = vdev;
@@ -541,9 +676,12 @@ out:
 
 static void remove_common(struct virtio_balloon *vb)
 {
+	bool use_bmap;
+
+	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
 	/* There might be pages left in the balloon: free them. */
 	while (vb->num_pages)
-		leak_balloon(vb, vb->num_pages);
+		leak_balloon(vb, vb->num_pages, use_bmap);
 	update_balloon_size(vb);
 
 	/* Now we reset the device so we can clean up the queues. */
@@ -565,6 +703,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 	cancel_work_sync(&vb->update_balloon_stats_work);
 
 	remove_common(vb);
+	kfree(vb->page_bitmap);
 	kfree(vb);
 }
 
@@ -603,6 +742,7 @@ static unsigned int features[] = {
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
+	VIRTIO_BALLOON_F_PAGE_BITMAP,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 171+ messages in thread
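
For reference, the buffer that tell_host() queues in the bitmap case is
a struct balloon_bmap_hdr immediately followed by the bitmap itself
(the probe changes above place the bitmap right behind the header). A
minimal host-side walk of one such buffer could look like the sketch
below; handle_pfn() is a placeholder, and the little-endian bit order
is an assumption, since no virtio16/64 conversions are shown:

#include <stdint.h>
#include <stdio.h>
#include <linux/virtio_balloon.h>	/* struct balloon_bmap_hdr */

static void handle_pfn(uint64_t pfn)	/* placeholder for real work */
{
	printf("ballooned pfn %llu\n", (unsigned long long)pfn);
}

/*
 * Walk one "header + bitmap" buffer as queued by tell_host(); bit i
 * of the bitmap stands for hdr->start_pfn + i.  Assumes a
 * little-endian guest, so bit i lives in byte i / 8, bit i % 8.
 */
static void walk_bitmap(const struct balloon_bmap_hdr *hdr)
{
	const uint8_t *bmap = (const uint8_t *)(hdr + 1);
	uint64_t i, nbits = hdr->bmap_len * 8;	/* bmap_len is in bytes */

	for (i = 0; i < nbits; i++)
		if (bmap[i / 8] & (1u << (i % 8)))
			handle_pfn(hdr->start_pfn + i);
}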

* [PATCH v2 repost 5/7] virtio-balloon: define feature bit and head for misc virt queue
@ 2016-07-27  1:23   ` Liang Li
  -1 siblings, 0 replies; 171+ messages in thread
From: Liang Li @ 2016-07-27  1:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Liang Li, Michael S. Tsirkin, Paolo Bonzini,
	Cornelia Huck, Amit Shah

Define a new feature bit which advertises a new virtual queue. This
new virtual queue is used for information exchange between the
hypervisor and the guest. The hypervisor can use it to request that
the guest perform operations such as dropping the page cache or
synchronizing the file system, and it can also obtain some of the
guest's runtime information through the same queue, e.g. the guest's
free page information, which can be used to optimize live migration.
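
As a sketch of that exchange (the QEMU side is not part of this series,
so the transport helper below is a placeholder): the hypervisor queues
a struct balloon_req_hdr on the misc queue, and the guest answers with
bitmap chunks, marking the last one with BALLOON_FLAG_DONE:

#include <stddef.h>
#include <linux/virtio_balloon.h>	/* structs and enums added below */

extern void enqueue_on_misc_vq(const void *buf, size_t len); /* placeholder */

/*
 * Hypothetical host-side request for the guest's free pages.  Real
 * code would apply cpu_to_virtio16()/cpu_to_virtio64() to the fields.
 */
static void request_free_pages(unsigned long req_id)
{
	struct balloon_req_hdr req = {
		.cmd   = BALLOON_GET_FREE_PAGES,
		.param = req_id,	/* echoed back in each reply chunk */
	};

	enqueue_on_misc_vq(&req, sizeof(req));
	/* then read balloon_bmap_hdr chunks until flag == BALLOON_FLAG_DONE */
}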

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
---
 include/uapi/linux/virtio_balloon.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index d3b182a..be4880f 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_PAGE_BITMAP	3 /* Send page info with bitmap */
+#define VIRTIO_BALLOON_F_MISC_VQ	4 /* Misc info virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -101,4 +102,25 @@ struct balloon_bmap_hdr {
 	__virtio64 bmap_len;
 };
 
+enum balloon_req_id {
+	/* Get free pages information */
+	BALLOON_GET_FREE_PAGES,
+};
+
+enum balloon_flag {
+	/* Have more data for a request */
+	BALLOON_FLAG_CONT,
+	/* No more data for a request */
+	BALLOON_FLAG_DONE,
+};
+
+struct balloon_req_hdr {
+	/* Used to distinguish different requests */
+	__virtio16 cmd;
+	/* Reserved */
+	__virtio16 reserved[3];
+	/* Request parameter */
+	__virtio64 param;
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 171+ messages in thread

* [PATCH v2 repost 6/7] mm: add the related functions to get free page info
@ 2016-07-27  1:23   ` Liang Li
  -1 siblings, 0 replies; 171+ messages in thread
From: Liang Li @ 2016-07-27  1:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Liang Li, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Michael S. Tsirkin, Paolo Bonzini, Cornelia Huck, Amit Shah

Save the free page info into a page bitmap; it will be used by the
virtio-balloon device driver.
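
A minimal sketch of a kernel-side caller follows; the chunked loop
mirrors how patch 7/7 drives this interface, while CHUNK_PFNS and the
caller-supplied bitmap buffer are illustrative assumptions:

#include <linux/string.h>

#define CHUNK_PFNS (1UL << 18)	/* example chunk: 1GB of 4KB pages */

extern int get_free_pages(unsigned long start_pfn, unsigned long end_pfn,
		unsigned long *bitmap, unsigned long len);

/*
 * Scan all of memory in CHUNK_PFNS-sized pieces.  get_free_pages()
 * returns 1 while pfns remain beyond end_pfn and 0 on the last chunk;
 * bit i set in the bitmap means start_pfn + i was free when scanned.
 */
static void scan_free_pages(unsigned long *bitmap)
{
	unsigned long pfn = 0;
	int more;

	do {
		memset(bitmap, 0, CHUNK_PFNS / 8);
		more = get_free_pages(pfn, pfn + CHUNK_PFNS,
				      bitmap, CHUNK_PFNS);
		/* consume the bitmap here, e.g. hand it to the host */
		pfn += CHUNK_PFNS;
	} while (more);
}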

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
---
 mm/page_alloc.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7da61ad..3ad8b10 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4523,6 +4523,52 @@ unsigned long get_max_pfn(void)
 }
 EXPORT_SYMBOL(get_max_pfn);
 
+static void mark_free_pages_bitmap(struct zone *zone, unsigned long start_pfn,
+	unsigned long end_pfn, unsigned long *bitmap, unsigned long len)
+{
+	unsigned long pfn, flags, page_num;
+	unsigned int order, t;
+	struct list_head *curr;
+
+	if (zone_is_empty(zone))
+		return;
+	end_pfn = min(start_pfn + len, end_pfn);
+	spin_lock_irqsave(&zone->lock, flags);
+
+	for_each_migratetype_order(order, t) {
+		list_for_each(curr, &zone->free_area[order].free_list[t]) {
+			pfn = page_to_pfn(list_entry(curr, struct page, lru));
+			if (pfn >= start_pfn && pfn <= end_pfn) {
+				page_num = 1UL << order;
+				if (pfn + page_num > end_pfn)
+					page_num = end_pfn - pfn;
+				bitmap_set(bitmap, pfn - start_pfn, page_num);
+			}
+		}
+	}
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
+int get_free_pages(unsigned long start_pfn, unsigned long end_pfn,
+		unsigned long *bitmap, unsigned long len)
+{
+	struct zone *zone;
+	int ret = 0;
+
+	if (bitmap == NULL || start_pfn > end_pfn || start_pfn >= max_pfn)
+		return 0;
+	/* Return 1 if there are pfns beyond end_pfn left to scan. */
+	if (end_pfn < max_pfn)
+		ret = 1;
+	else
+		ret = 0;
+
+	for_each_populated_zone(zone)
+		mark_free_pages_bitmap(zone, start_pfn, end_pfn, bitmap, len);
+	return ret;
+}
+EXPORT_SYMBOL(get_free_pages);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 171+ messages in thread

* [PATCH v2 repost 7/7] virtio-balloon: tell host vm's free page info
@ 2016-07-27  1:23   ` Liang Li
  -1 siblings, 0 replies; 171+ messages in thread
From: Liang Li @ 2016-07-27  1:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Liang Li, Michael S. Tsirkin, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

Support requests for the VM's free page information and respond with
a page bitmap. QEMU can use this free page bitmap to speed up the
live migration process by skipping the free pages.
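
On the host side (outside this series) the intended consumption is
roughly the sketch below; migration_bitmap_clear() and
start_first_migration_pass() are assumed names for the companion QEMU
work, not code from it, and the little-endian bit order is assumed:

#include <stdint.h>
#include <linux/virtio_balloon.h>	/* uapi header extended in patch 5/7 */

extern void migration_bitmap_clear(uint64_t pfn);	/* placeholder */
extern void start_first_migration_pass(void);		/* placeholder */

/*
 * For each chunk the guest sends, clear the matching bits in the
 * migration dirty bitmap so the first pass never copies those pages;
 * the chunk flagged BALLOON_FLAG_DONE ends the exchange.
 */
static void apply_free_page_chunk(const struct balloon_bmap_hdr *hdr,
				  const uint8_t *bmap)
{
	uint64_t i, nbits = hdr->bmap_len * 8;	/* bmap_len is in bytes */

	for (i = 0; i < nbits; i++)
		if (bmap[i / 8] & (1u << (i % 8)))
			migration_bitmap_clear(hdr->start_pfn + i);
	if (hdr->flag == BALLOON_FLAG_DONE)
		start_first_migration_pass();
}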

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
---
 drivers/virtio/virtio_balloon.c | 104 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 98 insertions(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 2d18ff6..5ca4ad3 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -62,10 +62,13 @@ module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
 
 extern unsigned long get_max_pfn(void);
+extern int get_free_pages(unsigned long start_pfn, unsigned long end_pfn,
+		unsigned long *bitmap, unsigned long len);
+
 
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *misc_vq;
 
 	/* The balloon servicing is delegated to a freezable workqueue. */
 	struct work_struct update_balloon_stats_work;
@@ -89,6 +92,8 @@ struct virtio_balloon {
 	unsigned long pfn_limit;
 	/* Used to record the processed pfn range */
 	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
+	/* Request header */
+	struct balloon_req_hdr req_hdr;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -373,6 +378,49 @@ static void update_balloon_stats(struct virtio_balloon *vb)
 				pages_to_bytes(available));
 }
 
+static void update_free_pages_stats(struct virtio_balloon *vb,
+				unsigned long req_id)
+{
+	struct scatterlist sg_in, sg_out;
+	unsigned long pfn = 0, bmap_len, max_pfn;
+	struct virtqueue *vq = vb->misc_vq;
+	struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
+	int ret = 1;
+
+	max_pfn = get_max_pfn();
+	mutex_lock(&vb->balloon_lock);
+	while (pfn < max_pfn) {
+		memset(vb->page_bitmap, 0, vb->bmap_len);
+		ret = get_free_pages(pfn, pfn + vb->pfn_limit,
+			vb->page_bitmap, vb->bmap_len * BITS_PER_BYTE);
+		hdr->cmd = cpu_to_virtio16(vb->vdev, BALLOON_GET_FREE_PAGES);
+		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
+		hdr->req_id = cpu_to_virtio64(vb->vdev, req_id);
+		hdr->start_pfn = cpu_to_virtio64(vb->vdev, pfn);
+		bmap_len = vb->pfn_limit / BITS_PER_BYTE;
+		if (!ret) {
+			hdr->flag = cpu_to_virtio16(vb->vdev,
+							BALLOON_FLAG_DONE);
+			if (pfn + vb->pfn_limit > max_pfn)
+				bmap_len = (max_pfn - pfn) / BITS_PER_BYTE;
+		} else
+			hdr->flag = cpu_to_virtio16(vb->vdev,
+							BALLOON_FLAG_CONT);
+		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
+		sg_init_one(&sg_out, hdr,
+			 sizeof(struct balloon_bmap_hdr) + bmap_len);
+
+		virtqueue_add_outbuf(vq, &sg_out, 1, vb, GFP_KERNEL);
+		virtqueue_kick(vq);
+		pfn += vb->pfn_limit;
+	}
+
+	sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
+	virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL);
+	virtqueue_kick(vq);
+	mutex_unlock(&vb->balloon_lock);
+}
+
 /*
  * While most virtqueues communicate guest-initiated requests to the hypervisor,
  * the stats queue operates in reverse.  The driver initializes the virtqueue
@@ -511,18 +559,49 @@ static void update_balloon_size_func(struct work_struct *work)
 		queue_work(system_freezable_wq, work);
 }
 
+static void misc_handle_rq(struct virtio_balloon *vb)
+{
+	struct balloon_req_hdr *ptr_hdr;
+	unsigned int len;
+
+	ptr_hdr = virtqueue_get_buf(vb->misc_vq, &len);
+	if (!ptr_hdr || len != sizeof(vb->req_hdr))
+		return;
+
+	switch (ptr_hdr->cmd) {
+	case BALLOON_GET_FREE_PAGES:
+		update_free_pages_stats(vb, ptr_hdr->param);
+		break;
+	default:
+		break;
+	}
+}
+
+static void misc_request(struct virtqueue *vq)
+{
+	struct virtio_balloon *vb = vq->vdev->priv;
+
+	misc_handle_rq(vb);
+}
+
 static int init_vqs(struct virtio_balloon *vb)
 {
-	struct virtqueue *vqs[3];
-	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
-	static const char * const names[] = { "inflate", "deflate", "stats" };
+	struct virtqueue *vqs[4];
+	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
+					 stats_request, misc_request };
+	static const char * const names[] = { "inflate", "deflate", "stats",
+						 "misc" };
 	int err, nvqs;
 
 	/*
 	 * We expect two virtqueues: inflate and deflate, and
 	 * optionally stat.
 	 */
-	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ))
+		nvqs = 4;
+	else
+		nvqs = virtio_has_feature(vb->vdev,
+					  VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
 	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
 	if (err)
 		return err;
@@ -543,6 +622,16 @@ static int init_vqs(struct virtio_balloon *vb)
 			BUG();
 		virtqueue_kick(vb->stats_vq);
 	}
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ)) {
+		struct scatterlist sg_in;
+
+		vb->misc_vq = vqs[3];
+		sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
+		if (virtqueue_add_inbuf(vb->misc_vq, &sg_in, 1,
+		    &vb->req_hdr, GFP_KERNEL) < 0)
+			BUG();
+		virtqueue_kick(vb->misc_vq);
+	}
 	return 0;
 }
 
@@ -639,8 +728,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
 
 	/* Clear the feature bit if memory allocation fails */
-	if (!vb->bmap_hdr)
+	if (!vb->bmap_hdr) {
 		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_MISC_VQ);
+	}
 	else
 		vb->page_bitmap = vb->bmap_hdr + hdr_len;
 	mutex_init(&vb->balloon_lock);
@@ -743,6 +834,7 @@ static unsigned int features[] = {
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
 	VIRTIO_BALLOON_F_PAGE_BITMAP,
+	VIRTIO_BALLOON_F_MISC_VQ,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-27  1:23   ` Liang Li
@ 2016-07-27 16:03     ` Dave Hansen
  -1 siblings, 0 replies; 171+ messages in thread
From: Dave Hansen @ 2016-07-27 16:03 UTC (permalink / raw)
  To: Liang Li, linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Michael S. Tsirkin, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On 07/26/2016 06:23 PM, Liang Li wrote:
> +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> +	hdr_len = sizeof(struct balloon_bmap_hdr);
> +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);

This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.  How
big was the pfn buffer before?
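
For reference, a quick back-of-the-envelope check of that figure,
assuming 4KB pages (PAGE_SHIFT == 12):

	pfn_limit = (32ULL << 30) >> PAGE_SHIFT;	/* 8M pfns */
	bmap_len  = pfn_limit / BITS_PER_BYTE;		/* ~1MB of bitmap */

For comparison, the old vb->pfns[] buffer is
VIRTIO_BALLOON_ARRAY_PFNS_MAX (256) __virtio32 entries, i.e. 1KB.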

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 6/7] mm: add the related functions to get free page info
  2016-07-27  1:23   ` Liang Li
@ 2016-07-27 16:40     ` Dave Hansen
  -1 siblings, 0 replies; 171+ messages in thread
From: Dave Hansen @ 2016-07-27 16:40 UTC (permalink / raw)
  To: Liang Li, linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Michael S. Tsirkin, Paolo Bonzini, Cornelia Huck, Amit Shah

On 07/26/2016 06:23 PM, Liang Li wrote:
> +	for_each_migratetype_order(order, t) {
> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> +			if (pfn >= start_pfn && pfn <= end_pfn) {
> +				page_num = 1UL << order;
> +				if (pfn + page_num > end_pfn)
> +					page_num = end_pfn - pfn;
> +				bitmap_set(bitmap, pfn - start_pfn, page_num);
> +			}
> +		}
> +	}

Nit:  The 'page_num' nomenclature really confused me here.  It is the
number of bits being set in the bitmap.  Seems like calling it nr_pages
or num_pages would be more appropriate.

Isn't this bitmap out of date by the time it's sent up to the
hypervisor?  Is there something that makes the inaccuracy OK here?
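
Concretely, the suggested rename would read something like this (a
sketch only; the logic is otherwise unchanged):

	unsigned long nr_pages = 1UL << order;	/* bits to set for this block */

	if (pfn + nr_pages > end_pfn)
		nr_pages = end_pfn - pfn;	/* clamp to the scanned range */
	bitmap_set(bitmap, pfn - start_pfn, nr_pages);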

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-27  1:23   ` Liang Li
@ 2016-07-27 21:36     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 21:36 UTC (permalink / raw)
  To: Liang Li
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Wed, Jul 27, 2016 at 09:23:33AM +0800, Liang Li wrote:
> The implementation of the current virtio-balloon is not very
> efficient. The time spent on the different stages of inflating
> the balloon to 7GB of an 8GB idle guest:
> 
> a. allocating pages (6.5%)
> b. sending PFNs to host (68.3%)
> c. address translation (6.1%)
> d. madvise (19%)
> 
> It takes about 4126ms for the inflating process to complete.
> Debugging shows that the bottlenecks are stages b and d.
> 
> If we use a bitmap to send the page info instead of the PFNs, we
> can reduce the overhead in stage b quite a lot. Furthermore, by
> doing the address translation and calling madvise() on a bulk of
> RAM pages instead of the current page-by-page way, the overhead
> of stages c and d can also be reduced a lot.
> 
> This patch is the kernel side implementation which is intended to
> speed up the inflating & deflating process by adding a new feature
> to the virtio-balloon device. With this new feature, inflating the
> balloon to 7GB of a 8GB idle guest only takes 590ms, the
> performance improvement is about 85%.
> 
> TODO: optimize stage a by allocating/freeing a chunk of pages
> instead of a single page at a time.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c | 184 +++++++++++++++++++++++++++++++++++-----
>  1 file changed, 162 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 8d649a2..2d18ff6 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -41,10 +41,28 @@
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>  
> +/*
> + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of page bitmap
> + * to prevent a very large page bitmap, there are two reasons for this:
> + * 1) to save memory.
> + * 2) allocate a large bitmap may fail.
> + *
> + * The actual limit of pfn is determined by:
> + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> + *
> + * If system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we will scan
> + * the page list and send the PFNs with several times. To reduce the
> + * overhead of scanning the page list. VIRTIO_BALLOON_PFNS_LIMIT should
> + * be set with a value which can cover most cases.

So what if it covers 1/32 of the memory? We'll do 32 exits and not 1,
still not a big deal for a big guest.
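
(For scale: the 32GB limit below covers 1/32 of a 1TB guest, so such a
guest would need 32 bitmap transmissions rather than one.)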

> + */
> +#define VIRTIO_BALLOON_PFNS_LIMIT ((32 * (1ULL << 30)) >> PAGE_SHIFT) /* 32GB */

I already said this about a smaller limit:

	2 << 30 is 2G, but that is not a useful comment.
	Please explain the reason for this selection.

Still applies here.


> +
>  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
>  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  
> +extern unsigned long get_max_pfn(void);
> +
>  struct virtio_balloon {
>  	struct virtio_device *vdev;
>  	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> @@ -62,6 +80,15 @@ struct virtio_balloon {
>  
>  	/* Number of balloon pages we've told the Host we're not using. */
>  	unsigned int num_pages;
> +	/* Pointer of the bitmap header. */
> +	void *bmap_hdr;
> +	/* Bitmap and length used to tell the host the pages */
> +	unsigned long *page_bitmap;
> +	unsigned long bmap_len;
> +	/* Pfn limit */
> +	unsigned long pfn_limit;
> +	/* Used to record the processed pfn range */
> +	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
>  	/*
>  	 * The pages we've told the Host we're not using are enqueued
>  	 * at vb_dev_info->pages list.
> @@ -105,12 +132,45 @@ static void balloon_ack(struct virtqueue *vq)
>  	wake_up(&vb->acked);
>  }
>  
> +static inline void init_pfn_range(struct virtio_balloon *vb)
> +{
> +	vb->min_pfn = ULONG_MAX;
> +	vb->max_pfn = 0;
> +}
> +
> +static inline void update_pfn_range(struct virtio_balloon *vb,
> +				 struct page *page)
> +{
> +	unsigned long balloon_pfn = page_to_balloon_pfn(page);
> +
> +	if (balloon_pfn < vb->min_pfn)
> +		vb->min_pfn = balloon_pfn;
> +	if (balloon_pfn > vb->max_pfn)
> +		vb->max_pfn = balloon_pfn;
> +}
> +
>  static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>  {
>  	struct scatterlist sg;
>  	unsigned int len;
>  
> -	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP)) {
> +		struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
> +		unsigned long bmap_len;
> +
> +		/* cmd and req_id are not used here, set them to 0 */
> +		hdr->cmd = cpu_to_virtio16(vb->vdev, 0);
> +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> +		hdr->reserved = cpu_to_virtio16(vb->vdev, 0);
> +		hdr->req_id = cpu_to_virtio64(vb->vdev, 0);

No need to byte-swap 0, just fill it in. In fact, you allocated all
0s, so there is no need to touch these fields at all.
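
Something like the following would do (a sketch; since the header comes
from kzalloc(), the zero-valued fields can simply be left alone):

	hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
	hdr->start_pfn = cpu_to_virtio64(vb->vdev, vb->start_pfn);
	hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
	/* cmd, reserved and req_id stay 0 from the kzalloc() */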

> +		hdr->start_pfn = cpu_to_virtio64(vb->vdev, vb->start_pfn);
> +		bmap_len = min(vb->bmap_len,
> +			(vb->end_pfn - vb->start_pfn) / BITS_PER_BYTE);
> +		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
> +		sg_init_one(&sg, hdr,
> +			 sizeof(struct balloon_bmap_hdr) + bmap_len);
> +	} else
> +		sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
>  
>  	/* We should always be able to add one buffer to an empty queue. */
>  	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
> @@ -118,7 +178,6 @@ static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>  
>  	/* When host has read buffer, this completes via balloon_ack */
>  	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
> -
>  }
>  
>  static void set_page_pfns(struct virtio_balloon *vb,
> @@ -133,13 +192,53 @@ static void set_page_pfns(struct virtio_balloon *vb,
>  					  page_to_balloon_pfn(page) + i);
>  }
>  
> -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> +static void set_page_bitmap(struct virtio_balloon *vb,
> +			 struct list_head *pages, struct virtqueue *vq)
> +{
> +	unsigned long pfn;
> +	struct page *page;
> +	bool found;
> +
> +	vb->min_pfn = rounddown(vb->min_pfn, BITS_PER_LONG);
> +	vb->max_pfn = roundup(vb->max_pfn, BITS_PER_LONG);
> +	for (pfn = vb->min_pfn; pfn < vb->max_pfn;
> +			pfn += vb->pfn_limit) {
> +		vb->start_pfn = pfn + vb->pfn_limit;
> +		vb->end_pfn = pfn;
> +		memset(vb->page_bitmap, 0, vb->bmap_len);
> +		found = false;
> +		list_for_each_entry(page, pages, lru) {
> +			unsigned long balloon_pfn = page_to_balloon_pfn(page);
> +
> +			if (balloon_pfn < pfn ||
> +				 balloon_pfn >= pfn + vb->pfn_limit)
> +				continue;
> +			set_bit(balloon_pfn - pfn, vb->page_bitmap);
> +			if (balloon_pfn > vb->end_pfn)
> +				vb->end_pfn = balloon_pfn;
> +			if (balloon_pfn < vb->start_pfn)
> +				vb->start_pfn = balloon_pfn;
> +			found = true;
> +		}
> +		if (found) {
> +			vb->start_pfn = rounddown(vb->start_pfn, BITS_PER_LONG);
> +			vb->end_pfn = roundup(vb->end_pfn, BITS_PER_LONG);
> +			tell_host(vb, vq);
> +		}
> +	}
> +}
> +
> +static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num,
> +				 bool use_bmap)
>  {
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
> -	unsigned num_allocated_pages;
> +	unsigned int num_allocated_pages;
>  
> -	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	if (use_bmap)
> +		init_pfn_range(vb);
> +	else
> +		/* We can only do one array worth at a time. */
> +		num = min(num, ARRAY_SIZE(vb->pfns));
>  
>  	mutex_lock(&vb->balloon_lock);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -154,7 +253,10 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  			msleep(200);
>  			break;
>  		}
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +		if (use_bmap)
> +			update_pfn_range(vb, page);
> +		else
> +			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>  		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>  		if (!virtio_has_feature(vb->vdev,
>  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> @@ -163,8 +265,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  
>  	num_allocated_pages = vb->num_pfns;
>  	/* Did we get any? */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->inflate_vq);
> +	if (vb->num_pfns != 0) {
> +		if (use_bmap)
> +			set_page_bitmap(vb, &vb_dev_info->pages,
> +					vb->inflate_vq);
> +		else
> +			tell_host(vb, vb->inflate_vq);
> +	}
>  	mutex_unlock(&vb->balloon_lock);
>  
>  	return num_allocated_pages;
> @@ -184,15 +291,19 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>  	}
>  }
>  
> -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned int leak_balloon(struct virtio_balloon *vb, size_t num,
> +				bool use_bmap)
>  {
> -	unsigned num_freed_pages;
> +	unsigned int num_freed_pages;
>  	struct page *page;
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>  	LIST_HEAD(pages);
>  
> -	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	if (use_bmap)
> +		init_pfn_range(vb);
> +	else
> +		/* We can only do one array worth at a time. */
> +		num = min(num, ARRAY_SIZE(vb->pfns));
>  
>  	mutex_lock(&vb->balloon_lock);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -200,7 +311,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  		page = balloon_page_dequeue(vb_dev_info);
>  		if (!page)
>  			break;
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +		if (use_bmap)
> +			update_pfn_range(vb, page);
> +		else
> +			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>  		list_add(&page->lru, &pages);
>  		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
>  	}
> @@ -211,9 +325,14 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
>  	 * is true, we *have* to do it in this order
>  	 */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->deflate_vq);
> -	release_pages_balloon(vb, &pages);
> +	if (vb->num_pfns != 0) {
> +		if (use_bmap)
> +			set_page_bitmap(vb, &pages, vb->deflate_vq);
> +		else
> +			tell_host(vb, vb->deflate_vq);
> +
> +		release_pages_balloon(vb, &pages);
> +	}
>  	mutex_unlock(&vb->balloon_lock);
>  	return num_freed_pages;
>  }
> @@ -347,13 +466,15 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  	struct virtio_balloon *vb;
>  	unsigned long *freed;
>  	unsigned num_freed_pages;
> +	bool use_bmap;
>  
>  	vb = container_of(self, struct virtio_balloon, nb);
>  	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  		return NOTIFY_OK;
>  
>  	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> +	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +	num_freed_pages = leak_balloon(vb, oom_pages, use_bmap);
>  	update_balloon_size(vb);
>  	*freed += num_freed_pages;
>  
> @@ -373,15 +494,17 @@ static void update_balloon_size_func(struct work_struct *work)
>  {
>  	struct virtio_balloon *vb;
>  	s64 diff;
> +	bool use_bmap;
>  
>  	vb = container_of(work, struct virtio_balloon,
>  			  update_balloon_size_work);
> +	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
>  	diff = towards_target(vb);
>  
>  	if (diff > 0)
> -		diff -= fill_balloon(vb, diff);
> +		diff -= fill_balloon(vb, diff, use_bmap);
>  	else if (diff < 0)
> -		diff += leak_balloon(vb, -diff);
> +		diff += leak_balloon(vb, -diff, use_bmap);
>  	update_balloon_size(vb);
>  
>  	if (diff)
> @@ -489,7 +612,7 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
>  static int virtballoon_probe(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb;
> -	int err;
> +	int err, hdr_len;
>  
>  	if (!vdev->config->get) {
>  		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> @@ -508,6 +631,18 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	spin_lock_init(&vb->stop_update_lock);
>  	vb->stop_update = false;
>  	vb->num_pages = 0;
> +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);

What are these 2 longs in aid of?

> +	hdr_len = sizeof(struct balloon_bmap_hdr);
> +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);

So it can go up to 1 MByte, but after adding the header size etc. you
need a higher-order allocation. This is a waste; there is no need for a
power-of-two allocation. Start from the other side: say "I want to
allocate 32 KBytes for the bitmap", subtract the header to get the
bitmap size, and calculate the pfn limit from there.
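
In code, that direction would look roughly like this (a sketch;
BALLOON_BMAP_ALLOC_SIZE is an illustrative name, not something from the
patch):

	#define BALLOON_BMAP_ALLOC_SIZE	(32 * 1024)

	alloc_len = BALLOON_BMAP_ALLOC_SIZE;		/* one small kmalloc() */
	hdr_len   = sizeof(struct balloon_bmap_hdr);
	bmap_len  = alloc_len - hdr_len;		/* bytes left for the bitmap */
	vb->pfn_limit = bmap_len * BITS_PER_BYTE;	/* pfns covered per round */
	vb->bmap_hdr  = kzalloc(alloc_len, GFP_KERNEL);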


> +
> +	/* Clear the feature bit if memory allocation fails */
> +	if (!vb->bmap_hdr)
> +		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +	else
> +		vb->page_bitmap = vb->bmap_hdr + hdr_len;
>  	mutex_init(&vb->balloon_lock);
>  	init_waitqueue_head(&vb->acked);
>  	vb->vdev = vdev;
> @@ -541,9 +676,12 @@ out:
>  
>  static void remove_common(struct virtio_balloon *vb)
>  {
> +	bool use_bmap;
> +
> +	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
>  	/* There might be pages left in the balloon: free them. */
>  	while (vb->num_pages)
> -		leak_balloon(vb, vb->num_pages);
> +		leak_balloon(vb, vb->num_pages, use_bmap);
>  	update_balloon_size(vb);
>  
>  	/* Now we reset the device so we can clean up the queues. */
> @@ -565,6 +703,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  	cancel_work_sync(&vb->update_balloon_stats_work);
>  
>  	remove_common(vb);
> +	kfree(vb->page_bitmap);
>  	kfree(vb);
>  }
>  
> @@ -603,6 +742,7 @@ static unsigned int features[] = {
>  	VIRTIO_BALLOON_F_MUST_TELL_HOST,
>  	VIRTIO_BALLOON_F_STATS_VQ,
>  	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
> +	VIRTIO_BALLOON_F_PAGE_BITMAP,
>  };
>  
>  static struct virtio_driver virtio_balloon_driver = {
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-27 21:36     ` Michael S. Tsirkin
  0 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 21:36 UTC (permalink / raw)
  To: Liang Li
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Wed, Jul 27, 2016 at 09:23:33AM +0800, Liang Li wrote:
> The implementation of the current virtio-balloon is not very
> efficient. The time spent on the different stages of inflating
> the balloon to 7GB of an 8GB idle guest:
> 
> a. allocating pages (6.5%)
> b. sending PFNs to host (68.3%)
> c. address translation (6.1%)
> d. madvise (19%)
> 
> It takes about 4126ms for the inflating process to complete.
> Debugging shows that the bottlenecks are stages b and d.
> 
> If we use a bitmap to send the page info instead of the PFNs, we
> can reduce the overhead in stage b quite a lot. Furthermore, by
> doing the address translation and calling madvise() on a bulk of
> RAM pages instead of the current page-by-page way, the overhead
> of stages c and d can also be reduced a lot.
> 
> This patch is the kernel side implementation which is intended to
> speed up the inflating & deflating process by adding a new feature
> to the virtio-balloon device. With this new feature, inflating the
> balloon to 7GB of a 8GB idle guest only takes 590ms, the
> performance improvement is about 85%.
> 
> TODO: optimize stage a by allocating/freeing a chunk of pages
> instead of a single page at a time.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c | 184 +++++++++++++++++++++++++++++++++++-----
>  1 file changed, 162 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 8d649a2..2d18ff6 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -41,10 +41,28 @@
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>  
> +/*
> + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of page bitmap
> + * to prevent a very large page bitmap, there are two reasons for this:
> + * 1) to save memory.
> + * 2) allocate a large bitmap may fail.
> + *
> + * The actual limit of pfn is determined by:
> + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> + *
> + * If system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we will scan
> + * the page list and send the PFNs with several times. To reduce the
> + * overhead of scanning the page list. VIRTIO_BALLOON_PFNS_LIMIT should
> + * be set with a value which can cover most cases.

So what if it covers 1/32 of the memory? We'll do 32 exits and not 1,
still not a big deal for a big guest.

> + */
> +#define VIRTIO_BALLOON_PFNS_LIMIT ((32 * (1ULL << 30)) >> PAGE_SHIFT) /* 32GB */

I already said this about a smaller limit:

	2 << 30 is 2G, but that is not a useful comment.
	Please explain the reason for this selection.

Still applies here.


> +
>  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
>  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  
> +extern unsigned long get_max_pfn(void);
> +
>  struct virtio_balloon {
>  	struct virtio_device *vdev;
>  	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> @@ -62,6 +80,15 @@ struct virtio_balloon {
>  
>  	/* Number of balloon pages we've told the Host we're not using. */
>  	unsigned int num_pages;
> +	/* Pointer of the bitmap header. */
> +	void *bmap_hdr;
> +	/* Bitmap and length used to tell the host the pages */
> +	unsigned long *page_bitmap;
> +	unsigned long bmap_len;
> +	/* Pfn limit */
> +	unsigned long pfn_limit;
> +	/* Used to record the processed pfn range */
> +	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
>  	/*
>  	 * The pages we've told the Host we're not using are enqueued
>  	 * at vb_dev_info->pages list.
> @@ -105,12 +132,45 @@ static void balloon_ack(struct virtqueue *vq)
>  	wake_up(&vb->acked);
>  }
>  
> +static inline void init_pfn_range(struct virtio_balloon *vb)
> +{
> +	vb->min_pfn = ULONG_MAX;
> +	vb->max_pfn = 0;
> +}
> +
> +static inline void update_pfn_range(struct virtio_balloon *vb,
> +				 struct page *page)
> +{
> +	unsigned long balloon_pfn = page_to_balloon_pfn(page);
> +
> +	if (balloon_pfn < vb->min_pfn)
> +		vb->min_pfn = balloon_pfn;
> +	if (balloon_pfn > vb->max_pfn)
> +		vb->max_pfn = balloon_pfn;
> +}
> +
>  static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>  {
>  	struct scatterlist sg;
>  	unsigned int len;
>  
> -	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP)) {
> +		struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
> +		unsigned long bmap_len;
> +
> +		/* cmd and req_id are not used here, set them to 0 */
> +		hdr->cmd = cpu_to_virtio16(vb->vdev, 0);
> +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> +		hdr->reserved = cpu_to_virtio16(vb->vdev, 0);
> +		hdr->req_id = cpu_to_virtio64(vb->vdev, 0);

No need to byte-swap 0, just fill it in. In fact, you allocated all
0s, so there is no need to touch these fields at all.

> +		hdr->start_pfn = cpu_to_virtio64(vb->vdev, vb->start_pfn);
> +		bmap_len = min(vb->bmap_len,
> +			(vb->end_pfn - vb->start_pfn) / BITS_PER_BYTE);
> +		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
> +		sg_init_one(&sg, hdr,
> +			 sizeof(struct balloon_bmap_hdr) + bmap_len);
> +	} else
> +		sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
>  
>  	/* We should always be able to add one buffer to an empty queue. */
>  	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
> @@ -118,7 +178,6 @@ static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>  
>  	/* When host has read buffer, this completes via balloon_ack */
>  	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
> -
>  }
>  
>  static void set_page_pfns(struct virtio_balloon *vb,
> @@ -133,13 +192,53 @@ static void set_page_pfns(struct virtio_balloon *vb,
>  					  page_to_balloon_pfn(page) + i);
>  }
>  
> -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> +static void set_page_bitmap(struct virtio_balloon *vb,
> +			 struct list_head *pages, struct virtqueue *vq)
> +{
> +	unsigned long pfn;
> +	struct page *page;
> +	bool found;
> +
> +	vb->min_pfn = rounddown(vb->min_pfn, BITS_PER_LONG);
> +	vb->max_pfn = roundup(vb->max_pfn, BITS_PER_LONG);
> +	for (pfn = vb->min_pfn; pfn < vb->max_pfn;
> +			pfn += vb->pfn_limit) {
> +		vb->start_pfn = pfn + vb->pfn_limit;
> +		vb->end_pfn = pfn;
> +		memset(vb->page_bitmap, 0, vb->bmap_len);
> +		found = false;
> +		list_for_each_entry(page, pages, lru) {
> +			unsigned long balloon_pfn = page_to_balloon_pfn(page);
> +
> +			if (balloon_pfn < pfn ||
> +				 balloon_pfn >= pfn + vb->pfn_limit)
> +				continue;
> +			set_bit(balloon_pfn - pfn, vb->page_bitmap);
> +			if (balloon_pfn > vb->end_pfn)
> +				vb->end_pfn = balloon_pfn;
> +			if (balloon_pfn < vb->start_pfn)
> +				vb->start_pfn = balloon_pfn;
> +			found = true;
> +		}
> +		if (found) {
> +			vb->start_pfn = rounddown(vb->start_pfn, BITS_PER_LONG);
> +			vb->end_pfn = roundup(vb->end_pfn, BITS_PER_LONG);
> +			tell_host(vb, vq);
> +		}
> +	}
> +}
> +
> +static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num,
> +				 bool use_bmap)
>  {
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
> -	unsigned num_allocated_pages;
> +	unsigned int num_allocated_pages;
>  
> -	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	if (use_bmap)
> +		init_pfn_range(vb);
> +	else
> +		/* We can only do one array worth at a time. */
> +		num = min(num, ARRAY_SIZE(vb->pfns));
>  
>  	mutex_lock(&vb->balloon_lock);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -154,7 +253,10 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  			msleep(200);
>  			break;
>  		}
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +		if (use_bmap)
> +			update_pfn_range(vb, page);
> +		else
> +			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>  		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>  		if (!virtio_has_feature(vb->vdev,
>  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> @@ -163,8 +265,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  
>  	num_allocated_pages = vb->num_pfns;
>  	/* Did we get any? */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->inflate_vq);
> +	if (vb->num_pfns != 0) {
> +		if (use_bmap)
> +			set_page_bitmap(vb, &vb_dev_info->pages,
> +					vb->inflate_vq);
> +		else
> +			tell_host(vb, vb->inflate_vq);
> +	}
>  	mutex_unlock(&vb->balloon_lock);
>  
>  	return num_allocated_pages;
> @@ -184,15 +291,19 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>  	}
>  }
>  
> -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned int leak_balloon(struct virtio_balloon *vb, size_t num,
> +				bool use_bmap)
>  {
> -	unsigned num_freed_pages;
> +	unsigned int num_freed_pages;
>  	struct page *page;
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>  	LIST_HEAD(pages);
>  
> -	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	if (use_bmap)
> +		init_pfn_range(vb);
> +	else
> +		/* We can only do one array worth at a time. */
> +		num = min(num, ARRAY_SIZE(vb->pfns));
>  
>  	mutex_lock(&vb->balloon_lock);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -200,7 +311,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  		page = balloon_page_dequeue(vb_dev_info);
>  		if (!page)
>  			break;
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +		if (use_bmap)
> +			update_pfn_range(vb, page);
> +		else
> +			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>  		list_add(&page->lru, &pages);
>  		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
>  	}
> @@ -211,9 +325,14 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
>  	 * is true, we *have* to do it in this order
>  	 */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->deflate_vq);
> -	release_pages_balloon(vb, &pages);
> +	if (vb->num_pfns != 0) {
> +		if (use_bmap)
> +			set_page_bitmap(vb, &pages, vb->deflate_vq);
> +		else
> +			tell_host(vb, vb->deflate_vq);
> +
> +		release_pages_balloon(vb, &pages);
> +	}
>  	mutex_unlock(&vb->balloon_lock);
>  	return num_freed_pages;
>  }
> @@ -347,13 +466,15 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  	struct virtio_balloon *vb;
>  	unsigned long *freed;
>  	unsigned num_freed_pages;
> +	bool use_bmap;
>  
>  	vb = container_of(self, struct virtio_balloon, nb);
>  	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  		return NOTIFY_OK;
>  
>  	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> +	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +	num_freed_pages = leak_balloon(vb, oom_pages, use_bmap);
>  	update_balloon_size(vb);
>  	*freed += num_freed_pages;
>  
> @@ -373,15 +494,17 @@ static void update_balloon_size_func(struct work_struct *work)
>  {
>  	struct virtio_balloon *vb;
>  	s64 diff;
> +	bool use_bmap;
>  
>  	vb = container_of(work, struct virtio_balloon,
>  			  update_balloon_size_work);
> +	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
>  	diff = towards_target(vb);
>  
>  	if (diff > 0)
> -		diff -= fill_balloon(vb, diff);
> +		diff -= fill_balloon(vb, diff, use_bmap);
>  	else if (diff < 0)
> -		diff += leak_balloon(vb, -diff);
> +		diff += leak_balloon(vb, -diff, use_bmap);
>  	update_balloon_size(vb);
>  
>  	if (diff)
> @@ -489,7 +612,7 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
>  static int virtballoon_probe(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb;
> -	int err;
> +	int err, hdr_len;
>  
>  	if (!vdev->config->get) {
>  		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> @@ -508,6 +631,18 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	spin_lock_init(&vb->stop_update_lock);
>  	vb->stop_update = false;
>  	vb->num_pages = 0;
> +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);

What are these 2 longs in aid of?

> +	hdr_len = sizeof(struct balloon_bmap_hdr);
> +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);

So the bitmap can go up to 1MByte, but after adding the header size you
need a higher-order allocation. This is a waste; there is no need for a
power-of-two allocation. Start from the other side: say "I want to
allocate 32KBytes for the bitmap buffer". Subtract the header and you
get the bitmap size. Calculate the pfn limit from there.
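
E.g., a sketch (BALLOON_BMAP_ALLOC_LEN and the 32KByte figure are just
an example here, not something from the patch):

	#define BALLOON_BMAP_ALLOC_LEN	(32 * 1024)	/* total buffer size */

	vb->bmap_len = BALLOON_BMAP_ALLOC_LEN -
			sizeof(struct balloon_bmap_hdr);
	vb->pfn_limit = vb->bmap_len * BITS_PER_BYTE;	/* pfns per chunk */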


> +
> +	/* Clear the feature bit if memory allocation fails */
> +	if (!vb->bmap_hdr)
> +		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +	else
> +		vb->page_bitmap = vb->bmap_hdr + hdr_len;
>  	mutex_init(&vb->balloon_lock);
>  	init_waitqueue_head(&vb->acked);
>  	vb->vdev = vdev;
> @@ -541,9 +676,12 @@ out:
>  
>  static void remove_common(struct virtio_balloon *vb)
>  {
> +	bool use_bmap;
> +
> +	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
>  	/* There might be pages left in the balloon: free them. */
>  	while (vb->num_pages)
> -		leak_balloon(vb, vb->num_pages);
> +		leak_balloon(vb, vb->num_pages, use_bmap);
>  	update_balloon_size(vb);
>  
>  	/* Now we reset the device so we can clean up the queues. */
> @@ -565,6 +703,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  	cancel_work_sync(&vb->update_balloon_stats_work);
>  
>  	remove_common(vb);
> +	kfree(vb->page_bitmap);
>  	kfree(vb);
>  }
>  
> @@ -603,6 +742,7 @@ static unsigned int features[] = {
>  	VIRTIO_BALLOON_F_MUST_TELL_HOST,
>  	VIRTIO_BALLOON_F_STATS_VQ,
>  	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
> +	VIRTIO_BALLOON_F_PAGE_BITMAP,
>  };
>  
>  static struct virtio_driver virtio_balloon_driver = {
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-27 21:39       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 21:39 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Liang Li, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

On Wed, Jul 27, 2016 at 09:03:21AM -0700, Dave Hansen wrote:
> On 07/26/2016 06:23 PM, Liang Li wrote:
> > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> 
> This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.  How
> big was the pfn buffer before?


Yes, I would limit this to 1G of memory in one go, which will result
in a 32KByte bitmap.
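
(With 4KByte pages, 1GByte covers 256K pfns, and 256K bits / 8 =
32KBytes of bitmap.)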

-- 
MST

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 7/7] virtio-balloon: tell host vm's free page info
@ 2016-07-27 22:00     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 22:00 UTC (permalink / raw)
  To: Liang Li
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Wed, Jul 27, 2016 at 09:23:36AM +0800, Liang Li wrote:
> Support the request for the vm's free page information and respond with
> a page bitmap. QEMU can make use of this free page bitmap to speed
> up the live migration process by skipping the free pages.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c | 104 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 98 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 2d18ff6..5ca4ad3 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -62,10 +62,13 @@ module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  
>  extern unsigned long get_max_pfn(void);
> +extern int get_free_pages(unsigned long start_pfn, unsigned long end_pfn,
> +		unsigned long *bitmap, unsigned long len);
> +
>  
>  struct virtio_balloon {
>  	struct virtio_device *vdev;
> -	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> +	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *misc_vq;
>  
>  	/* The balloon servicing is delegated to a freezable workqueue. */
>  	struct work_struct update_balloon_stats_work;
> @@ -89,6 +92,8 @@ struct virtio_balloon {
>  	unsigned long pfn_limit;
>  	/* Used to record the processed pfn range */
>  	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
> +	/* Request header */
> +	struct balloon_req_hdr req_hdr;
>  	/*
>  	 * The pages we've told the Host we're not using are enqueued
>  	 * at vb_dev_info->pages list.
> @@ -373,6 +378,49 @@ static void update_balloon_stats(struct virtio_balloon *vb)
>  				pages_to_bytes(available));
>  }
>  
> +static void update_free_pages_stats(struct virtio_balloon *vb,

why _stats?

> +				unsigned long req_id)
> +{
> +	struct scatterlist sg_in, sg_out;
> +	unsigned long pfn = 0, bmap_len, max_pfn;
> +	struct virtqueue *vq = vb->misc_vq;
> +	struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
> +	int ret = 1;
> +
> +	max_pfn = get_max_pfn();
> +	mutex_lock(&vb->balloon_lock);
> +	while (pfn < max_pfn) {
> +		memset(vb->page_bitmap, 0, vb->bmap_len);
> +		ret = get_free_pages(pfn, pfn + vb->pfn_limit,
> +			vb->page_bitmap, vb->bmap_len * BITS_PER_BYTE);
> +		hdr->cmd = cpu_to_virtio16(vb->vdev, BALLOON_GET_FREE_PAGES);
> +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> +		hdr->req_id = cpu_to_virtio64(vb->vdev, req_id);
> +		hdr->start_pfn = cpu_to_virtio64(vb->vdev, pfn);
> +		bmap_len = vb->pfn_limit / BITS_PER_BYTE;
> +		if (!ret) {
> +			hdr->flag = cpu_to_virtio16(vb->vdev,
> +							BALLOON_FLAG_DONE);
> +			if (pfn + vb->pfn_limit > max_pfn)
> +				bmap_len = (max_pfn - pfn) / BITS_PER_BYTE;
> +		} else
> +			hdr->flag = cpu_to_virtio16(vb->vdev,
> +							BALLOON_FLAG_CONT);
> +		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
> +		sg_init_one(&sg_out, hdr,
> +			 sizeof(struct balloon_bmap_hdr) + bmap_len);

Wait a second. This adds the same buffer multiple times in a loop.
We will overwrite the buffer without waiting for the
hypervisor to process it. What did I miss?
> +
> +		virtqueue_add_outbuf(vq, &sg_out, 1, vb, GFP_KERNEL);

This can fail. You want to make sure the vq has enough space
before you use it, or check the error and wait.

> +		virtqueue_kick(vq);

Why kick here within the loop? Wait until done. In fact, kicking
outside the lock is better for SMP.
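
One way to address the overwrite and error-handling points, as an
untested sketch for the loop body (it assumes the misc vq callback
wakes vb->acked the way balloon_ack() does, which is not the case in
this patch as posted):

	int err;
	unsigned int len;

	err = virtqueue_add_outbuf(vq, &sg_out, 1, vb, GFP_KERNEL);
	if (err)
		break;	/* no space left: bail out or wait */
	virtqueue_kick(vq);
	/* don't reuse the buffer until the host has consumed it */
	wait_event(vb->acked, virtqueue_get_buf(vq, &len));

Alternatively, use a separate buffer per chunk so a single kick after
the loop is enough.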

> +		pfn += vb->pfn_limit;
> +	}
> +
> +	sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
> +	virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL);
> +	virtqueue_kick(vq);
> +	mutex_unlock(&vb->balloon_lock);
> +}
> +
>  /*
>   * While most virtqueues communicate guest-initiated requests to the hypervisor,
>   * the stats queue operates in reverse.  The driver initializes the virtqueue
> @@ -511,18 +559,49 @@ static void update_balloon_size_func(struct work_struct *work)
>  		queue_work(system_freezable_wq, work);
>  }
>  
> +static void misc_handle_rq(struct virtio_balloon *vb)
> +{
> +	struct balloon_req_hdr *ptr_hdr;
> +	unsigned int len;
> +
> +	ptr_hdr = virtqueue_get_buf(vb->misc_vq, &len);
> +	if (!ptr_hdr || len != sizeof(vb->req_hdr))
> +		return;
> +
> +	switch (ptr_hdr->cmd) {
> +	case BALLOON_GET_FREE_PAGES:
> +		update_free_pages_stats(vb, ptr_hdr->param);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +static void misc_request(struct virtqueue *vq)
> +{
> +	struct virtio_balloon *vb = vq->vdev->priv;
> +
> +	misc_handle_rq(vb);
> +}
> +
>  static int init_vqs(struct virtio_balloon *vb)
>  {
> -	struct virtqueue *vqs[3];
> -	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
> -	static const char * const names[] = { "inflate", "deflate", "stats" };
> +	struct virtqueue *vqs[4];
> +	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
> +					 stats_request, misc_request };
> +	static const char * const names[] = { "inflate", "deflate", "stats",
> +						 "misc" };
>  	int err, nvqs;
>  
>  	/*
>  	 * We expect two virtqueues: inflate and deflate, and
>  	 * optionally stat.
>  	 */
> -	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ))
> +		nvqs = 4;

Does the misc vq depend on the stats vq feature then? If yes, please validate that.
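
E.g., a sketch of the dependency check, mirroring the existing
__virtio_clear_bit() fallback in virtballoon_probe():

	if (virtio_has_feature(vdev, VIRTIO_BALLOON_F_MISC_VQ) &&
	    !virtio_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ))
		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_MISC_VQ);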


> +	else
> +		nvqs = virtio_has_feature(vb->vdev,
> +					  VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;

Replace that ?: with an if/else too, please.
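
I.e., something like:

	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ))
		nvqs = 4;
	else if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ))
		nvqs = 3;
	else
		nvqs = 2;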

>  	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
>  	if (err)
>  		return err;
> @@ -543,6 +622,16 @@ static int init_vqs(struct virtio_balloon *vb)
>  			BUG();
>  		virtqueue_kick(vb->stats_vq);
>  	}
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ)) {
> +		struct scatterlist sg_in;
> +
> +		vb->misc_vq = vqs[3];
> +		sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
> +		if (virtqueue_add_inbuf(vb->misc_vq, &sg_in, 1,
> +		    &vb->req_hdr, GFP_KERNEL) < 0)
> +			BUG();
> +		virtqueue_kick(vb->misc_vq);
> +	}
>  	return 0;
>  }
>  
> @@ -639,8 +728,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
>  
>  	/* Clear the feature bit if memory allocation fails */
> -	if (!vb->bmap_hdr)
> +	if (!vb->bmap_hdr) {
>  		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_MISC_VQ);
> +	}
>  	else
>  		vb->page_bitmap = vb->bmap_hdr + hdr_len;
>  	mutex_init(&vb->balloon_lock);
> @@ -743,6 +834,7 @@ static unsigned int features[] = {
>  	VIRTIO_BALLOON_F_STATS_VQ,
>  	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
>  	VIRTIO_BALLOON_F_PAGE_BITMAP,
> +	VIRTIO_BALLOON_F_MISC_VQ,
>  };
>  
>  static struct virtio_driver virtio_balloon_driver = {
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 7/7] virtio-balloon: tell host vm's free page info
@ 2016-07-27 22:00     ` Michael S. Tsirkin
  0 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 22:00 UTC (permalink / raw)
  To: Liang Li
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Wed, Jul 27, 2016 at 09:23:36AM +0800, Liang Li wrote:
> Support the request for the vm's free page information and respond with
> a page bitmap. QEMU can make use of this free page bitmap to speed
> up the live migration process by skipping the free pages.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c | 104 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 98 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 2d18ff6..5ca4ad3 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -62,10 +62,13 @@ module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  
>  extern unsigned long get_max_pfn(void);
> +extern int get_free_pages(unsigned long start_pfn, unsigned long end_pfn,
> +		unsigned long *bitmap, unsigned long len);
> +
>  
>  struct virtio_balloon {
>  	struct virtio_device *vdev;
> -	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> +	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *misc_vq;
>  
>  	/* The balloon servicing is delegated to a freezable workqueue. */
>  	struct work_struct update_balloon_stats_work;
> @@ -89,6 +92,8 @@ struct virtio_balloon {
>  	unsigned long pfn_limit;
>  	/* Used to record the processed pfn range */
>  	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
> +	/* Request header */
> +	struct balloon_req_hdr req_hdr;
>  	/*
>  	 * The pages we've told the Host we're not using are enqueued
>  	 * at vb_dev_info->pages list.
> @@ -373,6 +378,49 @@ static void update_balloon_stats(struct virtio_balloon *vb)
>  				pages_to_bytes(available));
>  }
>  
> +static void update_free_pages_stats(struct virtio_balloon *vb,

why _stats?

> +				unsigned long req_id)
> +{
> +	struct scatterlist sg_in, sg_out;
> +	unsigned long pfn = 0, bmap_len, max_pfn;
> +	struct virtqueue *vq = vb->misc_vq;
> +	struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
> +	int ret = 1;
> +
> +	max_pfn = get_max_pfn();
> +	mutex_lock(&vb->balloon_lock);
> +	while (pfn < max_pfn) {
> +		memset(vb->page_bitmap, 0, vb->bmap_len);
> +		ret = get_free_pages(pfn, pfn + vb->pfn_limit,
> +			vb->page_bitmap, vb->bmap_len * BITS_PER_BYTE);
> +		hdr->cmd = cpu_to_virtio16(vb->vdev, BALLOON_GET_FREE_PAGES);
> +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> +		hdr->req_id = cpu_to_virtio64(vb->vdev, req_id);
> +		hdr->start_pfn = cpu_to_virtio64(vb->vdev, pfn);
> +		bmap_len = vb->pfn_limit / BITS_PER_BYTE;
> +		if (!ret) {
> +			hdr->flag = cpu_to_virtio16(vb->vdev,
> +							BALLOON_FLAG_DONE);
> +			if (pfn + vb->pfn_limit > max_pfn)
> +				bmap_len = (max_pfn - pfn) / BITS_PER_BYTE;
> +		} else
> +			hdr->flag = cpu_to_virtio16(vb->vdev,
> +							BALLOON_FLAG_CONT);
> +		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
> +		sg_init_one(&sg_out, hdr,
> +			 sizeof(struct balloon_bmap_hdr) + bmap_len);

Wait a second. This adds the same buffer multiple times in a loop.
We will overwrite the buffer without waiting for
hypervisor to process it. What did I miss?
> +
> +		virtqueue_add_outbuf(vq, &sg_out, 1, vb, GFP_KERNEL);

this can fail. you want to maybe make sure vq has enough space
before you use it or check error and wait.

> +		virtqueue_kick(vq);

why kick here within loop? wait until done. in fact kick
outside lock is better for smp.

> +		pfn += vb->pfn_limit;
> +	}
> +
> +	sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
> +	virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL);
> +	virtqueue_kick(vq);
> +	mutex_unlock(&vb->balloon_lock);
> +}
> +
>  /*
>   * While most virtqueues communicate guest-initiated requests to the hypervisor,
>   * the stats queue operates in reverse.  The driver initializes the virtqueue
> @@ -511,18 +559,49 @@ static void update_balloon_size_func(struct work_struct *work)
>  		queue_work(system_freezable_wq, work);
>  }
>  
> +static void misc_handle_rq(struct virtio_balloon *vb)
> +{
> +	struct balloon_req_hdr *ptr_hdr;
> +	unsigned int len;
> +
> +	ptr_hdr = virtqueue_get_buf(vb->misc_vq, &len);
> +	if (!ptr_hdr || len != sizeof(vb->req_hdr))
> +		return;
> +
> +	switch (ptr_hdr->cmd) {
> +	case BALLOON_GET_FREE_PAGES:
> +		update_free_pages_stats(vb, ptr_hdr->param);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +static void misc_request(struct virtqueue *vq)
> +{
> +	struct virtio_balloon *vb = vq->vdev->priv;
> +
> +	misc_handle_rq(vb);
> +}
> +
>  static int init_vqs(struct virtio_balloon *vb)
>  {
> -	struct virtqueue *vqs[3];
> -	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
> -	static const char * const names[] = { "inflate", "deflate", "stats" };
> +	struct virtqueue *vqs[4];
> +	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
> +					 stats_request, misc_request };
> +	static const char * const names[] = { "inflate", "deflate", "stats",
> +						 "misc" };
>  	int err, nvqs;
>  
>  	/*
>  	 * We expect two virtqueues: inflate and deflate, and
>  	 * optionally stat.
>  	 */
> -	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ))
> +		nvqs = 4;

Does misc vq depend on stats vq feature then? if yes please validate that.


> +	else
> +		nvqs = virtio_has_feature(vb->vdev,
> +					  VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;

Replace that ? with else too pls.

>  	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
>  	if (err)
>  		return err;
> @@ -543,6 +622,16 @@ static int init_vqs(struct virtio_balloon *vb)
>  			BUG();
>  		virtqueue_kick(vb->stats_vq);
>  	}
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ)) {
> +		struct scatterlist sg_in;
> +
> +		vb->misc_vq = vqs[3];
> +		sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
> +		if (virtqueue_add_inbuf(vb->misc_vq, &sg_in, 1,
> +		    &vb->req_hdr, GFP_KERNEL) < 0)
> +			BUG();
> +		virtqueue_kick(vb->misc_vq);
> +	}
>  	return 0;
>  }
>  
> @@ -639,8 +728,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
>  
>  	/* Clear the feature bit if memory allocation fails */
> -	if (!vb->bmap_hdr)
> +	if (!vb->bmap_hdr) {
>  		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_MISC_VQ);
> +	}
>  	else
>  		vb->page_bitmap = vb->bmap_hdr + hdr_len;
>  	mutex_init(&vb->balloon_lock);
> @@ -743,6 +834,7 @@ static unsigned int features[] = {
>  	VIRTIO_BALLOON_F_STATS_VQ,
>  	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
>  	VIRTIO_BALLOON_F_PAGE_BITMAP,
> +	VIRTIO_BALLOON_F_MISC_VQ,
>  };
>  
>  static struct virtio_driver virtio_balloon_driver = {
> -- 
> 1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [Qemu-devel] [PATCH v2 repost 7/7] virtio-balloon: tell host vm's free page info
@ 2016-07-27 22:00     ` Michael S. Tsirkin
  0 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 22:00 UTC (permalink / raw)
  To: Liang Li
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Wed, Jul 27, 2016 at 09:23:36AM +0800, Liang Li wrote:
> Support the request for vm's free page information, response with
> a page bitmap. QEMU can make use of this free page bitmap to speed
> up live migration process by skipping process the free pages.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c | 104 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 98 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 2d18ff6..5ca4ad3 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -62,10 +62,13 @@ module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  
>  extern unsigned long get_max_pfn(void);
> +extern int get_free_pages(unsigned long start_pfn, unsigned long end_pfn,
> +		unsigned long *bitmap, unsigned long len);
> +
>  
>  struct virtio_balloon {
>  	struct virtio_device *vdev;
> -	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> +	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *misc_vq;
>  
>  	/* The balloon servicing is delegated to a freezable workqueue. */
>  	struct work_struct update_balloon_stats_work;
> @@ -89,6 +92,8 @@ struct virtio_balloon {
>  	unsigned long pfn_limit;
>  	/* Used to record the processed pfn range */
>  	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
> +	/* Request header */
> +	struct balloon_req_hdr req_hdr;
>  	/*
>  	 * The pages we've told the Host we're not using are enqueued
>  	 * at vb_dev_info->pages list.
> @@ -373,6 +378,49 @@ static void update_balloon_stats(struct virtio_balloon *vb)
>  				pages_to_bytes(available));
>  }
>  
> +static void update_free_pages_stats(struct virtio_balloon *vb,

why _stats?

> +				unsigned long req_id)
> +{
> +	struct scatterlist sg_in, sg_out;
> +	unsigned long pfn = 0, bmap_len, max_pfn;
> +	struct virtqueue *vq = vb->misc_vq;
> +	struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
> +	int ret = 1;
> +
> +	max_pfn = get_max_pfn();
> +	mutex_lock(&vb->balloon_lock);
> +	while (pfn < max_pfn) {
> +		memset(vb->page_bitmap, 0, vb->bmap_len);
> +		ret = get_free_pages(pfn, pfn + vb->pfn_limit,
> +			vb->page_bitmap, vb->bmap_len * BITS_PER_BYTE);
> +		hdr->cmd = cpu_to_virtio16(vb->vdev, BALLOON_GET_FREE_PAGES);
> +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> +		hdr->req_id = cpu_to_virtio64(vb->vdev, req_id);
> +		hdr->start_pfn = cpu_to_virtio64(vb->vdev, pfn);
> +		bmap_len = vb->pfn_limit / BITS_PER_BYTE;
> +		if (!ret) {
> +			hdr->flag = cpu_to_virtio16(vb->vdev,
> +							BALLOON_FLAG_DONE);
> +			if (pfn + vb->pfn_limit > max_pfn)
> +				bmap_len = (max_pfn - pfn) / BITS_PER_BYTE;
> +		} else
> +			hdr->flag = cpu_to_virtio16(vb->vdev,
> +							BALLOON_FLAG_CONT);
> +		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
> +		sg_init_one(&sg_out, hdr,
> +			 sizeof(struct balloon_bmap_hdr) + bmap_len);

Wait a second. This adds the same buffer multiple times in a loop.
We will overwrite the buffer without waiting for
hypervisor to process it. What did I miss?
> +
> +		virtqueue_add_outbuf(vq, &sg_out, 1, vb, GFP_KERNEL);

this can fail. you want to maybe make sure vq has enough space
before you use it or check error and wait.

> +		virtqueue_kick(vq);

why kick here within loop? wait until done. in fact kick
outside lock is better for smp.

> +		pfn += vb->pfn_limit;
> +	}
> +
> +	sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
> +	virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL);
> +	virtqueue_kick(vq);
> +	mutex_unlock(&vb->balloon_lock);
> +}
> +
>  /*
>   * While most virtqueues communicate guest-initiated requests to the hypervisor,
>   * the stats queue operates in reverse.  The driver initializes the virtqueue
> @@ -511,18 +559,49 @@ static void update_balloon_size_func(struct work_struct *work)
>  		queue_work(system_freezable_wq, work);
>  }
>  
> +static void misc_handle_rq(struct virtio_balloon *vb)
> +{
> +	struct balloon_req_hdr *ptr_hdr;
> +	unsigned int len;
> +
> +	ptr_hdr = virtqueue_get_buf(vb->misc_vq, &len);
> +	if (!ptr_hdr || len != sizeof(vb->req_hdr))
> +		return;
> +
> +	switch (ptr_hdr->cmd) {
> +	case BALLOON_GET_FREE_PAGES:
> +		update_free_pages_stats(vb, ptr_hdr->param);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +static void misc_request(struct virtqueue *vq)
> +{
> +	struct virtio_balloon *vb = vq->vdev->priv;
> +
> +	misc_handle_rq(vb);
> +}
> +
>  static int init_vqs(struct virtio_balloon *vb)
>  {
> -	struct virtqueue *vqs[3];
> -	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
> -	static const char * const names[] = { "inflate", "deflate", "stats" };
> +	struct virtqueue *vqs[4];
> +	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
> +					 stats_request, misc_request };
> +	static const char * const names[] = { "inflate", "deflate", "stats",
> +						 "misc" };
>  	int err, nvqs;
>  
>  	/*
>  	 * We expect two virtqueues: inflate and deflate, and
>  	 * optionally stat.
>  	 */
> -	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ))
> +		nvqs = 4;

Does misc vq depend on stats vq feature then? if yes please validate that.


> +	else
> +		nvqs = virtio_has_feature(vb->vdev,
> +					  VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;

Replace that ? with else too pls.

>  	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
>  	if (err)
>  		return err;
> @@ -543,6 +622,16 @@ static int init_vqs(struct virtio_balloon *vb)
>  			BUG();
>  		virtqueue_kick(vb->stats_vq);
>  	}
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ)) {
> +		struct scatterlist sg_in;
> +
> +		vb->misc_vq = vqs[3];
> +		sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
> +		if (virtqueue_add_inbuf(vb->misc_vq, &sg_in, 1,
> +		    &vb->req_hdr, GFP_KERNEL) < 0)
> +			BUG();
> +		virtqueue_kick(vb->misc_vq);
> +	}
>  	return 0;
>  }
>  
> @@ -639,8 +728,10 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
>  
>  	/* Clear the feature bit if memory allocation fails */
> -	if (!vb->bmap_hdr)
> +	if (!vb->bmap_hdr) {
>  		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_MISC_VQ);
> +	}
>  	else
>  		vb->page_bitmap = vb->bmap_hdr + hdr_len;
>  	mutex_init(&vb->balloon_lock);
> @@ -743,6 +834,7 @@ static unsigned int features[] = {
>  	VIRTIO_BALLOON_F_STATS_VQ,
>  	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
>  	VIRTIO_BALLOON_F_PAGE_BITMAP,
> +	VIRTIO_BALLOON_F_MISC_VQ,
>  };
>  
>  static struct virtio_driver virtio_balloon_driver = {
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 6/7] mm: add the related functions to get free page info
  2016-07-27 16:40     ` Dave Hansen
  (?)
@ 2016-07-27 22:05       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 22:05 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Liang Li, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

On Wed, Jul 27, 2016 at 09:40:56AM -0700, Dave Hansen wrote:
> On 07/26/2016 06:23 PM, Liang Li wrote:
> > +	for_each_migratetype_order(order, t) {
> > +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> > +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> > +			if (pfn >= start_pfn && pfn <= end_pfn) {
> > +				page_num = 1UL << order;
> > +				if (pfn + page_num > end_pfn)
> > +					page_num = end_pfn - pfn;
> > +				bitmap_set(bitmap, pfn - start_pfn, page_num);
> > +			}
> > +		}
> > +	}
> 
> Nit:  The 'page_num' nomenclature really confused me here.  It is the
> number of bits being set in the bitmap.  Seems like calling it nr_pages
> or num_pages would be more appropriate.
> 
> Isn't this bitmap out of date by the time it's sent up to the
> hypervisor?  Is there something that makes the inaccuracy OK here?

Yes. Calling these free pages is unfortunate. It's likely to confuse
people thinking they can just discard these pages.

Hypervisor sends a request. We respond with this list of pages, and
the guarantee the hypervisor needs is that these were free at some point
between request and response, so they are safe to free if they are
unmodified since the request. The hypervisor can detect modifications
itself, so it does not need the guest's help for that.
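
Roughly, in pseudo-C (hypothetical helper names, not an existing QEMU
or kernel API - just to make the contract concrete):

	send_free_page_request(guest);            /* hypervisor -> guest */
	bitmap = receive_free_page_bitmap(guest); /* guest -> hypervisor */
	for_each_set_bit(pfn, bitmap, max_pfn)
		if (!dirtied_since_request(pfn))  /* via dirty logging */
			skip_in_first_migration_pass(pfn);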

Maybe just call these "free if unmodified" and reflect this
everywhere - verbose but hey. Better naming suggestions would be
welcome.
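
For what it's worth, the hunk above with the suggested rename would
read (same logic, just nr_pages instead of page_num; untested sketch):

	for_each_migratetype_order(order, t) {
		list_for_each(curr, &zone->free_area[order].free_list[t]) {
			pfn = page_to_pfn(list_entry(curr, struct page, lru));
			if (pfn >= start_pfn && pfn <= end_pfn) {
				/* nr_pages: number of bits to set */
				unsigned long nr_pages = 1UL << order;

				if (pfn + nr_pages > end_pfn)
					nr_pages = end_pfn - pfn;
				bitmap_set(bitmap, pfn - start_pfn,
					   nr_pages);
			}
		}
	}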

-- 
MST

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-27 22:07     ` Michael S. Tsirkin
  0 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 22:07 UTC (permalink / raw)
  To: Liang Li
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Wed, Jul 27, 2016 at 09:23:33AM +0800, Liang Li wrote:
> The implementation of the current virtio-balloon is not very
> efficient; the time spent on the different stages of inflating
> the balloon to 7GB of an 8GB idle guest is:
> 
> a. allocating pages (6.5%)
> b. sending PFNs to host (68.3%)
> c. address translation (6.1%)
> d. madvise (19%)
> 
> It takes about 4126ms for the inflating process to complete.
> Debugging shows that the bottlenecks are stage b and stage d.
> 
> Using a bitmap to send the page info instead of the PFNs reduces
> the overhead in stage b quite a lot. Furthermore, we can do the
> address translation and call madvise() on a bulk of RAM pages,
> instead of the current page-by-page way, so the overhead of
> stage c and stage d can also be reduced a lot.
> 
> This patch is the kernel side implementation which is intended to
> speed up the inflating & deflating process by adding a new feature
> to the virtio-balloon device. With this new feature, inflating the
> balloon to 7GB of an 8GB idle guest only takes 590ms; the
> performance improvement is about 85%.
> 
> TODO: optimize stage a by allocating/freeing a chunk of pages
> instead of a single page at a time.
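
To put the stage b saving in perspective (a rough back-of-the-envelope
estimate, assuming 4KB pages and 4-byte PFN entries): an 8GB guest has
about 2M pages, so sending PFNs one by one moves roughly 8MB through
the virtqueue, while a bitmap covering the same range is only about
256KB - a ~32x reduction in transmitted data.
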
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Suggested-by: Michael S. Tsirkin <mst@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> ---
>  drivers/virtio/virtio_balloon.c | 184 +++++++++++++++++++++++++++++++++++-----
>  1 file changed, 162 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 8d649a2..2d18ff6 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -41,10 +41,28 @@
>  #define OOM_VBALLOON_DEFAULT_PAGES 256
>  #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
>  
> +/*
> + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of the page
> + * bitmap. There are two reasons to avoid a very large bitmap:
> + * 1) to save memory.
> + * 2) allocating a large bitmap may fail.
> + *
> + * The actual pfn limit is determined by:
> + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> + *
> + * If the system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we scan
> + * the page list and send the PFNs in several passes. To reduce the
> + * overhead of scanning the page list, VIRTIO_BALLOON_PFNS_LIMIT should
> + * be set to a value that covers most cases.
> + */
> +#define VIRTIO_BALLOON_PFNS_LIMIT ((32 * (1ULL << 30)) >> PAGE_SHIFT) /* 32GB */
> +
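
For scale (assuming 4KB pages, i.e. PAGE_SHIFT == 12): this limit works
out to (32 << 30) >> 12 = 8M PFNs per pass, so the per-pass bitmap is
8M bits = 1MB of memory.
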
>  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
>  module_param(oom_pages, int, S_IRUSR | S_IWUSR);
>  MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
>  
> +extern unsigned long get_max_pfn(void);
> +

Please just include the correct header. No need for this hackery.
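
That is, the prototype would live in a shared header that both mm and
the balloon driver include. A minimal sketch (the header placement is
an assumption, not what the patch does):

	/* include/linux/mm.h - assumed location for the prototype */
	extern unsigned long get_max_pfn(void);

	/* drivers/virtio/virtio_balloon.c */
	#include <linux/mm.h>	/* get_max_pfn() now comes from the header */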

>  struct virtio_balloon {
>  	struct virtio_device *vdev;
>  	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> @@ -62,6 +80,15 @@ struct virtio_balloon {
>  
>  	/* Number of balloon pages we've told the Host we're not using. */
>  	unsigned int num_pages;
> +	/* Pointer to the bitmap header. */
> +	void *bmap_hdr;
> +	/* Bitmap and length used to tell the host the pages */
> +	unsigned long *page_bitmap;
> +	unsigned long bmap_len;
> +	/* Pfn limit */
> +	unsigned long pfn_limit;
> +	/* Used to record the processed pfn range */
> +	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
>  	/*
>  	 * The pages we've told the Host we're not using are enqueued
>  	 * at vb_dev_info->pages list.
> @@ -105,12 +132,45 @@ static void balloon_ack(struct virtqueue *vq)
>  	wake_up(&vb->acked);
>  }
>  
> +static inline void init_pfn_range(struct virtio_balloon *vb)
> +{
> +	vb->min_pfn = ULONG_MAX;
> +	vb->max_pfn = 0;
> +}
> +
> +static inline void update_pfn_range(struct virtio_balloon *vb,
> +				 struct page *page)
> +{
> +	unsigned long balloon_pfn = page_to_balloon_pfn(page);
> +
> +	if (balloon_pfn < vb->min_pfn)
> +		vb->min_pfn = balloon_pfn;
> +	if (balloon_pfn > vb->max_pfn)
> +		vb->max_pfn = balloon_pfn;
> +}
> +
>  static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>  {
>  	struct scatterlist sg;
>  	unsigned int len;
>  
> -	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP)) {
> +		struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
> +		unsigned long bmap_len;
> +
> +		/* cmd and req_id are not used here, set them to 0 */
> +		hdr->cmd = cpu_to_virtio16(vb->vdev, 0);
> +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> +		hdr->reserved = cpu_to_virtio16(vb->vdev, 0);
> +		hdr->req_id = cpu_to_virtio64(vb->vdev, 0);
> +		hdr->start_pfn = cpu_to_virtio64(vb->vdev, vb->start_pfn);
> +		bmap_len = min(vb->bmap_len,
> +			(vb->end_pfn - vb->start_pfn) / BITS_PER_BYTE);
> +		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
> +		sg_init_one(&sg, hdr,
> +			 sizeof(struct balloon_bmap_hdr) + bmap_len);
> +	} else
> +		sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
>  
>  	/* We should always be able to add one buffer to an empty queue. */
>  	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
> @@ -118,7 +178,6 @@ static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
>  
>  	/* When host has read buffer, this completes via balloon_ack */
>  	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
> -
>  }
>  
>  static void set_page_pfns(struct virtio_balloon *vb,
> @@ -133,13 +192,53 @@ static void set_page_pfns(struct virtio_balloon *vb,
>  					  page_to_balloon_pfn(page) + i);
>  }
>  
> -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> +static void set_page_bitmap(struct virtio_balloon *vb,
> +			 struct list_head *pages, struct virtqueue *vq)
> +{
> +	unsigned long pfn;
> +	struct page *page;
> +	bool found;
> +
> +	vb->min_pfn = rounddown(vb->min_pfn, BITS_PER_LONG);
> +	vb->max_pfn = roundup(vb->max_pfn, BITS_PER_LONG);
> +	for (pfn = vb->min_pfn; pfn < vb->max_pfn;
> +			pfn += vb->pfn_limit) {
> +		vb->start_pfn = pfn + vb->pfn_limit;
> +		vb->end_pfn = pfn;
> +		memset(vb->page_bitmap, 0, vb->bmap_len);
> +		found = false;
> +		list_for_each_entry(page, pages, lru) {
> +			unsigned long balloon_pfn = page_to_balloon_pfn(page);
> +
> +			if (balloon_pfn < pfn ||
> +				 balloon_pfn >= pfn + vb->pfn_limit)
> +				continue;
> +			set_bit(balloon_pfn - pfn, vb->page_bitmap);
> +			if (balloon_pfn > vb->end_pfn)
> +				vb->end_pfn = balloon_pfn;
> +			if (balloon_pfn < vb->start_pfn)
> +				vb->start_pfn = balloon_pfn;
> +			found = true;
> +		}
> +		if (found) {
> +			vb->start_pfn = rounddown(vb->start_pfn, BITS_PER_LONG);
> +			vb->end_pfn = roundup(vb->end_pfn, BITS_PER_LONG);
> +			tell_host(vb, vq);
> +		}
> +	}
> +}
> +
> +static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num,
> +				 bool use_bmap)
>  {
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
> -	unsigned num_allocated_pages;
> +	unsigned int num_allocated_pages;
>  
> -	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	if (use_bmap)
> +		init_pfn_range(vb);
> +	else
> +		/* We can only do one array worth at a time. */
> +		num = min(num, ARRAY_SIZE(vb->pfns));
>  
>  	mutex_lock(&vb->balloon_lock);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -154,7 +253,10 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  			msleep(200);
>  			break;
>  		}
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +		if (use_bmap)
> +			update_pfn_range(vb, page);
> +		else
> +			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>  		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>  		if (!virtio_has_feature(vb->vdev,
>  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> @@ -163,8 +265,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  
>  	num_allocated_pages = vb->num_pfns;
>  	/* Did we get any? */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->inflate_vq);
> +	if (vb->num_pfns != 0) {
> +		if (use_bmap)
> +			set_page_bitmap(vb, &vb_dev_info->pages,
> +					vb->inflate_vq);
> +		else
> +			tell_host(vb, vb->inflate_vq);
> +	}
>  	mutex_unlock(&vb->balloon_lock);
>  
>  	return num_allocated_pages;
> @@ -184,15 +291,19 @@ static void release_pages_balloon(struct virtio_balloon *vb,
>  	}
>  }
>  
> -static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned int leak_balloon(struct virtio_balloon *vb, size_t num,
> +				bool use_bmap)
>  {
> -	unsigned num_freed_pages;
> +	unsigned int num_freed_pages;
>  	struct page *page;
>  	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
>  	LIST_HEAD(pages);
>  
> -	/* We can only do one array worth at a time. */
> -	num = min(num, ARRAY_SIZE(vb->pfns));
> +	if (use_bmap)
> +		init_pfn_range(vb);
> +	else
> +		/* We can only do one array worth at a time. */
> +		num = min(num, ARRAY_SIZE(vb->pfns));
>  
>  	mutex_lock(&vb->balloon_lock);
>  	for (vb->num_pfns = 0; vb->num_pfns < num;
> @@ -200,7 +311,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  		page = balloon_page_dequeue(vb_dev_info);
>  		if (!page)
>  			break;
> -		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> +		if (use_bmap)
> +			update_pfn_range(vb, page);
> +		else
> +			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>  		list_add(&page->lru, &pages);
>  		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
>  	}
> @@ -211,9 +325,14 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
>  	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
>  	 * is true, we *have* to do it in this order
>  	 */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->deflate_vq);
> -	release_pages_balloon(vb, &pages);
> +	if (vb->num_pfns != 0) {
> +		if (use_bmap)
> +			set_page_bitmap(vb, &pages, vb->deflate_vq);
> +		else
> +			tell_host(vb, vb->deflate_vq);
> +
> +		release_pages_balloon(vb, &pages);
> +	}
>  	mutex_unlock(&vb->balloon_lock);
>  	return num_freed_pages;
>  }
> @@ -347,13 +466,15 @@ static int virtballoon_oom_notify(struct notifier_block *self,
>  	struct virtio_balloon *vb;
>  	unsigned long *freed;
>  	unsigned num_freed_pages;
> +	bool use_bmap;
>  
>  	vb = container_of(self, struct virtio_balloon, nb);
>  	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>  		return NOTIFY_OK;
>  
>  	freed = parm;
> -	num_freed_pages = leak_balloon(vb, oom_pages);
> +	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +	num_freed_pages = leak_balloon(vb, oom_pages, use_bmap);
>  	update_balloon_size(vb);
>  	*freed += num_freed_pages;
>  
> @@ -373,15 +494,17 @@ static void update_balloon_size_func(struct work_struct *work)
>  {
>  	struct virtio_balloon *vb;
>  	s64 diff;
> +	bool use_bmap;
>  
>  	vb = container_of(work, struct virtio_balloon,
>  			  update_balloon_size_work);
> +	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
>  	diff = towards_target(vb);
>  
>  	if (diff > 0)
> -		diff -= fill_balloon(vb, diff);
> +		diff -= fill_balloon(vb, diff, use_bmap);
>  	else if (diff < 0)
> -		diff += leak_balloon(vb, -diff);
> +		diff += leak_balloon(vb, -diff, use_bmap);
>  	update_balloon_size(vb);
>  
>  	if (diff)
> @@ -489,7 +612,7 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
>  static int virtballoon_probe(struct virtio_device *vdev)
>  {
>  	struct virtio_balloon *vb;
> -	int err;
> +	int err, hdr_len;
>  
>  	if (!vdev->config->get) {
>  		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> @@ -508,6 +631,18 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  	spin_lock_init(&vb->stop_update_lock);
>  	vb->stop_update = false;
>  	vb->num_pages = 0;
> +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> +	hdr_len = sizeof(struct balloon_bmap_hdr);
> +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> +
> +	/* Clear the feature bit if memory allocation fails */
> +	if (!vb->bmap_hdr)
> +		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
> +	else
> +		vb->page_bitmap = vb->bmap_hdr + hdr_len;
>  	mutex_init(&vb->balloon_lock);
>  	init_waitqueue_head(&vb->acked);
>  	vb->vdev = vdev;
> @@ -541,9 +676,12 @@ out:
>  
>  static void remove_common(struct virtio_balloon *vb)
>  {
> +	bool use_bmap;
> +
> +	use_bmap = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
>  	/* There might be pages left in the balloon: free them. */
>  	while (vb->num_pages)
> -		leak_balloon(vb, vb->num_pages);
> +		leak_balloon(vb, vb->num_pages, use_bmap);
>  	update_balloon_size(vb);
>  
>  	/* Now we reset the device so we can clean up the queues. */
> @@ -565,6 +703,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
>  	cancel_work_sync(&vb->update_balloon_stats_work);
>  
>  	remove_common(vb);
> +	kfree(vb->bmap_hdr);
>  	kfree(vb);
>  }
>  
> @@ -603,6 +742,7 @@ static unsigned int features[] = {
>  	VIRTIO_BALLOON_F_MUST_TELL_HOST,
>  	VIRTIO_BALLOON_F_STATS_VQ,
>  	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
> +	VIRTIO_BALLOON_F_PAGE_BITMAP,
>  };
>  
>  static struct virtio_driver virtio_balloon_driver = {
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 3/7] mm: add a function to get the max pfn
  2016-07-27  1:23   ` Liang Li
@ 2016-07-27 22:08     ` Michael S. Tsirkin
  0 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 22:08 UTC (permalink / raw)
  To: Liang Li
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Wed, Jul 27, 2016 at 09:23:32AM +0800, Liang Li wrote:
> Expose the function to get the max pfn, so it can be used in the
> virtio-balloon device driver.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> ---
>  mm/page_alloc.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8b3e134..7da61ad 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4517,6 +4517,12 @@ void show_free_areas(unsigned int filter)
>  	show_swap_cache_info();
>  }
>  
> +unsigned long get_max_pfn(void)
> +{
> +	return max_pfn;
> +}
> +EXPORT_SYMBOL(get_max_pfn);
> +


This needs a comment saying that this value can change at any time, so
it's only good as a hint, e.g. for sizing data structures.
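
Something like this, say (the wording is just a sketch):

	/*
	 * get_max_pfn - return the current max_pfn
	 *
	 * Note: max_pfn can change at any time (e.g. after memory
	 * hotplug), so the returned value is only usable as a hint,
	 * for example for sizing data structures; it must not be
	 * relied on for correctness.
	 */
	unsigned long get_max_pfn(void)
	{
		return max_pfn;
	}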

>  static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
>  {
>  	zoneref->zone = zone;
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 6/7] mm: add the related functions to get free page info
  2016-07-27  1:23   ` Liang Li
@ 2016-07-27 22:13     ` Michael S. Tsirkin
  0 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 22:13 UTC (permalink / raw)
  To: Liang Li
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Wed, Jul 27, 2016 at 09:23:35AM +0800, Liang Li wrote:
> Save the free page info into a page bitmap; it will be used by the
> virtio-balloon device driver.
> 
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Amit Shah <amit.shah@redhat.com>
> ---
>  mm/page_alloc.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7da61ad..3ad8b10 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4523,6 +4523,52 @@ unsigned long get_max_pfn(void)
>  }
>  EXPORT_SYMBOL(get_max_pfn);
>  
> +static void mark_free_pages_bitmap(struct zone *zone, unsigned long start_pfn,
> +	unsigned long end_pfn, unsigned long *bitmap, unsigned long len)
> +{
> +	unsigned long pfn, flags, page_num;
> +	unsigned int order, t;
> +	struct list_head *curr;
> +
> +	if (zone_is_empty(zone))
> +		return;
> +	end_pfn = min(start_pfn + len, end_pfn);
> +	spin_lock_irqsave(&zone->lock, flags);
> +
> +	for_each_migratetype_order(order, t) {

Why not do each order separately? This way you can
use a single bit to pass a huge page to host.

Not a requirement but hey.

Alternatively (and maybe that is a better idea), if you wanted to, you
could just skip lone 4K pages; it's not clear that they are worth
bothering with. Add a flag to start with some reasonably large order
and go from there.
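
E.g. a minimal variant of the loop (min_order is a made-up parameter
here, and its default is up for discussion):

	for_each_migratetype_order(order, t) {
		/* Skip small orders; lone 4K pages may not be worth it. */
		if (order < min_order)
			continue;
		list_for_each(curr, &zone->free_area[order].free_list[t]) {
			pfn = page_to_pfn(list_entry(curr, struct page, lru));
			if (pfn >= start_pfn && pfn <= end_pfn) {
				page_num = 1UL << order;
				if (pfn + page_num > end_pfn)
					page_num = end_pfn - pfn;
				bitmap_set(bitmap, pfn - start_pfn, page_num);
			}
		}
	}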


> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> +			if (pfn >= start_pfn && pfn <= end_pfn) {
> +				page_num = 1UL << order;
> +				if (pfn + page_num > end_pfn)
> +					page_num = end_pfn - pfn;
> +				bitmap_set(bitmap, pfn - start_pfn, page_num);
> +			}
> +		}
> +	}
> +
> +	spin_unlock_irqrestore(&zone->lock, flags);
> +}
> +
> +int get_free_pages(unsigned long start_pfn, unsigned long end_pfn,
> +		unsigned long *bitmap, unsigned long len)
> +{
> +	struct zone *zone;
> +	int ret = 0;
> +
> +	if (bitmap == NULL || start_pfn > end_pfn || start_pfn >= max_pfn)
> +		return 0;
> +	if (end_pfn < max_pfn)
> +		ret = 1;
> +	if (end_pfn >= max_pfn)
> +		ret = 0;
> +
> +	for_each_populated_zone(zone)
> +		mark_free_pages_bitmap(zone, start_pfn, end_pfn, bitmap, len);
> +	return ret;
> +}
> +EXPORT_SYMBOL(get_free_pages);
> +
>  static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
>  {
>  	zoneref->zone = zone;
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 6/7] mm: add the related functions to get free page info
  2016-07-27 22:05       ` Michael S. Tsirkin
@ 2016-07-27 22:16         ` Dave Hansen
  0 siblings, 0 replies; 171+ messages in thread
From: Dave Hansen @ 2016-07-27 22:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Liang Li, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

On 07/27/2016 03:05 PM, Michael S. Tsirkin wrote:
> On Wed, Jul 27, 2016 at 09:40:56AM -0700, Dave Hansen wrote:
>> On 07/26/2016 06:23 PM, Liang Li wrote:
>>> +	for_each_migratetype_order(order, t) {
>>> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
>>> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
>>> +			if (pfn >= start_pfn && pfn <= end_pfn) {
>>> +				page_num = 1UL << order;
>>> +				if (pfn + page_num > end_pfn)
>>> +					page_num = end_pfn - pfn;
>>> +				bitmap_set(bitmap, pfn - start_pfn, page_num);
>>> +			}
>>> +		}
>>> +	}
>>
>> Nit:  The 'page_num' nomenclature really confused me here.  It is the
>> number of bits being set in the bitmap.  Seems like calling it nr_pages
>> or num_pages would be more appropriate.
>>
>> Isn't this bitmap out of date by the time it's sent up to the
>> hypervisor?  Is there something that makes the inaccuracy OK here?
> 
> Yes. Calling these free pages is unfortunate. It's likely to confuse
> people thinking they can just discard these pages.
> 
> Hypervisor sends a request. We respond with this list of pages, and
> the guarantee the hypervisor needs is that these were free sometime
> between request and response, so they are safe to free if they are
> unmodified since the request. The hypervisor can detect modifications
> itself and does not need guest help.

Ahh, that makes sense.

So the hypervisor is trying to figure out: "Which pages do I move?".  It
wants to know which pages the guest thinks have good data and need to
move.  But, the list of free pages is (likely) smaller than the list of
pages with good data, so it asks for that instead.

A write to a page means that it has valuable data, regardless of whether
it was in the free list or not.

The hypervisor only skips moving pages that were free *and* were never
written to.  So we never lose data, even if this "get free page info"
stuff is totally out of date.

The patch description and code comments are, um, a _bit_ light for this
level of subtlety. :)
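
Something along these lines above get_free_pages() would capture it
(just a sketch of the wording):

	/*
	 * Pages reported in the bitmap were free at some point between
	 * the host's request and this response.  A set bit is NOT a
	 * guarantee that the page is still free: the host tracks writes
	 * itself and only skips pages that are reported here *and* were
	 * never written to after the request, so no data is lost.
	 */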

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 3/7] mm: add a function to get the max pfn
  2016-07-27 22:08     ` Michael S. Tsirkin
@ 2016-07-27 22:52       ` Dave Hansen
  0 siblings, 0 replies; 171+ messages in thread
From: Dave Hansen @ 2016-07-27 22:52 UTC (permalink / raw)
  To: Michael S. Tsirkin, Liang Li
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On 07/27/2016 03:08 PM, Michael S. Tsirkin wrote:
>> > +unsigned long get_max_pfn(void)
>> > +{
>> > +	return max_pfn;
>> > +}
>> > +EXPORT_SYMBOL(get_max_pfn);
>> > +
> 
> This needs a comment saying that this value can change at any time, so
> it's only good as a hint, e.g. for sizing data structures.

Or, if we limit the batches to 1GB like you suggested earlier, then we
might not even need this exported.  It would mean that in the worst
case, we temporarily waste 28k out of the 32k allocation for a small VM
that had <128MB of memory.

That seems like a small price to pay for not having to track max_pfn
anywhere.
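
(Rough math: a 1GB batch covers 1GB / 4KB = 262144 pages, i.e. a 32KB
bitmap, while a guest with <128MB of memory has at most 32768 pages and
only needs 4KB of it, so 28KB would sit unused.)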

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 6/7] mm: add the related functions to get free page info
  2016-07-27 22:16         ` Dave Hansen
@ 2016-07-27 23:05           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-27 23:05 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Liang Li, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

On Wed, Jul 27, 2016 at 03:16:57PM -0700, Dave Hansen wrote:
> On 07/27/2016 03:05 PM, Michael S. Tsirkin wrote:
> > On Wed, Jul 27, 2016 at 09:40:56AM -0700, Dave Hansen wrote:
> >> On 07/26/2016 06:23 PM, Liang Li wrote:
> >>> +	for_each_migratetype_order(order, t) {
> >>> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> >>> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> >>> +			if (pfn >= start_pfn && pfn <= end_pfn) {
> >>> +				page_num = 1UL << order;
> >>> +				if (pfn + page_num > end_pfn)
> >>> +					page_num = end_pfn - pfn;
> >>> +				bitmap_set(bitmap, pfn - start_pfn, page_num);
> >>> +			}
> >>> +		}
> >>> +	}
> >>
> >> Nit:  The 'page_num' nomenclature really confused me here.  It is the
> >> number of bits being set in the bitmap.  Seems like calling it nr_pages
> >> or num_pages would be more appropriate.
> >>
> >> Isn't this bitmap out of date by the time it's sent up to the
> >> hypervisor?  Is there something that makes the inaccuracy OK here?
> > 
> > Yes. Calling these free pages is unfortunate. It's likely to confuse
> > people thinking they can just discard these pages.
> > 
> > Hypervisor sends a request. We respond with this list of pages, and
> > the guarantee hypervisor needs is that these were free sometime between request
> > and response, so they are safe to free if they are unmodified
> > since the request. The hypervisor can detect modifications itself,
> > so it does not need guest help.
> 
> Ahh, that makes sense.
> 
> So the hypervisor is trying to figure out: "Which pages do I move?".  It
> wants to know which pages the guest thinks have good data and need to
> move.  But, the list of free pages is (likely) smaller than the list of
> pages with good data, so it asks for that instead.
> 
> A write to a page means that it has valuable data, regardless of whether
> it was in the free list or not.
> 
> The hypervisor only skips moving pages that were free *and* were never
> written to.

Right - except never is a long time, so we just need it "since the request
was received".

> So we never lose data, even if this "get free page info"
> stuff is totally out of date.

So if you include pages that were written to before the request
then yes data will be lost. This is why we do this scan
after we get the request and not e.g. on boot :)

> The patch description and code comments are, um, a _bit_ light for this
> level of subtlety. :)

Add to that, for any page it is safe to skip it and not add it to the
list.  So the requirement is about when a page must *not* be on this
list: it must not be there if it is needed by the guest but was not
modified since the request.

Calling it "free" will just keep confusing people.  Either use the
verbose "free or modified" or invent a new word like "discardable" and
add a comment explaining that a page is always discardable unless its
content is needed by Linux but was not modified since the request.
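
Roughly, with Dave's renaming and such a comment, the quoted loop could
read like this (a sketch built only from the snippet above, not the
actual patch):

	/*
	 * Pages marked here are "discardable": each was free at some
	 * point between the host's request and our response.  The host
	 * may drop one only if it also saw no write to the page since
	 * the request (via dirty logging); otherwise it must migrate it.
	 */
	for_each_migratetype_order(order, t) {
		list_for_each(curr, &zone->free_area[order].free_list[t]) {
			pfn = page_to_pfn(list_entry(curr, struct page, lru));
			if (pfn >= start_pfn && pfn <= end_pfn) {
				nr_pages = 1UL << order;
				if (pfn + nr_pages > end_pfn)
					nr_pages = end_pfn - pfn;
				bitmap_set(bitmap, pfn - start_pfn, nr_pages);
			}
		}
	}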

-- 
MST

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH v2 repost 6/7] mm: add the related functions to get free page info
  2016-07-27 16:40     ` Dave Hansen
@ 2016-07-28  0:10       ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  0:10 UTC (permalink / raw)
  To: Hansen, Dave, linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Andrew Morton, Vlastimil Babka, Mel Gorman,
	Michael S. Tsirkin, Paolo Bonzini, Cornelia Huck, Amit Shah

> Subject: Re: [PATCH v2 repost 6/7] mm: add the related functions to get free
> page info
> 
> On 07/26/2016 06:23 PM, Liang Li wrote:
> > +	for_each_migratetype_order(order, t) {
> > +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> > +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> > +			if (pfn >= start_pfn && pfn <= end_pfn) {
> > +				page_num = 1UL << order;
> > +				if (pfn + page_num > end_pfn)
> > +					page_num = end_pfn - pfn;
> > +				bitmap_set(bitmap, pfn - start_pfn,
> page_num);
> > +			}
> > +		}
> > +	}
> 
> Nit:  The 'page_num' nomenclature really confused me here.  It is the
> number of bits being set in the bitmap.  Seems like calling it nr_pages or
> num_pages would be more appropriate.
> 

You are right,  will change.

> Isn't this bitmap out of date by the time it's sent up to the hypervisor?  Is
> there something that makes the inaccuracy OK here?

Yes. Dirty page logging will be used to correct the inaccuracy.
Dirty page logging should be started before the free page bitmap is
constructed; if some of the free pages are then written to and become
non-free, those pages will be caught by the dirty page logging mechanism.
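
So the ordering on the host side matters; roughly (a sketch -- the
function names here are placeholders, not actual QEMU APIs):

	start_dirty_page_logging();    /* 1. start tracking guest writes  */
	request_free_page_bitmap();    /* 2. ask the guest for the bitmap */
	/*
	 * 3. Skip a page in the first copy round only if it is set in
	 *    the bitmap AND was not dirtied since step 1; dirty logging
	 *    catches pages that stopped being free after the guest scan.
	 */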

Thanks!
Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 6/7] mm: add the related functions to get free page info
  2016-07-28  0:10       ` Li, Liang Z
@ 2016-07-28  0:17         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-28  0:17 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: Hansen, Dave, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

On Thu, Jul 28, 2016 at 12:10:16AM +0000, Li, Liang Z wrote:
> > Subject: Re: [PATCH v2 repost 6/7] mm: add the related functions to get free
> > page info
> > 
> > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > +	for_each_migratetype_order(order, t) {
> > > +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> > > +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> > > +			if (pfn >= start_pfn && pfn <= end_pfn) {
> > > +				page_num = 1UL << order;
> > > +				if (pfn + page_num > end_pfn)
> > > +					page_num = end_pfn - pfn;
> > > +				bitmap_set(bitmap, pfn - start_pfn,
> > page_num);
> > > +			}
> > > +		}
> > > +	}
> > 
> > Nit:  The 'page_num' nomenclature really confused me here.  It is the
> > number of bits being set in the bitmap.  Seems like calling it nr_pages or
> > num_pages would be more appropriate.
> > 
> 
> You are right,  will change.
> 
> > Isn't this bitmap out of date by the time it's sent up to the hypervisor?  Is
> > there something that makes the inaccuracy OK here?
> 
> Yes. Dirty page logging will be used to correct the inaccuracy.
> Dirty page logging should be started before the free page bitmap is
> constructed; if some of the free pages are then written to and become
> non-free, those pages will be caught by the dirty page logging mechanism.
> 
> Thanks!
> Liang

Right, but this should be clear from the code and naming.

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-27 16:03     ` Dave Hansen
@ 2016-07-28  1:13       ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  1:13 UTC (permalink / raw)
  To: Hansen, Dave, linux-kernel
  Cc: virtualization, linux-mm, virtio-dev, kvm, qemu-devel, dgilbert,
	quintela, Michael S. Tsirkin, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

> Subject: Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate
> process
> 
> On 07/26/2016 06:23 PM, Liang Li wrote:
> > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> 
> This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.  How big
> was the pfn buffer before?

Yes, it is if the max pfn covers more than 32GB.
The pfn buffer used before was 256 * 4 = 1024 bytes; that's too small,
and it's the main reason for the bad performance.
Capping the kmalloc at 1MB is a balance between performance and
flexibility: a page bitmap covering all of memory is no good for a
system with a huge amount of memory, while a bitmap that is too small
means we have to traverse the long free lists many times, which is bad
for performance.
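
For reference, the 32GB figure follows directly from the bitmap size
(assuming 4k pages and one bit per page):

	1MB bitmap = 2^20 bytes = 2^23 bits -> covers 2^23 pages
	2^23 pages * 4KB/page  = 2^35 bytes = 32GB per scan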

Thanks!
Liang   

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-28  1:13       ` Li, Liang Z
@ 2016-07-28  1:45         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-28  1:45 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: Hansen, Dave, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

On Thu, Jul 28, 2016 at 01:13:35AM +0000, Li, Liang Z wrote:
> > Subject: Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate
> > process
> > 
> > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> > 
> > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.  How big
> > was the pfn buffer before?
> 
> Yes, it is if the max pfn covers more than 32GB.
> The pfn buffer used before was 256 * 4 = 1024 bytes; that's too small,
> and it's the main reason for the bad performance.
> Capping the kmalloc at 1MB is a balance between performance and
> flexibility: a page bitmap covering all of memory is no good for a
> system with a huge amount of memory, while a bitmap that is too small
> means we have to traverse the long free lists many times, which is bad
> for performance.
> 
> Thanks!
> Liang   

These are all your implementation decisions though.

If guest memory is so fragmented that you only have order-0 4k pages,
then allocating a huge 1M contiguous chunk is very problematic
in and of itself.

Most people rarely migrate and do not care how fast that happens.
Wasting a large chunk of memory (and it's zeroed for no good reason, so you
actually request host memory for it) for everyone to speed it up
when it does happen is not really an option.
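
One way to avoid that cost for guests that never migrate would be to
defer the allocation to the first host request, e.g. (a sketch only,
reusing the names from the quoted code; not what the patch does):

	/* Allocate the bitmap lazily instead of at probe time. */
	if (!vb->bmap_hdr) {
		vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
		if (!vb->bmap_hdr)
			return -ENOMEM;	/* fall back to the pfn array */
	}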

-- 
MST

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-27 21:36     ` Michael S. Tsirkin
@ 2016-07-28  3:06       ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  3:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

> > + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of page bitmap
> > + * to prevent a very large page bitmap, there are two reasons for this:
> > + * 1) to save memory.
> > + * 2) allocate a large bitmap may fail.
> > + *
> > + * The actual limit of pfn is determined by:
> > + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> > + *
> > + * If system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we will
> > +scan
> > + * the page list and send the PFNs with several times. To reduce the
> > + * overhead of scanning the page list. VIRTIO_BALLOON_PFNS_LIMIT
> > +should
> > + * be set with a value which can cover most cases.
> 
> So what if it covers 1/32 of the memory? We'll do 32 exits and not 1, still not a
> big deal for a big guest.
> 

The issue here is that the overhead of scanning the page list 32 times is too high.
Isn't limiting the page bitmap size to a fixed value better for a big guest?
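
For a sense of scale (with the 32GB limit below): a 1TB guest needs
1TB / 32GB = 32 passes, and each pass walks all of the zone free lists
just to pick out the pfns that fall in the current range.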

> > + */
> > +#define VIRTIO_BALLOON_PFNS_LIMIT ((32 * (1ULL << 30)) >>
> PAGE_SHIFT)
> > +/* 32GB */
> 
> I already said this with a smaller limit.
> 
> 	2<< 30  is 2G but that is not a useful comment.
> 	pls explain what is the reason for this selection.
> 
> Still applies here.
> 

I will add the comment for this.

> > -	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > +	if (virtio_has_feature(vb->vdev,
> VIRTIO_BALLOON_F_PAGE_BITMAP)) {
> > +		struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
> > +		unsigned long bmap_len;
> > +
> > +		/* cmd and req_id are not used here, set them to 0 */
> > +		hdr->cmd = cpu_to_virtio16(vb->vdev, 0);
> > +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> > +		hdr->reserved = cpu_to_virtio16(vb->vdev, 0);
> > +		hdr->req_id = cpu_to_virtio64(vb->vdev, 0);
> 
> no need to swap 0, just fill it in. in fact you allocated all 0s so no need to touch
> these fields at all.
> 

Will change in v3.

> > @@ -489,7 +612,7 @@ static int virtballoon_migratepage(struct
> > balloon_dev_info *vb_dev_info,  static int virtballoon_probe(struct
> > virtio_device *vdev)  {
> >  	struct virtio_balloon *vb;
> > -	int err;
> > +	int err, hdr_len;
> >
> >  	if (!vdev->config->get) {
> >  		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> @@
> > -508,6 +631,18 @@ static int virtballoon_probe(struct virtio_device *vdev)
> >  	spin_lock_init(&vb->stop_update_lock);
> >  	vb->stop_update = false;
> >  	vb->num_pages = 0;
> > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> 
> What are these 2 longs in aid of?
> 
The rounddown(vb->start_pfn, BITS_PER_LONG) and roundup(vb->end_pfn, BITS_PER_LONG)
may cause (vb->end_pfn - vb->start_pfn) > vb->pfn_limit, so we need extra space in the
bitmap for this case. Each rounding adds at most BITS_PER_LONG - 1 bits, so the overshoot
is under 2 * BITS_PER_LONG bits; 2 extra longs of bitmap are enough.

> > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> 
> So it can go up to 1MByte but adding header size etc you need a higher order
> allocation. This is a waste, there is no need to have a power of two allocation.
> Start from the other side. Say "I want to allocate 32KBytes for the bitmap".
> Subtract the header and you get bitmap size.
> Calculate the pfn limit from there.
> 

Indeed, will change. Thanks a lot!
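
Something like this, perhaps (a sketch of the reverse calculation;
BALLOON_BMAP_SIZE is an assumed name for the new constant):

	#define BALLOON_BMAP_SIZE	(32 * 1024)	/* 32KB allocation */

	vb->bmap_len = BALLOON_BMAP_SIZE - sizeof(struct balloon_bmap_hdr);
	vb->bmap_len = rounddown(vb->bmap_len, sizeof(unsigned long));
	/* One bit per page: the pfn limit falls out of the bitmap size. */
	vb->pfn_limit = min(vb->bmap_len * BITS_PER_BYTE, get_max_pfn());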

Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-28  3:06       ` Li, Liang Z
  0 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  3:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, kvm, Amit Shah, qemu-devel, linux-kernel, linux-mm,
	Vlastimil Babka, Paolo Bonzini, Andrew Morton, virtualization,
	Mel Gorman, dgilbert

> > + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of page bitmap
> > + * to prevent a very large page bitmap, there are two reasons for this:
> > + * 1) to save memory.
> > + * 2) allocate a large bitmap may fail.
> > + *
> > + * The actual limit of pfn is determined by:
> > + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> > + *
> > + * If system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we will
> > +scan
> > + * the page list and send the PFNs with several times. To reduce the
> > + * overhead of scanning the page list. VIRTIO_BALLOON_PFNS_LIMIT
> > +should
> > + * be set with a value which can cover most cases.
> 
> So what if it covers 1/32 of the memory? We'll do 32 exits and not 1, still not a
> big deal for a big guest.
> 

The issue here is the overhead is too high for scanning the page list for 32 times.
Limit the page bitmap size to a fixed value is better for a big guest?

> > + */
> > +#define VIRTIO_BALLOON_PFNS_LIMIT ((32 * (1ULL << 30)) >>
> PAGE_SHIFT)
> > +/* 32GB */
> 
> I already said this with a smaller limit.
> 
> 	2<< 30  is 2G but that is not a useful comment.
> 	pls explain what is the reason for this selection.
> 
> Still applies here.
> 

I will add the comment for this.

> > -	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > +	if (virtio_has_feature(vb->vdev,
> VIRTIO_BALLOON_F_PAGE_BITMAP)) {
> > +		struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
> > +		unsigned long bmap_len;
> > +
> > +		/* cmd and req_id are not used here, set them to 0 */
> > +		hdr->cmd = cpu_to_virtio16(vb->vdev, 0);
> > +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> > +		hdr->reserved = cpu_to_virtio16(vb->vdev, 0);
> > +		hdr->req_id = cpu_to_virtio64(vb->vdev, 0);
> 
> no need to swap 0, just fill it in. in fact you allocated all 0s so no need to touch
> these fields at all.
> 

Will change in v3.

> > @@ -489,7 +612,7 @@ static int virtballoon_migratepage(struct
> > balloon_dev_info *vb_dev_info,  static int virtballoon_probe(struct
> > virtio_device *vdev)  {
> >  	struct virtio_balloon *vb;
> > -	int err;
> > +	int err, hdr_len;
> >
> >  	if (!vdev->config->get) {
> >  		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> @@
> > -508,6 +631,18 @@ static int virtballoon_probe(struct virtio_device *vdev)
> >  	spin_lock_init(&vb->stop_update_lock);
> >  	vb->stop_update = false;
> >  	vb->num_pages = 0;
> > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> 
> What are these 2 longs in aid of?
> 
The rounddown(vb->start_pfn,  BITS_PER_LONG) and roundup(vb->end_pfn, BITS_PER_LONG) 
may cause (vb->end_pfn - vb->start_pfn) > vb->pfn_limit, so we need extra space to save the
bitmap for this case. 2 longs are enough.

> > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> 
> So it can go up to 1MByte but adding header size etc you need a higher order
> allocation. This is a waste, there is no need to have a power of two allocation.
> Start from the other side. Say "I want to allocate 32KBytes for the bitmap".
> Subtract the header and you get bitmap size.
> Calculate the pfn limit from there.
> 

Indeed, will change. Thanks a lot!
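
For reference, a minimal sketch of the size-first computation suggested above
(the 32KB budget and the reuse of the existing field names are assumptions,
not values from the patch):

	/* Pick the allocation size first, then derive everything else. */
	#define BALLOON_BMAP_ALLOC_LEN	(32 * 1024)	/* assumed 32KB budget */

	hdr_len = sizeof(struct balloon_bmap_hdr);
	vb->bmap_len = BALLOON_BMAP_ALLOC_LEN - hdr_len;
	/* keep two longs of slack for the rounddown/roundup case above */
	vb->pfn_limit = (vb->bmap_len - 2 * sizeof(unsigned long)) * BITS_PER_BYTE;
	vb->bmap_hdr = kzalloc(BALLOON_BMAP_ALLOC_LEN, GFP_KERNEL);

This way the whole allocation stays within a single kmalloc bucket instead of
spilling into the next power-of-two order.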

Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-27 21:39       ` Michael S. Tsirkin
@ 2016-07-28  3:30         ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  3:30 UTC (permalink / raw)
  To: Michael S. Tsirkin, Hansen, Dave
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

> Subject: Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate
> process
> 
> On Wed, Jul 27, 2016 at 09:03:21AM -0700, Dave Hansen wrote:
> > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> >
> > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > How big was the pfn buffer before?
> 
> 
> Yes I would limit this to 1G memory in a go, will result in a 32KByte bitmap.
> 
> --
> MST

Limiting it to 1GB is bad for performance; I sent you the test results several weeks ago.

Pasted below:
------------------------------------------------------------------------------------------------------------------------
Regarding the page bitmap size, I have tested the performance of inflating the balloon
to 15GB on a VM with 16GB of RAM.

===============================
32KB (covers 1GB of RAM)

Time spent on inflating: 2031ms
---------------------------------------------
64KB (covers 2GB of RAM)

Time spent on inflating: 1507ms
--------------------------------------------
512KB (covers 16GB of RAM)

Time spent on inflating: 1237ms
================================

If possible, a bigger bitmap is better for performance.
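
(For scale, with 4KB pages: bitmap bytes = RAM / 4096 / 8, so 32KB covers
1GB, 64KB covers 2GB and 512KB covers 16GB, matching the figures above.)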

Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-27 22:07     ` Michael S. Tsirkin
@ 2016-07-28  3:48       ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  3:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

> > +/*
> > + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of page bitmap
> > + * to prevent a very large page bitmap, there are two reasons for this:
> > + * 1) to save memory.
> > + * 2) allocate a large bitmap may fail.
> > + *
> > + * The actual limit of pfn is determined by:
> > + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> > + *
> > + * If system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we will
> > +scan
> > + * the page list and send the PFNs with several times. To reduce the
> > + * overhead of scanning the page list. VIRTIO_BALLOON_PFNS_LIMIT
> > +should
> > + * be set with a value which can cover most cases.
> > + */
> > +#define VIRTIO_BALLOON_PFNS_LIMIT ((32 * (1ULL << 30)) >>
> PAGE_SHIFT)
> > +/* 32GB */
> > +
> >  static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
> > module_param(oom_pages, int, S_IRUSR | S_IWUSR);
> > MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
> >
> > +extern unsigned long get_max_pfn(void);
> > +
> 
> Please just include the correct header. No need for this hackery.
> 

Will change. Thanks!
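
For reference, a sketch of the fix (it assumes the prototype from patch 3/7
lands in <linux/mm.h>; adjust to wherever get_max_pfn() is actually declared):

	-extern unsigned long get_max_pfn(void);
	+#include <linux/mm.h>	/* declares get_max_pfn() */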

Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH v2 repost 6/7] mm: add the related functions to get free page info
  2016-07-27 22:16         ` Dave Hansen
@ 2016-07-28  4:36           ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  4:36 UTC (permalink / raw)
  To: Hansen, Dave, Michael S. Tsirkin
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

> On 07/27/2016 03:05 PM, Michael S. Tsirkin wrote:
> > On Wed, Jul 27, 2016 at 09:40:56AM -0700, Dave Hansen wrote:
> >> On 07/26/2016 06:23 PM, Liang Li wrote:
> >>> +	for_each_migratetype_order(order, t) {
> >>> +		list_for_each(curr, &zone->free_area[order].free_list[t]) {
> >>> +			pfn = page_to_pfn(list_entry(curr, struct page, lru));
> >>> +			if (pfn >= start_pfn && pfn <= end_pfn) {
> >>> +				page_num = 1UL << order;
> >>> +				if (pfn + page_num > end_pfn)
> >>> +					page_num = end_pfn - pfn;
> >>> +				bitmap_set(bitmap, pfn - start_pfn,
> page_num);
> >>> +			}
> >>> +		}
> >>> +	}
> >>
> >> Nit:  The 'page_num' nomenclature really confused me here.  It is the
> >> number of bits being set in the bitmap.  Seems like calling it
> >> nr_pages or num_pages would be more appropriate.
> >>
> >> Isn't this bitmap out of date by the time it's send up to the
> >> hypervisor?  Is there something that makes the inaccuracy OK here?
> >
> > Yes. Calling these free pages is unfortunate. It's likely to confuse
> > people thinking they can just discard these pages.
> >
> > Hypervisor sends a request. We respond with this list of pages, and
> > the guarantee hypervisor needs is that these were free sometime
> > between request and response, so they are safe to free if they are
> > unmodified since the request. hypervisor can detect modifications so
> > it can detect modifications itself and does not need guest help.
> 
> Ahh, that makes sense.
> 
> So the hypervisor is trying to figure out: "Which pages do I move?".  It wants
> to know which pages the guest thinks have good data and need to move.
> But, the list of free pages is (likely) smaller than the list of pages with good
> data, so it asks for that instead.
> 
> A write to a page means that it has valuable data, regardless of whether it
> was in the free list or not.
> 
> The hypervisor only skips moving pages that were free *and* were never
> written to.  So we never lose data, even if this "get free page info"
> stuff is totally out of date.
> 
> The patch description and code comments are, um, a _bit_ light for this level
> of subtlety. :)

I will add a more detailed description of this in v3.
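
A rough sketch of what that comment could say (my wording, not from v3):

	/*
	 * Pages marked here were free at some point between the host's request
	 * and this response.  The host may only skip migrating a page if it
	 * was not written to after the request; host-side dirty tracking
	 * catches any later write, so no guest data is lost even if this
	 * snapshot is stale by the time it is consumed.
	 */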

Thanks!
Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [virtio-dev] Re: [PATCH v2 repost 6/7] mm: add the related functions to get free page info
  2016-07-27 22:13     ` Michael S. Tsirkin
@ 2016-07-28  5:30       ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  5:30 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7da61ad..3ad8b10
> > 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4523,6 +4523,52 @@ unsigned long get_max_pfn(void)  }
> > EXPORT_SYMBOL(get_max_pfn);
> >
> > +static void mark_free_pages_bitmap(struct zone *zone, unsigned long
> start_pfn,
> > +	unsigned long end_pfn, unsigned long *bitmap, unsigned long len) {
> > +	unsigned long pfn, flags, page_num;
> > +	unsigned int order, t;
> > +	struct list_head *curr;
> > +
> > +	if (zone_is_empty(zone))
> > +		return;
> > +	end_pfn = min(start_pfn + len, end_pfn);
> > +	spin_lock_irqsave(&zone->lock, flags);
> > +
> > +	for_each_migratetype_order(order, t) {
> 
> Why not do each order separately? This way you can use a single bit to pass a
> huge page to host.
> 

I thought about that before, and decided against it because of the complexity
and the small benefit. Using a separate page bitmap for each order won't help
reduce the data traffic, except by allowing pages of small order to be ignored.

> Not a requirement but hey.
> 
> Alternatively (and maybe that is a better idea0 if you wanted to, you could
> just skip lone 4K pages.
> It's not clear that they are worth bothering with.
> Add a flag to start with some reasonably large order and go from there.
> 
One of the main goals of this patch is to reduce network traffic as much as
possible, so it looks strange to skip the lone 4K pages. Would skipping those
pages really make live migration faster? I think it depends on the amount of
lone 4K pages.

On the other hand, it's faster to send one bit through virtio than to send 4K
bytes through even a 10Gbps network, isn't it?
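
(Rough arithmetic, ignoring per-packet overhead: a 4KB page is 32768 bits and
takes about 3.3us on a 10Gbps wire, while one bitmap bit amortizes to a
fraction of a nanosecond, so the bit is cheaper by several orders of
magnitude.)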

Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-28  1:45         ` Michael S. Tsirkin
@ 2016-07-28  6:36           ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  6:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Hansen, Dave, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

> > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > How big was the pfn buffer before?
> >
> > Yes, it is if the max pfn is more than 32GB.
> > The size of the pfn buffer use before is 256*4 = 1024 Bytes, it's too
> > small, and it's the main reason for bad performance.
> > Use the max 1MB kmalloc is a balance between performance and
> > flexibility, a large page bitmap covers the range of all the memory is
> > no good for a system with huge amount of memory. If the bitmap is too
> > small, it means we have to traverse a long list for many times, and it's bad
> for performance.
> >
> > Thanks!
> > Liang
> 
> There are all your implementation decisions though.
> 
> If guest memory is so fragmented that you only have order 0 4k pages, then
> allocating a huge 1M contigious chunk is very problematic in and of itself.
> 

The memory is allocated at probe time, so this will not happen if the driver
is loaded when the guest boots.

> Most people rarely migrate and do not care how fast that happens.
> Wasting a large chunk of memory (and it's zeroed for no good reason, so you
> actually request host memory for it) for everyone to speed it up when it
> does happen is not really an option.
> 
If people don't plan to do any inflating/deflating, they should not enable
virtio-balloon in the first place; once they decide to use it, the driver
should provide the best performance it can.

1MB is a very small portion of a VM with more than 32GB of memory, and it's
the *worst case*; for a VM with less than 32GB, the allocation scales with
the VM's memory size and will be less than 1MB.

If 1MB is too big, how about 512K, or 256K?  32K seems too small.
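
(For scale, with 4KB pages the bitmap needs RAM / 4096 / 8 bytes: 1MB covers
32GB, 512KB covers 16GB, 256KB covers 8GB and 32KB covers 1GB per pass.)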

Liang

> --
> MST
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH v2 repost 7/7] virtio-balloon: tell host vm's free page info
  2016-07-27 22:00     ` Michael S. Tsirkin
@ 2016-07-28  7:50       ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-28  7:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

> >  }
> >
> > +static void update_free_pages_stats(struct virtio_balloon *vb,
> 
> why _stats?

Will change.

> > +	max_pfn = get_max_pfn();
> > +	mutex_lock(&vb->balloon_lock);
> > +	while (pfn < max_pfn) {
> > +		memset(vb->page_bitmap, 0, vb->bmap_len);
> > +		ret = get_free_pages(pfn, pfn + vb->pfn_limit,
> > +			vb->page_bitmap, vb->bmap_len * BITS_PER_BYTE);
> > +		hdr->cmd = cpu_to_virtio16(vb->vdev, BALLOON_GET_FREE_PAGES);
> > +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> > +		hdr->req_id = cpu_to_virtio64(vb->vdev, req_id);
> > +		hdr->start_pfn = cpu_to_virtio64(vb->vdev, pfn);
> > +		bmap_len = vb->pfn_limit / BITS_PER_BYTE;
> > +		if (!ret) {
> > +			hdr->flag = cpu_to_virtio16(vb->vdev, BALLOON_FLAG_DONE);
> > +			if (pfn + vb->pfn_limit > max_pfn)
> > +				bmap_len = (max_pfn - pfn) / BITS_PER_BYTE;
> > +		} else
> > +			hdr->flag = cpu_to_virtio16(vb->vdev, BALLOON_FLAG_CONT);
> > +		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
> > +		sg_init_one(&sg_out, hdr,
> > +			 sizeof(struct balloon_bmap_hdr) + bmap_len);
> 
> Wait a second. This adds the same buffer multiple times in a loop.
> We will overwrite the buffer without waiting for hypervisor to process it.
> What did I miss?

I am not quite sure about this part. I thought the virtqueue_kick(vq) would prevent
the buffer from being overwritten, but I realize now that's wrong.
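
For v3 I plan to reuse the pattern the existing inflate/deflate path uses: add
the buffer, kick, then wait until the host has consumed it before reusing the
bitmap. A rough sketch (untested; it assumes the misc vq's callback wakes
vb->acked the same way the inflate/deflate callback does):

	/* inside the while (pfn < max_pfn) loop, after filling hdr + bitmap */
	unsigned int len;

	if (!virtqueue_add_outbuf(vq, &sg_out, 1, vb, GFP_KERNEL)) {
		virtqueue_kick(vq);
		/* don't touch vb->page_bitmap again until the host is done */
		wait_event(vb->acked, virtqueue_get_buf(vq, &len));
	}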

> > +
> > +		virtqueue_add_outbuf(vq, &sg_out, 1, vb, GFP_KERNEL);
> 
> this can fail. you want to maybe make sure vq has enough space before you
> use it or check error and wait.
> 
> > +		virtqueue_kick(vq);
> 
> why kick here within loop? wait until done. in fact kick outside lock is better
> for smp.

I will change this part in v3.

> 
> > +		pfn += vb->pfn_limit;
> > +	static const char * const names[] = { "inflate", "deflate", "stats",
> > +						 "misc" };
> >  	int err, nvqs;
> >
> >  	/*
> >  	 * We expect two virtqueues: inflate and deflate, and
> >  	 * optionally stat.
> >  	 */
> > -	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> > +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ))
> > +		nvqs = 4;
> 
> Does misc vq depend on stats vq feature then? if yes please validate that.

Yes. What do you mean by 'validate' that?

> 
> 
> > +	else
> > +		nvqs = virtio_has_feature(vb->vdev,
> > +					  VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> 
> Replace that ? with else too pls.

Will change.

Thanks!
Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 7/7] virtio-balloon: tell host vm's free page info
  2016-07-28  7:50       ` Li, Liang Z
@ 2016-07-28 21:37         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-28 21:37 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Thu, Jul 28, 2016 at 07:50:52AM +0000, Li, Liang Z wrote:
> > >  }
> > >
> > > +static void update_free_pages_stats(struct virtio_balloon *vb,
> > 
> > why _stats?
> 
> Will change.
> 
> > > +	max_pfn = get_max_pfn();
> > > +	mutex_lock(&vb->balloon_lock);
> > > +	while (pfn < max_pfn) {
> > > +		memset(vb->page_bitmap, 0, vb->bmap_len);
> > > +		ret = get_free_pages(pfn, pfn + vb->pfn_limit,
> > > +			vb->page_bitmap, vb->bmap_len * BITS_PER_BYTE);
> > > +		hdr->cmd = cpu_to_virtio16(vb->vdev, BALLOON_GET_FREE_PAGES);
> > > +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> > > +		hdr->req_id = cpu_to_virtio64(vb->vdev, req_id);
> > > +		hdr->start_pfn = cpu_to_virtio64(vb->vdev, pfn);
> > > +		bmap_len = vb->pfn_limit / BITS_PER_BYTE;
> > > +		if (!ret) {
> > > +			hdr->flag = cpu_to_virtio16(vb->vdev, BALLOON_FLAG_DONE);
> > > +			if (pfn + vb->pfn_limit > max_pfn)
> > > +				bmap_len = (max_pfn - pfn) / BITS_PER_BYTE;
> > > +		} else
> > > +			hdr->flag = cpu_to_virtio16(vb->vdev, BALLOON_FLAG_CONT);
> > > +		hdr->bmap_len = cpu_to_virtio64(vb->vdev, bmap_len);
> > > +		sg_init_one(&sg_out, hdr,
> > > +			 sizeof(struct balloon_bmap_hdr) + bmap_len);
> > 
> > Wait a second. This adds the same buffer multiple times in a loop.
> > We will overwrite the buffer without waiting for hypervisor to process it.
> > What did I miss?
> 
> I am not quite sure about this part. I thought the virtqueue_kick(vq) would prevent
> the buffer from being overwritten, but I realize now that's wrong.
> 
> > > +
> > > +		virtqueue_add_outbuf(vq, &sg_out, 1, vb, GFP_KERNEL);
> > 
> > this can fail. you want to maybe make sure vq has enough space before you
> > use it or check error and wait.
> > 
> > > +		virtqueue_kick(vq);
> > 
> > why kick here within loop? wait until done. in fact kick outside lock is better
> > for smp.
> 
> I will change this part in v3.
> 
> > 
> > > +		pfn += vb->pfn_limit;
> > > +	static const char * const names[] = { "inflate", "deflate", "stats",
> > > +						 "misc" };
> > >  	int err, nvqs;
> > >
> > >  	/*
> > >  	 * We expect two virtqueues: inflate and deflate, and
> > >  	 * optionally stat.
> > >  	 */
> > > -	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> > > +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_MISC_VQ))
> > > +		nvqs = 4;
> > 
> > Does misc vq depend on stats vq feature then? if yes please validate that.
> 
> Yes. What do you mean by 'validate' that?

Either handle misc vq without a stats vq, or
clear VIRTIO_BALLOON_F_MISC_VQ if stats vq is off.
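
The latter could look roughly like this (untested sketch; on kernels that have
the .validate driver callback it would go there, since feature bits must be
cleared before FEATURES_OK is set):

	/* hypothetical: only handle the misc vq together with the stats vq */
	static int virtballoon_validate(struct virtio_device *vdev)
	{
		if (!virtio_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ))
			__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_MISC_VQ);
		return 0;
	}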

> > 
> > 
> > > +	else
> > > +		nvqs = virtio_has_feature(vb->vdev,
> > > +					  VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
> > 
> > Replace that ? with else too pls.
> 
> Will change.
> 
> Thanks!
> Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-28  6:36           ` Li, Liang Z
@ 2016-07-28 21:51             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-28 21:51 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: Hansen, Dave, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

On Thu, Jul 28, 2016 at 06:36:18AM +0000, Li, Liang Z wrote:
> > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > How big was the pfn buffer before?
> > >
> > > Yes, it is if the max pfn is more than 32GB.
> > > The size of the pfn buffer used before was 256*4 = 1024 bytes; it's too
> > > small, and that's the main reason for the bad performance.
> > > Using the max 1MB kmalloc is a balance between performance and
> > > flexibility; a large page bitmap covering the range of all memory is
> > > no good for a system with a huge amount of memory. If the bitmap is too
> > > small, it means we have to traverse a long list many times, and that's
> > > bad for performance.
> > >
> > > Thanks!
> > > Liang
> > 
> > These are all your implementation decisions, though.
> > 
> > If guest memory is so fragmented that you only have order 0 4k pages, then
> > allocating a huge 1MB contiguous chunk is very problematic in and of itself.
> > 
> 
> The memory is allocated in the probe stage, so this will not happen if the driver
> is loaded while booting the guest.
> 
> > Most people rarely migrate and do not care how fast that happens.
> > Wasting a large chunk of memory (and it's zeroed for no good reason, so you
> > actually request host memory for it) for everyone to speed it up when it
> > does happen is not really an option.
> > 
> If people don't plan to do any inflating/deflating, they should not enable
> virtio-balloon in the first place. Once they decide to use it, the driver should
> provide the best performance it can.

The reason people inflate/deflate is so they can overcommit memory.
Do they need to overcommit very quickly? I don't see why.
So let's get what we can for free but I don't really believe
people would want to pay for it.

> 1MB is a very small portion for a VM with more than 32GB of memory, and it's the
> *worst case*; for a VM with less than 32GB of memory, the bitmap size depends on
> the VM's memory size and will be less than 1MB.

It's guest memory, so it might all be in swap and never touched;
your memset at probe time will fault it in and make the hypervisor
actually pay for it.

> If 1MB is too big, how about 512KB or 256KB?  32KB seems too small.
> 
> Liang

It's only small because it makes you rescan the free list.
So maybe you should do something else.
I looked at it a bit. Instead of scanning the free list, how about
scanning the actual page structures? If a page is unused, pass it to the host.
That solves the problem of rescanning multiple times, does it not?
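
Very roughly something like this (a sketch only; min_pfn/max_pfn/bitmap are
illustrative names, it ignores memory holes and races with the allocator, and
PageBuddy() marks only the first page of each free block, so a real version
would also need the block's order):

	/* hypothetical: walk the page structures instead of the free lists */
	unsigned long pfn;

	for (pfn = min_pfn; pfn < max_pfn; pfn++) {
		struct page *page;

		if (!pfn_valid(pfn))
			continue;
		page = pfn_to_page(pfn);
		if (PageBuddy(page))	/* head page of a free buddy block */
			set_bit(pfn - min_pfn, bitmap);
	}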


Another idea: allocate a small bitmap at probe time (e.g. for deflate) and
allocate a bunch more on each request. Use something like GFP_ATOMIC and
scatter/gather; if that allocation fails, fall back to the smaller bitmap.
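
That second idea might look like this (sketch only; LARGE_BMAP_LEN,
SMALL_BMAP_LEN and vb->small_bmap are illustrative names, not from the patch):

	/* hypothetical: try a big one-off bitmap, fall back to the small one */
	unsigned long *bmap = kmalloc(LARGE_BMAP_LEN, GFP_ATOMIC);
	unsigned long bmap_len = LARGE_BMAP_LEN;

	if (!bmap) {
		bmap = vb->small_bmap;	/* allocated once at probe time */
		bmap_len = SMALL_BMAP_LEN;
	}
	/* ... fill the bitmap and send it, in more rounds if it is small ... */
	if (bmap != vb->small_bmap)
		kfree(bmap);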



> > --
> > MST

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-28  3:30         ` Li, Liang Z
@ 2016-07-28 22:15           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-28 22:15 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: Hansen, Dave, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

On Thu, Jul 28, 2016 at 03:30:09AM +0000, Li, Liang Z wrote:
> > Subject: Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate
> > process
> > 
> > On Wed, Jul 27, 2016 at 09:03:21AM -0700, Dave Hansen wrote:
> > > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > > > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > > > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> > >
> > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > How big was the pfn buffer before?
> > 
> > 
> > Yes, I would limit this to 1GB of memory in one go, which will result in a 32KB bitmap.
> > 
> > --
> > MST
> 
> Limiting it to 1GB is bad for performance; I sent you the test results several weeks ago.
> 
> Pasting it below:
> ------------------------------------------------------------------------------------------------------------------------
> About the size of the page bitmap: I have tested the performance of filling the
> balloon to 15GB with a 16GB RAM VM.
> 
> ===============================
> 32K Byte (cover 1GB of RAM)
> 
> Time spent on inflating: 2031ms
> ---------------------------------------------
> 64K Byte (cover 2GB of RAM)
> 
> Time spent on inflating: 1507ms
> --------------------------------------------
> 512K Byte (cover 16GB of RAM)
> 
> Time spent on inflating: 1237ms
> ================================
> 
> If possible, a big bitmap is better for performance.
> 
> Liang

Earlier you said:
a. allocating pages (6.5%)
b. sending PFNs to host (68.3%)
c. address translation (6.1%)
d. madvise (19%)

Here, sending PFNs to the host with a 512KB map
should be almost free.

So is something else taking up the time?
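
(Back-of-the-envelope: if that profile was taken on the old PFN-based path and
item b really becomes almost free, the total speedup is capped at
1 / (1 - 0.683), roughly 3.2x; anything short of that means the time has moved
into a, c, d or something new.)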


-- 
MST

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-28  3:30         ` Li, Liang Z
                           ` (3 preceding siblings ...)
  (?)
@ 2016-07-28 22:15         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-28 22:15 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: virtio-dev, kvm, Hansen, Dave, qemu-devel, Amit Shah,
	linux-kernel, virtualization, linux-mm, Vlastimil Babka,
	Paolo Bonzini, Andrew Morton, Mel Gorman, dgilbert

On Thu, Jul 28, 2016 at 03:30:09AM +0000, Li, Liang Z wrote:
> > Subject: Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate
> > process
> > 
> > On Wed, Jul 27, 2016 at 09:03:21AM -0700, Dave Hansen wrote:
> > > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > > > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > > > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> > >
> > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > How big was the pfn buffer before?
> > 
> > 
> > Yes I would limit this to 1G memory in a go, will result in a 32KByte bitmap.
> > 
> > --
> > MST
> 
> Limit to 1G is bad for the performance, I sent you the test result several weeks ago.
> 
> Paste it bellow:
> ------------------------------------------------------------------------------------------------------------------------
> About the size of page bitmap, I have test the performance of filling the balloon to 15GB with a
>  16GB RAM VM.
> 
> ===============================
> 32K Byte (cover 1GB of RAM)
> 
> Time spends on inflating: 2031ms
> ---------------------------------------------
> 64K Byte (cover 2GB of RAM)
> 
> Time spends on inflating: 1507ms
> --------------------------------------------
> 512K Byte (cover 16GB of RAM)
> 
> Time spends on inflating: 1237ms
> ================================
> 
> If possible, a big bitmap is better for performance.
> 
> Liang

Earlier you said:
a. allocating pages (6.5%)
b. sending PFNs to host (68.3%)
c. address translation (6.1%)
d. madvise (19%)

Here sending PFNs to host with 512K Byte map
should be almost free.

So is something else taking up the time?


-- 
MST

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-28 22:17         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 171+ messages in thread
From: Michael S. Tsirkin @ 2016-07-28 22:17 UTC (permalink / raw)
  To: Li, Liang Z
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On Thu, Jul 28, 2016 at 03:06:37AM +0000, Li, Liang Z wrote:
> > > + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of the page
> > > + * bitmap, to prevent a very large page bitmap; there are two reasons
> > > + * for this:
> > > + * 1) to save memory.
> > > + * 2) allocating a large bitmap may fail.
> > > + *
> > > + * The actual limit of pfn is determined by:
> > > + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> > > + *
> > > + * If the system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we will
> > > + * scan the page list and send the PFNs several times. To reduce the
> > > + * overhead of scanning the page list, VIRTIO_BALLOON_PFNS_LIMIT should
> > > + * be set to a value which can cover most cases.
> > 
> > So what if it covers 1/32 of the memory? We'll do 32 exits and not 1, still not a
> > big deal for a big guest.
> > 
> 
> The issue here is that the overhead of scanning the page list 32 times is too high.
> Isn't limiting the page bitmap size to a fixed value better for a big guest?
> 

I'd say avoid scanning free lists completely. Scan pages themselves and
check the refcount to see whether they are free.
This way each page needs to be tested once.

And skip the whole optimization if less than e.g. 10% is free.
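
Roughly like this, say (only a sketch: the function name is made up and
locking is ignored):

	#include <linux/mm.h>
	#include <linux/bitops.h>

	/* Set a bit for every page whose refcount says it is free. */
	static unsigned long mark_free_pages(unsigned long *bitmap,
					     unsigned long max_pfn)
	{
		unsigned long pfn, free = 0;

		for (pfn = 0; pfn < max_pfn; pfn++) {
			struct page *page;

			if (!pfn_valid(pfn))
				continue;
			page = pfn_to_page(pfn);
			/*
			 * A zero refcount means the page sits in the
			 * allocator's free lists.  The test is racy, but
			 * a stale hint is harmless for migration.
			 */
			if (page_count(page) == 0) {
				set_bit(pfn, bitmap);
				free++;
			}
		}
		return free;
	}

A real version would of course flush the bitmap to the host whenever the
current window fills up.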

> > > + */
> > > +#define VIRTIO_BALLOON_PFNS_LIMIT ((32 * (1ULL << 30)) >>
> > PAGE_SHIFT)
> > > +/* 32GB */
> > 
> > I already said this with a smaller limit.
> > 
> > 	2<< 30  is 2G but that is not a useful comment.
> > 	pls explain what is the reason for this selection.
> > 
> > Still applies here.
> > 
> 
> I will add the comment for this.
> 
> > > -	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
> > > +	if (virtio_has_feature(vb->vdev,
> > VIRTIO_BALLOON_F_PAGE_BITMAP)) {
> > > +		struct balloon_bmap_hdr *hdr = vb->bmap_hdr;
> > > +		unsigned long bmap_len;
> > > +
> > > +		/* cmd and req_id are not used here, set them to 0 */
> > > +		hdr->cmd = cpu_to_virtio16(vb->vdev, 0);
> > > +		hdr->page_shift = cpu_to_virtio16(vb->vdev, PAGE_SHIFT);
> > > +		hdr->reserved = cpu_to_virtio16(vb->vdev, 0);
> > > +		hdr->req_id = cpu_to_virtio64(vb->vdev, 0);
> > 
> > No need to byte-swap 0, just fill it in. In fact you allocated all zeros, so there is
> > no need to touch these fields at all.
> > 
> 
> Will change in v3.
> 
> > > @@ -489,7 +612,7 @@ static int virtballoon_migratepage(struct
> > > balloon_dev_info *vb_dev_info,  static int virtballoon_probe(struct
> > > virtio_device *vdev)  {
> > >  	struct virtio_balloon *vb;
> > > -	int err;
> > > +	int err, hdr_len;
> > >
> > >  	if (!vdev->config->get) {
> > >  		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > @@
> > > -508,6 +631,18 @@ static int virtballoon_probe(struct virtio_device *vdev)
> > >  	spin_lock_init(&vb->stop_update_lock);
> > >  	vb->stop_update = false;
> > >  	vb->num_pages = 0;
> > > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > 
> > What are these 2 longs in aid of?
> > 
> The rounddown(vb->start_pfn, BITS_PER_LONG) and roundup(vb->end_pfn, BITS_PER_LONG)
> may cause (vb->end_pfn - vb->start_pfn) > vb->pfn_limit: each rounding can extend the
> range by up to BITS_PER_LONG - 1 pfns, so we need extra space to save the bitmap for
> this case, and 2 longs are enough.
> 
> > > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len, GFP_KERNEL);
> > 
> > So it can go up to 1MByte, but adding the header size etc. you need a higher-order
> > allocation. This is a waste; there is no need for a power-of-two allocation.
> > Start from the other side. Say "I want to allocate 32KBytes for the bitmap".
> > Subtract the header and you get the bitmap size.
> > Calculate the pfn limit from there.
> > 
> 
> Indeed, will change. Thanks a lot!
> 
> Liang
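
To be concrete, the sizing could then look like this (a sketch only;
BALLOON_BMAP_BYTES is a made-up name, the vb-> fields are the ones from
your patch):

	#define BALLOON_BMAP_BYTES (32 * 1024)

	vb->bmap_len = BALLOON_BMAP_BYTES - sizeof(struct balloon_bmap_hdr);
	vb->pfn_limit = min_t(unsigned long,
			      vb->bmap_len * BITS_PER_BYTE, get_max_pfn());
	vb->bmap_hdr = kmalloc(BALLOON_BMAP_BYTES, GFP_KERNEL);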

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-29  0:38           ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-29  0:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

> On Thu, Jul 28, 2016 at 03:06:37AM +0000, Li, Liang Z wrote:
> > > > + * VIRTIO_BALLOON_PFNS_LIMIT is used to limit the size of the page
> > > > + * bitmap, to prevent a very large page bitmap; there are two reasons
> > > > + * for this:
> > > > + * 1) to save memory.
> > > > + * 2) allocating a large bitmap may fail.
> > > > + *
> > > > + * The actual limit of pfn is determined by:
> > > > + * pfn_limit = min(max_pfn, VIRTIO_BALLOON_PFNS_LIMIT);
> > > > + *
> > > > + * If the system has more pages than VIRTIO_BALLOON_PFNS_LIMIT, we will
> > > > + * scan the page list and send the PFNs several times. To reduce the
> > > > + * overhead of scanning the page list, VIRTIO_BALLOON_PFNS_LIMIT should
> > > > + * be set to a value which can cover most cases.
> > >
> > > So what if it covers 1/32 of the memory? We'll do 32 exits and not
> > > 1, still not a big deal for a big guest.
> > >
> >
> > The issue here is that the overhead of scanning the page list 32 times is
> > too high.
> > Isn't limiting the page bitmap size to a fixed value better for a big guest?
> >
> 
> I'd say avoid scanning free lists completely. Scan pages themselves and check
> the refcount to see whether they are free.
> This way each page needs to be tested once.
> 
> And skip the whole optimization if less than e.g. 10% is free.

That's better than rescanning the free list. Will change it in the next version.

Thanks!

Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-29  0:46               ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-29  0:46 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Hansen, Dave, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

> On Thu, Jul 28, 2016 at 06:36:18AM +0000, Li, Liang Z wrote:
> > > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > > How big was the pfn buffer before?
> > > >
> > > > Yes, it is, if the max pfn corresponds to more than 32GB.
> > > > The size of the pfn buffer used before was 256*4 = 1024 bytes; it's
> > > > too small, and that's the main reason for the bad performance.
> > > > Using at most a 1MB kmalloc is a balance between performance and
> > > > flexibility: a page bitmap that covers the range of all the memory
> > > > is no good for a system with a huge amount of memory, while if the
> > > > bitmap is too small we have to traverse a long list many times,
> > > > which is bad for performance.
> > > >
> > > > Thanks!
> > > > Liang
> > >
> > > There are all your implementation decisions though.
> > >
> > > If guest memory is so fragmented that you only have order-0 4k
> > > pages, then allocating a huge 1M contiguous chunk is very problematic
> > > in and of itself.
> > >
> >
> > The memory is allocated in the probe stage. This will not happen if
> > the driver is loaded when booting the guest.
> >
> > > Most people rarely migrate and do not care how fast that happens.
> > > Wasting a large chunk of memory (and it's zeroed for no good reason,
> > > so you actually request host memory for it) for everyone to speed it
> > > up when it does happen is not really an option.
> > >
> > If people don't plan to do inflating/deflating, they should not enable
> > the virtio-balloon in the first place; once they decide to use it, the
> > driver should provide the best performance it can.
> 
> The reason people inflate/deflate is so they can overcommit memory.
> Do they need to overcommit very quickly? I don't see why.
> So let's get what we can for free but I don't really believe people would want
> to pay for it.
> 
> > 1MB is a very small portion for a VM with more than 32GB of memory, and
> > it's the *worst case*; for a VM with less than 32GB of memory, the amount
> > of RAM used depends on the VM's memory size and will be less than 1MB.
> 
> It's guest memory, so it might all be in swap and never touched; your memset
> at probe time will fault it in and make the hypervisor actually pay for it.
> 
> > If 1MB is too big, how about 512K, or 256K?  32K seems too small.
> >
> > Liang
> 
> It's only small because it makes you rescan the free list.
> So maybe you should do something else.
> I looked at it a bit. Instead of scanning the free list, how about scanning the actual
> page structures? If a page is unused, pass it to the host.
> That solves the problem of rescanning multiple times, does it not?
> 

Yes, agree.
> 
> Another idea: allocate a small bitmap at probe time (e.g. for deflate), and allocate
> a bunch more on each request. Use something like GFP_ATOMIC and a
> scatter/gather; if that fails, use the smaller bitmap.
> 

So, the aim of v3 is to use a smaller bitmap without too heavy a performance penalty.
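
Something like this, perhaps (just a sketch; the chunk size, count and
names are made up):

	#include <linux/scatterlist.h>
	#include <linux/slab.h>

	#define BMAP_CHUNK_BYTES	(32 * 1024)
	#define BMAP_MAX_CHUNKS		32

	/* Returns how many bitmap chunks we managed to allocate. */
	static int alloc_bmap_chunks(struct scatterlist *sg)
	{
		int i, chunks = 0;

		sg_init_table(sg, BMAP_MAX_CHUNKS);
		for (i = 0; i < BMAP_MAX_CHUNKS; i++) {
			/* GFP_ATOMIC: may be called while inflating */
			void *chunk = kzalloc(BMAP_CHUNK_BYTES, GFP_ATOMIC);

			if (!chunk)
				break;	/* fall back to what we got */
			sg_set_buf(&sg[chunks++], chunk, BMAP_CHUNK_BYTES);
		}
		return chunks;
	}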
Thanks a lot!

Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-29  1:08             ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-29  1:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Hansen, Dave, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

> > > On Wed, Jul 27, 2016 at 09:03:21AM -0700, Dave Hansen wrote:
> > > > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > > > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > > > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > > > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > > > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > > > > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > > > > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len,
> GFP_KERNEL);
> > > >
> > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > How big was the pfn buffer before?
> > >
> > >
> > > Yes. I would limit this to 1GB of memory in a go, which will result in a 32KByte bitmap.
> > >
> > > --
> > > MST
> >
> > Limiting to 1GB is bad for performance; I sent you the test results several
> > weeks ago.
> >
> > Pasted below:
> > ----------------------------------------------------------------------
> > --------------------------------------------------
> > About the size of the page bitmap: I have tested the performance of filling
> > the balloon to 15GB with a 16GB RAM VM.
> >
> > ===============================
> > 32KByte (covers 1GB of RAM)
> >
> > Time spent on inflating: 2031ms
> > ---------------------------------------------
> > 64KByte (covers 2GB of RAM)
> >
> > Time spent on inflating: 1507ms
> > --------------------------------------------
> > 512KByte (covers 16GB of RAM)
> >
> > Time spent on inflating: 1237ms
> > ================================
> >
> > If possible, a big bitmap is better for performance.
> >
> > Liang
> 
> Earlier you said:
> a. allocating pages (6.5%)
> b. sending PFNs to host (68.3%)
> c. address translation (6.1%)
> d. madvise (19%)
> 
> Here, sending the PFNs to the host with a 512KByte map should be almost free.
> 
> So is something else taking up the time?
> 
I just want to show you the benefits of using a big bitmap. :)
I did not measure the time spent on each stage after the optimization (I will do it later),
but I have tried allocating the pages in big chunks and found it makes things faster.
Without big-chunk allocation the performance improvement is about 85%, and with
big-chunk allocation the improvement is about 94%.
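
The big-chunk path is roughly this (only a sketch; the order value and
function name are my choices, not necessarily what v3 will use):

	#include <linux/gfp.h>

	#define BALLOON_ALLOC_ORDER	9	/* 2MB chunks with 4KB pages */

	static struct page *balloon_alloc_chunk(void)
	{
		gfp_t gfp = GFP_HIGHUSER | __GFP_NOMEMALLOC |
			    __GFP_NORETRY | __GFP_NOWARN;
		struct page *page;

		/* Try a big contiguous chunk first... */
		page = alloc_pages(gfp, BALLOON_ALLOC_ORDER);
		if (!page)
			/* ...and fall back to a single page. */
			page = alloc_pages(gfp, 0);
		return page;
	}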

Liang

> 
> --
> MST

^ permalink raw reply	[flat|nested] 171+ messages in thread

* RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-29  1:08             ` Li, Liang Z
  0 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-29  1:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Hansen, Dave, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

> > > On Wed, Jul 27, 2016 at 09:03:21AM -0700, Dave Hansen wrote:
> > > > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > > > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > > > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > > > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > > > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > > > > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > > > > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len,
> GFP_KERNEL);
> > > >
> > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > How big was the pfn buffer before?
> > >
> > >
> > > Yes I would limit this to 1G memory in a go, will result in a 32KByte bitmap.
> > >
> > > --
> > > MST
> >
> > Limit to 1G is bad for the performance, I sent you the test result several
> weeks ago.
> >
> > Paste it bellow:
> > ----------------------------------------------------------------------
> > --------------------------------------------------
> > About the size of page bitmap, I have test the performance of filling
> > the balloon to 15GB with a  16GB RAM VM.
> >
> > ===============================
> > 32K Byte (cover 1GB of RAM)
> >
> > Time spends on inflating: 2031ms
> > ---------------------------------------------
> > 64K Byte (cover 2GB of RAM)
> >
> > Time spends on inflating: 1507ms
> > --------------------------------------------
> > 512K Byte (cover 16GB of RAM)
> >
> > Time spends on inflating: 1237ms
> > ================================
> >
> > If possible, a big bitmap is better for performance.
> >
> > Liang
> 
> Earlier you said:
> a. allocating pages (6.5%)
> b. sending PFNs to host (68.3%)
> c. address translation (6.1%)
> d. madvise (19%)
> 
> Here sending PFNs to host with 512K Byte map should be almost free.
> 
> So is something else taking up the time?
> 
I just want to show you the benefits of using a big bitmap. :)
I did not measure the time spend on each stage after optimization(I will do it later),
but I have tried to allocate the page with big chunk and found it can make things faster.
Without allocating big chunk page, the performance improvement is about 85%, and with
 allocating big  chunk page, the improvement is about 94%.

Liang

> 
> --
> MST

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
@ 2016-07-29  1:08             ` Li, Liang Z
  0 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-07-29  1:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Hansen, Dave, linux-kernel, virtualization, linux-mm, virtio-dev,
	kvm, qemu-devel, dgilbert, quintela, Andrew Morton,
	Vlastimil Babka, Mel Gorman, Paolo Bonzini, Cornelia Huck,
	Amit Shah

> > > On Wed, Jul 27, 2016 at 09:03:21AM -0700, Dave Hansen wrote:
> > > > On 07/26/2016 06:23 PM, Liang Li wrote:
> > > > > +	vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> > > > > +	vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> > > > > +	vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> > > > > +		 BITS_PER_BYTE + 2 * sizeof(unsigned long);
> > > > > +	hdr_len = sizeof(struct balloon_bmap_hdr);
> > > > > +	vb->bmap_hdr = kzalloc(hdr_len + vb->bmap_len,
> GFP_KERNEL);
> > > >
> > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > How big was the pfn buffer before?
> > >
> > >
> > > Yes I would limit this to 1G memory in a go, will result in a 32KByte bitmap.
> > >
> > > --
> > > MST
> >
> > Limit to 1G is bad for the performance, I sent you the test result several
> weeks ago.
> >
> > Paste it bellow:
> > ----------------------------------------------------------------------
> > --------------------------------------------------
> > About the size of page bitmap, I have test the performance of filling
> > the balloon to 15GB with a  16GB RAM VM.
> >
> > ===============================
> > 32K Byte (cover 1GB of RAM)
> >
> > Time spends on inflating: 2031ms
> > ---------------------------------------------
> > 64K Byte (cover 2GB of RAM)
> >
> > Time spends on inflating: 1507ms
> > --------------------------------------------
> > 512K Byte (cover 16GB of RAM)
> >
> > Time spends on inflating: 1237ms
> > ================================
> >
> > If possible, a big bitmap is better for performance.
> >
> > Liang
> 
> Earlier you said:
> a. allocating pages (6.5%)
> b. sending PFNs to host (68.3%)
> c. address translation (6.1%)
> d. madvise (19%)
> 
> Here sending PFNs to host with 512K Byte map should be almost free.
> 
> So is something else taking up the time?
> 
I just want to show you the benefits of using a big bitmap. :)
I did not measure the time spend on each stage after optimization(I will do it later),
but I have tried to allocate the page with big chunk and found it can make things faster.
Without allocating big chunk page, the performance improvement is about 85%, and with
 allocating big  chunk page, the improvement is about 94%.

Liang

> 
> --
> MST

^ permalink raw reply	[flat|nested] 171+ messages in thread

* Re: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-28 21:51             ` Michael S. Tsirkin
@ 2016-07-29 19:48               ` Dave Hansen
  -1 siblings, 0 replies; 171+ messages in thread
From: Dave Hansen @ 2016-07-29 19:48 UTC (permalink / raw)
  To: Michael S. Tsirkin, Li, Liang Z
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

On 07/28/2016 02:51 PM, Michael S. Tsirkin wrote:
> > If 1MB is too big, how about 512K, or 256K?  32K seems too small.
> It's only small because it makes you rescan the free list.
> So maybe you should do something else.
> I looked at it a bit. Instead of scanning the free list, how about
> scanning actual page structures? If page is unused, pass it to host.
> Solves the problem of rescanning multiple times, does it not?

FWIW, I think the new data structure needs some work.

Before, we had a potentially very long list of 4k areas.  Now, we've
just got a very large bitmap.  The bitmap might not even be very dense
if we are ballooning relatively few things.

Can I suggest an alternate scheme?  I think you actually need a hybrid
scheme that has bitmaps but also allows more flexibility in the pfn
ranges.  The payload could be a number of records each containing 3 things:

	pfn, page order, length of bitmap (maybe in powers of 2)

Each record is followed by the bitmap.  Or, if the bitmap length is 0,
immediately followed by another record.  A bitmap length of 0 implies a
bitmap with the least significant bit set.  Page order specifies how
many pages each bit represents.

This scheme could easily encode the new data structure you are proposing
by just setting pfn=0, order=0, and a very long bitmap length.  But, it
could handle sparse bitmaps much better *and* represent large pages much
more efficiently.

There's plenty of space to fit a whole record in 64 bits.

^ permalink raw reply	[flat|nested] 171+ messages in thread
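
For concreteness, a minimal sketch of such a record packed into 64 bits;
the 52/6/6 field split and the power-of-two length encoding are
illustrative assumptions, not part of the proposal above.

#include <linux/types.h>

struct balloon_page_record {
	__u64 pfn      : 52;	/* starting page frame number */
	__u64 order    : 6;	/* each bitmap bit covers 1 << order pages */
	__u64 bmap_len : 6;	/* log2 of the bitmap length in bits;
				 * 0 means no bitmap, lsb implied set */
};

/* The flat bitmap of the current patch becomes one record ... */
struct balloon_page_record flat = { .pfn = 0, .order = 0, .bmap_len = 22 };
					/* 2^22 bits cover 16 GiB in 4 KiB pages */

/* ... while a free 2 MiB huge page needs no bitmap payload at all. */
struct balloon_page_record huge = { .pfn = 0x40000, .order = 9, .bmap_len = 0 };

A 52-bit pfn covers far more memory than any realistic guest, so this
split leaves the order and length fields room to spare.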

* RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process
  2016-07-29 19:48               ` Dave Hansen
@ 2016-08-02  0:28                 ` Li, Liang Z
  -1 siblings, 0 replies; 171+ messages in thread
From: Li, Liang Z @ 2016-08-02  0:28 UTC (permalink / raw)
  To: Hansen, Dave, Michael S. Tsirkin
  Cc: linux-kernel, virtualization, linux-mm, virtio-dev, kvm,
	qemu-devel, dgilbert, quintela, Andrew Morton, Vlastimil Babka,
	Mel Gorman, Paolo Bonzini, Cornelia Huck, Amit Shah

> > It's only small because it makes you rescan the free list.
> > So maybe you should do something else.
> > I looked at it a bit. Instead of scanning the free list, how about
> > scanning actual page structures? If page is unused, pass it to host.
> > Solves the problem of rescanning multiple times, does it not?
> 
> FWIW, I think the new data structure needs some work.
> 
> Before, we had a potentially very long list of 4k areas.  Now, we've just got a
> very large bitmap.  The bitmap might not even be very dense if we are
> ballooning relatively few things.
> 
> Can I suggest an alternate scheme?  I think you actually need a hybrid
> scheme that has bitmaps but also allows more flexibility in the pfn ranges.
> The payload could be a number of records each containing 3 things:
> 
> 	pfn, page order, length of bitmap (maybe in powers of 2)
> 
> Each record is followed by the bitmap.  Or, if the bitmap length is 0,
> immediately followed by another record.  A bitmap length of 0 implies a
> bitmap with the least significant bit set.  Page order specifies how many
> pages each bit represents.
> 
> This scheme could easily encode the new data structure you are proposing
> by just setting pfn=0, order=0, and a very long bitmap length.  But, it could
> handle sparse bitmaps much better *and* represent large pages much more
> efficiently.
> 
> There's plenty of space to fit a whole record in 64 bits.

I like your idea; it's more flexible, and it will be very useful if we want to optimize the
page allocating stage further. I believe memory fragmentation will not be very
serious, so the performance won't be too bad even in the worst case.

Thanks!
Liang

^ permalink raw reply	[flat|nested] 171+ messages in thread
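
A minimal sketch of what optimizing the page allocating stage could look
like with that record format: grab order-9 (2 MiB) chunks and emit one
bitmap-free record per chunk. The helper name and the 52/6/6 layout are
assumptions for illustration, not code from the patchset.

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/types.h>

struct balloon_page_record {	/* assumed 52/6/6 layout, as sketched above */
	__u64 pfn      : 52;
	__u64 order    : 6;
	__u64 bmap_len : 6;
};

/* Hypothetical helper: fill @recs with up to @max order-9 chunks. */
static int inflate_in_chunks(struct balloon_page_record *recs, int max)
{
	int n;

	for (n = 0; n < max; n++) {
		struct page *page = alloc_pages(GFP_HIGHUSER_MOVABLE |
						__GFP_NOWARN | __GFP_NOMEMALLOC, 9);

		if (!page)
			break;	/* real code would fall back to lower orders */

		recs[n].pfn = page_to_pfn(page);
		recs[n].order = 9;	/* one record covers 512 pages (2 MiB) */
		recs[n].bmap_len = 0;	/* no bitmap: lsb implied set */
	}
	return n;
}

Fewer, larger allocations amortize both the allocator calls and the
host-side madvise() work, which is consistent with the 85% -> 94%
numbers quoted earlier in the thread.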

end of thread, other threads:[~2016-08-02  0:34 UTC | newest]

Thread overview: 171+ messages

2016-07-27  1:23 [PATCH v2 repost 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration Liang Li
2016-07-27  1:23 ` [PATCH v2 repost 1/7] virtio-balloon: rework deflate to add page to a list Liang Li
2016-07-27  1:23 ` [PATCH v2 repost 2/7] virtio-balloon: define new feature bit and page bitmap head Liang Li
2016-07-27  1:23 ` [PATCH v2 repost 3/7] mm: add a function to get the max pfn Liang Li
2016-07-27 22:08   ` Michael S. Tsirkin
2016-07-27 22:52     ` Dave Hansen
2016-07-27  1:23 ` [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process Liang Li
2016-07-27 16:03   ` Dave Hansen
2016-07-27 21:39     ` Michael S. Tsirkin
2016-07-28  3:30       ` Li, Liang Z
2016-07-28 22:15         ` Michael S. Tsirkin
2016-07-29  1:08           ` Li, Liang Z
2016-07-28  1:13     ` Li, Liang Z
2016-07-28  1:45       ` Michael S. Tsirkin
2016-07-28  6:36         ` Li, Liang Z
2016-07-28 21:51           ` Michael S. Tsirkin
2016-07-29  0:46             ` Li, Liang Z
2016-07-29 19:48             ` Dave Hansen
2016-08-02  0:28               ` Li, Liang Z
2016-07-27 21:36   ` Michael S. Tsirkin
2016-07-28  3:06     ` Li, Liang Z
2016-07-28 22:17       ` Michael S. Tsirkin
2016-07-29  0:38         ` Li, Liang Z
2016-07-27 22:07   ` Michael S. Tsirkin
2016-07-28  3:48     ` Li, Liang Z
2016-07-27  1:23 ` [PATCH v2 repost 5/7] virtio-balloon: define feature bit and head for misc virt queue Liang Li
2016-07-27  1:23 ` [PATCH v2 repost 6/7] mm: add the related functions to get free page info Liang Li
2016-07-27 16:40   ` Dave Hansen
2016-07-27 22:05     ` Michael S. Tsirkin
2016-07-27 22:16       ` Dave Hansen
2016-07-27 23:05         ` Michael S. Tsirkin
2016-07-28  4:36         ` Li, Liang Z
2016-07-28  0:10     ` Li, Liang Z
2016-07-28  0:17       ` Michael S. Tsirkin
2016-07-27 22:13   ` Michael S. Tsirkin
2016-07-28  5:30     ` Li, Liang Z
2016-07-27  1:23 ` [PATCH v2 repost 7/7] virtio-balloon: tell host vm's free page info Liang Li
2016-07-27 22:00   ` Michael S. Tsirkin
2016-07-28  7:50     ` Li, Liang Z
2016-07-28 21:37       ` Michael S. Tsirkin
