linux-kernel.vger.kernel.org archive mirror
* [PATCH kernel v4 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration
From: Liang Li @ 2016-11-02  6:17 UTC
  To: mst, dave.hansen
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck, Liang Li

This patch set contains two sets of changes to virtio-balloon.
 
One is a change to speed up the inflating & deflating process. The
main idea of this optimization is to use a bitmap to send the page
information to the host instead of the PFNs, reducing the overhead
of virtio data transmission, address translation and madvise(). This
improves performance by about 85%.
 
The other change speeds up live migration. By skipping the guest's
unused pages in the first round of data copy, needless data
processing is avoided, which saves a lot of CPU cycles and network
bandwidth. We put the guest's unused page information in a bitmap
and send it to the host through a virtio-balloon virtqueue. For an
idle guest with 8GB of RAM, this shortens the total live migration
time from about 2s to about 500ms in a 10Gbps network environment.
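
A quick illustration of the density difference (rough numbers only,
assuming 4KB pages and the existing 32-bit balloon PFNs; not part of
the patches themselves):

    /* One 4KB buffer used as a PFN array vs. as a bitmap: */
    unsigned long pfn_array_coverage = (4096 / 4) * 4096UL; /*  1024 pages ->   4MB */
    unsigned long bitmap_coverage    = (4096 * 8) * 4096UL; /* 32768 pages -> 128MB */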
 
Changes from v3 to v4:
    * Use the new scheme suggested by Dave Hansen to encode the bitmap.
    * Add code that was missing in v3 to handle page migration.
    * Free the bitmap memory promptly once the operation is done.
    * Address some of the comments on v3.

Changes from v2 to v3:
    * Change the name of 'free page' to 'unused page'.
    * Use the scatter & gather bitmap instead of a 1MB page bitmap.
    * Fix overwriting the page bitmap after kicking.
    * Address some of MST's comments on v2.
 
Changes from v1 to v2:
    * Abandon the patch for dropping page cache.
    * Move some structures to the uapi header file.
    * Use a new way to determine the page bitmap size.
    * Use a unified way to send the free page information with the bitmap.
    * Address the issues raised in MST's comments.

Liang Li (7):
  virtio-balloon: rework deflate to add page to a list
  virtio-balloon: define new feature bit and head struct
  mm: add a function to get the max pfn
  virtio-balloon: speed up inflate/deflate process
  mm: add the related functions to get unused page
  virtio-balloon: define flags and head for host request vq
  virtio-balloon: tell host vm's unused page info

 drivers/virtio/virtio_balloon.c     | 546 ++++++++++++++++++++++++++++++++----
 include/linux/mm.h                  |   3 +
 include/uapi/linux/virtio_balloon.h |  41 +++
 mm/page_alloc.c                     |  95 +++++++
 4 files changed, 636 insertions(+), 49 deletions(-)

-- 
1.8.3.1


* [PATCH kernel v4 1/7] virtio-balloon: rework deflate to add page to a list
From: Liang Li @ 2016-11-02  6:17 UTC
  To: mst, dave.hansen
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck, Liang Li

When doing an inflating/deflating operation, the current
virtio-balloon implementation uses an array to hold 256 PFNs, sends
these PFNs to the host through virtio and processes each PFN one by
one. This is not efficient when inflating/deflating a large amount of
memory, because the following operations are repeated too many times:

    1. Virtio data transmission
    2. Page allocate/free
    3. Address translation(GPA->HVA)
    4. madvise

The overhead of these operations consumes a lot of CPU cycles and
takes a long time to complete, which may impact the QoS of the guest
as well as the host. The overhead can be reduced a lot with batch
processing. E.g. if several pages are physically contiguous in the
guest, they can be processed in one operation.
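
For illustration, a host-side sketch of the batching idea (userspace
code for the QEMU side; release_run() and its parameters are
hypothetical, only madvise() itself is real):

    #include <sys/mman.h>

    /* Release one guest-contiguous run of pages with a single
     * madvise() call instead of one call per page. */
    static int release_run(void *hva, size_t nr_pages, size_t page_size)
    {
            return madvise(hva, nr_pages * page_size, MADV_DONTNEED);
    }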

The main idea of the optimization is to reduce the above operations
as much as possible, which can be achieved by using a bitmap instead
of a PFN array. Compared with a PFN array, a bitmap of the same
buffer size can represent many more pages, which is very important
for batch processing.

Using a bitmap instead of PFNs is not very helpful when
inflating/deflating a small number of pages; in that case, using
PFNs is better. But using a bitmap will not impact the QoS of the
guest or host heavily, because the operation completes very quickly
for a small number of pages, and we use some methods to make sure
the efficiency does not drop too much.

This patch saves the deflated pages to a list instead of the PFN
array, which will allow faster notifications using a bitmap down the
road. balloon_pfn_to_page() can be removed because it is no longer
used.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 drivers/virtio/virtio_balloon.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 4e7003d..59ffe5a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -103,12 +103,6 @@ static u32 page_to_balloon_pfn(struct page *page)
 	return pfn * VIRTIO_BALLOON_PAGES_PER_PAGE;
 }
 
-static struct page *balloon_pfn_to_page(u32 pfn)
-{
-	BUG_ON(pfn % VIRTIO_BALLOON_PAGES_PER_PAGE);
-	return pfn_to_page(pfn / VIRTIO_BALLOON_PAGES_PER_PAGE);
-}
-
 static void balloon_ack(struct virtqueue *vq)
 {
 	struct virtio_balloon *vb = vq->vdev->priv;
@@ -181,18 +175,16 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 	return num_allocated_pages;
 }
 
-static void release_pages_balloon(struct virtio_balloon *vb)
+static void release_pages_balloon(struct virtio_balloon *vb,
+				 struct list_head *pages)
 {
-	unsigned int i;
-	struct page *page;
+	struct page *page, *next;
 
-	/* Find pfns pointing at start of each page, get pages and free them. */
-	for (i = 0; i < vb->num_pfns; i += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		page = balloon_pfn_to_page(virtio32_to_cpu(vb->vdev,
-							   vb->pfns[i]));
+	list_for_each_entry_safe(page, next, pages, lru) {
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
 			adjust_managed_page_count(page, 1);
+		list_del(&page->lru);
 		put_page(page); /* balloon reference */
 	}
 }
@@ -202,6 +194,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	unsigned num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+	LIST_HEAD(pages);
 
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
@@ -215,6 +208,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 		if (!page)
 			break;
 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		list_add(&page->lru, &pages);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
 
@@ -226,7 +220,7 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 */
 	if (vb->num_pfns != 0)
 		tell_host(vb, vb->deflate_vq);
-	release_pages_balloon(vb);
+	release_pages_balloon(vb, &pages);
 	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
 }
-- 
1.8.3.1


* [PATCH kernel v4 2/7] virtio-balloon: define new feature bit and head struct
From: Liang Li @ 2016-11-02  6:17 UTC
  To: mst, dave.hansen
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck, Liang Li

Add a new feature that supports sending the page information with
a bitmap. The current implementation uses a PFN array, which is not
very efficient. Using a bitmap can improve the performance of
inflating/deflating significantly.

The page bitmap header will be used to tell the host some information
about the page bitmap, e.g. the page size, the page bitmap length and
the start pfn.
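
For illustration, a host could unpack the 64-bit header word like
this (a sketch only; it assumes the usual little-endian bitfield
layout, and 'raw' stands for the first 8 bytes read from the
virtqueue buffer):

    #include <endian.h>
    #include <stdint.h>

    static void decode_bmap_head(uint64_t raw)
    {
            uint64_t head       = le64toh(raw);
            uint64_t start_pfn  = head & ((1ULL << 52) - 1); /* low 52 bits */
            unsigned page_shift = (head >> 52) & 0x3f;       /* next 6 bits */
            unsigned bmap_len   = (head >> 58) & 0x3f;       /* bytes of bitmap data */
            /* bmap_len == 0 encodes a single run of pages at start_pfn,
             * with the run size given by page_shift (see patch 4) */
    }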

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 include/uapi/linux/virtio_balloon.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index 343d7dd..bed6f41 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -34,6 +34,7 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_PAGE_BITMAP	3 /* Send page info with bitmap */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -82,4 +83,22 @@ struct virtio_balloon_stat {
 	__virtio64 val;
 } __attribute__((packed));
 
+/* Response header structure */
+struct virtio_balloon_resp_hdr {
+	__le64 cmd : 8; /* Distinguish different request types */
+	__le64 flag: 8; /* Mark status for a specific request type */
+	__le64 id : 16; /* Distinguish requests of a specific type */
+	__le64 data_len: 32; /* Length of the following data, in bytes */
+};
+
+/* Page bitmap header structure */
+struct virtio_balloon_bmap_hdr {
+	struct {
+		__le64 start_pfn : 52; /* start pfn for the bitmap */
+		__le64 page_shift : 6; /* page shift width */
+		__le64 bmap_len : 6;  /* bitmap length, in bytes */
+	} head;
+	__le64 bmap[0];
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.8.3.1


* [PATCH kernel v4 3/7] mm: add a function to get the max pfn
From: Liang Li @ 2016-11-02  6:17 UTC
  To: mst, dave.hansen
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck, Liang Li, Andrew Morton

Expose a function to get the max pfn, so it can be used by the
virtio-balloon device driver. Simply including 'linux/bootmem.h'
is not enough: if the device driver is built as a module, referring
to max_pfn directly leads to a build failure.
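
For illustration, the difference for a modular driver (a hypothetical
snippet, not part of this patch):

    extern unsigned long max_pfn;	/* via linux/bootmem.h */

    static int __init demo_init(void)
    {
            pr_info("max pfn: %lu\n", max_pfn);       /* link error when built as a module */
            pr_info("max pfn: %lu\n", get_max_pfn()); /* works: exported by this patch */
            return 0;
    }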

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 include/linux/mm.h |  1 +
 mm/page_alloc.c    | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a92c8d7..f47862a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1772,6 +1772,7 @@ static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
 extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
+extern unsigned long get_max_pfn(void);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fd42aa..12cc8ed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4428,6 +4428,16 @@ void show_free_areas(unsigned int filter)
 	show_swap_cache_info();
 }
 
+/*
+ * The max_pfn can change because of memory hot plug, so it's only good
+ * as a hint. e.g. for sizing data structures.
+ */
+unsigned long get_max_pfn(void)
+{
+	return max_pfn;
+}
+EXPORT_SYMBOL(get_max_pfn);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
1.8.3.1


* [PATCH kernel v4 4/7] virtio-balloon: speed up inflate/deflate process
From: Liang Li @ 2016-11-02  6:17 UTC
  To: mst, dave.hansen
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck, Liang Li

The current virtio-balloon implementation is not very efficient;
profiling the time spent on the different stages of inflating the
balloon to 7GB of an 8GB idle guest gives:

a. allocating pages (6.5%)
b. sending PFNs to host (68.3%)
c. address translation (6.1%)
d. madvise (19%)

It takes about 4126ms for the inflating process to complete.
Debugging shows that the bottlenecks are stages b and d.

If we use a bitmap to send the page info instead of the PFNs, we can
reduce the overhead of stage b quite a lot. Furthermore, we can do
the address translation and call madvise() for a bulk of RAM pages
instead of the current page-by-page way, so the overhead of stages c
and d can also be reduced a lot.
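
As a rough worked example of the stage b saving (illustrative numbers,
assuming 4KB pages and the existing 4-byte balloon PFNs): inflating
7GB means reporting about 1.8M pages. As a PFN array that is about
1.8M * 4B = 7MB of virtio traffic, pushed 256 PFNs (1KB) per kick; as
a bitmap covering the same range it is about 1.8M bits = 224KB, plus
a small header per pfn range.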

This patch is the kernel-side implementation, intended to speed up
the inflating & deflating process by adding a new feature to the
virtio-balloon device. With this new feature, inflating the balloon
to 7GB of an 8GB idle guest only takes 590ms; the performance
improvement is about 85%.

TODO: optimize stage a by allocating/freeing a chunk of pages
instead of a single page at a time.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 drivers/virtio/virtio_balloon.c | 398 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 369 insertions(+), 29 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 59ffe5a..c6c94b6 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -42,6 +42,10 @@
 #define OOM_VBALLOON_DEFAULT_PAGES 256
 #define VIRTBALLOON_OOM_NOTIFY_PRIORITY 80
 
+#define BALLOON_BMAP_SIZE	(8 * PAGE_SIZE)
+#define PFNS_PER_BMAP		(BALLOON_BMAP_SIZE * BITS_PER_BYTE)
+#define BALLOON_BMAP_COUNT	32
+
 static int oom_pages = OOM_VBALLOON_DEFAULT_PAGES;
 module_param(oom_pages, int, S_IRUSR | S_IWUSR);
 MODULE_PARM_DESC(oom_pages, "pages to free on OOM");
@@ -67,6 +71,18 @@ struct virtio_balloon {
 
 	/* Number of balloon pages we've told the Host we're not using. */
 	unsigned int num_pages;
+	/* Pointer to the response header. */
+	void *resp_hdr;
+	/* Pointer to the start address of response data. */
+	unsigned long *resp_data;
+	/* Pointer offset of the response data. */
+	unsigned long resp_pos;
+	/* Bitmap and bitmap count used to tell the host the pages */
+	unsigned long *page_bitmap[BALLOON_BMAP_COUNT];
+	/* Number of split page bitmaps */
+	unsigned int nr_page_bmap;
+	/* Used to record the processed pfn range */
+	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -110,20 +126,227 @@ static void balloon_ack(struct virtqueue *vq)
 	wake_up(&vb->acked);
 }
 
-static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
+static inline void init_bmap_pfn_range(struct virtio_balloon *vb)
 {
-	struct scatterlist sg;
+	vb->min_pfn = ULONG_MAX;
+	vb->max_pfn = 0;
+}
+
+static inline void update_bmap_pfn_range(struct virtio_balloon *vb,
+				 struct page *page)
+{
+	unsigned long balloon_pfn = page_to_balloon_pfn(page);
+
+	vb->min_pfn = min(balloon_pfn, vb->min_pfn);
+	vb->max_pfn = max(balloon_pfn, vb->max_pfn);
+}
+
+static void extend_page_bitmap(struct virtio_balloon *vb)
+{
+	int i, bmap_count;
+	unsigned long bmap_len;
+
+	bmap_len = ALIGN(get_max_pfn(), BITS_PER_LONG) / BITS_PER_BYTE;
+	bmap_len = ALIGN(bmap_len, BALLOON_BMAP_SIZE);
+	bmap_count = min((int)(bmap_len / BALLOON_BMAP_SIZE),
+				 BALLOON_BMAP_COUNT);
+
+	for (i = 1; i < bmap_count; i++) {
+		vb->page_bitmap[i] = kmalloc(BALLOON_BMAP_SIZE, GFP_KERNEL);
+		if (vb->page_bitmap[i])
+			vb->nr_page_bmap++;
+		else
+			break;
+	}
+}
+
+static void free_extended_page_bitmap(struct virtio_balloon *vb)
+{
+	int i, bmap_count = vb->nr_page_bmap;
+
+
+	for (i = 1; i < bmap_count; i++) {
+		kfree(vb->page_bitmap[i]);
+		vb->page_bitmap[i] = NULL;
+		vb->nr_page_bmap--;
+	}
+}
+
+static void kfree_page_bitmap(struct virtio_balloon *vb)
+{
+	int i;
+
+	for (i = 0; i < vb->nr_page_bmap; i++)
+		kfree(vb->page_bitmap[i]);
+}
+
+static void clear_page_bitmap(struct virtio_balloon *vb)
+{
+	int i;
+
+	for (i = 0; i < vb->nr_page_bmap; i++)
+		memset(vb->page_bitmap[i], 0, BALLOON_BMAP_SIZE);
+}
+
+static unsigned long do_set_resp_bitmap(struct virtio_balloon *vb,
+	unsigned long *bitmap,	unsigned long base_pfn,
+	unsigned long pos, int nr_page)
+
+{
+	struct virtio_balloon_bmap_hdr *hdr;
+	unsigned long end, new_pos, new_end, nr_left, processed = 0;
+
+	new_pos = pos;
+	new_end = end = pos + nr_page;
+
+	if (pos % BITS_PER_LONG) {
+		unsigned long pos_s;
+
+		pos_s = rounddown(pos, BITS_PER_LONG);
+		hdr = (struct virtio_balloon_bmap_hdr *)(vb->resp_data
+							 + vb->resp_pos);
+		hdr->head.start_pfn = base_pfn + pos_s;
+		hdr->head.page_shift = PAGE_SHIFT;
+		hdr->head.bmap_len = sizeof(unsigned long);
+		hdr->bmap[0] = cpu_to_virtio64(vb->vdev,
+				 bitmap[pos_s / BITS_PER_LONG]);
+		vb->resp_pos += 2;
+		if (pos_s + BITS_PER_LONG >= end)
+			return roundup(end, BITS_PER_LONG) - pos;
+		new_pos = roundup(pos, BITS_PER_LONG);
+	}
+
+	if (end % BITS_PER_LONG) {
+		unsigned long pos_e;
+
+		pos_e = roundup(end, BITS_PER_LONG);
+		hdr = (struct virtio_balloon_bmap_hdr *)(vb->resp_data
+							 + vb->resp_pos);
+		hdr->head.start_pfn = base_pfn + pos_e - BITS_PER_LONG;
+		hdr->head.page_shift = PAGE_SHIFT;
+		hdr->head.bmap_len = sizeof(unsigned long);
+		hdr->bmap[0] = bitmap[pos_e / BITS_PER_LONG - 1];
+		vb->resp_pos += 2;
+		if (new_pos + BITS_PER_LONG >= pos_e)
+			return pos_e - pos;
+		new_end = rounddown(end, BITS_PER_LONG);
+	}
+
+	nr_left = nr_page = new_end - new_pos;
+
+	while (processed < nr_page) {
+		int bulk, order;
+
+		order = get_order(nr_left << PAGE_SHIFT);
+		if ((1 << order) > nr_left)
+			order--;
+		hdr = (struct virtio_balloon_bmap_hdr *)(vb->resp_data
+							 + vb->resp_pos);
+		hdr->head.start_pfn = base_pfn + new_pos + processed;
+		hdr->head.page_shift = order + PAGE_SHIFT;
+		hdr->head.bmap_len = 0;
+		bulk = 1 << order;
+		nr_left -= bulk;
+		processed += bulk;
+		vb->resp_pos++;
+	}
+
+	return roundup(end, BITS_PER_LONG) - pos;
+}
+
+static void send_resp_data(struct virtio_balloon *vb, struct virtqueue *vq,
+			bool busy_wait)
+{
+	struct scatterlist sg[2];
+	struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr;
 	unsigned int len;
 
-	sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
+	len = hdr->data_len = vb->resp_pos * sizeof(unsigned long);
+	sg_init_table(sg, 2);
+	sg_set_buf(&sg[0], hdr, sizeof(struct virtio_balloon_resp_hdr));
+	sg_set_buf(&sg[1], vb->resp_data, len);
+
+	if (virtqueue_add_outbuf(vq, sg, 2, vb, GFP_KERNEL) == 0) {
+		virtqueue_kick(vq);
+		if (busy_wait)
+			while (!virtqueue_get_buf(vq, &len)
+				&& !virtqueue_is_broken(vq))
+				cpu_relax();
+		else
+			wait_event(vb->acked, virtqueue_get_buf(vq, &len));
+		vb->resp_pos = 0;
+		free_extended_page_bitmap(vb);
+	}
+}
 
-	/* We should always be able to add one buffer to an empty queue. */
-	virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
-	virtqueue_kick(vq);
+static void set_bulk_pages(struct virtio_balloon *vb, struct virtqueue *vq,
+		unsigned long start_pfn, unsigned long *bitmap,
+		unsigned long len, bool busy_wait)
+{
+	unsigned long pos = 0, end = len * BITS_PER_BYTE;
+
+	while (pos < end) {
+		unsigned long one = find_next_bit(bitmap, end, pos);
+
+		if ((vb->resp_pos + 64) * sizeof(unsigned long) >
+			 BALLOON_BMAP_SIZE)
+			send_resp_data(vb, vq, busy_wait);
+		if (one < end) {
+			unsigned long pages, zero;
+
+			zero = find_next_zero_bit(bitmap, end, one + 1);
+			if (zero >= end)
+				pages = end - one;
+			else
+				pages = zero - one;
+			if (pages) {
+				pages = do_set_resp_bitmap(vb, bitmap,
+					 start_pfn, one, pages);
+			}
+			pos = one + pages;
+		} else
+			pos = one;
+	}
+}
 
-	/* When host has read buffer, this completes via balloon_ack */
-	wait_event(vb->acked, virtqueue_get_buf(vq, &len));
+static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
+{
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP)) {
+		int nr_pfn, nr_used_bmap, i;
+		unsigned long start_pfn, bmap_len;
+
+		start_pfn = vb->start_pfn;
+		nr_pfn = vb->end_pfn - start_pfn + 1;
+		nr_pfn = roundup(nr_pfn, BITS_PER_LONG);
+		nr_used_bmap = nr_pfn / PFNS_PER_BMAP;
+		if (nr_pfn % PFNS_PER_BMAP)
+			nr_used_bmap++;
+		bmap_len = nr_pfn / BITS_PER_BYTE;
+
+		for (i = 0; i < nr_used_bmap; i++) {
+			unsigned int bmap_size = BALLOON_BMAP_SIZE;
+
+			if (i + 1 == nr_used_bmap)
+				bmap_size = bmap_len - BALLOON_BMAP_SIZE * i;
+			set_bulk_pages(vb, vq, start_pfn + i * PFNS_PER_BMAP,
+				 vb->page_bitmap[i], bmap_size, false);
+		}
+		if (vb->resp_pos > 0)
+			send_resp_data(vb, vq, false);
+	} else {
+		struct scatterlist sg;
+		unsigned int len;
+
+		sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);
 
+		/* We should always be able to add one buffer to an
+		 * empty queue
+		 */
+		virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
+		virtqueue_kick(vq);
+		/* When host has read buffer, this completes via balloon_ack */
+		wait_event(vb->acked, virtqueue_get_buf(vq, &len));
+	}
 }
 
 static void set_page_pfns(struct virtio_balloon *vb,
@@ -138,13 +361,59 @@ static void set_page_pfns(struct virtio_balloon *vb,
 					  page_to_balloon_pfn(page) + i);
 }
 
-static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
+static void set_page_bitmap(struct virtio_balloon *vb,
+			 struct list_head *pages, struct virtqueue *vq)
 {
-	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
-	unsigned num_allocated_pages;
+	unsigned long pfn, pfn_limit;
+	struct page *page;
+	bool found;
+	int bmap_idx;
+
+	vb->min_pfn = rounddown(vb->min_pfn, BITS_PER_LONG);
+	vb->max_pfn = roundup(vb->max_pfn, BITS_PER_LONG);
+	pfn_limit = PFNS_PER_BMAP * vb->nr_page_bmap;
+
+	for (pfn = vb->min_pfn; pfn < vb->max_pfn; pfn += pfn_limit) {
+		unsigned long end_pfn;
+
+		clear_page_bitmap(vb);
+		vb->start_pfn = pfn;
+		end_pfn = pfn;
+		found = false;
+		list_for_each_entry(page, pages, lru) {
+			unsigned long pos, balloon_pfn;
+
+			balloon_pfn = page_to_balloon_pfn(page);
+			if (balloon_pfn < pfn || balloon_pfn >= pfn + pfn_limit)
+				continue;
+			bmap_idx = (balloon_pfn - pfn) / PFNS_PER_BMAP;
+			pos = (balloon_pfn - pfn) % PFNS_PER_BMAP;
+			set_bit(pos, vb->page_bitmap[bmap_idx]);
+			if (balloon_pfn > end_pfn)
+				end_pfn = balloon_pfn;
+			found = true;
+		}
+		if (found) {
+			vb->end_pfn = end_pfn;
+			tell_host(vb, vq);
+		}
+	}
+}
 
-	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num)
+{
+	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
+	unsigned int num_allocated_pages;
+	bool use_bmap = virtio_has_feature(vb->vdev,
+				 VIRTIO_BALLOON_F_PAGE_BITMAP);
+
+	if (use_bmap) {
+		if (vb->nr_page_bmap == 1)
+			extend_page_bitmap(vb);
+		init_bmap_pfn_range(vb);
+	} else
+		/* We can only do one array worth at a time. */
+		num = min(num, ARRAY_SIZE(vb->pfns));
 
 	mutex_lock(&vb->balloon_lock);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -159,7 +428,10 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 			msleep(200);
 			break;
 		}
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		if (use_bmap)
+			update_bmap_pfn_range(vb, page);
+		else
+			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
 		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
@@ -168,8 +440,13 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 
 	num_allocated_pages = vb->num_pfns;
 	/* Did we get any? */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->inflate_vq);
+	if (vb->num_pfns != 0) {
+		if (use_bmap)
+			set_page_bitmap(vb, &vb_dev_info->pages,
+					vb->inflate_vq);
+		else
+			tell_host(vb, vb->inflate_vq);
+	}
 	mutex_unlock(&vb->balloon_lock);
 
 	return num_allocated_pages;
@@ -189,15 +466,22 @@ static void release_pages_balloon(struct virtio_balloon *vb,
 	}
 }
 
-static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned int leak_balloon(struct virtio_balloon *vb, size_t num)
 {
-	unsigned num_freed_pages;
+	unsigned int num_freed_pages;
 	struct page *page;
 	struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info;
 	LIST_HEAD(pages);
+	bool use_bmap = virtio_has_feature(vb->vdev,
+			 VIRTIO_BALLOON_F_PAGE_BITMAP);
 
-	/* We can only do one array worth at a time. */
-	num = min(num, ARRAY_SIZE(vb->pfns));
+	if (use_bmap) {
+		if (vb->nr_page_bmap == 1)
+			extend_page_bitmap(vb);
+		init_bmap_pfn_range(vb);
+	} else
+		/* We can only do one array worth at a time. */
+		num = min(num, ARRAY_SIZE(vb->pfns));
 
 	mutex_lock(&vb->balloon_lock);
 	/* We can't release more pages than taken */
@@ -207,7 +491,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 		page = balloon_page_dequeue(vb_dev_info);
 		if (!page)
 			break;
-		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+		if (use_bmap)
+			update_bmap_pfn_range(vb, page);
+		else
+			set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
 		list_add(&page->lru, &pages);
 		vb->num_pages -= VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
@@ -218,8 +505,12 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	 * virtio_has_feature(vdev, VIRTIO_BALLOON_F_MUST_TELL_HOST);
 	 * is true, we *have* to do it in this order
 	 */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->deflate_vq);
+	if (vb->num_pfns != 0) {
+		if (use_bmap)
+			set_page_bitmap(vb, &pages, vb->deflate_vq);
+		else
+			tell_host(vb, vb->deflate_vq);
+	}
 	release_pages_balloon(vb, &pages);
 	mutex_unlock(&vb->balloon_lock);
 	return num_freed_pages;
@@ -431,6 +722,20 @@ static int init_vqs(struct virtio_balloon *vb)
 }
 
 #ifdef CONFIG_BALLOON_COMPACTION
+static void tell_host_one_page(struct virtio_balloon *vb,
+	struct virtqueue *vq, struct page *page)
+{
+	struct virtio_balloon_bmap_hdr *bmap_hdr;
+
+	bmap_hdr = (struct virtio_balloon_bmap_hdr *)(vb->resp_data
+							 + vb->resp_pos);
+	bmap_hdr->head.start_pfn = page_to_pfn(page);
+	bmap_hdr->head.page_shift = PAGE_SHIFT;
+	bmap_hdr->head.bmap_len = 0;
+	vb->resp_pos++;
+	send_resp_data(vb, vq, false);
+}
+
 /*
  * virtballoon_migratepage - perform the balloon page migration on behalf of
  *			     a compation thread.     (called under page lock)
@@ -455,6 +760,8 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 	struct virtio_balloon *vb = container_of(vb_dev_info,
 			struct virtio_balloon, vb_dev_info);
 	unsigned long flags;
+	bool use_bmap = virtio_has_feature(vb->vdev,
+				 VIRTIO_BALLOON_F_PAGE_BITMAP);
 
 	/*
 	 * In order to avoid lock contention while migrating pages concurrently
@@ -475,15 +782,23 @@ static int virtballoon_migratepage(struct balloon_dev_info *vb_dev_info,
 	vb_dev_info->isolated_pages--;
 	__count_vm_event(BALLOON_MIGRATE);
 	spin_unlock_irqrestore(&vb_dev_info->pages_lock, flags);
-	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
-	set_page_pfns(vb, vb->pfns, newpage);
-	tell_host(vb, vb->inflate_vq);
+	if (use_bmap)
+		tell_host_one_page(vb, vb->inflate_vq, newpage);
+	else {
+		vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+		set_page_pfns(vb, vb->pfns, newpage);
+		tell_host(vb, vb->inflate_vq);
+	}
 
 	/* balloon's page migration 2nd step -- deflate "page" */
 	balloon_page_delete(page);
-	vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
-	set_page_pfns(vb, vb->pfns, page);
-	tell_host(vb, vb->deflate_vq);
+	if (use_bmap)
+		tell_host_one_page(vb, vb->deflate_vq, page);
+	else {
+		vb->num_pfns = VIRTIO_BALLOON_PAGES_PER_PAGE;
+		set_page_pfns(vb, vb->pfns, page);
+		tell_host(vb, vb->deflate_vq);
+	}
 
 	mutex_unlock(&vb->balloon_lock);
 
@@ -533,6 +848,28 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	spin_lock_init(&vb->stop_update_lock);
 	vb->stop_update = false;
 	vb->num_pages = 0;
+	vb->resp_hdr = kzalloc(sizeof(struct virtio_balloon_resp_hdr),
+				 GFP_KERNEL);
+	/* Clear the feature bit if memory allocation fails */
+	if (!vb->resp_hdr)
+		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+	else {
+		vb->page_bitmap[0] = kmalloc(BALLOON_BMAP_SIZE, GFP_KERNEL);
+		if (!vb->page_bitmap[0]) {
+			__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+			kfree(vb->resp_hdr);
+		} else {
+			vb->nr_page_bmap = 1;
+			vb->resp_data = kmalloc(BALLOON_BMAP_SIZE, GFP_KERNEL);
+			if (!vb->resp_data) {
+				__virtio_clear_bit(vdev,
+						VIRTIO_BALLOON_F_PAGE_BITMAP);
+				kfree(vb->page_bitmap[0]);
+				kfree(vb->resp_hdr);
+			}
+		}
+	}
+	vb->resp_pos = 0;
 	mutex_init(&vb->balloon_lock);
 	init_waitqueue_head(&vb->acked);
 	vb->vdev = vdev;
@@ -609,6 +946,8 @@ static void virtballoon_remove(struct virtio_device *vdev)
 	remove_common(vb);
 	if (vb->vb_dev_info.inode)
 		iput(vb->vb_dev_info.inode);
+	kfree_page_bitmap(vb);
+	kfree(vb->resp_hdr);
 	kfree(vb);
 }
 
@@ -647,6 +986,7 @@ static int virtballoon_restore(struct virtio_device *vdev)
 	VIRTIO_BALLOON_F_MUST_TELL_HOST,
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
+	VIRTIO_BALLOON_F_PAGE_BITMAP,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
-- 
1.8.3.1


* [PATCH kernel v4 5/7] mm: add the related functions to get unused page
From: Liang Li @ 2016-11-02  6:17 UTC
  To: mst, dave.hansen
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck, Liang Li, Andrew Morton

Save the unused page info into a split page bitmap. The virtio
balloon driver will use this new API to get the unused page bitmap
and send the bitmap to the hypervisor (QEMU) to speed up live
migration. While the bitmap is being sent, some of the pages may be
modified and may no longer be free; this inaccuracy can be corrected
by the dirty page logging mechanism.
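
For illustration, a minimal caller sketch (chunk sizes and names are
arbitrary, cleanup elided):

    #define CHUNK_BYTES	(8 * PAGE_SIZE)			/* 32KB with 4KB pages */
    #define CHUNK_BITS	(CHUNK_BYTES * BITS_PER_BYTE)	/* pfns per chunk */

    static int demo_scan(void)
    {
            unsigned long *bmap[2];
            int i, ret;

            for (i = 0; i < 2; i++) {
                    bmap[i] = kzalloc(CHUNK_BYTES, GFP_KERNEL);
                    if (!bmap[i])
                            return -ENOMEM;
            }
            /* mark unused pages with pfns in [0, 2 * CHUNK_BITS) */
            ret = get_unused_pages(0, 2 * CHUNK_BITS, bmap, CHUNK_BITS, 2);
            /* ret == 1: more pfns remain above end_pfn; scan the next window */
            return ret;
    }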

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 include/linux/mm.h |  2 ++
 mm/page_alloc.c    | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f47862a..7014d8a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1773,6 +1773,8 @@ extern void free_area_init_node(int nid, unsigned long * zones_size,
 		unsigned long zone_start_pfn, unsigned long *zholes_size);
 extern void free_initmem(void);
 extern unsigned long get_max_pfn(void);
+extern int get_unused_pages(unsigned long start_pfn, unsigned long end_pfn,
+	unsigned long *bitmap[], unsigned long len, unsigned int nr_bmap);
 
 /*
  * Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 12cc8ed..72537cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4438,6 +4438,91 @@ unsigned long get_max_pfn(void)
 }
 EXPORT_SYMBOL(get_max_pfn);
 
+static void mark_unused_pages_bitmap(struct zone *zone,
+		unsigned long start_pfn, unsigned long end_pfn,
+		unsigned long *bitmap[], unsigned long bits,
+		unsigned int nr_bmap)
+{
+	unsigned long pfn, flags, nr_pg, pos, *bmap;
+	unsigned int order, i, t, bmap_idx;
+	struct list_head *curr;
+
+	if (zone_is_empty(zone))
+		return;
+
+	end_pfn = min(start_pfn + nr_bmap * bits, end_pfn);
+	spin_lock_irqsave(&zone->lock, flags);
+
+	for_each_migratetype_order(order, t) {
+		list_for_each(curr, &zone->free_area[order].free_list[t]) {
+			pfn = page_to_pfn(list_entry(curr, struct page, lru));
+			if (pfn < start_pfn || pfn >= end_pfn)
+				continue;
+			nr_pg = 1UL << order;
+			if (pfn + nr_pg > end_pfn)
+				nr_pg = end_pfn - pfn;
+			bmap_idx = (pfn - start_pfn) / bits;
+			if (bmap_idx == (pfn + nr_pg - start_pfn) / bits) {
+				bmap = bitmap[bmap_idx];
+				pos = (pfn - start_pfn) % bits;
+				bitmap_set(bmap, pos, nr_pg);
+			} else
+				for (i = 0; i < nr_pg; i++) {
+					pos = pfn - start_pfn + i;
+					bmap_idx = pos / bits;
+					bmap = bitmap[bmap_idx];
+					pos = pos % bits;
+					bitmap_set(bmap, pos, 1);
+				}
+		}
+	}
+
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
+/*
+ * During live migration, a page is always discardable unless its
+ * content is needed by the system.
+ * get_unused_pages provides an API to get the unused pages; these
+ * unused pages can be discarded if there has been no modification
+ * since the request. Some other mechanism, like dirty page logging,
+ * can be used to track the modifications.
+ *
+ * This function scans the free page lists to find the unused pages
+ * whose pfns range from start_pfn to end_pfn, and sets the
+ * corresponding bit in the bitmap when an unused page is found.
+ *
+ * Allocating a large bitmap may fail because of fragmentation, so
+ * instead of using a single bitmap, we use a scatter/gather bitmap:
+ * 'bitmap' is the start address of an array which contains
+ * 'nr_bmap' separate small bitmaps, each of which contains 'bits' bits.
+ *
+ * return -1 if parameters are invalid
+ * return 0 when end_pfn >= max_pfn
+ * return 1 when end_pfn < max_pfn
+ */
+int get_unused_pages(unsigned long start_pfn, unsigned long end_pfn,
+	unsigned long *bitmap[], unsigned long bits, unsigned int nr_bmap)
+{
+	struct zone *zone;
+	int ret = 0;
+
+	if (bitmap == NULL || *bitmap == NULL || nr_bmap == 0 ||
+		 bits == 0 || start_pfn > end_pfn)
+		return -1;
+	if (end_pfn < max_pfn)
+		ret = 1;
+	if (end_pfn >= max_pfn)
+		ret = 0;
+
+	for_each_populated_zone(zone)
+		mark_unused_pages_bitmap(zone, start_pfn, end_pfn, bitmap,
+					 bits, nr_bmap);
+
+	return ret;
+}
+EXPORT_SYMBOL(get_unused_pages);
+
 static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref)
 {
 	zoneref->zone = zone;
-- 
1.8.3.1


* [PATCH kernel v4 6/7] virtio-balloon: define flags and head for host request vq
From: Liang Li @ 2016-11-02  6:17 UTC
  To: mst, dave.hansen
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck, Liang Li, Andrew Morton

Define the flags and header struct for a new host request virtual
queue. The guest can receive requests from the host and respond to
them on this new virtual queue.
The host can use this virtual queue to request that the guest do
some operations, e.g. drop page caches, synchronize the file system,
etc. The hypervisor can also get some of the guest's runtime
information through this virtual queue, e.g. the guest's unused page
information, which can be used for live migration optimization.

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 include/uapi/linux/virtio_balloon.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index bed6f41..c4e34d0 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -35,6 +35,7 @@
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_PAGE_BITMAP	3 /* Send page info with bitmap */
+#define VIRTIO_BALLOON_F_HOST_REQ_VQ	4 /* Host request virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -101,4 +102,25 @@ struct virtio_balloon_bmap_hdr {
 	__le64 bmap[0];
 };
 
+enum virtio_balloon_req_id {
+	/* Get unused page information */
+	BALLOON_GET_UNUSED_PAGES,
+};
+
+enum virtio_balloon_flag {
+	/* Have more data for a request */
+	BALLOON_FLAG_CONT,
+	/* No more data for a request */
+	BALLOON_FLAG_DONE,
+};
+
+struct virtio_balloon_req_hdr {
+	/* Used to distinguish different requests */
+	__le16 cmd;
+	/* Reserved */
+	__le16 reserved[3];
+	/* Request parameter */
+	__le64 param;
+};
+
 #endif /* _LINUX_VIRTIO_BALLOON_H */
-- 
1.8.3.1


* [PATCH kernel v4 7/7] virtio-balloon: tell host vm's unused page info
From: Liang Li @ 2016-11-02  6:17 UTC
  To: mst, dave.hansen
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck, Liang Li

Support the host's request for the VM's unused page information and
respond with a page bitmap. QEMU can use this bitmap together with
the dirty page logging mechanism to skip the transmission of some of
these unused pages, which is very helpful for reducing network
traffic and speeding up the live migration process.
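
For illustration, the QEMU-side use is roughly (a hypothetical sketch,
not the actual QEMU code; unused_bitmap would be built from this
response and dirty_bitmap comes from dirty page logging):

    for (pfn = first_pfn; pfn < last_pfn; pfn++) {
            if (test_bit(pfn, unused_bitmap) && !test_bit(pfn, dirty_bitmap))
                    continue;       /* reported unused and not re-dirtied: skip */
            send_page(pfn);         /* hypothetical helper */
    }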

Signed-off-by: Liang Li <liang.z.li@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
---
 drivers/virtio/virtio_balloon.c | 128 +++++++++++++++++++++++++++++++++++++---
 1 file changed, 121 insertions(+), 7 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index c6c94b6..ba2d37b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -56,7 +56,7 @@
 
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *req_vq;
 
 	/* The balloon servicing is delegated to a freezable workqueue. */
 	struct work_struct update_balloon_stats_work;
@@ -83,6 +83,8 @@ struct virtio_balloon {
 	unsigned int nr_page_bmap;
 	/* Used to record the processed pfn range */
 	unsigned long min_pfn, max_pfn, start_pfn, end_pfn;
+	/* Request header */
+	struct virtio_balloon_req_hdr req_hdr;
 	/*
 	 * The pages we've told the Host we're not using are enqueued
 	 * at vb_dev_info->pages list.
@@ -552,6 +554,63 @@ static void update_balloon_stats(struct virtio_balloon *vb)
 				pages_to_bytes(available));
 }
 
+static void send_unused_pages_info(struct virtio_balloon *vb,
+				unsigned long req_id)
+{
+	struct scatterlist sg_in;
+	unsigned long pfn = 0, bmap_len, pfn_limit, last_pfn, nr_pfn;
+	struct virtqueue *vq = vb->req_vq;
+	struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr;
+	int ret = 1, used_nr_bmap = 0, i;
+
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP) &&
+		vb->nr_page_bmap == 1)
+		extend_page_bitmap(vb);
+
+	pfn_limit = PFNS_PER_BMAP * vb->nr_page_bmap;
+	mutex_lock(&vb->balloon_lock);
+	last_pfn = get_max_pfn();
+
+	while (ret) {
+		clear_page_bitmap(vb);
+		ret = get_unused_pages(pfn, pfn + pfn_limit, vb->page_bitmap,
+			 PFNS_PER_BMAP, vb->nr_page_bmap);
+		if (ret < 0)
+			break;
+		hdr->cmd = BALLOON_GET_UNUSED_PAGES;
+		hdr->id = req_id;
+		bmap_len = BALLOON_BMAP_SIZE * vb->nr_page_bmap;
+
+		if (!ret) {
+			hdr->flag = BALLOON_FLAG_DONE;
+			nr_pfn = last_pfn - pfn;
+			used_nr_bmap = nr_pfn / PFNS_PER_BMAP;
+			if (nr_pfn % PFNS_PER_BMAP)
+				used_nr_bmap++;
+			bmap_len = nr_pfn / BITS_PER_BYTE;
+		} else {
+			hdr->flag = BALLOON_FLAG_CONT;
+			used_nr_bmap = vb->nr_page_bmap;
+		}
+		for (i = 0; i < used_nr_bmap; i++) {
+			unsigned int bmap_size = BALLOON_BMAP_SIZE;
+
+			if (i + 1 == used_nr_bmap)
+				bmap_size = bmap_len - BALLOON_BMAP_SIZE * i;
+			set_bulk_pages(vb, vq, pfn + i * PFNS_PER_BMAP,
+				 vb->page_bitmap[i], bmap_size, true);
+		}
+		if (vb->resp_pos > 0)
+			send_resp_data(vb, vq, true);
+		pfn += pfn_limit;
+	}
+
+	mutex_unlock(&vb->balloon_lock);
+	sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
+	virtqueue_add_inbuf(vq, &sg_in, 1, &vb->req_hdr, GFP_KERNEL);
+	virtqueue_kick(vq);
+}
+
 /*
  * While most virtqueues communicate guest-initiated requests to the hypervisor,
  * the stats queue operates in reverse.  The driver initializes the virtqueue
@@ -686,18 +745,56 @@ static void update_balloon_size_func(struct work_struct *work)
 		queue_work(system_freezable_wq, work);
 }
 
+static void misc_handle_rq(struct virtio_balloon *vb)
+{
+	struct virtio_balloon_req_hdr *ptr_hdr;
+	unsigned int len;
+
+	ptr_hdr = virtqueue_get_buf(vb->req_vq, &len);
+	if (!ptr_hdr || len != sizeof(vb->req_hdr))
+		return;
+
+	switch (ptr_hdr->cmd) {
+	case BALLOON_GET_UNUSED_PAGES:
+		send_unused_pages_info(vb, ptr_hdr->param);
+		break;
+	default:
+		break;
+	}
+}
+
+static void misc_request(struct virtqueue *vq)
+{
+	struct virtio_balloon *vb = vq->vdev->priv;
+
+	misc_handle_rq(vb);
+}
+
 static int init_vqs(struct virtio_balloon *vb)
 {
-	struct virtqueue *vqs[3];
-	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
-	static const char * const names[] = { "inflate", "deflate", "stats" };
+	struct virtqueue *vqs[4];
+	vq_callback_t *callbacks[] = { balloon_ack, balloon_ack,
+					 stats_request, misc_request };
+	static const char * const names[] = { "inflate", "deflate", "stats",
+						 "misc" };
 	int err, nvqs;
 
 	/*
 	 * We expect two virtqueues: inflate and deflate, and
 	 * optionally stat.
 	 */
-	nvqs = virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ) ? 3 : 2;
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HOST_REQ_VQ))
+		nvqs = 4;
+	else if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ))
+		nvqs = 3;
+	else
+		nvqs = 2;
+
+	if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
+		__virtio_clear_bit(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+		__virtio_clear_bit(vb->vdev, VIRTIO_BALLOON_F_HOST_REQ_VQ);
+	}
+
 	err = vb->vdev->config->find_vqs(vb->vdev, nvqs, vqs, callbacks, names);
 	if (err)
 		return err;
@@ -718,6 +815,18 @@ static int init_vqs(struct virtio_balloon *vb)
 			BUG();
 		virtqueue_kick(vb->stats_vq);
 	}
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HOST_REQ_VQ)) {
+		struct scatterlist sg_in;
+
+		vb->req_vq = vqs[3];
+		sg_init_one(&sg_in, &vb->req_hdr, sizeof(vb->req_hdr));
+		if (virtqueue_add_inbuf(vb->req_vq, &sg_in, 1,
+		    &vb->req_hdr, GFP_KERNEL) < 0)
+			__virtio_clear_bit(vb->vdev,
+					VIRTIO_BALLOON_F_HOST_REQ_VQ);
+		else
+			virtqueue_kick(vb->req_vq);
+	}
 	return 0;
 }
 
@@ -851,11 +960,13 @@ static int virtballoon_probe(struct virtio_device *vdev)
 	vb->resp_hdr = kzalloc(sizeof(struct virtio_balloon_resp_hdr),
 				 GFP_KERNEL);
 	/* Clear the feature bit if memory allocation fails */
-	if (!vb->resp_hdr)
+	if (!vb->resp_hdr) {
 		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
-	else {
+		__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_HOST_REQ_VQ);
+	} else {
 		vb->page_bitmap[0] = kmalloc(BALLOON_BMAP_SIZE, GFP_KERNEL);
 		if (!vb->page_bitmap[0]) {
+			__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_HOST_REQ_VQ);
 			__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
 			kfree(vb->resp_hdr);
 		} else {
@@ -864,6 +975,8 @@ static int virtballoon_probe(struct virtio_device *vdev)
 			if (!vb->resp_data) {
 				__virtio_clear_bit(vdev,
 						VIRTIO_BALLOON_F_PAGE_BITMAP);
+				__virtio_clear_bit(vdev,
+						VIRTIO_BALLOON_F_HOST_REQ_VQ);
 				kfree(vb->page_bitmap[0]);
 				kfree(vb->resp_hdr);
 			}
@@ -987,6 +1100,7 @@ static int virtballoon_restore(struct virtio_device *vdev)
 	VIRTIO_BALLOON_F_STATS_VQ,
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
 	VIRTIO_BALLOON_F_PAGE_BITMAP,
+	VIRTIO_BALLOON_F_HOST_REQ_VQ,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
-- 
1.8.3.1


* Re: [PATCH kernel v4 7/7] virtio-balloon: tell host vm's unused page info
From: Dave Hansen @ 2016-11-04 18:10 UTC
  To: Liang Li, mst
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck

Please squish this and patch 5 together.  It makes no sense to separate
them.

> +static void send_unused_pages_info(struct virtio_balloon *vb,
> +				unsigned long req_id)
> +{
> +	struct scatterlist sg_in;
> +	unsigned long pfn = 0, bmap_len, pfn_limit, last_pfn, nr_pfn;
> +	struct virtqueue *vq = vb->req_vq;
> +	struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr;
> +	int ret = 1, used_nr_bmap = 0, i;
> +
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP) &&
> +		vb->nr_page_bmap == 1)
> +		extend_page_bitmap(vb);
> +
> +	pfn_limit = PFNS_PER_BMAP * vb->nr_page_bmap;
> +	mutex_lock(&vb->balloon_lock);
> +	last_pfn = get_max_pfn();
> +
> +	while (ret) {
> +		clear_page_bitmap(vb);
> +		ret = get_unused_pages(pfn, pfn + pfn_limit, vb->page_bitmap,
> +			 PFNS_PER_BMAP, vb->nr_page_bmap);

This changed the underlying data structure without changing the way that
the structure is populated.

This algorithm picks a "PFNS_PER_BMAP * vb->nr_page_bmap"-sized set of
pfns, allocates a bitmap for them, the loops through all zones looking
for pages in any free list that are in that range.

Unpacking all the indirection, it looks like this:

for (pfn = 0; pfn < get_max_pfn(); pfn += BITMAP_SIZE_IN_PFNS)
	for_each_populated_zone(zone)
		for_each_migratetype_order(order, t)
			list_for_each(..., &zone->free_area[order])...

Let's say we do a 32k bitmap that can hold ~1M pages.  That's 4GB of
RAM.  On a 1TB system, that's 256 passes through the top-level loop.
The bottom-level lists have tens of thousands of pages in them, even on
my laptop.  Only 1/256 of these pages will get consumed in a given pass.

That's an awfully inefficient way of doing it.  This patch essentially
changed the data structure without changing the algorithm to populate it.

Please change the *algorithm* to use the new data structure efficiently.
 Such a change would only do a single pass through each freelist, and
would choose whether to use the extent-based (pfn -> range) or
bitmap-based approach based on the contents of the free lists.

You should not be using get_max_pfn().  Any patch set that continues to
use it is not likely to be using a proper algorithm.


* RE: [PATCH kernel v4 7/7] virtio-balloon: tell host vm's unused page info
From: Li, Liang Z @ 2016-11-07  3:37 UTC
  To: Hansen, Dave, mst
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck

> Please squish this and patch 5 together.  It makes no sense to separate them.
> 

OK.

> > +static void send_unused_pages_info(struct virtio_balloon *vb,
> > +				unsigned long req_id)
> > +{
> > +	struct scatterlist sg_in;
> > +	unsigned long pfn = 0, bmap_len, pfn_limit, last_pfn, nr_pfn;
> > +	struct virtqueue *vq = vb->req_vq;
> > +	struct virtio_balloon_resp_hdr *hdr = vb->resp_hdr;
> > +	int ret = 1, used_nr_bmap = 0, i;
> > +
> > +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_PAGE_BITMAP) &&
> > +		vb->nr_page_bmap == 1)
> > +		extend_page_bitmap(vb);
> > +
> > +	pfn_limit = PFNS_PER_BMAP * vb->nr_page_bmap;
> > +	mutex_lock(&vb->balloon_lock);
> > +	last_pfn = get_max_pfn();
> > +
> > +	while (ret) {
> > +		clear_page_bitmap(vb);
> > +		ret = get_unused_pages(pfn, pfn + pfn_limit, vb->page_bitmap,
> > +			 PFNS_PER_BMAP, vb->nr_page_bmap);
> 
> This changed the underlying data structure without changing the way that
> the structure is populated.
> 
> This algorithm picks a "PFNS_PER_BMAP * vb->nr_page_bmap"-sized set of
> pfns, allocates a bitmap for them, the loops through all zones looking for
> pages in any free list that are in that range.
> 
> Unpacking all the indirection, it looks like this:
> 
> for (pfn = 0; pfn < get_max_pfn(); pfn += BITMAP_SIZE_IN_PFNS)
> 	for_each_populated_zone(zone)
> 		for_each_migratetype_order(order, t)
> 			list_for_each(..., &zone->free_area[order])...
> 
> Let's say we do a 32k bitmap that can hold ~1M pages.  That's 4GB of RAM.
> On a 1TB system, that's 256 passes through the top-level loop.
> The bottom-level lists have tens of thousands of pages in them, even on my
> laptop.  Only 1/256 of these pages will get consumed in a given pass.
> 
Your description is not exactly right.
A 32k bitmap is used only when there is little free memory left in the system and when
extend_page_bitmap() has failed to allocate more memory for the bitmap. Otherwise, dozens of
32k split bitmaps will be used; this version limits the bitmap count to 32, which means we can use
at most 32*32 kB for the bitmap, which can cover 128GB of RAM. We can increase the bitmap
count limit to a larger value if 32 is not big enough.

> That's an awfully inefficient way of doing it.  This patch essentially changed
> the data structure without changing the algorithm to populate it.
> 
> Please change the *algorithm* to use the new data structure efficiently.
>  Such a change would only do a single pass through each freelist, and would
> choose whether to use the extent-based (pfn -> range) or bitmap-based
> approach based on the contents of the free lists.

Saving the free page info to a raw bitmap first and then processing the raw bitmap to
get the proper 'extent-based' and 'bitmap-based' data is the most efficient way I could
come up with to save on virtio data transmission.  Do you have a better idea?


In QEMU, no matter how we encode the bitmap, the raw format bitmap will be
used in the end.  But what I did in this version is:
   kernel: get the raw bitmap  --> encode the bitmap
   QEMU: decode the bitmap --> get the raw bitmap

Is it worth doing this kind of job here? We can save on virtio data
transmission, but at the same time we do extra work.

It seems the benefit we get from this feature is not as big as that of fast balloon inflating/deflating.
> 
> You should not be using get_max_pfn().  Any patch set that continues to use
> it is not likely to be using a proper algorithm.

Do you have any suggestion about how to avoid it?

Thanks!
Liang


* Re: [PATCH kernel v4 7/7] virtio-balloon: tell host vm's unused page info
From: Dave Hansen @ 2016-11-07 17:23 UTC
  To: Li, Liang Z, mst
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck

On 11/06/2016 07:37 PM, Li, Liang Z wrote:
>> Let's say we do a 32k bitmap that can hold ~1M pages.  That's 4GB of RAM.
>> On a 1TB system, that's 256 passes through the top-level loop.
>> The bottom-level lists have tens of thousands of pages in them, even on my
>> laptop.  Only 1/256 of these pages will get consumed in a given pass.
>>
> Your description is not exactly right.
> A 32k bitmap is used only when there is little free memory left in the system and when
> extend_page_bitmap() has failed to allocate more memory for the bitmap. Otherwise, dozens of
> 32k split bitmaps will be used; this version limits the bitmap count to 32, which means we can use
> at most 32*32 kB for the bitmap, which can cover 128GB of RAM. We can increase the bitmap
> count limit to a larger value if 32 is not big enough.

OK, so it tries to allocate a large bitmap.  But, if it fails, it will
try to work with a smaller bitmap.  Correct?

So, what's the _worst_ case?  It sounds like it is even worse than I was
positing.

>> That's an awfully inefficient way of doing it.  This patch essentially changed
>> the data structure without changing the algorithm to populate it.
>>
>> Please change the *algorithm* to use the new data structure efficiently.
>>  Such a change would only do a single pass through each freelist, and would
>> choose whether to use the extent-based (pfn -> range) or bitmap-based
>> approach based on the contents of the free lists.
> 
> Saving the free page info to a raw bitmap first and then processing the raw bitmap to
> get the proper 'extent-based' and 'bitmap-based' data is the most efficient way I could
> come up with to save on virtio data transmission.  Do you have a better idea?

That's kinda my point.  This patch *does* processing to try to pack the
bitmaps full of pages from the various pfn ranges.  It's a form of
processing that gets *REALLY*, *REALLY* bad in some (admittedly obscure)
cases.

Let's not pretend that making an essentially unlimited number of passes
over the free lists is not processing.

1. Allocate as large of a bitmap as you can. (what you already do)
2. Iterate from the largest freelist order.  Store those pages in the
   bitmap.
3. If you can no longer fit pages in the bitmap, return the list that
   you have.
4. Make an approximation about where the bitmap does not make sense any
   more, and fall back to listing individual PFNs.  This would make sense,
   for instance, in a large zone with very few free order-0 pages left.
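
A rough sketch of that single-pass shape (untested; bitmap_has_room(),
record_page() and record_pfn() are invented placeholders for the real
reporting code, not functions from this patch set):

#include <linux/mm.h>
#include <linux/mmzone.h>

/* Hypothetical reporting helpers, not part of this patch set. */
extern bool bitmap_has_room(unsigned long pfn, int order);
extern void record_page(unsigned long pfn, int order);	/* into the bitmap */
extern void record_pfn(unsigned long pfn, int order);	/* individual PFN  */

/*
 * Untested sketch of the single pass above: walk each zone's free lists
 * once, highest order first; large blocks go into the bitmap, and once
 * it no longer fits we fall back to listing individual PFNs (step 4).
 */
static void report_unused_pages(void)
{
	struct zone *zone;
	struct page *page;
	int order, mt;

	for_each_populated_zone(zone) {
		spin_lock_irq(&zone->lock);
		for (order = MAX_ORDER - 1; order >= 0; order--) {
			for (mt = 0; mt < MIGRATE_TYPES; mt++) {
				struct list_head *list =
					&zone->free_area[order].free_list[mt];

				list_for_each_entry(page, list, lru) {
					unsigned long pfn = page_to_pfn(page);

					if (bitmap_has_room(pfn, order))
						record_page(pfn, order);
					else
						record_pfn(pfn, order);
				}
			}
		}
		spin_unlock_irq(&zone->lock);
	}
}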

> It seems the benefit we get from this feature is not as big as the one from fast balloon inflating/deflating.
>>
>> You should not be using get_max_pfn().  Any patch set that continues to use
>> it is not likely to be using a proper algorithm.
> 
> Do you have any suggestion about how to avoid it?

Yes: get the pfns from the page free lists alone.  Don't derive them
from the pfn limits of the system or zones.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH kernel v4 7/7] virtio-balloon: tell host vm's unused page info
  2016-11-07 17:23       ` Dave Hansen
@ 2016-11-08  5:50         ` Li, Liang Z
  2016-11-08 18:30           ` Dave Hansen
  2016-11-08 21:07         ` Michael S. Tsirkin
  1 sibling, 1 reply; 14+ messages in thread
From: Li, Liang Z @ 2016-11-08  5:50 UTC (permalink / raw)
  To: Hansen, Dave, mst
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck

> On 11/06/2016 07:37 PM, Li, Liang Z wrote:
> >> Let's say we do a 32k bitmap that can hold ~1M pages.  That's 4GB of RAM.
> >> On a 1TB system, that's 256 passes through the top-level loop.
> >> The bottom-level lists have tens of thousands of pages in them, even
> >> on my laptop.  Only 1/256 of these pages will get consumed in a given pass.
> >>
> > Your description is not exactly right.
> > A 32k bitmap is used only when there is little free memory left in the
> > system and when the extend_page_bitmap() failed to allocate more
> > memory for the bitmap. Otherwise, dozens of 32k split bitmaps will be used;
> > this version limits the bitmap count to 32, which means we can use at most
> > 32*32 kB for the bitmaps, covering 128GB of RAM. We can increase
> > the bitmap count limit to a larger value if 32 is not big enough.
> 
> OK, so it tries to allocate a large bitmap.  But, if it fails, it will try to work with a
> smaller bitmap.  Correct?
> 
Yes.

> So, what's the _worst_ case?  It sounds like it is even worse than I was
> positing.
> 

The worst case is when only a single 32KB bitmap can be allocated and there is a huge amount of low-order (<3) free pages.

> >> That's an awfully inefficient way of doing it.  This patch
> >> essentially changed the data structure without changing the algorithm to
> >> populate it.
> >>
> >> Please change the *algorithm* to use the new data structure efficiently.
> >>  Such a change would only do a single pass through each freelist, and
> >> would choose whether to use the extent-based (pfn -> range) or
> >> bitmap-based approach based on the contents of the free lists.
> >
> > Saving the free page info to a raw bitmap first and then processing the raw
> > bitmap to produce the proper 'extent-based' and 'bitmap-based' data is the
> > most efficient way I could come up with to reduce the virtio data transmission.
> > Do you have a better idea?
> 
> That's kinda my point.  This patch *does* processing to try to pack the
> bitmaps full of pages from the various pfn ranges.  It's a form of processing
> that gets *REALLY*, *REALLY* bad in some (admittedly obscure) cases.
> 
> Let's not pretend that making an essentially unlimited number of passes over
> the free lists is not processing.
> 
> 1. Allocate as large of a bitmap as you can. (what you already do)
> 2. Iterate from the largest freelist order.  Store those pages in the
>    bitmap.
> 3. If you can no longer fit pages in the bitmap, return the list that
>    you have.
> 4. Make an approximation about where the bitmap does not make sense any
>    more, and fall back to listing individual PFNs.  This would make sense,
>    for instance, in a large zone with very few free order-0 pages left.
> 
Sounds good.  Should we ignore some of the order-0 pages in step 4 if the bitmap is full?
Or should we retry to get a complete list of order-0 pages?

> 
> > It seems the benefit we get from this feature is not as big as the one from
> > fast balloon inflating/deflating.
> >>
> >> You should not be using get_max_pfn().  Any patch set that continues
> >> to use it is not likely to be using a proper algorithm.
> >
> > Do you have any suggestion about how to avoid it?
> 
> Yes: get the pfns from the page free lists alone.  Don't derive them from the
> pfn limits of the system or zones.

The get_max_pfn() can be avoided in this patch, but I think we can't avoid it completely.
We need it as a hint for allocating a properly sized bitmap. No?

Thanks!
Liang

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH kernel v4 7/7] virtio-balloon: tell host vm's unused page info
  2016-11-08  5:50         ` Li, Liang Z
@ 2016-11-08 18:30           ` Dave Hansen
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Hansen @ 2016-11-08 18:30 UTC (permalink / raw)
  To: Li, Liang Z, mst
  Cc: pbonzini, amit.shah, quintela, dgilbert, qemu-devel, kvm,
	virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck

On 11/07/2016 09:50 PM, Li, Liang Z wrote:
> Sounds good.  Should we ignore some of the order-0 pages in step 4 if the bitmap is full?
> Or should we retry to get a complete list of order-0 pages?

I think that's a pretty reasonable thing to do.

>>> It seems the benefit we get from this feature is not as big as the one
>>> from fast balloon inflating/deflating.
>>>>
>>>> You should not be using get_max_pfn().  Any patch set that continues
>>>> to use it is not likely to be using a proper algorithm.
>>>
>>> Do you have any suggestion about how to avoid it?
>>
>> Yes: get the pfns from the page free lists alone.  Don't derive
>> them from the pfn limits of the system or zones.
> 
> The get_max_pfn() can be avoided in this patch, but I think we can't
> avoid it completely. We need it as a hint for allocating a properly
> sized bitmap. No?

If you start with higher-order pages, you'll be unlikely to get anywhere
close to filling up a bitmap that was sized to hold all possible order-0
pages on the system.  Any use of max_pfn also means that you'll
completely mis-size bitmaps on sparse systems with large holes.

I think you should size it based on the size of the free lists, if anything.
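
Roughly like this (a sketch only; an existing counter such as NR_FREE_PAGES
could serve the same purpose):

#include <linux/mmzone.h>

/*
 * Sketch: derive the bitmap size from the free lists instead of
 * max_pfn.  nr_free counts blocks per order, so shift by order to
 * count base pages.  Reading nr_free without zone->lock is racy,
 * but fine for a sizing hint.
 */
static unsigned long estimate_free_pages(void)
{
	struct zone *zone;
	unsigned long nr = 0;
	int order;

	for_each_populated_zone(zone)
		for (order = 0; order < MAX_ORDER; order++)
			nr += zone->free_area[order].nr_free << order;

	return nr;	/* upper bound on the bits the bitmap may need */
}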

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH kernel v4 7/7] virtio-balloon: tell host vm's unused page info
  2016-11-07 17:23       ` Dave Hansen
  2016-11-08  5:50         ` Li, Liang Z
@ 2016-11-08 21:07         ` Michael S. Tsirkin
  1 sibling, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2016-11-08 21:07 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Li, Liang Z, pbonzini, amit.shah, quintela, dgilbert, qemu-devel,
	kvm, virtio-dev, linux-mm, linux-kernel, virtualization, mgorman,
	cornelia.huck

On Mon, Nov 07, 2016 at 09:23:38AM -0800, Dave Hansen wrote:
> On 11/06/2016 07:37 PM, Li, Liang Z wrote:
> >> Let's say we do a 32k bitmap that can hold ~1M pages.  That's 4GB of RAM.
> >> On a 1TB system, that's 256 passes through the top-level loop.
> >> The bottom-level lists have tens of thousands of pages in them, even on my
> >> laptop.  Only 1/256 of these pages will get consumed in a given pass.
> >>
> > Your description is not exactly right.
> > A 32k bitmap is used only when there is little free memory left in the system and when 
> > the extend_page_bitmap() failed to allocate more memory for the bitmap. Otherwise, dozens of 
> > 32k split bitmaps will be used; this version limits the bitmap count to 32, which means we can use
> > at most 32*32 kB for the bitmaps, covering 128GB of RAM. We can increase the bitmap
> > count limit to a larger value if 32 is not big enough.
> 
> OK, so it tries to allocate a large bitmap.  But, if it fails, it will
> try to work with a smaller bitmap.  Correct?
> 
> So, what's the _worst_ case?  It sounds like it is even worse than I was
> positing.
> 
> >> That's an awfully inefficient way of doing it.  This patch essentially changed
> >> the data structure without changing the algorithm to populate it.
> >>
> >> Please change the *algorithm* to use the new data structure efficiently.
> >>  Such a change would only do a single pass through each freelist, and would
> >> choose whether to use the extent-based (pfn -> range) or bitmap-based
> >> approach based on the contents of the free lists.
> > 
> > Saving the free page info to a raw bitmap first and then processing the raw bitmap to
> > produce the proper 'extent-based' and 'bitmap-based' data is the most efficient way I
> > could come up with to reduce the virtio data transmission.  Do you have a better idea?
> 
> That's kinda my point.  This patch *does* processing to try to pack the
> bitmaps full of pages from the various pfn ranges.  It's a form of
> processing that gets *REALLY*, *REALLY* bad in some (admittedly obscure)
> cases.
> 
> Let's not pretend that making an essentially unlimited number of passes
> over the free lists is not processing.
> 
> 1. Allocate as large of a bitmap as you can. (what you already do)
> 2. Iterate from the largest freelist order.  Store those pages in the
>    bitmap.
> 3. If you can no longer fit pages in the bitmap, return the list that
>    you have.
> 4. Make an approximation about where the bitmap does not make sense any
>    more, and fall back to listing individual PFNs.  This would make sense,
>    for instance, in a large zone with very few free order-0 pages left.

In practice, a single PFN using the bitmap format
only takes up twice the size: I think it's 128 instead of 64 bits
per entry.

So it's not a given that point 4 is worth it at any point;
just packing multiple bitmaps might be good enough.
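
(Back of the envelope, assuming a record of two 64-bit fields -- start pfn
plus bitmap length -- followed by the bitmap itself; that layout is an
assumption for the arithmetic, not the exact one in the patch.  An isolated
PFN then costs 64 + 64 = 128 bits in bitmap form versus 64 bits as a bare
PFN: a constant 2x, not an unbounded blow-up.)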

> 
> > It seems the benefit we get from this feature is not as big as the one from fast balloon inflating/deflating.
> >>
> >> You should not be using get_max_pfn().  Any patch set that continues to use
> >> it is not likely to be using a proper algorithm.
> > 
> > Do you have any suggestion about how to avoid it?
> 
> Yes: get the pfns from the page free lists alone.  Don't derive them
> from the pfn limits of the system or zones.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread

Thread overview: 14 messages
2016-11-02  6:17 [PATCH kernel v4 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration Liang Li
2016-11-02  6:17 ` [PATCH kernel v4 1/7] virtio-balloon: rework deflate to add page to a list Liang Li
2016-11-02  6:17 ` [PATCH kernel v4 2/7] virtio-balloon: define new feature bit and head struct Liang Li
2016-11-02  6:17 ` [PATCH kernel v4 3/7] mm: add a function to get the max pfn Liang Li
2016-11-02  6:17 ` [PATCH kernel v4 4/7] virtio-balloon: speed up inflate/deflate process Liang Li
2016-11-02  6:17 ` [PATCH kernel v4 5/7] mm: add the related functions to get unused page Liang Li
2016-11-02  6:17 ` [PATCH kernel v4 6/7] virtio-balloon: define flags and head for host request vq Liang Li
2016-11-02  6:17 ` [PATCH kernel v4 7/7] virtio-balloon: tell host vm's unused page info Liang Li
2016-11-04 18:10   ` Dave Hansen
2016-11-07  3:37     ` Li, Liang Z
2016-11-07 17:23       ` Dave Hansen
2016-11-08  5:50         ` Li, Liang Z
2016-11-08 18:30           ` Dave Hansen
2016-11-08 21:07         ` Michael S. Tsirkin
