QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue
@ 2020-03-22 16:04 Hui Zhu
  2020-03-22 16:04 ` [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT Hui Zhu
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Hui Zhu @ 2020-03-22 16:04 UTC (permalink / raw)
  To: mst, jasowang, akpm, mojha, pagupta, aquini, namit, david,
	virtualization, linux-kernel, linux-mm, qemu-devel
  Cc: Hui Zhu, Hui Zhu

The first version is in [1].
According to the comments from Michael and David, I updated the patch.
1. Added a separate vq inflate_cont_vq to transport inflate continuous
   pages.
2. Set all the pages in the continuous pages movable then they can be
   compaction.
3. Added a new element pages_order to virtio_balloon_config.  It is the
   inflate pages order that is set by the QEMU.
4. If the balloon cannot get any continuous pages from the system.
   Go back to use the single page to fill the balloon.
5.  Use balloon_pages_alloc to allocate the single page and continuous
    pages.  Replace __GFP_NORETRY with __GFP_RETRY_MAYFAIL when allocating
    the continuous pages because it can increase the success rate of
    allocating large chunks of memory.

Following is the introduction of the function.
If the guest kernel has many fragmentation pages, use virtio_balloon
will split THP of QEMU when it calls MADV_DONTNEED madvise to release
the balloon pages.
This is an example in a VM with 1G memory 1CPU:
// This is the THP number before VM execution in the host.
// None use THP.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages:         0 kB

// After VM start, use usemem
// (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
// punch-holes function generates 400m fragmentation pages in the guest
// kernel.
usemem --punch-holes -s -1 800m &

// This is the THP number after this command in the host.
// Some THP is used by VM because usemem will access 800M memory
// in the guest.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages:    976896 kB

// Connect to the QEMU monitor, setup balloon, and set it size to 600M.
(qemu) device_add virtio-balloon-pci,id=balloon1
(qemu) info balloon
balloon: actual=1024
(qemu) balloon 600
(qemu) info balloon
balloon: actual=600

// This is the THP number after inflate the balloon in the host.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages:    151552 kB

THP number decreased more than 800M.
The reason is usemem with punch-holes option will free every other
page after allocation.  Then 400M free memory inside the guest kernel
is fragmentation pages.
The guest kernel will use them to inflate the balloon.  When these
fragmentation pages are freed, THP will be split.

This commit tries to handle this with add a new flag
VIRTIO_BALLOON_VQ_INFLATE_CONT.
When this flag is set, the balloon will try to use continuous pages
inflate the balloon.  And the pages order is set to THP order.
Then THP pages will be freed together in the host.
This is an example in a VM with 1G memory 1CPU:
// This is the THP number before VM execution in the host.
// None use THP.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages:         0 kB

// After VM start, use usemem punch-holes function generates 400M
// fragmentation pages in the guest kernel.
usemem --punch-holes -s -1 800m &

// This is the THP number after this command in the host.
// Some THP is used by VM because usemem will access 800M memory
// in the guest.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages:    976896 kB

// Connect to the QEMU monitor, setup balloon, and set it size to 600M.
(qemu) device_add virtio-balloon-pci,id=balloon1,cont-pages=on
(qemu) info balloon
balloon: actual=1024
(qemu) balloon 600
(qemu) info balloon
balloon: actual=600

// This is the THP number after inflate the balloon in the host.
cat /proc/meminfo | grep AnonHugePages:
AnonHugePages:    610304 kB

The THP number decreases 358M.  This shows that
VIRTIO_BALLOON_VQ_INFLATE_CONT can help handle the THP split issue.

[1] https://lkml.org/lkml/2020/3/12/144

Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
---
 drivers/virtio/virtio_balloon.c     | 78 ++++++++++++++++++++++++++++++-------
 include/linux/balloon_compaction.h  |  9 ++++-
 include/uapi/linux/virtio_balloon.h |  3 ++
 mm/balloon_compaction.c             | 40 +++++++++++++++----
 4 files changed, 109 insertions(+), 21 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 341458f..fbd2b02f 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -47,6 +47,7 @@ enum virtio_balloon_vq {
 	VIRTIO_BALLOON_VQ_DEFLATE,
 	VIRTIO_BALLOON_VQ_STATS,
 	VIRTIO_BALLOON_VQ_FREE_PAGE,
+	VIRTIO_BALLOON_VQ_INFLATE_CONT,
 	VIRTIO_BALLOON_VQ_MAX
 };
 
@@ -56,7 +57,8 @@ enum virtio_balloon_config_read {
 
 struct virtio_balloon {
 	struct virtio_device *vdev;
-	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq,
+			 *inflate_cont_vq;
 
 	/* Balloon's own wq for cpu-intensive work items */
 	struct workqueue_struct *balloon_wq;
@@ -114,6 +116,12 @@ struct virtio_balloon {
 
 	/* To register a shrinker to shrink memory upon memory pressure */
 	struct shrinker shrinker;
+
+	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set.
+	 * if VIRTIO_BALLOON_F_CONT_PAGES is not set,
+	 * pages_order will be set to 0.
+	 */
+	__u32 pages_order;
 };
 
 static struct virtio_device_id id_table[] = {
@@ -169,10 +177,11 @@ static void set_page_pfns(struct virtio_balloon *vb,
 					  page_to_balloon_pfn(page) + i);
 }
 
-static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned int fill_balloon_order(struct virtio_balloon *vb, size_t num,
+				       unsigned int order)
 {
-	unsigned num_allocated_pages;
-	unsigned num_pfns;
+	unsigned int num_allocated_pages;
+	unsigned int num_pfns;
 	struct page *page;
 	LIST_HEAD(pages);
 
@@ -181,12 +190,12 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 
 	for (num_pfns = 0; num_pfns < num;
 	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
-		struct page *page = balloon_page_alloc();
+		struct page *page = balloon_pages_alloc(order);
 
 		if (!page) {
 			dev_info_ratelimited(&vb->vdev->dev,
-					     "Out of puff! Can't get %u pages\n",
-					     VIRTIO_BALLOON_PAGES_PER_PAGE);
+				"Out of puff! Can't get %u pages\n",
+				VIRTIO_BALLOON_PAGES_PER_PAGE << order);
 			/* Sleep for at least 1/5 of a second before retry. */
 			msleep(200);
 			break;
@@ -200,25 +209,48 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
 	vb->num_pfns = 0;
 
 	while ((page = balloon_page_pop(&pages))) {
-		balloon_page_enqueue(&vb->vb_dev_info, page);
+		balloon_pages_enqueue(&vb->vb_dev_info, page, order);
 
 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
-		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
+		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE << order;
 		if (!virtio_has_feature(vb->vdev,
 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
-			adjust_managed_page_count(page, -1);
+			adjust_managed_page_count(page, -(1 << order));
 		vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
 	}
 
-	num_allocated_pages = vb->num_pfns;
+	num_allocated_pages = vb->num_pfns << order;
 	/* Did we get any? */
-	if (vb->num_pfns != 0)
-		tell_host(vb, vb->inflate_vq);
+	if (vb->num_pfns != 0) {
+		if (order)
+			tell_host(vb, vb->inflate_cont_vq);
+		else
+			tell_host(vb, vb->inflate_vq);
+	}
 	mutex_unlock(&vb->balloon_lock);
 
 	return num_allocated_pages;
 }
 
+static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num)
+{
+	unsigned int num_allocated_pages = 0;
+
+	if (vb->pages_order) {
+		num_allocated_pages = fill_balloon_order(vb, num,
+							 vb->pages_order);
+		if (!num_allocated_pages)
+			dev_info_ratelimited(&vb->vdev->dev,
+				"Fail get any continuous pages in order %u\n",
+				vb->pages_order);
+	}
+
+	if (!num_allocated_pages)
+		num_allocated_pages = fill_balloon_order(vb, num, 0);
+
+	return num_allocated_pages;
+}
+
 static void release_pages_balloon(struct virtio_balloon *vb,
 				 struct list_head *pages)
 {
@@ -481,6 +513,8 @@ static int init_vqs(struct virtio_balloon *vb)
 	names[VIRTIO_BALLOON_VQ_STATS] = NULL;
 	callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
 	names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
+	callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
+	names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
 
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
 		names[VIRTIO_BALLOON_VQ_STATS] = "stats";
@@ -492,6 +526,11 @@ static int init_vqs(struct virtio_balloon *vb)
 		callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
 	}
 
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
+		names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = "inflate_cont";
+		callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = balloon_ack;
+	}
+
 	err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
 					 vqs, callbacks, names, NULL, NULL);
 	if (err)
@@ -524,6 +563,10 @@ static int init_vqs(struct virtio_balloon *vb)
 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
 		vb->free_page_vq = vqs[VIRTIO_BALLOON_VQ_FREE_PAGE];
 
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES))
+		vb->inflate_cont_vq
+			= vqs[VIRTIO_BALLOON_VQ_INFLATE_CONT];
+
 	return 0;
 }
 
@@ -945,6 +988,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
 		if (err)
 			goto out_del_balloon_wq;
 	}
+	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
+		virtio_cread(vb->vdev, struct virtio_balloon_config,
+			     pages_order, &vb->pages_order);
+		if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
+			vb->pages_order
+				= le32_to_cpu((__force __le32)vb->pages_order);
+	} else
+		vb->pages_order = 0;
 	virtio_device_ready(vdev);
 
 	if (towards_target(vb))
@@ -1058,6 +1109,7 @@ static unsigned int features[] = {
 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
 	VIRTIO_BALLOON_F_FREE_PAGE_HINT,
 	VIRTIO_BALLOON_F_PAGE_POISON,
+	VIRTIO_BALLOON_F_CONT_PAGES,
 };
 
 static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
index 338aa27..8180bbf 100644
--- a/include/linux/balloon_compaction.h
+++ b/include/linux/balloon_compaction.h
@@ -60,15 +60,22 @@ struct balloon_dev_info {
 	struct inode *inode;
 };
 
-extern struct page *balloon_page_alloc(void);
+extern struct page *balloon_pages_alloc(unsigned int order);
 extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
 				 struct page *page);
+extern void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
+				  struct page *page, unsigned int order);
 extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
 extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
 				      struct list_head *pages);
 extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
 				     struct list_head *pages, size_t n_req_pages);
 
+static inline struct page *balloon_page_alloc(void)
+{
+	return balloon_pages_alloc(0);
+}
+
 static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
 {
 	balloon->isolated_pages = 0;
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index a1966cd7..e21b0aa 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_CONT_PAGES	5 /* VQ to report continuous pages */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -51,6 +52,8 @@ struct virtio_balloon_config {
 	__u32 free_page_report_cmd_id;
 	/* Stores PAGE_POISON if page poisoning is in use */
 	__u32 poison_val;
+	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set */
+	__u32 pages_order;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
index 26de020..87df4b2 100644
--- a/mm/balloon_compaction.c
+++ b/mm/balloon_compaction.c
@@ -112,8 +112,8 @@ size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
 EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
 
 /*
- * balloon_page_alloc - allocates a new page for insertion into the balloon
- *			page list.
+ * balloon_pages_alloc - allocates a new page for insertion into the balloon
+ *			 page list.
  *
  * Driver must call this function to properly allocate a new balloon page.
  * Driver must call balloon_page_enqueue before definitively removing the page
@@ -121,14 +121,21 @@ EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
  *
  * Return: struct page for the allocated page or NULL on allocation failure.
  */
-struct page *balloon_page_alloc(void)
+struct page *balloon_pages_alloc(unsigned int order)
 {
-	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
-				       __GFP_NOMEMALLOC | __GFP_NORETRY |
-				       __GFP_NOWARN);
+	gfp_t gfp_mask;
+
+	if (order > 1)
+		gfp_mask = __GFP_RETRY_MAYFAIL;
+	else
+		gfp_mask = __GFP_NORETRY;
+
+	struct page *page = alloc_pages(balloon_mapping_gfp_mask() |
+					gfp_mask | __GFP_NOMEMALLOC |
+					__GFP_NOWARN, order);
 	return page;
 }
-EXPORT_SYMBOL_GPL(balloon_page_alloc);
+EXPORT_SYMBOL_GPL(balloon_pages_alloc);
 
 /*
  * balloon_page_enqueue - inserts a new page into the balloon page list.
@@ -155,6 +162,25 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
 EXPORT_SYMBOL_GPL(balloon_page_enqueue);
 
 /*
+ * balloon_pages_enqueue - inserts continuous pages into the balloon page list.
+ */
+void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
+			   struct page *page, unsigned int order)
+{
+	unsigned long flags;
+	unsigned long pfn = page_to_pfn(page);
+	unsigned long last_pfn = pfn + (1 << order) - 1;
+
+	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
+	for (; pfn <= last_pfn; pfn++) {
+		page = pfn_to_page(pfn);
+		balloon_page_enqueue_one(b_dev_info, page);
+	}
+	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
+}
+EXPORT_SYMBOL_GPL(balloon_pages_enqueue);
+
+/*
  * balloon_page_dequeue - removes a page from balloon's page list and returns
  *			  its address to allow the driver to release the page.
  * @b_dev_info: balloon device decriptor where we will grab a page from.
-- 
2.7.4



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT
  2020-03-22 16:04 [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue Hui Zhu
@ 2020-03-22 16:04 ` Hui Zhu
  2020-03-22 16:19   ` no-reply
                     ` (2 more replies)
  2020-03-26  6:45 ` [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue teawater
  2020-03-26  7:17 ` Michael S. Tsirkin
  2 siblings, 3 replies; 9+ messages in thread
From: Hui Zhu @ 2020-03-22 16:04 UTC (permalink / raw)
  To: mst, jasowang, akpm, mojha, pagupta, aquini, namit, david,
	virtualization, linux-kernel, linux-mm, qemu-devel
  Cc: Hui Zhu, Hui Zhu

If the guest kernel has many fragmentation pages, use virtio_balloon
will split THP of QEMU when it calls MADV_DONTNEED madvise to release
the balloon pages.
Set option cont-pages to on will open flags VIRTIO_BALLOON_VQ_INFLATE_CONT
and set continuous pages order to THP order.
Then It will get continuous pages PFN from VQ icvq use use madvise
MADV_DONTNEED release the THP page.
This will handle the THP split issue.

Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
---
 hw/virtio/virtio-balloon.c                      | 32 +++++++++++++++++++++----
 include/hw/virtio/virtio-balloon.h              |  4 +++-
 include/standard-headers/linux/virtio_balloon.h |  4 ++++
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7..88bdaca 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -34,6 +34,7 @@
 #include "hw/virtio/virtio-access.h"
 
 #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
+#define CONT_PAGES_ORDER   9
 
 typedef struct PartiallyBalloonedPage {
     ram_addr_t base_gpa;
@@ -65,7 +66,8 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
 
 static void balloon_inflate_page(VirtIOBalloon *balloon,
                                  MemoryRegion *mr, hwaddr mr_offset,
-                                 PartiallyBalloonedPage *pbp)
+                                 PartiallyBalloonedPage *pbp, 
+                                 bool is_cont_pages)
 {
     void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
     ram_addr_t rb_offset, rb_aligned_offset, base_gpa;
@@ -76,6 +78,13 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
     /* XXX is there a better way to get to the RAMBlock than via a
      * host address? */
     rb = qemu_ram_block_from_host(addr, false, &rb_offset);
+
+    if (is_cont_pages) {
+        ram_block_discard_range(rb, rb_offset,
+                                BALLOON_PAGE_SIZE << CONT_PAGES_ORDER);
+        return;
+    }
+
     rb_page_size = qemu_ram_pagesize(rb);
 
     if (rb_page_size == BALLOON_PAGE_SIZE) {
@@ -361,9 +370,10 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
             trace_virtio_balloon_handle_output(memory_region_name(section.mr),
                                                pa);
             if (!qemu_balloon_is_inhibited()) {
-                if (vq == s->ivq) {
+                if (vq == s->ivq || vq == s->icvq) {
                     balloon_inflate_page(s, section.mr,
-                                         section.offset_within_region, &pbp);
+                                         section.offset_within_region, &pbp,
+                                         vq == s->icvq);
                 } else if (vq == s->dvq) {
                     balloon_deflate_page(s, section.mr, section.offset_within_region);
                 } else {
@@ -618,9 +628,12 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
     if (s->qemu_4_0_config_size) {
         return sizeof(struct virtio_balloon_config);
     }
-    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
+    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
         return sizeof(struct virtio_balloon_config);
     }
+    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
+        return offsetof(struct virtio_balloon_config, pages_order);
+    }
     if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
         return offsetof(struct virtio_balloon_config, poison_val);
     }
@@ -646,6 +659,10 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
                        cpu_to_le32(VIRTIO_BALLOON_CMD_ID_DONE);
     }
 
+    if (virtio_has_feature(dev->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
+        config.pages_order = cpu_to_le32(CONT_PAGES_ORDER);
+    }
+
     trace_virtio_balloon_get_config(config.num_pages, config.actual);
     memcpy(config_data, &config, virtio_balloon_config_size(dev));
 }
@@ -816,6 +833,11 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
             virtio_error(vdev, "iothread is missing");
         }
     }
+
+    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
+        s->icvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
+    }
+
     reset_stats(s);
 }
 
@@ -916,6 +938,8 @@ static Property virtio_balloon_properties[] = {
                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
     DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
                     VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+    DEFINE_PROP_BIT("cont-pages", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_CONT_PAGES, false),
     /* QEMU 4.0 accidentally changed the config size even when free-page-hint
      * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
      * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index d1c968d..61d2419 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
 
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq;
     uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;
@@ -70,6 +70,8 @@ typedef struct VirtIOBalloon {
     uint32_t host_features;
 
     bool qemu_4_0_config_size;
+
+    uint32_t pages_order;
 } VirtIOBalloon;
 
 #endif
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2..ee18be7 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,8 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_CONT_PAGES	5 /* VQ to report continuous pages */
+
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
@@ -51,6 +53,8 @@ struct virtio_balloon_config {
 	uint32_t free_page_report_cmd_id;
 	/* Stores PAGE_POISON if page poisoning is in use */
 	uint32_t poison_val;
+	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set */
+	uint32_t pages_order;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
-- 
2.7.4



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT
  2020-03-22 16:04 ` [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT Hui Zhu
@ 2020-03-22 16:19   ` no-reply
  2020-03-26  7:06   ` teawater
  2020-03-26  7:25   ` Michael S. Tsirkin
  2 siblings, 0 replies; 9+ messages in thread
From: no-reply @ 2020-03-22 16:19 UTC (permalink / raw)
  To: teawater
  Cc: pagupta, aquini, mst, qemu-devel, mojha, david, linux-kernel,
	virtualization, linux-mm, namit, teawaterz, akpm, jasowang,
	teawater

Patchew URL: https://patchew.org/QEMU/1584893097-12317-2-git-send-email-teawater@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT
Message-id: 1584893097-12317-2-git-send-email-teawater@gmail.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
206d1fd virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT

=== OUTPUT BEGIN ===
ERROR: trailing whitespace
#36: FILE: hw/virtio/virtio-balloon.c:69:
+                                 PartiallyBalloonedPage *pbp, $

total: 1 errors, 0 warnings, 115 lines checked

Commit 206d1fd85a23 (virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/1584893097-12317-2-git-send-email-teawater@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue
  2020-03-22 16:04 [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue Hui Zhu
  2020-03-22 16:04 ` [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT Hui Zhu
@ 2020-03-26  6:45 ` teawater
  2020-03-26  7:17 ` Michael S. Tsirkin
  2 siblings, 0 replies; 9+ messages in thread
From: teawater @ 2020-03-26  6:45 UTC (permalink / raw)
  To: Hui Zhu
  Cc: pagupta, aquini, Michael S. Tsirkin, qemu-devel, mojha, david,
	linux-kernel, virtualization, linux-mm, namit, Andrew Morton,
	Jason Wang

Ping.

Thanks,
Hui

> 2020年3月23日 00:04,Hui Zhu <teawater@gmail.com> :
> 
> The first version is in [1].
> According to the comments from Michael and David, I updated the patch.
> 1. Added a separate vq inflate_cont_vq to transport inflate continuous
>   pages.
> 2. Set all the pages in the continuous pages movable then they can be
>   compaction.
> 3. Added a new element pages_order to virtio_balloon_config.  It is the
>   inflate pages order that is set by the QEMU.
> 4. If the balloon cannot get any continuous pages from the system.
>   Go back to use the single page to fill the balloon.
> 5.  Use balloon_pages_alloc to allocate the single page and continuous
>    pages.  Replace __GFP_NORETRY with __GFP_RETRY_MAYFAIL when allocating
>    the continuous pages because it can increase the success rate of
>    allocating large chunks of memory.
> 
> Following is the introduction of the function.
> If the guest kernel has many fragmentation pages, use virtio_balloon
> will split THP of QEMU when it calls MADV_DONTNEED madvise to release
> the balloon pages.
> This is an example in a VM with 1G memory 1CPU:
> // This is the THP number before VM execution in the host.
> // None use THP.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:         0 kB
> 
> // After VM start, use usemem
> // (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
> // punch-holes function generates 400m fragmentation pages in the guest
> // kernel.
> usemem --punch-holes -s -1 800m &
> 
> // This is the THP number after this command in the host.
> // Some THP is used by VM because usemem will access 800M memory
> // in the guest.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:    976896 kB
> 
> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
> (qemu) device_add virtio-balloon-pci,id=balloon1
> (qemu) info balloon
> balloon: actual=1024
> (qemu) balloon 600
> (qemu) info balloon
> balloon: actual=600
> 
> // This is the THP number after inflate the balloon in the host.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:    151552 kB
> 
> THP number decreased more than 800M.
> The reason is usemem with punch-holes option will free every other
> page after allocation.  Then 400M free memory inside the guest kernel
> is fragmentation pages.
> The guest kernel will use them to inflate the balloon.  When these
> fragmentation pages are freed, THP will be split.
> 
> This commit tries to handle this with add a new flag
> VIRTIO_BALLOON_VQ_INFLATE_CONT.
> When this flag is set, the balloon will try to use continuous pages
> inflate the balloon.  And the pages order is set to THP order.
> Then THP pages will be freed together in the host.
> This is an example in a VM with 1G memory 1CPU:
> // This is the THP number before VM execution in the host.
> // None use THP.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:         0 kB
> 
> // After VM start, use usemem punch-holes function generates 400M
> // fragmentation pages in the guest kernel.
> usemem --punch-holes -s -1 800m &
> 
> // This is the THP number after this command in the host.
> // Some THP is used by VM because usemem will access 800M memory
> // in the guest.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:    976896 kB
> 
> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
> (qemu) device_add virtio-balloon-pci,id=balloon1,cont-pages=on
> (qemu) info balloon
> balloon: actual=1024
> (qemu) balloon 600
> (qemu) info balloon
> balloon: actual=600
> 
> // This is the THP number after inflate the balloon in the host.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:    610304 kB
> 
> The THP number decreases 358M.  This shows that
> VIRTIO_BALLOON_VQ_INFLATE_CONT can help handle the THP split issue.
> 
> [1] https://lkml.org/lkml/2020/3/12/144
> 
> Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
> ---
> drivers/virtio/virtio_balloon.c     | 78 ++++++++++++++++++++++++++++++-------
> include/linux/balloon_compaction.h  |  9 ++++-
> include/uapi/linux/virtio_balloon.h |  3 ++
> mm/balloon_compaction.c             | 40 +++++++++++++++----
> 4 files changed, 109 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 341458f..fbd2b02f 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -47,6 +47,7 @@ enum virtio_balloon_vq {
> 	VIRTIO_BALLOON_VQ_DEFLATE,
> 	VIRTIO_BALLOON_VQ_STATS,
> 	VIRTIO_BALLOON_VQ_FREE_PAGE,
> +	VIRTIO_BALLOON_VQ_INFLATE_CONT,
> 	VIRTIO_BALLOON_VQ_MAX
> };
> 
> @@ -56,7 +57,8 @@ enum virtio_balloon_config_read {
> 
> struct virtio_balloon {
> 	struct virtio_device *vdev;
> -	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq,
> +			 *inflate_cont_vq;
> 
> 	/* Balloon's own wq for cpu-intensive work items */
> 	struct workqueue_struct *balloon_wq;
> @@ -114,6 +116,12 @@ struct virtio_balloon {
> 
> 	/* To register a shrinker to shrink memory upon memory pressure */
> 	struct shrinker shrinker;
> +
> +	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set.
> +	 * if VIRTIO_BALLOON_F_CONT_PAGES is not set,
> +	 * pages_order will be set to 0.
> +	 */
> +	__u32 pages_order;
> };
> 
> static struct virtio_device_id id_table[] = {
> @@ -169,10 +177,11 @@ static void set_page_pfns(struct virtio_balloon *vb,
> 					  page_to_balloon_pfn(page) + i);
> }
> 
> -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned int fill_balloon_order(struct virtio_balloon *vb, size_t num,
> +				       unsigned int order)
> {
> -	unsigned num_allocated_pages;
> -	unsigned num_pfns;
> +	unsigned int num_allocated_pages;
> +	unsigned int num_pfns;
> 	struct page *page;
> 	LIST_HEAD(pages);
> 
> @@ -181,12 +190,12 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> 
> 	for (num_pfns = 0; num_pfns < num;
> 	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> -		struct page *page = balloon_page_alloc();
> +		struct page *page = balloon_pages_alloc(order);
> 
> 		if (!page) {
> 			dev_info_ratelimited(&vb->vdev->dev,
> -					     "Out of puff! Can't get %u pages\n",
> -					     VIRTIO_BALLOON_PAGES_PER_PAGE);
> +				"Out of puff! Can't get %u pages\n",
> +				VIRTIO_BALLOON_PAGES_PER_PAGE << order);
> 			/* Sleep for at least 1/5 of a second before retry. */
> 			msleep(200);
> 			break;
> @@ -200,25 +209,48 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> 	vb->num_pfns = 0;
> 
> 	while ((page = balloon_page_pop(&pages))) {
> -		balloon_page_enqueue(&vb->vb_dev_info, page);
> +		balloon_pages_enqueue(&vb->vb_dev_info, page, order);
> 
> 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> -		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> +		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE << order;
> 		if (!virtio_has_feature(vb->vdev,
> 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> -			adjust_managed_page_count(page, -1);
> +			adjust_managed_page_count(page, -(1 << order));
> 		vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
> 	}
> 
> -	num_allocated_pages = vb->num_pfns;
> +	num_allocated_pages = vb->num_pfns << order;
> 	/* Did we get any? */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->inflate_vq);
> +	if (vb->num_pfns != 0) {
> +		if (order)
> +			tell_host(vb, vb->inflate_cont_vq);
> +		else
> +			tell_host(vb, vb->inflate_vq);
> +	}
> 	mutex_unlock(&vb->balloon_lock);
> 
> 	return num_allocated_pages;
> }
> 
> +static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num)
> +{
> +	unsigned int num_allocated_pages = 0;
> +
> +	if (vb->pages_order) {
> +		num_allocated_pages = fill_balloon_order(vb, num,
> +							 vb->pages_order);
> +		if (!num_allocated_pages)
> +			dev_info_ratelimited(&vb->vdev->dev,
> +				"Fail get any continuous pages in order %u\n",
> +				vb->pages_order);
> +	}
> +
> +	if (!num_allocated_pages)
> +		num_allocated_pages = fill_balloon_order(vb, num, 0);
> +
> +	return num_allocated_pages;
> +}
> +
> static void release_pages_balloon(struct virtio_balloon *vb,
> 				 struct list_head *pages)
> {
> @@ -481,6 +513,8 @@ static int init_vqs(struct virtio_balloon *vb)
> 	names[VIRTIO_BALLOON_VQ_STATS] = NULL;
> 	callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
> 	names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
> +	callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
> +	names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
> 
> 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
> 		names[VIRTIO_BALLOON_VQ_STATS] = "stats";
> @@ -492,6 +526,11 @@ static int init_vqs(struct virtio_balloon *vb)
> 		callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
> 	}
> 
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
> +		names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = "inflate_cont";
> +		callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = balloon_ack;
> +	}
> +
> 	err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
> 					 vqs, callbacks, names, NULL, NULL);
> 	if (err)
> @@ -524,6 +563,10 @@ static int init_vqs(struct virtio_balloon *vb)
> 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
> 		vb->free_page_vq = vqs[VIRTIO_BALLOON_VQ_FREE_PAGE];
> 
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES))
> +		vb->inflate_cont_vq
> +			= vqs[VIRTIO_BALLOON_VQ_INFLATE_CONT];
> +
> 	return 0;
> }
> 
> @@ -945,6 +988,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
> 		if (err)
> 			goto out_del_balloon_wq;
> 	}
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
> +		virtio_cread(vb->vdev, struct virtio_balloon_config,
> +			     pages_order, &vb->pages_order);
> +		if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
> +			vb->pages_order
> +				= le32_to_cpu((__force __le32)vb->pages_order);
> +	} else
> +		vb->pages_order = 0;
> 	virtio_device_ready(vdev);
> 
> 	if (towards_target(vb))
> @@ -1058,6 +1109,7 @@ static unsigned int features[] = {
> 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
> 	VIRTIO_BALLOON_F_FREE_PAGE_HINT,
> 	VIRTIO_BALLOON_F_PAGE_POISON,
> +	VIRTIO_BALLOON_F_CONT_PAGES,
> };
> 
> static struct virtio_driver virtio_balloon_driver = {
> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> index 338aa27..8180bbf 100644
> --- a/include/linux/balloon_compaction.h
> +++ b/include/linux/balloon_compaction.h
> @@ -60,15 +60,22 @@ struct balloon_dev_info {
> 	struct inode *inode;
> };
> 
> -extern struct page *balloon_page_alloc(void);
> +extern struct page *balloon_pages_alloc(unsigned int order);
> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> 				 struct page *page);
> +extern void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
> +				  struct page *page, unsigned int order);
> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
> extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
> 				      struct list_head *pages);
> extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> 				     struct list_head *pages, size_t n_req_pages);
> 
> +static inline struct page *balloon_page_alloc(void)
> +{
> +	return balloon_pages_alloc(0);
> +}
> +
> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
> {
> 	balloon->isolated_pages = 0;
> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> index a1966cd7..e21b0aa 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -36,6 +36,7 @@
> #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
> #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
> +#define VIRTIO_BALLOON_F_CONT_PAGES	5 /* VQ to report continuous pages */
> 
> /* Size of a PFN in the balloon interface. */
> #define VIRTIO_BALLOON_PFN_SHIFT 12
> @@ -51,6 +52,8 @@ struct virtio_balloon_config {
> 	__u32 free_page_report_cmd_id;
> 	/* Stores PAGE_POISON if page poisoning is in use */
> 	__u32 poison_val;
> +	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set */
> +	__u32 pages_order;
> };
> 
> #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index 26de020..87df4b2 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -112,8 +112,8 @@ size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
> EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
> 
> /*
> - * balloon_page_alloc - allocates a new page for insertion into the balloon
> - *			page list.
> + * balloon_pages_alloc - allocates a new page for insertion into the balloon
> + *			 page list.
>  *
>  * Driver must call this function to properly allocate a new balloon page.
>  * Driver must call balloon_page_enqueue before definitively removing the page
> @@ -121,14 +121,21 @@ EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
>  *
>  * Return: struct page for the allocated page or NULL on allocation failure.
>  */
> -struct page *balloon_page_alloc(void)
> +struct page *balloon_pages_alloc(unsigned int order)
> {
> -	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
> -				       __GFP_NOMEMALLOC | __GFP_NORETRY |
> -				       __GFP_NOWARN);
> +	gfp_t gfp_mask;
> +
> +	if (order > 1)
> +		gfp_mask = __GFP_RETRY_MAYFAIL;
> +	else
> +		gfp_mask = __GFP_NORETRY;
> +
> +	struct page *page = alloc_pages(balloon_mapping_gfp_mask() |
> +					gfp_mask | __GFP_NOMEMALLOC |
> +					__GFP_NOWARN, order);
> 	return page;
> }
> -EXPORT_SYMBOL_GPL(balloon_page_alloc);
> +EXPORT_SYMBOL_GPL(balloon_pages_alloc);
> 
> /*
>  * balloon_page_enqueue - inserts a new page into the balloon page list.
> @@ -155,6 +162,25 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
> EXPORT_SYMBOL_GPL(balloon_page_enqueue);
> 
> /*
> + * balloon_pages_enqueue - inserts continuous pages into the balloon page list.
> + */
> +void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
> +			   struct page *page, unsigned int order)
> +{
> +	unsigned long flags;
> +	unsigned long pfn = page_to_pfn(page);
> +	unsigned long last_pfn = pfn + (1 << order) - 1;
> +
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	for (; pfn <= last_pfn; pfn++) {
> +		page = pfn_to_page(pfn);
> +		balloon_page_enqueue_one(b_dev_info, page);
> +	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(balloon_pages_enqueue);
> +
> +/*
>  * balloon_page_dequeue - removes a page from balloon's page list and returns
>  *			  its address to allow the driver to release the page.
>  * @b_dev_info: balloon device decriptor where we will grab a page from.
> -- 
> 2.7.4
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT
  2020-03-22 16:04 ` [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT Hui Zhu
  2020-03-22 16:19   ` no-reply
@ 2020-03-26  7:06   ` teawater
  2020-03-26  7:20     ` David Hildenbrand
  2020-03-26  7:25   ` Michael S. Tsirkin
  2 siblings, 1 reply; 9+ messages in thread
From: teawater @ 2020-03-26  7:06 UTC (permalink / raw)
  To: Hui Zhu
  Cc: pagupta, aquini, Michael S. Tsirkin, qemu-devel, mojha, david,
	linux-kernel, virtualization, linux-mm, namit, Andrew Morton,
	Jason Wang

Ping.

Thanks,
Hui

> 2020年3月23日 00:04,Hui Zhu <teawater@gmail.com> 写道:
> 
> If the guest kernel has many fragmentation pages, use virtio_balloon
> will split THP of QEMU when it calls MADV_DONTNEED madvise to release
> the balloon pages.
> Set option cont-pages to on will open flags VIRTIO_BALLOON_VQ_INFLATE_CONT
> and set continuous pages order to THP order.
> Then It will get continuous pages PFN from VQ icvq use use madvise
> MADV_DONTNEED release the THP page.
> This will handle the THP split issue.
> 
> Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
> ---
> hw/virtio/virtio-balloon.c                      | 32 +++++++++++++++++++++----
> include/hw/virtio/virtio-balloon.h              |  4 +++-
> include/standard-headers/linux/virtio_balloon.h |  4 ++++
> 3 files changed, 35 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index a4729f7..88bdaca 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -34,6 +34,7 @@
> #include "hw/virtio/virtio-access.h"
> 
> #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
> +#define CONT_PAGES_ORDER   9
> 
> typedef struct PartiallyBalloonedPage {
>     ram_addr_t base_gpa;
> @@ -65,7 +66,8 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
> 
> static void balloon_inflate_page(VirtIOBalloon *balloon,
>                                  MemoryRegion *mr, hwaddr mr_offset,
> -                                 PartiallyBalloonedPage *pbp)
> +                                 PartiallyBalloonedPage *pbp, 
> +                                 bool is_cont_pages)
> {
>     void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
>     ram_addr_t rb_offset, rb_aligned_offset, base_gpa;
> @@ -76,6 +78,13 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
>     /* XXX is there a better way to get to the RAMBlock than via a
>      * host address? */
>     rb = qemu_ram_block_from_host(addr, false, &rb_offset);
> +
> +    if (is_cont_pages) {
> +        ram_block_discard_range(rb, rb_offset,
> +                                BALLOON_PAGE_SIZE << CONT_PAGES_ORDER);
> +        return;
> +    }
> +
>     rb_page_size = qemu_ram_pagesize(rb);
> 
>     if (rb_page_size == BALLOON_PAGE_SIZE) {
> @@ -361,9 +370,10 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>             trace_virtio_balloon_handle_output(memory_region_name(section.mr),
>                                                pa);
>             if (!qemu_balloon_is_inhibited()) {
> -                if (vq == s->ivq) {
> +                if (vq == s->ivq || vq == s->icvq) {
>                     balloon_inflate_page(s, section.mr,
> -                                         section.offset_within_region, &pbp);
> +                                         section.offset_within_region, &pbp,
> +                                         vq == s->icvq);
>                 } else if (vq == s->dvq) {
>                     balloon_deflate_page(s, section.mr, section.offset_within_region);
>                 } else {
> @@ -618,9 +628,12 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>     if (s->qemu_4_0_config_size) {
>         return sizeof(struct virtio_balloon_config);
>     }
> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> +    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
>         return sizeof(struct virtio_balloon_config);
>     }
> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> +        return offsetof(struct virtio_balloon_config, pages_order);
> +    }
>     if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>         return offsetof(struct virtio_balloon_config, poison_val);
>     }
> @@ -646,6 +659,10 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>                        cpu_to_le32(VIRTIO_BALLOON_CMD_ID_DONE);
>     }
> 
> +    if (virtio_has_feature(dev->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
> +        config.pages_order = cpu_to_le32(CONT_PAGES_ORDER);
> +    }
> +
>     trace_virtio_balloon_get_config(config.num_pages, config.actual);
>     memcpy(config_data, &config, virtio_balloon_config_size(dev));
> }
> @@ -816,6 +833,11 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>             virtio_error(vdev, "iothread is missing");
>         }
>     }
> +
> +    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
> +        s->icvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
> +    }
> +
>     reset_stats(s);
> }
> 
> @@ -916,6 +938,8 @@ static Property virtio_balloon_properties[] = {
>                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>     DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
>                     VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> +    DEFINE_PROP_BIT("cont-pages", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_CONT_PAGES, false),
>     /* QEMU 4.0 accidentally changed the config size even when free-page-hint
>      * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
>      * property retains this quirk for QEMU 4.1 machine types.
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index d1c968d..61d2419 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
> 
> typedef struct VirtIOBalloon {
>     VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq;
>     uint32_t free_page_report_status;
>     uint32_t num_pages;
>     uint32_t actual;
> @@ -70,6 +70,8 @@ typedef struct VirtIOBalloon {
>     uint32_t host_features;
> 
>     bool qemu_4_0_config_size;
> +
> +    uint32_t pages_order;
> } VirtIOBalloon;
> 
> #endif
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 9375ca2..ee18be7 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -36,6 +36,8 @@
> #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
> #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
> +#define VIRTIO_BALLOON_F_CONT_PAGES	5 /* VQ to report continuous pages */
> +
> 
> /* Size of a PFN in the balloon interface. */
> #define VIRTIO_BALLOON_PFN_SHIFT 12
> @@ -51,6 +53,8 @@ struct virtio_balloon_config {
> 	uint32_t free_page_report_cmd_id;
> 	/* Stores PAGE_POISON if page poisoning is in use */
> 	uint32_t poison_val;
> +	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set */
> +	uint32_t pages_order;
> };
> 
> #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> -- 
> 2.7.4
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue
  2020-03-22 16:04 [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue Hui Zhu
  2020-03-22 16:04 ` [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT Hui Zhu
  2020-03-26  6:45 ` [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue teawater
@ 2020-03-26  7:17 ` Michael S. Tsirkin
  2020-04-09  6:05   ` teawater
  2 siblings, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2020-03-26  7:17 UTC (permalink / raw)
  To: Hui Zhu
  Cc: pagupta, aquini, david, qemu-devel, mojha, linux-kernel,
	virtualization, linux-mm, namit, Hui Zhu, akpm, jasowang

First, either QEMU or Linux version of any interface changes
should be copied to the virtio TC.

:


On Mon, Mar 23, 2020 at 12:04:56AM +0800, Hui Zhu wrote:
> The first version is in [1].
> According to the comments from Michael and David, I updated the patch.
> 1. Added a separate vq inflate_cont_vq to transport inflate continuous
>    pages.
> 2. Set all the pages in the continuous pages movable then they can be
>    compaction.
> 3. Added a new element pages_order to virtio_balloon_config.  It is the
>    inflate pages order that is set by the QEMU.
> 4. If the balloon cannot get any continuous pages from the system.
>    Go back to use the single page to fill the balloon.
> 5.  Use balloon_pages_alloc to allocate the single page and continuous
>     pages.  Replace __GFP_NORETRY with __GFP_RETRY_MAYFAIL when allocating
>     the continuous pages because it can increase the success rate of
>     allocating large chunks of memory.
> 
> Following is the introduction of the function.
> If the guest kernel has many fragmentation pages, use virtio_balloon
> will split THP of QEMU when it calls MADV_DONTNEED madvise to release
> the balloon pages.
> This is an example in a VM with 1G memory 1CPU:
> // This is the THP number before VM execution in the host.
> // None use THP.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:         0 kB
> 
> // After VM start, use usemem
> // (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
> // punch-holes function generates 400m fragmentation pages in the guest
> // kernel.
> usemem --punch-holes -s -1 800m &
> 
> // This is the THP number after this command in the host.
> // Some THP is used by VM because usemem will access 800M memory
> // in the guest.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:    976896 kB
> 
> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
> (qemu) device_add virtio-balloon-pci,id=balloon1
> (qemu) info balloon
> balloon: actual=1024
> (qemu) balloon 600
> (qemu) info balloon
> balloon: actual=600
> 
> // This is the THP number after inflate the balloon in the host.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:    151552 kB
> 
> THP number decreased more than 800M.
> The reason is usemem with punch-holes option will free every other
> page after allocation.  Then 400M free memory inside the guest kernel
> is fragmentation pages.
> The guest kernel will use them to inflate the balloon.  When these
> fragmentation pages are freed, THP will be split.
> 
> This commit tries to handle this with add a new flag
> VIRTIO_BALLOON_VQ_INFLATE_CONT.
> When this flag is set, the balloon will try to use continuous pages
> inflate the balloon.  And the pages order is set to THP order.
> Then THP pages will be freed together in the host.
> This is an example in a VM with 1G memory 1CPU:
> // This is the THP number before VM execution in the host.
> // None use THP.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:         0 kB
> 
> // After VM start, use usemem punch-holes function generates 400M
> // fragmentation pages in the guest kernel.
> usemem --punch-holes -s -1 800m &
> 
> // This is the THP number after this command in the host.
> // Some THP is used by VM because usemem will access 800M memory
> // in the guest.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:    976896 kB
> 
> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
> (qemu) device_add virtio-balloon-pci,id=balloon1,cont-pages=on
> (qemu) info balloon
> balloon: actual=1024
> (qemu) balloon 600
> (qemu) info balloon
> balloon: actual=600
> 
> // This is the THP number after inflate the balloon in the host.
> cat /proc/meminfo | grep AnonHugePages:
> AnonHugePages:    610304 kB
> 
> The THP number decreases 358M.  This shows that
> VIRTIO_BALLOON_VQ_INFLATE_CONT can help handle the THP split issue.
> 
> [1] https://lkml.org/lkml/2020/3/12/144
> 
> Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>

I'd like to repeat my original idea of doing large page allocations
unconditionally within guest.


> ---
>  drivers/virtio/virtio_balloon.c     | 78 ++++++++++++++++++++++++++++++-------
>  include/linux/balloon_compaction.h  |  9 ++++-
>  include/uapi/linux/virtio_balloon.h |  3 ++
>  mm/balloon_compaction.c             | 40 +++++++++++++++----
>  4 files changed, 109 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 341458f..fbd2b02f 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -47,6 +47,7 @@ enum virtio_balloon_vq {
>  	VIRTIO_BALLOON_VQ_DEFLATE,
>  	VIRTIO_BALLOON_VQ_STATS,
>  	VIRTIO_BALLOON_VQ_FREE_PAGE,
> +	VIRTIO_BALLOON_VQ_INFLATE_CONT,
>  	VIRTIO_BALLOON_VQ_MAX
>  };
>  
> @@ -56,7 +57,8 @@ enum virtio_balloon_config_read {
>  
>  struct virtio_balloon {
>  	struct virtio_device *vdev;
> -	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq,
> +			 *inflate_cont_vq;
>  
>  	/* Balloon's own wq for cpu-intensive work items */
>  	struct workqueue_struct *balloon_wq;
> @@ -114,6 +116,12 @@ struct virtio_balloon {
>  
>  	/* To register a shrinker to shrink memory upon memory pressure */
>  	struct shrinker shrinker;
> +
> +	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set.
> +	 * if VIRTIO_BALLOON_F_CONT_PAGES is not set,
> +	 * pages_order will be set to 0.
> +	 */
> +	__u32 pages_order;
>  };
>  
>  static struct virtio_device_id id_table[] = {
> @@ -169,10 +177,11 @@ static void set_page_pfns(struct virtio_balloon *vb,
>  					  page_to_balloon_pfn(page) + i);
>  }
>  
> -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
> +static unsigned int fill_balloon_order(struct virtio_balloon *vb, size_t num,
> +				       unsigned int order)
>  {
> -	unsigned num_allocated_pages;
> -	unsigned num_pfns;
> +	unsigned int num_allocated_pages;
> +	unsigned int num_pfns;
>  	struct page *page;
>  	LIST_HEAD(pages);
>  
> @@ -181,12 +190,12 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  
>  	for (num_pfns = 0; num_pfns < num;
>  	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
> -		struct page *page = balloon_page_alloc();
> +		struct page *page = balloon_pages_alloc(order);
>  
>  		if (!page) {
>  			dev_info_ratelimited(&vb->vdev->dev,
> -					     "Out of puff! Can't get %u pages\n",
> -					     VIRTIO_BALLOON_PAGES_PER_PAGE);
> +				"Out of puff! Can't get %u pages\n",
> +				VIRTIO_BALLOON_PAGES_PER_PAGE << order);
>  			/* Sleep for at least 1/5 of a second before retry. */
>  			msleep(200);
>  			break;
> @@ -200,25 +209,48 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>  	vb->num_pfns = 0;
>  
>  	while ((page = balloon_page_pop(&pages))) {
> -		balloon_page_enqueue(&vb->vb_dev_info, page);
> +		balloon_pages_enqueue(&vb->vb_dev_info, page, order);
>  
>  		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> -		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
> +		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE << order;
>  		if (!virtio_has_feature(vb->vdev,
>  					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> -			adjust_managed_page_count(page, -1);
> +			adjust_managed_page_count(page, -(1 << order));
>  		vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
>  	}
>  
> -	num_allocated_pages = vb->num_pfns;
> +	num_allocated_pages = vb->num_pfns << order;
>  	/* Did we get any? */
> -	if (vb->num_pfns != 0)
> -		tell_host(vb, vb->inflate_vq);
> +	if (vb->num_pfns != 0) {
> +		if (order)
> +			tell_host(vb, vb->inflate_cont_vq);
> +		else
> +			tell_host(vb, vb->inflate_vq);
> +	}
>  	mutex_unlock(&vb->balloon_lock);
>  
>  	return num_allocated_pages;
>  }
>  
> +static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num)
> +{
> +	unsigned int num_allocated_pages = 0;
> +
> +	if (vb->pages_order) {
> +		num_allocated_pages = fill_balloon_order(vb, num,
> +							 vb->pages_order);
> +		if (!num_allocated_pages)
> +			dev_info_ratelimited(&vb->vdev->dev,
> +				"Fail get any continuous pages in order %u\n",
> +				vb->pages_order);
> +	}
> +
> +	if (!num_allocated_pages)
> +		num_allocated_pages = fill_balloon_order(vb, num, 0);
> +
> +	return num_allocated_pages;
> +}
> +
>  static void release_pages_balloon(struct virtio_balloon *vb,
>  				 struct list_head *pages)
>  {


Hmm no this is not what was previously suggested. David Hildenbrand proposed
starting with highest order and iterating until we reach 0.
And I proposed a new API to allocate some memory between min and max
orders.


> @@ -481,6 +513,8 @@ static int init_vqs(struct virtio_balloon *vb)
>  	names[VIRTIO_BALLOON_VQ_STATS] = NULL;
>  	callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
>  	names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
> +	callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
> +	names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
>  
>  	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
>  		names[VIRTIO_BALLOON_VQ_STATS] = "stats";
> @@ -492,6 +526,11 @@ static int init_vqs(struct virtio_balloon *vb)
>  		callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
>  	}
>  
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
> +		names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = "inflate_cont";
> +		callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = balloon_ack;
> +	}
> +
>  	err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
>  					 vqs, callbacks, names, NULL, NULL);
>  	if (err)
> @@ -524,6 +563,10 @@ static int init_vqs(struct virtio_balloon *vb)
>  	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>  		vb->free_page_vq = vqs[VIRTIO_BALLOON_VQ_FREE_PAGE];
>  
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES))
> +		vb->inflate_cont_vq
> +			= vqs[VIRTIO_BALLOON_VQ_INFLATE_CONT];
> +
>  	return 0;
>  }
>  
> @@ -945,6 +988,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  		if (err)
>  			goto out_del_balloon_wq;
>  	}
> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
> +		virtio_cread(vb->vdev, struct virtio_balloon_config,
> +			     pages_order, &vb->pages_order);
> +		if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
> +			vb->pages_order
> +				= le32_to_cpu((__force __le32)vb->pages_order);
> +	} else
> +		vb->pages_order = 0;
>  	virtio_device_ready(vdev);
>  
>  	if (towards_target(vb))
> @@ -1058,6 +1109,7 @@ static unsigned int features[] = {
>  	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
>  	VIRTIO_BALLOON_F_FREE_PAGE_HINT,
>  	VIRTIO_BALLOON_F_PAGE_POISON,
> +	VIRTIO_BALLOON_F_CONT_PAGES,
>  };
>  
>  static struct virtio_driver virtio_balloon_driver = {
> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
> index 338aa27..8180bbf 100644
> --- a/include/linux/balloon_compaction.h
> +++ b/include/linux/balloon_compaction.h
> @@ -60,15 +60,22 @@ struct balloon_dev_info {
>  	struct inode *inode;
>  };
>  
> -extern struct page *balloon_page_alloc(void);
> +extern struct page *balloon_pages_alloc(unsigned int order);
>  extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>  				 struct page *page);
> +extern void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
> +				  struct page *page, unsigned int order);
>  extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
>  extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>  				      struct list_head *pages);
>  extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>  				     struct list_head *pages, size_t n_req_pages);
>  
> +static inline struct page *balloon_page_alloc(void)
> +{
> +	return balloon_pages_alloc(0);
> +}
> +
>  static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>  {
>  	balloon->isolated_pages = 0;
> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
> index a1966cd7..e21b0aa 100644
> --- a/include/uapi/linux/virtio_balloon.h
> +++ b/include/uapi/linux/virtio_balloon.h
> @@ -36,6 +36,7 @@
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
>  #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
>  #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
> +#define VIRTIO_BALLOON_F_CONT_PAGES	5 /* VQ to report continuous pages */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
> @@ -51,6 +52,8 @@ struct virtio_balloon_config {
>  	__u32 free_page_report_cmd_id;
>  	/* Stores PAGE_POISON if page poisoning is in use */
>  	__u32 poison_val;
> +	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set */
> +	__u32 pages_order;
>  };
>  
>  #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index 26de020..87df4b2 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -112,8 +112,8 @@ size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>  EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
>  
>  /*
> - * balloon_page_alloc - allocates a new page for insertion into the balloon
> - *			page list.
> + * balloon_pages_alloc - allocates a new page for insertion into the balloon
> + *			 page list.
>   *
>   * Driver must call this function to properly allocate a new balloon page.
>   * Driver must call balloon_page_enqueue before definitively removing the page
> @@ -121,14 +121,21 @@ EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
>   *
>   * Return: struct page for the allocated page or NULL on allocation failure.
>   */
> -struct page *balloon_page_alloc(void)
> +struct page *balloon_pages_alloc(unsigned int order)
>  {
> -	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
> -				       __GFP_NOMEMALLOC | __GFP_NORETRY |
> -				       __GFP_NOWARN);
> +	gfp_t gfp_mask;
> +
> +	if (order > 1)
> +		gfp_mask = __GFP_RETRY_MAYFAIL;
> +	else
> +		gfp_mask = __GFP_NORETRY;
> +
> +	struct page *page = alloc_pages(balloon_mapping_gfp_mask() |
> +					gfp_mask | __GFP_NOMEMALLOC |
> +					__GFP_NOWARN, order);
>  	return page;
>  }
> -EXPORT_SYMBOL_GPL(balloon_page_alloc);
> +EXPORT_SYMBOL_GPL(balloon_pages_alloc);
>  
>  /*
>   * balloon_page_enqueue - inserts a new page into the balloon page list.
> @@ -155,6 +162,25 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>  EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>  
>  /*
> + * balloon_pages_enqueue - inserts continuous pages into the balloon page list.
> + */
> +void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
> +			   struct page *page, unsigned int order)
> +{
> +	unsigned long flags;
> +	unsigned long pfn = page_to_pfn(page);
> +	unsigned long last_pfn = pfn + (1 << order) - 1;
> +
> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
> +	for (; pfn <= last_pfn; pfn++) {
> +		page = pfn_to_page(pfn);
> +		balloon_page_enqueue_one(b_dev_info, page);
> +	}
> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(balloon_pages_enqueue);
> +
> +/*
>   * balloon_page_dequeue - removes a page from balloon's page list and returns
>   *			  its address to allow the driver to release the page.
>   * @b_dev_info: balloon device decriptor where we will grab a page from.
> -- 
> 2.7.4



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT
  2020-03-26  7:06   ` teawater
@ 2020-03-26  7:20     ` David Hildenbrand
  0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-03-26  7:20 UTC (permalink / raw)
  To: teawater
  Cc: pagupta, aquini, Michael S. Tsirkin, qemu-devel, mojha, david,
	linux-kernel, virtualization, linux-mm, namit, Andrew Morton,
	Jason Wang, Hui Zhu



> Am 26.03.2020 um 08:06 schrieb teawater <teawaterz@linux.alibaba.com>:
> 
> Ping.
> 

On paid leave this week. Will try to have a look next week, but it could take a bit longer.

Cheers

> Thanks,
> Hui
> 
>> 2020年3月23日 00:04,Hui Zhu <teawater@gmail.com> 写道:
>> 
>> If the guest kernel has many fragmentation pages, use virtio_balloon
>> will split THP of QEMU when it calls MADV_DONTNEED madvise to release
>> the balloon pages.
>> Set option cont-pages to on will open flags VIRTIO_BALLOON_VQ_INFLATE_CONT
>> and set continuous pages order to THP order.
>> Then It will get continuous pages PFN from VQ icvq use use madvise
>> MADV_DONTNEED release the THP page.
>> This will handle the THP split issue.
>> 
>> Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
>> ---
>> hw/virtio/virtio-balloon.c                      | 32 +++++++++++++++++++++----
>> include/hw/virtio/virtio-balloon.h              |  4 +++-
>> include/standard-headers/linux/virtio_balloon.h |  4 ++++
>> 3 files changed, 35 insertions(+), 5 deletions(-)
>> 
>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>> index a4729f7..88bdaca 100644
>> --- a/hw/virtio/virtio-balloon.c
>> +++ b/hw/virtio/virtio-balloon.c
>> @@ -34,6 +34,7 @@
>> #include "hw/virtio/virtio-access.h"
>> 
>> #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
>> +#define CONT_PAGES_ORDER   9
>> 
>> typedef struct PartiallyBalloonedPage {
>>    ram_addr_t base_gpa;
>> @@ -65,7 +66,8 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
>> 
>> static void balloon_inflate_page(VirtIOBalloon *balloon,
>>                                 MemoryRegion *mr, hwaddr mr_offset,
>> -                                 PartiallyBalloonedPage *pbp)
>> +                                 PartiallyBalloonedPage *pbp, 
>> +                                 bool is_cont_pages)
>> {
>>    void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
>>    ram_addr_t rb_offset, rb_aligned_offset, base_gpa;
>> @@ -76,6 +78,13 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
>>    /* XXX is there a better way to get to the RAMBlock than via a
>>     * host address? */
>>    rb = qemu_ram_block_from_host(addr, false, &rb_offset);
>> +
>> +    if (is_cont_pages) {
>> +        ram_block_discard_range(rb, rb_offset,
>> +                                BALLOON_PAGE_SIZE << CONT_PAGES_ORDER);
>> +        return;
>> +    }
>> +
>>    rb_page_size = qemu_ram_pagesize(rb);
>> 
>>    if (rb_page_size == BALLOON_PAGE_SIZE) {
>> @@ -361,9 +370,10 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>>            trace_virtio_balloon_handle_output(memory_region_name(section.mr),
>>                                               pa);
>>            if (!qemu_balloon_is_inhibited()) {
>> -                if (vq == s->ivq) {
>> +                if (vq == s->ivq || vq == s->icvq) {
>>                    balloon_inflate_page(s, section.mr,
>> -                                         section.offset_within_region, &pbp);
>> +                                         section.offset_within_region, &pbp,
>> +                                         vq == s->icvq);
>>                } else if (vq == s->dvq) {
>>                    balloon_deflate_page(s, section.mr, section.offset_within_region);
>>                } else {
>> @@ -618,9 +628,12 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>>    if (s->qemu_4_0_config_size) {
>>        return sizeof(struct virtio_balloon_config);
>>    }
>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
>> +    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
>>        return sizeof(struct virtio_balloon_config);
>>    }
>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
>> +        return offsetof(struct virtio_balloon_config, pages_order);
>> +    }
>>    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>        return offsetof(struct virtio_balloon_config, poison_val);
>>    }
>> @@ -646,6 +659,10 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>>                       cpu_to_le32(VIRTIO_BALLOON_CMD_ID_DONE);
>>    }
>> 
>> +    if (virtio_has_feature(dev->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
>> +        config.pages_order = cpu_to_le32(CONT_PAGES_ORDER);
>> +    }
>> +
>>    trace_virtio_balloon_get_config(config.num_pages, config.actual);
>>    memcpy(config_data, &config, virtio_balloon_config_size(dev));
>> }
>> @@ -816,6 +833,11 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>>            virtio_error(vdev, "iothread is missing");
>>        }
>>    }
>> +
>> +    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
>> +        s->icvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>> +    }
>> +
>>    reset_stats(s);
>> }
>> 
>> @@ -916,6 +938,8 @@ static Property virtio_balloon_properties[] = {
>>                    VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>>    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
>>                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>> +    DEFINE_PROP_BIT("cont-pages", VirtIOBalloon, host_features,
>> +                    VIRTIO_BALLOON_F_CONT_PAGES, false),
>>    /* QEMU 4.0 accidentally changed the config size even when free-page-hint
>>     * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
>>     * property retains this quirk for QEMU 4.1 machine types.
>> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
>> index d1c968d..61d2419 100644
>> --- a/include/hw/virtio/virtio-balloon.h
>> +++ b/include/hw/virtio/virtio-balloon.h
>> @@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
>> 
>> typedef struct VirtIOBalloon {
>>    VirtIODevice parent_obj;
>> -    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
>> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq;
>>    uint32_t free_page_report_status;
>>    uint32_t num_pages;
>>    uint32_t actual;
>> @@ -70,6 +70,8 @@ typedef struct VirtIOBalloon {
>>    uint32_t host_features;
>> 
>>    bool qemu_4_0_config_size;
>> +
>> +    uint32_t pages_order;
>> } VirtIOBalloon;
>> 
>> #endif
>> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
>> index 9375ca2..ee18be7 100644
>> --- a/include/standard-headers/linux/virtio_balloon.h
>> +++ b/include/standard-headers/linux/virtio_balloon.h
>> @@ -36,6 +36,8 @@
>> #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM    2 /* Deflate balloon on OOM */
>> #define VIRTIO_BALLOON_F_FREE_PAGE_HINT    3 /* VQ to report free pages */
>> #define VIRTIO_BALLOON_F_PAGE_POISON    4 /* Guest is using page poisoning */
>> +#define VIRTIO_BALLOON_F_CONT_PAGES    5 /* VQ to report continuous pages */
>> +
>> 
>> /* Size of a PFN in the balloon interface. */
>> #define VIRTIO_BALLOON_PFN_SHIFT 12
>> @@ -51,6 +53,8 @@ struct virtio_balloon_config {
>>    uint32_t free_page_report_cmd_id;
>>    /* Stores PAGE_POISON if page poisoning is in use */
>>    uint32_t poison_val;
>> +    /* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set */
>> +    uint32_t pages_order;
>> };
>> 
>> #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
>> -- 
>> 2.7.4
>> 
> 



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT
  2020-03-22 16:04 ` [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT Hui Zhu
  2020-03-22 16:19   ` no-reply
  2020-03-26  7:06   ` teawater
@ 2020-03-26  7:25   ` Michael S. Tsirkin
  2 siblings, 0 replies; 9+ messages in thread
From: Michael S. Tsirkin @ 2020-03-26  7:25 UTC (permalink / raw)
  To: Hui Zhu
  Cc: pagupta, aquini, david, qemu-devel, mojha, linux-kernel,
	virtualization, linux-mm, namit, Hui Zhu, akpm, jasowang

On Mon, Mar 23, 2020 at 12:04:57AM +0800, Hui Zhu wrote:
> If the guest kernel has many fragmentation pages, use virtio_balloon
> will split THP of QEMU when it calls MADV_DONTNEED madvise to release
> the balloon pages.
> Set option cont-pages to on will open flags VIRTIO_BALLOON_VQ_INFLATE_CONT
> and set continuous pages order to THP order.
> Then It will get continuous pages PFN from VQ icvq use use madvise
> MADV_DONTNEED release the THP page.
> This will handle the THP split issue.
> 
> Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
> ---
>  hw/virtio/virtio-balloon.c                      | 32 +++++++++++++++++++++----
>  include/hw/virtio/virtio-balloon.h              |  4 +++-
>  include/standard-headers/linux/virtio_balloon.h |  4 ++++
>  3 files changed, 35 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index a4729f7..88bdaca 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -34,6 +34,7 @@
>  #include "hw/virtio/virtio-access.h"
>  
>  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
> +#define CONT_PAGES_ORDER   9
>  
>  typedef struct PartiallyBalloonedPage {
>      ram_addr_t base_gpa;

This doesn't look right to me. I suspect this is different between
different hosts. Fixing this would also be tricky as we
might need to migrate beween two hosts with different
huge page sizes.

My proposal is to instead enhance the PartiallyBalloonedPage
machinery, teaching it to handle the case where host page
size is smaller than the supported number of subpages.


> @@ -65,7 +66,8 @@ static bool virtio_balloon_pbp_matches(PartiallyBalloonedPage *pbp,
>  
>  static void balloon_inflate_page(VirtIOBalloon *balloon,
>                                   MemoryRegion *mr, hwaddr mr_offset,
> -                                 PartiallyBalloonedPage *pbp)
> +                                 PartiallyBalloonedPage *pbp, 
> +                                 bool is_cont_pages)
>  {
>      void *addr = memory_region_get_ram_ptr(mr) + mr_offset;
>      ram_addr_t rb_offset, rb_aligned_offset, base_gpa;
> @@ -76,6 +78,13 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
>      /* XXX is there a better way to get to the RAMBlock than via a
>       * host address? */
>      rb = qemu_ram_block_from_host(addr, false, &rb_offset);
> +
> +    if (is_cont_pages) {
> +        ram_block_discard_range(rb, rb_offset,
> +                                BALLOON_PAGE_SIZE << CONT_PAGES_ORDER);
> +        return;
> +    }
> +
>      rb_page_size = qemu_ram_pagesize(rb);
>  
>      if (rb_page_size == BALLOON_PAGE_SIZE) {
> @@ -361,9 +370,10 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>              trace_virtio_balloon_handle_output(memory_region_name(section.mr),
>                                                 pa);
>              if (!qemu_balloon_is_inhibited()) {
> -                if (vq == s->ivq) {
> +                if (vq == s->ivq || vq == s->icvq) {
>                      balloon_inflate_page(s, section.mr,
> -                                         section.offset_within_region, &pbp);
> +                                         section.offset_within_region, &pbp,
> +                                         vq == s->icvq);
>                  } else if (vq == s->dvq) {
>                      balloon_deflate_page(s, section.mr, section.offset_within_region);
>                  } else {
> @@ -618,9 +628,12 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>      if (s->qemu_4_0_config_size) {
>          return sizeof(struct virtio_balloon_config);
>      }
> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> +    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
>          return sizeof(struct virtio_balloon_config);
>      }
> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> +        return offsetof(struct virtio_balloon_config, pages_order);
> +    }
>      if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>          return offsetof(struct virtio_balloon_config, poison_val);
>      }
> @@ -646,6 +659,10 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>                         cpu_to_le32(VIRTIO_BALLOON_CMD_ID_DONE);
>      }
>  
> +    if (virtio_has_feature(dev->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
> +        config.pages_order = cpu_to_le32(CONT_PAGES_ORDER);
> +    }
> +
>      trace_virtio_balloon_get_config(config.num_pages, config.actual);
>      memcpy(config_data, &config, virtio_balloon_config_size(dev));
>  }
> @@ -816,6 +833,11 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>              virtio_error(vdev, "iothread is missing");
>          }
>      }
> +
> +    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_CONT_PAGES)) {
> +        s->icvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
> +    }
> +
>      reset_stats(s);
>  }
>  
> @@ -916,6 +938,8 @@ static Property virtio_balloon_properties[] = {
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>      DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> +    DEFINE_PROP_BIT("cont-pages", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_CONT_PAGES, false),
>      /* QEMU 4.0 accidentally changed the config size even when free-page-hint
>       * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
>       * property retains this quirk for QEMU 4.1 machine types.
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index d1c968d..61d2419 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
>  
>  typedef struct VirtIOBalloon {
>      VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq, *icvq;
>      uint32_t free_page_report_status;
>      uint32_t num_pages;
>      uint32_t actual;
> @@ -70,6 +70,8 @@ typedef struct VirtIOBalloon {
>      uint32_t host_features;
>  
>      bool qemu_4_0_config_size;
> +
> +    uint32_t pages_order;
>  } VirtIOBalloon;
>  
>  #endif
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 9375ca2..ee18be7 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -36,6 +36,8 @@
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
>  #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
>  #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
> +#define VIRTIO_BALLOON_F_CONT_PAGES	5 /* VQ to report continuous pages */
> +
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
> @@ -51,6 +53,8 @@ struct virtio_balloon_config {
>  	uint32_t free_page_report_cmd_id;
>  	/* Stores PAGE_POISON if page poisoning is in use */
>  	uint32_t poison_val;
> +	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set */
> +	uint32_t pages_order;
>  };
>  
>  #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> -- 
> 2.7.4



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue
  2020-03-26  7:17 ` Michael S. Tsirkin
@ 2020-04-09  6:05   ` teawater
  0 siblings, 0 replies; 9+ messages in thread
From: teawater @ 2020-04-09  6:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: pagupta, aquini, david, qemu-devel, mojha, linux-kernel,
	virtualization, linux-mm, Nadav Amit, Andrew Morton, Jason Wang,
	Hui Zhu



> 2020年3月26日 15:17,Michael S. Tsirkin <mst@redhat.com> 写道:
> 
> First, either QEMU or Linux version of any interface changes
> should be copied to the virtio TC.
> 
> :

OK.

> 
> 
> On Mon, Mar 23, 2020 at 12:04:56AM +0800, Hui Zhu wrote:
>> The first version is in [1].
>> According to the comments from Michael and David, I updated the patch.
>> 1. Added a separate vq inflate_cont_vq to transport inflate continuous
>>   pages.
>> 2. Set all the pages in the continuous pages movable then they can be
>>   compaction.
>> 3. Added a new element pages_order to virtio_balloon_config.  It is the
>>   inflate pages order that is set by the QEMU.
>> 4. If the balloon cannot get any continuous pages from the system.
>>   Go back to use the single page to fill the balloon.
>> 5.  Use balloon_pages_alloc to allocate the single page and continuous
>>    pages.  Replace __GFP_NORETRY with __GFP_RETRY_MAYFAIL when allocating
>>    the continuous pages because it can increase the success rate of
>>    allocating large chunks of memory.
>> 
>> Following is the introduction of the function.
>> If the guest kernel has many fragmentation pages, use virtio_balloon
>> will split THP of QEMU when it calls MADV_DONTNEED madvise to release
>> the balloon pages.
>> This is an example in a VM with 1G memory 1CPU:
>> // This is the THP number before VM execution in the host.
>> // None use THP.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages:         0 kB
>> 
>> // After VM start, use usemem
>> // (https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git)
>> // punch-holes function generates 400m fragmentation pages in the guest
>> // kernel.
>> usemem --punch-holes -s -1 800m &
>> 
>> // This is the THP number after this command in the host.
>> // Some THP is used by VM because usemem will access 800M memory
>> // in the guest.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages:    976896 kB
>> 
>> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
>> (qemu) device_add virtio-balloon-pci,id=balloon1
>> (qemu) info balloon
>> balloon: actual=1024
>> (qemu) balloon 600
>> (qemu) info balloon
>> balloon: actual=600
>> 
>> // This is the THP number after inflate the balloon in the host.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages:    151552 kB
>> 
>> THP number decreased more than 800M.
>> The reason is usemem with punch-holes option will free every other
>> page after allocation.  Then 400M free memory inside the guest kernel
>> is fragmentation pages.
>> The guest kernel will use them to inflate the balloon.  When these
>> fragmentation pages are freed, THP will be split.
>> 
>> This commit tries to handle this with add a new flag
>> VIRTIO_BALLOON_VQ_INFLATE_CONT.
>> When this flag is set, the balloon will try to use continuous pages
>> inflate the balloon.  And the pages order is set to THP order.
>> Then THP pages will be freed together in the host.
>> This is an example in a VM with 1G memory 1CPU:
>> // This is the THP number before VM execution in the host.
>> // None use THP.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages:         0 kB
>> 
>> // After VM start, use usemem punch-holes function generates 400M
>> // fragmentation pages in the guest kernel.
>> usemem --punch-holes -s -1 800m &
>> 
>> // This is the THP number after this command in the host.
>> // Some THP is used by VM because usemem will access 800M memory
>> // in the guest.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages:    976896 kB
>> 
>> // Connect to the QEMU monitor, setup balloon, and set it size to 600M.
>> (qemu) device_add virtio-balloon-pci,id=balloon1,cont-pages=on
>> (qemu) info balloon
>> balloon: actual=1024
>> (qemu) balloon 600
>> (qemu) info balloon
>> balloon: actual=600
>> 
>> // This is the THP number after inflate the balloon in the host.
>> cat /proc/meminfo | grep AnonHugePages:
>> AnonHugePages:    610304 kB
>> 
>> The THP number decreases 358M.  This shows that
>> VIRTIO_BALLOON_VQ_INFLATE_CONT can help handle the THP split issue.
>> 
>> [1] https://lkml.org/lkml/2020/3/12/144
>> 
>> Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
> 
> I'd like to repeat my original idea of doing large page allocations
> unconditionally within guest.
> 

If so, should it transfer the large page information using inflate_vq?

> 
>> ---
>> drivers/virtio/virtio_balloon.c     | 78 ++++++++++++++++++++++++++++++-------
>> include/linux/balloon_compaction.h  |  9 ++++-
>> include/uapi/linux/virtio_balloon.h |  3 ++
>> mm/balloon_compaction.c             | 40 +++++++++++++++----
>> 4 files changed, 109 insertions(+), 21 deletions(-)
>> 
>> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
>> index 341458f..fbd2b02f 100644
>> --- a/drivers/virtio/virtio_balloon.c
>> +++ b/drivers/virtio/virtio_balloon.c
>> @@ -47,6 +47,7 @@ enum virtio_balloon_vq {
>> 	VIRTIO_BALLOON_VQ_DEFLATE,
>> 	VIRTIO_BALLOON_VQ_STATS,
>> 	VIRTIO_BALLOON_VQ_FREE_PAGE,
>> +	VIRTIO_BALLOON_VQ_INFLATE_CONT,
>> 	VIRTIO_BALLOON_VQ_MAX
>> };
>> 
>> @@ -56,7 +57,8 @@ enum virtio_balloon_config_read {
>> 
>> struct virtio_balloon {
>> 	struct virtio_device *vdev;
>> -	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
>> +	struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq,
>> +			 *inflate_cont_vq;
>> 
>> 	/* Balloon's own wq for cpu-intensive work items */
>> 	struct workqueue_struct *balloon_wq;
>> @@ -114,6 +116,12 @@ struct virtio_balloon {
>> 
>> 	/* To register a shrinker to shrink memory upon memory pressure */
>> 	struct shrinker shrinker;
>> +
>> +	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set.
>> +	 * if VIRTIO_BALLOON_F_CONT_PAGES is not set,
>> +	 * pages_order will be set to 0.
>> +	 */
>> +	__u32 pages_order;
>> };
>> 
>> static struct virtio_device_id id_table[] = {
>> @@ -169,10 +177,11 @@ static void set_page_pfns(struct virtio_balloon *vb,
>> 					  page_to_balloon_pfn(page) + i);
>> }
>> 
>> -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>> +static unsigned int fill_balloon_order(struct virtio_balloon *vb, size_t num,
>> +				       unsigned int order)
>> {
>> -	unsigned num_allocated_pages;
>> -	unsigned num_pfns;
>> +	unsigned int num_allocated_pages;
>> +	unsigned int num_pfns;
>> 	struct page *page;
>> 	LIST_HEAD(pages);
>> 
>> @@ -181,12 +190,12 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>> 
>> 	for (num_pfns = 0; num_pfns < num;
>> 	     num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
>> -		struct page *page = balloon_page_alloc();
>> +		struct page *page = balloon_pages_alloc(order);
>> 
>> 		if (!page) {
>> 			dev_info_ratelimited(&vb->vdev->dev,
>> -					     "Out of puff! Can't get %u pages\n",
>> -					     VIRTIO_BALLOON_PAGES_PER_PAGE);
>> +				"Out of puff! Can't get %u pages\n",
>> +				VIRTIO_BALLOON_PAGES_PER_PAGE << order);
>> 			/* Sleep for at least 1/5 of a second before retry. */
>> 			msleep(200);
>> 			break;
>> @@ -200,25 +209,48 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
>> 	vb->num_pfns = 0;
>> 
>> 	while ((page = balloon_page_pop(&pages))) {
>> -		balloon_page_enqueue(&vb->vb_dev_info, page);
>> +		balloon_pages_enqueue(&vb->vb_dev_info, page, order);
>> 
>> 		set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
>> -		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
>> +		vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE << order;
>> 		if (!virtio_has_feature(vb->vdev,
>> 					VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
>> -			adjust_managed_page_count(page, -1);
>> +			adjust_managed_page_count(page, -(1 << order));
>> 		vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
>> 	}
>> 
>> -	num_allocated_pages = vb->num_pfns;
>> +	num_allocated_pages = vb->num_pfns << order;
>> 	/* Did we get any? */
>> -	if (vb->num_pfns != 0)
>> -		tell_host(vb, vb->inflate_vq);
>> +	if (vb->num_pfns != 0) {
>> +		if (order)
>> +			tell_host(vb, vb->inflate_cont_vq);
>> +		else
>> +			tell_host(vb, vb->inflate_vq);
>> +	}
>> 	mutex_unlock(&vb->balloon_lock);
>> 
>> 	return num_allocated_pages;
>> }
>> 
>> +static unsigned int fill_balloon(struct virtio_balloon *vb, size_t num)
>> +{
>> +	unsigned int num_allocated_pages = 0;
>> +
>> +	if (vb->pages_order) {
>> +		num_allocated_pages = fill_balloon_order(vb, num,
>> +							 vb->pages_order);
>> +		if (!num_allocated_pages)
>> +			dev_info_ratelimited(&vb->vdev->dev,
>> +				"Fail get any continuous pages in order %u\n",
>> +				vb->pages_order);
>> +	}
>> +
>> +	if (!num_allocated_pages)
>> +		num_allocated_pages = fill_balloon_order(vb, num, 0);
>> +
>> +	return num_allocated_pages;
>> +}
>> +
>> static void release_pages_balloon(struct virtio_balloon *vb,
>> 				 struct list_head *pages)
>> {
> 
> 
> Hmm no this is not what was previously suggested. David Hildenbrand proposed
> starting with highest order and iterating until we reach 0.
> And I proposed a new API to allocate some memory between min and max
> orders.
> 
> 
>> @@ -481,6 +513,8 @@ static int init_vqs(struct virtio_balloon *vb)
>> 	names[VIRTIO_BALLOON_VQ_STATS] = NULL;
>> 	callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
>> 	names[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
>> +	callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
>> +	names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = NULL;
>> 
>> 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
>> 		names[VIRTIO_BALLOON_VQ_STATS] = "stats";
>> @@ -492,6 +526,11 @@ static int init_vqs(struct virtio_balloon *vb)
>> 		callbacks[VIRTIO_BALLOON_VQ_FREE_PAGE] = NULL;
>> 	}
>> 
>> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
>> +		names[VIRTIO_BALLOON_VQ_INFLATE_CONT] = "inflate_cont";
>> +		callbacks[VIRTIO_BALLOON_VQ_INFLATE_CONT] = balloon_ack;
>> +	}
>> +
>> 	err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX,
>> 					 vqs, callbacks, names, NULL, NULL);
>> 	if (err)
>> @@ -524,6 +563,10 @@ static int init_vqs(struct virtio_balloon *vb)
>> 	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
>> 		vb->free_page_vq = vqs[VIRTIO_BALLOON_VQ_FREE_PAGE];
>> 
>> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES))
>> +		vb->inflate_cont_vq
>> +			= vqs[VIRTIO_BALLOON_VQ_INFLATE_CONT];
>> +
>> 	return 0;
>> }
>> 
>> @@ -945,6 +988,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
>> 		if (err)
>> 			goto out_del_balloon_wq;
>> 	}
>> +	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_CONT_PAGES)) {
>> +		virtio_cread(vb->vdev, struct virtio_balloon_config,
>> +			     pages_order, &vb->pages_order);
>> +		if (!virtio_has_feature(vb->vdev, VIRTIO_F_VERSION_1))
>> +			vb->pages_order
>> +				= le32_to_cpu((__force __le32)vb->pages_order);
>> +	} else
>> +		vb->pages_order = 0;
>> 	virtio_device_ready(vdev);
>> 
>> 	if (towards_target(vb))
>> @@ -1058,6 +1109,7 @@ static unsigned int features[] = {
>> 	VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
>> 	VIRTIO_BALLOON_F_FREE_PAGE_HINT,
>> 	VIRTIO_BALLOON_F_PAGE_POISON,
>> +	VIRTIO_BALLOON_F_CONT_PAGES,
>> };
>> 
>> static struct virtio_driver virtio_balloon_driver = {
>> diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h
>> index 338aa27..8180bbf 100644
>> --- a/include/linux/balloon_compaction.h
>> +++ b/include/linux/balloon_compaction.h
>> @@ -60,15 +60,22 @@ struct balloon_dev_info {
>> 	struct inode *inode;
>> };
>> 
>> -extern struct page *balloon_page_alloc(void);
>> +extern struct page *balloon_pages_alloc(unsigned int order);
>> extern void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>> 				 struct page *page);
>> +extern void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
>> +				  struct page *page, unsigned int order);
>> extern struct page *balloon_page_dequeue(struct balloon_dev_info *b_dev_info);
>> extern size_t balloon_page_list_enqueue(struct balloon_dev_info *b_dev_info,
>> 				      struct list_head *pages);
>> extern size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>> 				     struct list_head *pages, size_t n_req_pages);
>> 
>> +static inline struct page *balloon_page_alloc(void)
>> +{
>> +	return balloon_pages_alloc(0);
>> +}
>> +
>> static inline void balloon_devinfo_init(struct balloon_dev_info *balloon)
>> {
>> 	balloon->isolated_pages = 0;
>> diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
>> index a1966cd7..e21b0aa 100644
>> --- a/include/uapi/linux/virtio_balloon.h
>> +++ b/include/uapi/linux/virtio_balloon.h
>> @@ -36,6 +36,7 @@
>> #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
>> #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
>> #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
>> +#define VIRTIO_BALLOON_F_CONT_PAGES	5 /* VQ to report continuous pages */
>> 
>> /* Size of a PFN in the balloon interface. */
>> #define VIRTIO_BALLOON_PFN_SHIFT 12
>> @@ -51,6 +52,8 @@ struct virtio_balloon_config {
>> 	__u32 free_page_report_cmd_id;
>> 	/* Stores PAGE_POISON if page poisoning is in use */
>> 	__u32 poison_val;
>> +	/* Pages order if VIRTIO_BALLOON_F_CONT_PAGES is set */
>> +	__u32 pages_order;
>> };
>> 
>> #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
>> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
>> index 26de020..87df4b2 100644
>> --- a/mm/balloon_compaction.c
>> +++ b/mm/balloon_compaction.c
>> @@ -112,8 +112,8 @@ size_t balloon_page_list_dequeue(struct balloon_dev_info *b_dev_info,
>> EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
>> 
>> /*
>> - * balloon_page_alloc - allocates a new page for insertion into the balloon
>> - *			page list.
>> + * balloon_pages_alloc - allocates a new page for insertion into the balloon
>> + *			 page list.
>>  *
>>  * Driver must call this function to properly allocate a new balloon page.
>>  * Driver must call balloon_page_enqueue before definitively removing the page
>> @@ -121,14 +121,21 @@ EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
>>  *
>>  * Return: struct page for the allocated page or NULL on allocation failure.
>>  */
>> -struct page *balloon_page_alloc(void)
>> +struct page *balloon_pages_alloc(unsigned int order)
>> {
>> -	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
>> -				       __GFP_NOMEMALLOC | __GFP_NORETRY |
>> -				       __GFP_NOWARN);
>> +	gfp_t gfp_mask;
>> +
>> +	if (order > 1)
>> +		gfp_mask = __GFP_RETRY_MAYFAIL;
>> +	else
>> +		gfp_mask = __GFP_NORETRY;
>> +
>> +	struct page *page = alloc_pages(balloon_mapping_gfp_mask() |
>> +					gfp_mask | __GFP_NOMEMALLOC |
>> +					__GFP_NOWARN, order);
>> 	return page;
>> }
>> -EXPORT_SYMBOL_GPL(balloon_page_alloc);
>> +EXPORT_SYMBOL_GPL(balloon_pages_alloc);
>> 
>> /*
>>  * balloon_page_enqueue - inserts a new page into the balloon page list.
>> @@ -155,6 +162,25 @@ void balloon_page_enqueue(struct balloon_dev_info *b_dev_info,
>> EXPORT_SYMBOL_GPL(balloon_page_enqueue);
>> 
>> /*
>> + * balloon_pages_enqueue - inserts continuous pages into the balloon page list.
>> + */
>> +void balloon_pages_enqueue(struct balloon_dev_info *b_dev_info,
>> +			   struct page *page, unsigned int order)
>> +{
>> +	unsigned long flags;
>> +	unsigned long pfn = page_to_pfn(page);
>> +	unsigned long last_pfn = pfn + (1 << order) - 1;
>> +
>> +	spin_lock_irqsave(&b_dev_info->pages_lock, flags);
>> +	for (; pfn <= last_pfn; pfn++) {
>> +		page = pfn_to_page(pfn);
>> +		balloon_page_enqueue_one(b_dev_info, page);
>> +	}
>> +	spin_unlock_irqrestore(&b_dev_info->pages_lock, flags);
>> +}
>> +EXPORT_SYMBOL_GPL(balloon_pages_enqueue);
>> +
>> +/*
>>  * balloon_page_dequeue - removes a page from balloon's page list and returns
>>  *			  its address to allow the driver to release the page.
>>  * @b_dev_info: balloon device decriptor where we will grab a page from.
>> -- 
>> 2.7.4



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-22 16:04 [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue Hui Zhu
2020-03-22 16:04 ` [PATCH for QEMU v2] virtio-balloon: Add option cont-pages to set VIRTIO_BALLOON_VQ_INFLATE_CONT Hui Zhu
2020-03-22 16:19   ` no-reply
2020-03-26  7:06   ` teawater
2020-03-26  7:20     ` David Hildenbrand
2020-03-26  7:25   ` Michael S. Tsirkin
2020-03-26  6:45 ` [PATCH for Linux v2] virtio_balloon: Add VIRTIO_BALLOON_VQ_INFLATE_CONT to handle THP split issue teawater
2020-03-26  7:17 ` Michael S. Tsirkin
2020-04-09  6:05   ` teawater

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git