* [Qemu-devel] [PATCH v1 0/4] virtio-balloon: support free page reporting
From: Wei Wang @ 2018-01-17  6:31 UTC
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

This is the device part of the implementation that adds a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_VQ, to the virtio-balloon device. The device
receives guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not transferred to the destination. Please see the results in the
commit log of patch 1.
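
As an illustrative example (the property names are the ones introduced
by this series; the exact command line may vary), the feature could be
enabled with:

    -device virtio-balloon-pci,free-page-vq=on

"free-page-vq" is the device property added in patch 1; patch 3 adds an
optional "free-page-wait-time" property on top of it.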

Link to the driver patches:
https://marc.info/?l=kvm&m=151616696828185&w=2

Wei Wang (4):
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
  migration: call balloon to clear bits of free pages from dirty bitmap
  virtio-balloon: add a timer to limit the free page report waiting time
  virtio-balloon: Don't skip free pages if the poison val is non-zero

 balloon.c                                       |  46 ++++-
 hw/virtio/virtio-balloon.c                      | 224 ++++++++++++++++++++++--
 hw/virtio/virtio-pci.c                          |   3 +
 include/hw/virtio/virtio-balloon.h              |  12 +-
 include/migration/misc.h                        |   3 +
 include/standard-headers/linux/virtio_balloon.h |   6 +
 include/sysemu/balloon.h                        |  15 +-
 migration/ram.c                                 |  38 +++-
 8 files changed, 316 insertions(+), 31 deletions(-)

-- 
1.8.3.1

* [Qemu-devel] [PATCH v1 1/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
From: Wei Wang @ 2018-01-17  6:31 UTC
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

The new feature enables the virtio-balloon device to receive guest free
page hints from the free page vq and to clear the corresponding bits in
the dirty bitmap, so that those free pages are not transferred by the
migration thread. The device matches each reported hint against the
current free_page_report_cmd_id, so hints from a stale reporting round
are ignored.

Without this feature, a local live migration of an 8G idle guest takes
~2286 ms. With this feature, it takes ~260 ms, which reduces the
migration time to ~11% of the original.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 balloon.c                                       |  46 ++++++--
 hw/virtio/virtio-balloon.c                      | 140 +++++++++++++++++++++---
 include/hw/virtio/virtio-balloon.h              |   7 +-
 include/migration/misc.h                        |   3 +
 include/standard-headers/linux/virtio_balloon.h |   4 +
 include/sysemu/balloon.h                        |  15 ++-
 migration/ram.c                                 |  10 ++
 7 files changed, 198 insertions(+), 27 deletions(-)
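
To illustrate the bitmap arithmetic used by
skip_free_pages_from_dirty_bitmap() below (a sketch, not part of the
patch, assuming TARGET_PAGE_BITS == 12, i.e. 4KB target pages):

    /* A free page hint covering 2MB at offset 0x200000 of a RAMBlock */
    ram_addr_t offset = 0x200000;  /* start of the hinted range */
    size_t len = 2 * 1024 * 1024;  /* length of the hinted range */
    long start = offset >> 12;     /* first page index: 512 */
    long nr = len >> 12;           /* number of pages: 512 */
    /*
     * bitmap_clear(block->bmap, 512, 512) then marks those pages clean
     * and migration_dirty_pages drops by 512, so the migration thread
     * never sends them.
     */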

diff --git a/balloon.c b/balloon.c
index 1d720ff..480a989 100644
--- a/balloon.c
+++ b/balloon.c
@@ -36,6 +36,9 @@
 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
+static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
+static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
+static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
 static void *balloon_opaque;
 static bool balloon_inhibited;
 
@@ -64,17 +67,41 @@ static bool have_balloon(Error **errp)
     return true;
 }
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-                             QEMUBalloonStatus *stat_func, void *opaque)
+bool balloon_free_page_support(void)
 {
-    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
-        /* We're already registered one balloon handler.  How many can
-         * a guest really have?
-         */
+    return balloon_free_page_support_fn &&
+           balloon_free_page_support_fn(balloon_opaque);
+}
+
+void balloon_free_page_start(void)
+{
+    balloon_free_page_start_fn(balloon_opaque);
+}
+
+void balloon_free_page_stop(void)
+{
+    balloon_free_page_stop_fn(balloon_opaque);
+}
+
+int qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                             QEMUBalloonStatus *stat_fn,
+                             QEMUBalloonFreePageSupport *free_page_support_fn,
+                             QEMUBalloonFreePageStart *free_page_start_fn,
+                             QEMUBalloonFreePageStop *free_page_stop_fn,
+                             void *opaque)
+{
+    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
+        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
+        balloon_opaque) {
+        /* We already registered one balloon handler. */
         return -1;
     }
-    balloon_event_fn = event_func;
-    balloon_stat_fn = stat_func;
+
+    balloon_event_fn = event_fn;
+    balloon_stat_fn = stat_fn;
+    balloon_free_page_support_fn = free_page_support_fn;
+    balloon_free_page_start_fn = free_page_start_fn;
+    balloon_free_page_stop_fn = free_page_stop_fn;
     balloon_opaque = opaque;
     return 0;
 }
@@ -86,6 +113,9 @@ void qemu_remove_balloon_handler(void *opaque)
     }
     balloon_event_fn = NULL;
     balloon_stat_fn = NULL;
+    balloon_free_page_support_fn = NULL;
+    balloon_free_page_start_fn = NULL;
+    balloon_free_page_stop_fn = NULL;
     balloon_opaque = NULL;
 }
 
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 14e08d2..a2a5536 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -30,6 +30,7 @@
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
+#include "migration/misc.h"
 
 #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
 
@@ -73,6 +74,13 @@ static bool balloon_stats_supported(const VirtIOBalloon *s)
     return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ);
 }
 
+static bool balloon_free_page_supported(const VirtIOBalloon *s)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_VQ);
+}
+
 static bool balloon_stats_enabled(const VirtIOBalloon *s)
 {
     return s->stats_poll_interval > 0;
@@ -305,6 +313,85 @@ out:
     }
 }
 
+static void virtio_balloon_handle_free_pages(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+    VirtQueueElement *elem;
+    uint32_t size, id;
+
+    for (;;) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            break;
+        }
+
+        if (elem->out_num) {
+            iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(uint32_t));
+            size = elem->out_sg[0].iov_len;
+            if (id == dev->free_page_report_cmd_id) {
+                atomic_set(&dev->free_page_report_status,
+                           FREE_PAGE_REPORT_S_IN_PROGRESS);
+            } else {
+                atomic_set(&dev->free_page_report_status,
+                           FREE_PAGE_REPORT_S_STOP);
+            }
+        }
+
+        if (elem->in_num) {
+            RAMBlock *block;
+            ram_addr_t offset;
+
+            if (atomic_read(&dev->free_page_report_status) ==
+                FREE_PAGE_REPORT_S_IN_PROGRESS) {
+                block = qemu_ram_block_from_host(elem->in_sg[0].iov_base,
+                                                 false, &offset);
+                size = elem->in_sg[0].iov_len;
+                skip_free_pages_from_dirty_bitmap(block, offset, size);
+            }
+        }
+
+        virtqueue_push(vq, elem, sizeof(id));
+        g_free(elem);
+    }
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+
+    if (!balloon_free_page_supported(s)) {
+        return false;
+    }
+
+    return true;
+}
+
+static void virtio_balloon_free_page_start(void *opaque)
+{
+    VirtIOBalloon *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+
+    dev->free_page_report_cmd_id++;
+    virtio_notify_config(vdev);
+    atomic_set(&dev->free_page_report_status, FREE_PAGE_REPORT_S_START);
+}
+
+static void virtio_balloon_free_page_stop(void *opaque)
+{
+    VirtIOBalloon *dev = opaque;
+
+    /* The guest has done the report */
+    if (atomic_read(&dev->free_page_report_status) ==
+        FREE_PAGE_REPORT_S_STOP) {
+        return;
+    }
+
+    /* Wait till a stop sign is received from the guest */
+    while (atomic_read(&dev->free_page_report_status) !=
+           FREE_PAGE_REPORT_S_STOP)
+        ;
+}
+
 static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
@@ -312,6 +399,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
+    config.free_page_report_cmd_id = cpu_to_le32(dev->free_page_report_cmd_id);
 
     trace_virtio_balloon_get_config(config.num_pages, config.actual);
     memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
@@ -418,6 +506,7 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
     .fields = (VMStateField[]) {
         VMSTATE_UINT32(num_pages, VirtIOBalloon),
         VMSTATE_UINT32(actual, VirtIOBalloon),
+        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
         VMSTATE_END_OF_LIST()
     },
 };
@@ -426,24 +515,18 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOBalloon *s = VIRTIO_BALLOON(dev);
-    int ret;
 
     virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
                 sizeof(struct virtio_balloon_config));
 
-    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
-                                   virtio_balloon_stat, s);
-
-    if (ret < 0) {
-        error_setg(errp, "Only one balloon device is supported");
-        virtio_cleanup(vdev);
-        return;
-    }
-
     s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
-
+    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_FREE_PAGE_VQ)) {
+        s->free_page_vq = virtio_add_queue(vdev, 128,
+                                           virtio_balloon_handle_free_pages);
+        atomic_set(&s->free_page_report_status, FREE_PAGE_REPORT_S_STOP);
+    }
     reset_stats(s);
 }
 
@@ -471,12 +554,35 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
+    int ret = 0;
+
+    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+        if (!s->stats_vq_elem && vdev->vm_running &&
+            virtqueue_rewind(s->svq, 1)) {
+            /*
+             * Poll stats queue for the element we have discarded when the VM
+             * was stopped.
+             */
+            virtio_balloon_receive_stats(vdev, s->svq);
+        }
 
-    if (!s->stats_vq_elem && vdev->vm_running &&
-        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
-        /* poll stats queue for the element we have discarded when the VM
-         * was stopped */
-        virtio_balloon_receive_stats(vdev, s->svq);
+        if (balloon_free_page_supported(s)) {
+            ret = qemu_add_balloon_handler(virtio_balloon_to_target,
+                                           virtio_balloon_stat,
+                                           virtio_balloon_free_page_support,
+                                           virtio_balloon_free_page_start,
+                                           virtio_balloon_free_page_stop,
+                                           s);
+        } else {
+            ret = qemu_add_balloon_handler(virtio_balloon_to_target,
+                                           virtio_balloon_stat, NULL, NULL,
+                                           NULL, s);
+        }
+        if (ret < 0) {
+                fprintf(stderr, "Only one balloon device is supported\n");
+                virtio_cleanup(vdev);
+                return;
+        }
     }
 }
 
@@ -506,6 +612,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
 static Property virtio_balloon_properties[] = {
     DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
+    DEFINE_PROP_BIT("free-page-vq", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_FREE_PAGE_VQ, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 1ea13bd..b84b4af 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -31,11 +31,16 @@ typedef struct virtio_balloon_stat_modern {
        uint64_t val;
 } VirtIOBalloonStatModern;
 
+#define FREE_PAGE_REPORT_S_START        0
+#define FREE_PAGE_REPORT_S_IN_PROGRESS  1
+#define FREE_PAGE_REPORT_S_STOP         2
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq;
+    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;
+    uint32_t free_page_report_cmd_id;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
     VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 77fd4f5..6df419c 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,14 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void skip_free_pages_from_dirty_bitmap(RAMBlock *block, ram_addr_t offset,
+                                       size_t len);
 
 /* migration/block.c */
 
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 9d06ccd..596df5d 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -34,15 +34,19 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_FREE_PAGE_VQ	3 /* VQ to report free pages */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
 struct virtio_balloon_config {
 	/* Number of pages host wants Guest to give up. */
 	uint32_t num_pages;
 	/* Number of pages we've actually got in balloon. */
 	uint32_t actual;
+        /* The free page report command id, read-only for the guest */
+	uint32_t free_page_report_cmd_id;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index af49e19..cfee3ca 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -18,11 +18,22 @@
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
 typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
+typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
+typedef void (QEMUBalloonFreePageStart)(void *opaque);
+typedef void (QEMUBalloonFreePageStop)(void *opaque);
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-			     QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 bool qemu_balloon_is_inhibited(void);
 void qemu_balloon_inhibit(bool state);
+bool balloon_free_page_support(void);
+void balloon_free_page_start(void);
+void balloon_free_page_stop(void);
+
+int qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                             QEMUBalloonStatus *stat_fn,
+                             QEMUBalloonFreePageSupport *free_page_support_fn,
+                             QEMUBalloonFreePageStart *free_page_start_fn,
+                             QEMUBalloonFreePageStop *free_page_stop_fn,
+                             void *opaque);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index cb1950f..d6f462c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2186,6 +2186,16 @@ static int ram_init_all(RAMState **rsp)
     return 0;
 }
 
+void skip_free_pages_from_dirty_bitmap(RAMBlock *block, ram_addr_t offset,
+                                       size_t len)
+{
+    long start = offset >> TARGET_PAGE_BITS,
+         nr = len >> TARGET_PAGE_BITS;
+
+    bitmap_clear(block->bmap, start, nr);
+    ram_state->migration_dirty_pages -= nr;
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
-- 
1.8.3.1


* [Qemu-devel] [PATCH v1 2/4] migration: call balloon to clear bits of free pages from dirty bitmap
From: Wei Wang @ 2018-01-17  6:31 UTC
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

When migration starts, call the related balloon functions to clear the
bits of guest free pages from the dirty bitmap. The dirty bitmap must
be stable by the time pages are sent to the destination, so the guest
is stopped from reporting free pages before the sending phase begins.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 migration/ram.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)
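
The resulting call ordering (a summary sketch of the hunks below; the
balloon calls are made only when rs->free_page_support is true):

    /*
     * Migration setup:
     *     ram_init_bitmaps(rs)
     *         -> ram_list_init_bitmaps(rs)       allocate dirty bitmaps
     *         -> memory_global_dirty_log_start()
     *         -> balloon_free_page_start()       guest starts reporting
     *         -> migration_bitmap_sync(rs)
     *
     * First ram_save_iterate(), before any page is sent:
     *     balloon_free_page_stop()               wait for the report to end
     */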

diff --git a/migration/ram.c b/migration/ram.c
index d6f462c..9b8d102 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -49,6 +49,7 @@
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
 #include "migration/block.h"
+#include "sysemu/balloon.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -206,6 +207,10 @@ struct RAMState {
     uint32_t last_version;
     /* We are in the first round */
     bool ram_bulk_stage;
+    /* The feature, skipping the transfer of free pages, is supported */
+    bool free_page_support;
+    /* Skip the transfer of free pages in the bulk stage */
+    bool free_page_done;
     /* How many times we have dirty too many pages */
     int dirty_rate_high_cnt;
     /* these variables are used for bitmap sync */
@@ -773,7 +778,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     unsigned long *bitmap = rb->bmap;
     unsigned long next;
 
-    if (rs->ram_bulk_stage && start > 0) {
+    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
         next = start + 1;
     } else {
         next = find_next_bit(bitmap, size, start);
@@ -1653,6 +1658,8 @@ static void ram_state_reset(RAMState *rs)
     rs->last_page = 0;
     rs->last_version = ram_list.version;
     rs->ram_bulk_stage = true;
+    rs->free_page_support = balloon_free_page_support();
+    rs->free_page_done = false;
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -2135,7 +2142,7 @@ static int ram_state_init(RAMState **rsp)
     return 0;
 }
 
-static void ram_list_init_bitmaps(void)
+static void ram_list_init_bitmaps(RAMState *rs)
 {
     RAMBlock *block;
     unsigned long pages;
@@ -2145,7 +2152,11 @@ static void ram_list_init_bitmaps(void)
         QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
             pages = block->max_length >> TARGET_PAGE_BITS;
             block->bmap = bitmap_new(pages);
-            bitmap_set(block->bmap, 0, pages);
+            if (rs->free_page_support) {
+                bitmap_set(block->bmap, 1, pages);
+            } else {
+                bitmap_set(block->bmap, 0, pages);
+            }
             if (migrate_postcopy_ram()) {
                 block->unsentmap = bitmap_new(pages);
                 bitmap_set(block->unsentmap, 0, pages);
@@ -2161,8 +2172,11 @@ static void ram_init_bitmaps(RAMState *rs)
     qemu_mutex_lock_ramlist();
     rcu_read_lock();
 
-    ram_list_init_bitmaps();
+    ram_list_init_bitmaps(rs);
     memory_global_dirty_log_start();
+    if (rs->free_page_support) {
+        balloon_free_page_start();
+    }
     migration_bitmap_sync(rs);
 
     rcu_read_unlock();
@@ -2275,6 +2289,12 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 
     ram_control_before_iterate(f, RAM_CONTROL_ROUND);
 
+    /* Before sending the pages, stop the guest from reporting free pages. */
+    if (rs->free_page_support && !rs->free_page_done) {
+        balloon_free_page_stop();
+        rs->free_page_done = true;
+    }
+
     t0 = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
     i = 0;
     while ((ret = qemu_file_rate_limit(f)) == 0) {
-- 
1.8.3.1


* [Qemu-devel] [PATCH v1 3/4] virtio-balloon: add a timer to limit the free page report waiting time
From: Wei Wang @ 2018-01-17  6:31 UTC
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

This patch adds a timer to limit the time that the host waits for the
free pages reported by the guest. Users can specify the time in ms via
the "free-page-wait-time" command line option. If a user doesn't
specify a time, the host waits until the guest finishes reporting all
the free pages. The policy (wait for all the free pages to be reported,
or use a time limit) is determined by the orchestration layer.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c         | 85 +++++++++++++++++++++++++++++++++++++-
 hw/virtio/virtio-pci.c             |  3 ++
 include/hw/virtio/virtio-balloon.h |  4 ++
 3 files changed, 91 insertions(+), 1 deletion(-)
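
For example (an illustrative invocation; "free-page-vq" comes from
patch 1 and the alias to the PCI proxy is added by the virtio-pci hunk
below), a one-second limit could be set with:

    -device virtio-balloon-pci,free-page-vq=on,free-page-wait-time=1000

Leaving the property at its default of 0 keeps the behaviour of waiting
until the guest finishes reporting.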

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a2a5536..dcc8cb3 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -213,6 +213,66 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
     balloon_stats_change_timer(s, 0);
 }
 
+static void balloon_free_page_change_timer(VirtIOBalloon *s, int64_t ms)
+{
+    timer_mod(s->free_page_timer,
+              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + ms);
+}
+
+static void balloon_stop_free_page_report(void *opaque)
+{
+    VirtIOBalloon *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+
+    timer_del(dev->free_page_timer);
+    timer_free(dev->free_page_timer);
+    dev->free_page_timer = NULL;
+
+    if (atomic_read(&dev->free_page_report_status) ==
+        FREE_PAGE_REPORT_S_IN_PROGRESS) {
+        dev->host_stop_free_page = true;
+        virtio_notify_config(vdev);
+    }
+}
+
+static void balloon_free_page_get_wait_time(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    VirtIOBalloon *s = opaque;
+
+    visit_type_int(v, name, &s->free_page_wait_time, errp);
+}
+
+static void balloon_free_page_set_wait_time(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    VirtIOBalloon *s = opaque;
+    Error *local_err = NULL;
+    int64_t value;
+
+    visit_type_int(v, name, &value, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    if (value < 0) {
+        error_setg(errp, "free page wait time must be greater than zero");
+        return;
+    }
+
+    if (value > UINT32_MAX) {
+        error_setg(errp, "free page wait time value is too big");
+        return;
+    }
+
+    s->free_page_wait_time = value;
+    g_assert(s->free_page_timer == NULL);
+    s->free_page_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
+                                      balloon_stop_free_page_report, s);
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -332,6 +392,7 @@ static void virtio_balloon_handle_free_pages(VirtIODevice *vdev, VirtQueue *vq)
                 atomic_set(&dev->free_page_report_status,
                            FREE_PAGE_REPORT_S_IN_PROGRESS);
             } else {
+                dev->host_stop_free_page = false;
                 atomic_set(&dev->free_page_report_status,
                            FREE_PAGE_REPORT_S_STOP);
             }
@@ -386,6 +447,10 @@ static void virtio_balloon_free_page_stop(void *opaque)
         return;
     }
 
+    if (dev->free_page_wait_time) {
+        balloon_free_page_change_timer(dev, dev->free_page_wait_time);
+    }
+
     /* Wait till a stop sign is received from the guest */
     while (atomic_read(&dev->free_page_report_status) !=
            FREE_PAGE_REPORT_S_STOP)
@@ -399,7 +464,19 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
-    config.free_page_report_cmd_id = cpu_to_le32(dev->free_page_report_cmd_id);
+    if (dev->host_stop_free_page) {
+        /*
+         * Host is actively requesting to stop the free page report, send the
+         * stop sign to the guest. This happens when the migration thread has
+         * reached the phase to send pages to the destination while the guest
+         * hasn't done the reporting.
+         */
+        config.free_page_report_cmd_id =
+                                    VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID;
+    } else {
+        config.free_page_report_cmd_id =
+                                    cpu_to_le32(dev->free_page_report_cmd_id);
+    }
 
     trace_virtio_balloon_get_config(config.num_pages, config.actual);
     memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
@@ -526,6 +603,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
         s->free_page_vq = virtio_add_queue(vdev, 128,
                                            virtio_balloon_handle_free_pages);
         atomic_set(&s->free_page_report_status, FREE_PAGE_REPORT_S_STOP);
+        s->host_stop_free_page = false;
     }
     reset_stats(s);
 }
@@ -597,6 +675,11 @@ static void virtio_balloon_instance_init(Object *obj)
                         balloon_stats_get_poll_interval,
                         balloon_stats_set_poll_interval,
                         NULL, s, NULL);
+
+    object_property_add(obj, "free-page-wait-time", "int",
+                        balloon_free_page_get_wait_time,
+                        balloon_free_page_set_wait_time,
+                        NULL, s, NULL);
 }
 
 static const VMStateDescription vmstate_virtio_balloon = {
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 6c75cca..3345104 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2250,6 +2250,9 @@ static void virtio_balloon_pci_instance_init(Object *obj)
     object_property_add_alias(obj, "guest-stats-polling-interval",
                               OBJECT(&dev->vdev),
                               "guest-stats-polling-interval", &error_abort);
+    object_property_add_alias(obj, "free-page-wait-time",
+                              OBJECT(&dev->vdev),
+                              "free-page-wait-time", &error_abort);
 }
 
 static const TypeInfo virtio_balloon_pci_info = {
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index b84b4af..268f7d6 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -37,6 +37,8 @@ typedef struct virtio_balloon_stat_modern {
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
     VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    /* The host is requesting the guest to stop free page reporting */
+    bool host_stop_free_page;
     uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;
@@ -45,8 +47,10 @@ typedef struct VirtIOBalloon {
     VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
+    QEMUTimer *free_page_timer;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
+    int64_t free_page_wait_time;
     uint32_t host_features;
 } VirtIOBalloon;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [virtio-dev] [PATCH v1 3/4] virtio-balloon: add a timer to limit the free page report wating time
@ 2018-01-17  6:31   ` Wei Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Wei Wang @ 2018-01-17  6:31 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

This patch adds a timer to limit the time that the host waits for the
free pages reported by the guest. Users can specify the time in ms via
"free-page-wait-time" command line option. If a user doesn't specify a
time, the host waits till the guest finishes reporting all the free
pages. The policy (wait for all the free pages to be reported or use a
time limit) is determined by the orchestration layer.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c         | 85 +++++++++++++++++++++++++++++++++++++-
 hw/virtio/virtio-pci.c             |  3 ++
 include/hw/virtio/virtio-balloon.h |  4 ++
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a2a5536..dcc8cb3 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -213,6 +213,66 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
     balloon_stats_change_timer(s, 0);
 }
 
+static void balloon_free_page_change_timer(VirtIOBalloon *s, int64_t ms)
+{
+    timer_mod(s->free_page_timer,
+              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + ms);
+}
+
+static void balloon_stop_free_page_report(void *opaque)
+{
+    VirtIOBalloon *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+
+    timer_del(dev->free_page_timer);
+    timer_free(dev->free_page_timer);
+    dev->free_page_timer = NULL;
+
+    if (atomic_read(&dev->free_page_report_status) ==
+        FREE_PAGE_REPORT_S_IN_PROGRESS) {
+        dev->host_stop_free_page = true;
+        virtio_notify_config(vdev);
+    }
+}
+
+static void balloon_free_page_get_wait_time(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    VirtIOBalloon *s = opaque;
+
+    visit_type_int(v, name, &s->free_page_wait_time, errp);
+}
+
+static void balloon_free_page_set_wait_time(Object *obj, Visitor *v,
+                                            const char *name, void *opaque,
+                                            Error **errp)
+{
+    VirtIOBalloon *s = opaque;
+    Error *local_err = NULL;
+    int64_t value;
+
+    visit_type_int(v, name, &value, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    if (value < 0) {
+        error_setg(errp, "free page wait time must be greater than zero");
+        return;
+    }
+
+    if (value > UINT32_MAX) {
+        error_setg(errp, "free page wait time value is too big");
+        return;
+    }
+
+    s->free_page_wait_time = value;
+    g_assert(s->free_page_timer == NULL);
+    s->free_page_timer = timer_new_ms(QEMU_CLOCK_REALTIME,
+                                      balloon_stop_free_page_report, s);
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -332,6 +392,7 @@ static void virtio_balloon_handle_free_pages(VirtIODevice *vdev, VirtQueue *vq)
                 atomic_set(&dev->free_page_report_status,
                            FREE_PAGE_REPORT_S_IN_PROGRESS);
             } else {
+                dev->host_stop_free_page = false;
                 atomic_set(&dev->free_page_report_status,
                            FREE_PAGE_REPORT_S_STOP);
             }
@@ -386,6 +447,10 @@ static void virtio_balloon_free_page_stop(void *opaque)
         return;
     }
 
+    if (dev->free_page_wait_time) {
+        balloon_free_page_change_timer(dev, dev->free_page_wait_time);
+    }
+
     /* Wait till a stop sign is received from the guest */
     while (atomic_read(&dev->free_page_report_status) !=
            FREE_PAGE_REPORT_S_STOP)
@@ -399,7 +464,19 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
-    config.free_page_report_cmd_id = cpu_to_le32(dev->free_page_report_cmd_id);
+    if (dev->host_stop_free_page) {
+        /*
+         * Host is actively requesting to stop the free page report, send the
+         * stop sign to the guest. This happens when the migration thread has
+         * reached the phase to send pages to the destination while the guest
+         * hasn't done the reporting.
+         */
+        config.free_page_report_cmd_id =
+                                    VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID;
+    } else {
+        config.free_page_report_cmd_id =
+                                    cpu_to_le32(dev->free_page_report_cmd_id);
+    }
 
     trace_virtio_balloon_get_config(config.num_pages, config.actual);
     memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
@@ -526,6 +603,7 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
         s->free_page_vq = virtio_add_queue(vdev, 128,
                                            virtio_balloon_handle_free_pages);
         atomic_set(&s->free_page_report_status, FREE_PAGE_REPORT_S_STOP);
+        s->host_stop_free_page = false;
     }
     reset_stats(s);
 }
@@ -597,6 +675,11 @@ static void virtio_balloon_instance_init(Object *obj)
                         balloon_stats_get_poll_interval,
                         balloon_stats_set_poll_interval,
                         NULL, s, NULL);
+
+    object_property_add(obj, "free-page-wait-time", "int",
+                        balloon_free_page_get_wait_time,
+                        balloon_free_page_set_wait_time,
+                        NULL, s, NULL);
 }
 
 static const VMStateDescription vmstate_virtio_balloon = {
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 6c75cca..3345104 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2250,6 +2250,9 @@ static void virtio_balloon_pci_instance_init(Object *obj)
     object_property_add_alias(obj, "guest-stats-polling-interval",
                               OBJECT(&dev->vdev),
                               "guest-stats-polling-interval", &error_abort);
+    object_property_add_alias(obj, "free-page-wait-time",
+                              OBJECT(&dev->vdev),
+                              "free-page-wait-time", &error_abort);
 }
 
 static const TypeInfo virtio_balloon_pci_info = {
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index b84b4af..268f7d6 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -37,6 +37,8 @@ typedef struct virtio_balloon_stat_modern {
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
     VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    /* The host is requesting the guest to stop free page reporting */
+    bool host_stop_free_page;
     uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;
@@ -45,8 +47,10 @@ typedef struct VirtIOBalloon {
     VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
+    QEMUTimer *free_page_timer;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
+    int64_t free_page_wait_time;
     uint32_t host_features;
 } VirtIOBalloon;
 
-- 
1.8.3.1
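
As a usage note (a sketch, not part of the series): the property can be
set like any other device property on the command line, with the value
in milliseconds to match the timer_new_ms() call above, e.g.:

    qemu-system-x86_64 ... \
        -device virtio-balloon-pci,free-page-wait-time=1000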


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Qemu-devel] [PATCH v1 4/4] virtio-balloon: Don't skip free pages if the poison val is non-zero
  2018-01-17  6:31 ` [virtio-dev] " Wei Wang
@ 2018-01-17  6:32   ` Wei Wang
  -1 siblings, 0 replies; 16+ messages in thread
From: Wei Wang @ 2018-01-17  6:32 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

If the guest is using a non-zero page poison value, don't skip the
transfer of guest free pages.

TODO: as a next-step optimization, we can try to
1) skip the transfer of guest poisoned free pages;
2) send the poison value to the destination; and
3) find a way to re-poison the guest free pages before the guest starts
to run on the destination (a rough sketch of this follows below).
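
A rough sketch of (3), destination side (hypothetical and not part of
this series; it assumes the guest poisons whole free pages with a
repeating 32-bit pattern derived from poison_val):

    #include <stddef.h>
    #include <stdint.h>

    /* Refill a page that was skipped during migration with the guest's
     * poison pattern, so the guest's poison sanity checks still pass. */
    static void repoison_free_page(void *host_addr, uint32_t poison_val,
                                   size_t page_size)
    {
        uint32_t *p = host_addr;
        size_t i;

        for (i = 0; i < page_size / sizeof(*p); i++) {
            p[i] = poison_val;
        }
    }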

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c                      | 3 ++-
 include/hw/virtio/virtio-balloon.h              | 1 +
 include/standard-headers/linux/virtio_balloon.h | 2 ++
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index dcc8cb3..0cdc755 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -403,7 +403,7 @@ static void virtio_balloon_handle_free_pages(VirtIODevice *vdev, VirtQueue *vq)
             ram_addr_t offset;
 
             if (atomic_read(&dev->free_page_report_status) ==
-                FREE_PAGE_REPORT_S_IN_PROGRESS) {
+                FREE_PAGE_REPORT_S_IN_PROGRESS && !dev->poison_val) {
                 block = qemu_ram_block_from_host(elem->in_sg[0].iov_base,
                                                  false, &offset);
                 size = elem->in_sg[0].iov_len;
@@ -530,6 +530,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
                         ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
                         &error_abort);
     }
+    dev->poison_val = le32_to_cpu(config.poison_val);
     trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 268f7d6..0bb80cd 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -43,6 +43,7 @@ typedef struct VirtIOBalloon {
     uint32_t num_pages;
     uint32_t actual;
     uint32_t free_page_report_cmd_id;
+    uint32_t poison_val;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
     VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 596df5d..f6f4ff3 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -47,6 +47,8 @@ struct virtio_balloon_config {
 	uint32_t actual;
         /* The free_page report command id, readonly by guest */
 	uint32_t free_page_report_cmd_id;
+	/* Stores PAGE_POISON if page poisoning with sanity check is in use */
+	uint32_t poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v1 1/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
  2018-01-17  6:31   ` [virtio-dev] " Wei Wang
  (?)
@ 2018-01-17 11:40   ` Juan Quintela
  2018-01-17 15:52       ` [virtio-dev] " Wang, Wei W
  -1 siblings, 1 reply; 16+ messages in thread
From: Juan Quintela @ 2018-01-17 11:40 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

Wei Wang <wei.w.wang@intel.com> wrote:
> The new feature enables the virtio-balloon device to receive the hint of
> guest free pages from the free page vq, and clears the corresponding bits
> of the free page from the dirty bitmap, so that those free pages are not
> transferred by the migration thread.
>
> Without this feature, a local live migration of an 8G idle guest takes
> ~2286 ms. With this feature, it takes ~260 ms, which reduces the
> migration time to ~11% of the original.
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> CC: Michael S. Tsirkin <mst@redhat.com>

I don't claim to understand the full balloon driver.


> diff --git a/migration/ram.c b/migration/ram.c
> index cb1950f..d6f462c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2186,6 +2186,16 @@ static int ram_init_all(RAMState **rsp)
>      return 0;
>  }
>  
> +void skip_free_pages_from_dirty_bitmap(RAMBlock *block, ram_addr_t offset,
> +                                       size_t len)
> +{
> +    long start = offset >> TARGET_PAGE_BITS,
> +         nr = len >> TARGET_PAGE_BITS;
> +
> +    bitmap_clear(block->bmap, start, nr);

But what assures us that all the nr pages are dirty?


> +    ram_state->migration_dirty_pages -= nr;

This should be
ram_state->migration_dirty_pages -=
     count_ones(block->bmap, start, nr);

For a count_ones function, no?
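
Something along these lines, say (a sketch; count_ones is hypothetical,
but easy to build from test_bit() in qemu/bitops.h):

    static long count_ones(const unsigned long *bmap, long start, long nr)
    {
        long i, set = 0;

        for (i = start; i < start + nr; i++) {
            set += test_bit(i, bmap);
        }
        return set;
    }

    /* count before clearing, so that only bits which were actually set
     * get subtracted from migration_dirty_pages */
    ram_state->migration_dirty_pages -= count_ones(block->bmap, start, nr);
    bitmap_clear(block->bmap, start, nr);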

Furthermore, we have one "optimization", and this only works from the
second stage onward:

static inline
unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
                                          unsigned long start)
{
    unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
    unsigned long *bitmap = rb->bmap;
    unsigned long next;

    if (rs->ram_bulk_stage && start > 0) {
        next = start + 1;
    } else {
        next = find_next_bit(bitmap, size, start);
    }

    return next;
}

So, to make this really work, we have more work to do.
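
One possible direction (a sketch only, untested; rs->free_page_support
is the flag that patch 2 introduces): make the bulk-stage shortcut
consult the bitmap whenever free page hints are in use:

    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
        /* bulk stage without hints: every page is dirty anyway */
        next = start + 1;
    } else {
        /* honour the bits cleared by the free page hints */
        next = find_next_bit(bitmap, size, start);
    }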

Actually, what I think we should do is _ask_ the guest which pages
are used at the beginning, instead of just setting all pages as dirty,
but that requires kernel changes and a lot of searching for corner cases.

Later, Juan.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v1 1/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
  2018-01-17  6:31   ` [virtio-dev] " Wei Wang
  (?)
  (?)
@ 2018-01-17 12:31   ` Juan Quintela
  2018-01-19  3:52       ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 16+ messages in thread
From: Juan Quintela @ 2018-01-17 12:31 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

Wei Wang <wei.w.wang@intel.com> wrote:
> The new feature enables the virtio-balloon device to receive the hint of
> guest free pages from the free page vq, and clears the corresponding bits
> of the free page from the dirty bitmap, so that those free pages are not
> transferred by the migration thread.
>
> Without this feature, a local live migration of an 8G idle guest takes
> ~2286 ms. With this feature, it takes ~260 ms, which reduces the
> migration time to ~11% of the original.
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> CC: Michael S. Tsirkin <mst@redhat.com>

I hate to answer twice, but ...


> +static bool virtio_balloon_free_page_support(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +
> +    if (!balloon_free_page_supported(s)) {
> +        return false;
> +    }
> +
> +    return true;

return balloon_free_page_supported(s); ???


> @@ -312,6 +399,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    config.free_page_report_cmd_id = cpu_to_le32(dev->free_page_report_cmd_id);
>  
>      trace_virtio_balloon_get_config(config.num_pages, config.actual);
>      memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
> @@ -418,6 +506,7 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>      .fields = (VMStateField[]) {
>          VMSTATE_UINT32(num_pages, VirtIOBalloon),
>          VMSTATE_UINT32(actual, VirtIOBalloon),
> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
>          VMSTATE_END_OF_LIST()

No new version, no subsection, and we add a new field?
I think that is wrong.  We need to send that only for old versions.
Look at how it was handled for ...

[PATCH v2] hpet: recover timer offset correctly

v2 discussion explains why we want to handle that way for different
machine types.

v3 does it correctly.
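
For reference, the usual shape of such a subsection (a sketch only; the
.needed test is an assumption about when the field matters):

    static bool free_page_needed(void *opaque)
    {
        VirtIOBalloon *s = opaque;

        return virtio_vdev_has_feature(VIRTIO_DEVICE(s),
                                       VIRTIO_BALLOON_F_FREE_PAGE_VQ);
    }

    static const VMStateDescription vmstate_virtio_balloon_free_page = {
        .name = "virtio-balloon-device/free-page-report",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = free_page_needed,
        .fields = (VMStateField[]) {
            VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
            VMSTATE_END_OF_LIST()
        }
    };

The subsection is then listed in .subsections of
vmstate_virtio_balloon_device, and the field only hits the wire when
.needed returns true.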


And I think that with this I have reviewed all the migration/vmstate
bits, no?

If something is missing, please ask.

Later, Juan.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v1 1/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
  2018-01-17 11:40   ` [Qemu-devel] " Juan Quintela
@ 2018-01-17 15:52       ` Wang, Wei W
  0 siblings, 0 replies; 16+ messages in thread
From: Wang, Wei W @ 2018-01-17 15:52 UTC (permalink / raw)
  To: quintela
  Cc: yang.zhang.wz, virtio-dev, quan.xu0, mst, liliang.opensource,
	qemu-devel, dgilbert, pbonzini, nilal

On Wednesday, January 17, 2018 7:41 PM, Juan Quintela wrote:
> Wei Wang <wei.w.wang@intel.com> wrote:
> > +void skip_free_pages_from_dirty_bitmap(RAMBlock *block, ram_addr_t offset,
> > +                                       size_t len)
> > +{
> > +    long start = offset >> TARGET_PAGE_BITS,
> > +         nr = len >> TARGET_PAGE_BITS;
> > +
> > +    bitmap_clear(block->bmap, start, nr);
> 
> But what assures us that all the nr pages are dirty?

Actually, the free page optimization is used for the bulk stage, where all the pages are treated as dirty. In patch 2, we have:

+            if (rs->free_page_support) {
+                bitmap_set(block->bmap, 1, pages);
+            } else {
+                bitmap_set(block->bmap, 0, pages);
+            }


> 
> > +    ram_state->migration_dirty_pages -= nr;
> 
> This should be
> ram_state->migration_dirty_pages -=
>      count_ones(block->bmap, start, nr);
> 
> For a count_ones function, no?

Not really. "nr" is the number of bits to clear from the bitmap, so "migration_dirty_pages -= nr" tracks how many dirty bits are left in the bitmap.

One corner case I'm thinking about is when we do
bitmap_clear(block->bmap, start, nr);
would it be possible that "start + nr" exceeds the bitmap size? That is,
could the free page block (e.g. 2MB) reported by the guest cross the QEMU
RAMBlock boundary?
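
If it can, one defensive option (a sketch, using QEMU's MIN()) is to
clamp the range to the block before touching the bitmap:

    long start = offset >> TARGET_PAGE_BITS;
    long pages = block->used_length >> TARGET_PAGE_BITS;
    long nr;

    if (start >= pages) {
        return;                /* hint lies entirely outside the block */
    }
    nr = MIN((long)(len >> TARGET_PAGE_BITS), pages - start);
    bitmap_clear(block->bmap, start, nr);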


> Furthermore, we have one "optimization", and this only works from the
> second stage onward:
> 
> static inline
> unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>                                           unsigned long start)
> {
>     unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
>     unsigned long *bitmap = rb->bmap;
>     unsigned long next;
> 
>     if (rs->ram_bulk_stage && start > 0) {
>         next = start + 1;
>     } else {
>         next = find_next_bit(bitmap, size, start);
>     }
> 
>     return next;
> }
> 
> So, to make this really work, we have more work to do.
> 
> Actually, what I think we should do is _ask_ the guest which pages
> are used at the beginning, instead of just setting all pages as dirty,
> but that requires kernel changes and a lot of searching for corner cases.
> 

This series optimizes the bulk stage memory transfer. Do you mean you have another idea to optimize the second round onward? How would you let the guest track pages that are written?

Best,
Wei

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v1 1/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
  2018-01-17 12:31   ` Juan Quintela
@ 2018-01-19  3:52       ` Wei Wang
  0 siblings, 0 replies; 16+ messages in thread
From: Wei Wang @ 2018-01-19  3:52 UTC (permalink / raw)
  To: quintela
  Cc: qemu-devel, virtio-dev, mst, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 01/17/2018 08:31 PM, Juan Quintela wrote:
> Wei Wang <wei.w.wang@intel.com> wrote:
>> The new feature enables the virtio-balloon device to receive the hint of
>> guest free pages from the free page vq, and clears the corresponding bits
>> of the free page from the dirty bitmap, so that those free pages are not
>> transferred by the migration thread.
>>
>> Without this feature, a local live migration of an 8G idle guest takes
>> ~2286 ms. With this feature, it takes ~260 ms, which reduces the
>> migration time to ~11% of the original.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> Signed-off-by: Liang Li <liang.z.li@intel.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
> I hate to answer twice, but ...

Thanks for the second review :)

>
>
>> +static bool virtio_balloon_free_page_support(void *opaque)
>> +{
>> +    VirtIOBalloon *s = opaque;
>> +
>> +    if (!balloon_free_page_supported(s)) {
>> +        return false;
>> +    }
>> +
>> +    return true;
> return balloon_free_page_supported(s); ???

Forgot to clean this up. It is actually a duplicate of
balloon_free_page_supported() now; I'll remove one of them in the next
version.

>
>
>> @@ -312,6 +399,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>>   
>>       config.num_pages = cpu_to_le32(dev->num_pages);
>>       config.actual = cpu_to_le32(dev->actual);
>> +    config.free_page_report_cmd_id = cpu_to_le32(dev->free_page_report_cmd_id);
>>   
>>       trace_virtio_balloon_get_config(config.num_pages, config.actual);
>>       memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
>> @@ -418,6 +506,7 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>>       .fields = (VMStateField[]) {
>>           VMSTATE_UINT32(num_pages, VirtIOBalloon),
>>           VMSTATE_UINT32(actual, VirtIOBalloon),
>> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
>>           VMSTATE_END_OF_LIST()
> No new version, no subsection, and we add a new field?
> I think that is wrong.  We need to send that only for old versions.
> Look at how it was handled for ...
>
> [PATCH v2] hpet: recover timer offset correctly
>
> v2 discussion explains why we want to handle that way for different
> machine types.
>
> v3 does it correctly.

OK, I'll put it in a subsection.

>
>
> And I think that with this I have reviewed all the migration/vmstate
> bits, no?
>
> If something is missing, please ask.

Thanks. This is the only new device state that needs to be saved. Please
review patch 2 as well, which makes the migration thread use this
feature.

Best,
Wei

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-01-19  3:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-17  6:31 [Qemu-devel] [PATCH v1 0/4] virtio-balloon: support free page reporting Wei Wang
2018-01-17  6:31 ` [virtio-dev] " Wei Wang
2018-01-17  6:31 ` [Qemu-devel] [PATCH v1 1/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ Wei Wang
2018-01-17  6:31   ` [virtio-dev] " Wei Wang
2018-01-17 11:40   ` [Qemu-devel] " Juan Quintela
2018-01-17 15:52     ` Wang, Wei W
2018-01-17 15:52       ` [virtio-dev] " Wang, Wei W
2018-01-17 12:31   ` Juan Quintela
2018-01-19  3:52     ` Wei Wang
2018-01-19  3:52       ` [virtio-dev] " Wei Wang
2018-01-17  6:31 ` [Qemu-devel] [PATCH v1 2/4] migration: call balloon to clear bits of free pages from dirty bitmap Wei Wang
2018-01-17  6:31   ` [virtio-dev] " Wei Wang
2018-01-17  6:31 ` [Qemu-devel] [PATCH v1 3/4] virtio-balloon: add a timer to limit the free page report waiting time Wei Wang
2018-01-17  6:31   ` [virtio-dev] " Wei Wang
2018-01-17  6:32 ` [Qemu-devel] [PATCH v1 4/4] virtio-balloon: Don't skip free pages if the poison val is non-zero Wei Wang
2018-01-17  6:32   ` [virtio-dev] " Wei Wang
