* [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
@ 2018-04-24  6:13 ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-04-24  6:13 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device. The device
receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not transferred by the migration thread to the destination.

- Test Environment
    Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    Guest: 8G RAM, 4 vCPU
    Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds

- Test Results
    - Idle Guest Live Migration Time (results are averaged over 10 runs):
        - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
    - Guest with Linux Compilation Workload (make bzImage -j4):
        - Live Migration Time (average)
          Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
        - Linux Compilation Time
          Optimization v.s. Legacy = 4min56s v.s. 5min3s
          --> no obvious difference

- Source Code
    - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
    - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git

ChangeLog:
v6->v7:
      virtio-balloon/virtio_balloon_poll_free_page_hints:
          - add virtio_notify() at the end to notify the driver that
            the optimization is done, which indicates that the entries
            have all been put back to the vq and are ready to be detached.
v5->v6:
      virtio-balloon: use iothread to get free page hint
v4->v5:
    1) migration:
        - bitmap_clear_dirty: update the dirty bitmap and dirty page
          count under the bitmap mutex, as other functions do;
        - qemu_guest_free_page_hint:
            - add comments for this function;
            - check the !block case;
            - check "offset > block->used_length" before proceeding;
            - assign used_len inside the for{} body;
            - update the dirty bitmap and dirty page counter under the
              bitmap mutex;
        - ram_state_reset:
            - rs->free_page_support: && with "migrate_postcopy"
              instead of migration_in_postcopy;
            - clear the ram_bulk_stage flag if free_page_support is true;
    2) balloon:
         - add the usage documentation of balloon_free_page_start and
           balloon_free_page_stop in code;
         - the optimization thread is named "balloon_fpo" to meet the
           requirement of "less than 14 characters";
         - virtio_balloon_poll_free_page_hints:
             - run only when runstate_is_running() is true;
             - add a qemu spin lock to synchronize accesses to the free
               page reporting related fields shared among the migration
               thread and the optimization thread;
          - virtio_balloon_free_page_start: just return if
            runstate_is_running is false;
          - virtio_balloon_free_page_stop: access to the free page
            reporting related fields under a qemu spin lock;
          - virtio_balloon_device_unrealize/reset: call
            virtio_balloon_free_page_stop if the free page hint feature is
            used;
          - virtio_balloon_set_status: call virtio_balloon_free_page_stop
            in case the guest is stopped by qmp when the optimization is
            running;
v3->v4:
    1) bitmap: add a new API to count 1s starting from an offset of a
       bitmap
    2) migration:
        - qemu_guest_free_page_hint: calculate
          ram_state->migration_dirty_pages by counting how many bits of
          free pages are truly cleared. If some of the bits were
          already 0, they shouldn't be deducted from
          ram_state->migration_dirty_pages. This wasn't needed for
          previous versions since we optimized the bulk stage only,
          where all bits are guaranteed to be set. It's needed now
          because we extended the usage of this optimization to all stages
          except the last stop&copy stage. From 2nd stage onward, there
          are possibilities that some bits of free pages are already 0.
     3) virtio-balloon:
         - virtio_balloon_free_page_report_status: introduce a new status,
           FREE_PAGE_REPORT_S_EXIT. This status indicates that the
           optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
           the reporting is stopped, but the optimization thread still needs
           to be joined by the migration thread.
v2->v3:
    1) virtio-balloon
        - virtio_balloon_free_page_start: poll the hints using a new
          thread;
        - use cmd id between [0x80000000, UINT_MAX];
        - virtio_balloon_poll_free_page_hints:
            - stop the optimization only when it has started;
            - don't skip free pages when !poison_val;
        - add poison_val to vmsd to migrate;
        - virtio_balloon_get_features: add the F_PAGE_POISON feature when
          host has F_FREE_PAGE_HINT;
        - remove the timer patch which is not needed now.
    2) migration
       - new api, qemu_guest_free_page_hint;
       - rs->free_page_support set only in the precopy case;
       - use the new balloon APIs.
v1->v2: 
    1) virtio-balloon
        - use subsections to save free_page_report_cmd_id;
        - poll the free page vq after sending a cmd id to the driver;
        - change the free page vq size to VIRTQUEUE_MAX_SIZE;
        - virtio_balloon_poll_free_page_hints: handle the corner case
          that the free page block reported from the driver may cross
          the RAMBlock boundary.
    2) migration/ram.c
        - use balloon_free_page_poll to start the optimization


Wei Wang (5):
  bitmap: bitmap_count_one_with_offset
  migration: use bitmap_mutex in migration_bitmap_clear_dirty
  migration: API to clear bits of guest free pages from the dirty bitmap
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  migration: use the free page hint feature from balloon

 balloon.c                                       |  58 +++++-
 hw/virtio/virtio-balloon.c                      | 241 ++++++++++++++++++++++--
 include/hw/virtio/virtio-balloon.h              |  27 ++-
 include/migration/misc.h                        |   2 +
 include/qemu/bitmap.h                           |  13 ++
 include/standard-headers/linux/virtio_balloon.h |   7 +
 include/sysemu/balloon.h                        |  15 +-
 migration/ram.c                                 |  73 ++++++-
 8 files changed, 406 insertions(+), 30 deletions(-)

-- 
1.8.3.1


* [Qemu-devel] [PATCH v7 1/5] bitmap: bitmap_count_one_with_offset
  2018-04-24  6:13 ` [virtio-dev] " Wei Wang
@ 2018-04-24  6:13   ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-04-24  6:13 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

Count the number of 1s in a bitmap starting from an offset.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/qemu/bitmap.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..e3f31f1 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -228,6 +228,19 @@ static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
     }
 }
 
+static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
+                                                long offset, long nbits)
+{
+    long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
+    long redundant_bits = offset - aligned_offset;
+    long bits_to_count = nbits + redundant_bits;
+    const unsigned long *bitmap_start = bitmap +
+                                        aligned_offset / BITS_PER_LONG;
+
+    return bitmap_count_one(bitmap_start, bits_to_count) -
+           bitmap_count_one(bitmap_start, redundant_bits);
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
1.8.3.1


* [Qemu-devel] [PATCH v7 2/5] migration: use bitmap_mutex in migration_bitmap_clear_dirty
  2018-04-24  6:13 ` [virtio-dev] " Wei Wang
@ 2018-04-24  6:13   ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-04-24  6:13 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

The bitmap mutex is used to synchronize threads to update the dirty
bitmap and the migration_dirty_pages counter. This patch makes
migration_bitmap_clear_dirty update the bitmap and counter under the
mutex.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 migration/ram.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 0e90efa..9a72b1a 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -795,11 +795,14 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 {
     bool ret;
 
+    qemu_mutex_lock(&rs->bitmap_mutex);
     ret = test_and_clear_bit(page, rb->bmap);
 
     if (ret) {
         rs->migration_dirty_pages--;
     }
+    qemu_mutex_unlock(&rs->bitmap_mutex);
+
     return ret;
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-04-24  6:13 ` [virtio-dev] " Wei Wang
@ 2018-04-24  6:13   ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-04-24  6:13 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 include/migration/misc.h |  2 ++
 migration/ram.c          | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 4ebf24c..113320e 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 9a72b1a..0147548 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2198,6 +2198,50 @@ static int ram_init_all(RAMState **rsp)
 }
 
 /*
+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the continuous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+    RAMBlock *block;
+    ram_addr_t offset;
+    size_t used_len, start, npages;
+
+    for (; len > 0; len -= used_len) {
+        block = qemu_ram_block_from_host(addr, false, &offset);
+        if (unlikely(!block)) {
+            return;
+        }
+
+        /*
+         * This handles the case that the RAMBlock is resized after the free
+         * page hint is reported.
+         */
+        if (unlikely(offset > block->used_length)) {
+            return;
+        }
+
+        if (len <= block->used_length - offset) {
+            used_len = len;
+        } else {
+            used_len = block->used_length - offset;
+            addr += used_len;
+        }
+
+        start = offset >> TARGET_PAGE_BITS;
+        npages = used_len >> TARGET_PAGE_BITS;
+
+        qemu_mutex_lock(&ram_state->bitmap_mutex);
+        ram_state->migration_dirty_pages -=
+                      bitmap_count_one_with_offset(block->bmap, start, npages);
+        bitmap_clear(block->bmap, start, npages);
+        qemu_mutex_unlock(&ram_state->bitmap_mutex);
+    }
+}
+
+/*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
-- 
1.8.3.1


* [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-04-24  6:13 ` [virtio-dev] " Wei Wang
@ 2018-04-24  6:13   ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-04-24  6:13 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

balloon_free_page_start - start guest free page hint reporting.
balloon_free_page_stop - stop guest free page hint reporting.

Note: balloon will report pages which were free at the time
of this call. As the reporting happens asynchronously, dirty bit logging
must be enabled before this call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

TODO:
- handle the case when page poisoning is in use

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
---
 balloon.c                                       |  58 +++++-
 hw/virtio/virtio-balloon.c                      | 241 ++++++++++++++++++++++--
 include/hw/virtio/virtio-balloon.h              |  27 ++-
 include/standard-headers/linux/virtio_balloon.h |   7 +
 include/sysemu/balloon.h                        |  15 +-
 5 files changed, 319 insertions(+), 29 deletions(-)

diff --git a/balloon.c b/balloon.c
index 6bf0a96..87a0410 100644
--- a/balloon.c
+++ b/balloon.c
@@ -36,6 +36,9 @@
 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
+static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
+static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
+static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
 static void *balloon_opaque;
 static bool balloon_inhibited;
 
@@ -64,19 +67,51 @@ static bool have_balloon(Error **errp)
     return true;
 }
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-                             QEMUBalloonStatus *stat_func, void *opaque)
+bool balloon_free_page_support(void)
 {
-    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
-        /* We're already registered one balloon handler.  How many can
-         * a guest really have?
-         */
-        return -1;
+    return balloon_free_page_support_fn &&
+           balloon_free_page_support_fn(balloon_opaque);
+}
+
+/*
+ * Balloon will report pages which were free at the time of this call. As the
+ * reporting happens asynchronously, dirty bit logging must be enabled before
+ * this call is made.
+ */
+void balloon_free_page_start(void)
+{
+    balloon_free_page_start_fn(balloon_opaque);
+}
+
+/*
+ * Guest reporting must be disabled before the migration dirty bitmap is
+ * synchronized.
+ */
+void balloon_free_page_stop(void)
+{
+    balloon_free_page_stop_fn(balloon_opaque);
+}
+
+void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                              QEMUBalloonStatus *stat_fn,
+                              QEMUBalloonFreePageSupport *free_page_support_fn,
+                              QEMUBalloonFreePageStart *free_page_start_fn,
+                              QEMUBalloonFreePageStop *free_page_stop_fn,
+                              void *opaque)
+{
+    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
+        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
+        balloon_opaque) {
+        /* We already registered one balloon handler. */
+        return;
     }
-    balloon_event_fn = event_func;
-    balloon_stat_fn = stat_func;
+
+    balloon_event_fn = event_fn;
+    balloon_stat_fn = stat_fn;
+    balloon_free_page_support_fn = free_page_support_fn;
+    balloon_free_page_start_fn = free_page_start_fn;
+    balloon_free_page_stop_fn = free_page_stop_fn;
     balloon_opaque = opaque;
-    return 0;
 }
 
 void qemu_remove_balloon_handler(void *opaque)
@@ -86,6 +121,9 @@ void qemu_remove_balloon_handler(void *opaque)
     }
     balloon_event_fn = NULL;
     balloon_stat_fn = NULL;
+    balloon_free_page_support_fn = NULL;
+    balloon_free_page_start_fn = NULL;
+    balloon_free_page_stop_fn = NULL;
     balloon_opaque = NULL;
 }
 
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index f456cea..13bf0db 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -31,6 +31,7 @@
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
+#include "migration/misc.h"
 
 #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
 
@@ -308,6 +309,125 @@ out:
     }
 }
 
+static void virtio_balloon_poll_free_page_hints(void *opaque)
+{
+    VirtQueueElement *elem;
+    VirtIOBalloon *dev = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtQueue *vq = dev->free_page_vq;
+    uint32_t id;
+    size_t size;
+
+    while (1) {
+        qemu_mutex_lock(&dev->free_page_lock);
+        while (dev->block_iothread) {
+            qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
+        }
+
+        /*
+         * If the migration thread actively stops the reporting, exit
+         * immediately.
+         */
+        if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
+            qemu_mutex_unlock(&dev->free_page_lock);
+            break;
+        }
+
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            qemu_mutex_unlock(&dev->free_page_lock);
+            continue;
+        }
+
+        if (elem->out_num) {
+            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
+            virtqueue_push(vq, elem, size);
+            g_free(elem);
+
+            virtio_tswap32s(vdev, &id);
+            if (unlikely(size != sizeof(id))) {
+                virtio_error(vdev, "received an incorrect cmd id");
+                break;
+            }
+            if (id == dev->free_page_report_cmd_id) {
+                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+            } else {
+                /*
+                 * Stop the optimization only when it has started. This
+                 * avoids a stale stop signal from a previous command.
+                 */
+                if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
+                    dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+                    qemu_mutex_unlock(&dev->free_page_lock);
+                    break;
+                }
+            }
+        }
+
+        if (elem->in_num) {
+            /* TODO: send the poison value to the destination */
+            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
+                !dev->poison_val) {
+                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+                                          elem->in_sg[0].iov_len);
+            }
+            virtqueue_push(vq, elem, 0);
+            g_free(elem);
+        }
+        qemu_mutex_unlock(&dev->free_page_lock);
+    }
+    virtio_notify(vdev, vq);
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static void virtio_balloon_free_page_start(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    /* No need to start the optimization during the stop-and-copy phase. */
+    if (!vdev->vm_running) {
+        return;
+    }
+
+    if (s->free_page_report_cmd_id == UINT_MAX) {
+        s->free_page_report_cmd_id =
+                       VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+    } else {
+        s->free_page_report_cmd_id++;
+    }
+
+    s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+    virtio_notify_config(vdev);
+    qemu_bh_schedule(s->free_page_bh);
+}
+
+static void virtio_balloon_free_page_stop(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    if (s->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
+        return;
+    } else {
+        qemu_mutex_lock(&s->free_page_lock);
+        /*
+         * The guest hasn't finished reporting, so the host sends a
+         * notification to the guest to actively stop the reporting.
+         */
+        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+        qemu_mutex_unlock(&s->free_page_lock);
+        virtio_notify_config(vdev);
+    }
+}
+
 static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
@@ -315,6 +435,17 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
+    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+        config.poison_val = cpu_to_le32(dev->poison_val);
+    }
+
+    if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
+        config.free_page_report_cmd_id =
+                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
+    } else {
+        config.free_page_report_cmd_id =
+                       cpu_to_le32(dev->free_page_report_cmd_id);
+    }
 
     trace_virtio_balloon_get_config(config.num_pages, config.actual);
     memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
@@ -368,6 +499,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
                         ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
                         &error_abort);
     }
+    dev->poison_val = le32_to_cpu(config.poison_val);
     trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -377,6 +509,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+
+    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
+        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
+    }
+
     return f;
 }
 
@@ -413,6 +550,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
     return 0;
 }
 
+static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
+    .name = "virtio-balloon-device/free-page-report",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = virtio_balloon_free_page_support,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
+        VMSTATE_UINT32(poison_val, VirtIOBalloon),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_virtio_balloon_device = {
     .name = "virtio-balloon-device",
     .version_id = 1,
@@ -423,30 +572,42 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
         VMSTATE_UINT32(actual, VirtIOBalloon),
         VMSTATE_END_OF_LIST()
     },
+    .subsections = (const VMStateDescription * []) {
+        &vmstate_virtio_balloon_free_page_report,
+        NULL
+    }
 };
 
 static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOBalloon *s = VIRTIO_BALLOON(dev);
-    int ret;
 
     virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
                 sizeof(struct virtio_balloon_config));
 
-    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
-                                   virtio_balloon_stat, s);
-
-    if (ret < 0) {
-        error_setg(errp, "Only one balloon device is supported");
-        virtio_cleanup(vdev);
-        return;
-    }
-
     s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
-
+    if (virtio_has_feature(s->host_features,
+                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
+        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+        s->free_page_report_cmd_id =
+                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
+        if (s->iothread) {
+            object_ref(OBJECT(s->iothread));
+            s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
+                                       virtio_balloon_poll_free_page_hints, s);
+            qemu_mutex_init(&s->free_page_lock);
+            qemu_cond_init(&s->free_page_cond);
+            s->block_iothread = false;
+        } else {
+            /* Simply disable this feature if the iothread wasn't created. */
+            s->host_features &= ~(1 << VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+            virtio_error(vdev, "iothread is missing");
+        }
+    }
     reset_stats(s);
 }
 
@@ -455,6 +616,10 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOBalloon *s = VIRTIO_BALLOON(dev);
 
+    if (virtio_balloon_free_page_support(s)) {
+        qemu_bh_delete(s->free_page_bh);
+        virtio_balloon_free_page_stop(s);
+    }
     balloon_stats_destroy_timer(s);
     qemu_remove_balloon_handler(s);
     virtio_cleanup(vdev);
@@ -464,6 +629,10 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
 
+    if (virtio_balloon_free_page_support(s)) {
+        virtio_balloon_free_page_stop(s);
+    }
+
     if (s->stats_vq_elem != NULL) {
         virtqueue_unpop(s->svq, s->stats_vq_elem, 0);
         g_free(s->stats_vq_elem);
@@ -475,11 +644,47 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
 
-    if (!s->stats_vq_elem && vdev->vm_running &&
-        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
-        /* poll stats queue for the element we have discarded when the VM
-         * was stopped */
-        virtio_balloon_receive_stats(vdev, s->svq);
+    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+        if (!s->stats_vq_elem && vdev->vm_running &&
+            virtqueue_rewind(s->svq, 1)) {
+            /*
+             * Poll stats queue for the element we have discarded when the VM
+             * was stopped.
+             */
+            virtio_balloon_receive_stats(vdev, s->svq);
+        }
+
+        if (virtio_balloon_free_page_support(s)) {
+            qemu_add_balloon_handler(virtio_balloon_to_target,
+                                     virtio_balloon_stat,
+                                     virtio_balloon_free_page_support,
+                                     virtio_balloon_free_page_start,
+                                     virtio_balloon_free_page_stop,
+                                     s);
+        } else {
+            qemu_add_balloon_handler(virtio_balloon_to_target,
+                                     virtio_balloon_stat, NULL, NULL, NULL, s);
+        }
+    }
+
+    if (virtio_balloon_free_page_support(s)) {
+        /*
+         * The VM is woken up and the iothread was blocked, so signal it to
+         * continue.
+         */
+        if (vdev->vm_running && s->block_iothread) {
+            qemu_mutex_lock(&s->free_page_lock);
+            s->block_iothread = false;
+            qemu_cond_signal(&s->free_page_cond);
+            qemu_mutex_unlock(&s->free_page_lock);
+        }
+
+        /* The VM is stopped, block the iothread. */
+        if (!vdev->vm_running) {
+            qemu_mutex_lock(&s->free_page_lock);
+            s->block_iothread = true;
+            qemu_mutex_unlock(&s->free_page_lock);
+        }
     }
 }
 
@@ -509,6 +714,10 @@ static const VMStateDescription vmstate_virtio_balloon = {
 static Property virtio_balloon_properties[] = {
     DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
+    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+    DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
+                     IOThread *),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 1ea13bd..f865832 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -18,11 +18,14 @@
 #include "standard-headers/linux/virtio_balloon.h"
 #include "hw/virtio/virtio.h"
 #include "hw/pci/pci.h"
+#include "sysemu/iothread.h"
 
 #define TYPE_VIRTIO_BALLOON "virtio-balloon-device"
 #define VIRTIO_BALLOON(obj) \
         OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
+
 typedef struct virtio_balloon_stat VirtIOBalloonStat;
 
 typedef struct virtio_balloon_stat_modern {
@@ -31,15 +34,37 @@ typedef struct virtio_balloon_stat_modern {
        uint64_t val;
 } VirtIOBalloonStatModern;
 
+enum virtio_balloon_free_page_report_status {
+    FREE_PAGE_REPORT_S_STOP = 0,
+    FREE_PAGE_REPORT_S_REQUESTED = 1,
+    FREE_PAGE_REPORT_S_START = 2,
+};
+
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq;
+    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;
+    uint32_t free_page_report_cmd_id;
+    uint32_t poison_val;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
     VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
+    IOThread *iothread;
+    QEMUBH *free_page_bh;
+    /*
+     * Lock to synchronize access to the free page reporting related
+     * fields (e.g. free_page_report_status) across threads.
+     */
+    QemuMutex free_page_lock;
+    QemuCond  free_page_cond;
+    /*
+     * Set to block the iothread from continuing to read free page hints
+     * while the VM is stopped.
+     */
+    bool block_iothread;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
     uint32_t host_features;
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 7b0a41b..f89e80f 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -34,15 +34,22 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
 struct virtio_balloon_config {
 	/* Number of pages host wants Guest to give up. */
 	uint32_t num_pages;
 	/* Number of pages we've actually got in balloon. */
 	uint32_t actual;
+	/* Free page report command id, readonly by guest */
+	uint32_t free_page_report_cmd_id;
+	/* Stores PAGE_POISON if page poisoning is in use */
+	uint32_t poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 66543ae..6561a08 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -18,11 +18,22 @@
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
 typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
+typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
+typedef void (QEMUBalloonFreePageStart)(void *opaque);
+typedef void (QEMUBalloonFreePageStop)(void *opaque);
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-			     QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 bool qemu_balloon_is_inhibited(void);
 void qemu_balloon_inhibit(bool state);
+bool balloon_free_page_support(void);
+void balloon_free_page_start(void);
+void balloon_free_page_stop(void);
+
+void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                              QEMUBalloonStatus *stat_fn,
+                              QEMUBalloonFreePageSupport *free_page_support_fn,
+                              QEMUBalloonFreePageStart *free_page_start_fn,
+                              QEMUBalloonFreePageStop *free_page_stop_fn,
+                              void *opaque);
 
 #endif
-- 
1.8.3.1


* [virtio-dev] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
@ 2018-04-24  6:13   ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-04-24  6:13 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

balloon_free_page_start - start guest free page hint reporting.
balloon_free_page_stop - stop guest free page hint reporting.

Note: the balloon device will report pages which were free at the time
of this call. As the reporting happens asynchronously, dirty bit logging
must be enabled before this call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

TODO:
- handle the case when page poisoning is in use

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
---
 balloon.c                                       |  58 +++++-
 hw/virtio/virtio-balloon.c                      | 241 ++++++++++++++++++++++--
 include/hw/virtio/virtio-balloon.h              |  27 ++-
 include/standard-headers/linux/virtio_balloon.h |   7 +
 include/sysemu/balloon.h                        |  15 +-
 5 files changed, 319 insertions(+), 29 deletions(-)

+    uint32_t poison_val;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
     VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
+    IOThread *iothread;
+    QEMUBH *free_page_bh;
+    /*
+     * Lock to synchronize threads' access to the free page reporting
+     * related fields (e.g. free_page_report_status).
+     */
+    QemuMutex free_page_lock;
+    QemuCond  free_page_cond;
+    /*
+     * Set to block the iothread from reading more free page hints while
+     * the VM is stopped.
+     */
+    bool block_iothread;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
     uint32_t host_features;
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 7b0a41b..f89e80f 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -34,15 +34,22 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
 struct virtio_balloon_config {
 	/* Number of pages host wants Guest to give up. */
 	uint32_t num_pages;
 	/* Number of pages we've actually got in balloon. */
 	uint32_t actual;
+	/* Free page report command id, readonly by guest */
+	uint32_t free_page_report_cmd_id;
+	/* Stores PAGE_POISON if page poisoning is in use */
+	uint32_t poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 66543ae..6561a08 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -18,11 +18,22 @@
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
 typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
+typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
+typedef void (QEMUBalloonFreePageStart)(void *opaque);
+typedef void (QEMUBalloonFreePageStop)(void *opaque);
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-			     QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 bool qemu_balloon_is_inhibited(void);
 void qemu_balloon_inhibit(bool state);
+bool balloon_free_page_support(void);
+void balloon_free_page_start(void);
+void balloon_free_page_stop(void);
+
+void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                              QEMUBalloonStatus *stat_fn,
+                              QEMUBalloonFreePageSupport *free_page_support_fn,
+                              QEMUBalloonFreePageStart *free_page_start_fn,
+                              QEMUBalloonFreePageStop *free_page_stop_fn,
+                              void *opaque);
 
 #endif
-- 
1.8.3.1
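The block_iothread handshake in virtio_balloon_set_status() above is a classic condition-variable pattern: the hint-polling side waits while the VM is stopped, and the status callback wakes it when the VM resumes. A standalone sketch, using plain pthreads instead of QEMU's QemuMutex/QemuCond and with hypothetical helper names, might look like this:

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for the device's free_page_lock / free_page_cond /
 * block_iothread fields; the names mirror, but are not, QEMU's. */
static pthread_mutex_t free_page_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  free_page_cond = PTHREAD_COND_INITIALIZER;
static bool block_iothread;

/* Called by the hint-polling side before reading more hints:
 * sleeps until the VM is running again. */
static void wait_while_blocked(void)
{
    pthread_mutex_lock(&free_page_lock);
    while (block_iothread) {
        pthread_cond_wait(&free_page_cond, &free_page_lock);
    }
    pthread_mutex_unlock(&free_page_lock);
}

/* Mirrors the two vm_running branches in virtio_balloon_set_status():
 * block the poller when the VM stops, signal it when the VM resumes. */
static void vm_state_change(bool vm_running)
{
    pthread_mutex_lock(&free_page_lock);
    block_iothread = !vm_running;
    if (vm_running) {
        pthread_cond_signal(&free_page_cond);
    }
    pthread_mutex_unlock(&free_page_lock);
}
```

The flag is always written under the mutex, and the waiter re-checks it in a while loop, so a spurious wakeup or a stop/start race cannot let the poller run while the VM is paused.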


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [Qemu-devel] [PATCH v7 5/5] migration: use the free page hint feature from balloon
  2018-04-24  6:13 ` [virtio-dev] " Wei Wang
@ 2018-04-24  6:13   ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-04-24  6:13 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

Start the free page optimization after the migration bitmap is
synchronized. This can't be used in the stop&copy phase since the guest
is paused. Make sure the guest reporting has stopped before
synchronizing the migration dirty bitmap. Currently, the optimization is
added to precopy only.
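The effect described above — the device turning a guest free page hint into cleared dirty-bitmap bits, deducting only pages that were actually dirty — can be sketched roughly as follows. This is an illustrative toy, not QEMU's actual qemu_guest_free_page_hint(); the fixed-size bitmap and field names are assumptions for the example:

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))
#define PAGE_SHIFT 12

/* Toy dirty bitmap and dirty-page counter standing in for the
 * migration state's real fields. */
static unsigned long dirty_bitmap[4];
static uint64_t migration_dirty_pages;

static int test_and_clear_bit(unsigned long *map, size_t nr)
{
    unsigned long mask = 1UL << (nr % BITS_PER_LONG);
    int was_set = !!(map[nr / BITS_PER_LONG] & mask);

    map[nr / BITS_PER_LONG] &= ~mask;
    return was_set;
}

/* Clear the bits for a hinted [addr, addr + len) range and deduct
 * only pages that were actually set, so pages already clear (possible
 * from the second sync round onward) are not double-counted. */
static void guest_free_page_hint(uint64_t addr, uint64_t len)
{
    size_t nr;

    for (nr = addr >> PAGE_SHIFT; nr < (addr + len) >> PAGE_SHIFT; nr++) {
        if (test_and_clear_bit(dirty_bitmap, nr)) {
            migration_dirty_pages--;
        }
    }
}
```

Because the deduction is tied to test_and_clear_bit(), calling the hint twice for the same range leaves the counter unchanged the second time.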

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 migration/ram.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 0147548..1d85ffa 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -51,6 +51,7 @@
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
 #include "migration/block.h"
+#include "sysemu/balloon.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -213,6 +214,8 @@ struct RAMState {
     uint32_t last_version;
     /* We are in the first round */
     bool ram_bulk_stage;
+    /* The free pages optimization feature is supported */
+    bool free_page_support;
     /* How many times we have dirty too many pages */
     int dirty_rate_high_cnt;
     /* these variables are used for bitmap sync */
@@ -841,6 +844,10 @@ static void migration_bitmap_sync(RAMState *rs)
     int64_t end_time;
     uint64_t bytes_xfer_now;
 
+    if (rs->free_page_support) {
+        balloon_free_page_stop();
+    }
+
     ram_counters.dirty_sync_count++;
 
     if (!rs->time_last_bitmap_sync) {
@@ -907,6 +914,10 @@ static void migration_bitmap_sync(RAMState *rs)
     if (migrate_use_events()) {
         qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
     }
+
+    if (rs->free_page_support) {
+        balloon_free_page_start();
+    }
 }
 
 /**
@@ -1663,7 +1674,17 @@ static void ram_state_reset(RAMState *rs)
     rs->last_sent_block = NULL;
     rs->last_page = 0;
     rs->last_version = ram_list.version;
-    rs->ram_bulk_stage = true;
+    rs->free_page_support = balloon_free_page_support() && !migrate_postcopy();
+    if (rs->free_page_support) {
+        /*
+         * When the free page optimization is used, not all the pages are
+         * treated as dirty pages (via migration_bitmap_find_dirty), which need
+         * to be sent. So disable ram_bulk_stage in this case.
+         */
+        rs->ram_bulk_stage = false;
+    } else {
+        rs->ram_bulk_stage = true;
+    }
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -2369,6 +2390,9 @@ out:
 
     ret = qemu_file_get_error(f);
     if (ret < 0) {
+        if (rs->free_page_support) {
+            balloon_free_page_stop();
+        }
         return ret;
     }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-04-24  6:13 ` [virtio-dev] " Wei Wang
@ 2018-04-24  6:42   ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-04-24  6:42 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 04/24/2018 02:13 PM, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> receives the guest free page hints from the driver and clears the
> corresponding bits in the dirty bitmap, so that those free pages are
> not transferred by the migration thread to the destination.
>
> - Test Environment
>      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>      Guest: 8G RAM, 4 vCPU
>      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>
> - Test Results
>      - Idle Guest Live Migration Time (results are averaged over 10 runs):
>          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
>      - Guest with Linux Compilation Workload (make bzImage -j4):
>          - Live Migration Time (average)
>            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
>          - Linux Compilation Time
>            Optimization v.s. Legacy = 4min56s v.s. 5min3s
>            --> no obvious difference
>
> - Source Code
>      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
>      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
>
> ChangeLog:
> v6->v7:
>        virtio-balloon/virtio_balloon_poll_free_page_hints:
>            - add virtio_notify() at the end to notify the driver that
>              the optimization is done, which indicates that the entries
>              have all been put back to the vq and ready to detach them.

Hi Dave,

Thanks for reviewing this patch series. Do you have more comments on
them? If not, would it be possible to get your reviewed-by?
The kernel part is already done. Hopefully we can finish the QEMU
part soon and have people start using this feature. Thanks.

Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-04-24  6:13 ` [virtio-dev] " Wei Wang
@ 2018-05-14  1:22   ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-05-14  1:22 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 04/24/2018 02:13 PM, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> receives the guest free page hints from the driver and clears the
> corresponding bits in the dirty bitmap, so that those free pages are
> not transferred by the migration thread to the destination.
>
> - Test Environment
>      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>      Guest: 8G RAM, 4 vCPU
>      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>
> - Test Results
>      - Idle Guest Live Migration Time (results are averaged over 10 runs):
>          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
>      - Guest with Linux Compilation Workload (make bzImage -j4):
>          - Live Migration Time (average)
>            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
>          - Linux Compilation Time
>            Optimization v.s. Legacy = 4min56s v.s. 5min3s
>            --> no obvious difference
>
> - Source Code
>      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
>      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
>
> ChangeLog:
> v6->v7:
>        virtio-balloon/virtio_balloon_poll_free_page_hints:
>            - add virtio_notify() at the end to notify the driver that
>              the optimization is done, which indicates that the entries
>              have all been put back to the vq and ready to detach them.
> v5->v6:
>        virtio-balloon: use iothread to get free page hint
> v4->v5:
>      1) migration:
>          - bitmap_clear_dirty: update the dirty bitmap and dirty page
>            count under the bitmap mutex as what other functions are doing;
>          - qemu_guest_free_page_hint:
>              - add comments for this function;
>              - check the !block case;
>              - check "offset > block->used_length" before proceed;
>              - assign used_len inside the for{} body;
>              - update the dirty bitmap and dirty page counter under the
>                bitmap mutex;
>          - ram_state_reset:
>              - rs->free_page_support: && with use "migrate_postcopy"
>                instead of migration_in_postcopy;
>              - clear the ram_bulk_stage flag if free_page_support is true;
>      2) balloon:
>           - add the usage documentation of balloon_free_page_start and
>             balloon_free_page_stop in code;
>           - the optimization thread is named "balloon_fpo" to meet the
>             requirement of "less than 14 characters";
>           - virtio_balloon_poll_free_page_hints:
>               - run on condition when runstate_is_running() is true;
>               - add a qemu spin lock to synchronize accesses to the free
>                 page reporting related fields shared among the migration
>                 thread and the optimization thread;
>            - virtio_balloon_free_page_start: just return if
>              runstate_is_running is false;
>            - virtio_balloon_free_page_stop: access to the free page
>              reporting related fields under a qemu spin lock;
>            - virtio_balloon_device_unrealize/reset: call
>              virtio_balloon_free_page_stop if the free page hint feature is
>              used;
>            - virtio_balloon_set_status: call virtio_balloon_free_page_stop
>              in case the guest is stopped by qmp when the optimization is
>              running;
> v3->v4:
>      1) bitmap: add a new API to count 1s starting from an offset of a
>         bitmap
>      2) migration:
>          - qemu_guest_free_page_hint: calculate
>            ram_state->migration_dirty_pages by counting how many bits of
>            free pages are truly cleared. If some of the bits were
>            already 0, they shouldn't be deducted by
>            ram_state->migration_dirty_pages. This wasn't needed for
>            previous versions since we optimized bulk stage only,
>            where all bits are guaranteed to be set. It's needed now
>            because we extended the usage of this optimization to all stages
>            except the last stop&copy stage. From 2nd stage onward, there
>            are possibilities that some bits of free pages are already 0.
>       3) virtio-balloon:
>           - virtio_balloon_free_page_report_status: introduce a new status,
>             FREE_PAGE_REPORT_S_EXIT. This status indicates that the
>             optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
>             the reporting is stopped, but the optimization thread still needs
>             to be joined by the migration thread.
> v2->v3:
>      1) virtio-balloon
>          - virtio_balloon_free_page_start: poll the hints using a new
>            thread;
>          - use cmd id between [0x80000000, UINT_MAX];
>          - virtio_balloon_poll_free_page_hints:
>              - stop the optimization only when it has started;
>              - don't skip free pages when !poison_val;
>          - add poison_val to vmsd to migrate;
>          - virtio_balloon_get_features: add the F_PAGE_POISON feature when
>            host has F_FREE_PAGE_HINT;
>          - remove the timer patch which is not needed now.
>      2) migration
>         - new api, qemu_guest_free_page_hint;
>         - rs->free_page_support set only in the precopy case;
>         - use the new balloon APIs.
> v1->v2:
>      1) virtio-balloon
>          - use subsections to save free_page_report_cmd_id;
>          - poll the free page vq after sending a cmd id to the driver;
>          - change the free page vq size to VIRTQUEUE_MAX_SIZE;
>          - virtio_balloon_poll_free_page_hints: handle the corner case
>            that the free page block reported from the driver may cross
>            the RAMBlock boundary.
>      2) migration/ram.c
>          - use balloon_free_page_poll to start the optimization
>
>
> Wei Wang (5):
>    bitmap: bitmap_count_one_with_offset
>    migration: use bitmap_mutex in migration_bitmap_clear_dirty
>    migration: API to clear bits of guest free pages from the dirty bitmap
>    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>    migration: use the free page hint feature from balloon
>
>   balloon.c                                       |  58 +++++-
>   hw/virtio/virtio-balloon.c                      | 241 ++++++++++++++++++++++--
>   include/hw/virtio/virtio-balloon.h              |  27 ++-
>   include/migration/misc.h                        |   2 +
>   include/qemu/bitmap.h                           |  13 ++
>   include/standard-headers/linux/virtio_balloon.h |   7 +
>   include/sysemu/balloon.h                        |  15 +-
>   migration/ram.c                                 |  73 ++++++-
>   8 files changed, 406 insertions(+), 30 deletions(-)
>
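The bitmap helper mentioned in the v3->v4 notes above ("count 1s starting from an offset of a bitmap") can be sketched naively as follows. QEMU's real bitmap_count_one_with_offset() works a word at a time for speed, but the contract shown here — count set bits in [offset, offset + nbits) — is the same; this bit-by-bit version is only for illustration:

```c
#include <stddef.h>

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Naive reference implementation: walk each bit in the requested
 * window and count the ones that are set. */
static long bitmap_count_one_with_offset(const unsigned long *map,
                                         long offset, long nbits)
{
    long i, count = 0;

    for (i = offset; i < offset + nbits; i++) {
        if (map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) {
            count++;
        }
    }
    return count;
}
```

In the series, this is what lets qemu_guest_free_page_hint() deduct migration_dirty_pages by the number of bits actually cleared rather than by the raw length of the hinted range.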

Ping for comments, thanks.

Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-04-24  6:13 ` [virtio-dev] " Wei Wang
                   ` (7 preceding siblings ...)
  (?)
@ 2018-05-29 15:00 ` Hailiang Zhang
  2018-05-29 15:24     ` [virtio-dev] " Michael S. Tsirkin
  -1 siblings, 1 reply; 93+ messages in thread
From: Hailiang Zhang @ 2018-05-29 15:00 UTC (permalink / raw)
  To: Wei Wang, qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: yang.zhang.wz, quan.xu0, liliang.opensource, pbonzini, nilal

On 2018/4/24 14:13, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> receives the guest free page hints from the driver and clears the
> corresponding bits in the dirty bitmap, so that those free pages are
> not transferred by the migration thread to the destination.
>
> - Test Environment
>      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>      Guest: 8G RAM, 4 vCPU
>      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>
> - Test Results
>      - Idle Guest Live Migration Time (results are averaged over 10 runs):
>          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
>      - Guest with Linux Compilation Workload (make bzImage -j4):
>          - Live Migration Time (average)
>            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
>          - Linux Compilation Time
>            Optimization v.s. Legacy = 4min56s v.s. 5min3s
>            --> no obvious difference
>
> - Source Code
>      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
>      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
>
> ChangeLog:
> v6->v7:
>        virtio-balloon/virtio_balloon_poll_free_page_hints:
>            - add virtio_notify() at the end to notify the driver that
>              the optimization is done, which indicates that the entries
>              have all been put back to the vq and are ready to be detached.
> v5->v6:
>        virtio-balloon: use iothread to get free page hint
> v4->v5:
>      1) migration:
>          - bitmap_clear_dirty: update the dirty bitmap and dirty page
>            count under the bitmap mutex, as other functions do;
>          - qemu_guest_free_page_hint:
>              - add comments for this function;
>              - check the !block case;
>              - check "offset > block->used_length" before proceeding;
>              - assign used_len inside the for{} body;
>              - update the dirty bitmap and dirty page counter under the
>                bitmap mutex;
>          - ram_state_reset:
>              - rs->free_page_support: && with "migrate_postcopy"
>                instead of migration_in_postcopy;
>              - clear the ram_bulk_stage flag if free_page_support is true;
>      2) balloon:
>           - add the usage documentation of balloon_free_page_start and
>             balloon_free_page_stop in code;
>           - the optimization thread is named "balloon_fpo" to meet the
>             requirement of "less than 14 characters";
>           - virtio_balloon_poll_free_page_hints:
>               - run only while runstate_is_running() is true;
>               - add a qemu spin lock to synchronize accesses to the free
>                 page reporting related fields shared among the migration
>                 thread and the optimization thread;
>            - virtio_balloon_free_page_start: just return if
>              runstate_is_running is false;
>            - virtio_balloon_free_page_stop: access to the free page
>              reporting related fields under a qemu spin lock;
>            - virtio_balloon_device_unrealize/reset: call
>              virtio_balloon_free_page_stop if the free page hint feature is
>              used;
>            - virtio_balloon_set_status: call virtio_balloon_free_page_stop
>              in case the guest is stopped by qmp when the optimization is
>              running;
> v3->v4:
>      1) bitmap: add a new API to count 1s starting from an offset of a
>         bitmap
>      2) migration:
>          - qemu_guest_free_page_hint: calculate
>            ram_state->migration_dirty_pages by counting how many bits of
>            free pages are truly cleared. If some of the bits were
>            already 0, they shouldn't be deducted by
>            ram_state->migration_dirty_pages. This wasn't needed for
>            previous versions since we optimized bulk stage only,
>            where all bits are guaranteed to be set. It's needed now
>            because we extended the usage of this optimization to all stages
>            except the last stop&copy stage. From 2nd stage onward, there
>            are possibilities that some bits of free pages are already 0.
>       3) virtio-balloon:
>           - virtio_balloon_free_page_report_status: introduce a new status,
>             FREE_PAGE_REPORT_S_EXIT. This status indicates that the
>             optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
>             the reporting is stopped, but the optimization thread still needs
>             to be joined by the migration thread.
> v2->v3:
>      1) virtio-balloon
>          - virtio_balloon_free_page_start: poll the hints using a new
>            thread;
>          - use cmd id between [0x80000000, UINT_MAX];
>          - virtio_balloon_poll_free_page_hints:
>              - stop the optimization only when it has started;
>              - don't skip free pages when !poison_val;
>          - add poison_val to vmsd to migrate;
>          - virtio_balloon_get_features: add the F_PAGE_POISON feature when
>            host has F_FREE_PAGE_HINT;
>          - remove the timer patch which is not needed now.
>      2) migration
>         - new api, qemu_guest_free_page_hint;
>         - rs->free_page_support set only in the precopy case;
>         - use the new balloon APIs.
> v1->v2:
>      1) virtio-balloon
>          - use subsections to save free_page_report_cmd_id;
>          - poll the free page vq after sending a cmd id to the driver;
>          - change the free page vq size to VIRTQUEUE_MAX_SIZE;
>          - virtio_balloon_poll_free_page_hints: handle the corner case
>            that the free page block reported from the driver may cross
>            the RAMBlock boundary.
>      2) migration/ram.c
>          - use balloon_free_page_poll to start the optimization
>
>
> Wei Wang (5):
>    bitmap: bitmap_count_one_with_offset
>    migration: use bitmap_mutex in migration_bitmap_clear_dirty
>    migration: API to clear bits of guest free pages from the dirty bitmap
>    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>    migration: use the free page hint feature from balloon
>
>   balloon.c                                       |  58 +++++-
>   hw/virtio/virtio-balloon.c                      | 241 ++++++++++++++++++++++--
>   include/hw/virtio/virtio-balloon.h              |  27 ++-
>   include/migration/misc.h                        |   2 +
>   include/qemu/bitmap.h                           |  13 ++
>   include/standard-headers/linux/virtio_balloon.h |   7 +
>   include/sysemu/balloon.h                        |  15 +-
>   migration/ram.c                                 |  73 ++++++-
>   8 files changed, 406 insertions(+), 30 deletions(-)

Nice optimization. In the first stage of the current migration method, we need to migrate all the pages
of the VM to the destination; with this capability, we can avoid migrating lots of unnecessary pages.

Just a small piece of advice: it would be better to split the fourth patch into smaller ones, to make it
easier to review. Besides, should we make this capability optional, just like other migration capabilities?

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-04-24  6:13   ` [virtio-dev] " Wei Wang
@ 2018-05-29 15:24     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-05-29 15:24 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
> The new feature enables the virtio-balloon device to receive hints of
> guest free pages from the free page vq.
> 
> balloon_free_page_start - start guest free page hint reporting.
> balloon_free_page_stop - stop guest free page hint reporting.
> 
> Note: balloon will report pages which were free at the time
> of this call. As the reporting happens asynchronously, dirty bit logging
> must be enabled before this call is made. Guest reporting must be
> disabled before the migration dirty bitmap is synchronized.
> 
> TODO:
> - handle the case when page poisoning is in use
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> ---
>  balloon.c                                       |  58 +++++-
>  hw/virtio/virtio-balloon.c                      | 241 ++++++++++++++++++++++--
>  include/hw/virtio/virtio-balloon.h              |  27 ++-
>  include/standard-headers/linux/virtio_balloon.h |   7 +
>  include/sysemu/balloon.h                        |  15 +-
>  5 files changed, 319 insertions(+), 29 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index 6bf0a96..87a0410 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -36,6 +36,9 @@
>  
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
> +static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
> +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
>  
> @@ -64,19 +67,51 @@ static bool have_balloon(Error **errp)
>      return true;
>  }
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque)
> +bool balloon_free_page_support(void)
>  {
> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> -        /* We're already registered one balloon handler.  How many can
> -         * a guest really have?
> -         */
> -        return -1;
> +    return balloon_free_page_support_fn &&
> +           balloon_free_page_support_fn(balloon_opaque);
> +}
> +
> +/*
> + * Balloon will report pages which were free at the time of this call. As the
> + * reporting happens asynchronously, dirty bit logging must be enabled before
> + * this call is made.
> + */
> +void balloon_free_page_start(void)
> +{
> +    balloon_free_page_start_fn(balloon_opaque);
> +}

Please create notifier support, not a single global.
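
The suggestion is to replace the single set of global function pointers with a list of registered notifiers, so more than one device (or none) can participate without the "only one handler" restriction. Below is a minimal standalone sketch of that shape; all names (FreePageNotifier, free_page_notifier_register, and so on) are invented for illustration and are not QEMU's actual API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notifier-list sketch: devices register an entry instead
 * of installing one global callback, and all entries are invoked. */
typedef struct FreePageNotifier FreePageNotifier;
struct FreePageNotifier {
    void (*start)(void *opaque);   /* called when hinting should begin */
    void (*stop)(void *opaque);    /* called before the bitmap sync */
    void *opaque;
    FreePageNotifier *next;
};

static FreePageNotifier *free_page_notifiers;

/* Register a device's callbacks; several devices can coexist. */
static void free_page_notifier_register(FreePageNotifier *n)
{
    n->next = free_page_notifiers;
    free_page_notifiers = n;
}

static void free_page_notify_start(void)
{
    for (FreePageNotifier *n = free_page_notifiers; n; n = n->next) {
        if (n->start) {
            n->start(n->opaque);
        }
    }
}

static void free_page_notify_stop(void)
{
    for (FreePageNotifier *n = free_page_notifiers; n; n = n->next) {
        if (n->stop) {
            n->stop(n->opaque);
        }
    }
}

/* Tiny demo callback: counts invocations in the int it is given. */
static void count_call(void *opaque)
{
    ++*(int *)opaque;
}
```

QEMU already ships a generic NotifierList (include/qemu/notify.h) that could back an interface like this; the sketch only shows why a list removes the one-global-handler limitation.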


> +
> +/*
> + * Guest reporting must be disabled before the migration dirty bitmap is
> + * synchronized.
> + */
> +void balloon_free_page_stop(void)
> +{
> +    balloon_free_page_stop_fn(balloon_opaque);
> +}
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque)
> +{
> +    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
> +        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
> +        balloon_opaque) {
> +        /* We already registered one balloon handler. */
> +        return;
>      }
> -    balloon_event_fn = event_func;
> -    balloon_stat_fn = stat_func;
> +
> +    balloon_event_fn = event_fn;
> +    balloon_stat_fn = stat_fn;
> +    balloon_free_page_support_fn = free_page_support_fn;
> +    balloon_free_page_start_fn = free_page_start_fn;
> +    balloon_free_page_stop_fn = free_page_stop_fn;
>      balloon_opaque = opaque;
> -    return 0;
>  }
>  
>  void qemu_remove_balloon_handler(void *opaque)
> @@ -86,6 +121,9 @@ void qemu_remove_balloon_handler(void *opaque)
>      }
>      balloon_event_fn = NULL;
>      balloon_stat_fn = NULL;
> +    balloon_free_page_support_fn = NULL;
> +    balloon_free_page_start_fn = NULL;
> +    balloon_free_page_stop_fn = NULL;
>      balloon_opaque = NULL;
>  }
>  
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index f456cea..13bf0db 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -31,6 +31,7 @@
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "migration/misc.h"
>  
>  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
>  
> @@ -308,6 +309,125 @@ out:
>      }
>  }
>  
> +static void virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +    VirtQueueElement *elem;
> +    VirtIOBalloon *dev = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +    VirtQueue *vq = dev->free_page_vq;
> +    uint32_t id;
> +    size_t size;
> +
> +    while (1) {
> +        qemu_mutex_lock(&dev->free_page_lock);
> +        while (dev->block_iothread) {
> +            qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
> +        }
> +
> +        /*
> +         * If the migration thread actively stops the reporting, exit
> +         * immediately.
> +         */
> +        if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {

Please refactor this : move loop body into a function so
you can do lock/unlock in a single place.
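
The requested refactor can be sketched independently of QEMU: move the per-iteration work into a helper that runs with the lock held and returns whether to keep polling, so the lock and unlock each appear at exactly one site. Everything below (ToyLock, PollState, the drained-queue stop condition) is illustrative, not the real virtqueue code:

```c
#include <assert.h>
#include <stdbool.h>

enum { REPORT_STOP = 0, REPORT_START = 2 };

/* Stand-in for a mutex so the sketch stays self-contained. */
typedef struct { int held; } ToyLock;
static void toy_lock(ToyLock *l)   { assert(!l->held); l->held = 1; }
static void toy_unlock(ToyLock *l) { assert(l->held);  l->held = 0; }

typedef struct {
    ToyLock lock;
    int status;        /* REPORT_STOP or REPORT_START */
    int hints[8];      /* stand-in for the free page vq */
    int head, tail;
    int processed;     /* sum of consumed hints, for the demo */
} PollState;

/* One loop iteration; must be called with s->lock held.
 * Returns false when polling should end. In the real device an empty
 * queue means "wait for the driver"; the sketch just drains and stops. */
static bool poll_one_hint(PollState *s)
{
    if (s->status == REPORT_STOP || s->head == s->tail) {
        return false;
    }
    s->processed += s->hints[s->head++];
    return true;
}

static void poll_free_page_hints(PollState *s)
{
    bool keep_going = true;

    while (keep_going) {
        toy_lock(&s->lock);          /* the only lock site */
        keep_going = poll_one_hint(s);
        toy_unlock(&s->lock);        /* the only unlock site */
    }
}
```

With the body factored out, none of the early-exit paths needs its own unlock, which is exactly what the review is after.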

> +            qemu_mutex_unlock(&dev->free_page_lock);
> +            break;
> +        }
> +
> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +        if (!elem) {
> +            qemu_mutex_unlock(&dev->free_page_lock);
> +            continue;
> +        }
> +
> +        if (elem->out_num) {
> +            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
> +            virtqueue_push(vq, elem, size);
> +            g_free(elem);
> +
> +            virtio_tswap32s(vdev, &id);
> +            if (unlikely(size != sizeof(id))) {
> +                virtio_error(vdev, "received an incorrect cmd id");
> +                break;
> +            }
> +            if (id == dev->free_page_report_cmd_id) {
> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> +            } else {
> +                /*
> +                 * Stop the optimization only when it has started. This
> +                 * avoids a stale stop sign for the previous command.
> +                 */
> +                if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
> +                    dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +                    qemu_mutex_unlock(&dev->free_page_lock);
> +                    break;
> +                }
> +            }
> +        }
> +
> +        if (elem->in_num) {
> +            /* TODO: send the poison value to the destination */
> +            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
> +                !dev->poison_val) {
> +                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
> +                                          elem->in_sg[0].iov_len);
> +            }
> +            virtqueue_push(vq, elem, 0);
> +            g_free(elem);
> +        }
> +        qemu_mutex_unlock(&dev->free_page_lock);
> +    }
> +    virtio_notify(vdev, vq);
> +}
> +
> +static bool virtio_balloon_free_page_support(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);

or if poison is negotiated.

> +}
> +
> +static void virtio_balloon_free_page_start(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    /* For the stop and copy phase, we don't need to start the optimization */
> +    if (!vdev->vm_running) {
> +        return;
> +    }
> +
> +    if (s->free_page_report_cmd_id == UINT_MAX) {
> +        s->free_page_report_cmd_id =
> +                       VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> +    } else {
> +        s->free_page_report_cmd_id++;
> +    }
> +
> +    s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
> +    virtio_notify_config(vdev);
> +    qemu_bh_schedule(s->free_page_bh);
> +}
> +
> +static void virtio_balloon_free_page_stop(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    if (s->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
> +        return;

Please just reverse the logic.
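
"Reverse the logic" here means inverting the condition so the early return handles the already-stopped case and the else branch disappears. A toy rendering of the resulting shape (ToyBalloon and config_notifies are invented for the sketch; the lock is elided):

```c
#include <assert.h>

enum {
    FREE_PAGE_REPORT_S_STOP = 0,
    FREE_PAGE_REPORT_S_REQUESTED = 1,
    FREE_PAGE_REPORT_S_START = 2,
};

typedef struct {
    int free_page_report_status;
    int config_notifies;   /* stands in for virtio_notify_config() */
} ToyBalloon;

static void free_page_stop(ToyBalloon *s)
{
    if (s->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
        return;                       /* already stopped: nothing to do */
    }
    /* Guest hasn't finished reporting; ask it to stop actively. */
    s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
    s->config_notifies++;             /* notify the guest of the stop id */
}
```

The second call below is a no-op, which is the behavior the original else-less version should preserve.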

> +    } else {
> +        qemu_mutex_lock(&s->free_page_lock);
> +        /*
> +         * The guest hasn't done the reporting, so host sends a notification
> +         * to the guest to actively stop the reporting.
> +         */
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        qemu_mutex_unlock(&s->free_page_lock);
> +        virtio_notify_config(vdev);
> +    }
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -315,6 +435,17 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
> +        config.poison_val = cpu_to_le32(dev->poison_val);
> +    }
> +
> +    if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> +    } else {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(dev->free_page_report_cmd_id);
> +    }
>  
>      trace_virtio_balloon_get_config(config.num_pages, config.actual);
>      memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
> @@ -368,6 +499,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
>                          &error_abort);
>      }
> +    dev->poison_val = le32_to_cpu(config.poison_val);
>      trace_virtio_balloon_set_config(dev->actual, oldactual);
>  }
>  
> @@ -377,6 +509,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +
> +    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> +    }
> +
>      return f;
>  }
>  
> @@ -413,6 +550,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
>      return 0;
>  }
>  
> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> +    .name = "virtio-balloon-device/free-page-report",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = virtio_balloon_free_page_support,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_virtio_balloon_device = {
>      .name = "virtio-balloon-device",
>      .version_id = 1,
> @@ -423,30 +572,42 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>          VMSTATE_UINT32(actual, VirtIOBalloon),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription * []) {
> +        &vmstate_virtio_balloon_free_page_report,
> +        NULL
> +    }
>  };
>  
>  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> -    int ret;
>  
>      virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
>                  sizeof(struct virtio_balloon_config));
>  
> -    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> -
> -    if (ret < 0) {
> -        error_setg(errp, "Only one balloon device is supported");
> -        virtio_cleanup(vdev);
> -        return;
> -    }
> -
>      s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> -
> +    if (virtio_has_feature(s->host_features,
> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        s->free_page_report_cmd_id =
> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
> +        if (s->iothread) {
> +            object_ref(OBJECT(s->iothread));
> +            s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
> +                                       virtio_balloon_poll_free_page_hints, s);
> +            qemu_mutex_init(&s->free_page_lock);
> +            qemu_cond_init(&s->free_page_cond);
> +            s->block_iothread = false;
> +        } else {
> +            /* Simply disable this feature if the iothread wasn't created. */
> +            s->host_features &= ~(1 << VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> +            virtio_error(vdev, "iothread is missing");
> +        }
> +    }
>      reset_stats(s);
>  }
>  
> @@ -455,6 +616,10 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>  
> +    if (virtio_balloon_free_page_support(s)) {
> +        qemu_bh_delete(s->free_page_bh);
> +        virtio_balloon_free_page_stop(s);
> +    }
>      balloon_stats_destroy_timer(s);
>      qemu_remove_balloon_handler(s);
>      virtio_cleanup(vdev);
> @@ -464,6 +629,10 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> +    if (virtio_balloon_free_page_support(s)) {
> +        virtio_balloon_free_page_stop(s);
> +    }
> +
>      if (s->stats_vq_elem != NULL) {
>          virtqueue_unpop(s->svq, s->stats_vq_elem, 0);
>          g_free(s->stats_vq_elem);
> @@ -475,11 +644,47 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> -    if (!s->stats_vq_elem && vdev->vm_running &&
> -        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
> -        /* poll stats queue for the element we have discarded when the VM
> -         * was stopped */
> -        virtio_balloon_receive_stats(vdev, s->svq);
> +    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +        if (!s->stats_vq_elem && vdev->vm_running &&
> +            virtqueue_rewind(s->svq, 1)) {
> +            /*
> +             * Poll stats queue for the element we have discarded when the VM
> +             * was stopped.
> +             */
> +            virtio_balloon_receive_stats(vdev, s->svq);
> +        }
> +
> +        if (virtio_balloon_free_page_support(s)) {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat,
> +                                     virtio_balloon_free_page_support,
> +                                     virtio_balloon_free_page_start,
> +                                     virtio_balloon_free_page_stop,
> +                                     s);
> +        } else {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat, NULL, NULL, NULL, s);
> +        }
> +    }
> +
> +    if (virtio_balloon_free_page_support(s)) {
> +        /*
> +         * The VM is woken up and the iothread was blocked, so signal it to
> +         * continue.
> +         */
> +        if (vdev->vm_running && s->block_iothread) {
> +            qemu_mutex_lock(&s->free_page_lock);
> +            s->block_iothread = false;
> +            qemu_cond_signal(&s->free_page_cond);
> +            qemu_mutex_unlock(&s->free_page_lock);
> +        }
> +
> +        /* The VM is stopped, block the iothread. */
> +        if (!vdev->vm_running) {
> +            qemu_mutex_lock(&s->free_page_lock);
> +            s->block_iothread = true;
> +            qemu_mutex_unlock(&s->free_page_lock);
> +        }
>      }
>  }
>  
> @@ -509,6 +714,10 @@ static const VMStateDescription vmstate_virtio_balloon = {
>  static Property virtio_balloon_properties[] = {
>      DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> +    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> +    DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
> +                     IOThread *),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 1ea13bd..f865832 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -18,11 +18,14 @@
>  #include "standard-headers/linux/virtio_balloon.h"
>  #include "hw/virtio/virtio.h"
>  #include "hw/pci/pci.h"
> +#include "sysemu/iothread.h"
>  
>  #define TYPE_VIRTIO_BALLOON "virtio-balloon-device"
>  #define VIRTIO_BALLOON(obj) \
>          OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
> +
>  typedef struct virtio_balloon_stat VirtIOBalloonStat;
>  
>  typedef struct virtio_balloon_stat_modern {
> @@ -31,15 +34,37 @@ typedef struct virtio_balloon_stat_modern {
>         uint64_t val;
>  } VirtIOBalloonStatModern;
>  
> +enum virtio_balloon_free_page_report_status {
> +    FREE_PAGE_REPORT_S_STOP = 0,
> +    FREE_PAGE_REPORT_S_REQUESTED = 1,
> +    FREE_PAGE_REPORT_S_START = 2,
> +};
> +
>  typedef struct VirtIOBalloon {
>      VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq;
> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
> +    uint32_t free_page_report_status;
>      uint32_t num_pages;
>      uint32_t actual;
> +    uint32_t free_page_report_cmd_id;
> +    uint32_t poison_val;
>      uint64_t stats[VIRTIO_BALLOON_S_NR];
>      VirtQueueElement *stats_vq_elem;
>      size_t stats_vq_offset;
>      QEMUTimer *stats_timer;
> +    IOThread *iothread;
> +    QEMUBH *free_page_bh;
> +    /*
> +     * Lock to synchronize threads to access the free page reporting related
> +     * fields (e.g. free_page_report_status).
> +     */
> +    QemuMutex free_page_lock;
> +    QemuCond  free_page_cond;
> +    /*
> +     * Set to block iothread to continue reading free page hints as the VM is
> +     * stopped.
> +     */
> +    bool block_iothread;
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 7b0a41b..f89e80f 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -34,15 +34,22 @@
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> +#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
> +#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
>  struct virtio_balloon_config {
>  	/* Number of pages host wants Guest to give up. */
>  	uint32_t num_pages;
>  	/* Number of pages we've actually got in balloon. */
>  	uint32_t actual;
> +	/* Free page report command id, readonly by guest */
> +	uint32_t free_page_report_cmd_id;
> +	/* Stores PAGE_POISON if page poisoning is in use */
> +	uint32_t poison_val;
>  };
>  
>  #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index 66543ae..6561a08 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -18,11 +18,22 @@
>  
>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
> +typedef void (QEMUBalloonFreePageStart)(void *opaque);
> +typedef void (QEMUBalloonFreePageStop)(void *opaque);
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -			     QEMUBalloonStatus *stat_func, void *opaque);
>  void qemu_remove_balloon_handler(void *opaque);
>  bool qemu_balloon_is_inhibited(void);
>  void qemu_balloon_inhibit(bool state);
> +bool balloon_free_page_support(void);
> +void balloon_free_page_start(void);
> +void balloon_free_page_stop(void);
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque);
>  
>  #endif
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [virtio-dev] Re: [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
@ 2018-05-29 15:24     ` Michael S. Tsirkin
  0 siblings, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-05-29 15:24 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
> The new feature enables the virtio-balloon device to receive hints of
> guest free pages from the free page vq.
> 
> balloon_free_page_start - start guest free page hint reporting.
> balloon_free_page_stop - stop guest free page hint reporting.
> 
> Note: balloon will report pages which were free at the time
> of this call. As the reporting happens asynchronously, dirty bit logging
> must be enabled before this call is made. Guest reporting must be
> disabled before the migration dirty bitmap is synchronized.
> 
> TODO:
> - handle the case when page poisoning is in use
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> ---
>  balloon.c                                       |  58 +++++-
>  hw/virtio/virtio-balloon.c                      | 241 ++++++++++++++++++++++--
>  include/hw/virtio/virtio-balloon.h              |  27 ++-
>  include/standard-headers/linux/virtio_balloon.h |   7 +
>  include/sysemu/balloon.h                        |  15 +-
>  5 files changed, 319 insertions(+), 29 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index 6bf0a96..87a0410 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -36,6 +36,9 @@
>  
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
> +static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
> +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
>  
> @@ -64,19 +67,51 @@ static bool have_balloon(Error **errp)
>      return true;
>  }
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque)
> +bool balloon_free_page_support(void)
>  {
> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> -        /* We're already registered one balloon handler.  How many can
> -         * a guest really have?
> -         */
> -        return -1;
> +    return balloon_free_page_support_fn &&
> +           balloon_free_page_support_fn(balloon_opaque);
> +}
> +
> +/*
> + * Balloon will report pages which were free at the time of this call. As the
> + * reporting happens asynchronously, dirty bit logging must be enabled before
> + * this call is made.
> + */
> +void balloon_free_page_start(void)
> +{
> +    balloon_free_page_start_fn(balloon_opaque);
> +}

Please create notifier support, not a single global.
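Something like a small notifier list would let more than one device register (or none), instead of one fixed set of static globals. A minimal self-contained sketch of the pattern — names are illustrative, not QEMU's actual Notifier API, which would be the natural fit in-tree:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch: a list of registered free-page handlers instead of the
 * static balloon_free_page_*_fn function-pointer globals.
 */
typedef void (*free_page_fn)(void *opaque);

typedef struct FreePageNotifier {
    free_page_fn start;
    free_page_fn stop;
    void *opaque;
    struct FreePageNotifier *next;
} FreePageNotifier;

static FreePageNotifier *notifiers;

static void free_page_notifier_add(FreePageNotifier *n)
{
    n->next = notifiers;
    notifiers = n;
}

static void free_page_notifier_remove(FreePageNotifier *n)
{
    FreePageNotifier **p;

    for (p = &notifiers; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            return;
        }
    }
}

/* Migration code walks the list rather than calling one global hook. */
static void balloon_free_page_start_all(void)
{
    FreePageNotifier *n;

    for (n = notifiers; n; n = n->next) {
        n->start(n->opaque);
    }
}

static int demo_starts; /* for the usage demo only */
static void demo_start(void *opaque)
{
    (void)opaque;
    demo_starts++;
}
```

Registration/unregistration then becomes per-device, and a second balloon-like device no longer has to fail silently.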


> +
> +/*
> + * Guest reporting must be disabled before the migration dirty bitmap is
> + * synchronized.
> + */
> +void balloon_free_page_stop(void)
> +{
> +    balloon_free_page_stop_fn(balloon_opaque);
> +}
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque)
> +{
> +    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
> +        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
> +        balloon_opaque) {
> +        /* We already registered one balloon handler. */
> +        return;
>      }
> -    balloon_event_fn = event_func;
> -    balloon_stat_fn = stat_func;
> +
> +    balloon_event_fn = event_fn;
> +    balloon_stat_fn = stat_fn;
> +    balloon_free_page_support_fn = free_page_support_fn;
> +    balloon_free_page_start_fn = free_page_start_fn;
> +    balloon_free_page_stop_fn = free_page_stop_fn;
>      balloon_opaque = opaque;
> -    return 0;
>  }
>  
>  void qemu_remove_balloon_handler(void *opaque)
> @@ -86,6 +121,9 @@ void qemu_remove_balloon_handler(void *opaque)
>      }
>      balloon_event_fn = NULL;
>      balloon_stat_fn = NULL;
> +    balloon_free_page_support_fn = NULL;
> +    balloon_free_page_start_fn = NULL;
> +    balloon_free_page_stop_fn = NULL;
>      balloon_opaque = NULL;
>  }
>  
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index f456cea..13bf0db 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -31,6 +31,7 @@
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "migration/misc.h"
>  
>  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
>  
> @@ -308,6 +309,125 @@ out:
>      }
>  }
>  
> +static void virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +    VirtQueueElement *elem;
> +    VirtIOBalloon *dev = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +    VirtQueue *vq = dev->free_page_vq;
> +    uint32_t id;
> +    size_t size;
> +
> +    while (1) {
> +        qemu_mutex_lock(&dev->free_page_lock);
> +        while (dev->block_iothread) {
> +            qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
> +        }
> +
> +        /*
> +         * If the migration thread actively stops the reporting, exit
> +         * immediately.
> +         */
> +        if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {

Please refactor this: move the loop body into a function so
you can do the lock/unlock in a single place.
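I.e. something along these lines — a self-contained model of the shape, not the actual device code; the "mutex" here just asserts correct pairing where the real code would use QemuMutex:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that checks lock/unlock pairing via assertions. */
typedef struct { bool held; } Mutex;

static void mutex_lock(Mutex *m)   { assert(!m->held); m->held = true; }
static void mutex_unlock(Mutex *m) { assert(m->held);  m->held = false; }

enum { REPORT_STOP, REPORT_START };

typedef struct {
    Mutex lock;
    int status;
    int processed;
} Poller;

/* Called with p->lock held; returns false when polling should end. */
static bool poll_one(Poller *p)
{
    if (p->status == REPORT_STOP) {
        return false;                /* migration thread stopped us */
    }
    p->processed++;                  /* stand-in for virtqueue_pop() etc. */
    if (p->processed >= 3) {
        p->status = REPORT_STOP;     /* stand-in for the driver's stop id */
    }
    return true;
}

static void poll_free_page_hints(Poller *p)
{
    bool cont = true;

    while (cont) {
        mutex_lock(&p->lock);
        cont = poll_one(p);
        mutex_unlock(&p->lock);      /* the single unlock site */
    }
}
```

With the body hoisted out, none of the exit paths needs its own unlock.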

> +            qemu_mutex_unlock(&dev->free_page_lock);
> +            break;
> +        }
> +
> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +        if (!elem) {
> +            qemu_mutex_unlock(&dev->free_page_lock);
> +            continue;
> +        }
> +
> +        if (elem->out_num) {
> +            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
> +            virtqueue_push(vq, elem, size);
> +            g_free(elem);
> +
> +            virtio_tswap32s(vdev, &id);
> +            if (unlikely(size != sizeof(id))) {
> +                virtio_error(vdev, "received an incorrect cmd id");
> +                break;
> +            }
> +            if (id == dev->free_page_report_cmd_id) {
> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> +            } else {
> +                /*
> +                 * Stop the optimization only when it has started. This
> +                 * avoids a stale stop sign for the previous command.
> +                 */
> +                if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
> +                    dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +                    qemu_mutex_unlock(&dev->free_page_lock);
> +                    break;
> +                }
> +            }
> +        }
> +
> +        if (elem->in_num) {
> +            /* TODO: send the poison value to the destination */
> +            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
> +                !dev->poison_val) {
> +                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
> +                                          elem->in_sg[0].iov_len);
> +            }
> +            virtqueue_push(vq, elem, 0);
> +            g_free(elem);
> +        }
> +        qemu_mutex_unlock(&dev->free_page_lock);
> +    }
> +    virtio_notify(vdev, vq);
> +}
> +
> +static bool virtio_balloon_free_page_support(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);

or if poison is negotiated.

> +}
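I.e. the support check would look at both feature bits. A self-contained model of that check — feature bit numbers as in virtio_balloon.h, helper names illustrative (stand-ins for virtio_vdev_has_feature()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined in virtio_balloon.h. */
#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3
#define VIRTIO_BALLOON_F_PAGE_POISON    4

static bool has_feature(uint64_t features, unsigned int bit)
{
    return features & (1ULL << bit);
}

/*
 * Report free-page support when FREE_PAGE_HINT -- or, per the comment
 * above, PAGE_POISON -- was negotiated.
 */
static bool free_page_support(uint64_t guest_features)
{
    return has_feature(guest_features, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
           has_feature(guest_features, VIRTIO_BALLOON_F_PAGE_POISON);
}
```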
> +
> +static void virtio_balloon_free_page_start(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    /* For the stop and copy phase, we don't need to start the optimization */
> +    if (!vdev->vm_running) {
> +        return;
> +    }
> +
> +    if (s->free_page_report_cmd_id == UINT_MAX) {
> +        s->free_page_report_cmd_id =
> +                       VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> +    } else {
> +        s->free_page_report_cmd_id++;
> +    }
> +
> +    s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
> +    virtio_notify_config(vdev);
> +    qemu_bh_schedule(s->free_page_bh);
> +}
> +
> +static void virtio_balloon_free_page_stop(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    if (s->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
> +        return;

Please just reverse the logic.
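I.e. test for the not-yet-stopped case and drop the early return/else. A self-contained model of the suggested shape (locking and virtio_notify_config() are modelled by plain fields here):

```c
#include <assert.h>

enum { REPORT_STOP, REPORT_START };

typedef struct {
    int status;
    int notifies;
} Dev;

/* Reversed condition: one positive branch, no early return + else. */
static void free_page_stop(Dev *s)
{
    if (s->status != REPORT_STOP) {
        /* taken under free_page_lock in the real code */
        s->status = REPORT_STOP;
        s->notifies++;            /* stand-in for virtio_notify_config() */
    }
}
```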

> +    } else {
> +        qemu_mutex_lock(&s->free_page_lock);
> +        /*
> +         * The guest hasn't done the reporting, so host sends a notification
> +         * to the guest to actively stop the reporting.
> +         */
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        qemu_mutex_unlock(&s->free_page_lock);
> +        virtio_notify_config(vdev);
> +    }
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -315,6 +435,17 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ)) {
> +        config.poison_val = cpu_to_le32(dev->poison_val);
> +    }
> +
> +    if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> +    } else {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(dev->free_page_report_cmd_id);
> +    }
>  
>      trace_virtio_balloon_get_config(config.num_pages, config.actual);
>      memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
> @@ -368,6 +499,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
>                          &error_abort);
>      }
> +    dev->poison_val = le32_to_cpu(config.poison_val);
>      trace_virtio_balloon_set_config(dev->actual, oldactual);
>  }
>  
> @@ -377,6 +509,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +
> +    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> +    }
> +
>      return f;
>  }
>  
> @@ -413,6 +550,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
>      return 0;
>  }
>  
> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> +    .name = "virtio-balloon-device/free-page-report",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = virtio_balloon_free_page_support,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_virtio_balloon_device = {
>      .name = "virtio-balloon-device",
>      .version_id = 1,
> @@ -423,30 +572,42 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>          VMSTATE_UINT32(actual, VirtIOBalloon),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription * []) {
> +        &vmstate_virtio_balloon_free_page_report,
> +        NULL
> +    }
>  };
>  
>  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> -    int ret;
>  
>      virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
>                  sizeof(struct virtio_balloon_config));
>  
> -    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> -
> -    if (ret < 0) {
> -        error_setg(errp, "Only one balloon device is supported");
> -        virtio_cleanup(vdev);
> -        return;
> -    }
> -
>      s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> -
> +    if (virtio_has_feature(s->host_features,
> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        s->free_page_report_cmd_id =
> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
> +        if (s->iothread) {
> +            object_ref(OBJECT(s->iothread));
> +            s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
> +                                       virtio_balloon_poll_free_page_hints, s);
> +            qemu_mutex_init(&s->free_page_lock);
> +            qemu_cond_init(&s->free_page_cond);
> +            s->block_iothread = false;
> +        } else {
> +            /* Simply disable this feature if the iothread wasn't created. */
> +            s->host_features &= ~(1 << VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> +            virtio_error(vdev, "iothread is missing");
> +        }
> +    }
>      reset_stats(s);
>  }
>  
> @@ -455,6 +616,10 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>  
> +    if (virtio_balloon_free_page_support(s)) {
> +        qemu_bh_delete(s->free_page_bh);
> +        virtio_balloon_free_page_stop(s);
> +    }
>      balloon_stats_destroy_timer(s);
>      qemu_remove_balloon_handler(s);
>      virtio_cleanup(vdev);
> @@ -464,6 +629,10 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> +    if (virtio_balloon_free_page_support(s)) {
> +        virtio_balloon_free_page_stop(s);
> +    }
> +
>      if (s->stats_vq_elem != NULL) {
>          virtqueue_unpop(s->svq, s->stats_vq_elem, 0);
>          g_free(s->stats_vq_elem);
> @@ -475,11 +644,47 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> -    if (!s->stats_vq_elem && vdev->vm_running &&
> -        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
> -        /* poll stats queue for the element we have discarded when the VM
> -         * was stopped */
> -        virtio_balloon_receive_stats(vdev, s->svq);
> +    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +        if (!s->stats_vq_elem && vdev->vm_running &&
> +            virtqueue_rewind(s->svq, 1)) {
> +            /*
> +             * Poll stats queue for the element we have discarded when the VM
> +             * was stopped.
> +             */
> +            virtio_balloon_receive_stats(vdev, s->svq);
> +        }
> +
> +        if (virtio_balloon_free_page_support(s)) {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat,
> +                                     virtio_balloon_free_page_support,
> +                                     virtio_balloon_free_page_start,
> +                                     virtio_balloon_free_page_stop,
> +                                     s);
> +        } else {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat, NULL, NULL, NULL, s);
> +        }
> +    }
> +
> +    if (virtio_balloon_free_page_support(s)) {
> +        /*
> +         * The VM is woken up and the iothread was blocked, so signal it to
> +         * continue.
> +         */
> +        if (vdev->vm_running && s->block_iothread) {
> +            qemu_mutex_lock(&s->free_page_lock);
> +            s->block_iothread = false;
> +            qemu_cond_signal(&s->free_page_cond);
> +            qemu_mutex_unlock(&s->free_page_lock);
> +        }
> +
> +        /* The VM is stopped, block the iothread. */
> +        if (!vdev->vm_running) {
> +            qemu_mutex_lock(&s->free_page_lock);
> +            s->block_iothread = true;
> +            qemu_mutex_unlock(&s->free_page_lock);
> +        }
>      }
>  }
>  
> @@ -509,6 +714,10 @@ static const VMStateDescription vmstate_virtio_balloon = {
>  static Property virtio_balloon_properties[] = {
>      DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> +    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> +    DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
> +                     IOThread *),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 1ea13bd..f865832 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -18,11 +18,14 @@
>  #include "standard-headers/linux/virtio_balloon.h"
>  #include "hw/virtio/virtio.h"
>  #include "hw/pci/pci.h"
> +#include "sysemu/iothread.h"
>  
>  #define TYPE_VIRTIO_BALLOON "virtio-balloon-device"
>  #define VIRTIO_BALLOON(obj) \
>          OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
> +
>  typedef struct virtio_balloon_stat VirtIOBalloonStat;
>  
>  typedef struct virtio_balloon_stat_modern {
> @@ -31,15 +34,37 @@ typedef struct virtio_balloon_stat_modern {
>         uint64_t val;
>  } VirtIOBalloonStatModern;
>  
> +enum virtio_balloon_free_page_report_status {
> +    FREE_PAGE_REPORT_S_STOP = 0,
> +    FREE_PAGE_REPORT_S_REQUESTED = 1,
> +    FREE_PAGE_REPORT_S_START = 2,
> +};
> +
>  typedef struct VirtIOBalloon {
>      VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq;
> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
> +    uint32_t free_page_report_status;
>      uint32_t num_pages;
>      uint32_t actual;
> +    uint32_t free_page_report_cmd_id;
> +    uint32_t poison_val;
>      uint64_t stats[VIRTIO_BALLOON_S_NR];
>      VirtQueueElement *stats_vq_elem;
>      size_t stats_vq_offset;
>      QEMUTimer *stats_timer;
> +    IOThread *iothread;
> +    QEMUBH *free_page_bh;
> +    /*
> +     * Lock to synchronize threads to access the free page reporting related
> +     * fields (e.g. free_page_report_status).
> +     */
> +    QemuMutex free_page_lock;
> +    QemuCond  free_page_cond;
> +    /*
> +     * Set to block iothread to continue reading free page hints as the VM is
> +     * stopped.
> +     */
> +    bool block_iothread;
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 7b0a41b..f89e80f 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -34,15 +34,22 @@
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> +#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
> +#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
>  struct virtio_balloon_config {
>  	/* Number of pages host wants Guest to give up. */
>  	uint32_t num_pages;
>  	/* Number of pages we've actually got in balloon. */
>  	uint32_t actual;
> +	/* Free page report command id, readonly by guest */
> +	uint32_t free_page_report_cmd_id;
> +	/* Stores PAGE_POISON if page poisoning is in use */
> +	uint32_t poison_val;
>  };
>  
>  #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index 66543ae..6561a08 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -18,11 +18,22 @@
>  
>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
> +typedef void (QEMUBalloonFreePageStart)(void *opaque);
> +typedef void (QEMUBalloonFreePageStop)(void *opaque);
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -			     QEMUBalloonStatus *stat_func, void *opaque);
>  void qemu_remove_balloon_handler(void *opaque);
>  bool qemu_balloon_is_inhibited(void);
>  void qemu_balloon_inhibit(bool state);
> +bool balloon_free_page_support(void);
> +void balloon_free_page_start(void);
> +void balloon_free_page_stop(void);
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque);
>  
>  #endif
> -- 
> 1.8.3.1

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-05-29 15:00 ` [Qemu-devel] " Hailiang Zhang
@ 2018-05-29 15:24     ` Michael S. Tsirkin
  0 siblings, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-05-29 15:24 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, dgilbert,
	yang.zhang.wz, quan.xu0, liliang.opensource, pbonzini, nilal

On Tue, May 29, 2018 at 11:00:21PM +0800, Hailiang Zhang wrote:
> On 2018/4/24 14:13, Wei Wang wrote:
> > This is the device part implementation to add a new feature,
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > receives the guest free page hints from the driver and clears the
> > corresponding bits in the dirty bitmap, so that those free pages are
> > not transferred by the migration thread to the destination.
> > 
> > - Test Environment
> >      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> >      Guest: 8G RAM, 4 vCPU
> >      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> > 
> > - Test Results
> >      - Idle Guest Live Migration Time (results are averaged over 10 runs):
> >          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
> >      - Guest with Linux Compilation Workload (make bzImage -j4):
> >          - Live Migration Time (average)
> >            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
> >          - Linux Compilation Time
> >            Optimization v.s. Legacy = 4min56s v.s. 5min3s
> >            --> no obvious difference
> > 
> > - Source Code
> >      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
> >      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> > 
> > ChangeLog:
> > v6->v7:
> >        virtio-balloon/virtio_balloon_poll_free_page_hints:
> >            - add virtio_notify() at the end to notify the driver that
> >              the optimization is done, which indicates that the entries
> >              have all been put back to the vq and ready to detach them.
> > v5->v6:
> >        virtio-balloon: use iothread to get free page hint
> > v4->v5:
> >      1) migration:
> >          - bitmap_clear_dirty: update the dirty bitmap and dirty page
> >            count under the bitmap mutex as what other functions are doing;
> >          - qemu_guest_free_page_hint:
> >              - add comments for this function;
> >              - check the !block case;
> >              - check "offset > block->used_length" before proceed;
> >              - assign used_len inside the for{} body;
> >              - update the dirty bitmap and dirty page counter under the
> >                bitmap mutex;
> >          - ram_state_reset:
> >              - rs->free_page_support: && with use "migrate_postcopy"
> >                instead of migration_in_postcopy;
> >              - clear the ram_bulk_stage flag if free_page_support is true;
> >      2) balloon:
> >           - add the usage documentation of balloon_free_page_start and
> >             balloon_free_page_stop in code;
> >           - the optimization thread is named "balloon_fpo" to meet the
> >             requirement of "less than 14 characters";
> >           - virtio_balloon_poll_free_page_hints:
> >               - run on condition when runstate_is_running() is true;
> >               - add a qemu spin lock to synchronize accesses to the free
> >                 page reporting related fields shared among the migration
> >                 thread and the optimization thread;
> >            - virtio_balloon_free_page_start: just return if
> >              runstate_is_running is false;
> >            - virtio_balloon_free_page_stop: access to the free page
> >              reporting related fields under a qemu spin lock;
> >            - virtio_balloon_device_unrealize/reset: call
> >              virtio_balloon_free_page_stop if the free page hint feature is
> >              used;
> >            - virtio_balloon_set_status: call virtio_balloon_free_page_stop
> >              in case the guest is stopped by qmp when the optimization is
> >              running;
> > v3->v4:
> >      1) bitmap: add a new API to count 1s starting from an offset of a
> >         bitmap
> >      2) migration:
> >          - qemu_guest_free_page_hint: calculate
> >            ram_state->migration_dirty_pages by counting how many bits of
> >            free pages are truly cleared. If some of the bits were
> >            already 0, they shouldn't be deducted by
> >            ram_state->migration_dirty_pages. This wasn't needed for
> >            previous versions since we optimized bulk stage only,
> >            where all bits are guaranteed to be set. It's needed now
> >            because we extended the usage of this optimization to all stages
> >            except the last stop&copy stage. From 2nd stage onward, there
> >            are possibilities that some bits of free pages are already 0.
> >       3) virtio-balloon:
> >           - virtio_balloon_free_page_report_status: introduce a new status,
> >             FREE_PAGE_REPORT_S_EXIT. This status indicates that the
> >             optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
> >             the reporting is stopped, but the optimization thread still needs
> >             to be joined by the migration thread.
> > v2->v3:
> >      1) virtio-balloon
> >          - virtio_balloon_free_page_start: poll the hints using a new
> >            thread;
> >          - use cmd id between [0x80000000, UINT_MAX];
> >          - virtio_balloon_poll_free_page_hints:
> >              - stop the optimization only when it has started;
> >              - don't skip free pages when !poison_val;
> >          - add poison_val to vmsd to migrate;
> >          - virtio_balloon_get_features: add the F_PAGE_POISON feature when
> >            host has F_FREE_PAGE_HINT;
> >          - remove the timer patch which is not needed now.
> >      2) migration
> >         - new api, qemu_guest_free_page_hint;
> >         - rs->free_page_support set only in the precopy case;
> >         - use the new balloon APIs.
> > v1->v2:
> >      1) virtio-balloon
> >          - use subsections to save free_page_report_cmd_id;
> >          - poll the free page vq after sending a cmd id to the driver;
> >          - change the free page vq size to VIRTQUEUE_MAX_SIZE;
> >          - virtio_balloon_poll_free_page_hints: handle the corner case
> >            that the free page block reported from the driver may cross
> >            the RAMBlock boundary.
> >      2) migration/ram.c
> >          - use balloon_free_page_poll to start the optimization
> > 
> > 
> > Wei Wang (5):
> >    bitmap: bitmap_count_one_with_offset
> >    migration: use bitmap_mutex in migration_bitmap_clear_dirty
> >    migration: API to clear bits of guest free pages from the dirty bitmap
> >    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >    migration: use the free page hint feature from balloon
> > 
> >   balloon.c                                       |  58 +++++-
> >   hw/virtio/virtio-balloon.c                      | 241 ++++++++++++++++++++++--
> >   include/hw/virtio/virtio-balloon.h              |  27 ++-
> >   include/migration/misc.h                        |   2 +
> >   include/qemu/bitmap.h                           |  13 ++
> >   include/standard-headers/linux/virtio_balloon.h |   7 +
> >   include/sysemu/balloon.h                        |  15 +-
> >   migration/ram.c                                 |  73 ++++++-
> >   8 files changed, 406 insertions(+), 30 deletions(-)
> 
> Nice optimization. In the first stage of the current migration method we need
> to migrate all of the VM's pages to the destination; with this capability we
> can avoid transferring a lot of unnecessary pages.
> 
> A small piece of advice: it would be better to split the fourth patch into
> smaller ones to make it easier to review. Also, should we make this capability
> optional, like the other migration capabilities?

That's already the case: one has to enable the feature on the balloon
device and set its iothread.
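For reference, enabling it looks roughly like this — an illustrative invocation using the "free-page-hint" and "iothread" properties this series adds (the rest of the command line is elided):

```sh
# Illustrative: create an iothread and attach it to the balloon device,
# with the free-page-hint feature turned on.
qemu-system-x86_64 ... \
    -object iothread,id=iothread0 \
    -device virtio-balloon,free-page-hint=on,iothread=iothread0
```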

-- 
MST

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [virtio-dev] Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
@ 2018-05-29 15:24     ` Michael S. Tsirkin
  0 siblings, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-05-29 15:24 UTC (permalink / raw)
  To: Hailiang Zhang
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, dgilbert,
	yang.zhang.wz, quan.xu0, liliang.opensource, pbonzini, nilal

On Tue, May 29, 2018 at 11:00:21PM +0800, Hailiang Zhang wrote:
> On 2018/4/24 14:13, Wei Wang wrote:
> > This is the deivce part implementation to add a new feature,
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > receives the guest free page hints from the driver and clears the
> > corresponding bits in the dirty bitmap, so that those free pages are
> > not transferred by the migration thread to the destination.
> > 
> > - Test Environment
> >      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> >      Guest: 8G RAM, 4 vCPU
> >      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> > 
> > - Test Results
> >      - Idle Guest Live Migration Time (results are averaged over 10 runs):
> >          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
> >      - Guest with Linux Compilation Workload (make bzImage -j4):
> >          - Live Migration Time (average)
> >            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
> >          - Linux Compilation Time
> >            Optimization v.s. Legacy = 4min56s v.s. 5min3s
> >            --> no obvious difference
> > 
> > - Source Code
> >      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
> >      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> > 
> > ChangeLog:
> > v6->v7:
> >        virtio-balloon/virtio_balloo_poll_free_page_hints:
> >            - add virtio_notify() at the end to notify the driver that
> >              the optimization is done, which indicates that the entries
> >              have all been put back to the vq and ready to detach them.
> > v5->v6:
> >        virtio-balloon: use iothread to get free page hint
> > v4->v5:
> >      1) migration:
> >          - bitmap_clear_dirty: update the dirty bitmap and dirty page
> >            count under the bitmap mutex as what other functions are doing;
> >          - qemu_guest_free_page_hint:
> >              - add comments for this function;
> >              - check the !block case;
> >              - check "offset > block->used_length" before proceed;
> >              - assign used_len inside the for{} body;
> >              - update the dirty bitmap and dirty page counter under the
> >                bitmap mutex;
> >          - ram_state_reset:
> >              - rs->free_page_support: && with use "migrate_postcopy"
> >                instead of migration_in_postcopy;
> >              - clear the ram_bulk_stage flag if free_page_support is true;
> >      2) balloon:
> >           - add the usage documentation of balloon_free_page_start and
> >             balloon_free_page_stop in code;
> >           - the optimization thread is named "balloon_fpo" to meet the
> >             requirement of "less than 14 characters";
> >           - virtio_balloon_poll_free_page_hints:
> >               - run on condition when runstate_is_running() is true;
> >               - add a qemu spin lock to synchronize accesses to the free
> >                 page reporting related fields shared among the migration
> >                 thread and the optimization thread;
> >            - virtio_balloon_free_page_start: just return if
> >              runstate_is_running is false;
> >            - virtio_balloon_free_page_stop: access to the free page
> >              reporting related fields under a qemu spin lock;
> >            - virtio_balloon_device_unrealize/reset: call
> >              virtio_balloon_free_page_stop if the free page hint feature is
> >              used;
> >            - virtio_balloon_set_status: call virtio_balloon_free_page_stop
> >              in case the guest is stopped by qmp when the optimization is
> >              running;
> > v3->v4:
> >      1) bitmap: add a new API to count 1s starting from an offset of a
> >         bitmap
> >      2) migration:
> >          - qemu_guest_free_page_hint: calculate
> >            ram_state->migration_dirty_pages by counting how many bits of
> >            free pages are truly cleared. If some of the bits were
> >            already 0, they shouldn't be deducted by
> >            ram_state->migration_dirty_pages. This wasn't needed for
> >            previous versions since we optimized bulk stage only,
> >            where all bits are guaranteed to be set. It's needed now
> >            because we extended the usage of this optimization to all stages
> >            except the last stop&copy stage. From 2nd stage onward, there
> >            are possibilities that some bits of free pages are already 0.
> >       3) virtio-balloon:
> >           - virtio_balloon_free_page_report_status: introduce a new status,
> >             FREE_PAGE_REPORT_S_EXIT. This status indicates that the
> >             optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
> >             the reporting is stopped, but the optimization thread still needs
> >             to be joined by the migration thread.
> > v2->v3:
> >      1) virtio-balloon
> >          - virtio_balloon_free_page_start: poll the hints using a new
> >            thread;
> >          - use cmd id between [0x80000000, UINT_MAX];
> >          - virtio_balloon_poll_free_page_hints:
> >              - stop the optimization only when it has started;
> >              - don't skip free pages when !poison_val;
> >          - add poison_val to vmsd to migrate;
> >          - virtio_balloon_get_features: add the F_PAGE_POISON feature when
> >            host has F_FREE_PAGE_HINT;
> >          - remove the timer patch which is not needed now.
> >      2) migration
> >         - new api, qemu_guest_free_page_hint;
> >         - rs->free_page_support set only in the precopy case;
> >         - use the new balloon APIs.
> > v1->v2:
> >      1) virtio-balloon
> >          - use subsections to save free_page_report_cmd_id;
> >          - poll the free page vq after sending a cmd id to the driver;
> >          - change the free page vq size to VIRTQUEUE_MAX_SIZE;
> >          - virtio_balloon_poll_free_page_hints: handle the corner case
> >            that the free page block reported from the driver may cross
> >            the RAMBlock boundary.
> >      2) migration/ram.c
> >          - use balloon_free_page_poll to start the optimization
> > 
> > 
> > Wei Wang (5):
> >    bitmap: bitmap_count_one_with_offset
> >    migration: use bitmap_mutex in migration_bitmap_clear_dirty
> >    migration: API to clear bits of guest free pages from the dirty bitmap
> >    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >    migration: use the free page hint feature from balloon
> > 
> >   balloon.c                                       |  58 +++++-
> >   hw/virtio/virtio-balloon.c                      | 241 ++++++++++++++++++++++--
> >   include/hw/virtio/virtio-balloon.h              |  27 ++-
> >   include/migration/misc.h                        |   2 +
> >   include/qemu/bitmap.h                           |  13 ++
> >   include/standard-headers/linux/virtio_balloon.h |   7 +
> >   include/sysemu/balloon.h                        |  15 +-
> >   migration/ram.c                                 |  73 ++++++-
> >   8 files changed, 406 insertions(+), 30 deletions(-)
> 
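The bitmap API added in patch 1/5 (bitmap_count_one_with_offset) counts the set bits of a bitmap starting at a given bit offset; this is what lets qemu_guest_free_page_hint compute how many dirty bits were truly cleared. A standalone sketch of the semantics (the name, bit-by-bit walk, and word layout here are illustrative, not QEMU's actual per-word implementation):

```c
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Count the 1-bits in bits[0..nbits) starting at bit "offset".
 * Walks bit by bit for clarity; the real patch works a word at a time. */
static unsigned long count_one_with_offset(const unsigned long *bits,
                                           unsigned long offset,
                                           unsigned long nbits)
{
    unsigned long i, count = 0;

    for (i = offset; i < nbits; i++) {
        if (bits[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) {
            count++;
        }
    }
    return count;
}
```

For example, on the bitmap 0b1011 (bits 0, 1, and 3 set), counting from offset 1 over 4 bits yields 2.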
> Nice optimization. In the first stage of the current migration method, we need to migrate all the pages of
> the VM to the destination; with this capability, we can avoid transferring lots of unnecessary pages.
> 
> Just a small piece of advice: it would be better to split the fourth patch into smaller ones, to make it
> easier to review. Besides, should we make this capability an optional one, just like other migration capabilities?

That's already the case: one has to enable it in the balloon and set
the iothread.

-- 
MST

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-05-29 15:24     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-05-30  9:12       ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-05-30  9:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel,
	zhang.zhanghailiang

On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
> On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
>> +/*
>> + * Balloon will report pages which were free at the time of this call. As the
>> + * reporting happens asynchronously, dirty bit logging must be enabled before
>> + * this call is made.
>> + */
>> +void balloon_free_page_start(void)
>> +{
>> +    balloon_free_page_start_fn(balloon_opaque);
>> +}
> Please create notifier support, not a single global.

OK. The start is called at the end of bitmap_sync, and the stop is 
called at the beginning of bitmap_sync. In this case, we will need to 
add two migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and 
MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?


>
> +static void virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +    VirtQueueElement *elem;
> +    VirtIOBalloon *dev = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +    VirtQueue *vq = dev->free_page_vq;
> +    uint32_t id;
> +    size_t size;
> +
> +    while (1) {
> +        qemu_mutex_lock(&dev->free_page_lock);
> +        while (dev->block_iothread) {
> +            qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
> +        }
> +
> +        /*
> +         * If the migration thread actively stops the reporting, exit
> +         * immediately.
> +         */
> +        if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
> Please refactor this : move loop body into a function so
> you can do lock/unlock in a single place.

Sounds good.
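The refactoring requested above can be sketched like this (the types and field names are illustrative stand-ins for the QEMU ones; only the locking shape matters): the loop body moves into a helper that always runs with the lock held and reports whether polling should continue, so lock and unlock each appear at exactly one site.

```c
#include <pthread.h>
#include <stdbool.h>

enum { FREE_PAGE_REPORT_S_START, FREE_PAGE_REPORT_S_STOP };

typedef struct FreePageDev {
    pthread_mutex_t free_page_lock;
    int free_page_report_status;
    int pending_hints;      /* stand-in for elements queued on the vq */
} FreePageDev;

/* Loop body, always entered with free_page_lock held.  Returns false
 * when polling should stop: either the migration thread set S_STOP or
 * the queue is drained. */
static bool poll_one_hint_locked(FreePageDev *dev)
{
    if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP ||
        dev->pending_hints == 0) {
        return false;
    }
    dev->pending_hints--;   /* ...pop an element, clear dirty bits... */
    return true;
}

static void poll_free_page_hints(FreePageDev *dev)
{
    bool cont;

    do {
        pthread_mutex_lock(&dev->free_page_lock);    /* single lock site */
        cont = poll_one_hint_locked(dev);
        pthread_mutex_unlock(&dev->free_page_lock);  /* single unlock site */
    } while (cont);
}
```

With this shape it is impossible to return from the middle of the body while still holding the lock, which is the bug class the single lock/unlock place avoids.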

>
> +
> +static bool virtio_balloon_free_page_support(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> or if poison is negotiated.

Will make it
return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT) && 
!virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)


Best,
Wei



* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-05-30  9:12       ` [virtio-dev] " Wei Wang
@ 2018-05-30 12:47         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-05-30 12:47 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel,
	zhang.zhanghailiang

On Wed, May 30, 2018 at 05:12:09PM +0800, Wei Wang wrote:
> On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
> > On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
> > > +/*
> > > + * Balloon will report pages which were free at the time of this call. As the
> > > + * reporting happens asynchronously, dirty bit logging must be enabled before
> > > + * this call is made.
> > > + */
> > > +void balloon_free_page_start(void)
> > > +{
> > > +    balloon_free_page_start_fn(balloon_opaque);
> > > +}
> > Please create notifier support, not a single global.
> 
> OK. The start is called at the end of bitmap_sync, and the stop is called at
> the beginning of bitmap_sync. In this case, we will need to add two
> migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and
> MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?

If that's the way you do it, you need to ask migration guys, not me.

> 
> > 
> > +static void virtio_balloon_poll_free_page_hints(void *opaque)
> > +{
> > +    VirtQueueElement *elem;
> > +    VirtIOBalloon *dev = opaque;
> > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > +    VirtQueue *vq = dev->free_page_vq;
> > +    uint32_t id;
> > +    size_t size;
> > +
> > +    while (1) {
> > +        qemu_mutex_lock(&dev->free_page_lock);
> > +        while (dev->block_iothread) {
> > +            qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
> > +        }
> > +
> > +        /*
> > +         * If the migration thread actively stops the reporting, exit
> > +         * immediately.
> > +         */
> > +        if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
> > Please refactor this : move loop body into a function so
> > you can do lock/unlock in a single place.
> 
> Sounds good.
> 
> > 
> > +
> > +static bool virtio_balloon_free_page_support(void *opaque)
> > +{
> > +    VirtIOBalloon *s = opaque;
> > +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > +
> > +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> > or if poison is negotiated.
> 
> Will make it
> return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT) &&
> !virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)


I mean the reverse:
	virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
	virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)


If poison has been negotiated you must migrate the
guest supplied value even if you don't use it for hints.


> 
> 
> Best,
> Wei
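The predicate MST suggests can be modeled standalone (feature bits stubbed out as plain booleans; in QEMU these would be virtio_vdev_has_feature() calls on the negotiated feature set): the vmstate section must be sent whenever either feature is negotiated, because a guest-programmed poison value has to survive migration even when hinting is off.

```c
#include <stdbool.h>

/* Stand-ins for virtio_vdev_has_feature(vdev, ...). */
typedef struct {
    bool free_page_hint;  /* VIRTIO_BALLOON_F_FREE_PAGE_HINT negotiated */
    bool page_poison;     /* VIRTIO_BALLOON_F_PAGE_POISON negotiated    */
} Features;

/* Wei's first proposal: hint && !poison -- the poison value is never
 * migrated, so it is lost whenever F_PAGE_POISON is negotiated. */
static bool needed_and_not(Features f)
{
    return f.free_page_hint && !f.page_poison;
}

/* MST's suggestion: migrate the section if either feature is on. */
static bool needed_or(Features f)
{
    return f.free_page_hint || f.page_poison;
}
```

With hint=false and poison=true, needed_and_not() returns false (the poison value would be dropped) while needed_or() returns true, which is exactly the configuration MST is worried about.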



* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-05-30 12:47         ` [virtio-dev] " Michael S. Tsirkin
@ 2018-05-31  2:27           ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-05-31  2:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel,
	zhang.zhanghailiang

On 05/30/2018 08:47 PM, Michael S. Tsirkin wrote:
> On Wed, May 30, 2018 at 05:12:09PM +0800, Wei Wang wrote:
>> On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
>>> On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
>>>> +/*
>>>> + * Balloon will report pages which were free at the time of this call. As the
>>>> + * reporting happens asynchronously, dirty bit logging must be enabled before
>>>> + * this call is made.
>>>> + */
>>>> +void balloon_free_page_start(void)
>>>> +{
>>>> +    balloon_free_page_start_fn(balloon_opaque);
>>>> +}
>>> Please create notifier support, not a single global.
>> OK. The start is called at the end of bitmap_sync, and the stop is called at
>> the beginning of bitmap_sync. In this case, we will need to add two
>> migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and
>> MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?
> If that's the way you do it, you need to ask migration guys, not me.

Yeah, I know.. thanks for the virtio part.

>>> +
>>> +static bool virtio_balloon_free_page_support(void *opaque)
>>> +{
>>> +    VirtIOBalloon *s = opaque;
>>> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
>>> +
>>> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
>>> or if poison is negotiated.
>> Will make it
>> return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT) &&
>> !virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)
>
> I mean the reverse:
> 	virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
> 	virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)
>
>
> If poison has been negotiated you must migrate the
> guest supplied value even if you don't use it for hints.


Just a little confused by the logic. Writing it that way means that we 
are taking the possibility "virtio_vdev_has_feature(vdev, 
VIRTIO_BALLOON_F_FREE_PAGE_HINT)=false, virtio_vdev_has_feature(vdev, 
VIRTIO_BALLOON_F_PAGE_POISON)=true" into account, and letting the support 
function return true even when F_FREE_PAGE_HINT isn't supported.

If the guest doesn't support F_FREE_PAGE_HINT, it doesn't support free 
page reporting at all (it doesn't even have the free page vq). I'm not 
sure why we would tell the migration thread that the free page reporting 
feature is supported via this support function. If the support function 
simply returns false when F_FREE_PAGE_HINT isn't negotiated, the legacy 
migration already migrates the poisoned pages (not skipped, but possibly 
compressed).

I think it would be better to simply use the original "return 
virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)" here.


Best,
Wei



* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-05-31  2:27           ` [virtio-dev] " Wei Wang
@ 2018-05-31 17:42             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-05-31 17:42 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel,
	zhang.zhanghailiang

On Thu, May 31, 2018 at 10:27:00AM +0800, Wei Wang wrote:
> On 05/30/2018 08:47 PM, Michael S. Tsirkin wrote:
> > On Wed, May 30, 2018 at 05:12:09PM +0800, Wei Wang wrote:
> > > On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
> > > > > +/*
> > > > > + * Balloon will report pages which were free at the time of this call. As the
> > > > > + * reporting happens asynchronously, dirty bit logging must be enabled before
> > > > > + * this call is made.
> > > > > + */
> > > > > +void balloon_free_page_start(void)
> > > > > +{
> > > > > +    balloon_free_page_start_fn(balloon_opaque);
> > > > > +}
> > > > Please create notifier support, not a single global.
> > > OK. The start is called at the end of bitmap_sync, and the stop is called at
> > > the beginning of bitmap_sync. In this case, we will need to add two
> > > migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and
> > > MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?
> > If that's the way you do it, you need to ask migration guys, not me.
> 
> Yeah, I know.. thanks for the virtio part.
> 
> > > > +
> > > > +static bool virtio_balloon_free_page_support(void *opaque)
> > > > +{
> > > > +    VirtIOBalloon *s = opaque;
> > > > +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > > > +
> > > > +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> > > > or if poison is negotiated.
> > > Will make it
> > > return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT) &&
> > > !virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)
> > 
> > I mean the reverse:
> > 	virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
> > 	virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)
> > 
> > 
> > If poison has been negotiated you must migrate the
> > guest supplied value even if you don't use it for hints.
> 
> 
> Just a little confused with the logic. Writing it that way means that we are
> taking this possibility "virtio_vdev_has_feature(vdev,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT)=false, virtio_vdev_has_feature(vdev,
> VIRTIO_BALLOON_F_PAGE_POISON)=true" into account, and let the support
> function return true when F_FREE_PAGE_HINT isn't supported.

All I am saying is that in this configuration, you must migrate
the poison value programmed by the guest, even though without
VIRTIO_BALLOON_F_FREE_PAGE_HINT you do not yet use it.

Right now you have
a section:
+    .needed = virtio_balloon_free_page_support,

which includes the poison value.

So if the guest migrates after writing the poison value,
it's lost. Not nice.

> If guest doesn't support F_FREE_PAGE_HINT, it doesn't support the free page
> reporting (even the free page vq). I'm not sure why we tell the migration
> thread that the free page reporting feature is supported via this support
> function. If the support function simply returns false when F_FREE_PAGE_HINT
> isn't negotiated, the legacy migration already migrates the poisoned pages
> (not skipped, but may be compressed).
> 
> I think it would be better to simply use the original "return
> virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)" here.


So maybe you should put the poison value in a separate section then.


> 
> Best,
> Wei
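The "separate section" idea can be sketched as two independent `.needed` predicates (a plain-C model, not QEMU's actual VMStateDescription machinery; the subsection split shown here is an assumption about how it would be wired up): the free-page-hint runtime state stays gated on F_FREE_PAGE_HINT alone, while the poison value gets its own subsection gated on F_PAGE_POISON, so it is migrated even when hinting is off.

```c
#include <stdbool.h>

/* Stand-ins for the negotiated virtio-balloon feature bits. */
typedef struct {
    bool free_page_hint;  /* VIRTIO_BALLOON_F_FREE_PAGE_HINT */
    bool page_poison;     /* VIRTIO_BALLOON_F_PAGE_POISON    */
} Features;

/* Subsection 1: free page hint state (cmd id, report status, ...),
 * only meaningful when the hint feature itself was negotiated. */
static bool free_page_hint_needed(Features f)
{
    return f.free_page_hint;
}

/* Subsection 2: the guest-programmed poison value on its own,
 * sent whenever F_PAGE_POISON was negotiated. */
static bool page_poison_needed(Features f)
{
    return f.page_poison;
}
```

With this split, a guest that negotiated only F_PAGE_POISON still gets its poison value migrated, fixing the "value is lost" case described earlier in the thread without forcing the hint-state subsection onto guests that never had the free page vq.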



* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-05-31 17:42             ` [virtio-dev] " Michael S. Tsirkin
@ 2018-06-01  3:18               ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-01  3:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel,
	zhang.zhanghailiang

On 06/01/2018 01:42 AM, Michael S. Tsirkin wrote:
> On Thu, May 31, 2018 at 10:27:00AM +0800, Wei Wang wrote:
>> On 05/30/2018 08:47 PM, Michael S. Tsirkin wrote:
>>> On Wed, May 30, 2018 at 05:12:09PM +0800, Wei Wang wrote:
>>>> On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
>>>>>> +/*
>>>>>> + * Balloon will report pages which were free at the time of this call. As the
>>>>>> + * reporting happens asynchronously, dirty bit logging must be enabled before
>>>>>> + * this call is made.
>>>>>> + */
>>>>>> +void balloon_free_page_start(void)
>>>>>> +{
>>>>>> +    balloon_free_page_start_fn(balloon_opaque);
>>>>>> +}
>>>>> Please create notifier support, not a single global.
>>>> OK. The start is called at the end of bitmap_sync, and the stop is called at
>>>> the beginning of bitmap_sync. In this case, we will need to add two
>>>> migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and
>>>> MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?
>>> If that's the way you do it, you need to ask migration guys, not me.
>> Yeah, I know.. thanks for the virtio part.
>>
>>>>> +
>>>>> +static bool virtio_balloon_free_page_support(void *opaque)
>>>>> +{
>>>>> +    VirtIOBalloon *s = opaque;
>>>>> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
>>>>> +
>>>>> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
>>>>> or if poison is negotiated.
>>>> Will make it
>>>> return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT) &&
>>>> !virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)
>>> I mean the reverse:
>>> 	virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
>>> 	virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)
>>>
>>>
>>> If poison has been negotiated you must migrate the
>>> guest supplied value even if you don't use it for hints.
>>
>> Just a little confused with the logic. Writing it that way means that we are
>> taking this possibility "virtio_vdev_has_feature(vdev,
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT)=false, virtio_vdev_has_feature(vdev,
>> VIRTIO_BALLOON_F_PAGE_POISON)=true" into account, and let the support
>> function return true when F_FREE_PAGE_HINT isn't supported.
> All I am saying is that in this configuration, you must migrate
> the poison value programmed by guest even if you do not
> yet use it without VIRTIO_BALLOON_F_FREE_PAGE_HINT.
>
> Right now you have
> a section:
> +    .needed = virtio_balloon_free_page_support,
>
> which includes the poison value.
>
> So if guest migrates after writing the poison value,
> it's lost. Not nice.
>
>> If guest doesn't support F_FREE_PAGE_HINT, it doesn't support the free page
>> reporting (even the free page vq). I'm not sure why we tell the migration
>> thread that the free page reporting feature is supported via this support
>> function. If the support function simply returns false when F_FREE_PAGE_HINT
>> isn't negotiated, the legacy migration already migrates the poisoned pages
>> (not skipped, but may be compressed).
>>
>> I think it would be better to simply use the original "return
>> virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)" here.
>
> So maybe you should put the poison value in a separate section then.

Yes, that looks good to me, thanks.

Best,
Wei
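
The two predicates debated above differ exactly in the poison-only configuration. A minimal self-contained model of that difference (the TOY_ bit values are invented for illustration; real QEMU tests the negotiated feature words via virtio_vdev_has_feature()):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only; the real values
 * come from the virtio-balloon headers. */
#define TOY_F_FREE_PAGE_HINT (1u << 0)
#define TOY_F_PAGE_POISON    (1u << 1)

/* Wei's first proposal: the vmstate section exists only when the hint
 * feature is on and poison is off. */
static bool support_hint_only(uint32_t features)
{
    return (features & TOY_F_FREE_PAGE_HINT) &&
           !(features & TOY_F_PAGE_POISON);
}

/* Michael's point: when poison alone is negotiated, the
 * guest-programmed poison value must still be migrated, so the
 * predicate has to fire in that configuration too. */
static bool support_hint_or_poison(uint32_t features)
{
    return (features & TOY_F_FREE_PAGE_HINT) ||
           (features & TOY_F_PAGE_POISON);
}
```

The resolution the thread reaches is to keep the hint-only check for the free-page section and move the poison value into its own subsection, presumably with a `.needed` that tests the poison bit alone.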

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/5] migration: use bitmap_mutex in migration_bitmap_clear_dirty
  2018-04-24  6:13   ` [virtio-dev] " Wei Wang
  (?)
@ 2018-06-01  3:37   ` Peter Xu
  -1 siblings, 0 replies; 93+ messages in thread
From: Peter Xu @ 2018-06-01  3:37 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Tue, Apr 24, 2018 at 02:13:45PM +0800, Wei Wang wrote:
> The bitmap mutex is used to synchronize threads to update the dirty
> bitmap and the migration_dirty_pages counter. This patch makes
> migration_bitmap_clear_dirty update the bitmap and counter under the
> mutex.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> ---
>  migration/ram.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 0e90efa..9a72b1a 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -795,11 +795,14 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>  {
>      bool ret;
>  
> +    qemu_mutex_lock(&rs->bitmap_mutex);
>      ret = test_and_clear_bit(page, rb->bmap);
>  
>      if (ret) {
>          rs->migration_dirty_pages--;
>      }
> +    qemu_mutex_unlock(&rs->bitmap_mutex);
> +
>      return ret;
>  }
>  
> -- 
> 1.8.3.1
> 
> 

Do we need the lock after all?

I see that we introduced this lock due to device hotplug at dd63169766
("migration: extend migration_bitmap", 2015-07-07), however now we
actually don't allow that to happen any more after commit b06424de62
("migration: Disable hotplug/unplug during migration", 2017-04-21).
I'm not sure whether it means that the lock is not needed now.

If so, this patch seems to be unnecessary.

Regards,

-- 
Peter Xu
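
The locking pattern in the patch hunk above can be sketched outside QEMU as a small self-contained model (the Toy* names and the 64-page bitmap are invented; this is not QEMU code):

```c
#include <pthread.h>
#include <stdbool.h>

/* A toy stand-in for RAMState: a 64-page dirty bitmap plus the counter
 * that bitmap_mutex protects. */
typedef struct {
    unsigned long bmap;            /* one dirty bit per page */
    unsigned long dirty_pages;
    pthread_mutex_t bitmap_mutex;
} ToyRAMState;

static bool toy_test_and_clear_bit(unsigned long *map, unsigned page)
{
    unsigned long mask = 1ul << page;
    bool was_set = (*map & mask) != 0;

    *map &= ~mask;
    return was_set;
}

/* Mirrors migration_bitmap_clear_dirty() after the patch: the bit test
 * and the counter decrement form one critical section, so any other
 * thread touching the bitmap always sees bmap and dirty_pages agree. */
static bool toy_bitmap_clear_dirty(ToyRAMState *rs, unsigned page)
{
    bool ret;

    pthread_mutex_lock(&rs->bitmap_mutex);
    ret = toy_test_and_clear_bit(&rs->bmap, page);
    if (ret) {
        rs->dirty_pages--;
    }
    pthread_mutex_unlock(&rs->bitmap_mutex);
    return ret;
}

/* Clears page 0 twice on a state with pages 0 and 2 dirty and returns
 * the remaining dirty count; the second clear is a no-op. */
static unsigned long toy_clear_dirty_demo(void)
{
    ToyRAMState rs = {
        .bmap = (1ul << 0) | (1ul << 2),
        .dirty_pages = 2,
        .bitmap_mutex = PTHREAD_MUTEX_INITIALIZER,
    };

    toy_bitmap_clear_dirty(&rs, 0);
    toy_bitmap_clear_dirty(&rs, 0);
    return rs.dirty_pages;
}
```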

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-04-24  6:13   ` [virtio-dev] " Wei Wang
  (?)
@ 2018-06-01  4:00   ` Peter Xu
  2018-06-01  7:36       ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-01  4:00 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Tue, Apr 24, 2018 at 02:13:46PM +0800, Wei Wang wrote:
> This patch adds an API to clear bits corresponding to guest free pages
> from the dirty bitmap. Split the free page block if it crosses the QEMU
> RAMBlock boundary.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> ---
>  include/migration/misc.h |  2 ++
>  migration/ram.c          | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+)
> 
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index 4ebf24c..113320e 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -14,11 +14,13 @@
>  #ifndef MIGRATION_MISC_H
>  #define MIGRATION_MISC_H
>  
> +#include "exec/cpu-common.h"
>  #include "qemu/notify.h"
>  
>  /* migration/ram.c */
>  
>  void ram_mig_init(void);
> +void qemu_guest_free_page_hint(void *addr, size_t len);
>  
>  /* migration/block.c */
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index 9a72b1a..0147548 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2198,6 +2198,50 @@ static int ram_init_all(RAMState **rsp)
>  }
>  
>  /*
> + * This function clears bits of the free pages reported by the caller from the
> + * migration dirty bitmap. @addr is the host address corresponding to the
> + * start of the continuous guest free pages, and @len is the total bytes of
> + * those pages.
> + */
> +void qemu_guest_free_page_hint(void *addr, size_t len)
> +{
> +    RAMBlock *block;
> +    ram_addr_t offset;
> +    size_t used_len, start, npages;

Do we need to check here whether a migration is in progress?  Since
if not I'm not sure whether this hint still makes any sense any more,
and more importantly it seems to me that block->bmap below at [1] is
only valid during a migration.  So I'm not sure whether QEMU will
crash if this function is called without a running migration.

> +
> +    for (; len > 0; len -= used_len) {
> +        block = qemu_ram_block_from_host(addr, false, &offset);
> +        if (unlikely(!block)) {
> +            return;

We should never reach here, should we?  Assuming the callers of this
function should always pass in a correct host address. If we are very
sure that the host addr should be valid, could we just assert?

> +        }
> +
> +        /*
> +         * This handles the case that the RAMBlock is resized after the free
> +         * page hint is reported.
> +         */
> +        if (unlikely(offset > block->used_length)) {
> +            return;
> +        }
> +
> +        if (len <= block->used_length - offset) {
> +            used_len = len;
> +        } else {
> +            used_len = block->used_length - offset;
> +            addr += used_len;
> +        }
> +
> +        start = offset >> TARGET_PAGE_BITS;
> +        npages = used_len >> TARGET_PAGE_BITS;
> +
> +        qemu_mutex_lock(&ram_state->bitmap_mutex);

So now I think I understand the lock can still be meaningful since
this function now can be called outside the migration thread (e.g., in
vcpu thread).  But it would still be nice to document somewhere why
the lock is needed.

Regards,

> +        ram_state->migration_dirty_pages -=
> +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> +        bitmap_clear(block->bmap, start, npages);

[1]

> +        qemu_mutex_unlock(&ram_state->bitmap_mutex);
> +    }
> +}
> +
> +/*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
>   * start to become numerous it will be necessary to reduce the
> -- 
> 1.8.3.1
> 
> 

-- 
Peter Xu
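
The clipping behaviour discussed above can be modelled with a single toy block (the Toy* names and 64-page bitmap are invented; this is an illustration of the range-splitting logic, not the QEMU implementation):

```c
#include <stddef.h>

#define TOY_PAGE_SHIFT 12   /* 4 KB pages, like TARGET_PAGE_BITS on x86 */

/* A toy RAMBlock: a 64-page dirty bitmap and a used length in bytes. */
typedef struct {
    unsigned long bmap;
    size_t used_length;
} ToyBlock;

/* Clears the dirty bits covered by a hinted free range and returns the
 * number of dirty pages dropped.  Like qemu_guest_free_page_hint(), the
 * range is clipped at used_length, so a stale hint past a shrunk block
 * is ignored instead of clearing bits that do not exist. */
static unsigned toy_free_page_hint(ToyBlock *b, size_t offset, size_t len)
{
    size_t start, npages, i;
    unsigned cleared = 0;

    if (offset > b->used_length) {
        return 0;
    }
    if (len > b->used_length - offset) {
        len = b->used_length - offset;
    }

    start = offset >> TOY_PAGE_SHIFT;
    npages = len >> TOY_PAGE_SHIFT;
    for (i = start; i < start + npages; i++) {
        unsigned long mask = 1ul << i;

        if (b->bmap & mask) {
            b->bmap &= ~mask;
            cleared++;
        }
    }
    return cleared;
}

/* Four dirty pages; hint pages 2..9 on a 4-page block: only pages 2
 * and 3 exist, so exactly two bits are cleared. */
static unsigned toy_hint_demo(void)
{
    ToyBlock b = { .bmap = 0xful, .used_length = 4ul << TOY_PAGE_SHIFT };

    return toy_free_page_hint(&b, 2ul << TOY_PAGE_SHIFT,
                              8ul << TOY_PAGE_SHIFT);
}
```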

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-04-24  6:13 ` [virtio-dev] " Wei Wang
                   ` (8 preceding siblings ...)
  (?)
@ 2018-06-01  4:58 ` Peter Xu
  2018-06-01  5:07   ` Peter Xu
  2018-06-01  7:21     ` [virtio-dev] " Wei Wang
  -1 siblings, 2 replies; 93+ messages in thread
From: Peter Xu @ 2018-06-01  4:58 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> This is the device part implementation to add a new feature,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> receives the guest free page hints from the driver and clears the
> corresponding bits in the dirty bitmap, so that those free pages are
> not transferred by the migration thread to the destination.
> 
> - Test Environment
>     Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>     Guest: 8G RAM, 4 vCPU
>     Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> 
> - Test Results
>     - Idle Guest Live Migration Time (results are averaged over 10 runs):
>         - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
>     - Guest with Linux Compilation Workload (make bzImage -j4):
>         - Live Migration Time (average)
>           Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
>         - Linux Compilation Time
>           Optimization v.s. Legacy = 4min56s v.s. 5min3s
>           --> no obvious difference
> 
> - Source Code
>     - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
>     - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git

Hi, Wei,

I have a very high-level question to the series.

IIUC the core idea for this series is that we can avoid sending some
of the pages if we know that we don't need to send them.  I think this
is based on the fact that on the destination side all the pages are by
default zero after they are malloced.  While before this series, IIUC
any migration will send every single page to destination, no matter
whether it's zeroed or not.  So I'm uncertain about whether this will
affect the received bitmap on the destination side.  Say, before this
series, the received bitmap will directly cover the whole RAM bitmap
> after migration is finished, now it won't.  Will there be any side
effect?  I don't see obvious issue now, but just raise this question
up.

Meanwhile, this reminds me about a more funny idea: whether we can
just avoid sending the zero pages directly from QEMU's perspective.
In other words, can we just do nothing if save_zero_page() detected
that the page is zero (I guess the is_zero_range() can be fast too,
but I don't know exactly how fast it is)?  And how that would be
differed from this page hinting way in either performance and other
aspects.

I haven't dug into the kernel patches yet so I have totally no idea
on the detailed implementation of the page hinting.  Please feel free
to correct me if there is obvious misunderstandings.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-01  4:58 ` Peter Xu
@ 2018-06-01  5:07   ` Peter Xu
  2018-06-01  7:29       ` [virtio-dev] " Wei Wang
  2018-06-01  7:21     ` [virtio-dev] " Wei Wang
  1 sibling, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-01  5:07 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:
> On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> > This is the device part implementation to add a new feature,
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > receives the guest free page hints from the driver and clears the
> > corresponding bits in the dirty bitmap, so that those free pages are
> > not transferred by the migration thread to the destination.
> > 
> > - Test Environment
> >     Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> >     Guest: 8G RAM, 4 vCPU
> >     Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> > 
> > - Test Results
> >     - Idle Guest Live Migration Time (results are averaged over 10 runs):
> >         - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
> >     - Guest with Linux Compilation Workload (make bzImage -j4):
> >         - Live Migration Time (average)
> >           Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
> >         - Linux Compilation Time
> >           Optimization v.s. Legacy = 4min56s v.s. 5min3s
> >           --> no obvious difference
> > 
> > - Source Code
> >     - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
> >     - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> 
> Hi, Wei,
> 
> I have a very high-level question to the series.
> 
> IIUC the core idea for this series is that we can avoid sending some
> of the pages if we know that we don't need to send them.  I think this
> is based on the fact that on the destination side all the pages are by
> default zero after they are malloced.  While before this series, IIUC
> any migration will send every single page to destination, no matter
> whether it's zeroed or not.  So I'm uncertain about whether this will
> affect the received bitmap on the destination side.  Say, before this
> series, the received bitmap will directly cover the whole RAM bitmap
> > after migration is finished, now it won't.  Will there be any side
> effect?  I don't see obvious issue now, but just raise this question
> up.
> 
> Meanwhile, this reminds me about a more funny idea: whether we can
> just avoid sending the zero pages directly from QEMU's perspective.
> In other words, can we just do nothing if save_zero_page() detected
> that the page is zero (I guess the is_zero_range() can be fast too,
> but I don't know exactly how fast it is)?  And how that would be
> differed from this page hinting way in either performance and other
> aspects.

I noticed a problem (after I wrote the above paragraph 5 minutes
ago...): when a page was valid and sent to the destination (with
non-zero data), however after a while that page was zeroed.  Then if
we don't send zero pages at all, we won't send the page after it's
zeroed.  Then on the destination side we'll have a stale non-zero
page.  Is my understanding correct?  Will that be a problem to this
series too where a valid page can be possibly freed and hinted?

> 
> I haven't dug into the kernel patches yet so I have totally no idea
> on the detailed implementation of the page hinting.  Please feel free
> to correct me if there is obvious misunderstandings.
> 
> Regards,
> 
> -- 
> Peter Xu

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-01  4:58 ` Peter Xu
@ 2018-06-01  7:21     ` Wei Wang
  2018-06-01  7:21     ` [virtio-dev] " Wei Wang
  1 sibling, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-01  7:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On 06/01/2018 12:58 PM, Peter Xu wrote:
> On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
>> This is the device part implementation to add a new feature,
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
>> receives the guest free page hints from the driver and clears the
>> corresponding bits in the dirty bitmap, so that those free pages are
>> not transferred by the migration thread to the destination.
>>
>> - Test Environment
>>      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>>      Guest: 8G RAM, 4 vCPU
>>      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>>
>> - Test Results
>>      - Idle Guest Live Migration Time (results are averaged over 10 runs):
>>          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
>>      - Guest with Linux Compilation Workload (make bzImage -j4):
>>          - Live Migration Time (average)
>>            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
>>          - Linux Compilation Time
>>            Optimization v.s. Legacy = 4min56s v.s. 5min3s
>>            --> no obvious difference
>>
>> - Source Code
>>      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
>>      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> Hi, Wei,
>
> I have a very high-level question to the series.

Hi Peter,

Thanks for joining the discussion :)

>
> IIUC the core idea for this series is that we can avoid sending some
> of the pages if we know that we don't need to send them.  I think this
> is based on the fact that on the destination side all the pages are by
> default zero after they are malloced.  While before this series, IIUC
> any migration will send every single page to destination, no matter
> whether it's zeroed or not.  So I'm uncertain about whether this will
> affect the received bitmap on the destination side.  Say, before this
> series, the received bitmap will directly cover the whole RAM bitmap
> after migration is finished, now it won't.  Will there be any side
> effect?  I don't see obvious issue now, but just raise this question
> up.

This feature currently only supports pre-copy (I think the received
bitmap is something that matters to post-copy only).
That's why we have
rs->free_page_support = ..&& !migrate_postcopy();

> Meanwhile, this reminds me about a more funny idea: whether we can
> just avoid sending the zero pages directly from QEMU's perspective.
> In other words, can we just do nothing if save_zero_page() detected
> that the page is zero (I guess the is_zero_range() can be fast too,
> but I don't know exactly how fast it is)?  And how that would be
> differed from this page hinting way in either performance and other
> aspects.

I guess you referred to the zero page optimization. I think the major 
overhead comes from the zero page checking - lots of memory accesses, 
which also waste memory bandwidth. Please see the results attached in 
the cover letter. The legacy case already includes the zero page 
optimization.

Best,
Wei
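
The cost Wei describes can be seen in a naive word-wise zero scan in the spirit of is_zero_range() (the toy_ names are invented; QEMU's actual buffer_is_zero() is far more optimised, but the memory-bandwidth point is the same: every byte of a non-hinted page must be read, whereas a hinted free page is never touched at all):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns true iff all len bytes of buf are zero, scanning one machine
 * word at a time for the bulk of the buffer. */
static bool toy_is_zero_range(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    size_t i = 0;

    for (; i + sizeof(uintptr_t) <= len; i += sizeof(uintptr_t)) {
        uintptr_t w;

        memcpy(&w, p + i, sizeof(w));   /* avoids unaligned loads */
        if (w != 0) {
            return false;
        }
    }
    for (; i < len; i++) {              /* tail bytes */
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Checks a zeroed 4 KB page, then the same page with its last byte
 * set; returns true iff both checks behave as expected. */
static bool toy_zero_demo(void)
{
    static uint8_t page[4096];

    if (!toy_is_zero_range(page, sizeof(page))) {
        return false;
    }
    page[4095] = 1;
    return !toy_is_zero_range(page, sizeof(page));
}
```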

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-01  5:07   ` Peter Xu
@ 2018-06-01  7:29       ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-01  7:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On 06/01/2018 01:07 PM, Peter Xu wrote:
> On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:
>> On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
>>> This is the device part implementation to add a new feature,
>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
>>> receives the guest free page hints from the driver and clears the
>>> corresponding bits in the dirty bitmap, so that those free pages are
>>> not transferred by the migration thread to the destination.
>>>
>>> - Test Environment
>>>      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>>>      Guest: 8G RAM, 4 vCPU
>>>      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>>>
>>> - Test Results
>>>      - Idle Guest Live Migration Time (results are averaged over 10 runs):
>>>          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
>>>      - Guest with Linux Compilation Workload (make bzImage -j4):
>>>          - Live Migration Time (average)
>>>            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
>>>          - Linux Compilation Time
>>>            Optimization v.s. Legacy = 4min56s v.s. 5min3s
>>>            --> no obvious difference
>>>
>>> - Source Code
>>>      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
>>>      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
>> Hi, Wei,
>>
>> I have a very high-level question to the series.
>>
>> IIUC the core idea for this series is that we can avoid sending some
>> of the pages if we know that we don't need to send them.  I think this
>> is based on the fact that on the destination side all the pages are by
>> default zero after they are malloced.  While before this series, IIUC
>> any migration will send every single page to destination, no matter
>> whether it's zeroed or not.  So I'm uncertain about whether this will
>> affect the received bitmap on the destination side.  Say, before this
>> series, the received bitmap will directly cover the whole RAM bitmap
>> after migration is finished, now it won't.  Will there be any side
>> effect?  I don't see obvious issue now, but just raise this question
>> up.
>>
>> Meanwhile, this reminds me about a more funny idea: whether we can
>> just avoid sending the zero pages directly from QEMU's perspective.
>> In other words, can we just do nothing if save_zero_page() detected
>> that the page is zero (I guess the is_zero_range() can be fast too,
>> but I don't know exactly how fast it is)?  And how that would be
>> differed from this page hinting way in either performance and other
>> aspects.
> I noticed a problem (after I wrote the above paragraph 5 minutes
> ago...): when a page was valid and sent to the destination (with
> non-zero data), however after a while that page was zeroed.  Then if
> we don't send zero pages at all, we won't send the page after it's
> zeroed.  Then on the destination side we'll have a stale non-zero
> page.  Is my understanding correct?  Will that be a problem to this
> series too where a valid page can be possibly freed and hinted?

I think that won't be an issue either for zero page optimization or this 
free page optimization.

For the zero page optimization, QEMU always sends compressed 0s to the 
destination. The zero page is detected at the time QEMU checks it 
(before sending the page). If it is a 0 page, QEMU compresses all 0s 
(actually just a flag) and sends it.

For the free page optimization, we skip free pages (could be thought of 
as 0 pages in this context). The zero pages are detected at the time 
the guest reports them to QEMU. The page won't be reported if it is non-zero 
(i.e. used).


Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread


* Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-06-01  4:00   ` [Qemu-devel] " Peter Xu
@ 2018-06-01  7:36       ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-01  7:36 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On 06/01/2018 12:00 PM, Peter Xu wrote:
> On Tue, Apr 24, 2018 at 02:13:46PM +0800, Wei Wang wrote:
>> This patch adds an API to clear bits corresponding to guest free pages
>> from the dirty bitmap. Split the free page block if it crosses the QEMU
>> RAMBlock boundary.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>   include/migration/misc.h |  2 ++
>>   migration/ram.c          | 44 ++++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 46 insertions(+)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index 4ebf24c..113320e 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -14,11 +14,13 @@
>>   #ifndef MIGRATION_MISC_H
>>   #define MIGRATION_MISC_H
>>   
>> +#include "exec/cpu-common.h"
>>   #include "qemu/notify.h"
>>   
>>   /* migration/ram.c */
>>   
>>   void ram_mig_init(void);
>> +void qemu_guest_free_page_hint(void *addr, size_t len);
>>   
>>   /* migration/block.c */
>>   
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 9a72b1a..0147548 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2198,6 +2198,50 @@ static int ram_init_all(RAMState **rsp)
>>   }
>>   
>>   /*
>> + * This function clears bits of the free pages reported by the caller from the
>> + * migration dirty bitmap. @addr is the host address corresponding to the
>> + * start of the continuous guest free pages, and @len is the total bytes of
>> + * those pages.
>> + */
>> +void qemu_guest_free_page_hint(void *addr, size_t len)
>> +{
>> +    RAMBlock *block;
>> +    ram_addr_t offset;
>> +    size_t used_len, start, npages;
> Do we need to check here on whether a migration is in progress?  Since
> if not I'm not sure whether this hint still makes any sense any more,
> and more importantly it seems to me that block->bmap below at [1] is
> only valid during a migration.  So I'm not sure whether QEMU will
> crash if this function is called without a running migration.

OK. How about just adding a comment above noting that this
function should only be used during migration?

If we want to do a sanity check here, I think it would be easier to just 
check !block->bmap here.
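For illustration, the two sanity-check options being discussed (checking the migration state vs. checking block->bmap) could look like the following simplified model; the struct and function names here are made up, not QEMU's:

```c
#include <stddef.h>

/* Simplified stand-in for a RAMBlock: the dirty bitmap only exists
 * while a migration is running. */
struct toy_ram_block {
    unsigned long *bmap;    /* NULL when no migration is in progress */
};

/* Returns 1 if it is safe/meaningful to apply a free page hint. */
static int free_page_hint_usable(const struct toy_ram_block *block,
                                 int migration_active)
{
    if (!migration_active) {    /* option 1: check the migration state */
        return 0;
    }
    if (!block->bmap) {         /* option 2: check the bitmap itself */
        return 0;
    }
    return 1;
}
```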


>
>> +
>> +    for (; len > 0; len -= used_len) {
>> +        block = qemu_ram_block_from_host(addr, false, &offset);
>> +        if (unlikely(!block)) {
>> +            return;
> We should never reach here, should we?  Assuming the callers of this
> function should always pass in a correct host address. If we are very
> sure that the host addr should be valid, could we just assert?

Probably not, because of the corner case that the memory could be
hot-unplugged after the free page is reported to QEMU.



>
>> +        }
>> +
>> +        /*
>> +         * This handles the case that the RAMBlock is resized after the free
>> +         * page hint is reported.
>> +         */
>> +        if (unlikely(offset > block->used_length)) {
>> +            return;
>> +        }
>> +
>> +        if (len <= block->used_length - offset) {
>> +            used_len = len;
>> +        } else {
>> +            used_len = block->used_length - offset;
>> +            addr += used_len;
>> +        }
>> +
>> +        start = offset >> TARGET_PAGE_BITS;
>> +        npages = used_len >> TARGET_PAGE_BITS;
>> +
>> +        qemu_mutex_lock(&ram_state->bitmap_mutex);
> So now I think I understand the lock can still be meaningful since
> this function now can be called outside the migration thread (e.g., in
> vcpu thread).  But still it would be nice to mention it somewhere on
> the truth of the lock.
>

Yes. Thanks for the reminder. I will add some explanation to the patch 2 
commit log.
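As a side note on the loop quoted above: its splitting behavior (clamping each hint so it never crosses a RAMBlock boundary) can be modeled with a toy version that uses one fixed block size. This is purely illustrative, not QEMU code; real RAMBlocks have varying used_length:

```c
#include <stddef.h>

#define TOY_BLOCK_SIZE 4096u   /* fixed block size, for illustration only */

/* Counts how many toy blocks the span [addr, addr + len) touches,
 * clamping each step to the end of the containing block, analogous to
 * the used_len computation in qemu_guest_free_page_hint(). */
static unsigned span_block_count(size_t addr, size_t len)
{
    unsigned count = 0;
    size_t used_len;

    for (; len > 0; len -= used_len, addr += used_len) {
        size_t offset = addr % TOY_BLOCK_SIZE;
        size_t room = TOY_BLOCK_SIZE - offset;  /* bytes left in this block */

        used_len = len <= room ? len : room;
        count++;
    }
    return count;
}
```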


Best,
Wei



* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-01  7:29       ` [virtio-dev] " Wei Wang
@ 2018-06-01 10:02       ` Peter Xu
  2018-06-01 12:31           ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-01 10:02 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 01, 2018 at 03:29:45PM +0800, Wei Wang wrote:
> On 06/01/2018 01:07 PM, Peter Xu wrote:
> > On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:
> > > On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> > > > This is the deivce part implementation to add a new feature,
> > > > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > > > receives the guest free page hints from the driver and clears the
> > > > corresponding bits in the dirty bitmap, so that those free pages are
> > > > not transferred by the migration thread to the destination.
> > > > 
> > > > - Test Environment
> > > >      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > > >      Guest: 8G RAM, 4 vCPU
> > > >      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> > > > 
> > > > - Test Results
> > > >      - Idle Guest Live Migration Time (results are averaged over 10 runs):
> > > >          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
> > > >      - Guest with Linux Compilation Workload (make bzImage -j4):
> > > >          - Live Migration Time (average)
> > > >            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
> > > >          - Linux Compilation Time
> > > >            Optimization v.s. Legacy = 4min56s v.s. 5min3s
> > > >            --> no obvious difference
> > > > 
> > > > - Source Code
> > > >      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
> > > >      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> > > Hi, Wei,
> > > 
> > > I have a very high-level question to the series.
> > > 
> > > IIUC the core idea for this series is that we can avoid sending some
> > > of the pages if we know that we don't need to send them.  I think this
> > > is based on the fact that on the destination side all the pages are by
> > > default zero after they are malloced.  While before this series, IIUC
> > > any migration will send every single page to destination, no matter
> > > whether it's zeroed or not.  So I'm uncertain about whether this will
> > > affect the received bitmap on the destination side.  Say, before this
> > > series, the received bitmap will directly cover the whole RAM bitmap
> > > after migration is finished, now it won't.  Will there be any side
> > > effect?  I don't see obvious issue now, but just raise this question
> > > up.
> > > 
> > > Meanwhile, this reminds me about a more funny idea: whether we can
> > > just avoid sending the zero pages directly from QEMU's perspective.
> > > In other words, can we just do nothing if save_zero_page() detected
> > > that the page is zero (I guess the is_zero_range() can be fast too,
> > > but I don't know exactly how fast it is)?  And how that would be
> > > differed from this page hinting way in either performance and other
> > > aspects.
> > I noticed a problem (after I wrote the above paragraph 5 minutes
> > ago...): when a page was valid and sent to the destination (with
> > non-zero data), however after a while that page was zeroed.  Then if
> > we don't send zero pages at all, we won't send the page after it's
> > zeroed.  Then on the destination side we'll have a stale non-zero
> > page.  Is my understanding correct?  Will that be a problem to this
> > series too where a valid page can be possibly freed and hinted?
> 
> I think that won't be an issue either for zero page optimization or this
> free page optimization.
> 
> For the zero page optimization, QEMU always sends compressed 0s to the
> destination. The zero page is detected at the time QEMU checks it (before
> sending the page). if it is a 0 page, QEMU compresses all 0s (actually just
> a flag) and send it.

What I meant is: can we just not send that ZERO flag at all? :)

> 
> For the free page optimization, we skip free pages (could be thought of as 0
> pages in this context). The zero pages are detected at the time guest
> reports it QEMU. The page won't be reported if it is non-zero (i.e. used).

Sorry, I must not have explained myself well.  Let's assume the page
hint is used.  I meant this:

- start precopy, page P is non-zero (let's say, page has content P1,
  which is non-zero)
- we send page P with content P1 on src, then latest destination cache
  of page P is P1
- page P is freed by the guest, then it becomes zero, dirty bitmap of
  P is set since it's changed (from P1 to zeroed page)
- page P is provided as hint that we can skip it since it's zeroed,
  then the dirty bit of P is cleared
- ... (page P is never used until migration completes)

After migration completes, page P should be a zeroed page on the
source, while IIUC on the destination side it is still left with the
stale data P1.  Did I miss anything important?
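The sequence above can be played out in a toy single-page model (all names below are made up, just to make the concern concrete):

```c
/* Toy model of one guest page during migration: if the page changes
 * after it was sent, and a hint then clears its dirty bit, the page is
 * never resent and the destination keeps the stale content. */
struct toy_page {
    int src;     /* current content on the source */
    int dst;     /* last content received on the destination */
    int dirty;   /* dirty-bitmap bit for this page */
};

static void guest_write(struct toy_page *p, int val)
{
    p->src = val;
    p->dirty = 1;            /* dirty logging marks the write */
}

static void migrate_pass(struct toy_page *p)
{
    if (p->dirty) {          /* only dirty pages are (re)sent */
        p->dst = p->src;
        p->dirty = 0;
    }
}

static void free_page_hint(struct toy_page *p)
{
    p->dirty = 0;            /* hint clears the bit without sending */
}
```

With P1 = 11: write 11, migrate (dst becomes 11), then zero the page, apply the hint, migrate again; dst stays 11 while src is 0, which is exactly the stale-data scenario described above.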

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-06-01  7:36       ` [virtio-dev] " Wei Wang
@ 2018-06-01 10:06       ` Peter Xu
  2018-06-01 12:32           ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-01 10:06 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 01, 2018 at 03:36:01PM +0800, Wei Wang wrote:
> On 06/01/2018 12:00 PM, Peter Xu wrote:
> > On Tue, Apr 24, 2018 at 02:13:46PM +0800, Wei Wang wrote:
> > > This patch adds an API to clear bits corresponding to guest free pages
> > > from the dirty bitmap. Split the free page block if it crosses the QEMU
> > > RAMBlock boundary.
> > > 
> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > CC: Juan Quintela <quintela@redhat.com>
> > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >   include/migration/misc.h |  2 ++
> > >   migration/ram.c          | 44 ++++++++++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 46 insertions(+)
> > > 
> > > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > > index 4ebf24c..113320e 100644
> > > --- a/include/migration/misc.h
> > > +++ b/include/migration/misc.h
> > > @@ -14,11 +14,13 @@
> > >   #ifndef MIGRATION_MISC_H
> > >   #define MIGRATION_MISC_H
> > > +#include "exec/cpu-common.h"
> > >   #include "qemu/notify.h"
> > >   /* migration/ram.c */
> > >   void ram_mig_init(void);
> > > +void qemu_guest_free_page_hint(void *addr, size_t len);
> > >   /* migration/block.c */
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 9a72b1a..0147548 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2198,6 +2198,50 @@ static int ram_init_all(RAMState **rsp)
> > >   }
> > >   /*
> > > + * This function clears bits of the free pages reported by the caller from the
> > > + * migration dirty bitmap. @addr is the host address corresponding to the
> > > + * start of the continuous guest free pages, and @len is the total bytes of
> > > + * those pages.
> > > + */
> > > +void qemu_guest_free_page_hint(void *addr, size_t len)
> > > +{
> > > +    RAMBlock *block;
> > > +    ram_addr_t offset;
> > > +    size_t used_len, start, npages;
> > Do we need to check here on whether a migration is in progress?  Since
> > if not I'm not sure whether this hint still makes any sense any more,
> > and more importantly it seems to me that block->bmap below at [1] is
> > only valid during a migration.  So I'm not sure whether QEMU will
> > crash if this function is called without a running migration.
> 
> OK. How about just adding a comment above noting that this
> function should only be used during migration?
> 
> If we want to do a sanity check here, I think it would be easier to just
> check !block->bmap here.

I think the faster way might be to check against the migration state.

> 
> 
> > 
> > > +
> > > +    for (; len > 0; len -= used_len) {
> > > +        block = qemu_ram_block_from_host(addr, false, &offset);
> > > +        if (unlikely(!block)) {
> > > +            return;
> > We should never reach here, should we?  Assuming the callers of this
> > function should always pass in a correct host address. If we are very
> > sure that the host addr should be valid, could we just assert?
> 
> Probably not, because of the corner case that the memory could be
> hot-unplugged after the free page is reported to QEMU.

Question: do we allow memory hot plug/unplug during migration?

> 
> 
> 
> > 
> > > +        }
> > > +
> > > +        /*
> > > +         * This handles the case that the RAMBlock is resized after the free
> > > +         * page hint is reported.
> > > +         */
> > > +        if (unlikely(offset > block->used_length)) {
> > > +            return;
> > > +        }
> > > +
> > > +        if (len <= block->used_length - offset) {
> > > +            used_len = len;
> > > +        } else {
> > > +            used_len = block->used_length - offset;
> > > +            addr += used_len;
> > > +        }
> > > +
> > > +        start = offset >> TARGET_PAGE_BITS;
> > > +        npages = used_len >> TARGET_PAGE_BITS;
> > > +
> > > +        qemu_mutex_lock(&ram_state->bitmap_mutex);
> > So now I think I understand the lock can still be meaningful since
> > this function now can be called outside the migration thread (e.g., in
> > vcpu thread).  But still it would be nice to mention it somewhere on

(Actually after reading the next patch I think it's in an iothread, so I'd
 better reply after reading the whole series next time :)

> > the truth of the lock.
> > 
> 
> Yes. Thanks for the reminder. I will add some explanation to the patch 2
> commit log.

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-01  7:21     ` [virtio-dev] " Wei Wang
@ 2018-06-01 10:40     ` Peter Xu
  2018-06-01 15:33       ` Dr. David Alan Gilbert
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-01 10:40 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 01, 2018 at 03:21:54PM +0800, Wei Wang wrote:
> On 06/01/2018 12:58 PM, Peter Xu wrote:
> > On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> > > This is the device part implementation to add a new feature,
> > > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > > receives the guest free page hints from the driver and clears the
> > > corresponding bits in the dirty bitmap, so that those free pages are
> > > not transferred by the migration thread to the destination.
> > > 
> > > - Test Environment
> > >      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > >      Guest: 8G RAM, 4 vCPU
> > >      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> > > 
> > > - Test Results
> > >      - Idle Guest Live Migration Time (results are averaged over 10 runs):
> > >          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
> > >      - Guest with Linux Compilation Workload (make bzImage -j4):
> > >          - Live Migration Time (average)
> > >            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
> > >          - Linux Compilation Time
> > >            Optimization v.s. Legacy = 4min56s v.s. 5min3s
> > >            --> no obvious difference
> > > 
> > > - Source Code
> > >      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
> > >      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> > Hi, Wei,
> > 
> > I have a very high-level question to the series.
> 
> Hi Peter,
> 
> Thanks for joining the discussion :)

Thanks for letting me know this thread.  It's an interesting idea. :)

> 
> > 
> > IIUC the core idea for this series is that we can avoid sending some
> > of the pages if we know that we don't need to send them.  I think this
> > is based on the fact that on the destination side all the pages are by
> > default zero after they are malloced.  While before this series, IIUC
> > any migration will send every single page to destination, no matter
> > whether it's zeroed or not.  So I'm uncertain about whether this will
> > affect the received bitmap on the destination side.  Say, before this
> > series, the received bitmap will directly cover the whole RAM bitmap
> > after migration is finished, now it won't.  Will there be any side
> > effect?  I don't see obvious issue now, but just raise this question
> > up.
> 
This feature currently only supports pre-copy (I think the received bitmap
is something that matters to postcopy only).
> That's why we have
> rs->free_page_support = ..&& !migrate_postcopy();

Okay.

> 
> > Meanwhile, this reminds me about a more funny idea: whether we can
> > just avoid sending the zero pages directly from QEMU's perspective.
> > In other words, can we just do nothing if save_zero_page() detected
> > that the page is zero (I guess the is_zero_range() can be fast too,
> > but I don't know exactly how fast it is)?  And how that would be
> > differed from this page hinting way in either performance and other
> > aspects.
> 
I guess you are referring to the zero page optimization. I think the major
overhead comes from the zero page checking - lots of memory accesses, which
also waste memory bandwidth. Please see the results attached in the cover
letter. The legacy case already includes the zero page optimization.

I replied in the other thread.  We can discuss there altogether.

Actually, on second thought, I think what I worried about there is
exactly the reason why we must send the zero page flag - otherwise
there can be stale non-zero pages on the destination.  Here "zero page"
and "freed page" are totally different ideas, since even if a page is
zeroed it might still be in use (not freed)!  Whereas for a "free page",
even if it's non-zero we might be able to not send it at all, though I
am not sure whether that mismatch of data might cause any side effect
either. I think the corresponding question would be: if a page is freed
in the Linux kernel, does its data matter any more?

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-01 10:02       ` Peter Xu
@ 2018-06-01 12:31           ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-01 12:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On 06/01/2018 06:02 PM, Peter Xu wrote:
> On Fri, Jun 01, 2018 at 03:29:45PM +0800, Wei Wang wrote:
>> On 06/01/2018 01:07 PM, Peter Xu wrote:
>>> On Fri, Jun 01, 2018 at 12:58:24PM +0800, Peter Xu wrote:
>>>> On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
>>>>> This is the device part implementation to add a new feature,
>>>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
>>>>> receives the guest free page hints from the driver and clears the
>>>>> corresponding bits in the dirty bitmap, so that those free pages are
>>>>> not transferred by the migration thread to the destination.
>>>>>
>>>>> - Test Environment
>>>>>       Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>>>>>       Guest: 8G RAM, 4 vCPU
>>>>>       Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
>>>>>
>>>>> - Test Results
>>>>>       - Idle Guest Live Migration Time (results are averaged over 10 runs):
>>>>>           - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
>>>>>       - Guest with Linux Compilation Workload (make bzImage -j4):
>>>>>           - Live Migration Time (average)
>>>>>             Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
>>>>>           - Linux Compilation Time
>>>>>             Optimization v.s. Legacy = 4min56s v.s. 5min3s
>>>>>             --> no obvious difference
>>>>>
>>>>> - Source Code
>>>>>       - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
>>>>>       - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
>>>> Hi, Wei,
>>>>
>>>> I have a very high-level question to the series.
>>>>
>>>> IIUC the core idea for this series is that we can avoid sending some
>>>> of the pages if we know that we don't need to send them.  I think this
>>>> is based on the fact that on the destination side all the pages are by
>>>> default zero after they are malloced.  While before this series, IIUC
>>>> any migration will send every single page to destination, no matter
>>>> whether it's zeroed or not.  So I'm uncertain about whether this will
>>>> affect the received bitmap on the destination side.  Say, before this
>>>> series, the received bitmap will directly cover the whole RAM bitmap
>>>> after migration is finished, now it won't.  Will there be any side
>>>> effect?  I don't see obvious issue now, but just raise this question
>>>> up.
>>>>
>>>> Meanwhile, this reminds me about a more funny idea: whether we can
>>>> just avoid sending the zero pages directly from QEMU's perspective.
>>>> In other words, can we just do nothing if save_zero_page() detected
>>>> that the page is zero (I guess the is_zero_range() can be fast too,
>>>> but I don't know exactly how fast it is)?  And how that would be
>>>> differed from this page hinting way in either performance and other
>>>> aspects.
>>> I noticed a problem (after I wrote the above paragraph 5 minutes
>>> ago...): when a page was valid and sent to the destination (with
>>> non-zero data), however after a while that page was zeroed.  Then if
>>> we don't send zero pages at all, we won't send the page after it's
>>> zeroed.  Then on the destination side we'll have a stale non-zero
>>> page.  Is my understanding correct?  Will that be a problem to this
>>> series too where a valid page can be possibly freed and hinted?
>> I think that won't be an issue either for zero page optimization or this
>> free page optimization.
>>
>> For the zero page optimization, QEMU always sends compressed 0s to the
>> destination. The zero page is detected at the time QEMU checks it (before
>> sending the page). If it is a 0 page, QEMU compresses all 0s (actually it
>> just sends a flag) and sends it.
> What I meant is: can we just not send that ZERO flag at all? :)

I think you just figured out that zero pages and free pages are not 
completely the same case. So I guess this question is done :)
Please let me know if not.

>
>> For the free page optimization, we skip free pages (which could be thought
>> of as 0 pages in this context). The zero pages are detected at the time the
>> guest reports them to QEMU. A page won't be reported if it is non-zero
>> (i.e. in use).
> Sorry, I must not have explained myself well.  Let's assume the page
> hint is used.  I meant this:
>
> - start precopy, page P is non-zero (let's say, page has content P1,
>    which is non-zero)
> - we send page P with content P1 on src, then latest destination cache
>    of page P is P1
> - page P is freed by the guest, then it becomes zero, dirty bitmap of
>    P is set since it's changed (from P1 to zeroed page)

The page doesn't become 0 by itself when it stays on the free page list.
Probably the above referred to this:
#1 memset(pageP, 0, PAGE_SIZE);
#2 kfree(pageP);

#1 causes the page to be tracked in the bitmap, and #2 may cause the
page to be cleared from the bitmap. This is no different from the
general case: a page is used, written with any value, and then freed.
Essentially, this leads to the question asked in another thread: does 
the data in free pages matter?

As far as I know, Linux treats values in free pages as garbage. People
don't rely on values from free pages. It is similar to the case of using
an uninitialized variable: the compiler emits a warning, because relying
on its value is not correct behavior.

Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-06-01 10:06       ` Peter Xu
@ 2018-06-01 12:32           ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-01 12:32 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On 06/01/2018 06:06 PM, Peter Xu wrote:
> On Fri, Jun 01, 2018 at 03:36:01PM +0800, Wei Wang wrote:
>> On 06/01/2018 12:00 PM, Peter Xu wrote:
>>> On Tue, Apr 24, 2018 at 02:13:46PM +0800, Wei Wang wrote:
>>>>    /*
>>>> + * This function clears bits of the free pages reported by the caller from the
>>>> + * migration dirty bitmap. @addr is the host address corresponding to the
>>>> + * start of the continuous guest free pages, and @len is the total bytes of
>>>> + * those pages.
>>>> + */
>>>> +void qemu_guest_free_page_hint(void *addr, size_t len)
>>>> +{
>>>> +    RAMBlock *block;
>>>> +    ram_addr_t offset;
>>>> +    size_t used_len, start, npages;
>>> Do we need to check here on whether a migration is in progress?  Since
>>> if not I'm not sure whether this hint still makes any sense any more,
>>> and more importantly it seems to me that block->bmap below at [1] is
>>> only valid during a migration.  So I'm not sure whether QEMU will
>>> crash if this function is called without a running migration.
>> OK. How about just adding comments above to have users noted that this
>> function should be used during migration?
>>
>> If we want to do a sanity check here, I think it would be easier to just
>> check !block->bmap here.
> I think the faster way might be that we check against the migration
> state.
>

Sounds good. We can do a sanity check:

     MigrationState *s = migrate_get_current();

     if (!migration_is_setup_or_active(s->state)) {
         return;
     }



>>
>>>> +
>>>> +    for (; len > 0; len -= used_len) {
>>>> +        block = qemu_ram_block_from_host(addr, false, &offset);
>>>> +        if (unlikely(!block)) {
>>>> +            return;
>>> We should never reach here, should we?  Assuming the callers of this
>>> function should always pass in a correct host address. If we are very
>>> sure that the host addr should be valid, could we just assert?
>> Probably not the case, because of the corner case that the memory would be
>> hot unplugged after the free page is reported to QEMU.
> Question: Do we allow to do hot plug/unplug for memory during
> migration?

I think so. From the code, I don't see anywhere that forbids memory hotplug 
during migration.

>>
>>
>>>> +        }
>>>> +
>>>> +        /*
>>>> +         * This handles the case that the RAMBlock is resized after the free
>>>> +         * page hint is reported.
>>>> +         */
>>>> +        if (unlikely(offset > block->used_length)) {
>>>> +            return;
>>>> +        }
>>>> +
>>>> +        if (len <= block->used_length - offset) {
>>>> +            used_len = len;
>>>> +        } else {
>>>> +            used_len = block->used_length - offset;
>>>> +            addr += used_len;
>>>> +        }
>>>> +
>>>> +        start = offset >> TARGET_PAGE_BITS;
>>>> +        npages = used_len >> TARGET_PAGE_BITS;
>>>> +
>>>> +        qemu_mutex_lock(&ram_state->bitmap_mutex);
>>> So now I think I understand the lock can still be meaningful since
>>> this function now can be called outside the migration thread (e.g., in
>>> vcpu thread).  But still it would be nice to mention it somewhere on
> (Actually after read the next patch I think it's in iothread, so I'd
>   better reply with all the series read over next time :)

That's fine actually :) Whether it is called by an iothread or a vcpu 
thread doesn't affect our discussion here.

I think we could just focus on the interfaces here and the usage in live 
migration. I can explain more when needed.

Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-01 10:40     ` Peter Xu
@ 2018-06-01 15:33       ` Dr. David Alan Gilbert
  2018-06-05  6:42         ` Peter Xu
  2018-06-05 14:39           ` [virtio-dev] " Michael S. Tsirkin
  0 siblings, 2 replies; 93+ messages in thread
From: Dr. David Alan Gilbert @ 2018-06-01 15:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: Wei Wang, qemu-devel, virtio-dev, mst, quintela, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Jun 01, 2018 at 03:21:54PM +0800, Wei Wang wrote:
> > On 06/01/2018 12:58 PM, Peter Xu wrote:
> > > On Tue, Apr 24, 2018 at 02:13:43PM +0800, Wei Wang wrote:
> > > > This is the device part implementation to add a new feature,
> > > > VIRTIO_BALLOON_F_FREE_PAGE_HINT to the virtio-balloon device. The device
> > > > receives the guest free page hints from the driver and clears the
> > > > corresponding bits in the dirty bitmap, so that those free pages are
> > > > not transferred by the migration thread to the destination.
> > > > 
> > > > - Test Environment
> > > >      Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > > >      Guest: 8G RAM, 4 vCPU
> > > >      Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> > > > 
> > > > - Test Results
> > > >      - Idle Guest Live Migration Time (results are averaged over 10 runs):
> > > >          - Optimization v.s. Legacy = 271ms vs 1769ms --> ~86% reduction
> > > >      - Guest with Linux Compilation Workload (make bzImage -j4):
> > > >          - Live Migration Time (average)
> > > >            Optimization v.s. Legacy = 1265ms v.s. 2634ms --> ~51% reduction
> > > >          - Linux Compilation Time
> > > >            Optimization v.s. Legacy = 4min56s v.s. 5min3s
> > > >            --> no obvious difference
> > > > 
> > > > - Source Code
> > > >      - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
> > > >      - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git
> > > Hi, Wei,
> > > 
> > > I have a very high-level question to the series.
> > 
> > Hi Peter,
> > 
> > Thanks for joining the discussion :)
> 
> Thanks for letting me know this thread.  It's an interesting idea. :)
> 
> > 
> > > 
> > > IIUC the core idea for this series is that we can avoid sending some
> > > of the pages if we know that we don't need to send them.  I think this
> > > is based on the fact that on the destination side all the pages are by
> > > default zero after they are malloced.  While before this series, IIUC
> > > any migration will send every single page to destination, no matter
> > > whether it's zeroed or not.  So I'm uncertain about whether this will
> > > affect the received bitmap on the destination side.  Say, before this
> > > series, the received bitmap will directly cover the whole RAM bitmap
> > > after migration is finished, now it's won't.  Will there be any side
> > > effect?  I don't see obvious issue now, but just raise this question
> > > up.
> > 
> > This feature currently only supports pre-copy (I think the received bitmap
> > is something that matters to post-copy only).
> > That's why we have
> > rs->free_page_support = ..&& !migrate_postcopy();
> 
> Okay.
> 
> > 
> > > Meanwhile, this reminds me about a more funny idea: whether we can
> > > just avoid sending the zero pages directly from QEMU's perspective.
> > > In other words, can we just do nothing if save_zero_page() detected
> > > that the page is zero (I guess the is_zero_range() can be fast too,
> > > but I don't know exactly how fast it is)?  And how that would be
> > > differed from this page hinting way in either performance and other
> > > aspects.
> > 
> > I guess you referred to the zero page optimization. I think the major
> > overhead comes to the zero page checking - lots of memory accesses, which
> > also waste memory bandwidth. Please see the results attached in the cover
> > letter. The legacy case already includes the zero page optimization.
> 
> I replied in the other thread.  We can discuss there altogether.
> 
> Actually after a second thought I think maybe what I worried there is
> exactly the reason why we must send the zero page flag - otherwise
> there can be a stale non-zero page on the destination.  Here "zero page" and
> "freed page" are totally different ideas, since even if a page is zeroed
> it might still be in use (not freed)!  While instead for a "free page"
> even if it's non-zero we might be able to not send it at all, though I
> am not sure whether that mismatch of data might cause any side effect
> too. I think the corresponding question would be: if a page is freed
> in Linux kernel, would its data matter any more?

I think the answer is no - it doesn't matter; by telling the hypervisor
the page is 'free' the kernel gives the hypervisor the freedom to
discard the page contents.
Now, that means trusting the kernel to get its 'free' flags right,
and we wouldn't want a malicious guest kernel to be able to read random
data, so we have to be a little careful that what actually lands
in there is something the guest has had at some point - or zero,
which is a very nice empty value.

Dave

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-06-01 12:32           ` [virtio-dev] " Wei Wang
  (?)
@ 2018-06-04  2:49           ` Peter Xu
  2018-06-04  7:43               ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-04  2:49 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 01, 2018 at 08:32:27PM +0800, Wei Wang wrote:
> On 06/01/2018 06:06 PM, Peter Xu wrote:
> > On Fri, Jun 01, 2018 at 03:36:01PM +0800, Wei Wang wrote:
> > > On 06/01/2018 12:00 PM, Peter Xu wrote:
> > > > On Tue, Apr 24, 2018 at 02:13:46PM +0800, Wei Wang wrote:
> > > > >    /*
> > > > > + * This function clears bits of the free pages reported by the caller from the
> > > > > + * migration dirty bitmap. @addr is the host address corresponding to the
> > > > > + * start of the continuous guest free pages, and @len is the total bytes of
> > > > > + * those pages.
> > > > > + */
> > > > > +void qemu_guest_free_page_hint(void *addr, size_t len)
> > > > > +{
> > > > > +    RAMBlock *block;
> > > > > +    ram_addr_t offset;
> > > > > +    size_t used_len, start, npages;
> > > > Do we need to check here on whether a migration is in progress?  Since
> > > > if not I'm not sure whether this hint still makes any sense any more,
> > > > and more importantly it seems to me that block->bmap below at [1] is
> > > > only valid during a migration.  So I'm not sure whether QEMU will
> > > > crash if this function is called without a running migration.
> > > OK. How about just adding comments above to have users noted that this
> > > function should be used during migration?
> > > 
> > > If we want to do a sanity check here, I think it would be easier to just
> > > check !block->bmap here.
> > I think the faster way might be that we check against the migration
> > state.
> > 
> 
> Sounds good. We can do a sanity check:
> 
>     MigrationState *s = migrate_get_current();
> 
>     if (!migration_is_setup_or_active(s->state)) {
>         return;
>     }

Yes.

> 
> 
> 
> > > 
> > > > > +
> > > > > +    for (; len > 0; len -= used_len) {
> > > > > +        block = qemu_ram_block_from_host(addr, false, &offset);
> > > > > +        if (unlikely(!block)) {
> > > > > +            return;
> > > > We should never reach here, should we?  Assuming the callers of this
> > > > function should always pass in a correct host address. If we are very
> > > > sure that the host addr should be valid, could we just assert?
> > > Probably not the case, because of the corner case that the memory would be
> > > hot unplugged after the free page is reported to QEMU.
> > Question: Do we allow to do hot plug/unplug for memory during
> > migration?
> 
> I think so. From the code, I don't find where it forbids memory hotplug
> during migration.

I don't play with that much; do we need to do "device_add" after all?

  (qemu) object_add memory-backend-file,id=mem1,size=1G,mem-path=/mnt/hugepages-1GB
  (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

If so, we may not allow that since in qdev_device_add() we don't allow
that:

    if (!migration_is_idle()) {
        error_setg(errp, "device_add not allowed while migrating");
        return NULL;
    }

> 
> > > 
> > > 
> > > > > +        }
> > > > > +
> > > > > +        /*
> > > > > +         * This handles the case that the RAMBlock is resized after the free
> > > > > +         * page hint is reported.
> > > > > +         */
> > > > > +        if (unlikely(offset > block->used_length)) {
> > > > > +            return;
> > > > > +        }
> > > > > +
> > > > > +        if (len <= block->used_length - offset) {
> > > > > +            used_len = len;
> > > > > +        } else {
> > > > > +            used_len = block->used_length - offset;
> > > > > +            addr += used_len;
> > > > > +        }
> > > > > +
> > > > > +        start = offset >> TARGET_PAGE_BITS;
> > > > > +        npages = used_len >> TARGET_PAGE_BITS;
> > > > > +
> > > > > +        qemu_mutex_lock(&ram_state->bitmap_mutex);
> > > > So now I think I understand the lock can still be meaningful since
> > > > this function now can be called outside the migration thread (e.g., in
> > > > vcpu thread).  But still it would be nice to mention it somewhere on
> > (Actually after read the next patch I think it's in iothread, so I'd
> >   better reply with all the series read over next time :)
> 
> That's fine actually :) Whether it is called by an iothread or a vcpu thread
> doesn't affect our discussion here.
> 
> I think we could just focus on the interfaces here and the usage in live
> migration. I can explain more when needed.

Ok.  Thanks!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-06-04  2:49           ` Peter Xu
@ 2018-06-04  7:43               ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-04  7:43 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On 06/04/2018 10:49 AM, Peter Xu wrote:
>
>>
>>
>>>>>> +
>>>>>> +    for (; len > 0; len -= used_len) {
>>>>>> +        block = qemu_ram_block_from_host(addr, false, &offset);
>>>>>> +        if (unlikely(!block)) {
>>>>>> +            return;
>>>>> We should never reach here, should we?  Assuming the callers of this
>>>>> function should always pass in a correct host address. If we are very
>>>>> sure that the host addr should be valid, could we just assert?
>>>> Probably not the case, because of the corner case that the memory would be
>>>> hot unplugged after the free page is reported to QEMU.
>>> Question: Do we allow to do hot plug/unplug for memory during
>>> migration?
>> I think so. From the code, I don't find where it forbids memory hotplug
>> during migration.
> I don't play with that much; do we need to do "device_add" after all?
>
>    (qemu) object_add memory-backend-file,id=mem1,size=1G,mem-path=/mnt/hugepages-1GB
>    (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
>
> If so, we may not allow that since in qdev_device_add() we don't allow
> that:
>
>      if (!migration_is_idle()) {
>          error_setg(errp, "device_add not allowed while migrating");
>          return NULL;
>      }
>

OK, I missed that part, and thanks for correcting it. I'll use an assert 
there if no objections from others.


Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-05-30 12:47         ` [virtio-dev] " Michael S. Tsirkin
@ 2018-06-04  8:04           ` Wei Wang
  -1 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-04  8:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel,
	zhang.zhanghailiang, Peter Xu

On 05/30/2018 08:47 PM, Michael S. Tsirkin wrote:
> On Wed, May 30, 2018 at 05:12:09PM +0800, Wei Wang wrote:
>> On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
>>> On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
>>>> +/*
>>>> + * Balloon will report pages which were free at the time of this call. As the
>>>> + * reporting happens asynchronously, dirty bit logging must be enabled before
>>>> + * this call is made.
>>>> + */
>>>> +void balloon_free_page_start(void)
>>>> +{
>>>> +    balloon_free_page_start_fn(balloon_opaque);
>>>> +}
>>> Please create notifier support, not a single global.
>> OK. The start is called at the end of bitmap_sync, and the stop is called at
>> the beginning of bitmap_sync. In this case, we will need to add two
>> migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and
>> MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?

Peter, do you have any thought about this?

Currently, the free page optimization isn't limited to the first stage; 
it is used in every stage. A global call to start the free page 
optimization is made after each bitmap sync, and another global call 
stops the optimization before the next bitmap sync. Using global calls 
keeps this simple.

If we change the implementation to use notifiers, I think we will need 
to add the two new MigrationStatus values above. Do you think that would 
be worthwhile for some reason?

Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-01 15:33       ` Dr. David Alan Gilbert
@ 2018-06-05  6:42         ` Peter Xu
  2018-06-05 14:40             ` [virtio-dev] " Michael S. Tsirkin
  2018-06-05 14:39           ` [virtio-dev] " Michael S. Tsirkin
  1 sibling, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-05  6:42 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Wei Wang, qemu-devel, virtio-dev, mst, quintela, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 01, 2018 at 04:33:29PM +0100, Dr. David Alan Gilbert wrote:

[...]

> > > > Meanwhile, this reminds me about a more funny idea: whether we can
> > > > just avoid sending the zero pages directly from QEMU's perspective.
> > > > In other words, can we just do nothing if save_zero_page() detected
> > > > that the page is zero (I guess the is_zero_range() can be fast too,
> > > > but I don't know exactly how fast it is)?  And how that would be
> > > > differed from this page hinting way in either performance and other
> > > > aspects.
> > > 
> > > I guess you referred to the zero page optimization. I think the major
> > > overhead comes to the zero page checking - lots of memory accesses, which
> > > also waste memory bandwidth. Please see the results attached in the cover
> > > letter. The legacy case already includes the zero page optimization.
> > 
> > I replied in the other thread.  We can discuss there altogether.
> > 
> > Actually after a second thought I think maybe what I worried there is
> > exactly the reason why we must send the zero page flag - otherwise
> > there can be stale non-zero page on destination.  Here "zero page" and
> > "freed page" is totally different idea since even if a page is zeroed
> > it might still be in use (not freed)!  While instead for a "free page"
> > even if it's non-zero we might be able to not send it at all, though I
> > am not sure whether that mismatch of data might cause any side effect
> > too. I think the corresponding question would be: if a page is freed
> > in Linux kernel, would its data matter any more?
> 
> I think the answer is no - it doesn't matter; by telling the hypervisor
> the page is 'free' the kernel gives freedom to the hypervisor to
> discard the page contents.

Yeah, it seems so.  I just read over the whole work, so I think there
is future work for the poisoned bits.  If that's the only usage that
might make the content of a freed page meaningful, then it seems fine
to me.  After all I don't know much about that...  However this still
seems to be a bit tricky, e.g., we need to be very careful on the
guest OS side (when writing up the balloon driver for one guest OS)
to make sure of that; otherwise it'll be very easy to break a guest
when something similar is enabled without our notice, just like the
poisoned feature.

> Now, that is trusting the kernel to get its 'free' flags right,
> and we wouldn't want a malicious guest kernel to be able to read random
> data, so we have to be a little careful that what actually lands
> in there is something the guest has had at some point - or zero
> which is a very nice empty value.

Yeah I agree - basically this feature brings more trouble from the
security POV, but I don't know whether that can be a problem since
after all we can disable this when we care very much about security.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-04  8:04           ` [virtio-dev] " Wei Wang
  (?)
@ 2018-06-05  6:58           ` Peter Xu
  2018-06-05 13:22               ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-05  6:58 UTC (permalink / raw)
  To: Wei Wang
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On Mon, Jun 04, 2018 at 04:04:51PM +0800, Wei Wang wrote:
> On 05/30/2018 08:47 PM, Michael S. Tsirkin wrote:
> > On Wed, May 30, 2018 at 05:12:09PM +0800, Wei Wang wrote:
> > > On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
> > > > > +/*
> > > > > + * Balloon will report pages which were free at the time of this call. As the
> > > > > + * reporting happens asynchronously, dirty bit logging must be enabled before
> > > > > + * this call is made.
> > > > > + */
> > > > > +void balloon_free_page_start(void)
> > > > > +{
> > > > > +    balloon_free_page_start_fn(balloon_opaque);
> > > > > +}
> > > > Please create notifier support, not a single global.
> > > OK. The start is called at the end of bitmap_sync, and the stop is called at
> > > the beginning of bitmap_sync. In this case, we will need to add two
> > > migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and
> > > MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?
> 
> Peter, do you have any thought about this?
> 
> Currently, the usage of free page optimization isn't limited to the first
> stage. It is used in each stage. A global call to start the free page
> optimization is made after bitmap sync, and another global call to stop the
> optimization is made before bitmap sync. It is simple to just use global
> calls.
> 
> If we change the implementation to use notifiers, I think we will need to
> add two new MigrationStatus as above. Would you think that is worthwhile for
> some reason?

I'm a bit confused.  Could you elaborate why we need those extra
states?

Or, to ask a more general question - could you elaborate a bit on how
you order these operations?  I would be really glad if you can point
me to some documents for the feature.  Is there any latest virtio
document that I can refer to (or old cover letter links)?  It'll be
good if the document could mention about things like:

- why we need this feature? Is that purely for migration purpose?  Or
  it can be used somewhere else too?
- high level stuff about how this is implemented, e.g.:
  - the protocol of the new virtio queues
  - how we should get the free page hints (please see below)

For now, what I see is that we do:

(1) stop hinting
(2) sync bitmap
(3) start hinting

Why this order?  My understanding is that obviously there is a race
between the page hinting thread and the dirty bitmap tracking part
(which is done in KVM).  How do we make sure there is no race?

A direct question is that, do we need to make sure step (1) must be
before step (2)?  Asked since I see that currently step (1) is an
async operation (taking a lock, set status, then return).  Then would
such an async operation satisfy any ordering requirement after all?

Btw, I would appreciate if you can push your new trees (both QEMU and
kernel) to the links you mentioned in the cover letter - I noticed
that they are not the same as what you have posted on the list.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-05  6:58           ` [Qemu-devel] " Peter Xu
@ 2018-06-05 13:22               ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-05 13:22 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On 06/05/2018 02:58 PM, Peter Xu wrote:
> On Mon, Jun 04, 2018 at 04:04:51PM +0800, Wei Wang wrote:
>> On 05/30/2018 08:47 PM, Michael S. Tsirkin wrote:
>>> On Wed, May 30, 2018 at 05:12:09PM +0800, Wei Wang wrote:
>>>> On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
>>>>>> +/*
>>>>>> + * Balloon will report pages which were free at the time of this call. As the
>>>>>> + * reporting happens asynchronously, dirty bit logging must be enabled before
>>>>>> + * this call is made.
>>>>>> + */
>>>>>> +void balloon_free_page_start(void)
>>>>>> +{
>>>>>> +    balloon_free_page_start_fn(balloon_opaque);
>>>>>> +}
>>>>> Please create notifier support, not a single global.
>>>> OK. The start is called at the end of bitmap_sync, and the stop is called at
>>>> the beginning of bitmap_sync. In this case, we will need to add two
>>>> migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and
>>>> MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?
>> Peter, do you have any thought about this?
>>
>> Currently, the usage of free page optimization isn't limited to the first
>> stage. It is used in each stage. A global call to start the free page
>> optimization is made after bitmap sync, and another global call to stop the
>> optimization is made before bitmap sync. It is simple to just use global
>> calls.
>>
>> If we change the implementation to use notifiers, I think we will need to
>> add two new MigrationStatus as above. Would you think that is worthwhile for
>> some reason?
> I'm a bit confused.  Could you elaborate why we need those extra
> states?

Sure. Notifiers are used when an event happens. In this case, the 
event would be a state change, which invokes the state change callback. 
So I think we probably need to add 2 new states for the start and stop 
callbacks.


> Or, to ask a more general question - could you elaborate a bit on how
> you order these operations?  I would be really glad if you can point
> me to some documents for the feature.  Is there any latest virtio
> document that I can refer to (or old cover letter links)?  It'll be
> good if the document could mention about things like:

I haven't made documents to explain it yet. It's planned to be ready 
after this code series is done. But I'm glad to answer the questions below.


>
> - why we need this feature? Is that purely for migration purpose?  Or
>    it can be used somewhere else too?

Yes. Migration is the one that currently benefits a lot from this 
feature. I haven't thought of others so far. It is common that new 
features start with just 1 or 2 typical use cases.


> - high level stuff about how this is implemented, e.g.:
>    - the protocol of the new virtio queues
>    - how we should get the free page hints (please see below)

The high-level introduction would be:
1. the host sends a start cmd id to the guest;
2. the guest starts a new round of reporting by sending cmd_id + free 
page hints to the host;
3. the QEMU side optimization code applies the free page hints (filters 
them from the dirty bitmap) only when the reported cmd id matches the 
one that was just sent.
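The cmd-id matching above can be sketched as a tiny toy model (all names and the 64-page bitmap are invented for illustration; the real logic lives in QEMU's virtio-balloon device and the migration code):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

/* Toy model: 64 guest pages tracked in one word. */
static uint64_t dirty_bitmap;   /* 1 bit per page, 1 = must be sent   */
static uint32_t host_cmd_id;    /* id sent with the latest start cmd  */

/* Apply one reported hint: clear the page's dirty bit only when the
 * report carries the current round's cmd id; anything else is a stale
 * hint from an earlier round and is dropped. */
static bool apply_hint(uint32_t reported_cmd_id, unsigned page)
{
    if (reported_cmd_id != host_cmd_id) {
        return false;                  /* stale hint: ignore */
    }
    dirty_bitmap &= ~(1ULL << page);   /* free page: do not transfer */
    return true;
}
```

A hint that arrives with an old cmd id is simply ignored, which is what makes late guest reports harmless.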

The protocol was suggested by Michael and was thoroughly discussed 
while upstreaming the kernel part. It might not be necessary to go over 
that again :)
I would suggest focusing on the supplied interface and its usage in live 
migration. That is, we now have two APIs, start() and stop(), to start 
and stop the optimization.

1) where in the migration code should we use them (do you agree with the 
step (1), (2), (3) you concluded below?)
2) how should we use them, directly do global call or via notifiers?

>
> For now, what I see is that we do:
>
> (1) stop hinting
> (2) sync bitmap
> (3) start hinting
>
> Why this order?

We start to filter out free pages from the dirty bitmap only when all 
the dirty bits are ready there, i.e. after the bitmap sync. To some 
degree, synchronizing the bitmap marks the end of the last round and 
the beginning of the new one, so we stop the free page optimization 
for the old round when that round ends.
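The per-round ordering can be sketched with illustrative stand-ins (these are not QEMU's actual function names; each stub just records its position so the ordering can be checked):

```c
#include <assert.h>

/* Recorders standing in for the real calls. */
static char order[4];
static int n_calls;

static void free_page_stop(void)    { order[n_calls++] = '1'; }
static void sync_dirty_bitmap(void) { order[n_calls++] = '2'; }
static void free_page_start(void)   { order[n_calls++] = '3'; }

/* One bitmap-sync round with the ordering described above: stop the
 * old round's hinting before the sync, and start a fresh round only
 * once all dirty bits are in place to be filtered. */
static void migration_bitmap_sync_round(void)
{
    free_page_stop();      /* (1) end the old round  */
    sync_dirty_bitmap();   /* (2) pull dirty bits    */
    free_page_start();     /* (3) begin a new round  */
}
```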


>   My understanding is that obviously there is a race
> between the page hinting thread and the dirty bitmap tracking part
> (which is done in KVM).  How do we make sure there is no race?

Could you please explain more about the race you saw? (free page is 
reported from the guest, and the bitmap is tracked in KVM)



>
> A direct question is that, do we need to make sure step (1) must be
> before step (2)?  Asked since I see that currently step (1) is an
> async operation (taking a lock, set status, then return).  Then would
> such an async operation satisfy any ordering requirement after all?

Yes. Step (1) guarantees us that the QEMU side optimization call has 
exited (we don't need to rely on a guest side ACK because the guest 
could be in any state). This is enough. If the guest continues to 
report after that, those reported hints will be detected as stale 
hints and dropped at the next start of the optimization.


>
> Btw, I would appreciate if you can push your new trees (both QEMU and
> kernel) to the links you mentioned in the cover letter - I noticed
> that they are not the same as what you have posted on the list.
>

Sure.
For kernel part, you can get it from linux-next: 
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
For the v7 QEMU part: 
git://github.com/wei-w-wang/qemu-free-page-hint.git (my connection to 
github is too slow; it should be ready in 24 hours. I can also send you 
the raw patches via email if you need them.)


Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-01 15:33       ` Dr. David Alan Gilbert
@ 2018-06-05 14:39           ` Michael S. Tsirkin
  2018-06-05 14:39           ` [virtio-dev] " Michael S. Tsirkin
  1 sibling, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-06-05 14:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Peter Xu, Wei Wang, qemu-devel, virtio-dev, quintela,
	yang.zhang.wz, quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 01, 2018 at 04:33:29PM +0100, Dr. David Alan Gilbert wrote:
> I think the answer is no - it doesn't matter; by telling the hypervisor
> the page is 'free' the kernel gives freedom to the hypervisor to
> discard the page contents.

I'd like to call attention to this since it's easy to get confused.

That's not exactly true in the current interface.

It's a *hint* not a guarantee.

Let me explain.

It all starts with a request from the hypervisor, and each free page
report is matched to a request.  What the report says is that the page
was free *sometime after the request was sent to the guest*.  If the
hypervisor was tracking changes to the page the whole time since
before sending the request, it can conclude that the page was free and
can discard the contents.  If it wasn't, then it can't be sure and
cannot discard the page; it can maybe use the hint for other decisions
(e.g. unused => should be sent before other pages).
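That discard rule condenses into a small predicate (hypothetical names; this is a sketch of the reasoning, not QEMU code): a report only proves the page was free at some point after the request went out, so it licenses a discard only if write tracking has covered the whole window since before the request and no write was seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* tracking_since / request_sent are abstract timestamps. */
static bool may_discard(uint64_t tracking_since, uint64_t request_sent,
                        bool dirtied_since_tracking)
{
    if (tracking_since > request_sent) {
        return false;   /* tracking gap: treat the hint as advisory */
    }
    /* Fully covered window: safe to discard iff no write was logged. */
    return !dirtied_since_tracking;
}
```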

-- 
MST

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support
  2018-06-05  6:42         ` Peter Xu
@ 2018-06-05 14:40             ` Michael S. Tsirkin
  0 siblings, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-06-05 14:40 UTC (permalink / raw)
  To: Peter Xu
  Cc: Dr. David Alan Gilbert, Wei Wang, qemu-devel, virtio-dev,
	quintela, yang.zhang.wz, quan.xu0, liliang.opensource, pbonzini,
	nilal

On Tue, Jun 05, 2018 at 02:42:40PM +0800, Peter Xu wrote:
> > I think the answer is no - it doesn't matter; by telling the hypervisor
> > the page is 'free' the kernel gives freedom to the hypervisor to
> > discard the page contents.
> 
> Yeh it seems so.

Well not exactly.  I replied to parent with a clarification.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-05 13:22               ` [virtio-dev] " Wei Wang
  (?)
@ 2018-06-06  5:42               ` Peter Xu
  2018-06-06 10:04                   ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-06  5:42 UTC (permalink / raw)
  To: Wei Wang
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On Tue, Jun 05, 2018 at 09:22:51PM +0800, Wei Wang wrote:
> On 06/05/2018 02:58 PM, Peter Xu wrote:
> > On Mon, Jun 04, 2018 at 04:04:51PM +0800, Wei Wang wrote:
> > > On 05/30/2018 08:47 PM, Michael S. Tsirkin wrote:
> > > > On Wed, May 30, 2018 at 05:12:09PM +0800, Wei Wang wrote:
> > > > > On 05/29/2018 11:24 PM, Michael S. Tsirkin wrote:
> > > > > > On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
> > > > > > > +/*
> > > > > > > + * Balloon will report pages which were free at the time of this call. As the
> > > > > > > + * reporting happens asynchronously, dirty bit logging must be enabled before
> > > > > > > + * this call is made.
> > > > > > > + */
> > > > > > > +void balloon_free_page_start(void)
> > > > > > > +{
> > > > > > > +    balloon_free_page_start_fn(balloon_opaque);
> > > > > > > +}
> > > > > > Please create notifier support, not a single global.
> > > > > OK. The start is called at the end of bitmap_sync, and the stop is called at
> > > > > the beginning of bitmap_sync. In this case, we will need to add two
> > > > > migration states, MIGRATION_STATUS_BEFORE_BITMAP_SYNC and
> > > > > MIGRATION_STATUS_AFTER_BITMAP_SYNC, right?
> > > Peter, do you have any thought about this?
> > > 
> > > Currently, the usage of free page optimization isn't limited to the first
> > > stage. It is used in each stage. A global call to start the free page
> > > optimization is made after bitmap sync, and another global call to stop the
> > > optimization is made before bitmap sync. It is simple to just use global
> > > calls.
> > > 
> > > If we change the implementation to use notifiers, I think we will need to
> > > add two new MigrationStatus as above. Would you think that is worthwhile for
> > > some reason?
> > I'm a bit confused.  Could you elaborate why we need those extra
> > states?
> 
> Sure. Notifiers are used when an event happens. In this case, it would be a
> state change, which invokes the state change callback. So I think we
> probably need to add 2 new states for the start and stop callback.

IMHO migration states do not suit here.  Bitmap syncing is too
frequent an operation, especially at the end of a precopy migration.
If you really want to introduce some notifiers, I would prefer
something new rather than fiddling around with the migration state.
E.g., maybe a new set of migration event notifiers, introducing two
new events for the start/end of bitmap syncing.
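A minimal sketch of such an event-notifier scheme (all names are hypothetical, invented here to illustrate the suggestion; no new MigrationStatus values are needed):

```c
#include <assert.h>
#include <stddef.h>

enum mig_event { BITMAP_SYNC_START, BITMAP_SYNC_END };

typedef void (*mig_notify_fn)(enum mig_event ev, void *opaque);

#define MAX_NOTIFIERS 4
static struct { mig_notify_fn fn; void *opaque; } notifiers[MAX_NOTIFIERS];
static int n_notifiers;

/* Subscribers register a callback once... */
static void mig_add_notifier(mig_notify_fn fn, void *opaque)
{
    notifiers[n_notifiers].fn = fn;
    notifiers[n_notifiers].opaque = opaque;
    n_notifiers++;
}

/* ...and migration fires events around each bitmap sync. */
static void mig_notify(enum mig_event ev)
{
    for (int i = 0; i < n_notifiers; i++) {
        notifiers[i].fn(ev, notifiers[i].opaque);
    }
}

/* Example subscriber: a balloon device would stop hinting on
 * BITMAP_SYNC_START and restart it on BITMAP_SYNC_END. */
static enum mig_event seen[8];
static int n_seen;

static void balloon_cb(enum mig_event ev, void *opaque)
{
    (void)opaque;
    seen[n_seen++] = ev;
}
```

This keeps the balloon device decoupled from migration internals: migration publishes events, and any interested device subscribes.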

> 
> 
> > Or, to ask a more general question - could you elaborate a bit on how
> > you order these operations?  I would be really glad if you can point
> > me to some documents for the feature.  Is there any latest virtio
> > document that I can refer to (or old cover letter links)?  It'll be
> > good if the document could mention about things like:
> 
> I haven't made documents to explain it yet. It's planned to be ready after
> this code series is done. But I'm glad to answer the questions below.

Ok, thanks.  If we are very sure we'll have a document, IMHO it'll be
very nice, at least for reviewers, to have the document as soon as
prototyping is finished... But it's okay.

> 
> 
> > 
> > - why we need this feature? Is that purely for migration purpose?  Or
> >    it can be used somewhere else too?
> 
> Yes. Migration is the one that currently benefits a lot from this feature. I
> haven't thought of others so far. It is common that new features start with
> just 1 or 2 typical use cases.

Yes, it was a pure question actually; this is okay to me.

> 
> 
> > - high level stuff about how this is implemented, e.g.:
> >    - the protocol of the new virtio queues
> >    - how we should get the free page hints (please see below)
> 
> The high-level introduction would be
> 1. host sends a start cmd id to the guest;
> 2. the guest starts a new round of reporting by sending a cmd_id+free page
> hints to host;
> 3. QEMU side optimization code applies the free page hints (filter them from
> the dirty bitmap) only when the reported cmd id matches the one that was
> just sent.
> 
> The protocol was suggested by Michael and has been thoroughly discussed when
> upstreaming the kernel part. It might not be necessary to go over that again
> :)

I don't mean we should go back to review the content again; I mean we
still need some knowledge of the details.  Since there is no document
yet that properly defines the interface between the migration code and
the balloon API, IMHO it's still useful even for a reviewer from the
migration POV to fully understand what's behind it, especially since
this is quite low-level stuff playing around with guest pages, and it
contains some tricky points and potential cross-over with e.g. dirty
page tracking.

> I would suggest to focus on the supplied interface and its usage in live
> migration. That is, now we have two APIs, start() and stop(), to start and
> stop the optimization.
> 
> 1) where in the migration code should we use them (do you agree with the
> step (1), (2), (3) you concluded below?)
> 2) how should we use them, directly do global call or via notifiers?

I don't know how Dave and Juan might think; here I tend to agree with
Michael that some notifier framework should be nicer.

> 
> > 
> > For now, what I see is that we do:
> > 
> > (1) stop hinting
> > (2) sync bitmap
> > (3) start hinting
> > 
> > Why this order?
> 
> We start to filter out free pages from the dirty bitmap only when all the
> dirty bits are ready there, i.e. after sync bitmap. To some degree, the
> action of synchronizing bitmap indicates the end of the last round and the
> beginning of the new round, so we stop the free page optimization for the
> old round when the old round ends.

Yeh this looks sane to me.

> 
> 
> >   My understanding is that obviously there is a race
> > between the page hinting thread and the dirty bitmap tracking part
> > (which is done in KVM).  How do we make sure there is no race?
> 
> Could you please explain more about the race you saw? (free page is reported
> from the guest, and the bitmap is tracked in KVM)

It's the one I mentioned below...

> 
> 
> 
> > 
> > A direct question is that, do we need to make sure step (1) must be
> > before step (2)?  Asked since I see that currently step (1) is an
> > async operation (taking a lock, set status, then return).  Then would
> > such an async operation satisfy any ordering requirement after all?
> 
> Yes. Step(1) guarantees us that the QEMU side optimization call has exited
> (we don't need to rely on guest side ACK because the guest could be in any
> state).

This is not that obvious to me.  For now I think it's true, since when
we call stop() we'll take the mutex, meanwhile the mutex is actually
always held by the iothread (in the big loop in
virtio_balloon_poll_free_page_hints) until either:

- it sleeps in qemu_cond_wait() [1], or
- it leaves the big loop [2]

Since I don't see anyone who will set dev->block_iothread to true for
the balloon device, then [1] cannot happen; then I think once stop()
has taken the mutex the thread must have quit the big loop,
which goes to path [2].  I am not sure my understanding is correct,
but in all cases "Step(1) guarantees us that the QEMU side
optimization call has exited" is not obvious to me.  Would you add
some comment to the code (or even improve the code itself somehow) to
help people understand that?

For example, I saw that the old github code has a pthread_join() (in
that old code it was not using iothread at all).  That's a very good
code example so that people can understand that it's a synchronous
operation.

> This is enough. If the guest continues to report after that, that
> reported hints will be detected as stale hints and dropped in the next start
> of optimization.

This is not clear to me too.  Say, stop() only sets the balloon status
to STOP, AFAIU it does not really modify the cmd_id field immediately,
then how will the new coming hints be known as stale hints?

> 
> 
> > 
> > Btw, I would appreciate if you can push your new trees (both QEMU and
> > kernel) to the links you mentioned in the cover letter - I noticed
> > that they are not the same as what you have posted on the list.
> > 
> 
> Sure.
> For kernel part, you can get it from linux-next:
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
> For the v7 QEMU part: git://github.com/wei-w-wang/qemu-free-page-hint.git
> (my connection to github is too slow, it would be ready in 24hours, I can
> also send you the raw patches via email if you need)

No need to post patches; I can read the ones on the list for sure.
It's just a reminder in case you forgot to push the tree when sending
new versions.  Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-04-24  6:13   ` [virtio-dev] " Wei Wang
@ 2018-06-06  6:43   ` Peter Xu
  2018-06-06 10:11       ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-06  6:43 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:

[...]

> +static void virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +    VirtQueueElement *elem;
> +    VirtIOBalloon *dev = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +    VirtQueue *vq = dev->free_page_vq;
> +    uint32_t id;
> +    size_t size;
> +
> +    while (1) {
> +        qemu_mutex_lock(&dev->free_page_lock);
> +        while (dev->block_iothread) {
> +            qemu_cond_wait(&dev->free_page_cond, &dev->free_page_lock);
> +        }
> +
> +        /*
> +         * If the migration thread actively stops the reporting, exit
> +         * immediately.
> +         */
> +        if (dev->free_page_report_status == FREE_PAGE_REPORT_S_STOP) {
> +            qemu_mutex_unlock(&dev->free_page_lock);
> +            break;
> +        }
> +
> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +        if (!elem) {
> +            qemu_mutex_unlock(&dev->free_page_lock);
> +            continue;
> +        }
> +
> +        if (elem->out_num) {
> +            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
> +            virtqueue_push(vq, elem, size);

Silly question: is this sending the same id back to guest?  Why?

> +            g_free(elem);
> +
> +            virtio_tswap32s(vdev, &id);
> +            if (unlikely(size != sizeof(id))) {
> +                virtio_error(vdev, "received an incorrect cmd id");

Forgot to unlock?

Maybe we can just move the lock operations outside:

  mutex_lock();
  while (1) {
    ...
    if (block) {
      qemu_cond_wait();
    }
    ...
    if (skip) {
      continue;
    }
    ...
    if (error) {
      break;
    }
    ...
  }
  mutex_unlock();

> +                break;
> +            }
> +            if (id == dev->free_page_report_cmd_id) {
> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> +            } else {
> +                /*
> +                 * Stop the optimization only when it has started. This
> +                 * avoids a stale stop sign for the previous command.
> +                 */
> +                if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
> +                    dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +                    qemu_mutex_unlock(&dev->free_page_lock);
> +                    break;
> +                }
> +            }
> +        }
> +
> +        if (elem->in_num) {
> +            /* TODO: send the poison value to the destination */
> +            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
> +                !dev->poison_val) {
> +                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
> +                                          elem->in_sg[0].iov_len);
> +            }
> +            virtqueue_push(vq, elem, 0);
> +            g_free(elem);
> +        }
> +        qemu_mutex_unlock(&dev->free_page_lock);
> +    }
> +    virtio_notify(vdev, vq);
> +}

[...]

> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> +    .name = "virtio-balloon-device/free-page-report",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = virtio_balloon_free_page_support,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),

(could we move all the poison-related lines into another patch or
 postpone?  after all we don't support it yet, do we?)

> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_virtio_balloon_device = {
>      .name = "virtio-balloon-device",
>      .version_id = 1,
> @@ -423,30 +572,42 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>          VMSTATE_UINT32(actual, VirtIOBalloon),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription * []) {
> +        &vmstate_virtio_balloon_free_page_report,
> +        NULL
> +    }
>  };
>  
>  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> -    int ret;
>  
>      virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
>                  sizeof(struct virtio_balloon_config));
>  
> -    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> -
> -    if (ret < 0) {
> -        error_setg(errp, "Only one balloon device is supported");
> -        virtio_cleanup(vdev);
> -        return;
> -    }
> -
>      s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> -
> +    if (virtio_has_feature(s->host_features,
> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        s->free_page_report_cmd_id =
> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;

Why explicitly -1?  I thought ID_MIN would be fine too?

> +        if (s->iothread) {
> +            object_ref(OBJECT(s->iothread));
> +            s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
> +                                       virtio_balloon_poll_free_page_hints, s);

Just to mention that now we can create internal iothreads.  Please
have a look at iothread_create().

> +            qemu_mutex_init(&s->free_page_lock);
> +            qemu_cond_init(&s->free_page_cond);
> +            s->block_iothread = false;
> +        } else {
> +            /* Simply disable this feature if the iothread wasn't created. */
> +            s->host_features &= ~(1 << VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> +            virtio_error(vdev, "iothread is missing");
> +        }
> +    }
>      reset_stats(s);
>  }

Regards,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-06  5:42               ` [Qemu-devel] " Peter Xu
@ 2018-06-06 10:04                   ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-06 10:04 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On 06/06/2018 01:42 PM, Peter Xu wrote:
>
> IMHO migration states do not suite here.  IMHO bitmap syncing is too
> frequently an operation, especially at the end of a precopy migration.
> If you really want to introduce some notifiers, I would prefer
> something new rather than fiddling around with migration state.  E.g.,
> maybe a new migration event notifiers, then introduce two new events
> for both start/end of bitmap syncing.

Please see if below aligns to what you meant:

MigrationState {
    ...
+   int ram_save_state;
}

typedef enum RamSaveState {
    RAM_SAVE_BEGIN = 0,
    RAM_SAVE_END = 1,
    RAM_SAVE_MAX = 2
} RamSaveState;

then at steps 1) and 3) that you concluded somewhere below, we change
the state and invoke the callback.


Btw, the migration_state_notifiers list is already there, but seems not
really used (I only found spice-core.c calling
add_migration_state_change_notifier). I thought adding new migration
states could reuse all that we have.
What's your real concern about that? (I'm not sure how defining new
events would make a difference)

>> I would suggest to focus on the supplied interface and its usage in live
>> migration. That is, now we have two APIs, start() and stop(), to start and
>> stop the optimization.
>>
>> 1) where in the migration code should we use them (do you agree with the
>> step (1), (2), (3) you concluded below?)
>> 2) how should we use them, directly do global call or via notifiers?
> I don't know how Dave and Juan might think; here I tend to agree with
> Michael that some notifier framework should be nicer.
>

What would be the advantages of using notifiers here?



> This is not that obvious to me.  For now I think it's true, since when
> we call stop() we'll take the mutex, meanwhile the mutex is actually
> always held by the iothread (in the big loop in
> virtio_balloon_poll_free_page_hints) until either:
>
> - it sleeps in qemu_cond_wait() [1], or
> - it leaves the big loop [2]
>
> Since I don't see anyone who will set dev->block_iothread to true for
> the balloon device, then [1] cannot happen;

There is a case in virtio_balloon_set_status which sets
dev->block_iothread to true.

Did you mean the free_page_lock mutex? It is released at the bottom of
the while() loop in virtio_balloon_poll_free_page_hints; it's actually
released for every hint. That is,

while (1) {
    take the lock;
    process 1 hint from the vq;
    release the lock;
}

>   then I think when stop()
> has taken the mutex then the thread must have quitted the big loop,
> which goes to path [2].  I am not sure my understanding is correct,
> but in all cases "Step(1) guarantees us that the QEMU side
> optimization call has exited" is not obvious to me.  Would you add
> some comment to the code (or even improve the code itself somehow) to
> help people understand that?
>
> For example, I saw that the old github code has a pthread_join() (in
> that old code it was not using iothread at all).  That's a very good
> code example so that people can understand that it's a synchronous
> operations.

>> This is enough. If the guest continues to report after that, that
>> reported hints will be detected as stale hints and dropped in the next start
>> of optimization.
> This is not clear to me too.  Say, stop() only sets the balloon status
> to STOP, AFAIU it does not really modify the cmd_id field immediately,
> then how will the new coming hints be known as stale hints?

Yes, you get that correctly - stop() only sets the status to STOP. On 
the other side, virtio_balloon_poll_free_page_hints will stop when it 
sees the status is STOP. The free_page_lock guarantees that when stop() 
returns, virtio_balloon_poll_free_page_hints will not proceed. When 
virtio_balloon_poll_free_page_hints exits, the incoming hints are not 
processed any more; they just stay in the vq. The next time start() is 
called and virtio_balloon_poll_free_page_hints runs again, it will first 
drop all those stale hints.
I'll see where I could add some comments to explain.


Best,
Wei


* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-06  6:43   ` [Qemu-devel] " Peter Xu
@ 2018-06-06 10:11       ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-06 10:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On 06/06/2018 02:43 PM, Peter Xu wrote:
> On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
>
> [...]
>
> +        if (elem->out_num) {
> +            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
> +            virtqueue_push(vq, elem, size);
> Silly question: is this sending the same id back to guest?  Why?

No. It's just giving back the used buffer.

>
>> +            g_free(elem);
>> +
>> +            virtio_tswap32s(vdev, &id);
>> +            if (unlikely(size != sizeof(id))) {
>> +                virtio_error(vdev, "received an incorrect cmd id");
> Forgot to unlock?
>
> Maybe we can just move the lock operations outside:
>
>    mutex_lock();
>    while (1) {
>      ...
>      if (block) {
>        qemu_cond_wait();
>      }
>      ...
>      if (skip) {
>        continue;
>      }
>      ...
>      if (error) {
>        break;
>      }
>      ...
>    }
>    mutex_unlock();


I got similar comments from Michael, and it will be

while (1) {
    lock;
    func();
    unlock();
}

All the unlocks inside the loop body will be gone.

> [...]
>> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
>> +    .name = "virtio-balloon-device/free-page-report",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .needed = virtio_balloon_free_page_support,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
>> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> (could we move all the poison-related lines into another patch or
>   postpone?  after all we don't support it yet, do we?)
>

We don't support migrating the poison value, but the guest may use it, 
so we actually disable this feature in that case. It's probably good to 
keep the code together to handle that case.


>> +    if (virtio_has_feature(s->host_features,
>> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
>> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
>> +        s->free_page_report_cmd_id =
>> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
> Why explicitly -1?  I thought ID_MIN would be fine too?

Yes, that would also be fine. Since we state that the cmd id will be 
in [MIN, MAX], and we do s->free_page_report_cmd_id++ in start(), 
using VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN here would make the 
range [MIN + 1, MAX].

>
>> +        if (s->iothread) {
>> +            object_ref(OBJECT(s->iothread));
>> +            s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
>> +                                       virtio_balloon_poll_free_page_hints, s);
> Just to mention that now we can create internal iothreads.  Please
> have a look at iothread_create().

Thanks. I noticed that, but I think configuring via the cmd line can let 
people share the iothread with other devices that need it.

Best,
Wei


* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-06 10:04                   ` [virtio-dev] " Wei Wang
@ 2018-06-06 11:02                   ` Peter Xu
  2018-06-07  5:24                       ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-06 11:02 UTC (permalink / raw)
  To: Wei Wang
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On Wed, Jun 06, 2018 at 06:04:23PM +0800, Wei Wang wrote:
> On 06/06/2018 01:42 PM, Peter Xu wrote:
> > 
> > IMHO migration states do not suite here.  IMHO bitmap syncing is too
> > frequently an operation, especially at the end of a precopy migration.
> > If you really want to introduce some notifiers, I would prefer
> > something new rather than fiddling around with migration state.  E.g.,
> > maybe a new migration event notifiers, then introduce two new events
> > for both start/end of bitmap syncing.
> 
> Please see if below aligns to what you meant:
> 
> MigrationState {
> ...
> + int ram_save_state;
> 
> }
> 
> typedef enum RamSaveState {
>     RAM_SAVE_BEGIN = 0,
>     RAM_SAVE_END = 1,
>     RAM_SAVE_MAX = 2
> }
> 
> then at the step 1) and 3) you concluded somewhere below, we change the
> state and invoke the callback.

I mean something like this:

1693c64c27 ("postcopy: Add notifier chain", 2018-03-20)

That was a postcopy-only notifier.  Maybe we can generalize it into a
more common notifier for the migration framework so that we can even
register with non-postcopy events like bitmap syncing?

> 
> 
> Btw, the migration_state_notifiers is already there, but seems not really
> used (I only tracked spice-core.c called
> add_migration_state_change_notifier). I thought adding new migration states
> can reuse all that we have.
> What's your real concern about that? (not sure how defining new events would
> make a difference)

Migration state is exposed via the control path (QMP).  Adding new
states means that QMP clients will see more.  IMO that's not really
anything a QMP client needs to know; instead we can keep it
internal.  That's one reason, from a compatibility POV.

Meanwhile, it's not really a state-thing at all for me.  It looks
much more like a hook or event (start/stop of sync).

> 
> > > I would suggest to focus on the supplied interface and its usage in live
> > > migration. That is, now we have two APIs, start() and stop(), to start and
> > > stop the optimization.
> > > 
> > > 1) where in the migration code should we use them (do you agree with the
> > > step (1), (2), (3) you concluded below?)
> > > 2) how should we use them, directly do global call or via notifiers?
> > I don't know how Dave and Juan might think; here I tend to agree with
> > Michael that some notifier framework should be nicer.
> > 
> 
> What would be the advantages of using notifiers here?

Isolation of modules?  Then migration/ram.c at least won't need to
include something like "balloon.h".

And I think it's also possible too if some other modules would like to
hook at these places someday.

> 
> 
> 
> > This is not that obvious to me.  For now I think it's true, since when
> > we call stop() we'll take the mutex, meanwhile the mutex is actually
> > always held by the iothread (in the big loop in
> > virtio_balloon_poll_free_page_hints) until either:
> > 
> > - it sleeps in qemu_cond_wait() [1], or
> > - it leaves the big loop [2]
> > 
> > Since I don't see anyone who will set dev->block_iothread to true for
> > the balloon device, then [1] cannot happen;
> 
> there is a case in virtio_balloon_set_status which sets dev->block_iothread
> to true.
> 
> Did you mean the free_page_lock mutex? it is released at the bottom of the
> while() loop in virtio_balloon_poll_free_page_hint. It's actually released
> for every hint. That is,
> 
> while(1){
>     take the lock;
>     process 1 hint from the vq;
>     release the lock;
> }

Ah, so now I understand why you need the lock to be inside the loop,
since the loop is actually busy polling.  Is it possible to do this in
an async way?  I'm a bit curious how much time it takes to do
one round of the free page hints (e.g., an idle guest with 8G mem, or
any configuration you tested)?  I suppose during that time the
iothread will be held steady at 100% CPU usage, am I right?

> 
> >   then I think when stop()
> > has taken the mutex then the thread must have quitted the big loop,
> > which goes to path [2].  I am not sure my understanding is correct,
> > but in all cases "Step(1) guarantees us that the QEMU side
> > optimization call has exited" is not obvious to me.  Would you add
> > some comment to the code (or even improve the code itself somehow) to
> > help people understand that?
> > 
> > For example, I saw that the old github code has a pthread_join() (in
> > that old code it was not using iothread at all).  That's a very good
> > code example so that people can understand that it's a synchronous
> > operations.
> 
> > > This is enough. If the guest continues to report after that, that
> > > reported hints will be detected as stale hints and dropped in the next start
> > > of optimization.
> > This is not clear to me too.  Say, stop() only sets the balloon status
> > to STOP, AFAIU it does not really modify the cmd_id field immediately,
> > then how will the new coming hints be known as stale hints?
> 
> Yes, you get that correctly - stop() only sets the status to STOP. On the
> other side, virtio_balloon_poll_free_page_hints will stop when it sees the
> staus is STOP. The free_page_lock guarantees that when stop() returns,
> virtio_balloon_poll_free_page_hints will not proceed. When
> virtio_balloon_poll_free_page_hints exits, the coming hints are not
> processed any more. They just stay in the vq. The next time start() is
> called, virtio_balloon_poll_free_page_hints works again, it will first drop
> all those stale hints.
> I'll see where I could add some comments to explain.

That'll be nice.

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-06 10:11       ` [virtio-dev] " Wei Wang
@ 2018-06-07  3:17       ` Peter Xu
  2018-06-07  5:29           ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-07  3:17 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Wed, Jun 06, 2018 at 06:11:50PM +0800, Wei Wang wrote:
> On 06/06/2018 02:43 PM, Peter Xu wrote:
> > On Tue, Apr 24, 2018 at 02:13:47PM +0800, Wei Wang wrote:
> > 
> > [...]
> > 
> > +        if (elem->out_num) {
> > +            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
> > +            virtqueue_push(vq, elem, size);
> > Silly question: is this sending the same id back to guest?  Why?
> 
> No. It's just giving back the used buffer.

Oops, sorry!

> 
> > 
> > > +            g_free(elem);
> > > +
> > > +            virtio_tswap32s(vdev, &id);
> > > +            if (unlikely(size != sizeof(id))) {
> > > +                virtio_error(vdev, "received an incorrect cmd id");
> > Forgot to unlock?
> > 
> > Maybe we can just move the lock operations outside:
> > 
> >    mutex_lock();
> >    while (1) {
> >      ...
> >      if (block) {
> >        qemu_cond_wait();
> >      }
> >      ...
> >      if (skip) {
> >        continue;
> >      }
> >      ...
> >      if (error) {
> >        break;
> >      }
> >      ...
> >    }
> >    mutex_unlock();
> 
> 
> I got similar comments from Michael, and it will be
> while (1) {
> lock;
> func();
> unlock();
> }
> 
> All the unlock inside the body will be gone.

Ok I think I have more question on this part...

Actually AFAICT this new feature uses the iothread in a way very similar
to the block layer, so I dug a bit into how the block layer uses
iothreads.  I see that the block code is using something like
virtio_queue_aio_set_host_notifier_handler() to hook up the
iothread/aiocontext and the ioeventfd, however here you are manually
creating one QEMUBH and binding it to the new context.  Should you
also use something like the block layer?  Then IMHO you can avoid
using a busy loop there (assuming the performance does not really
matter that much here for page hinting), and all the packet handling
can again be based on interrupts from the guest (ioeventfd).

[1]

> 
> > [...]
> > > +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> > > +    .name = "virtio-balloon-device/free-page-report",
> > > +    .version_id = 1,
> > > +    .minimum_version_id = 1,
> > > +    .needed = virtio_balloon_free_page_support,
> > > +    .fields = (VMStateField[]) {
> > > +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> > > +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> > (could we move all the poison-related lines into another patch or
> >   postpone?  after all we don't support it yet, do we?)
> > 
> 
>  We don't support migrating the poison value, but the guest may use it, so we
> actually disable this feature in that case. It's probably good to keep the
> code together to handle that case.

Could we just avoid declaring that feature bit in the emulation code
completely?  I mean, we support VIRTIO_BALLOON_F_FREE_PAGE_HINT as the
first step (as you mentioned in the commit message, POISON is a TODO).
Then when you really want to completely support the POISON bit, you can
put all that into a separate patch.  Would that work?

> 
> 
> > > +    if (virtio_has_feature(s->host_features,
> > > +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > > +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
> > > +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> > > +        s->free_page_report_cmd_id =
> > > +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
> > Why explicitly -1?  I thought ID_MIN would be fine too?
> 
> Yes, that will also be fine. Since we state that the cmd id will be in
> [MIN, MAX], and we do s->free_page_report_cmd_id++ in start(), using
> VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN here would make it [MIN + 1, MAX].

Then I would prefer we just use the MIN value; otherwise IMO we'd
better have a comment explaining why that -1 is there.
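
For what it's worth, here is one hypothetical way to read that "-1": with
a pre-increment in start(), initializing to MIN - 1 makes the very first
round use exactly MIN, and the id then walks [MIN, MAX] with wrap-around.
The constants below are illustrative, not the real header values:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real constants live in the virtio headers. */
#define CMD_ID_MIN 0x80000000u
#define CMD_ID_MAX 0xffffffffu

/* start() would do s->free_page_report_cmd_id = next_cmd_id(...), so an
 * initial value of CMD_ID_MIN - 1 yields CMD_ID_MIN on the first round,
 * and the id stays within [CMD_ID_MIN, CMD_ID_MAX], wrapping at the top. */
uint32_t next_cmd_id(uint32_t cur)
{
    if (cur < CMD_ID_MIN || cur >= CMD_ID_MAX) {
        return CMD_ID_MIN;
    }
    return cur + 1;
}
```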

> 
> > 
> > > +        if (s->iothread) {
> > > +            object_ref(OBJECT(s->iothread));
> > > +            s->free_page_bh = aio_bh_new(iothread_get_aio_context(s->iothread),
> > > +                                       virtio_balloon_poll_free_page_hints, s);
> > Just to mention that now we can create internal iothreads.  Please
> > have a look at iothread_create().
> 
> Thanks. I noticed that, but I think configuring via the cmd line can let
> people share the iothread with other devices that need it.

Ok.  Please have a look at my previous comment at [1].  I'm not sure
whether my understanding is correct.  But in case if so, not sure
whether we can avoid this QEMUBH here.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-06 11:02                   ` [Qemu-devel] " Peter Xu
@ 2018-06-07  5:24                       ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-07  5:24 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On 06/06/2018 07:02 PM, Peter Xu wrote:
> On Wed, Jun 06, 2018 at 06:04:23PM +0800, Wei Wang wrote:
>> On 06/06/2018 01:42 PM, Peter Xu wrote:
>>> IMHO migration states do not suite here.  IMHO bitmap syncing is too
>>> frequently an operation, especially at the end of a precopy migration.
>>> If you really want to introduce some notifiers, I would prefer
>>> something new rather than fiddling around with migration state.  E.g.,
>>> maybe a new migration event notifiers, then introduce two new events
>>> for both start/end of bitmap syncing.
>> Please see if below aligns to what you meant:
>>
>> MigrationState {
>> ...
>> + int ram_save_state;
>>
>> }
>>
>> typedef enum RamSaveState {
>>      RAM_SAVE_BEGIN = 0,
>>      RAM_SAVE_END = 1,
>>      RAM_SAVE_MAX = 2
>> }
>>
>> then at the step 1) and 3) you concluded somewhere below, we change the
>> state and invoke the callback.
> I mean something like this:
>
> 1693c64c27 ("postcopy: Add notifier chain", 2018-03-20)
>
> That was a postcopy-only notifier.  Maybe we can generalize it into a
> more common notifier for the migration framework so that we can even
> register with non-postcopy events like bitmap syncing?

Precopy already has its own notifiers: commit 99a0db9b
If we want to reuse something, that one would be more suitable. I think 
mixing unrelated events into one notifier list isn't nice.

>>
>> Btw, the migration_state_notifiers is already there, but seems not really
>> used (I only tracked spice-core.c called
>> add_migration_state_change_notifier). I thought adding new migration states
>> can reuse all that we have.
>> What's your real concern about that? (not sure how defining new events would
>> make a difference)
> Migration state is exposed via control path (QMP).  Adding new states
> mean that the QMP clients will see more.  IMO that's not really
> anything that a QMP client will need to know, instead we can keep it
> internally.  That's a reason from compatibility pov.
>
> Meanwhile, it's not really a state-thing at all for me.  It looks
> really more like hook or event (start/stop of sync).

Thanks for sharing your concerns in detail; they are quite helpful for 
the discussion. To reuse 99a0db9b, we could also add sub-states (or say 
events) instead of new migration states.
For example, we can still define "enum RamSaveState" as above, which the 
notifiers queued on the 99a0db9b notifier_list can use to decide whether 
to call start or stop.
Does this address your concern?
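
To make the idea concrete, a rough sketch of sub-state events delivered
through one notifier list (the names and the toy dispatcher below are
hypothetical; the real code would reuse the precopy notifier
infrastructure):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sub-states/events, modeled after the enum sketched above. */
typedef enum {
    RAM_SAVE_BEGIN = 0,
    RAM_SAVE_END = 1,
} RamSaveState;

typedef void (*RamSaveNotifier)(RamSaveState event, void *opaque);

static int rounds_started;
static int rounds_stopped;

/* Example subscriber: the balloon device starts free page hinting when a
 * bitmap-sync round begins and stops it when the round ends. */
static void balloon_ram_save_notify(RamSaveState event, void *opaque)
{
    (void)opaque;
    if (event == RAM_SAVE_BEGIN) {
        rounds_started++;    /* would call the start() API here */
    } else {
        rounds_stopped++;    /* would call the stop() API here */
    }
}

/* Stand-in for walking the notifier list at steps 1) and 3); a real list
 * would iterate over every registered notifier. */
static void ram_save_notify_all(RamSaveNotifier n, RamSaveState event)
{
    n(event, NULL);
}
```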


>
>>>> I would suggest to focus on the supplied interface and its usage in live
>>>> migration. That is, now we have two APIs, start() and stop(), to start and
>>>> stop the optimization.
>>>>
>>>> 1) where in the migration code should we use them (do you agree with the
>>>> step (1), (2), (3) you concluded below?)
>>>> 2) how should we use them, directly do global call or via notifiers?
>>> I don't know how Dave and Juan might think; here I tend to agree with
>>> Michael that some notifier framework should be nicer.
>>>
>> What would be the advantages of using notifiers here?
> Isolation of modules?  Then migration/ram.c at least won't need to
> include something like "balloon.h".
>
> And I think it's also possible too if some other modules would like to
> hook at these places someday.

OK, I can implement it with notifiers, though such a framework is usually 
added only once there is a second user who needs a callback at the same 
place.



>>
>>
>>> This is not that obvious to me.  For now I think it's true, since when
>>> we call stop() we'll take the mutex, meanwhile the mutex is actually
>>> always held by the iothread (in the big loop in
>>> virtio_balloon_poll_free_page_hints) until either:
>>>
>>> - it sleeps in qemu_cond_wait() [1], or
>>> - it leaves the big loop [2]
>>>
>>> Since I don't see anyone who will set dev->block_iothread to true for
>>> the balloon device, then [1] cannot happen;
>> there is a case in virtio_balloon_set_status which sets dev->block_iothread
>> to true.
>>
>> Did you mean the free_page_lock mutex? It is released at the bottom of the
>> while() loop in virtio_balloon_poll_free_page_hints. It's actually released
>> for every hint. That is,
>>
>> while(1){
>>      take the lock;
>>      process 1 hint from the vq;
>>      release the lock;
>> }
> Ah, so now I understand why you need the lock to be inside the loop,
> since the loop is busy polling actually.  Is it possible to do this in
> an async way?

We need to use polling here because of some back story on the guest side 
(some locks being held there) that makes it prohibitive to send a 
notification for each hint.
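
To make the locking pattern concrete, here is a stripped-down,
pthread-based model (field and function names are hypothetical, not the
patch code) of why stop() can always cut in between two hints:

```c
#include <assert.h>
#include <pthread.h>

enum { REPORT_S_START, REPORT_S_STOP };

/* Hypothetical device state; only the pieces relevant to locking are kept. */
typedef struct {
    pthread_mutex_t free_page_lock;
    int free_page_report_status;
    int hints_processed;
} Balloon;

/* The lock is taken and dropped around every single hint, so stop(),
 * which also takes the lock, can cut in between any two iterations; once
 * it flips the status, the next iteration observes STOP and exits. */
void poll_free_page_hints(Balloon *s, int pending_hints)
{
    for (;;) {
        pthread_mutex_lock(&s->free_page_lock);
        if (s->free_page_report_status != REPORT_S_START ||
            pending_hints == 0) {
            pthread_mutex_unlock(&s->free_page_lock);
            break;
        }
        s->hints_processed++;   /* stand-in for popping one hint off the vq */
        pending_hints--;
        pthread_mutex_unlock(&s->free_page_lock);
    }
}

void stop_hinting(Balloon *s)
{
    pthread_mutex_lock(&s->free_page_lock);
    s->free_page_report_status = REPORT_S_STOP;
    pthread_mutex_unlock(&s->free_page_lock);
}
```

(The test below is single-threaded for determinism; in the real device the
poll loop and stop() run on different threads.)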

> I'm a bit curious on how much time will it use to do
> one round of the free page hints (e.g., an idle guest with 8G mem, or
> any configuration you tested)?  I suppose during that time the
> iothread will be held steady with 100% cpu usage, am I right?

Compared to the time the legacy migration spends sending free pages, 
the small amount of CPU usage spent on filtering them out is negligible.
Sharpening the axe will not hold up the work of cutting the firewood :)

Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-07  3:17       ` Peter Xu
@ 2018-06-07  5:29           ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-07  5:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On 06/07/2018 11:17 AM, Peter Xu wrote:
> On Wed, Jun 06, 2018 at 06:11:50PM +0800, Wei Wang wrote:
>
> > I got similar comments from Michael, and it will be
> > while (1) {
> > lock;
> > func();
> > unlock();
> > }
> >
> > All the unlock inside the body will be gone.
> Ok I think I have more question on this part...
>
> Actually AFAICT this new feature uses iothread in a way very similar
> to the block layer, so I digged a bit on how block layer used the
> iothreads.  I see that the block code is using something like
> virtio_queue_aio_set_host_notifier_handler() to hook up the
> iothread/aiocontext and the ioeventfd, however here you are manually
> creating one QEMUBH and bound that to the new context.  Should you
> also use something like the block layer?  Then IMHO you can avoid
> using a busy loop there (assuming the performance does not really
> matter that much here for page hintings), and all the packet handling
> can again be based on interrupts from the guest (ioeventfd).
>
> [1]

As also mentioned in another discussion thread, it's better not to let the 
guest send notifications; otherwise, we would have used the virtqueue 
doorbell to notify the host.
So we need to use polling here, and Michael suggested implementing it in a 
BH, which sounds good to me.


>
>>> [...]
>>>> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
>>>> +    .name = "virtio-balloon-device/free-page-report",
>>>> +    .version_id = 1,
>>>> +    .minimum_version_id = 1,
>>>> +    .needed = virtio_balloon_free_page_support,
>>>> +    .fields = (VMStateField[]) {
>>>> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
>>>> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
>>> (could we move all the poison-related lines into another patch or
>>>    postpone?  after all we don't support it yet, do we?)
>>>
>>   We don't support migrating the poison value, but the guest may use it, so we
>> actually disable this feature in that case. It's probably good to keep the
>> code together to handle that case.
> Could we just avoid declaring that feature bit in emulation code
> completely?  I mean, we support VIRTIO_BALLOON_F_FREE_PAGE_HINT first
> as the first step (as you mentioned in commit message, the POISON is a
> TODO).  Then when you really want to completely support the POISON
> bit, you can put all that into a separate patch.  Would that work?
>

Not really. The F_PAGE_POISON isn't a feature configured via QEMU cmd 
line like F_FREE_PAGE_HINT. We always set F_PAGE_POISON if 
F_FREE_PAGE_HINT is enabled. It is used to detect if the guest is using 
page poison.
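
To spell out what "always set F_PAGE_POISON if F_FREE_PAGE_HINT is
enabled" means in practice, a hedged sketch (the bit values and function
names below are made up for illustration, not the real virtio
definitions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bits; the real values come from the virtio headers. */
#define F_FREE_PAGE_HINT (1u << 3)
#define F_PAGE_POISON    (1u << 4)

/* The device offers F_PAGE_POISON whenever F_FREE_PAGE_HINT is offered,
 * rather than as an independently configurable feature: it exists so the
 * device can learn whether the guest poisons freed pages. */
uint32_t offered_features(bool free_page_hint_enabled)
{
    uint32_t f = 0;
    if (free_page_hint_enabled) {
        f |= F_FREE_PAGE_HINT | F_PAGE_POISON;
    }
    return f;
}

/* If the guest negotiated poison with a nonzero poison value, freed pages
 * still hold meaningful (poisoned) contents, so skipping them during
 * migration would be unsafe: hinting is effectively disabled. */
bool hinting_usable(uint32_t guest_features, uint32_t poison_val)
{
    if (!(guest_features & F_FREE_PAGE_HINT)) {
        return false;
    }
    return !((guest_features & F_PAGE_POISON) && poison_val != 0);
}
```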


>>
>>>> +    if (virtio_has_feature(s->host_features,
>>>> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
>>>> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
>>>> +        s->free_page_report_cmd_id =
>>>> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
>>> Why explicitly -1?  I thought ID_MIN would be fine too?
>> Yes, that will also be fine. Since we state that the cmd id will be in
>> [MIN, MAX], and we do s->free_page_report_cmd_id++ in start(), using
>> VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN here would make it [MIN + 1, MAX].
> Then I would prefer we just use the MIN value, otherwise IMO we'd
> better have a comment mentioning about why that -1 is there.

Sure, we can do that.

Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-07  5:24                       ` [virtio-dev] " Wei Wang
  (?)
@ 2018-06-07  6:32                       ` Peter Xu
  2018-06-07 11:59                           ` [virtio-dev] " Wei Wang
  2018-06-08  7:31                           ` [virtio-dev] " Wei Wang
  -1 siblings, 2 replies; 93+ messages in thread
From: Peter Xu @ 2018-06-07  6:32 UTC (permalink / raw)
  To: Wei Wang
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On Thu, Jun 07, 2018 at 01:24:29PM +0800, Wei Wang wrote:
> On 06/06/2018 07:02 PM, Peter Xu wrote:
> > On Wed, Jun 06, 2018 at 06:04:23PM +0800, Wei Wang wrote:
> > > On 06/06/2018 01:42 PM, Peter Xu wrote:
> > > > IMHO migration states do not suite here.  IMHO bitmap syncing is too
> > > > frequently an operation, especially at the end of a precopy migration.
> > > > If you really want to introduce some notifiers, I would prefer
> > > > something new rather than fiddling around with migration state.  E.g.,
> > > > maybe a new migration event notifiers, then introduce two new events
> > > > for both start/end of bitmap syncing.
> > > Please see if below aligns to what you meant:
> > > 
> > > MigrationState {
> > > ...
> > > + int ram_save_state;
> > > 
> > > }
> > > 
> > > typedef enum RamSaveState {
> > >      RAM_SAVE_BEGIN = 0,
> > >      RAM_SAVE_END = 1,
> > >      RAM_SAVE_MAX = 2
> > > }
> > > 
> > > then at the step 1) and 3) you concluded somewhere below, we change the
> > > state and invoke the callback.
> > I mean something like this:
> > 
> > 1693c64c27 ("postcopy: Add notifier chain", 2018-03-20)
> > 
> > That was a postcopy-only notifier.  Maybe we can generalize it into a
> > more common notifier for the migration framework so that we can even
> > register with non-postcopy events like bitmap syncing?
> 
> Precopy already has its own notifiers: commit 99a0db9b
> If we want to reuse something, that one would be more suitable. I think
> mixing unrelated events into one notifier list isn't nice.

I think that's only for migration state changes?

> 
> > > 
> > > Btw, the migration_state_notifiers is already there, but seems not really
> > > used (I only tracked spice-core.c called
> > > add_migration_state_change_notifier). I thought adding new migration states
> > > can reuse all that we have.
> > > What's your real concern about that? (not sure how defining new events would
> > > make a difference)
> > Migration state is exposed via control path (QMP).  Adding new states
> > mean that the QMP clients will see more.  IMO that's not really
> > anything that a QMP client will need to know, instead we can keep it
> > internally.  That's a reason from compatibility pov.
> > 
> > Meanwhile, it's not really a state-thing at all for me.  It looks
> > really more like hook or event (start/stop of sync).
> 
> Thanks for sharing your concerns in detail; they are quite helpful for the
> discussion. To reuse 99a0db9b, we could also add sub-states (or say events)
> instead of new migration states.
> For example, we can still define "enum RamSaveState" as above, which the
> notifiers queued on the 99a0db9b notifier_list can use to decide whether to
> call start or stop.
> Does this address your concern?

Frankly speaking I don't fully understand how you would add that
sub-state.  If you are confident with the idea, maybe you can post
your new version with the change, then I can read the code.

> 
> 
> > 
> > > > > I would suggest to focus on the supplied interface and its usage in live
> > > > > migration. That is, now we have two APIs, start() and stop(), to start and
> > > > > stop the optimization.
> > > > > 
> > > > > 1) where in the migration code should we use them (do you agree with the
> > > > > step (1), (2), (3) you concluded below?)
> > > > > 2) how should we use them, directly do global call or via notifiers?
> > > > I don't know how Dave and Juan might think; here I tend to agree with
> > > > Michael that some notifier framework should be nicer.
> > > > 
> > > What would be the advantages of using notifiers here?
> > Isolation of modules?  Then migration/ram.c at least won't need to
> > include something like "balloon.h".
> > 
> > And I think it's also possible too if some other modules would like to
> > hook at these places someday.
> 
> OK, I can implement it with notifiers, though such a framework is usually
> added only once there is a second user who needs a callback at the same place.
> 
> 
> 
> > > 
> > > 
> > > > This is not that obvious to me.  For now I think it's true, since when
> > > > we call stop() we'll take the mutex, meanwhile the mutex is actually
> > > > always held by the iothread (in the big loop in
> > > > virtio_balloon_poll_free_page_hints) until either:
> > > > 
> > > > - it sleeps in qemu_cond_wait() [1], or
> > > > - it leaves the big loop [2]
> > > > 
> > > > Since I don't see anyone who will set dev->block_iothread to true for
> > > > the balloon device, then [1] cannot happen;
> > > there is a case in virtio_balloon_set_status which sets dev->block_iothread
> > > to true.
> > > 
> > > Did you mean the free_page_lock mutex? it is released at the bottom of the
> > > while() loop in virtio_balloon_poll_free_page_hint. It's actually released
> > > for every hint. That is,
> > > 
> > > while(1){
> > >      take the lock;
> > >      process 1 hint from the vq;
> > >      release the lock;
> > > }
> > Ah, so now I understand why you need the lock to be inside the loop,
> > since the loop is busy polling actually.  Is it possible to do this in
> > an async way?
> 
> We need to use polling here because of some back story on the guest side
> (some locks being held there) that makes it prohibitive to send a
> notification for each hint.

Any link to the "back story" that I can learn about? :)  If it's too
complicated a problem and you think I don't need to understand it at all,
please feel free to say so.  Then I would assume at least Michael has
fully acknowledged the idea, and I can just stop putting more time on
this part.

Besides, if you are going to use a busy loop, then I am not quite sure
you really want to share that iothread with others, since AFAIU that's
not how iothreads are designed (they are mostly event-based and should
not welcome out-of-control blocking in the event handlers).  Though I am
not 100% confident in my understanding of that, so I'm only raising the
question.  Anyway, you'll just take over the thread for a while without
sharing, and after the burst of IOs it's mostly never used (until the
next bitmap sync).  So it seems a bit confusing to me why you need to
share it after all.

> 
> > I'm a bit curious on how much time will it use to do
> > one round of the free page hints (e.g., an idle guest with 8G mem, or
> > any configuration you tested)?  I suppose during that time the
> > iothread will be held steady with 100% cpu usage, am I right?
> 
> Compared to the time the legacy migration spends sending free pages, the
> small amount of CPU usage spent on filtering them out is negligible.
> Sharpening the axe will not hold up the work of cutting the firewood :)

Sorry I didn't express myself clearly.

My question was that, have you measured how long time it will take
from starting of the free page hints (when balloon state is set to
FREE_PAGE_REPORT_S_REQUESTED), until it completes (when QEMU receives
the VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID, then set the status to
FREE_PAGE_REPORT_S_STOP)?

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-07  5:29           ` [virtio-dev] " Wei Wang
  (?)
@ 2018-06-07  6:58           ` Peter Xu
  2018-06-07 12:01               ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-07  6:58 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Thu, Jun 07, 2018 at 01:29:22PM +0800, Wei Wang wrote:
> On 06/07/2018 11:17 AM, Peter Xu wrote:
> > On Wed, Jun 06, 2018 at 06:11:50PM +0800, Wei Wang wrote:
> > 
> > > I got similar comments from Michael, and it will be
> > > while (1) {
> > > lock;
> > > func();
> > > unlock();
> > > }
> > >
> > > All the unlock inside the body will be gone.
> > Ok I think I have more questions on this part...
> > 
> > Actually AFAICT this new feature uses an iothread in a way very similar
> > to the block layer, so I dug a bit into how the block layer uses
> > iothreads.  I see that the block code uses something like
> > virtio_queue_aio_set_host_notifier_handler() to hook up the
> > iothread/aiocontext and the ioeventfd, whereas here you are manually
> > creating one QEMUBH and binding it to the new context.  Should you
> > also use something like the block layer does?  Then IMHO you can avoid
> > using a busy loop there (assuming performance does not really
> > matter that much here for page hinting), and all the packet handling
> > can again be based on interrupts from the guest (ioeventfd).
> > 
> > [1]
> 
> As also mentioned in another discussion thread, it's better not to let
> the guest send notifications; otherwise, we would have used the virtqueue
> doorbell to notify the host.
> So we need to use polling here, and Michael suggested implementing it in
> a BH, which sounds good to me.

(We're discussing the same problem in the other thread, so let's do it
 there)

> 
> 
> > 
> > > > [...]
> > > > > +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> > > > > +    .name = "virtio-balloon-device/free-page-report",
> > > > > +    .version_id = 1,
> > > > > +    .minimum_version_id = 1,
> > > > > +    .needed = virtio_balloon_free_page_support,
> > > > > +    .fields = (VMStateField[]) {
> > > > > +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> > > > > +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> > > > (could we move all the poison-related lines into another patch or
> > > >    postpone?  after all we don't support it yet, do we?)
> > > > 
> > >   We don't support migrating the poison value, but the guest may use it, so we
> > > actually disable this feature in that case. It's probably good to keep the
> > > code together to handle that case.
> > Could we just avoid declaring that feature bit in emulation code
> > completely?  I mean, we support VIRTIO_BALLOON_F_FREE_PAGE_HINT first
> > as the first step (as you mentioned in commit message, the POISON is a
> > TODO).  Then when you really want to completely support the POISON
> > bit, you can put all that into a separate patch.  Would that work?
> > 
> 
> Not really. The F_PAGE_POISON isn't a feature configured via QEMU cmd line
> like F_FREE_PAGE_HINT. We always set F_PAGE_POISON if F_FREE_PAGE_HINT is
> enabled. It is used to detect if the guest is using page poison.

Ok I think I kind of understand.  But it seems strange to me to have
this as a feature bit.  I thought it suits better as a config field
that the guest could set up.  Like, we can have 1 byte for "whether
PAGE_POISON is used in the guest", and another byte for "what the
PAGE_POISON value is if it's enabled".

Asked since I see this in virtio spec (v1.0, though I guess it won't
change) in chapter "2.2.1 Driver Requirements: Feature Bits":

"The driver MUST NOT accept a feature which the device did not offer"

Then I'm curious what would happen if:

- an emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
- a guest enabled PAGE_POISON

Then how could the driver tell the host that PAGE_POISON is enabled,
considering that the guest should never set a feature bit the
emulation code didn't offer?

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-07  6:32                       ` [Qemu-devel] " Peter Xu
@ 2018-06-07 11:59                           ` Wei Wang
  2018-06-08  7:31                           ` [virtio-dev] " Wei Wang
  1 sibling, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-07 11:59 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On 06/07/2018 02:32 PM, Peter Xu wrote:
> On Thu, Jun 07, 2018 at 01:24:29PM +0800, Wei Wang wrote:
>> On 06/06/2018 07:02 PM, Peter Xu wrote:
>>> On Wed, Jun 06, 2018 at 06:04:23PM +0800, Wei Wang wrote:
>>>> On 06/06/2018 01:42 PM, Peter Xu wrote:
> > > > > IMHO migration states do not suit here.  IMHO bitmap syncing is too
> > > > > frequent an operation, especially at the end of a precopy migration.
> > > > > If you really want to introduce some notifiers, I would prefer
> > > > > something new rather than fiddling around with the migration state.  E.g.,
> > > > > maybe new migration event notifiers, then introduce two new events
> > > > > for the start/end of bitmap syncing.
>>>> Please see if below aligns to what you meant:
>>>>
>>>> MigrationState {
>>>> ...
>>>> + int ram_save_state;
>>>>
>>>> }
>>>>
>>>> typedef enum RamSaveState {
>>>>       RAM_SAVE_BEGIN = 0,
>>>>       RAM_SAVE_END = 1,
>>>>       RAM_SAVE_MAX = 2
>>>> }
>>>>
>>>> then at the step 1) and 3) you concluded somewhere below, we change the
>>>> state and invoke the callback.
>>> I mean something like this:
>>>
>>> 1693c64c27 ("postcopy: Add notifier chain", 2018-03-20)
>>>
>>> That was a postcopy-only notifier.  Maybe we can generalize it into a
>>> more common notifier for the migration framework so that we can even
>>> register with non-postcopy events like bitmap syncing?
>> Precopy already has its own notifiers: commit 99a0db9b
>> If we want to reuse something, that one would be more suitable. I think mixing
>> unrelated events into one notifier list isn't nice.
> I think that's only for migration state changes?
>
>>>> Btw, the migration_state_notifiers is already there, but seems not really
>>>> used (I only tracked spice-core.c called
>>>> add_migration_state_change_notifier). I thought adding new migration states
>>>> can reuse all that we have.
>>>> What's your real concern about that? (not sure how defining new events would
>>>> make a difference)
>>> Migration state is exposed via control path (QMP).  Adding new states
>>> mean that the QMP clients will see more.  IMO that's not really
>>> anything that a QMP client will need to know, instead we can keep it
>>> internally.  That's a reason from compatibility pov.
>>>
>>> Meanwhile, it's not really a state-thing at all for me.  It looks
>>> really more like hook or event (start/stop of sync).
>> Thanks for sharing your concerns in detail, which are quite helpful for the
>> discussion. To reuse 99a0db9b, we can also add sub-states (or rather, events)
>> instead of new migration states.
>> For example, we can still define "enum RamSaveState" as above, which can be
>> an indication for a notifier queued on the 99a0db9b notifier_list to decide
>> whether to call start or stop.
>> Does this address your concern?
> Frankly speaking I don't fully understand how you would add that
> sub-state.  If you are confident with the idea, maybe you can post
> your new version with the change, then I can read the code.

Sure. Code is more straightforward for this one. Let's check it in the 
new version.

>>>>> This is not that obvious to me.  For now I think it's true, since when
>>>>> we call stop() we'll take the mutex, meanwhile the mutex is actually
>>>>> always held by the iothread (in the big loop in
>>>>> virtio_balloon_poll_free_page_hints) until either:
>>>>>
>>>>> - it sleeps in qemu_cond_wait() [1], or
>>>>> - it leaves the big loop [2]
>>>>>
>>>>> Since I don't see anyone who will set dev->block_iothread to true for
>>>>> the balloon device, then [1] cannot happen;
>>>> there is a case in virtio_balloon_set_status which sets dev->block_iothread
>>>> to true.
>>>>
>>>> Did you mean the free_page_lock mutex? it is released at the bottom of the
>>>> while() loop in virtio_balloon_poll_free_page_hint. It's actually released
>>>> for every hint. That is,
>>>>
>>>> while(1){
>>>>       take the lock;
>>>>       process 1 hint from the vq;
>>>>       release the lock;
>>>> }
>>> Ah, so now I understand why you need the lock to be inside the loop,
>>> since the loop is busy polling actually.  Is it possible to do this in
>>> an async way?
>> We need to use polling here because of some back story on the guest side
>> (certain locks being held there) that makes it prohibitive to send a
>> notification for each hint.
> Any link to the "back story" that I can learn about? :) If it's too
> complicated a problem and you think I don't need to understand at all,
> please feel free to do so.

I searched a little bit but forgot where we discussed this one. The
conclusion was that we don't want a kick to happen while the mm lock is
held. Also, polling seems a good fit here to me.
There are 32 versions of kernel patch discussions scattered around,
interesting to read, but that might take too much time. People usually
have different thoughts (sometimes with partial understanding) when they
look at something (we went through many different implementations
ourselves across those 32 versions). It wasn't easy to reach this much
consensus. That's why I hope our discussion can stay focused on the
migration part, which is the last part that has not been fully
finalized.



> Then I would assume at least Michael has
> fully acknowledged that idea, and I can just stop putting more time on
> this part.

Yes, he's been in the loop since the beginning.


>
> Besides, if you are going to use a busy loop, then I would be not
> quite sure about whether you really want to share that iothread with
> others, since AFAIU that's not how iothread is designed (which is
> mostly event-based and should not welcome out-of-control blocking in
> the handler of events).  Though I am not 100% confident about my
> understanding of that; I only raise the question.  Anyway, you'll
> just take over the thread for a while without sharing it, and after the
> burst of I/O it's mostly never used (until the next bitmap sync).  So
> it seems a bit confusing to me why you need to share it at all.

We don't necessarily _need_ to share it; I meant it can be shared via the
QEMU command line.
Live migration doesn't happen all the time, and the optimization doesn't
run for long, so if users want other BHs to run in this iothread context,
they can create just one iothread via the QEMU command line.


>
>>> I'm a bit curious on how much time will it use to do
>>> one round of the free page hints (e.g., an idle guest with 8G mem, or
>>> any configuration you tested)?  I suppose during that time the
>>> iothread will be held steady with 100% cpu usage, am I right?
>> Compared to the time spent by the legacy migration path sending free pages, the
>> small amount of CPU usage spent on filtering free pages is negligible.
>> Grinding a chopper will not hold up the work of cutting firewood :)
> Sorry I didn't express myself clearly.
>
> My question was: have you measured how long it takes from the start of
> free page hinting (when the balloon state is set to
> FREE_PAGE_REPORT_S_REQUESTED) until it completes (when QEMU receives
> VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID and sets the status to
> FREE_PAGE_REPORT_S_STOP)?
>

I vaguely remember it was several ms (for around 7.5G of free pages),
measured a long time ago. What is the concern behind the number you want
to know?

Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-07  6:58           ` Peter Xu
@ 2018-06-07 12:01               ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-07 12:01 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On 06/07/2018 02:58 PM, Peter Xu wrote:
> On Thu, Jun 07, 2018 at 01:29:22PM +0800, Wei Wang wrote:
>>>>> [...]
>>>>>> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
>>>>>> +    .name = "virtio-balloon-device/free-page-report",
>>>>>> +    .version_id = 1,
>>>>>> +    .minimum_version_id = 1,
>>>>>> +    .needed = virtio_balloon_free_page_support,
>>>>>> +    .fields = (VMStateField[]) {
>>>>>> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
>>>>>> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
>>>>> (could we move all the poison-related lines into another patch or
>>>>>     postpone?  after all we don't support it yet, do we?)
>>>>>
>>>>    We don't support migrating poison value, but guest maybe use it, so we are
>>>> actually disabling this feature in that case. Probably good to leave the
>>>> code together to handle that case.
>>> Could we just avoid declaring that feature bit in emulation code
>>> completely?  I mean, we support VIRTIO_BALLOON_F_FREE_PAGE_HINT first
>>> as the first step (as you mentioned in commit message, the POISON is a
>>> TODO).  Then when you really want to completely support the POISON
>>> bit, you can put all that into a separate patch.  Would that work?
>>>
>> Not really. The F_PAGE_POISON isn't a feature configured via QEMU cmd line
>> like F_FREE_PAGE_HINT. We always set F_PAGE_POISON if F_FREE_PAGE_HINT is
>> enabled. It is used to detect if the guest is using page poison.
> Ok I think I kind of understand.  But it seems strange to me to have
> this as a feature bit.  I thought it suits better as a config field
> that the guest could set up.  Like, we can have 1 byte for "whether
> PAGE_POISON is used in the guest", and another byte for "what the
> PAGE_POISON value is if it's enabled".

This was also suggested by Michael, and it sounds good to me. Using the
config space is doable, but it doesn't show an advantage over using
feature bits.



>
> Asked since I see this in virtio spec (v1.0, though I guess it won't
> change) in chapter "2.2.1 Driver Requirements: Feature Bits":
>
> "The driver MUST NOT accept a feature which the device did not offer"
>
> Then I'm curious what would happen if:
>
> - an emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> - a guest enabled PAGE_POISON
>
> Then how could the driver tell the host that PAGE_POISON is enabled,
> considering that the guest should never set a feature bit the
> emulation code didn't offer?
>

All emulator implementations need to follow the virtio spec. We will 
eventually have this feature written into the virtio-balloon device 
section, stating that F_PAGE_POISON needs to be set on the device 
whenever F_FREE_PAGE_HINT is set on the device.


Best,
Wei

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-07 12:01               ` [virtio-dev] " Wei Wang
  (?)
@ 2018-06-08  1:37               ` Peter Xu
  2018-06-08  1:58                 ` Peter Xu
  2018-06-08  1:58                   ` [virtio-dev] " Michael S. Tsirkin
  -1 siblings, 2 replies; 93+ messages in thread
From: Peter Xu @ 2018-06-08  1:37 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Thu, Jun 07, 2018 at 08:01:42PM +0800, Wei Wang wrote:
> On 06/07/2018 02:58 PM, Peter Xu wrote:
> > On Thu, Jun 07, 2018 at 01:29:22PM +0800, Wei Wang wrote:
> > > > > > [...]
> > > > > > > +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> > > > > > > +    .name = "virtio-balloon-device/free-page-report",
> > > > > > > +    .version_id = 1,
> > > > > > > +    .minimum_version_id = 1,
> > > > > > > +    .needed = virtio_balloon_free_page_support,
> > > > > > > +    .fields = (VMStateField[]) {
> > > > > > > +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> > > > > > > +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> > > > > > (could we move all the poison-related lines into another patch or
> > > > > >     postpone?  after all we don't support it yet, do we?)
> > > > > > 
> > > > >    We don't support migrating poison value, but guest maybe use it, so we are
> > > > > actually disabling this feature in that case. Probably good to leave the
> > > > > code together to handle that case.
> > > > Could we just avoid declaring that feature bit in emulation code
> > > > completely?  I mean, we support VIRTIO_BALLOON_F_FREE_PAGE_HINT first
> > > > as the first step (as you mentioned in commit message, the POISON is a
> > > > TODO).  Then when you really want to completely support the POISON
> > > > bit, you can put all that into a separate patch.  Would that work?
> > > > 
> > > Not really. The F_PAGE_POISON isn't a feature configured via QEMU cmd line
> > > like F_FREE_PAGE_HINT. We always set F_PAGE_POISON if F_FREE_PAGE_HINT is
> > > enabled. It is used to detect if the guest is using page poison.
> > Ok I think I kind of understand.  But it seems strange to me to have
> > this as a feature bit.  I thought it suites more to be a config field
> > so that guest could setup.  Like, we can have 1 byte to setup "whether
> > PAGE_POISON is used in the guest", another 1 byte to setup "what is
> > the PAGE_POISON value if it's enabled".
> 
> This is also suggested by Michael, which sounds good to me. Using config is
> doable, but that doesn't show advantages over using feature bits.
> 
> 
> 
> > 
> > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > 
> > "The driver MUST NOT accept a feature which the device did not offer"
> > 
> > Then I'm curious what would happen if:
> > 
> > - a emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > - a guest that enabled PAGE_POISON
> > 
> > Then how the driver could tell the host that PAGE_POISON is enabled
> > considering that guest should never set that feature bit if the
> > emulation code didn't provide it?
> > 
> 
> All the emulator implementations need to follow the virtio spec. We will
> finally have this feature written to the virtio-balloon device section, and
> state that the F_PAGE_POISON needs to be set on the device when
> F_FREE_PAGE_HINT is set on the device.

Okay.  Still, I would think a single feature bit is cleaner here, since
the two are actually tightly bound together.  Normally, AFAIU, two bits
are only needed when we introduce FEATURE1 and then, after a while,
introduce FEATURE2; at that point we need two feature bits because there
are emulators already running with only FEATURE1.

AFAICT the underlying issue is that your kernel patches are split (one
for FEATURE1, one for FEATURE2), yet with only FEATURE1 the guest is
actually broken (without the POISON support, the PAGE_HINT feature
might be broken).  So it would be nicer if the kernel patches were
squashed so that no commit breaks any guest.  And if they are squashed,
then IMHO we don't need two feature bits at all. ;)

But anyway, I understand it now.  Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-08  1:37               ` Peter Xu
@ 2018-06-08  1:58                 ` Peter Xu
  2018-06-08  1:58                   ` [virtio-dev] " Michael S. Tsirkin
  1 sibling, 0 replies; 93+ messages in thread
From: Peter Xu @ 2018-06-08  1:58 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, dgilbert, yang.zhang.wz,
	quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:

[...]

> > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > 
> > > "The driver MUST NOT accept a feature which the device did not offer"
> > > 
> > > Then I'm curious what would happen if:
> > > 
> > > - a emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > - a guest that enabled PAGE_POISON
> > > 
> > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > considering that guest should never set that feature bit if the
> > > emulation code didn't provide it?
> > > 
> > 
> > All the emulator implementations need to follow the virtio spec. We will
> > finally have this feature written to the virtio-balloon device section, and
> > state that the F_PAGE_POISON needs to be set on the device when
> > F_FREE_PAGE_HINT is set on the device.
> 
> Okay.  Still I would think a single feature cleaner here since they
> are actually tightly bound together, e.g., normally AFAIU this only
> happens when we introduce FEATURE1, after a while we introduced
> FEATURE2, then we need to have two features there since there are
> emulators that are already running only with FEATURE1.
> 
> AFAICT the thing behind is that your kernel patches are split (one for
> FEATURE1, one for FEATURE2), however when you only have FEATURE1 it's
> actually broken (if without the POISON support, PAGE_HINT feature
> might be broken).  So it would be nicer if the kernel patches are
> squashed so that no commit would break any guest.  And, if they are
> squashed then IMHO we don't need two feature bits at all. ;)
> 
> But anyway, I understand it now.  Thanks,

This also reminds me that since we're going to declare both features
in this single patch, the final version of the patch should contain
the implementation of poisoned bits rather than a todo, am I right?

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-08  1:37               ` Peter Xu
@ 2018-06-08  1:58                   ` Michael S. Tsirkin
  2018-06-08  1:58                   ` [virtio-dev] " Michael S. Tsirkin
  1 sibling, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-06-08  1:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, dgilbert,
	yang.zhang.wz, quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:
> > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > 
> > > "The driver MUST NOT accept a feature which the device did not offer"
> > > 
> > > Then I'm curious what would happen if:
> > > 
> > > - an emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > - a guest that enabled PAGE_POISON
> > > 
> > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > considering that guest should never set that feature bit if the
> > > emulation code didn't provide it?

It wouldn't. It just has to deal with the fact that host can discard
writes to hinted pages. Right now driver deals with it simply by
disabling F_FREE_PAGE_HINT.

-- 
MST
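[Editor's note: the negotiation rule Michael describes can be sketched as follows. This is a simplified, hypothetical model for illustration, not the actual Linux driver code; the feature-bit values are made up (the real ones live in the virtio-balloon UAPI headers).]

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative feature bits only; the real values live in the
 * virtio-balloon UAPI headers. */
#define F_FREE_PAGE_HINT (1u << 3)
#define F_POISON         (1u << 4)

/* The driver may only accept bits the device offered.  If the guest
 * uses page poisoning but the device did not offer F_POISON, the host
 * might discard writes to hinted pages and lose the poison pattern,
 * so the driver simply gives up on F_FREE_PAGE_HINT. */
uint32_t negotiate(uint32_t device_features, int guest_poisons_pages)
{
    uint32_t driver_features = device_features;

    if (guest_poisons_pages && !(device_features & F_POISON)) {
        driver_features &= ~F_FREE_PAGE_HINT;
    }
    return driver_features;
}
```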

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-07 11:59                           ` [virtio-dev] " Wei Wang
  (?)
@ 2018-06-08  2:17                           ` Peter Xu
  2018-06-08  7:14                               ` [virtio-dev] " Wei Wang
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-08  2:17 UTC (permalink / raw)
  To: Wei Wang
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On Thu, Jun 07, 2018 at 07:59:22PM +0800, Wei Wang wrote:
> On 06/07/2018 02:32 PM, Peter Xu wrote:
> > On Thu, Jun 07, 2018 at 01:24:29PM +0800, Wei Wang wrote:
> > > On 06/06/2018 07:02 PM, Peter Xu wrote:
> > > > On Wed, Jun 06, 2018 at 06:04:23PM +0800, Wei Wang wrote:
> > > > > On 06/06/2018 01:42 PM, Peter Xu wrote:
> > > > > > IMHO migration states do not suite here.  IMHO bitmap syncing is too
> > > > > > frequently an operation, especially at the end of a precopy migration.
> > > > > > If you really want to introduce some notifiers, I would prefer
> > > > > > something new rather than fiddling around with migration state.  E.g.,
> > > > > > maybe a new migration event notifiers, then introduce two new events
> > > > > > for both start/end of bitmap syncing.
> > > > > Please see if below aligns to what you meant:
> > > > > 
> > > > > MigrationState {
> > > > > ...
> > > > > + int ram_save_state;
> > > > > 
> > > > > }
> > > > > 
> > > > > typedef enum RamSaveState {
> > > > >       RAM_SAVE_BEGIN = 0,
> > > > >       RAM_SAVE_END = 1,
> > > > >       RAM_SAVE_MAX = 2
> > > > > }
> > > > > 
> > > > > then at the step 1) and 3) you concluded somewhere below, we change the
> > > > > state and invoke the callback.
> > > > I mean something like this:
> > > > 
> > > > 1693c64c27 ("postcopy: Add notifier chain", 2018-03-20)
> > > > 
> > > > That was a postcopy-only notifier.  Maybe we can generalize it into a
> > > > more common notifier for the migration framework so that we can even
> > > > register with non-postcopy events like bitmap syncing?
> > > Precopy already has its own notifiers: git 99a0db9b
> > > If we want to reuse, that one would be more suitable. I think mixing
> > > non-related events into one notifier list isn't nice.
> > I think that's only for migration state changes?
> > 
> > > > > Btw, the migration_state_notifiers is already there, but seems not really
> > > > > used (I only tracked spice-core.c called
> > > > > add_migration_state_change_notifier). I thought adding new migration states
> > > > > can reuse all that we have.
> > > > > What's your real concern about that? (not sure how defining new events would
> > > > > make a difference)
> > > > Migration state is exposed via control path (QMP).  Adding new states
> > > > mean that the QMP clients will see more.  IMO that's not really
> > > > anything that a QMP client will need to know, instead we can keep it
> > > > internally.  That's a reason from compatibility pov.
> > > > 
> > > > Meanwhile, it's not really a state-thing at all for me.  It looks
> > > > really more like hook or event (start/stop of sync).
> > > Thanks for sharing your concerns in detail, which are quite helpful for the
> > > discussion. To reuse 99a0db9b, we can also add sub-states (or say events),
> > > instead of new migration states.
> > > For example, we can still define "enum RamSaveState" as above, which can be
> > > an indication for the notifier queued on the 99a0db9b notifier_list to decide
> > > whether to call start or stop.
> > > Does this solve your concern?
> > Frankly speaking I don't fully understand how you would add that
> > sub-state.  If you are confident with the idea, maybe you can post
> > your new version with the change, then I can read the code.
> 
> Sure. Code is more straightforward for this one. Let's check it in the new
> version.
> 
> > > > > > This is not that obvious to me.  For now I think it's true, since when
> > > > > > we call stop() we'll take the mutex, meanwhile the mutex is actually
> > > > > > always held by the iothread (in the big loop in
> > > > > > virtio_balloon_poll_free_page_hints) until either:
> > > > > > 
> > > > > > - it sleeps in qemu_cond_wait() [1], or
> > > > > > - it leaves the big loop [2]
> > > > > > 
> > > > > > Since I don't see anyone who will set dev->block_iothread to true for
> > > > > > the balloon device, then [1] cannot happen;
> > > > > there is a case in virtio_balloon_set_status which sets dev->block_iothread
> > > > > to true.
> > > > > 
> > > > > Did you mean the free_page_lock mutex? it is released at the bottom of the
> > > > > while() loop in virtio_balloon_poll_free_page_hints. It's actually released
> > > > > for every hint. That is,
> > > > > 
> > > > > while(1){
> > > > >       take the lock;
> > > > >       process 1 hint from the vq;
> > > > >       release the lock;
> > > > > }
> > > > Ah, so now I understand why you need the lock to be inside the loop,
> > > > since the loop is busy polling actually.  Is it possible to do this in
> > > > an async way?
> > > We need to use polling here because of some back story in the guest side
> > > (due to some locks being held) that makes it a barrier to sending
> > > notifications for each hint.
> > Any link to the "back story" that I can learn about? :) If it's too
> > complicated a problem and you think I don't need to understand at all,
> > please feel free to do so.
> 
> I searched a little bit, and forgot where we discussed this one. But the
> conclusion is that we don't want a kick to happen while the mm lock is held.
> Also, polling is a good idea here to me.
> There are 32 versions of kernel patch discussions scattered, interesting to
> watch, but might take too much time. Also people usually have different
> thoughts (sometimes with partial understanding) when they watch something
> (we even have many different versions of implementations ourselves if you
> check the whole 32 versions). It's not easy to get here with many consensus.
> That's why I hope our discussion could be more focused on the migration
> part, which is the last part that has not been fully finalized.

It's ok.

I'd be focused on the migration part if you have a very clear interface
declared. :) You know, it was not even clear to me before I read the
series whether the free_page_stop() operation is synchronous. And
IMHO that's really important even if I focus on the migration review.

I'd say I'll treat reviewers somewhat differently from you.  But I
don't think that's worth a debate.

> 
> 
> 
> > Then I would assume at least Michael has
> > fully acknowledged that idea, and I can just stop putting more time on
> > this part.
> 
> Yes, he's been on the loop since the beginning.
> 
> 
> > 
> > Besides, if you are going to use a busy loop, then I would be not
> > quite sure about whether you really want to share that iothread with
> > others, since AFAIU that's not how iothread is designed (which is
> > mostly event-based and should not welcome out-of-control blocking in
> > the handler of events).  Though I am not 100% confident about my
> > understanding on that, I only raise this question up.  Anyway you'll
> > just take over the thread for a while without sharing, and after the
> > burst IOs it's mostly never used (until the next bitmap sync).  Then
> > it seems a bit confusing to me on why you need to share that after
> > all.
> 
> Not necessarily _need_ to share it, I meant it can be shared using qemu
> command line.
> Live migration doesn't happen all the time, and that optimization doesn't
> run that long, if users want to have other BHs run in this iothread context,
> they can only create one iothread via the qemu cmd line.

IMO iothreads and aiocontexts are for an event-driven model.  A busy
loop is not an event-driven model.  Here, if we wanted a busy loop, I'd
create a thread when starting page hinting, then join the thread when done.

But I'll stop commenting on this.  Please prepare a more clear
interface for migration in your next post.  I'll read that.

> 
> 
> > 
> > > > I'm a bit curious on how much time will it use to do
> > > > one round of the free page hints (e.g., an idle guest with 8G mem, or
> > > > any configuration you tested)?  I suppose during that time the
> > > > iothread will be held steady with 100% cpu usage, am I right?
> > > Compared to the time spent by the legacy migration to send free pages, that
> > > small amount of CPU usage spent on filtering free pages could be neglected.
> > > Grinding a chopper will not hold up the work of cutting firewood :)
> > Sorry I didn't express myself clearly.
> > 
> > My question was that, have you measured how long time it will take
> > from starting of the free page hints (when balloon state is set to
> > FREE_PAGE_REPORT_S_REQUESTED), until it completes (when QEMU receives
> > the VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID, then set the status to
> > FREE_PAGE_REPORT_S_STOP)?
> > 
> 
> I vaguely remember it's several ms (for around 7.5G free pages) long time
> ago. What would be the concern behind that number you want to know?

Because I roughly know the time between two bitmap syncs.  Then I will
know how likely it is that a free page hinting process won't stop
before the next bitmap sync happens.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-08  1:58                   ` [virtio-dev] " Michael S. Tsirkin
  (?)
@ 2018-06-08  2:34                   ` Peter Xu
  2018-06-08  2:49                       ` [virtio-dev] " Michael S. Tsirkin
  -1 siblings, 1 reply; 93+ messages in thread
From: Peter Xu @ 2018-06-08  2:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, dgilbert,
	yang.zhang.wz, quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 08, 2018 at 04:58:21AM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:
> > > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > > 
> > > > "The driver MUST NOT accept a feature which the device did not offer"
> > > > 
> > > > Then I'm curious what would happen if:
> > > > 
> > > - an emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > > - a guest that enabled PAGE_POISON
> > > > 
> > > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > > considering that guest should never set that feature bit if the
> > > > emulation code didn't provide it?
> 
> It wouldn't. It just has to deal with the fact that host can discard
> writes to hinted pages. Right now driver deals with it simply by
> disabling F_FREE_PAGE_HINT.

Ah I see.  Thanks Michael.

Then it seems to me that it's more important to implement F_POISON
right where it is declared, since otherwise it'll be really broken
(the device declares F_POISON, the guest assumes the device can handle
the POISON so the guest will enable FREE_PAGE_HINT, but the device
can't really handle that).

Or, if the guest driver is capable of dropping F_FREE_PAGE_HINT when
F_POISON is not declared, we can safely split the two features into
two patches in QEMU too.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-08  2:34                   ` Peter Xu
@ 2018-06-08  2:49                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 93+ messages in thread
From: Michael S. Tsirkin @ 2018-06-08  2:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, dgilbert,
	yang.zhang.wz, quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 08, 2018 at 10:34:25AM +0800, Peter Xu wrote:
> On Fri, Jun 08, 2018 at 04:58:21AM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:
> > > > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > > > 
> > > > > "The driver MUST NOT accept a feature which the device did not offer"
> > > > > 
> > > > > Then I'm curious what would happen if:
> > > > > 
> > > > - an emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > > > - a guest that enabled PAGE_POISON
> > > > > 
> > > > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > > > considering that guest should never set that feature bit if the
> > > > > emulation code didn't provide it?
> > 
> > It wouldn't. It just has to deal with the fact that host can discard
> > writes to hinted pages. Right now driver deals with it simply by
> > disabling F_FREE_PAGE_HINT.
> 
> Ah I see.  Thanks Michael.
> 
> Then it seems to me that it's more important to implement the F_POISON
> along with where it is declared since otherwise it'll be a real broken
> (device declares F_POISON, guest assumes it can handle the POISON so
> guest will enable FREE_PAGE_HINT, however the device can't really
> handle that).

It seems to handle it fine, it just ignores the hints.

> Or, if the guest driver is capable to drop F_FREE_PAGE_HINT when
> F_POISON is not declared, we can safely split the two features into
> two patches in QEMU too.
> 
> Regards,
> 
> -- 
> Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-08  2:49                       ` [virtio-dev] " Michael S. Tsirkin
  (?)
@ 2018-06-08  3:34                       ` Peter Xu
  -1 siblings, 0 replies; 93+ messages in thread
From: Peter Xu @ 2018-06-08  3:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, dgilbert,
	yang.zhang.wz, quan.xu0, liliang.opensource, pbonzini, nilal

On Fri, Jun 08, 2018 at 05:49:26AM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 08, 2018 at 10:34:25AM +0800, Peter Xu wrote:
> > On Fri, Jun 08, 2018 at 04:58:21AM +0300, Michael S. Tsirkin wrote:
> > > On Fri, Jun 08, 2018 at 09:37:23AM +0800, Peter Xu wrote:
> > > > > > Asked since I see this in virtio spec (v1.0, though I guess it won't
> > > > > > change) in chapter "2.2.1 Driver Requirements: Feature Bits":
> > > > > > 
> > > > > > "The driver MUST NOT accept a feature which the device did not offer"
> > > > > > 
> > > > > > Then I'm curious what would happen if:
> > > > > > 
> > > > > > - an emulator (not QEMU) only offered F_FREE_PAGE_HINT, not F_POISON
> > > > > > - a guest that enabled PAGE_POISON
> > > > > > 
> > > > > > Then how the driver could tell the host that PAGE_POISON is enabled
> > > > > > considering that guest should never set that feature bit if the
> > > > > > emulation code didn't provide it?
> > > 
> > > It wouldn't. It just has to deal with the fact that host can discard
> > > writes to hinted pages. Right now driver deals with it simply by
> > > disabling F_FREE_PAGE_HINT.
> > 
> > Ah I see.  Thanks Michael.
> > 
> > Then it seems to me that it's more important to implement the F_POISON
> > along with where it is declared since otherwise it'll be a real broken
> > (device declares F_POISON, guest assumes it can handle the POISON so
> > guest will enable FREE_PAGE_HINT, however the device can't really
> > handle that).
> 
> It seems to handle it fine, it just ignores the hints.

Ok I misunderstood.  Then that's fine.

The TODO note in the commit message is a bit misleading:

"TODO: - handle the case when page poisoning is in use"

It seems to me it should instead say:

"Now we handle the page poisoning by dropping the page hints
 directly.  In the future we might do something better."

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-08  2:17                           ` [Qemu-devel] " Peter Xu
@ 2018-06-08  7:14                               ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-08  7:14 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On 06/08/2018 10:17 AM, Peter Xu wrote:
> On Thu, Jun 07, 2018 at 07:59:22PM +0800, Wei Wang wrote:
>> Not necessarily _need_ to share it, I meant it can be shared using qemu
>> command line.
>> Live migration doesn't happen all the time, and that optimization doesn't
>> run that long, if users want to have other BHs run in this iothread context,
>> they can only create one iothread via the qemu cmd line.
> IMO iothreads and aiocontexts are for event-driven model.

To me it's just a thread which polls for submitted callbacks to run.
When migration reaches the place where it needs to submit the
optimization function, it calls start() to submit it. I'm not sure why
there is a worry about what's inside the callback.
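[Editor's note: the per-hint locking quoted earlier in the thread (take the lock, process one hint, release the lock) can be modeled as below. This is a simplified stand-in, not the actual virtio_balloon_poll_free_page_hints: the vq is reduced to a counter, but the point is that free_page_lock is released between hints so a concurrent stop() can grab it between iterations.]

```c
#include <pthread.h>

/* free_page_lock is held only for one hint at a time, so a concurrent
 * stop() can take the lock between loop iterations. */
pthread_mutex_t free_page_lock = PTHREAD_MUTEX_INITIALIZER;
int hints_pending = 3;   /* stand-in for entries waiting in the vq */
int hints_processed;

void poll_free_page_hints(void)
{
    for (;;) {
        pthread_mutex_lock(&free_page_lock);
        if (hints_pending == 0) {        /* vq drained: leave the loop */
            pthread_mutex_unlock(&free_page_lock);
            break;
        }
        hints_pending--;                 /* "process 1 hint from the vq" */
        hints_processed++;
        pthread_mutex_unlock(&free_page_lock);  /* stop() can run here */
    }
}
```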

> Busy loop
> is not an event-driven model.  Here if we want a busy loop I'll create
> a thread when start page hinting, then join the thread when done.

  The old (v4) implementation worked that way, as you mentioned above,
and Michael suggested using an iothread in the previous discussion. I'm
fine with both, actually. For the virtio part, we've had many
discussions; I would keep the choice I made with Michael before, unless
there is an obvious advantage (e.g. proven better performance).


>
> But I'll stop commenting on this.  Please prepare a more clear
> interface for migration in your next post.  I'll read that.
>

Sure, thanks. The new version is coming soon.




>>
>>>>> I'm a bit curious on how much time will it use to do
>>>>> one round of the free page hints (e.g., an idle guest with 8G mem, or
>>>>> any configuration you tested)?  I suppose during that time the
>>>>> iothread will be held steady with 100% cpu usage, am I right?
>>>> Compared to the time spent by the legacy migration to send free pages, that
>>>> small amount of CPU usage spent on filtering free pages could be neglected.
>>>> Grinding a chopper will not hold up the work of cutting firewood :)
>>> Sorry I didn't express myself clearly.
>>>
>>> My question was that, have you measured how long time it will take
>>> from starting of the free page hints (when balloon state is set to
>>> FREE_PAGE_REPORT_S_REQUESTED), until it completes (when QEMU receives
>>> the VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID, then set the status to
>>> FREE_PAGE_REPORT_S_STOP)?
>>>
>> I vaguely remember it's several ms (for around 7.5G free pages) long time
>> ago. What would be the concern behind that number you want to know?
> Because roughly I know the time between two bitmap syncs.  Then I will
> know how possible a free page hinting process won't stop until the
> next bitmap sync happens.

We have a function, stop(), to stop the optimization before the next 
bitmap sync if the optimization is still running. But I never saw that
case happen (the free page hinting finishes by itself before that).

Best,
Wei
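[Editor's note: the start()/stop() behaviour described above can be sketched as a tiny state machine. The state names loosely follow the FREE_PAGE_REPORT_S_* names mentioned earlier in the thread; this is an illustration, not the QEMU code.]

```c
#include <assert.h>

/* Hypothetical states, loosely following the FREE_PAGE_REPORT_S_*
 * names used in the discussion. */
enum HintState { S_STOP, S_REQUESTED, S_DONE };

enum HintState hint_state = S_STOP;

/* Called when a bitmap sync starts: ask the guest for free page hints. */
void free_page_start(void)
{
    hint_state = S_REQUESTED;
}

/* Normally the guest finishes on its own and sends the STOP_ID. */
void guest_reports_stop_id(void)
{
    if (hint_state == S_REQUESTED) {
        hint_state = S_DONE;
    }
}

/* Called before the next bitmap sync: force the optimization off in
 * case it is somehow still running (which, per the discussion above,
 * was never observed in practice). */
void free_page_stop(void)
{
    hint_state = S_STOP;
}
```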

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-06-07  6:32                       ` [Qemu-devel] " Peter Xu
@ 2018-06-08  7:31                           ` Wei Wang
  2018-06-08  7:31                           ` [virtio-dev] " Wei Wang
  1 sibling, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-08  7:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On 06/07/2018 02:32 PM, Peter Xu wrote:
>>>> Btw, the migration_state_notifiers is already there, but seems not really
>>>> used (I only tracked spice-core.c called
>>>> add_migration_state_change_notifier). I thought adding new migration states
>>>> can reuse all that we have.
>>>> What's your real concern about that? (not sure how defining new events would
>>>> make a difference)
>>> Migration state is exposed via control path (QMP).  Adding new states
>>> mean that the QMP clients will see more.  IMO that's not really
>>> anything that a QMP client will need to know, instead we can keep it
>>> internally.  That's a reason from compatibility pov.
>>>
>>> Meanwhile, it's not really a state-thing at all for me.  It looks
>>> really more like hook or event (start/stop of sync).
>> Thanks for sharing your concerns in detail, which are quite helpful for the
>> discussion. To reuse 99a0db9b, we can also add sub-states (or say events),
>> instead of new migration states.
>> For example, we can still define "enum RamSaveState" as above, which can be
> an indication for the notifier queued on the 99a0db9b notifier_list to decide
>> whether to call start or stop.
>> Does this solve your concern?
> Frankly speaking I don't fully understand how you would add that
> sub-state.  If you are confident with the idea, maybe you can post
> your new version with the change, then I can read the code.

Reusing 99a0db9b works well, but I find it clearer to let the ram save
state have its own notifier list. I'll show how that works in v8.

Best,
Wei
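[Editor's note: a dedicated notifier list for ram save events, separate from the QMP-visible migration states, could look roughly like the sketch below. The RAM_SAVE_BEGIN/RAM_SAVE_END names follow the enum proposed earlier in the thread; everything else (function names, the fixed-size array) is made up for illustration.]

```c
#include <assert.h>
#include <stddef.h>

/* Events follow the RamSaveState enum proposed earlier in the thread. */
enum RamSaveEvent { RAM_SAVE_BEGIN, RAM_SAVE_END };

typedef void (*RamSaveNotifier)(enum RamSaveEvent ev, void *opaque);

#define MAX_NOTIFIERS 8
RamSaveNotifier notifiers[MAX_NOTIFIERS];
void *opaques[MAX_NOTIFIERS];
int n_notifiers;

void ram_save_add_notifier(RamSaveNotifier fn, void *opaque)
{
    if (n_notifiers < MAX_NOTIFIERS) {
        notifiers[n_notifiers] = fn;
        opaques[n_notifiers] = opaque;
        n_notifiers++;
    }
}

/* Fired at the start and end of each bitmap sync; e.g. the balloon
 * device could register a callback mapping RAM_SAVE_BEGIN to its
 * start() hook and RAM_SAVE_END to its stop() hook. */
void ram_save_notify(enum RamSaveEvent ev)
{
    for (int i = 0; i < n_notifiers; i++) {
        notifiers[i](ev, opaques[i]);
    }
}

/* Sample callback that just counts the events it receives. */
int begin_count, end_count;
void count_events(enum RamSaveEvent ev, void *opaque)
{
    (void)opaque;
    if (ev == RAM_SAVE_BEGIN) {
        begin_count++;
    } else {
        end_count++;
    }
}
```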

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [virtio-dev] Re: [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
@ 2018-06-08  7:31                           ` Wei Wang
  0 siblings, 0 replies; 93+ messages in thread
From: Wei Wang @ 2018-06-08  7:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: Michael S. Tsirkin, qemu-devel, virtio-dev, quintela, dgilbert,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel, zhang.zhanghailiang

On 06/07/2018 02:32 PM, Peter Xu wrote:
>>>> Btw, the migration_state_notifiers is already there, but seems not really
>>>> used (I only tracked spice-core.c called
>>>> add_migration_state_change_notifier). I thought adding new migration states
>>>> can reuse all that we have.
>>>> What's your real concern about that? (not sure how defining new events would
>>>> make a difference)
>>> Migration state is exposed via control path (QMP).  Adding new states
>>> mean that the QMP clients will see more.  IMO that's not really
>>> anything that a QMP client will need to know, instead we can keep it
>>> internally.  That's a reason from compatibility pov.
>>>
>>> Meanwhile, it's not really a state-thing at all for me.  It looks
>>> really more like hook or event (start/stop of sync).
>> Thanks for sharing your concerns in detail, which are quite helpful for the
>> discussion. To reuse 99a0db9b, we can also add sub-states (or say events),
>> instead of new migration states.
>> For example, we can still define "enum RamSaveState" as above, which can be
>> an indication for the notifiers queued on the 99a0db9b notifier_list to
>> decide whether to call start or stop.
>> Does this solve your concern?
> Frankly speaking I don't fully understand how you would add that
> sub-state.  If you are confident with the idea, maybe you can post
> your new version with the change, then I can read the code.

Reusing 99a0db9b works fine, but I find it clearer to let the ram save 
state have its own notifier list. I will show how that works in v8.

Best,
Wei



^ permalink raw reply	[flat|nested] 93+ messages in thread

end of thread, other threads:[~2018-06-08  7:28 UTC | newest]

Thread overview: 93+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-24  6:13 [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support Wei Wang
2018-04-24  6:13 ` [virtio-dev] " Wei Wang
2018-04-24  6:13 ` [Qemu-devel] [PATCH v7 1/5] bitmap: bitmap_count_one_with_offset Wei Wang
2018-04-24  6:13   ` [virtio-dev] " Wei Wang
2018-04-24  6:13 ` [Qemu-devel] [PATCH v7 2/5] migration: use bitmap_mutex in migration_bitmap_clear_dirty Wei Wang
2018-04-24  6:13   ` [virtio-dev] " Wei Wang
2018-06-01  3:37   ` [Qemu-devel] " Peter Xu
2018-04-24  6:13 ` [Qemu-devel] [PATCH v7 3/5] migration: API to clear bits of guest free pages from the dirty bitmap Wei Wang
2018-04-24  6:13   ` [virtio-dev] " Wei Wang
2018-06-01  4:00   ` [Qemu-devel] " Peter Xu
2018-06-01  7:36     ` Wei Wang
2018-06-01  7:36       ` [virtio-dev] " Wei Wang
2018-06-01 10:06       ` Peter Xu
2018-06-01 12:32         ` Wei Wang
2018-06-01 12:32           ` [virtio-dev] " Wei Wang
2018-06-04  2:49           ` Peter Xu
2018-06-04  7:43             ` Wei Wang
2018-06-04  7:43               ` [virtio-dev] " Wei Wang
2018-04-24  6:13 ` [Qemu-devel] [PATCH v7 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT Wei Wang
2018-04-24  6:13   ` [virtio-dev] " Wei Wang
2018-05-29 15:24   ` [Qemu-devel] " Michael S. Tsirkin
2018-05-29 15:24     ` [virtio-dev] " Michael S. Tsirkin
2018-05-30  9:12     ` [Qemu-devel] " Wei Wang
2018-05-30  9:12       ` [virtio-dev] " Wei Wang
2018-05-30 12:47       ` [Qemu-devel] " Michael S. Tsirkin
2018-05-30 12:47         ` [virtio-dev] " Michael S. Tsirkin
2018-05-31  2:27         ` [Qemu-devel] " Wei Wang
2018-05-31  2:27           ` [virtio-dev] " Wei Wang
2018-05-31 17:42           ` [Qemu-devel] " Michael S. Tsirkin
2018-05-31 17:42             ` [virtio-dev] " Michael S. Tsirkin
2018-06-01  3:18             ` [Qemu-devel] " Wei Wang
2018-06-01  3:18               ` [virtio-dev] " Wei Wang
2018-06-04  8:04         ` [Qemu-devel] " Wei Wang
2018-06-04  8:04           ` [virtio-dev] " Wei Wang
2018-06-05  6:58           ` [Qemu-devel] " Peter Xu
2018-06-05 13:22             ` Wei Wang
2018-06-05 13:22               ` [virtio-dev] " Wei Wang
2018-06-06  5:42               ` [Qemu-devel] " Peter Xu
2018-06-06 10:04                 ` Wei Wang
2018-06-06 10:04                   ` [virtio-dev] " Wei Wang
2018-06-06 11:02                   ` [Qemu-devel] " Peter Xu
2018-06-07  5:24                     ` Wei Wang
2018-06-07  5:24                       ` [virtio-dev] " Wei Wang
2018-06-07  6:32                       ` [Qemu-devel] " Peter Xu
2018-06-07 11:59                         ` Wei Wang
2018-06-07 11:59                           ` [virtio-dev] " Wei Wang
2018-06-08  2:17                           ` [Qemu-devel] " Peter Xu
2018-06-08  7:14                             ` Wei Wang
2018-06-08  7:14                               ` [virtio-dev] " Wei Wang
2018-06-08  7:31                         ` [Qemu-devel] " Wei Wang
2018-06-08  7:31                           ` [virtio-dev] " Wei Wang
2018-06-06  6:43   ` [Qemu-devel] " Peter Xu
2018-06-06 10:11     ` Wei Wang
2018-06-06 10:11       ` [virtio-dev] " Wei Wang
2018-06-07  3:17       ` Peter Xu
2018-06-07  5:29         ` Wei Wang
2018-06-07  5:29           ` [virtio-dev] " Wei Wang
2018-06-07  6:58           ` Peter Xu
2018-06-07 12:01             ` Wei Wang
2018-06-07 12:01               ` [virtio-dev] " Wei Wang
2018-06-08  1:37               ` Peter Xu
2018-06-08  1:58                 ` Peter Xu
2018-06-08  1:58                 ` Michael S. Tsirkin
2018-06-08  1:58                   ` [virtio-dev] " Michael S. Tsirkin
2018-06-08  2:34                   ` Peter Xu
2018-06-08  2:49                     ` Michael S. Tsirkin
2018-06-08  2:49                       ` [virtio-dev] " Michael S. Tsirkin
2018-06-08  3:34                       ` Peter Xu
2018-04-24  6:13 ` [Qemu-devel] [PATCH v7 5/5] migration: use the free page hint feature from balloon Wei Wang
2018-04-24  6:13   ` [virtio-dev] " Wei Wang
2018-04-24  6:42 ` [Qemu-devel] [PATCH v7 0/5] virtio-balloon: free page hint reporting support Wei Wang
2018-04-24  6:42   ` [virtio-dev] " Wei Wang
2018-05-14  1:22 ` [Qemu-devel] " Wei Wang
2018-05-14  1:22   ` [virtio-dev] " Wei Wang
2018-05-29 15:00 ` [Qemu-devel] " Hailiang Zhang
2018-05-29 15:24   ` Michael S. Tsirkin
2018-05-29 15:24     ` [virtio-dev] " Michael S. Tsirkin
2018-06-01  4:58 ` Peter Xu
2018-06-01  5:07   ` Peter Xu
2018-06-01  7:29     ` Wei Wang
2018-06-01  7:29       ` [virtio-dev] " Wei Wang
2018-06-01 10:02       ` Peter Xu
2018-06-01 12:31         ` Wei Wang
2018-06-01 12:31           ` [virtio-dev] " Wei Wang
2018-06-01  7:21   ` Wei Wang
2018-06-01  7:21     ` [virtio-dev] " Wei Wang
2018-06-01 10:40     ` Peter Xu
2018-06-01 15:33       ` Dr. David Alan Gilbert
2018-06-05  6:42         ` Peter Xu
2018-06-05 14:40           ` Michael S. Tsirkin
2018-06-05 14:40             ` [virtio-dev] " Michael S. Tsirkin
2018-06-05 14:39         ` Michael S. Tsirkin
2018-06-05 14:39           ` [virtio-dev] " Michael S. Tsirkin
