* [Qemu-devel] [PATCH v5 0/5] virtio-balloon: free page hint reporting support
From: Wei Wang @ 2018-03-16 10:48 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

This is the device part implementation to add a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT, to the virtio-balloon device. The
device receives the guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not transferred by the migration thread to the destination.

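For illustration, the core mechanism can be sketched outside QEMU: keep a
dirty bitmap over the guest's pages and clear the bits covered by a free
page hint, so those pages drop out of what the migration thread must send.
A minimal, self-contained sketch (illustrative C, not QEMU code; the 4K
page size mirrors TARGET_PAGE_BITS and the bitmap helpers are simplified
stand-ins):

    #include <stdio.h>
    #include <string.h>
    #include <limits.h>

    #define PAGE_SHIFT    12      /* 4K pages, like TARGET_PAGE_BITS */
    #define NR_PAGES      64
    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
    #define BITMAP_LONGS  ((NR_PAGES + BITS_PER_LONG - 1) / BITS_PER_LONG)

    static void bitmap_clear_range(unsigned long *map, long start, long nr)
    {
        for (long i = start; i < start + nr; i++) {
            map[i / BITS_PER_LONG] &= ~(1UL << (i % BITS_PER_LONG));
        }
    }

    static long bitmap_count_ones(const unsigned long *map, long nbits)
    {
        long count = 0;
        for (long i = 0; i < nbits; i++) {
            count += (map[i / BITS_PER_LONG] >> (i % BITS_PER_LONG)) & 1;
        }
        return count;
    }

    int main(void)
    {
        unsigned long dirty[BITMAP_LONGS];

        memset(dirty, 0xff, sizeof(dirty)); /* bulk stage: all pages dirty */
        printf("dirty before hint: %ld\n", bitmap_count_ones(dirty, NR_PAGES));

        /* A hint covering 16 free pages starting at page 8. */
        size_t offset = 8UL << PAGE_SHIFT, len = 16UL << PAGE_SHIFT;
        bitmap_clear_range(dirty, offset >> PAGE_SHIFT, len >> PAGE_SHIFT);

        printf("dirty after hint:  %ld\n", bitmap_count_ones(dirty, NR_PAGES));
        return 0;
    }
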
- Test Environment
    Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    Guest: 8G RAM, 4 vCPU
    Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds

- Test Results
    - Idle Guest Live Migration Time (results are averaged over 10 runs):
        - Optimization vs. Legacy = 261ms vs. 1769ms --> ~86% reduction
    - Guest with Linux Compilation Workload (make bzImage -j4):
        - Live Migration Time (average)
          Optimization vs. Legacy = 1260ms vs. 2634ms --> ~51% reduction
        - Linux Compilation Time
          Optimization vs. Legacy = 4min58s vs. 5min3s
          --> no obvious difference

- Source Code
    - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
    - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git

ChangeLog:
v4->v5:
    1) migration:
        - bitmap_clear_dirty: update the dirty bitmap and dirty page
          count under the bitmap mutex, as other functions do;
        - qemu_guest_free_page_hint:
            - add comments for this function;
            - check the !block case;
            - check "offset > block->used_length" before proceeding;
            - assign used_len inside the for{} body;
            - update the dirty bitmap and dirty page counter under the
              bitmap mutex;
        - ram_state_reset:
            - rs->free_page_support: use "migrate_postcopy" instead of
              migration_in_postcopy in the && condition;
            - clear the ram_bulk_stage flag if free_page_support is true;
    2) balloon:
         - add usage documentation for balloon_free_page_start and
           balloon_free_page_stop in the code;
         - the optimization thread is named "balloon_fpo" to meet the
           thread name length requirement (less than 14 characters);
         - virtio_balloon_poll_free_page_hints:
             - run only while runstate_is_running() is true;
             - add a qemu spin lock to synchronize accesses to the free
               page reporting related fields shared between the migration
               thread and the optimization thread;
         - virtio_balloon_free_page_start: return immediately if
           runstate_is_running() is false;
         - virtio_balloon_free_page_stop: access the free page
           reporting related fields under a qemu spin lock;
         - virtio_balloon_device_unrealize/reset: call
           virtio_balloon_free_page_stop if the free page hint feature
           is used;
         - virtio_balloon_set_status: call virtio_balloon_free_page_stop
           in case the guest is stopped by qmp when the optimization is
           running;
v3->v4:
    1) bitmap: add a new API to count 1s starting from an offset of a
       bitmap
    2) migration:
        - qemu_guest_free_page_hint: calculate
          ram_state->migration_dirty_pages by counting how many bits of
          free pages are truly cleared. If some of the bits were
          already 0, they shouldn't be deducted from
          ram_state->migration_dirty_pages. This wasn't needed in
          previous versions, since we optimized the bulk stage only,
          where all bits are guaranteed to be set. It's needed now
          because we extended the usage of this optimization to all
          stages except the last stop&copy stage. From the 2nd stage
          onward, some bits of free pages may already be 0.
     3) virtio-balloon:
         - virtio_balloon_free_page_report_status: introduce a new status,
           FREE_PAGE_REPORT_S_EXIT. This status indicates that the
           optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
           the reporting is stopped, but the optimization thread still needs
           to be joined by the migration thread.
v2->v3:
    1) virtio-balloon
        - virtio_balloon_free_page_start: poll the hints using a new
          thread;
        - use cmd ids in the range [0x80000000, UINT_MAX];
        - virtio_balloon_poll_free_page_hints:
            - stop the optimization only when it has started;
            - don't skip free pages when !poison_val;
        - add poison_val to vmsd to migrate;
        - virtio_balloon_get_features: add the F_PAGE_POISON feature when
          host has F_FREE_PAGE_HINT;
        - remove the timer patch which is not needed now.
    2) migration
       - new API, qemu_guest_free_page_hint;
       - rs->free_page_support set only in the precopy case;
       - use the new balloon APIs.
v1->v2: 
    1) virtio-balloon
        - use subsections to save free_page_report_cmd_id;
        - poll the free page vq after sending a cmd id to the driver;
        - change the free page vq size to VIRTQUEUE_MAX_SIZE;
        - virtio_balloon_poll_free_page_hints: handle the corner case
          that the free page block reported from the driver may cross
          the RAMBlock boundary.
    2) migration/ram.c
        - use balloon_free_page_poll to start the optimization

Wei Wang (5):
  bitmap: bitmap_count_one_with_offset
  migration: use bitmap_mutex in migration_bitmap_clear_dirty
  migration: API to clear bits of guest free pages from the dirty bitmap
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  migration: use the free page hint feature from balloon

 balloon.c                                       |  58 +++++--
 hw/virtio/virtio-balloon.c                      | 217 ++++++++++++++++++++++--
 include/hw/virtio/virtio-balloon.h              |  20 ++-
 include/migration/misc.h                        |   2 +
 include/qemu/bitmap.h                           |  13 ++
 include/standard-headers/linux/virtio_balloon.h |   7 +
 include/sysemu/balloon.h                        |  15 +-
 migration/ram.c                                 |  73 +++++++-
 8 files changed, 375 insertions(+), 30 deletions(-)

-- 
1.8.3.1

* [Qemu-devel] [PATCH v5 1/5] bitmap: bitmap_count_one_with_offset
From: Wei Wang @ 2018-03-16 10:48 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

Count the number of 1s in a bitmap starting from an offset.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 include/qemu/bitmap.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..e3f31f1 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -228,6 +228,19 @@ static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
     }
 }
 
+static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
+                                                long offset, long nbits)
+{
+    long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
+    long redundant_bits = offset - aligned_offset;
+    long bits_to_count = nbits + redundant_bits;
+    const unsigned long *bitmap_start = bitmap +
+                                        aligned_offset / BITS_PER_LONG;
+
+    return bitmap_count_one(bitmap_start, bits_to_count) -
+           bitmap_count_one(bitmap_start, redundant_bits);
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
1.8.3.1

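To show how the alignment trick in the hunk above behaves, here is a
standalone re-implementation of the same counting scheme using
__builtin_popcountl (a sketch for illustration; the helper names are
hypothetical, not the QEMU ones):

    #include <stdio.h>
    #include <limits.h>

    #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

    /* Count 1s in the first nbits of the bitmap: whole-word popcounts,
     * then mask the trailing partial word. */
    static long count_ones(const unsigned long *map, long nbits)
    {
        long count = 0, i;
        for (i = 0; i < nbits / (long)BITS_PER_LONG; i++) {
            count += __builtin_popcountl(map[i]);
        }
        if (nbits % BITS_PER_LONG) {
            count += __builtin_popcountl(map[i] &
                                         ((1UL << (nbits % BITS_PER_LONG)) - 1));
        }
        return count;
    }

    /* Same scheme as bitmap_count_one_with_offset: align the offset down
     * to a word boundary, count from there, and subtract the leading
     * "redundant" bits that fall before the real offset. */
    static long count_ones_with_offset(const unsigned long *map, long offset,
                                       long nbits)
    {
        long aligned = offset & ~((long)BITS_PER_LONG - 1);
        long redundant = offset - aligned;
        const unsigned long *start = map + aligned / (long)BITS_PER_LONG;

        return count_ones(start, nbits + redundant) -
               count_ones(start, redundant);
    }

    int main(void)
    {
        unsigned long map[2] = { ~0UL, ~0UL };  /* 128 bits, all set */

        /* Bits [50, 120) are all 1, so this prints 70. */
        printf("%ld\n", count_ones_with_offset(map, 50, 70));
        return 0;
    }
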
* [Qemu-devel] [PATCH v5 2/5] migration: use bitmap_mutex in migration_bitmap_clear_dirty
From: Wei Wang @ 2018-03-16 10:48 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

The bitmap mutex is used to synchronize threads that update the dirty
bitmap and the migration_dirty_pages count. This patch makes
migration_bitmap_clear_dirty update the bitmap and the count under the
mutex.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 migration/ram.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 7266351..38c991d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -790,11 +790,14 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
 {
     bool ret;
 
+    qemu_mutex_lock(&rs->bitmap_mutex);
     ret = test_and_clear_bit(page, rb->bmap);
 
     if (ret) {
         rs->migration_dirty_pages--;
     }
+    qemu_mutex_unlock(&rs->bitmap_mutex);
+
     return ret;
 }
 
-- 
1.8.3.1

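The race this guards against is the read-modify-write on
migration_dirty_pages: from patch 3/5 onward, the free page hint path
clears bits and adjusts the same counter concurrently with the migration
thread. A stripped-down pthread sketch of the locking pattern
(illustrative names and a single-word bitmap, not QEMU code):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t bitmap_mutex = PTHREAD_MUTEX_INITIALIZER;
    static unsigned long bmap[1] = { ~0UL };   /* 64 "pages", all dirty */
    static long migration_dirty_pages = 64;

    /* Clear one bit and keep the counter consistent, as
     * migration_bitmap_clear_dirty does under rs->bitmap_mutex. */
    static int clear_dirty(long page)
    {
        int was_set;

        pthread_mutex_lock(&bitmap_mutex);
        was_set = (bmap[0] >> page) & 1;
        bmap[0] &= ~(1UL << page);
        if (was_set) {
            migration_dirty_pages--;
        }
        pthread_mutex_unlock(&bitmap_mutex);
        return was_set;
    }

    static void *hint_thread(void *arg)
    {
        for (long page = 0; page < 32; page++) {   /* "free page hints" */
            clear_dirty(page);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        pthread_create(&t, NULL, hint_thread, NULL);
        for (long page = 32; page < 64; page++) {  /* "migration thread" */
            clear_dirty(page);
        }
        pthread_join(t, NULL);

        /* Without the mutex, lost updates could leave this nonzero. */
        printf("migration_dirty_pages = %ld\n", migration_dirty_pages);
        return 0;
    }
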
* [Qemu-devel] [PATCH v5 3/5] migration: API to clear bits of guest free pages from the dirty bitmap
From: Wei Wang @ 2018-03-16 10:48 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 include/migration/misc.h |  2 ++
 migration/ram.c          | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 77fd4f5..fae1acf 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 38c991d..2e82181 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2193,6 +2193,50 @@ static int ram_init_all(RAMState **rsp)
 }
 
 /*
+ * This function clears bits of the free pages reported by the caller from the
+ * migration dirty bitmap. @addr is the host address corresponding to the
+ * start of the contiguous guest free pages, and @len is the total bytes of
+ * those pages.
+ */
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+    RAMBlock *block;
+    ram_addr_t offset;
+    size_t used_len, start, npages;
+
+    for (; len > 0; len -= used_len) {
+        block = qemu_ram_block_from_host(addr, false, &offset);
+        if (unlikely(!block)) {
+            return;
+        }
+
+        /*
+         * This handles the case that the RAMBlock is resized after the free
+         * page hint is reported.
+         */
+        if (unlikely(offset > block->used_length)) {
+            return;
+        }
+
+        if (len <= block->used_length - offset) {
+            used_len = len;
+        } else {
+            used_len = block->used_length - offset;
+            addr += used_len;
+        }
+
+        start = offset >> TARGET_PAGE_BITS;
+        npages = used_len >> TARGET_PAGE_BITS;
+
+        qemu_mutex_lock(&ram_state->bitmap_mutex);
+        ram_state->migration_dirty_pages -=
+                      bitmap_count_one_with_offset(block->bmap, start, npages);
+        bitmap_clear(block->bmap, start, npages);
+        qemu_mutex_unlock(&ram_state->bitmap_mutex);
+    }
+}
+
+/*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
  * start to become numerous it will be necessary to reduce the
-- 
1.8.3.1

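The loop above walks a hint that may span RAMBlocks: each pass resolves
the host address to a block, trims the length to what is left in that
block, clears those bits, and advances. A self-contained sketch of just
the splitting logic over two mock, host-contiguous blocks (hypothetical
types; printing stands in for the bitmap update):

    #include <stdio.h>
    #include <stddef.h>

    #define PAGE_SHIFT 12

    struct mock_block {
        const char *name;
        size_t host_base;     /* pretend host address of the block */
        size_t used_length;
    };

    static struct mock_block blocks[] = {
        { "ram0", 0x100000, 1 << 20 },   /* 1M block */
        { "ram1", 0x200000, 1 << 20 },   /* starts where ram0 ends */
    };

    /* Resolve a "host address" to the block containing it, like
     * qemu_ram_block_from_host; returns NULL if no block matches. */
    static struct mock_block *block_from_host(size_t addr, size_t *offset)
    {
        for (size_t i = 0; i < sizeof(blocks) / sizeof(blocks[0]); i++) {
            if (addr >= blocks[i].host_base &&
                addr < blocks[i].host_base + blocks[i].used_length) {
                *offset = addr - blocks[i].host_base;
                return &blocks[i];
            }
        }
        return NULL;
    }

    static void free_page_hint(size_t addr, size_t len)
    {
        size_t offset, used_len;

        for (; len > 0; len -= used_len, addr += used_len) {
            struct mock_block *block = block_from_host(addr, &offset);
            if (!block) {
                return;
            }
            used_len = len <= block->used_length - offset
                           ? len : block->used_length - offset;
            printf("%s: clear %zu pages at page offset %zu\n", block->name,
                   used_len >> PAGE_SHIFT, offset >> PAGE_SHIFT);
        }
    }

    int main(void)
    {
        /* A hint that starts 64K before the end of ram0 and runs 128K,
         * so it must be split across ram0 and ram1. */
        free_page_hint(0x100000 + (1 << 20) - 0x10000, 0x20000);
        return 0;
    }
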
* [Qemu-devel] [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
From: Wei Wang @ 2018-03-16 10:48 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

balloon_free_page_start - start guest free page hint reporting.
balloon_free_page_stop - stop guest free page hint reporting.

Note: balloon will report pages which were free at the time
of this call. As the reporting happens asynchronously, dirty bit logging
must be enabled before this call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

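A minimal sketch of the call ordering this note implies on the migration
side, with stubs standing in for the QEMU APIs (the stub bodies just
print; the real ram.c integration arrives in patch 5/5):

    #include <stdio.h>

    /* Stubs standing in for the QEMU APIs involved (illustration only). */
    static void memory_global_dirty_log_start(void) { puts("dirty logging on"); }
    static void balloon_free_page_start(void) { puts("hinting started"); }
    static void balloon_free_page_stop(void)  { puts("hinting stopped"); }
    static void migration_bitmap_sync(void)   { puts("bitmap synced"); }

    int main(void)
    {
        /* 1. Dirty logging must already be on: hints arrive
         *    asynchronously, and any page the guest re-dirties after
         *    hinting must still be caught. */
        memory_global_dirty_log_start();

        /* 2. Start reporting; the device clears hinted pages from the
         *    dirty bitmap while migration proceeds. */
        balloon_free_page_start();

        /* 3. Reporting must be stopped (and the hint thread joined)
         *    before the next bitmap sync, so a stale hint can't clear a
         *    bit for a page that was re-dirtied. */
        balloon_free_page_stop();
        migration_bitmap_sync();
        return 0;
    }
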
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
---
 balloon.c                                       |  58 +++++--
 hw/virtio/virtio-balloon.c                      | 217 ++++++++++++++++++++++--
 include/hw/virtio/virtio-balloon.h              |  20 ++-
 include/standard-headers/linux/virtio_balloon.h |   7 +
 include/sysemu/balloon.h                        |  15 +-
 5 files changed, 288 insertions(+), 29 deletions(-)

diff --git a/balloon.c b/balloon.c
index 6bf0a96..87a0410 100644
--- a/balloon.c
+++ b/balloon.c
@@ -36,6 +36,9 @@
 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
+static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
+static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
+static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
 static void *balloon_opaque;
 static bool balloon_inhibited;
 
@@ -64,19 +67,51 @@ static bool have_balloon(Error **errp)
     return true;
 }
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-                             QEMUBalloonStatus *stat_func, void *opaque)
+bool balloon_free_page_support(void)
 {
-    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
-        /* We're already registered one balloon handler.  How many can
-         * a guest really have?
-         */
-        return -1;
+    return balloon_free_page_support_fn &&
+           balloon_free_page_support_fn(balloon_opaque);
+}
+
+/*
+ * Balloon will report pages which were free at the time of this call. As the
+ * reporting happens asynchronously, dirty bit logging must be enabled before
+ * this call is made.
+ */
+void balloon_free_page_start(void)
+{
+    balloon_free_page_start_fn(balloon_opaque);
+}
+
+/*
+ * Guest reporting must be disabled before the migration dirty bitmap is
+ * synchronized.
+ */
+void balloon_free_page_stop(void)
+{
+    balloon_free_page_stop_fn(balloon_opaque);
+}
+
+void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                              QEMUBalloonStatus *stat_fn,
+                              QEMUBalloonFreePageSupport *free_page_support_fn,
+                              QEMUBalloonFreePageStart *free_page_start_fn,
+                              QEMUBalloonFreePageStop *free_page_stop_fn,
+                              void *opaque)
+{
+    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
+        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
+        balloon_opaque) {
+        /* We already registered one balloon handler. */
+        return;
     }
-    balloon_event_fn = event_func;
-    balloon_stat_fn = stat_func;
+
+    balloon_event_fn = event_fn;
+    balloon_stat_fn = stat_fn;
+    balloon_free_page_support_fn = free_page_support_fn;
+    balloon_free_page_start_fn = free_page_start_fn;
+    balloon_free_page_stop_fn = free_page_stop_fn;
     balloon_opaque = opaque;
-    return 0;
 }
 
 void qemu_remove_balloon_handler(void *opaque)
@@ -86,6 +121,9 @@ void qemu_remove_balloon_handler(void *opaque)
     }
     balloon_event_fn = NULL;
     balloon_stat_fn = NULL;
+    balloon_free_page_support_fn = NULL;
+    balloon_free_page_start_fn = NULL;
+    balloon_free_page_stop_fn = NULL;
     balloon_opaque = NULL;
 }
 
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index f456cea..30c7504 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -31,6 +31,7 @@
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
+#include "migration/misc.h"
 
 #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
 
@@ -308,6 +309,127 @@ out:
     }
 }
 
+static void *virtio_balloon_poll_free_page_hints(void *opaque)
+{
+    VirtQueueElement *elem;
+    VirtIOBalloon *dev = opaque;
+    VirtQueue *vq = dev->free_page_vq;
+    uint32_t id;
+    size_t size;
+
+    /* The optimization thread runs only when the guest is running. */
+    while (runstate_is_running()) {
+        qemu_spin_lock(&dev->free_page_lock);
+        /*
+         * If the migration thread actively stops the reporting, exit
+         * immediately.
+         */
+        if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
+            qemu_spin_unlock(&dev->free_page_lock);
+            break;
+        }
+
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            qemu_spin_unlock(&dev->free_page_lock);
+            continue;
+        }
+
+        if (elem->out_num) {
+            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
+            virtqueue_push(vq, elem, size);
+            g_free(elem);
+            if (unlikely(size != sizeof(id))) {
+                warn_report("%s: received an incorrect cmd id", __func__);
+                qemu_spin_unlock(&dev->free_page_lock);
+                break;
+            }
+            if (id == dev->free_page_report_cmd_id) {
+                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+            } else if (dev->free_page_report_status ==
+                       FREE_PAGE_REPORT_S_START) {
+                /*
+                 * Stop the optimization only when it has started. This avoids
+                 * a stale stop sign for the previous command.
+                 */
+                dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+                qemu_spin_unlock(&dev->free_page_lock);
+                break;
+            }
+        }
+
+        if (elem->in_num) {
+            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
+                !dev->poison_val) {
+                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+                                          elem->in_sg[0].iov_len);
+            }
+            virtqueue_push(vq, elem, 0);
+            g_free(elem);
+        }
+        qemu_spin_unlock(&dev->free_page_lock);
+    }
+    return NULL;
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static void virtio_balloon_free_page_start(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    if (!runstate_is_running()) {
+        return;
+    }
+
+    if (unlikely(s->free_page_report_cmd_id == UINT_MAX)) {
+        s->free_page_report_cmd_id =
+                       VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+    } else {
+        s->free_page_report_cmd_id++;
+    }
+
+    s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+    virtio_notify_config(vdev);
+    qemu_thread_create(&s->free_page_thread, "balloon_fpo",
+                       virtio_balloon_poll_free_page_hints, s,
+                       QEMU_THREAD_JOINABLE);
+}
+
+static void virtio_balloon_free_page_stop(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    qemu_spin_lock(&s->free_page_lock);
+    switch (s->free_page_report_status) {
+    case FREE_PAGE_REPORT_S_REQUESTED:
+    case FREE_PAGE_REPORT_S_START:
+        /*
+         * The guest hasn't done the reporting, so host sends a notification
+         * to the guest to actively stop the reporting before joining the
+         * optimization thread.
+         */
+        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+        virtio_notify_config(vdev);
+    case FREE_PAGE_REPORT_S_STOP:
+        /* The guest has stopped the reporting. Join the optimization thread */
+        qemu_thread_join(&s->free_page_thread);
+        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
+    case FREE_PAGE_REPORT_S_EXIT:
+        /* The optimization thread has gone. No further actions needed. */
+        break;
+    }
+    qemu_spin_unlock(&s->free_page_lock);
+}
+
 static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
@@ -315,6 +437,15 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
+    config.poison_val = cpu_to_le32(dev->poison_val);
+
+    if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
+        config.free_page_report_cmd_id =
+                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
+    } else {
+        config.free_page_report_cmd_id =
+                       cpu_to_le32(dev->free_page_report_cmd_id);
+    }
 
     trace_virtio_balloon_get_config(config.num_pages, config.actual);
     memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
@@ -368,6 +499,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
                         ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
                         &error_abort);
     }
+    dev->poison_val = le32_to_cpu(config.poison_val);
     trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -377,6 +509,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+
+    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
+        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
+    }
+
     return f;
 }
 
@@ -413,6 +550,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
     return 0;
 }
 
+static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
+    .name = "virtio-balloon-device/free-page-report",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = virtio_balloon_free_page_support,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
+        VMSTATE_UINT32(poison_val, VirtIOBalloon),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_virtio_balloon_device = {
     .name = "virtio-balloon-device",
     .version_id = 1,
@@ -423,30 +572,31 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
         VMSTATE_UINT32(actual, VirtIOBalloon),
         VMSTATE_END_OF_LIST()
     },
+    .subsections = (const VMStateDescription * []) {
+        &vmstate_virtio_balloon_free_page_report,
+        NULL
+    }
 };
 
 static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOBalloon *s = VIRTIO_BALLOON(dev);
-    int ret;
 
     virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
                 sizeof(struct virtio_balloon_config));
 
-    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
-                                   virtio_balloon_stat, s);
-
-    if (ret < 0) {
-        error_setg(errp, "Only one balloon device is supported");
-        virtio_cleanup(vdev);
-        return;
-    }
-
     s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
-
+    if (virtio_has_feature(s->host_features,
+                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
+        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
+        s->free_page_report_cmd_id =
+                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
+        qemu_spin_init(&s->free_page_lock);
+    }
     reset_stats(s);
 }
 
@@ -455,6 +605,9 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOBalloon *s = VIRTIO_BALLOON(dev);
 
+    if (virtio_balloon_free_page_support(s)) {
+        virtio_balloon_free_page_stop(s);
+    }
     balloon_stats_destroy_timer(s);
     qemu_remove_balloon_handler(s);
     virtio_cleanup(vdev);
@@ -464,6 +617,10 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
 
+    if (virtio_balloon_free_page_support(s)) {
+        virtio_balloon_free_page_stop(s);
+    }
+
     if (s->stats_vq_elem != NULL) {
         virtqueue_unpop(s->svq, s->stats_vq_elem, 0);
         g_free(s->stats_vq_elem);
@@ -475,11 +632,37 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
 
-    if (!s->stats_vq_elem && vdev->vm_running &&
-        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
-        /* poll stats queue for the element we have discarded when the VM
-         * was stopped */
-        virtio_balloon_receive_stats(vdev, s->svq);
+    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+        if (!s->stats_vq_elem && vdev->vm_running &&
+            virtqueue_rewind(s->svq, 1)) {
+            /*
+             * Poll stats queue for the element we have discarded when the VM
+             * was stopped.
+             */
+            virtio_balloon_receive_stats(vdev, s->svq);
+        }
+
+        if (virtio_balloon_free_page_support(s)) {
+            qemu_add_balloon_handler(virtio_balloon_to_target,
+                                     virtio_balloon_stat,
+                                     virtio_balloon_free_page_support,
+                                     virtio_balloon_free_page_start,
+                                     virtio_balloon_free_page_stop,
+                                     s);
+            /*
+             * This handles the case that the guest is being stopped (e.g. by
+             * qmp commands) while the driver is still reporting hints. When
+             * the guest is woken up, it will continue to report hints, which
+             * are not needed. So when the wakeup notifier invokes the
+             * set_status callback here, we get the chance to make sure that
+             * the free page optimization thread is exited via
+             * virtio_balloon_free_page_stop.
+             */
+            virtio_balloon_free_page_stop(s);
+        } else {
+            qemu_add_balloon_handler(virtio_balloon_to_target,
+                                     virtio_balloon_stat, NULL, NULL, NULL, s);
+        }
     }
 }
 
@@ -509,6 +692,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
 static Property virtio_balloon_properties[] = {
     DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
+    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 1ea13bd..cfdba37 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -23,6 +23,8 @@
 #define VIRTIO_BALLOON(obj) \
         OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
+
 typedef struct virtio_balloon_stat VirtIOBalloonStat;
 
 typedef struct virtio_balloon_stat_modern {
@@ -31,15 +33,31 @@ typedef struct virtio_balloon_stat_modern {
        uint64_t val;
 } VirtIOBalloonStatModern;
 
+enum virtio_balloon_free_page_report_status {
+    FREE_PAGE_REPORT_S_REQUESTED,
+    FREE_PAGE_REPORT_S_START,
+    FREE_PAGE_REPORT_S_STOP,
+    FREE_PAGE_REPORT_S_EXIT,
+};
+
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq;
+    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;
+    uint32_t free_page_report_cmd_id;
+    uint32_t poison_val;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
     VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
+    QemuThread free_page_thread;
+    /*
+     * Lock to synchronize threads to access the free page reporting related
+     * fields (e.g. free_page_report_status).
+     */
+    QemuSpin free_page_lock;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
     uint32_t host_features;
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 7b0a41b..f89e80f 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -34,15 +34,22 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
 struct virtio_balloon_config {
 	/* Number of pages host wants Guest to give up. */
 	uint32_t num_pages;
 	/* Number of pages we've actually got in balloon. */
 	uint32_t actual;
+	/* Free page report command id, readonly by guest */
+	uint32_t free_page_report_cmd_id;
+	/* Stores PAGE_POISON if page poisoning is in use */
+	uint32_t poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 66543ae..6561a08 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -18,11 +18,22 @@
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
 typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
+typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
+typedef void (QEMUBalloonFreePageStart)(void *opaque);
+typedef void (QEMUBalloonFreePageStop)(void *opaque);
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-			     QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 bool qemu_balloon_is_inhibited(void);
 void qemu_balloon_inhibit(bool state);
+bool balloon_free_page_support(void);
+void balloon_free_page_start(void);
+void balloon_free_page_stop(void);
+
+void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                              QEMUBalloonStatus *stat_fn,
+                              QEMUBalloonFreePageSupport *free_page_support_fn,
+                              QEMUBalloonFreePageStart *free_page_start_fn,
+                              QEMUBalloonFreePageStop *free_page_stop_fn,
+                              void *opaque);
 
 #endif
-- 
1.8.3.1

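For readers tracing free_page_report_status through the functions above,
this standalone sketch condenses one reporting round's lifecycle:
REQUESTED when the host kicks off a round, START when the poll thread
sees the matching cmd id, STOP when the driver finishes or migration
actively stops it, and EXIT once the thread is joined. It mirrors the
enum added in virtio-balloon.h but is illustrative only:

    #include <stdio.h>

    enum report_status {      /* mirrors virtio_balloon_free_page_report_status */
        S_REQUESTED,          /* start(): cmd id sent, waiting for the driver   */
        S_START,              /* poll thread saw the matching cmd id            */
        S_STOP,               /* driver finished, or migration asked to stop    */
        S_EXIT,               /* poll thread joined; safe to start a new round  */
    };

    static const char *name[] = { "REQUESTED", "START", "STOP", "EXIT" };
    static enum report_status st = S_EXIT;   /* initial state set in realize() */

    static void trans(enum report_status next, const char *why)
    {
        printf("%-9s -> %-9s (%s)\n", name[st], name[next], why);
        st = next;
    }

    int main(void)
    {
        trans(S_REQUESTED, "free_page_start: cmd id++, notify config");
        trans(S_START,     "poll thread: driver echoed current cmd id");
        trans(S_STOP,      "free_page_stop: notify driver to stop");
        trans(S_EXIT,      "free_page_stop: poll thread joined");
        return 0;
    }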
-         * was stopped */
-        virtio_balloon_receive_stats(vdev, s->svq);
+    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+        if (!s->stats_vq_elem && vdev->vm_running &&
+            virtqueue_rewind(s->svq, 1)) {
+            /*
+             * Poll stats queue for the element we have discarded when the VM
+             * was stopped.
+             */
+            virtio_balloon_receive_stats(vdev, s->svq);
+        }
+
+        if (virtio_balloon_free_page_support(s)) {
+            qemu_add_balloon_handler(virtio_balloon_to_target,
+                                     virtio_balloon_stat,
+                                     virtio_balloon_free_page_support,
+                                     virtio_balloon_free_page_start,
+                                     virtio_balloon_free_page_stop,
+                                     s);
+            /*
+             * This handles the case that the guest is being stopped (e.g. by
+             * qmp commands) while the driver is still reporting hints. When
+             * the guest is woken up, it will continue to report hints, which
+             * are not needed. So when the wakeup notifier invokes the
+             * set_status callback here, we get the chance to make sure that
+             * the free page optimization thread is exited via
+             * virtio_balloon_free_page_stop.
+             */
+            virtio_balloon_free_page_stop(s);
+        } else {
+            qemu_add_balloon_handler(virtio_balloon_to_target,
+                                     virtio_balloon_stat, NULL, NULL, NULL, s);
+        }
     }
 }
 
@@ -509,6 +692,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
 static Property virtio_balloon_properties[] = {
     DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
+    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 1ea13bd..cfdba37 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -23,6 +23,8 @@
 #define VIRTIO_BALLOON(obj) \
         OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
+
 typedef struct virtio_balloon_stat VirtIOBalloonStat;
 
 typedef struct virtio_balloon_stat_modern {
@@ -31,15 +33,31 @@ typedef struct virtio_balloon_stat_modern {
        uint64_t val;
 } VirtIOBalloonStatModern;
 
+enum virtio_balloon_free_page_report_status {
+    FREE_PAGE_REPORT_S_REQUESTED,
+    FREE_PAGE_REPORT_S_START,
+    FREE_PAGE_REPORT_S_STOP,
+    FREE_PAGE_REPORT_S_EXIT,
+};
+
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq;
+    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;
+    uint32_t free_page_report_cmd_id;
+    uint32_t poison_val;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
     VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
+    QemuThread free_page_thread;
+    /*
+     * Lock to synchronize threads to access the free page reporting related
+     * fields (e.g. free_page_report_status).
+     */
+    QemuSpin free_page_lock;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
     uint32_t host_features;
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 7b0a41b..f89e80f 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -34,15 +34,22 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
 struct virtio_balloon_config {
 	/* Number of pages host wants Guest to give up. */
 	uint32_t num_pages;
 	/* Number of pages we've actually got in balloon. */
 	uint32_t actual;
+	/* Free page report command id, readonly by guest */
+	uint32_t free_page_report_cmd_id;
+	/* Stores PAGE_POISON if page poisoning is in use */
+	uint32_t poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index 66543ae..6561a08 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -18,11 +18,22 @@
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
 typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
+typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
+typedef void (QEMUBalloonFreePageStart)(void *opaque);
+typedef void (QEMUBalloonFreePageStop)(void *opaque);
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-			     QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 bool qemu_balloon_is_inhibited(void);
 void qemu_balloon_inhibit(bool state);
+bool balloon_free_page_support(void);
+void balloon_free_page_start(void);
+void balloon_free_page_stop(void);
+
+void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                              QEMUBalloonStatus *stat_fn,
+                              QEMUBalloonFreePageSupport *free_page_support_fn,
+                              QEMUBalloonFreePageStart *free_page_start_fn,
+                              QEMUBalloonFreePageStop *free_page_stop_fn,
+                              void *opaque);
 
 #endif
-- 
1.8.3.1



* [Qemu-devel] [PATCH v5 5/5] migration: use the free page hint feature from balloon
  2018-03-16 10:48 ` [virtio-dev] " Wei Wang
@ 2018-03-16 10:48   ` Wei Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-03-16 10:48 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

Start the free page optimization after the migration bitmap is
synchronized. This can't be used in the stop&copy phase since the guest
is paused. Make sure the guest reporting has stopped before
synchronizing the migration dirty bitmap. Currently, the optimization is
added to precopy only.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 migration/ram.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)
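
A quick sketch of the intended call ordering (restating the hunks below;
balloon_free_page_start/stop are the helpers added earlier in this series):

    static void migration_bitmap_sync(RAMState *rs)
    {
        if (rs->free_page_support) {
            balloon_free_page_stop();  /* reporting must stop before the sync */
        }
        /* ... synchronize the migration dirty bitmap ... */
        if (rs->free_page_support) {
            balloon_free_page_start(); /* restart hinting for the next round */
        }
    }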

diff --git a/migration/ram.c b/migration/ram.c
index 2e82181..8589a51 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -51,6 +51,7 @@
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
 #include "migration/block.h"
+#include "sysemu/balloon.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -208,6 +209,8 @@ struct RAMState {
     uint32_t last_version;
     /* We are in the first round */
     bool ram_bulk_stage;
+    /* The free page optimization feature is supported */
+    bool free_page_support;
     /* How many times we have dirty too many pages */
     int dirty_rate_high_cnt;
     /* these variables are used for bitmap sync */
@@ -836,6 +839,10 @@ static void migration_bitmap_sync(RAMState *rs)
     int64_t end_time;
     uint64_t bytes_xfer_now;
 
+    if (rs->free_page_support) {
+        balloon_free_page_stop();
+    }
+
     ram_counters.dirty_sync_count++;
 
     if (!rs->time_last_bitmap_sync) {
@@ -902,6 +909,10 @@ static void migration_bitmap_sync(RAMState *rs)
     if (migrate_use_events()) {
         qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
     }
+
+    if (rs->free_page_support) {
+        balloon_free_page_start();
+    }
 }
 
 /**
@@ -1658,7 +1669,17 @@ static void ram_state_reset(RAMState *rs)
     rs->last_sent_block = NULL;
     rs->last_page = 0;
     rs->last_version = ram_list.version;
-    rs->ram_bulk_stage = true;
+    rs->free_page_support = balloon_free_page_support() && !migrate_postcopy();
+    if (rs->free_page_support) {
+        /*
+         * When the free page optimization is used, not all pages are treated
+         * as dirty (via migration_bitmap_find_dirty) and sent, so disable
+         * ram_bulk_stage, which assumes every page is dirty in round one.
+         */
+        rs->ram_bulk_stage = false;
+    } else {
+        rs->ram_bulk_stage = true;
+    }
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -2364,6 +2385,9 @@ out:
 
     ret = qemu_file_get_error(f);
     if (ret < 0) {
+        if (rs->free_page_support) {
+            balloon_free_page_stop();
+        }
         return ret;
     }
 
-- 
1.8.3.1


* Re: [Qemu-devel] [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-16 10:48   ` [virtio-dev] " Wei Wang
@ 2018-03-16 15:16     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2018-03-16 15:16 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
> The new feature enables the virtio-balloon device to receive hints of
> guest free pages from the free page vq.
> 
> balloon_free_page_start - start guest free page hint reporting.
> balloon_free_page_stop - stop guest free page hint reporting.
> 
> Note: balloon will report pages which were free at the time
> of this call. As the reporting happens asynchronously, dirty bit logging
> must be enabled before this call is made. Guest reporting must be
> disabled before the migration dirty bitmap is synchronized.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> ---
>  balloon.c                                       |  58 +++++--
>  hw/virtio/virtio-balloon.c                      | 217 ++++++++++++++++++++++--
>  include/hw/virtio/virtio-balloon.h              |  20 ++-
>  include/standard-headers/linux/virtio_balloon.h |   7 +
>  include/sysemu/balloon.h                        |  15 +-
>  5 files changed, 288 insertions(+), 29 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index 6bf0a96..87a0410 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -36,6 +36,9 @@
>  
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
> +static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
> +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
>  
> @@ -64,19 +67,51 @@ static bool have_balloon(Error **errp)
>      return true;
>  }
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque)
> +bool balloon_free_page_support(void)
>  {
> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> -        /* We're already registered one balloon handler.  How many can
> -         * a guest really have?
> -         */
> -        return -1;
> +    return balloon_free_page_support_fn &&
> +           balloon_free_page_support_fn(balloon_opaque);
> +}
> +
> +/*
> + * Balloon will report pages which were free at the time of this call. As the
> + * reporting happens asynchronously, dirty bit logging must be enabled before
> + * this call is made.
> + */
> +void balloon_free_page_start(void)
> +{
> +    balloon_free_page_start_fn(balloon_opaque);
> +}
> +
> +/*
> + * Guest reporting must be disabled before the migration dirty bitmap is
> + * synchronized.
> + */
> +void balloon_free_page_stop(void)
> +{
> +    balloon_free_page_stop_fn(balloon_opaque);
> +}
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque)
> +{
> +    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
> +        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
> +        balloon_opaque) {
> +        /* We already registered one balloon handler. */
> +        return;
>      }
> -    balloon_event_fn = event_func;
> -    balloon_stat_fn = stat_func;
> +
> +    balloon_event_fn = event_fn;
> +    balloon_stat_fn = stat_fn;
> +    balloon_free_page_support_fn = free_page_support_fn;
> +    balloon_free_page_start_fn = free_page_start_fn;
> +    balloon_free_page_stop_fn = free_page_stop_fn;
>      balloon_opaque = opaque;
> -    return 0;
>  }
>  
>  void qemu_remove_balloon_handler(void *opaque)
> @@ -86,6 +121,9 @@ void qemu_remove_balloon_handler(void *opaque)
>      }
>      balloon_event_fn = NULL;
>      balloon_stat_fn = NULL;
> +    balloon_free_page_support_fn = NULL;
> +    balloon_free_page_start_fn = NULL;
> +    balloon_free_page_stop_fn = NULL;
>      balloon_opaque = NULL;
>  }
>  
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index f456cea..30c7504 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -31,6 +31,7 @@
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "migration/misc.h"
>  
>  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
>  
> @@ -308,6 +309,127 @@ out:
>      }
>  }
>  
> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +    VirtQueueElement *elem;
> +    VirtIOBalloon *dev = opaque;
> +    VirtQueue *vq = dev->free_page_vq;
> +    uint32_t id;
> +    size_t size;
> +
> +    /* The optimization thread runs only when the guest is running. */
> +    while (runstate_is_running()) {

Note that this check is not guaranteed to be correct
when checked like this outside the BQL.

I think you are better off relying on the status
callback to synchronise with the backend thread.
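
For illustration, one shape the suggestion could take (a rough sketch only,
not what this series does): let the stop/status path own the thread's
lifecycle, and have the loop test only device-local state under its lock:

    /* Sketch: loop keyed off device state rather than the global runstate. */
    for (;;) {
        qemu_spin_lock(&dev->free_page_lock);
        if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
            /* set_status()/virtio_balloon_free_page_stop() asked us to exit */
            qemu_spin_unlock(&dev->free_page_lock);
            break;
        }
        /* ... pop and process hints as in the loop body below ... */
        qemu_spin_unlock(&dev->free_page_lock);
    }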


> +        qemu_spin_lock(&dev->free_page_lock);
> +        /*
> +         * If the migration thread actively stops the reporting, exit
> +         * immediately.
> +         */
> +        if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
> +            qemu_spin_unlock(&dev->free_page_lock);
> +            break;
> +        }
> +
> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +        if (!elem) {
> +            qemu_spin_unlock(&dev->free_page_lock);
> +            continue;
> +        }
> +
> +        if (elem->out_num) {
> +            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
> +            virtqueue_push(vq, elem, size);
> +            g_free(elem);
> +            if (unlikely(size != sizeof(id))) {
> +                warn_report("%s: received an incorrect cmd id", __func__);

virtio_error?
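
i.e. presumably something along these lines, which marks the device broken
instead of merely logging (sketch):

    virtio_error(VIRTIO_DEVICE(dev), "%s: received an incorrect cmd id",
                 __func__);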

> +                qemu_spin_unlock(&dev->free_page_lock);
> +                break;
> +            }
> +            if (id == dev->free_page_report_cmd_id) {

id is LE above, but used in native endian here.
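
A minimal sketch of the conversion being asked for, assuming the id field is
little-endian on the wire (matching the cpu_to_le32() used for the config
space in this patch):

    if (le32_to_cpu(id) == dev->free_page_report_cmd_id) {
        ...
    }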

> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> +            } else if (dev->free_page_report_status ==
> +                       FREE_PAGE_REPORT_S_START) {
> +                /*
> +                 * Stop the optimization only when it has started. This avoids
> +                 * a stale stop sign for the previous command.
> +                 */
> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +                qemu_spin_unlock(&dev->free_page_lock);
> +                break;
> +            }

And else? What if the id does not match and the status is not START?
Don't you need to skip the elem decoding?
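
A sketch of what such a branch could look like (hypothetical; note also that
elem has already been freed at this point, so falling through to the
elem->in_num check below dereferences freed memory):

    if (le32_to_cpu(id) == dev->free_page_report_cmd_id) {
        dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
    } else if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START) {
        /* Stop only a report that has started; see the comment above. */
        dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
        qemu_spin_unlock(&dev->free_page_lock);
        break;
    } else {
        /* Stale id from an earlier command: skip this element. */
        qemu_spin_unlock(&dev->free_page_lock);
        continue;
    }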


> +        }
> +
> +        if (elem->in_num) {
> +            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
> +                !dev->poison_val) {

poison generally disables everything? Add a TODO to handle
it in the future pls.


> +                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
> +                                          elem->in_sg[0].iov_len);
> +            }
> +            virtqueue_push(vq, elem, 0);
> +            g_free(elem);
> +        }
> +        qemu_spin_unlock(&dev->free_page_lock);
> +    }
> +    return NULL;
> +}
> +
> +static bool virtio_balloon_free_page_support(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> +}
> +
> +static void virtio_balloon_free_page_start(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    if (!runstate_is_running()) {
> +        return;
> +    }
> +
> +    if (unlikely(s->free_page_report_cmd_id == UINT_MAX)) {

Don't bother with these annotations for non-data-path things.

> +        s->free_page_report_cmd_id =
> +                       VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> +    } else {
> +        s->free_page_report_cmd_id++;
> +    }
> +
> +    s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
> +    virtio_notify_config(vdev);
> +    qemu_thread_create(&s->free_page_thread, "balloon_fpo",
> +                       virtio_balloon_poll_free_page_hints, s,
> +                       QEMU_THREAD_JOINABLE);
> +}
> +
> +static void virtio_balloon_free_page_stop(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    qemu_spin_lock(&s->free_page_lock);
> +    switch (s->free_page_report_status) {
> +    case FREE_PAGE_REPORT_S_REQUESTED:
> +    case FREE_PAGE_REPORT_S_START:
> +        /*
> +         * The guest hasn't done the reporting, so the host sends a
> +         * notification to the guest to actively stop the reporting before
> +         * joining the optimization thread.
> +         */
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        virtio_notify_config(vdev);
> +    case FREE_PAGE_REPORT_S_STOP:
> +        /* The guest has stopped the reporting. Join the optimization thread */
> +        qemu_thread_join(&s->free_page_thread);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +    case FREE_PAGE_REPORT_S_EXIT:
> +        /* The optimization thread has gone. No further actions needed. */
> +        break;
> +    }
> +    qemu_spin_unlock(&s->free_page_lock);
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -315,6 +437,15 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    config.poison_val = cpu_to_le32(dev->poison_val);
> +
> +    if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> +    } else {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(dev->free_page_report_cmd_id);
> +    }
>  
>      trace_virtio_balloon_get_config(config.num_pages, config.actual);
>      memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
> @@ -368,6 +499,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
>                          &error_abort);
>      }
> +    dev->poison_val = le32_to_cpu(config.poison_val);

We probably should not assume this value is correct if the guest didn't
ack the appropriate feature.
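
i.e. a guard along these lines (sketch, reusing the feature check already
used elsewhere in this patch):

    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
        dev->poison_val = le32_to_cpu(config.poison_val);
    }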


>      trace_virtio_balloon_set_config(dev->actual, oldactual);
>  }
>  
> @@ -377,6 +509,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +
> +    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> +    }
> +
>      return f;
>  }
>  
> @@ -413,6 +550,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
>      return 0;
>  }
>  
> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> +    .name = "virtio-balloon-device/free-page-report",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = virtio_balloon_free_page_support,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_virtio_balloon_device = {
>      .name = "virtio-balloon-device",
>      .version_id = 1,
> @@ -423,30 +572,31 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>          VMSTATE_UINT32(actual, VirtIOBalloon),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription * []) {
> +        &vmstate_virtio_balloon_free_page_report,
> +        NULL
> +    }
>  };
>  
>  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> -    int ret;
>  
>      virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
>                  sizeof(struct virtio_balloon_config));
>  
> -    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> -
> -    if (ret < 0) {
> -        error_setg(errp, "Only one balloon device is supported");
> -        virtio_cleanup(vdev);
> -        return;
> -    }
> -
>      s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> -
> +    if (virtio_has_feature(s->host_features,
> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +        s->free_page_report_cmd_id =
> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
> +        qemu_spin_init(&s->free_page_lock);
> +    }
>      reset_stats(s);
>  }
>  
> @@ -455,6 +605,9 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>  
> +    if (virtio_balloon_free_page_support(s)) {
> +        virtio_balloon_free_page_stop(s);
> +    }
>      balloon_stats_destroy_timer(s);
>      qemu_remove_balloon_handler(s);
>      virtio_cleanup(vdev);
> @@ -464,6 +617,10 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> +    if (virtio_balloon_free_page_support(s)) {
> +        virtio_balloon_free_page_stop(s);
> +    }
> +
>      if (s->stats_vq_elem != NULL) {
>          virtqueue_unpop(s->svq, s->stats_vq_elem, 0);
>          g_free(s->stats_vq_elem);
> @@ -475,11 +632,37 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> -    if (!s->stats_vq_elem && vdev->vm_running &&
> -        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
> -        /* poll stats queue for the element we have discarded when the VM
> -         * was stopped */
> -        virtio_balloon_receive_stats(vdev, s->svq);
> +    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +        if (!s->stats_vq_elem && vdev->vm_running &&
> +            virtqueue_rewind(s->svq, 1)) {
> +            /*
> +             * Poll stats queue for the element we have discarded when the VM
> +             * was stopped.
> +             */
> +            virtio_balloon_receive_stats(vdev, s->svq);
> +        }
> +
> +        if (virtio_balloon_free_page_support(s)) {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat,
> +                                     virtio_balloon_free_page_support,
> +                                     virtio_balloon_free_page_start,
> +                                     virtio_balloon_free_page_stop,
> +                                     s);
> +            /*
> +             * This handles the case that the guest is being stopped (e.g. by
> +             * qmp commands) while the driver is still reporting hints. When
> +             * the guest is woken up, it will continue to report hints, which
> +             * are not needed. So when the wakeup notifier invokes the
> +             * set_status callback here, we get the chance to make sure that
> +             * the free page optimization thread is exited via
> +             * virtio_balloon_free_page_stop.
> +             */
> +            virtio_balloon_free_page_stop(s);

I don't understand the logic here at all.
So if status is set to DRIVER_OK, then we stop reporting?


> +        } else {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat, NULL, NULL, NULL, s);
> +        }
>      }
>  }
>  
> @@ -509,6 +692,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
>  static Property virtio_balloon_properties[] = {
>      DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> +    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 1ea13bd..cfdba37 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -23,6 +23,8 @@
>  #define VIRTIO_BALLOON(obj) \
>          OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
> +
>  typedef struct virtio_balloon_stat VirtIOBalloonStat;
>  
>  typedef struct virtio_balloon_stat_modern {
> @@ -31,15 +33,31 @@ typedef struct virtio_balloon_stat_modern {
>         uint64_t val;
>  } VirtIOBalloonStatModern;
>  
> +enum virtio_balloon_free_page_report_status {
> +    FREE_PAGE_REPORT_S_REQUESTED,
> +    FREE_PAGE_REPORT_S_START,
> +    FREE_PAGE_REPORT_S_STOP,
> +    FREE_PAGE_REPORT_S_EXIT,
> +};
> +
>  typedef struct VirtIOBalloon {
>      VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq;
> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
> +    uint32_t free_page_report_status;
>      uint32_t num_pages;
>      uint32_t actual;
> +    uint32_t free_page_report_cmd_id;
> +    uint32_t poison_val;
>      uint64_t stats[VIRTIO_BALLOON_S_NR];
>      VirtQueueElement *stats_vq_elem;
>      size_t stats_vq_offset;
>      QEMUTimer *stats_timer;
> +    QemuThread free_page_thread;
> +    /*
> +     * Lock to synchronize threads to access the free page reporting related
> +     * fields (e.g. free_page_report_status).
> +     */
> +    QemuSpin free_page_lock;
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 7b0a41b..f89e80f 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -34,15 +34,22 @@
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> +#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
> +#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
>  struct virtio_balloon_config {
>  	/* Number of pages host wants Guest to give up. */
>  	uint32_t num_pages;
>  	/* Number of pages we've actually got in balloon. */
>  	uint32_t actual;
> +	/* Free page report command id, readonly by guest */
> +	uint32_t free_page_report_cmd_id;
> +	/* Stores PAGE_POISON if page poisoning is in use */
> +	uint32_t poison_val;
>  };
>  
>  #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index 66543ae..6561a08 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -18,11 +18,22 @@
>  
>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
> +typedef void (QEMUBalloonFreePageStart)(void *opaque);
> +typedef void (QEMUBalloonFreePageStop)(void *opaque);
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -			     QEMUBalloonStatus *stat_func, void *opaque);
>  void qemu_remove_balloon_handler(void *opaque);
>  bool qemu_balloon_is_inhibited(void);
>  void qemu_balloon_inhibit(bool state);
> +bool balloon_free_page_support(void);
> +void balloon_free_page_start(void);
> +void balloon_free_page_stop(void);
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque);
>  
>  #endif
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
@ 2018-03-16 15:16     ` Michael S. Tsirkin
  0 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2018-03-16 15:16 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
> The new feature enables the virtio-balloon device to receive hints of
> guest free pages from the free page vq.
> 
> balloon_free_page_start - start guest free page hint reporting.
> balloon_free_page_stop - stop guest free page hint reporting.
> 
> Note: balloon will report pages which were free at the time
> of this call. As the reporting happens asynchronously, dirty bit logging
> must be enabled before this call is made. Guest reporting must be
> disabled before the migration dirty bitmap is synchronized.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> ---
>  balloon.c                                       |  58 +++++--
>  hw/virtio/virtio-balloon.c                      | 217 ++++++++++++++++++++++--
>  include/hw/virtio/virtio-balloon.h              |  20 ++-
>  include/standard-headers/linux/virtio_balloon.h |   7 +
>  include/sysemu/balloon.h                        |  15 +-
>  5 files changed, 288 insertions(+), 29 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index 6bf0a96..87a0410 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -36,6 +36,9 @@
>  
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
> +static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
> +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
>  
> @@ -64,19 +67,51 @@ static bool have_balloon(Error **errp)
>      return true;
>  }
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque)
> +bool balloon_free_page_support(void)
>  {
> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> -        /* We're already registered one balloon handler.  How many can
> -         * a guest really have?
> -         */
> -        return -1;
> +    return balloon_free_page_support_fn &&
> +           balloon_free_page_support_fn(balloon_opaque);
> +}
> +
> +/*
> + * Balloon will report pages which were free at the time of this call. As the
> + * reporting happens asynchronously, dirty bit logging must be enabled before
> + * this call is made.
> + */
> +void balloon_free_page_start(void)
> +{
> +    balloon_free_page_start_fn(balloon_opaque);
> +}
> +
> +/*
> + * Guest reporting must be disabled before the migration dirty bitmap is
> + * synchronized.
> + */
> +void balloon_free_page_stop(void)
> +{
> +    balloon_free_page_stop_fn(balloon_opaque);
> +}
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque)
> +{
> +    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
> +        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
> +        balloon_opaque) {
> +        /* We already registered one balloon handler. */
> +        return;
>      }
> -    balloon_event_fn = event_func;
> -    balloon_stat_fn = stat_func;
> +
> +    balloon_event_fn = event_fn;
> +    balloon_stat_fn = stat_fn;
> +    balloon_free_page_support_fn = free_page_support_fn;
> +    balloon_free_page_start_fn = free_page_start_fn;
> +    balloon_free_page_stop_fn = free_page_stop_fn;
>      balloon_opaque = opaque;
> -    return 0;
>  }
>  
>  void qemu_remove_balloon_handler(void *opaque)
> @@ -86,6 +121,9 @@ void qemu_remove_balloon_handler(void *opaque)
>      }
>      balloon_event_fn = NULL;
>      balloon_stat_fn = NULL;
> +    balloon_free_page_support_fn = NULL;
> +    balloon_free_page_start_fn = NULL;
> +    balloon_free_page_stop_fn = NULL;
>      balloon_opaque = NULL;
>  }
>  
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index f456cea..30c7504 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -31,6 +31,7 @@
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "migration/misc.h"
>  
>  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
>  
> @@ -308,6 +309,127 @@ out:
>      }
>  }
>  
> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +    VirtQueueElement *elem;
> +    VirtIOBalloon *dev = opaque;
> +    VirtQueue *vq = dev->free_page_vq;
> +    uint32_t id;
> +    size_t size;
> +
> +    /* The optimization thread runs only when the guest is running. */
> +    while (runstate_is_running()) {

Note that this check is not guaranteed to be correct
when checked like this outside BQL.

I think you are better off relying on status
callback to synchronise with the backend thread.


> +        qemu_spin_lock(&dev->free_page_lock);
> +        /*
> +         * If the migration thread actively stops the reporting, exit
> +         * immediately.
> +         */
> +        if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
> +            qemu_spin_unlock(&dev->free_page_lock);
> +            break;
> +        }
> +
> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +        if (!elem) {
> +            qemu_spin_unlock(&dev->free_page_lock);
> +            continue;
> +        }
> +
> +        if (elem->out_num) {
> +            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
> +            virtqueue_push(vq, elem, size);
> +            g_free(elem);
> +            if (unlikely(size != sizeof(id))) {
> +                warn_report("%s: received an incorrect cmd id", __func__);

virtio_error?

> +                qemu_spin_unlock(&dev->free_page_lock);
> +                break;
> +            }
> +            if (id == dev->free_page_report_cmd_id) {

id is LE above, but used in native endian here.

> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> +            } else if (dev->free_page_report_status ==
> +                       FREE_PAGE_REPORT_S_START) {
> +                /*
> +                 * Stop the optimization only when it has started. This avoids
> +                 * a stale stop sign for the previous command.
> +                 */
> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +                qemu_spin_unlock(&dev->free_page_lock);
> +                break;
> +            }

And else? What if it does not match and status is not start?
Don't you need to skip in elem decoding?


> +        }
> +
> +        if (elem->in_num) {
> +            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
> +                !dev->poison_val) {

poison generally disables everything? Add a TODO to handle
it in he future pls.


> +                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
> +                                          elem->in_sg[0].iov_len);
> +            }
> +            virtqueue_push(vq, elem, 0);
> +            g_free(elem);
> +        }
> +        qemu_spin_unlock(&dev->free_page_lock);
> +    }
> +    return NULL;
> +}
> +
> +static bool virtio_balloon_free_page_support(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> +}
> +
> +static void virtio_balloon_free_page_start(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    if (!runstate_is_running()) {
> +        return;
> +    }
> +
> +    if (unlikely(s->free_page_report_cmd_id == UINT_MAX)) {

Don't bother with these annotations for non data path things.

> +        s->free_page_report_cmd_id =
> +                       VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> +    } else {
> +        s->free_page_report_cmd_id++;
> +    }
> +
> +    s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
> +    virtio_notify_config(vdev);
> +    qemu_thread_create(&s->free_page_thread, "balloon_fpo",
> +                       virtio_balloon_poll_free_page_hints, s,
> +                       QEMU_THREAD_JOINABLE);
> +}
> +
> +static void virtio_balloon_free_page_stop(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    qemu_spin_lock(&s->free_page_lock);
> +    switch (s->free_page_report_status) {
> +    case FREE_PAGE_REPORT_S_REQUESTED:
> +    case FREE_PAGE_REPORT_S_START:
> +        /*
> +         * The guest hasn't done the reporting, so host sends a notification
> +         * to the guest to actively stop the reporting before joining the
> +         * optimization thread.
> +         */
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        virtio_notify_config(vdev);
> +    case FREE_PAGE_REPORT_S_STOP:
> +        /* The guest has stopped the reporting. Join the optimization thread */
> +        qemu_thread_join(&s->free_page_thread);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +    case FREE_PAGE_REPORT_S_EXIT:
> +        /* The optimization thread has gone. No further actions needded. */
> +        break;
> +    }
> +    qemu_spin_unlock(&s->free_page_lock);
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -315,6 +437,15 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    config.poison_val = cpu_to_le32(dev->poison_val);
> +
> +    if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> +    } else {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(dev->free_page_report_cmd_id);
> +    }
>  
>      trace_virtio_balloon_get_config(config.num_pages, config.actual);
>      memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
> @@ -368,6 +499,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
>                          &error_abort);
>      }
> +    dev->poison_val = le32_to_cpu(config.poison_val);

We probably should not assume this value is correct if guest didn't
ack the appropriate feature.


>      trace_virtio_balloon_set_config(dev->actual, oldactual);
>  }
>  
> @@ -377,6 +509,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +
> +    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> +    }
> +
>      return f;
>  }
>  
> @@ -413,6 +550,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
>      return 0;
>  }
>  
> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> +    .name = "virtio-balloon-device/free-page-report",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = virtio_balloon_free_page_support,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_virtio_balloon_device = {
>      .name = "virtio-balloon-device",
>      .version_id = 1,
> @@ -423,30 +572,31 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>          VMSTATE_UINT32(actual, VirtIOBalloon),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription * []) {
> +        &vmstate_virtio_balloon_free_page_report,
> +        NULL
> +    }
>  };
>  
>  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> -    int ret;
>  
>      virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
>                  sizeof(struct virtio_balloon_config));
>  
> -    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> -
> -    if (ret < 0) {
> -        error_setg(errp, "Only one balloon device is supported");
> -        virtio_cleanup(vdev);
> -        return;
> -    }
> -
>      s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> -
> +    if (virtio_has_feature(s->host_features,
> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +        s->free_page_report_cmd_id =
> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
> +        qemu_spin_init(&s->free_page_lock);
> +    }
>      reset_stats(s);
>  }
>  
> @@ -455,6 +605,9 @@ static void virtio_balloon_device_unrealize(DeviceState *dev, Error **errp)
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
>  
> +    if (virtio_balloon_free_page_support(s)) {
> +        virtio_balloon_free_page_stop(s);
> +    }
>      balloon_stats_destroy_timer(s);
>      qemu_remove_balloon_handler(s);
>      virtio_cleanup(vdev);
> @@ -464,6 +617,10 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> +    if (virtio_balloon_free_page_support(s)) {
> +        virtio_balloon_free_page_stop(s);
> +    }
> +
>      if (s->stats_vq_elem != NULL) {
>          virtqueue_unpop(s->svq, s->stats_vq_elem, 0);
>          g_free(s->stats_vq_elem);
> @@ -475,11 +632,37 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> -    if (!s->stats_vq_elem && vdev->vm_running &&
> -        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
> -        /* poll stats queue for the element we have discarded when the VM
> -         * was stopped */
> -        virtio_balloon_receive_stats(vdev, s->svq);
> +    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +        if (!s->stats_vq_elem && vdev->vm_running &&
> +            virtqueue_rewind(s->svq, 1)) {
> +            /*
> +             * Poll stats queue for the element we have discarded when the VM
> +             * was stopped.
> +             */
> +            virtio_balloon_receive_stats(vdev, s->svq);
> +        }
> +
> +        if (virtio_balloon_free_page_support(s)) {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat,
> +                                     virtio_balloon_free_page_support,
> +                                     virtio_balloon_free_page_start,
> +                                     virtio_balloon_free_page_stop,
> +                                     s);
> +            /*
> +             * This handles the case that the guest is being stopped (e.g. by
> +             * qmp commands) while the driver is still reporting hints. When
> +             * the guest is woken up, it will continue to report hints, which
> +             * are not needed. So when the wakeup notifier invokes the
> +             * set_status callback here, we get the chance to make sure that
> +             * the free page optimization thread is exited via
> +             * virtio_balloon_free_page_stop.
> +             */
> +            virtio_balloon_free_page_stop(s);

I don't understand the logic here at all.
So if status is set to DRIVER_OK, then we stop reporting?


> +        } else {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat, NULL, NULL, NULL, s);
> +        }
>      }
>  }
>  
> @@ -509,6 +692,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
>  static Property virtio_balloon_properties[] = {
>      DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> +    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 1ea13bd..cfdba37 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -23,6 +23,8 @@
>  #define VIRTIO_BALLOON(obj) \
>          OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
> +
>  typedef struct virtio_balloon_stat VirtIOBalloonStat;
>  
>  typedef struct virtio_balloon_stat_modern {
> @@ -31,15 +33,31 @@ typedef struct virtio_balloon_stat_modern {
>         uint64_t val;
>  } VirtIOBalloonStatModern;
>  
> +enum virtio_balloon_free_page_report_status {
> +    FREE_PAGE_REPORT_S_REQUESTED,
> +    FREE_PAGE_REPORT_S_START,
> +    FREE_PAGE_REPORT_S_STOP,
> +    FREE_PAGE_REPORT_S_EXIT,
> +};
> +
>  typedef struct VirtIOBalloon {
>      VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq;
> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
> +    uint32_t free_page_report_status;
>      uint32_t num_pages;
>      uint32_t actual;
> +    uint32_t free_page_report_cmd_id;
> +    uint32_t poison_val;
>      uint64_t stats[VIRTIO_BALLOON_S_NR];
>      VirtQueueElement *stats_vq_elem;
>      size_t stats_vq_offset;
>      QEMUTimer *stats_timer;
> +    QemuThread free_page_thread;
> +    /*
> +     * Lock to synchronize threads to access the free page reporting related
> +     * fields (e.g. free_page_report_status).
> +     */
> +    QemuSpin free_page_lock;
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 7b0a41b..f89e80f 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -34,15 +34,22 @@
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> +#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
> +#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
>  struct virtio_balloon_config {
>  	/* Number of pages host wants Guest to give up. */
>  	uint32_t num_pages;
>  	/* Number of pages we've actually got in balloon. */
>  	uint32_t actual;
> +	/* Free page report command id, readonly by guest */
> +	uint32_t free_page_report_cmd_id;
> +	/* Stores PAGE_POISON if page poisoning is in use */
> +	uint32_t poison_val;
>  };
>  
>  #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index 66543ae..6561a08 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -18,11 +18,22 @@
>  
>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
> +typedef void (QEMUBalloonFreePageStart)(void *opaque);
> +typedef void (QEMUBalloonFreePageStop)(void *opaque);
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -			     QEMUBalloonStatus *stat_func, void *opaque);
>  void qemu_remove_balloon_handler(void *opaque);
>  bool qemu_balloon_is_inhibited(void);
>  void qemu_balloon_inhibit(bool state);
> +bool balloon_free_page_support(void);
> +void balloon_free_page_start(void);
> +void balloon_free_page_stop(void);
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque);
>  
>  #endif
> -- 
> 1.8.3.1

* Re: [Qemu-devel] [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-16 15:16     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-18 10:36       ` Wei Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-03-18 10:36 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
> On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
>> The new feature enables the virtio-balloon device to receive hints of
>> guest free pages from the free page vq.
>>
>> balloon_free_page_start - start guest free page hint reporting.
>> balloon_free_page_stop - stop guest free page hint reporting.
>>
>> Note: balloon will report pages which were free at the time
>> of this call. As the reporting happens asynchronously, dirty bit logging
>> must be enabled before this call is made. Guest reporting must be
>> disabled before the migration dirty bitmap is synchronized.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> Signed-off-by: Liang Li <liang.z.li@intel.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> CC: Juan Quintela <quintela@redhat.com>
>> ---
>>   balloon.c                                       |  58 +++++--
>>   hw/virtio/virtio-balloon.c                      | 217 ++++++++++++++++++++++--
>>   include/hw/virtio/virtio-balloon.h              |  20 ++-
>>   include/standard-headers/linux/virtio_balloon.h |   7 +
>>   include/sysemu/balloon.h                        |  15 +-
>>   5 files changed, 288 insertions(+), 29 deletions(-)
>>
>> diff --git a/balloon.c b/balloon.c
>> index 6bf0a96..87a0410 100644
>> --- a/balloon.c
>> +++ b/balloon.c
>> @@ -36,6 +36,9 @@
>>   
>>
>> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
>> +{
>> +    VirtQueueElement *elem;
>> +    VirtIOBalloon *dev = opaque;
>> +    VirtQueue *vq = dev->free_page_vq;
>> +    uint32_t id;
>> +    size_t size;
>> +
>> +    /* The optimization thread runs only when the guest is running. */
>> +    while (runstate_is_running()) {
> Note that this check is not guaranteed to be correct
> when checked like this outside BQL.
>
> I think you are better off relying on status
> callback to synchronise with the backend thread.
>

It's actually OK here; I think we don't need the guarantee. The device
is just the consumer of the vq; essentially, it has no dependency on the
guest state (i.e. it won't block or cause errors).
For example:
1) when the device executes "while (runstate_is_running())" and finds 
that the guest is running, it proceeds;
2) the guest is stopped immediately right after the "while 
(runstate_is_running())" check;
3) the device side execution reaches virtqueue_pop(), and finds 
"elem==NULL", since the driver (provider) stops adding hints. Then it 
continues by going back to "while (runstate_is_running())", and now it 
finds the guest is stopped, and then exits.

Essentially, this runstate check is just an optimization for the case
where the driver has stopped providing hints while the device-side
optimization thread is still polling the empty vq (i.e. effort in vain).
It would probably be better to check the runstate when "elem == NULL":

while (1) {
    ...
    elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
    if (!elem) {
        qemu_spin_unlock(&dev->free_page_lock);
        if (runstate_is_running()) {
            continue;
        } else {
            break;
        }
    }
    ...
}


>> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
>> +            } else if (dev->free_page_report_status ==
>> +                       FREE_PAGE_REPORT_S_START) {
>> +                /*
>> +                 * Stop the optimization only when it has started. This avoids
>> +                 * a stale stop sign for the previous command.
>> +                 */
>> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
>> +                qemu_spin_unlock(&dev->free_page_lock);
>> +                break;
>> +            }
> And else? What if it does not match and status is not start?
> Don't you need to skip the elem decoding?

No, actually we don't need an "else". Please see the code inside
"if (elem->in_num)" below: if the status isn't set to START,
qemu_guest_free_page_hint will not be called to decode the elem.
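
For reference, the decode path I'm referring to looks roughly like this
(condensed for illustration from the hunks quoted in this thread; the
exact details are in the patch):

if (elem->in_num) {
    /*
     * Only decode the hint when a START command is in flight and the
     * guest is not using page poisoning; otherwise the elem is simply
     * returned to the vq untouched.
     */
    if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
        !dev->poison_val) {
        qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
                                  elem->in_sg[0].iov_len);
    }
}
virtqueue_push(vq, elem, 0);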


>
>> +        }
>> +
>> +        if (elem->in_num) {
>> +            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
>> +                !dev->poison_val) {
> poison generally disables everything? Add a TODO to handle
> it in the future pls.


OK, will add TODO in the commit.



@@ -368,6 +499,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
                          &error_abort);
      }
+    dev->poison_val = le32_to_cpu(config.poison_val);

> We probably should not assume this value is correct if guest didn't
> ack the appropriate feature.


OK, I'll add a comment to explain that.


@@ -475,11 +632,37 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
  {
      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
  
-    if (!s->stats_vq_elem && vdev->vm_running &&
-        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
-        /* poll stats queue for the element we have discarded when the VM
-         * was stopped */
-        virtio_balloon_receive_stats(vdev, s->svq);
+    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+        if (!s->stats_vq_elem && vdev->vm_running &&
+            virtqueue_rewind(s->svq, 1)) {
+            /*
+             * Poll stats queue for the element we have discarded when the VM
+             * was stopped.
+             */
+            virtio_balloon_receive_stats(vdev, s->svq);
+        }
+
+        if (virtio_balloon_free_page_support(s)) {
+            qemu_add_balloon_handler(virtio_balloon_to_target,
+                                     virtio_balloon_stat,
+                                     virtio_balloon_free_page_support,
+                                     virtio_balloon_free_page_start,
+                                     virtio_balloon_free_page_stop,
+                                     s);
+            /*
+             * This handles the case that the guest is being stopped (e.g. by
+             * qmp commands) while the driver is still reporting hints. When
+             * the guest is woken up, it will continue to report hints, which
+             * are not needed. So when the wakeup notifier invokes the
+             * set_status callback here, we get the chance to make sure that
+             * the free page optimization thread is exited via
+             * virtio_balloon_free_page_stop.
+             */
+            virtio_balloon_free_page_stop(s);

> I don't understand the logic here at all.
> So if status is set to DRIVER_OK, then we stop reporting?
>

I thought about this more. It's not necessary to call the stop()
function here, because we have already made the optimization thread exit
when the guest is stopped. When the guest is woken up, it will continue
to report, and there is no consumer of the vq (the optimization thread
has exited). There is no functional issue here, since the driver will
just drop the hint when the vq is full. From the optimization point of
view, I think we can consider replacing the above stop() function with
virtio_notify_config(vdev), which will stop the guest's unnecessary
reporting.
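
That is, the hunk above would roughly become the following (a sketch
only, to illustrate the idea):

if (virtio_balloon_free_page_support(s)) {
    qemu_add_balloon_handler(virtio_balloon_to_target,
                             virtio_balloon_stat,
                             virtio_balloon_free_page_support,
                             virtio_balloon_free_page_start,
                             virtio_balloon_free_page_stop,
                             s);
    /*
     * Instead of stopping an optimization thread that has already
     * exited, push a config update so that the woken-up guest sees
     * the current command id and stops its unneeded reporting.
     */
    virtio_notify_config(vdev);
}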


Best,
Wei

* Re: [Qemu-devel] [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-18 10:36       ` [virtio-dev] " Wei Wang
@ 2018-03-19  4:24         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2018-03-19  4:24 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
> On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
> > On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
> > > The new feature enables the virtio-balloon device to receive hints of
> > > guest free pages from the free page vq.
> > > 
> > > balloon_free_page_start - start guest free page hint reporting.
> > > balloon_free_page_stop - stop guest free page hint reporting.
> > > 
> > > Note: balloon will report pages which were free at the time
> > > of this call. As the reporting happens asynchronously, dirty bit logging
> > > must be enabled before this call is made. Guest reporting must be
> > > disabled before the migration dirty bitmap is synchronized.
> > > 
> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > CC: Juan Quintela <quintela@redhat.com>
> > > ---
> > >   balloon.c                                       |  58 +++++--
> > >   hw/virtio/virtio-balloon.c                      | 217 ++++++++++++++++++++++--
> > >   include/hw/virtio/virtio-balloon.h              |  20 ++-
> > >   include/standard-headers/linux/virtio_balloon.h |   7 +
> > >   include/sysemu/balloon.h                        |  15 +-
> > >   5 files changed, 288 insertions(+), 29 deletions(-)
> > > 
> > > diff --git a/balloon.c b/balloon.c
> > > index 6bf0a96..87a0410 100644
> > > --- a/balloon.c
> > > +++ b/balloon.c
> > > @@ -36,6 +36,9 @@
> > > 
> > > +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> > > +{
> > > +    VirtQueueElement *elem;
> > > +    VirtIOBalloon *dev = opaque;
> > > +    VirtQueue *vq = dev->free_page_vq;
> > > +    uint32_t id;
> > > +    size_t size;
> > > +
> > > +    /* The optimization thread runs only when the guest is running. */
> > > +    while (runstate_is_running()) {
> > Note that this check is not guaranteed to be correct
> > when checked like this outside BQL.
> > 
> > I think you are better off relying on status
> > callback to synchronise with the backend thread.
> > 
> 
> It's actually OK here; I think we don't need the guarantee. The device is
> just the consumer of the vq; essentially, it has no dependency on the guest
> state (i.e. it won't block or cause errors).
> For example:
> 1) when the device executes "while (runstate_is_running())" and finds that
> the guest is running, it proceeds;
> 2) the guest is stopped immediately right after the "while
> (runstate_is_running())" check;
> 3) the device side execution reaches virtqueue_pop(), and finds
> "elem==NULL", since the driver (provider) stops adding hints. Then it
> continues by going back to "while (runstate_is_running())", and now it finds
> the guest is stopped, and then exits.


OTOH it seems that if the thread stops, nothing will wake it up
when the vm is restarted. Such a behaviour change across vmstop/vmstart
is unexpected.

> Essentially, this runstate check is just an optimization for the case where
> the driver has stopped providing hints while the device-side optimization
> thread is still polling the empty vq (i.e. effort in vain). It would
> probably be better to check the runstate when "elem == NULL":
> 
> while (1) {
>     ...
>     elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
>     if (!elem) {
>         qemu_spin_unlock(&dev->free_page_lock);
>         if (runstate_is_running()) {
>             continue;
>         } else {
>             break;
>         }
>     }
>     ...
> }
> 
> 
> > > +                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> > > +            } else if (dev->free_page_report_status ==
> > > +                       FREE_PAGE_REPORT_S_START) {
> > > +                /*
> > > +                 * Stop the optimization only when it has started. This avoids
> > > +                 * a stale stop sign for the previous command.
> > > +                 */
> > > +                dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> > > +                qemu_spin_unlock(&dev->free_page_lock);
> > > +                break;
> > > +            }
> > And else? What if it does not match and status is not start?
> > Don't you need to skip the elem decoding?
> 
> No, actually we don't need "else". Please see the code inside if
> (elem->in_num) below. If the status isn't set to START,
> qemu_guest_free_page_hint will not be called to decode the elem.
> 
> 
> > 
> > > +        }
> > > +
> > > +        if (elem->in_num) {
> > > +            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
> > > +                !dev->poison_val) {
> > poison generally disables everything? Add a TODO to handle
> > it in the future pls.
> 
> 
> OK, will add TODO in the commit.
> 
> 
> 
> @@ -368,6 +499,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
>                          &error_abort);
>      }
> +    dev->poison_val = le32_to_cpu(config.poison_val);
> 
> > We probably should not assume this value is correct if guest didn't
> > ack the appropriate feature.
> 
> 
> OK, I'll add a comment to explain that.
> 
> 
> @@ -475,11 +632,37 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
> -    if (!s->stats_vq_elem && vdev->vm_running &&
> -        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
> -        /* poll stats queue for the element we have discarded when the VM
> -         * was stopped */
> -        virtio_balloon_receive_stats(vdev, s->svq);
> +    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +        if (!s->stats_vq_elem && vdev->vm_running &&
> +            virtqueue_rewind(s->svq, 1)) {
> +            /*
> +             * Poll stats queue for the element we have discarded when the VM
> +             * was stopped.
> +             */
> +            virtio_balloon_receive_stats(vdev, s->svq);
> +        }
> +
> +        if (virtio_balloon_free_page_support(s)) {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat,
> +                                     virtio_balloon_free_page_support,
> +                                     virtio_balloon_free_page_start,
> +                                     virtio_balloon_free_page_stop,
> +                                     s);
> +            /*
> +             * This handles the case that the guest is being stopped (e.g. by
> +             * qmp commands) while the driver is still reporting hints. When
> +             * the guest is woken up, it will continue to report hints, which
> +             * are not needed. So when the wakeup notifier invokes the
> +             * set_status callback here, we get the chance to make sure that
> +             * the free page optimization thread is exited via
> +             * virtio_balloon_free_page_stop.
> +             */
> +            virtio_balloon_free_page_stop(s);
> 
> > I don't understand the logic here at all.
> > So if status is set to DRIVER_OK, then we stop reporting?
> > 
> 
> I thought about this more. It's not necessary to call the stop() function
> here, because we have already made the optimization thread exit when the
> guest is stopped. When the guest is woken up, it will continue to report,
> and there is no consumer of the vq (the optimization thread has exited).
> There is no functional issue here, since the driver will just drop the hint
> when the vq is full. From the optimization point of view, I think we can
> consider replacing the above stop() function with
> virtio_notify_config(vdev), which will stop the guest's unnecessary
> reporting.

I do not understand why we want to increment the counter
on vm stop though. It does make sense to stop the thread
but why not resume where we left off when vm is resumed?


> 
> Best,
> Wei

In short, the code will be easier to reason about if there's some
symmetry in how we start/stop things. If we need the thread to run
together with the VCPUs, then starting/stopping it from the status
callback seems like a reasonable choice.
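
Something along these lines (just a sketch to show the symmetry; the
handler below is illustrative, not code from this series):

static void virtio_balloon_vm_state_change(void *opaque, int running,
                                           RunState state)
{
    VirtIOBalloon *s = opaque;

    if (running) {
        /* Symmetric restart: resume hint processing on vmstart. */
        virtio_balloon_free_page_start(s);
    } else {
        /* Just stop the thread, with no other state changes. */
        virtio_balloon_free_page_stop(s);
    }
}

/* registered once, e.g. at realize time:
 * qemu_add_vm_change_state_handler(virtio_balloon_vm_state_change, s);
 */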


-- 
MST

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-19  4:24         ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-19  9:01           ` Wei Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-03-19  9:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
> On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
>> On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
>>> On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
>
> OTOH it seems that if the thread stops, nothing will wake it up
> when the vm is restarted. Such a behaviour change across vmstop/vmstart
> is unexpected.
> I do not understand why we want to increment the counter
> on vm stop though. It does make sense to stop the thread
> but why not resume where we left off when vm is resumed?
>

I'm not sure which counter we incremented. But it would be clear if we 
have a high level view of how it works (it is symmetric actually). 
Basically, we start the optimization when each round starts and stop it 
at the end of each round (i.e. before we do the bitmap sync), as shown 
below:

1) 1st Round starts --> free_page_start
2) 1st Round in progress..
3) 1st Round ends  --> free_page_stop
4) 2nd Round starts --> free_page_start
5) 2nd Round in progress..
6) 2nd Round ends --> free_page_stop
......

For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
finds the vq empty (i.e. elem == NULL) and the runstate stopped, so the
optimization thread exits immediately. That is, this optimization thread
is gone forever (the optimization we can do for this round is done). We
won't know when the VM would be woken up:
A) If the VM is woken up very soon, while the migration thread is still
in progress with 2), then in 4) a new optimization thread (not the same
one as in the first round) will be created and will start the
optimization for the 2nd round as usual (if you have questions about 3)
in this case: free_page_stop will do nothing but return, since the
optimization thread has exited);
B) If the VM is woken up after the whole migration has ended, there is
still no point in resuming the optimization.

I think this would be the simple design for the first release of this
optimization. There are possibilities to improve case A) above by
continuing the optimization for the 1st round while it is still in
progress, but I think adding that complexity for this rare case wouldn't
be worthwhile (at least for now). What do you think?
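
To put it in code terms, the intended use on the migration side is
roughly the following (a simplified sketch, not the literal call sites;
migration_bitmap_sync is the existing function in migration/ram.c):

static void migration_round(RAMState *rs)
{
    balloon_free_page_start();      /* round starts */

    /* ... the migration thread sends this round's dirty pages ... */

    /*
     * Round ends: reporting must be stopped before the dirty bitmap
     * is synchronized, so that a stale hint cannot clear a bit that
     * was dirtied after the sync.
     */
    balloon_free_page_stop();
    migration_bitmap_sync(rs);
}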


Best,
Wei

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-19  9:01           ` Wei Wang
@ 2018-03-19 22:55             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2018-03-19 22:55 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Mon, Mar 19, 2018 at 05:01:38PM +0800, Wei Wang wrote:
> On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
> > On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
> > > On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
> > > > On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
> > 
> > OTOH it seems that if the thread stops, nothing will wake it up
> > when the vm is restarted. Such a behaviour change across vmstop/vmstart
> > is unexpected.
> > I do not understand why we want to increment the counter
> > on vm stop though. It does make sense to stop the thread
> > but why not resume where we left off when vm is resumed?
> > 
> 
> I'm not sure which counter we incremented. But it would be clear if we have
> a high level view of how it works (it is symmetric actually). Basically, we
> start the optimization when each round starts and stop it at the end of each
> round (i.e. before we do the bitmap sync), as shown below:
> 
> 1) 1st Round starts --> free_page_start
> 2) 1st Round in progress..
> 3) 1st Round ends  --> free_page_stop
> 4) 2nd Round starts --> free_page_start
> 5) 2nd Round in progress..
> 6) 2nd Round ends --> free_page_stop
> ......
> 
> For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
> finds the vq empty (i.e. elem == NULL) and the runstate stopped, so the
> optimization thread exits immediately. That is, this optimization thread
> is gone forever (the optimization we can do for this round is done). We
> won't know when the VM would be woken up:
> A) If the VM is woken up very soon, while the migration thread is still
> in progress with 2), then in 4) a new optimization thread (not the same
> one as in the first round) will be created and will start the
> optimization for the 2nd round as usual (if you have questions about 3)
> in this case: free_page_stop will do nothing but return, since the
> optimization thread has exited);
> B) If the VM is woken up after the whole migration has ended, there is
> still no point in resuming the optimization.
> 
> I think this would be the simple design for the first release of this
> optimization. There are possibilities to improve case A) above by
> continuing the optimization for the 1st round while it is still in
> progress, but I think adding that complexity for this rare case wouldn't
> be worthwhile (at least for now). What do you think?
> 
> 
> Best,
> Wei

In my opinion this just makes the patch very messy.

E.g. attempts to attach a debugger to the guest will call vmstop and
then the behaviour changes. This is a recipe for heisenbugs, which are
extremely painful to debug.

It is not really hard to make things symmetrical:
e.g. if you stop on vmstop then you should start on vmstart, etc.
And stopping the thread should not involve a bunch of state
changes; just stop it and that's it.

-- 
MST

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-19 22:55             ` Michael S. Tsirkin
@ 2018-03-20  2:16               ` Wei Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-03-20  2:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/20/2018 06:55 AM, Michael S. Tsirkin wrote:
> On Mon, Mar 19, 2018 at 05:01:38PM +0800, Wei Wang wrote:
>> On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
>>> On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
>>>> On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
>>>>> On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
>>> OTOH it seems that if the thread stops, nothing will wake it up
>>> when the vm is restarted. Such a behaviour change across vmstop/vmstart
>>> is unexpected.
>>> I do not understand why we want to increment the counter
>>> on vm stop though. It does make sense to stop the thread
>>> but why not resume where we left off when vm is resumed?
>>>
>> I'm not sure which counter we incremented. But it would be clear if we have
>> a high level view of how it works (it is symmetric actually). Basically, we
>> start the optimization when each round starts and stop it at the end of each
>> round (i.e. before we do the bitmap sync), as shown below:
>>
>> 1) 1st Round starts --> free_page_start
>> 2) 1st Round in progress..
>> 3) 1st Round ends  --> free_page_stop
>> 4) 2nd Round starts --> free_page_start
>> 5) 2nd Round in progress..
>> 6) 2nd Round ends --> free_page_stop
>> ......
>>
>> For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
>> finds the vq empty (i.e. elem == NULL) and the runstate stopped, so the
>> optimization thread exits immediately. That is, this optimization thread
>> is gone forever (the optimization we can do for this round is done). We
>> won't know when the VM would be woken up:
>> A) If the VM is woken up very soon, while the migration thread is still
>> in progress with 2), then in 4) a new optimization thread (not the same
>> one as in the first round) will be created and will start the
>> optimization for the 2nd round as usual (if you have questions about 3)
>> in this case: free_page_stop will do nothing but return, since the
>> optimization thread has exited);
>> B) If the VM is woken up after the whole migration has ended, there is
>> still no point in resuming the optimization.
>>
>> I think this would be the simple design for the first release of this
>> optimization. There are possibilities to improve case A) above by
>> continuing the optimization for the 1st round while it is still in
>> progress, but I think adding that complexity for this rare case wouldn't
>> be worthwhile (at least for now). What do you think?
>>
>>
>> Best,
>> Wei
> In my opinion this just makes the patch very messy.
>
> E.g. attempts to attach a debugger to the guest will call vmstop and
> then the behaviour changes. This is a recipe for heisenbugs, which are
> extremely painful to debug.
>
> It is not really hard to make things symmetrical:
> e.g. if you stop on vmstop then you should start on vmstart, etc.
> And stopping the thread should not involve a bunch of state
> changes; just stop it and that's it.
>

"stop it" - do you mean to
1) make the thread exit (i.e.make virtio_balloon_poll_free_page_hints 
exit the while loop and return NULL); or
2) keep the thread staying in the while loop but yield running (e.g. 
sleep(1) or block on a mutex)? (or please let me know if you suggested a 
different implementation about stopping the thread)

If we go with 1), then we would not be able to resume the thread.
If we go with 2), then there is no guarantee to make the thread continue 
to run immediately (there will be a scheduling delay which is not 
predictable) when the vm is woken up. If the thread cannot run 
immediately when the the VM is woken up, it will be effectively the same 
as 1).
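
For clarity, option 2) would look something like the following (a
sketch only; the mutex/condvar pair is hypothetical, since the series
currently uses a spin lock, and the wakeup would have to come from a
vm state change handler):

while (1) {
    qemu_mutex_lock(&dev->free_page_mutex);
    while (!runstate_is_running()) {
        /* Block until a vmstate handler signals the cond on vmstart. */
        qemu_cond_wait(&dev->free_page_cond, &dev->free_page_mutex);
    }
    qemu_mutex_unlock(&dev->free_page_mutex);

    /* ... pop and process hints as before ... */
}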

In terms of heisenbugs, I think we can recommend that developers (of
this feature) use methods that don't stop the VM (e.g. print).

Best,
Wei

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-20  2:16               ` Wei Wang
@ 2018-03-20  2:59                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  2:59 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Tue, Mar 20, 2018 at 10:16:09AM +0800, Wei Wang wrote:
> On 03/20/2018 06:55 AM, Michael S. Tsirkin wrote:
> > On Mon, Mar 19, 2018 at 05:01:38PM +0800, Wei Wang wrote:
> > > On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
> > > > On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
> > > > > On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
> > > > > > On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
> > > > OTOH it seems that if thread stops nothing will wake it up
> > > > when vm is restarted. Such behaviour change across vmstop/vmstart
> > > > is unexpected.
> > > > I do not understand why we want to increment the counter
> > > > on vm stop though. It does make sense to stop the thread
> > > > but why not resume where we left off when vm is resumed?
> > > > 
> > > I'm not sure which counter we incremented. But it would be clear if we have
> > > a high level view of how it works (it is symmetric actually). Basically, we
> > > start the optimization when each round starts and stop it at the end of each
> > > round (i.e. before we do the bitmap sync), as shown below:
> > > 
> > > 1) 1st Round starts --> free_page_start
> > > 2) 1st Round in progress..
> > > 3) 1st Round ends  --> free_page_stop
> > > 4) 2nd Round starts --> free_page_start
> > > 5) 2nd Round in progress..
> > > 6) 2nd Round ends --> free_page_stop
> > > ......
> > > 
> > > For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
> > > finds the vq is empty (i.e. elem == NULL) and the runstate is stopped, the
> > > optimization thread exits immediately. That is, this optimization thread is
> > > gone forever (the optimization we can do for this round is done). We won't
> > > know when would the VM be woken up:
> > > A) If the VM is woken up very soon when the migration thread is still in
> > > progress of 2), then in 4) a new optimization thread (not the same one for
> > > the first round) will be created and start the optimization for the 2nd
> > > round as usual (If you have questions about 3) in this case, that
> > > free_page_stop will do nothing than just return, since the optimization
> > > thread has exited) ;
> > > B) If the VM is woken up after the whole migration has ended, there is still
> > > no point in resuming the optimization.
> > > 
> > > I think this would be the simple design for the first release of this
> > > optimization. There are possibilities to improve case A) above by continuing
> > > optimization for the 1st Round as it is still in progress, but I think
> > > adding that complexity for this rare case wouldn't be worthwhile (at least
> > > for now). What would you think?
> > > 
> > > 
> > > Best,
> > > Wei
> > In my opinion this just makes the patch very messy.
> > 
> > E.g. attempts to attach a debugger to the guest will call vmstop and
> > then behaviour changes. This is a recipe for heisenbugs which are then
> > extremely painful to debug.
> > 
> > It is not really hard to make things symmetrical:
> > e.g. if you stop on vmstop then you should start on vmstart, etc.
> > And stopping thread should not involve a bunch of state
> > changes, just stop it and that's it.
> > 
> 
> "stop it" - do you mean to
> 1) make the thread exit (i.e. make virtio_balloon_poll_free_page_hints exit
> the while loop and return NULL); or
> 2) keep the thread staying in the while loop but yield running (e.g.
> sleep(1) or block on a mutex)? (or please let me know if you suggested a
> different implementation about stopping the thread)

I would say it makes more sense to make it block on something.
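
A minimal sketch of that blocking variant (the free_page_lock /
free_page_cond fields and the vm change state hook are invented here
for illustration; they are not from the patch):

    /* needs qemu/thread.h and sysemu/sysemu.h */
    static void *balloon_free_page_thread(void *opaque)
    {
        VirtIOBalloon *s = opaque;

        while (!s->free_page_done) {
            qemu_mutex_lock(&s->free_page_lock);
            /* block while the VM is stopped; a vm_change_state
             * handler would qemu_cond_signal() on vmstart */
            while (!runstate_is_running() && !s->free_page_done) {
                qemu_cond_wait(&s->free_page_cond, &s->free_page_lock);
            }
            qemu_mutex_unlock(&s->free_page_lock);
            virtio_balloon_poll_free_page_hints(s);
        }
        return NULL;
    }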

BTW I still think you are engaging in premature optimization here.
What you are doing here is a "data plane for balloon".
I would make the feature work first by processing this in a BH.
Creating threads immediately opens up questions of isolation,
cgroups etc.
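
A BH version would be tiny, roughly (free_page_bh is likewise a
hypothetical field, just to show the shape):

    /* needs qemu/main-loop.h */
    static void virtio_balloon_free_page_bh(void *opaque)
    {
        VirtIOBalloon *s = opaque;
        virtio_balloon_poll_free_page_hints(s);
    }

    /* at device realize: */
    s->free_page_bh = qemu_bh_new(virtio_balloon_free_page_bh, s);
    /* when a hinting round starts: */
    qemu_bh_schedule(s->free_page_bh);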

> If we go with 1), then we would not be able to resume the thread.
> If we go with 2), then there is no guarantee to make the thread continue to
> run immediately (there will be a scheduling delay which is not predictable)
> when the vm is woken up. If the thread cannot run immediately when the
> VM is woken up, it will be effectively the same as 1).
> 
> In terms of heisenbugs, I think we can recommend developers (of this
> feature) to use methods that don't stop the VM (e.g. print).
> 
> Best,
> Wei

Sorry, this does not make sense to me.  The reality is that for most
people migration works just fine.  These patches implement a niche
optimization; the #1 priority should be to avoid breaking existing flows.

-- 
MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-20  2:59                 ` Michael S. Tsirkin
@ 2018-03-20  3:18                   ` Wei Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-03-20  3:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/20/2018 10:59 AM, Michael S. Tsirkin wrote:
> On Tue, Mar 20, 2018 at 10:16:09AM +0800, Wei Wang wrote:
>> On 03/20/2018 06:55 AM, Michael S. Tsirkin wrote:
>>> On Mon, Mar 19, 2018 at 05:01:38PM +0800, Wei Wang wrote:
>>>> On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
>>>>> On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
>>>>>> On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
>>>>> OTOH it seems that if thread stops nothing will wake it up
>>>>> when vm is restarted. Such behaviour change across vmstop/vmstart
>>>>> is unexpected.
>>>>> I do not understand why we want to increment the counter
>>>>> on vm stop though. It does make sense to stop the thread
>>>>> but why not resume where we left off when vm is resumed?
>>>>>
>>>> I'm not sure which counter we incremented. But it would be clear if we have
>>>> a high level view of how it works (it is symmetric actually). Basically, we
>>>> start the optimization when each round starts and stop it at the end of each
>>>> round (i.e. before we do the bitmap sync), as shown below:
>>>>
>>>> 1) 1st Round starts --> free_page_start
>>>> 2) 1st Round in progress..
>>>> 3) 1st Round ends  --> free_page_stop
>>>> 4) 2nd Round starts --> free_page_start
>>>> 5) 2nd Round in progress..
>>>> 6) 2nd Round ends --> free_page_stop
>>>> ......
>>>>
>>>> For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
>>>> finds the vq is empty (i.e. elem == NULL) and the runstate is stopped, the
>>>> optimization thread exits immediately. That is, this optimization thread is
>>>> gone forever (the optimization we can do for this round is done). We won't
>>>> know when would the VM be woken up:
>>>> A) If the VM is woken up very soon when the migration thread is still in
>>>> progress of 2), then in 4) a new optimization thread (not the same one for
>>>> the first round) will be created and start the optimization for the 2nd
>>>> round as usual (If you have questions about 3) in this case, that
>>>> free_page_stop will do nothing than just return, since the optimization
>>>> thread has exited) ;
>>>> B) If the VM is woken up after the whole migration has ended, there is still
>>>> no point in resuming the optimization.
>>>>
>>>> I think this would be the simple design for the first release of this
>>>> optimization. There are possibilities to improve case A) above by continuing
>>>> optimization for the 1st Round as it is still in progress, but I think
>>>> adding that complexity for this rare case wouldn't be worthwhile (at least
>>>> for now). What would you think?
>>>>
>>>>
>>>> Best,
>>>> Wei
>>> In my opinion this just makes the patch very messy.
>>>
>>> E.g. attempts to attach a debugger to the guest will call vmstop and
>>> then behaviour changes. This is a recipe for heisenbugs which are then
>>> extremely painful to debug.
>>>
>>> It is not really hard to make things symmetrical:
>>> e.g. if you stop on vmstop then you should start on vmstart, etc.
>>> And stopping thread should not involve a bunch of state
>>> changes, just stop it and that's it.
>>>
>> "stop it" - do you mean to
>> 1) make the thread exit (i.e. make virtio_balloon_poll_free_page_hints exit
>> the while loop and return NULL); or
>> 2) keep the thread staying in the while loop but yield running (e.g.
>> sleep(1) or block on a mutex)? (or please let me know if you suggested a
>> different implementation about stopping the thread)
> I would say it makes more sense to make it block on something.
>
> BTW I still think you are engaging in premature optimization here.
> What you are doing here is a "data plane for balloon".
> I would make the feature work first by processing this in a BH.
> Creating threads immediately opens up questions of isolation,
> cgroups etc.
>

Could you please share more about how creating threads affects isolation
and cgroups? And how does a BH solve it? Thanks.

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-20  3:18                   ` Wei Wang
@ 2018-03-20  3:24                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 42+ messages in thread
From: Michael S. Tsirkin @ 2018-03-20  3:24 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Tue, Mar 20, 2018 at 11:18:23AM +0800, Wei Wang wrote:
> On 03/20/2018 10:59 AM, Michael S. Tsirkin wrote:
> > On Tue, Mar 20, 2018 at 10:16:09AM +0800, Wei Wang wrote:
> > > On 03/20/2018 06:55 AM, Michael S. Tsirkin wrote:
> > > > On Mon, Mar 19, 2018 at 05:01:38PM +0800, Wei Wang wrote:
> > > > > On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
> > > > > > On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
> > > > > > > On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
> > > > > > OTOH it seems that if thread stops nothing will wake it up
> > > > > > when vm is restarted. Such behaviour change across vmstop/vmstart
> > > > > > is unexpected.
> > > > > > I do not understand why we want to increment the counter
> > > > > > on vm stop though. It does make sense to stop the thread
> > > > > > but why not resume where we left off when vm is resumed?
> > > > > > 
> > > > > I'm not sure which counter we incremented. But it would be clear if we have
> > > > > a high level view of how it works (it is symmetric actually). Basically, we
> > > > > start the optimization when each round starts and stop it at the end of each
> > > > > round (i.e. before we do the bitmap sync), as shown below:
> > > > > 
> > > > > 1) 1st Round starts --> free_page_start
> > > > > 2) 1st Round in progress..
> > > > > 3) 1st Round ends  --> free_page_stop
> > > > > 4) 2nd Round starts --> free_page_start
> > > > > 5) 2nd Round in progress..
> > > > > 6) 2nd Round ends --> free_page_stop
> > > > > ......
> > > > > 
> > > > > For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
> > > > > finds the vq is empty (i.e. elem == NULL) and the runstate is stopped, the
> > > > > optimization thread exits immediately. That is, this optimization thread is
> > > > > gone forever (the optimization we can do for this round is done). We won't
> > > > > know when would the VM be woken up:
> > > > > A) If the VM is woken up very soon when the migration thread is still in
> > > > > progress of 2), then in 4) a new optimization thread (not the same one for
> > > > > the first round) will be created and start the optimization for the 2nd
> > > > > round as usual (If you have questions about 3) in this case, that
> > > > > free_page_stop will do nothing than just return, since the optimization
> > > > > thread has exited) ;
> > > > > B) If the VM is woken up after the whole migration has ended, there is still
> > > > > no point in resuming the optimization.
> > > > > 
> > > > > I think this would be the simple design for the first release of this
> > > > > optimization. There are possibilities to improve case A) above by continuing
> > > > > optimization for the 1st Round as it is still in progress, but I think
> > > > > adding that complexity for this rare case wouldn't be worthwhile (at least
> > > > > for now). What would you think?
> > > > > 
> > > > > 
> > > > > Best,
> > > > > Wei
> > > > In my opinion this just makes the patch very messy.
> > > > 
> > > > E.g. attempts to attach a debugger to the guest will call vmstop and
> > > > > then behaviour changes. This is a recipe for heisenbugs which are then
> > > > extremely painful to debug.
> > > > 
> > > > It is not really hard to make things symmetrical:
> > > > e.g. if you stop on vmstop then you should start on vmstart, etc.
> > > > And stopping thread should not involve a bunch of state
> > > > changes, just stop it and that's it.
> > > > 
> > > "stop it" - do you mean to
> > > 1) make the thread exit (i.e. make virtio_balloon_poll_free_page_hints exit
> > > the while loop and return NULL); or
> > > 2) keep the thread staying in the while loop but yield running (e.g.
> > > sleep(1) or block on a mutex)? (or please let me know if you suggested a
> > > different implementation about stopping the thread)
> > I would say it makes more sense to make it block on something.
> > 
> > BTW I still think you are engaging in premature optimization here.
> > What you are doing here is a "data plane for balloon".
> > I would make the feature work first by processing this in a BH.
> > Creating threads immediately opens up questions of isolation,
> > cgroups etc.
> > 
> 
> Could you please share more about how creating threads affects isolation and
> cgroups?

When threads are coming and going, they are hard to control.

Consider the rich API libvirt exposes for controlling the io threads:

https://libvirt.org/formatdomain.html#elementsIOThreadsAllocation


> And how does a BH solve it? Thanks.
> Best,
> Wei

It's late at night so I don't remember whether it's the emulator
or the io thread that runs the BH, but the point is it's
already controlled by libvirt.

-- 
MST

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-20  3:24                     ` Michael S. Tsirkin
@ 2018-03-22  3:13                       ` Wei Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-03-22  3:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/20/2018 11:24 AM, Michael S. Tsirkin wrote:
> On Tue, Mar 20, 2018 at 11:18:23AM +0800, Wei Wang wrote:
>> On 03/20/2018 10:59 AM, Michael S. Tsirkin wrote:
>>> On Tue, Mar 20, 2018 at 10:16:09AM +0800, Wei Wang wrote:
>>>> On 03/20/2018 06:55 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Mar 19, 2018 at 05:01:38PM +0800, Wei Wang wrote:
>>>>>> On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
>>>>>>> On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
>>>>>>>> On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
>>>>>>> OTOH it seems that if thread stops nothing will wake it up
>>>>>>> when vm is restarted. Such behaviour change across vmstop/vmstart
>>>>>>> is unexpected.
>>>>>>> I do not understand why we want to increment the counter
>>>>>>> on vm stop though. It does make sense to stop the thread
>>>>>>> but why not resume where we left off when vm is resumed?
>>>>>>>
>>>>>> I'm not sure which counter we incremented. But it would be clear if we have
>>>>>> a high level view of how it works (it is symmetric actually). Basically, we
>>>>>> start the optimization when each round starts and stop it at the end of each
>>>>>> round (i.e. before we do the bitmap sync), as shown below:
>>>>>>
>>>>>> 1) 1st Round starts --> free_page_start
>>>>>> 2) 1st Round in progress..
>>>>>> 3) 1st Round ends  --> free_page_stop
>>>>>> 4) 2nd Round starts --> free_page_start
>>>>>> 5) 2nd Round in progress..
>>>>>> 6) 2nd Round ends --> free_page_stop
>>>>>> ......
>>>>>>
>>>>>> For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
>>>>>> finds the vq is empty (i.e. elem == NULL) and the runstate is stopped, the
>>>>>> optimization thread exits immediately. That is, this optimization thread is
>>>>>> gone forever (the optimization we can do for this round is done). We won't
>>>>>> know when would the VM be woken up:
>>>>>> A) If the VM is woken up very soon when the migration thread is still in
>>>>>> progress of 2), then in 4) a new optimization thread (not the same one for
>>>>>> the first round) will be created and start the optimization for the 2nd
>>>>>> round as usual (If you have questions about 3) in this case, that
>>>>>> free_page_stop will do nothing than just return, since the optimization
>>>>>> thread has exited) ;
>>>>>> B) If the VM is woken up after the whole migration has ended, there is still
>>>>>> no point in resuming the optimization.
>>>>>>
>>>>>> I think this would be the simple design for the first release of this
>>>>>> optimization. There are possibilities to improve case A) above by continuing
>>>>>> optimization for the 1st Round as it is still in progress, but I think
>>>>>> adding that complexity for this rare case wouldn't be worthwhile (at least
>>>>>> for now). What would you think?
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Wei
>>>>> In my opinion this just makes the patch very messy.
>>>>>
>>>>> E.g. attempts to attach a debugger to the guest will call vmstop and
>>>>> then behaviour changes. This is a recipe for heisenbugs which are then
>>>>> extremely painful to debug.
>>>>>
>>>>> It is not really hard to make things symmetrical:
>>>>> e.g. if you stop on vmstop then you should start on vmstart, etc.
>>>>> And stopping thread should not involve a bunch of state
>>>>> changes, just stop it and that's it.
>>>>>
>>>> "stop it" - do you mean to
>>>> 1) make the thread exit (i.e. make virtio_balloon_poll_free_page_hints exit
>>>> the while loop and return NULL); or
>>>> 2) keep the thread staying in the while loop but yield running (e.g.
>>>> sleep(1) or block on a mutex)? (or please let me know if you suggested a
>>>> different implementation about stopping the thread)
>>> I would say it makes more sense to make it block on something.
>>>
>>> BTW I still think you are engaging in premature optimization here.
>>> What you are doing here is a "data plane for balloon".
>>> I would make the feature work first by processing this in a BH.
>>> Creating threads immediately opens up questions of isolation,
>>> cgroups etc.
>>>
>> Could you please share more about how creating threads affects isolation and
>> cgroups?
> When threads are coming and going, they are hard to control.
>
> Consider the rich API libvirt exposes for controlling the io threads:
>
> https://libvirt.org/formatdomain.html#elementsIOThreadsAllocation
>
>
>> And how does a BH solve it? Thanks.
>> Best,
>> Wei
> It's late at night so I don't remember whether it's the emulator
> or the io thread that runs the BH, but the point is it's
> already controlled by libvirt.
>

OK. I've tried to implement it this way: create an iothread via the qemu 
cmdline option: --device 
virtio-balloon,free-page-hint=true,iothread=iothread10, and schedule a 
BH to run in the iothread context when free_page_start() is called.
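
Concretely, the invocation looks roughly like this (the free-page-hint
property spelling is from my experiment, not final):

    qemu-system-x86_64 ... \
        -object iothread,id=iothread10 \
        -device virtio-balloon,free-page-hint=true,iothread=iothread10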

I think the drawback of this method is that the iothread exists during
the whole QEMU lifetime, which wastes CPU cycles (e.g. when live
migration isn't in progress). The method in v5 is like the migration
thread, which comes only when the migration request is received and
goes when migration is done. Any thoughts?

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-22  3:13                       ` Wei Wang
@ 2018-03-26 10:58                         ` Wei Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-03-26 10:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/22/2018 11:13 AM, Wei Wang wrote:
>
> OK. I've tried to implement it this way: create an iothread via the 
> qemu cmdline option: --device 
> virtio-balloon,free-page-hint=true,iothread=iothread10, and schedule a 
> BH to run in the iothread context when free_page_start() is called.
>
> I think the drawback of this method is that the iothread exists during 
> the whole QEMU lifetime, which wastes CPU cycles (e.g. when live 
> migration isn't in progress). The method in v5 is like the migration
> thread, which comes only when the migration request is received and
> goes when migration is done. Any thoughts?
>

Hi Michael,

Would it be acceptable to go with the thread creation method for now? I
would prefer that method for the above reasons. If you are very
confident about the iothread method, I can also send out the patches
with that implementation. I hope we can finish this feature soon. Thanks.

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-20  3:24                     ` Michael S. Tsirkin
  (?)
  (?)
@ 2018-03-26 11:09                     ` Daniel P. Berrangé
  2018-03-26 14:54                         ` [virtio-dev] " Wang, Wei W
  -1 siblings, 1 reply; 42+ messages in thread
From: Daniel P. Berrangé @ 2018-03-26 11:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Wei Wang, yang.zhang.wz, virtio-dev, quan.xu0, quintela,
	liliang.opensource, qemu-devel, dgilbert, pbonzini, nilal

On Tue, Mar 20, 2018 at 05:24:39AM +0200, Michael S. Tsirkin wrote:
> On Tue, Mar 20, 2018 at 11:18:23AM +0800, Wei Wang wrote:
> > On 03/20/2018 10:59 AM, Michael S. Tsirkin wrote:
> > > On Tue, Mar 20, 2018 at 10:16:09AM +0800, Wei Wang wrote:
> > > > On 03/20/2018 06:55 AM, Michael S. Tsirkin wrote:
> > > > > On Mon, Mar 19, 2018 at 05:01:38PM +0800, Wei Wang wrote:
> > > > > > On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
> > > > > > > On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
> > > > > > > > On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
> > > > > > > > > On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
> > > > > > > OTOH it seems that if thread stops nothing will wake it up
> > > > > > > when vm is restarted. Such behaviour change across vmstop/vmstart
> > > > > > > is unexpected.
> > > > > > > I do not understand why we want to increment the counter
> > > > > > > on vm stop though. It does make sense to stop the thread
> > > > > > > but why not resume where we left off when vm is resumed?
> > > > > > > 
> > > > > > I'm not sure which counter we incremented. But it would be clear if we have
> > > > > > a high level view of how it works (it is symmetric actually). Basically, we
> > > > > > start the optimization when each round starts and stop it at the end of each
> > > > > > round (i.e. before we do the bitmap sync), as shown below:
> > > > > > 
> > > > > > 1) 1st Round starts --> free_page_start
> > > > > > 2) 1st Round in progress..
> > > > > > 3) 1st Round ends  --> free_page_stop
> > > > > > 4) 2nd Round starts --> free_page_start
> > > > > > 5) 2nd Round in progress..
> > > > > > 6) 2nd Round ends --> free_page_stop
> > > > > > ......
> > > > > > 
> > > > > > For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
> > > > > > finds the vq is empty (i.e. elem == NULL) and the runstate is stopped, the
> > > > > > optimization thread exits immediately. That is, this optimization thread is
> > > > > > gone forever (the optimization we can do for this round is done). We won't
> > > > > > know when would the VM be woken up:
> > > > > > A) If the VM is woken up very soon when the migration thread is still in
> > > > > > progress of 2), then in 4) a new optimization thread (not the same one for
> > > > > > the first round) will be created and start the optimization for the 2nd
> > > > > > round as usual (If you have questions about 3) in this case, that
> > > > > > free_page_stop will do nothing than just return, since the optimization
> > > > > > thread has exited) ;
> > > > > > B) If the VM is woken up after the whole migration has ended, there is still
> > > > > > no point in resuming the optimization.
> > > > > > 
> > > > > > I think this would be the simple design for the first release of this
> > > > > > optimization. There are possibilities to improve case A) above by continuing
> > > > > > optimization for the 1st Round as it is still in progress, but I think
> > > > > > adding that complexity for this rare case wouldn't be worthwhile (at least
> > > > > > for now). What would you think?
> > > > > > 
> > > > > > 
> > > > > > Best,
> > > > > > Wei
> > > > > In my opinion this just makes the patch very messy.
> > > > > 
> > > > > E.g. attempts to attach a debugger to the guest will call vmstop and
> > > > > then behaviour changes. This is a recipe for heisenbugs which are then
> > > > > extremely painful to debug.
> > > > > 
> > > > > It is not really hard to make things symmetrical:
> > > > > e.g. if you stop on vmstop then you should start on vmstart, etc.
> > > > > And stopping thread should not involve a bunch of state
> > > > > changes, just stop it and that's it.
> > > > > 
> > > > "stop it" - do you mean to
> > > > 1) make the thread exit (i.e. make virtio_balloon_poll_free_page_hints exit
> > > > the while loop and return NULL); or
> > > > 2) keep the thread staying in the while loop but yield running (e.g.
> > > > sleep(1) or block on a mutex)? (or please let me know if you suggested a
> > > > different implementation about stopping the thread)
> > > I would say it makes more sense to make it block on something.
> > > 
> > > BTW I still think you are engaging in premature optimization here.
> > > What you are doing here is a "data plane for balloon".
> > > I would make the feature work first by processing this in a BH.
> > > Creating threads immediately opens up questions of isolation,
> > > cgroups etc.
> > > 
> > 
> > Could you please share more about how creating threads affects isolation and
> > cgroups?
> 
> When threads are coming and going, they are hard to control.
> 
> Consider the rich API libvirt exposes for controlling the io threads:
> 
> https://libvirt.org/formatdomain.html#elementsIOThreadsAllocation
> 
> 
> > And how does a BH solve it? Thanks.
> > Best,
> > Wei
> 
> It's late at night so I don't remember whether it's the emulator
> or the io thread that runs the BH, but the point is it's
> already controlled by libvirt.

As far as libvirt is concerned, there are three sets of threads it provides
control over:

 - vCPUs - each VCPU in KVM has a thread. Libvirt provides per-thread
   tunable control

 - IOThreads - each named I/O thread can be associated with one or more
   devices. Libvirt provides per-thread tunable control.

 - Emulator - any other QEMU thread which isn't a vCPU thread or IO thread
   gets called an emulator thread by libvirt. There is no per-thread
   tunable control - we can set tunables for the entire set of emulator
   threads at once.

So, if this balloon driver thread needs to support tuning controls
separately from other general purpose QEMU threads, then it would
ideally use iothread infrastructure.
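
For instance, IOThread allocation and pinning in the domain XML look
roughly like this (values made up):

    <domain>
      ...
      <iothreads>1</iothreads>
      <cputune>
        <iothreadpin iothread='1' cpuset='2-3'/>
        <emulatorpin cpuset='0-1'/>
      </cputune>
      ...
    </domain>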

I don't particularly understand what this code is doing, but please consider
whether NUMA has any impact on the work done in this thread. Specifically when
the guest has multiple virtual NUMA nodes, each associated with a specific
host NUMA node. If there is any memory intensive work being done here, then
it might need to be executed on the correct host NUMA node according to the
memory region being touched.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-26 11:09                     ` [Qemu-devel] " Daniel P. Berrangé
@ 2018-03-26 14:54                         ` Wang, Wei W
  0 siblings, 0 replies; 42+ messages in thread
From: Wang, Wei W @ 2018-03-26 14:54 UTC (permalink / raw)
  To: Daniel P. Berrangé, Michael S. Tsirkin
  Cc: yang.zhang.wz, virtio-dev, quan.xu0, quintela,
	liliang.opensource, qemu-devel, dgilbert, pbonzini, nilal

On Monday, March 26, 2018 7:09 PM, Daniel P. Berrangé wrote:
> 
> As far as libvirt is concerned, there are three sets of threads it provides
> control over:
> 
>  - vCPUs - each VCPU in KVM has a thread. Libvirt provides per-thread
>    tunable control
> 
>  - IOThreads - each named I/O thread can be associated with one or more
>    devices. Libvirt provides per-thread tunable control.
> 
>  - Emulator - any other QEMU thread which isn't a vCPU thread or IO thread
>    gets called an emulator thread by libvirt. There is no per-thread
>    tunable control - we can set tunables for the entire set of emulator
>    threads at once.
> 


Hi Daniel,
Thanks for sharing the details; they are very helpful. I still have a question:

There is no fundamental difference between an iothread and our optimization thread (it is similar to the migration thread, which is created when migration begins and terminated when migration is done) - both of them are pthreads and each has a name. Could we also add similar per-thread tunable controls in libvirt for such threads?

For example, in QEMU we can add a new migration qmp command, migrate_enable_free_page_optimization (just like other commands, e.g. migrate_set_speed 10G); this command would create the optimization thread. In this way, creation of the thread is under libvirt's control, and libvirt can then support tuning the thread (e.g. pin it to any pCPU), right?
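
The flow from the monitor would then look something like this
(migrate_enable_free_page_optimization being the new, hypothetical
command; the other commands already exist):

    (qemu) migrate_enable_free_page_optimization
    (qemu) migrate_set_speed 10G
    (qemu) migrate -d tcp:<dest-host>:4444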


> So, if this balloon driver thread needs to support tuning controls separately
> from other general purpose QEMU threads, then it would ideally use
> iothread infrastructure.
> 
> I don't particularly understand what this code is doing, but please consider
> whether NUMA has any impact on the work done in this thread. Specifically
> when the guest has multiple virtual NUMA nodes, each associated with a
> specific host NUMA node. If there is any memory intensive work being done
> here, then it might need to be executed on the correct host NUMA node
> according to the memory region being touched.
> 
 
I think it would not be significantly impacted by NUMA, because this optimization thread doesn't access guest memory much except for the virtqueue (even with an iothread, we may still not know which pCPU to pin to match the virtqueue in the vNUMA case). Essentially, it gets the free page address and length, then clears bits in the migration dirty bitmap, which is allocated by QEMU itself.
So, I think adding the tunable support would be nice, but I'm not sure it is required.

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread


* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-26 14:54                         ` [virtio-dev] " Wang, Wei W
  (?)
@ 2018-03-26 15:04                         ` Daniel P. Berrangé
  2018-03-26 15:35                             ` [virtio-dev] " Wang, Wei W
  -1 siblings, 1 reply; 42+ messages in thread
From: Daniel P. Berrangé @ 2018-03-26 15:04 UTC (permalink / raw)
  To: Wang, Wei W
  Cc: Michael S. Tsirkin, yang.zhang.wz, virtio-dev, quan.xu0,
	quintela, liliang.opensource, qemu-devel, dgilbert, pbonzini,
	nilal

On Mon, Mar 26, 2018 at 02:54:45PM +0000, Wang, Wei W wrote:
> On Monday, March 26, 2018 7:09 PM, Daniel P. Berrangé wrote:
> > 
> > As far as libvirt is concerned there are three sets of threads it provides
> > control over
> > 
> >  - vCPUs - each VCPU in KVM has a thread. Libvirt provides per-thread
> >    tunable control
> > 
> >  - IOThreads - each named I/O thread can be associated with one or more
> >    devices. Libvirt provides per-thread tunable control.
> > 
> >  - Emulator - any other QEMU thread which isn't a vCPU thread or IO thread
> >    gets called an emulator thread by libvirt. There is no per-thread
> >    tunable control - we can set tunables for the entire set of emulator threads
> >    at once.
> > 
> 
> 
> Hi Daniel,
> Thanks for sharing the details, they are very helpful. I still have a question:
> 
> There is no fundamental difference between an iothread and our optimization
> thread (it is similar to the migration thread, which is created when
> migration begins and terminated when migration is done) - both are
> pthreads and each has a name. Could we also add similar per-thread
> tunable controls in libvirt for such threads?
> 
> For example, in QEMU we can add a new migration qmp command,
> migrate_enable_free_page_optimization (just like other commands
> migrate_set_speed 10G), and this command would create the optimization thread.
> In this way, creation of the thread is under libvirt's control, and libvirt
> can then support tuning the thread (e.g. pin it to any pCPU), right?

Anything is possible at a technical level, but from a design POV we
would rather not start exposing tunables for arbitrary device-type-specific
threads, as they are inherently non-portable and may not even exist in QEMU
long term - they are just an artifact of the current QEMU implementation.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-26 15:04                         ` Daniel P. Berrangé
@ 2018-03-26 15:35                             ` Wang, Wei W
  0 siblings, 0 replies; 42+ messages in thread
From: Wang, Wei W @ 2018-03-26 15:35 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Michael S. Tsirkin, yang.zhang.wz, virtio-dev, quan.xu0,
	quintela, liliang.opensource, qemu-devel, dgilbert, pbonzini,
	nilal

On Monday, March 26, 2018 11:04 PM, Daniel P. Berrangé wrote:
> On Mon, Mar 26, 2018 at 02:54:45PM +0000, Wang, Wei W wrote:
> > On Monday, March 26, 2018 7:09 PM, Daniel P. Berrangé wrote:
> > >
> > > As far as libvirt is concerned there are three sets of threads it
> > > provides control over
> > >
> > >  - vCPUs - each VCPU in KVM has a thread. Libvirt provides per-thread
> > >    tunable control
> > >
> > >  - IOThreads - each named I/O thread can be associated with one or more
> > >    devices. Libvirt provides per-thread tunable control.
> > >
> > >  - Emulator - any other QEMU thread which isn't a vCPU thread or IO
> > >    thread gets called an emulator thread by libvirt. There is no per-thread
> > >    tunable control - we can set tunables for the entire set of emulator
> > >    threads at once.
> > >
> >
> >
> > Hi Daniel,
> > Thanks for sharing the details, they are very helpful. I still have a question:
> >
> > There is no fundamental difference between an iothread and our
> > optimization thread (it is similar to the migration thread, which is
> > created when migration begins and terminated when migration is done) -
> > both are pthreads and each has a name. Could we also add
> > similar per-thread tunable controls in libvirt for such threads?
> >
> > For example, in QEMU we can add a new migration qmp command,
> > migrate_enable_free_page_optimization (just like other commands
> > migrate_set_speed 10G), and this command would create the optimization thread.
> > In this way, creation of the thread is under libvirt's control, and
> > libvirt can then support tuning the thread (e.g. pin it to any pCPU), right?
> 
> Anything is possible at a technical level, but from a design POV we would
> rather not start exposing tunables for arbitrary device-type-specific threads,
> as they are inherently non-portable and may not even exist in QEMU long
> term - they are just an artifact of the current QEMU implementation.
> 

OK. My actual concern with an iothread is that it exists for the whole QEMU lifecycle. A block device using it is fine, since block device accesses happen frequently throughout the QEMU lifecycle. For usages like live migration, which may happen only once per day, keeping this iothread running would be a waste of CPU cycles.

Best,
Wei
 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-20  3:24                     ` Michael S. Tsirkin
@ 2018-03-30  9:52                       ` Wei Wang
  -1 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-03-30  9:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/20/2018 11:24 AM, Michael S. Tsirkin wrote:
> On Tue, Mar 20, 2018 at 11:18:23AM +0800, Wei Wang wrote:
>> On 03/20/2018 10:59 AM, Michael S. Tsirkin wrote:
>>> On Tue, Mar 20, 2018 at 10:16:09AM +0800, Wei Wang wrote:
>>>> On 03/20/2018 06:55 AM, Michael S. Tsirkin wrote:
>>>>> On Mon, Mar 19, 2018 at 05:01:38PM +0800, Wei Wang wrote:
>>>>>> On 03/19/2018 12:24 PM, Michael S. Tsirkin wrote:
>>>>>>> On Sun, Mar 18, 2018 at 06:36:20PM +0800, Wei Wang wrote:
>>>>>>>> On 03/16/2018 11:16 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Fri, Mar 16, 2018 at 06:48:28PM +0800, Wei Wang wrote:
>>>>>>> OTOH it seems that if the thread stops nothing will wake it up
>>>>>>> when the vm is restarted. Such a behaviour change across vmstop/vmstart
>>>>>>> is unexpected.
>>>>>>> I do not understand why we want to increment the counter
>>>>>>> on vm stop though. It does make sense to stop the thread
>>>>>>> but why not resume where we left off when vm is resumed?
>>>>>>>
>>>>>> I'm not sure which counter we incremented. But it becomes clear if we have
>>>>>> a high-level view of how it works (it is symmetric actually). Basically, we
>>>>>> start the optimization when each round starts and stop it at the end of each
>>>>>> round (i.e. before we do the bitmap sync), as shown below:
>>>>>>
>>>>>> 1) 1st Round starts --> free_page_start
>>>>>> 2) 1st Round in progress..
>>>>>> 3) 1st Round ends  --> free_page_stop
>>>>>> 4) 2nd Round starts --> free_page_start
>>>>>> 5) 2nd Round in progress..
>>>>>> 6) 2nd Round ends --> free_page_stop
>>>>>> ......
>>>>>>
>>>>>> For example, in 2), the VM is stopped. virtio_balloon_poll_free_page_hints
>>>>>> finds the vq empty (i.e. elem == NULL) and the runstate stopped, so the
>>>>>> optimization thread exits immediately. That is, this optimization thread is
>>>>>> gone forever (the optimization we can do for this round is done). We won't
>>>>>> know when the VM will be woken up:
>>>>>> A) If the VM is woken up very soon, while the migration thread is still in
>>>>>> progress of 2), then in 4) a new optimization thread (not the same one from
>>>>>> the first round) will be created and start the optimization for the 2nd
>>>>>> round as usual (if you have questions about 3) in this case: that
>>>>>> free_page_stop will do nothing but return, since the optimization
>>>>>> thread has exited);
>>>>>> B) If the VM is woken up after the whole migration has ended, there is still
>>>>>> no point in resuming the optimization.
>>>>>>
>>>>>> I think this is the simplest design for the first release of this
>>>>>> optimization. There are possibilities to improve case A) above by continuing
>>>>>> the optimization for the 1st Round, since it is still in progress, but I think
>>>>>> adding that complexity for this rare case wouldn't be worthwhile (at least
>>>>>> for now). What do you think?
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Wei
>>>>> In my opinion this just makes the patch very messy.
>>>>>
>>>>> E.g. attempts to attach a debugger to the guest will call vmstop and
>>>>> then behaviour changes. This is a recipe for heisenbugs which are then
>>>>> extremely painful to debug.
>>>>>
>>>>> It is not really hard to make things symmetrical:
>>>>> e.g. if you stop on vmstop then you should start on vmstart, etc.
>>>>> And stopping the thread should not involve a bunch of state
>>>>> changes, just stop it and that's it.
>>>>>
>>>> "stop it" - do you mean to
>>>> 1) make the thread exit (i.e. make virtio_balloon_poll_free_page_hints exit
>>>> the while loop and return NULL); or
>>>> 2) keep the thread in the while loop but have it yield (e.g.
>>>> sleep(1) or block on a mutex)? (or please let me know if you meant a
>>>> different way of stopping the thread)
>>> I would say it makes more sense to make it block on something.
>>>
>>> BTW I still think you are engaging in premature optimization here.
>>> What you are doing here is a "data plane for balloon".
>>> I would make the feature work first by processing this in a BH.
>>> Creating threads immediately opens up questions of isolation,
>>> cgroups etc.
>>>
>> Could you please share more about how creating threads affects isolation and
>> cgroups?
> When threads are coming and going, they are hard to control.
>
> Consider the rich API libvirt exposes for controlling the io threads:
>
> https://libvirt.org/formatdomain.html#elementsIOThreadsAllocation
>
>
>> And how does a BH solve it? Thanks.
>> Best,
>> Wei
> It's late at night so I don't remember whether it's the emulator
> or the io thread that runs the BH, but the point is it's
> already controlled by libvirt.
>

Thanks all for reviewing the patches. I've changed the new version to use an
iothread. Please check whether that is good enough.
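
For reference, the symmetric per-round pairing described above maps to roughly this on the migration side (a sketch; the exact call sites in the series may differ):

    /* Sketch: each round brackets the optimization, so start/stop stay
     * symmetric across rounds. rs->free_page_support and the
     * balloon_free_page_start/stop helpers are from this series; their
     * placement here is illustrative. */
    static void migration_bitmap_sync(RAMState *rs)
    {
        if (rs->free_page_support) {
            balloon_free_page_stop();   /* round ends before bitmap sync */
        }
        /* ... existing dirty bitmap sync ... */
        if (rs->free_page_support) {
            balloon_free_page_start();  /* next round begins */
        }
    }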

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-03-30  9:49 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-16 10:48 [Qemu-devel] [PATCH v5 0/5] virtio-balloon: free page hint reporting support Wei Wang
2018-03-16 10:48 ` [virtio-dev] " Wei Wang
2018-03-16 10:48 ` [Qemu-devel] [PATCH v5 1/5] bitmap: bitmap_count_one_with_offset Wei Wang
2018-03-16 10:48   ` [virtio-dev] " Wei Wang
2018-03-16 10:48 ` [Qemu-devel] [PATCH v5 2/5] migration: use bitmap_mutex in migration_bitmap_clear_dirty Wei Wang
2018-03-16 10:48   ` [virtio-dev] " Wei Wang
2018-03-16 10:48 ` [Qemu-devel] [PATCH v5 3/5] migration: API to clear bits of guest free pages from the dirty bitmap Wei Wang
2018-03-16 10:48   ` [virtio-dev] " Wei Wang
2018-03-16 10:48 ` [Qemu-devel] [PATCH v5 4/5] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT Wei Wang
2018-03-16 10:48   ` [virtio-dev] " Wei Wang
2018-03-16 15:16   ` [Qemu-devel] " Michael S. Tsirkin
2018-03-16 15:16     ` [virtio-dev] " Michael S. Tsirkin
2018-03-18 10:36     ` [Qemu-devel] " Wei Wang
2018-03-18 10:36       ` [virtio-dev] " Wei Wang
2018-03-19  4:24       ` [Qemu-devel] " Michael S. Tsirkin
2018-03-19  4:24         ` [virtio-dev] " Michael S. Tsirkin
2018-03-19  9:01         ` [Qemu-devel] " Wei Wang
2018-03-19  9:01           ` Wei Wang
2018-03-19 22:55           ` [Qemu-devel] " Michael S. Tsirkin
2018-03-19 22:55             ` Michael S. Tsirkin
2018-03-20  2:16             ` [Qemu-devel] " Wei Wang
2018-03-20  2:16               ` Wei Wang
2018-03-20  2:59               ` [Qemu-devel] " Michael S. Tsirkin
2018-03-20  2:59                 ` Michael S. Tsirkin
2018-03-20  3:18                 ` [Qemu-devel] " Wei Wang
2018-03-20  3:18                   ` Wei Wang
2018-03-20  3:24                   ` [Qemu-devel] " Michael S. Tsirkin
2018-03-20  3:24                     ` Michael S. Tsirkin
2018-03-22  3:13                     ` [Qemu-devel] " Wei Wang
2018-03-22  3:13                       ` Wei Wang
2018-03-26 10:58                       ` [Qemu-devel] " Wei Wang
2018-03-26 10:58                         ` Wei Wang
2018-03-26 11:09                     ` [Qemu-devel] " Daniel P. Berrangé
2018-03-26 14:54                       ` Wang, Wei W
2018-03-26 14:54                         ` [virtio-dev] " Wang, Wei W
2018-03-26 15:04                         ` Daniel P. Berrangé
2018-03-26 15:35                           ` Wang, Wei W
2018-03-26 15:35                             ` [virtio-dev] " Wang, Wei W
2018-03-30  9:52                     ` Wei Wang
2018-03-30  9:52                       ` Wei Wang
2018-03-16 10:48 ` [Qemu-devel] [PATCH v5 5/5] migration: use the free page hint feature from balloon Wei Wang
2018-03-16 10:48   ` [virtio-dev] " Wei Wang
