* [Qemu-devel] [PATCH v4 0/4] virtio-balloon: free page hint reporting support
From: Wei Wang @ 2018-03-07 12:34 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

This is the device part implementation of a new feature,
VIRTIO_BALLOON_F_FREE_PAGE_HINT, for the virtio-balloon device. The device
receives guest free page hints from the driver and clears the
corresponding bits in the dirty bitmap, so that those free pages are
not transferred by the migration thread to the destination.

- Test Environment
    Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    Guest: 8G RAM, 4 vCPU
    Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 seconds

- Test Results
    - Idle Guest Live Migration Time (results averaged over 10 runs):
        - Optimization vs. Legacy = 251ms vs. 1769ms --> ~86% reduction
    - Guest with Linux Compilation Workload (make bzImage -j4):
        - Live Migration Time (average)
          Optimization vs. Legacy = 1297ms vs. 2634ms --> ~51% reduction
        - Linux Compilation Time
          Optimization vs. Legacy = 4min56s vs. 5min3s
          --> no obvious difference

- Source Code
    - QEMU:  https://github.com/wei-w-wang/qemu-free-page-lm.git
    - Linux: https://github.com/wei-w-wang/linux-free-page-lm.git

ChangeLog:
v3->v4:
    1) bitmap: add a new API to count 1s starting from an offset of a
       bitmap
    2) migration:
        - qemu_guest_free_page_hint: calculate
          ram_state->migration_dirty_pages by counting how many bits of
          free pages are truly cleared. If some of the bits were
          already 0, they shouldn't be deducted from
          ram_state->migration_dirty_pages. This wasn't needed in
          previous versions, since we optimized the bulk stage only,
          where all bits are guaranteed to be set. It's needed now
          because we extended the usage of this optimization to all
          stages except the last stop&copy stage. From the 2nd stage
          onward, some bits of free pages may already be 0.
    3) virtio-balloon:
        - virtio_balloon_free_page_report_status: introduce a new status,
          FREE_PAGE_REPORT_S_EXIT. This status indicates that the
          optimization thread has exited. FREE_PAGE_REPORT_S_STOP means
          the reporting is stopped, but the optimization thread still needs
          to be joined by the migration thread.
v2->v3:
    1) virtio-balloon
        - virtio_balloon_free_page_start: poll the hints using a new
          thread;
        - use cmd id between [0x80000000, UINT_MAX];
        - virtio_balloon_poll_free_page_hints:
            - stop the optimization only when it has started;
            - don't skip free pages when !poison_val;
        - add poison_val to vmsd to migrate;
        - virtio_balloon_get_features: add the F_PAGE_POISON feature when
          host has F_FREE_PAGE_HINT;
        - remove the timer patch which is not needed now.
    2) migration
       - new api, qemu_guest_free_page_hint;
       - rs->free_page_support set only in the precopy case;
       - use the new balloon APIs.
v1->v2: 
    1) virtio-balloon
        - use subsections to save free_page_report_cmd_id;
        - poll the free page vq after sending a cmd id to the driver;
        - change the free page vq size to VIRTQUEUE_MAX_SIZE;
        - virtio_balloon_poll_free_page_hints: handle the corner case
          that the free page block reported from the driver may cross
          the RAMBlock boundary.
    2) migration/ram.c
        - use balloon_free_page_poll to start the optimization

Wei Wang (4):
  bitmap: bitmap_count_one_with_offset
  migration: API to clear bits of guest free pages from the dirty bitmap
  virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  migration: use the free page hint feature from balloon

 balloon.c                                       |  49 +++++--
 hw/virtio/virtio-balloon.c                      | 183 +++++++++++++++++++++---
 include/hw/virtio/virtio-balloon.h              |  15 +-
 include/migration/misc.h                        |   2 +
 include/qemu/bitmap.h                           |  13 ++
 include/standard-headers/linux/virtio_balloon.h |   7 +
 include/sysemu/balloon.h                        |  15 +-
 migration/ram.c                                 |  40 +++++-
 8 files changed, 294 insertions(+), 30 deletions(-)

-- 
1.8.3.1

* [Qemu-devel] [PATCH v4 1/4] bitmap: bitmap_count_one_with_offset
From: Wei Wang @ 2018-03-07 12:34 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

Count the number of 1s in a bitmap starting from an offset.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 include/qemu/bitmap.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
index 509eedd..e3f31f1 100644
--- a/include/qemu/bitmap.h
+++ b/include/qemu/bitmap.h
@@ -228,6 +228,19 @@ static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
     }
 }
 
+static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
+                                                long offset, long nbits)
+{
+    long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
+    long redundant_bits = offset - aligned_offset;
+    long bits_to_count = nbits + redundant_bits;
+    const unsigned long *bitmap_start = bitmap +
+                                        aligned_offset / BITS_PER_LONG;
+
+    return bitmap_count_one(bitmap_start, bits_to_count) -
+           bitmap_count_one(bitmap_start, redundant_bits);
+}
+
 void bitmap_set(unsigned long *map, long i, long len);
 void bitmap_set_atomic(unsigned long *map, long i, long len);
 void bitmap_clear(unsigned long *map, long start, long nr);
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
From: Wei Wang @ 2018-03-07 12:34 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

This patch adds an API to clear bits corresponding to guest free pages
from the dirty bitmap. Split the free page block if it crosses the QEMU
RAMBlock boundary.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 include/migration/misc.h |  2 ++
 migration/ram.c          | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/migration/misc.h b/include/migration/misc.h
index 77fd4f5..fae1acf 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -14,11 +14,13 @@
 #ifndef MIGRATION_MISC_H
 #define MIGRATION_MISC_H
 
+#include "exec/cpu-common.h"
 #include "qemu/notify.h"
 
 /* migration/ram.c */
 
 void ram_mig_init(void);
+void qemu_guest_free_page_hint(void *addr, size_t len);
 
 /* migration/block.c */
 
diff --git a/migration/ram.c b/migration/ram.c
index 5e33e5c..e172798 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
     return 0;
 }
 
+void qemu_guest_free_page_hint(void *addr, size_t len)
+{
+    RAMBlock *block;
+    ram_addr_t offset;
+    size_t used_len, start, npages;
+
+    for (used_len = len; len > 0; len -= used_len) {
+        block = qemu_ram_block_from_host(addr, false, &offset);
+        if (unlikely(offset + len > block->used_length)) {
+            used_len = block->used_length - offset;
+            addr += used_len;
+        }
+
+        start = offset >> TARGET_PAGE_BITS;
+        npages = used_len >> TARGET_PAGE_BITS;
+        ram_state->migration_dirty_pages -=
+                      bitmap_count_one_with_offset(block->bmap, start, npages);
+        bitmap_clear(block->bmap, start, npages);
+    }
+}
+
 /*
  * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
  * long-running RCU critical section.  When rcu-reclaims in the code
-- 
1.8.3.1


* [Qemu-devel] [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
From: Wei Wang @ 2018-03-07 12:34 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

balloon_free_page_start - start guest free page hint reporting.
balloon_free_page_stop - stop guest free page hint reporting.

Note: the balloon device reports pages which were free at the time
of this call. As the reporting happens asynchronously, dirty bit logging
must be enabled before this call is made, and guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Liang Li <liang.z.li@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
---
 balloon.c                                       |  49 +++++--
 hw/virtio/virtio-balloon.c                      | 183 +++++++++++++++++++++---
 include/hw/virtio/virtio-balloon.h              |  15 +-
 include/standard-headers/linux/virtio_balloon.h |   7 +
 include/sysemu/balloon.h                        |  15 +-
 5 files changed, 240 insertions(+), 29 deletions(-)

diff --git a/balloon.c b/balloon.c
index d8dd6fe..b0b0749 100644
--- a/balloon.c
+++ b/balloon.c
@@ -36,6 +36,9 @@
 
 static QEMUBalloonEvent *balloon_event_fn;
 static QEMUBalloonStatus *balloon_stat_fn;
+static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
+static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
+static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
 static void *balloon_opaque;
 static bool balloon_inhibited;
 
@@ -64,19 +67,42 @@ static bool have_balloon(Error **errp)
     return true;
 }
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-                             QEMUBalloonStatus *stat_func, void *opaque)
+bool balloon_free_page_support(void)
 {
-    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
-        /* We're already registered one balloon handler.  How many can
-         * a guest really have?
-         */
-        return -1;
+    return balloon_free_page_support_fn &&
+           balloon_free_page_support_fn(balloon_opaque);
+}
+
+void balloon_free_page_start(void)
+{
+    balloon_free_page_start_fn(balloon_opaque);
+}
+
+void balloon_free_page_stop(void)
+{
+    balloon_free_page_stop_fn(balloon_opaque);
+}
+
+void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                              QEMUBalloonStatus *stat_fn,
+                              QEMUBalloonFreePageSupport *free_page_support_fn,
+                              QEMUBalloonFreePageStart *free_page_start_fn,
+                              QEMUBalloonFreePageStop *free_page_stop_fn,
+                              void *opaque)
+{
+    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
+        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
+        balloon_opaque) {
+        /* We already registered one balloon handler. */
+        return;
     }
-    balloon_event_fn = event_func;
-    balloon_stat_fn = stat_func;
+
+    balloon_event_fn = event_fn;
+    balloon_stat_fn = stat_fn;
+    balloon_free_page_support_fn = free_page_support_fn;
+    balloon_free_page_start_fn = free_page_start_fn;
+    balloon_free_page_stop_fn = free_page_stop_fn;
     balloon_opaque = opaque;
-    return 0;
 }
 
 void qemu_remove_balloon_handler(void *opaque)
@@ -86,6 +112,9 @@ void qemu_remove_balloon_handler(void *opaque)
     }
     balloon_event_fn = NULL;
     balloon_stat_fn = NULL;
+    balloon_free_page_support_fn = NULL;
+    balloon_free_page_start_fn = NULL;
+    balloon_free_page_stop_fn = NULL;
     balloon_opaque = NULL;
 }
 
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 4822449..48ed2ec 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -31,6 +31,7 @@
 
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
+#include "migration/misc.h"
 
 #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
 
@@ -308,6 +309,111 @@ out:
     }
 }
 
+static void *virtio_balloon_poll_free_page_hints(void *opaque)
+{
+    VirtQueueElement *elem;
+    VirtIOBalloon *dev = opaque;
+    VirtQueue *vq = dev->free_page_vq;
+    uint32_t id;
+    size_t size;
+
+    /*
+     * Poll the vq till the status changed to STOP. This happens when
+     * the guest finishes reporting hints or the migration thread actively
+     * stops the reporting.
+     */
+    while (dev->free_page_report_status < FREE_PAGE_REPORT_S_STOP) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            continue;
+        }
+
+        if (elem->out_num) {
+            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
+            virtqueue_push(vq, elem, size);
+            g_free(elem);
+            if (unlikely(size != sizeof(id))) {
+                warn_report("%s: received an incorrect cmd id", __func__);
+                break;
+            }
+            if (id == dev->free_page_report_cmd_id) {
+                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
+            } else if (dev->free_page_report_status ==
+                       FREE_PAGE_REPORT_S_START) {
+                /*
+                 * Stop the optimization only when it has started. This avoids
+                 * obsolete stop sign for the previous command.
+                 */
+                dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+                break;
+            }
+        }
+
+        if (elem->in_num) {
+            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
+                !dev->poison_val) {
+                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
+                                          elem->in_sg[0].iov_len);
+            }
+            virtqueue_push(vq, elem, 0);
+            g_free(elem);
+        }
+    }
+    return NULL;
+}
+
+static bool virtio_balloon_free_page_support(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
+}
+
+static void virtio_balloon_free_page_start(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    if (unlikely(s->free_page_report_cmd_id == UINT_MAX)) {
+        s->free_page_report_cmd_id =
+                       VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
+    } else {
+        s->free_page_report_cmd_id++;
+    }
+
+    s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
+    virtio_notify_config(vdev);
+    qemu_thread_create(&s->free_page_thread, "free_page_optimization_thread",
+                       virtio_balloon_poll_free_page_hints, s,
+                       QEMU_THREAD_JOINABLE);
+}
+
+static void virtio_balloon_free_page_stop(void *opaque)
+{
+    VirtIOBalloon *s = opaque;
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    switch (s->free_page_report_status) {
+    case FREE_PAGE_REPORT_S_REQUESTED:
+    case FREE_PAGE_REPORT_S_START:
+        /*
+         * The guest hasn't done the reporting, so host sends a notification
+         * to the guest to actively stop the reporting before joining the
+         * optimization thread.
+         */
+        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
+        virtio_notify_config(vdev);
+    case FREE_PAGE_REPORT_S_STOP:
+        /* The guest has stopped the reporting. Join the optimization thread */
+        qemu_thread_join(&s->free_page_thread);
+        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
+    case FREE_PAGE_REPORT_S_EXIT:
+        /* The optimization thread has gone. No actions needed so far. */
+        break;
+    }
+}
+
 static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
@@ -315,6 +421,15 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
+    config.poison_val = cpu_to_le32(dev->poison_val);
+
+    if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
+        config.free_page_report_cmd_id =
+                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
+    } else {
+        config.free_page_report_cmd_id =
+                       cpu_to_le32(dev->free_page_report_cmd_id);
+    }
 
     trace_virtio_balloon_get_config(config.num_pages, config.actual);
     memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
@@ -368,6 +483,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
                         ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
                         &error_abort);
     }
+    dev->poison_val = le32_to_cpu(config.poison_val);
     trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -377,6 +493,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+
+    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
+        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
+    }
+
     return f;
 }
 
@@ -413,6 +534,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
     return 0;
 }
 
+static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
+    .name = "virtio-balloon-device/free-page-report",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = virtio_balloon_free_page_support,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
+        VMSTATE_UINT32(poison_val, VirtIOBalloon),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_virtio_balloon_device = {
     .name = "virtio-balloon-device",
     .version_id = 1,
@@ -423,30 +556,30 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
         VMSTATE_UINT32(actual, VirtIOBalloon),
         VMSTATE_END_OF_LIST()
     },
+    .subsections = (const VMStateDescription * []) {
+        &vmstate_virtio_balloon_free_page_report,
+        NULL
+    }
 };
 
 static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
     VirtIOBalloon *s = VIRTIO_BALLOON(dev);
-    int ret;
 
     virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
                 sizeof(struct virtio_balloon_config));
 
-    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
-                                   virtio_balloon_stat, s);
-
-    if (ret < 0) {
-        error_setg(errp, "Only one balloon device is supported");
-        virtio_cleanup(vdev);
-        return;
-    }
-
     s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
-
+    if (virtio_has_feature(s->host_features,
+                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
+        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
+        s->free_page_report_cmd_id =
+                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
+    }
     reset_stats(s);
 }
 
@@ -475,11 +608,27 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
 
-    if (!s->stats_vq_elem && vdev->vm_running &&
-        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
-        /* poll stats queue for the element we have discarded when the VM
-         * was stopped */
-        virtio_balloon_receive_stats(vdev, s->svq);
+    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+        if (!s->stats_vq_elem && vdev->vm_running &&
+            virtqueue_rewind(s->svq, 1)) {
+            /*
+             * Poll stats queue for the element we have discarded when the VM
+             * was stopped.
+             */
+            virtio_balloon_receive_stats(vdev, s->svq);
+        }
+
+        if (virtio_balloon_free_page_support(s)) {
+            qemu_add_balloon_handler(virtio_balloon_to_target,
+                                     virtio_balloon_stat,
+                                     virtio_balloon_free_page_support,
+                                     virtio_balloon_free_page_start,
+                                     virtio_balloon_free_page_stop,
+                                     s);
+        } else {
+            qemu_add_balloon_handler(virtio_balloon_to_target,
+                                     virtio_balloon_stat, NULL, NULL, NULL, s);
+        }
     }
 }
 
@@ -509,6 +658,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
 static Property virtio_balloon_properties[] = {
     DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
+    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 1ea13bd..12fde2f 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -23,6 +23,8 @@
 #define VIRTIO_BALLOON(obj) \
         OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
+
 typedef struct virtio_balloon_stat VirtIOBalloonStat;
 
 typedef struct virtio_balloon_stat_modern {
@@ -31,15 +33,26 @@ typedef struct virtio_balloon_stat_modern {
        uint64_t val;
 } VirtIOBalloonStatModern;
 
+enum virtio_balloon_free_page_report_status {
+    FREE_PAGE_REPORT_S_REQUESTED,
+    FREE_PAGE_REPORT_S_START,
+    FREE_PAGE_REPORT_S_STOP,
+    FREE_PAGE_REPORT_S_EXIT,
+};
+
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq;
+    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;
+    uint32_t free_page_report_cmd_id;
+    uint32_t poison_val;
     uint64_t stats[VIRTIO_BALLOON_S_NR];
     VirtQueueElement *stats_vq_elem;
     size_t stats_vq_offset;
     QEMUTimer *stats_timer;
+    QemuThread free_page_thread;
     int64_t stats_last_update;
     int64_t stats_poll_interval;
     uint32_t host_features;
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 7b0a41b..f89e80f 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -34,15 +34,22 @@
 #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
 #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
+#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
+#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12
 
+#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
 struct virtio_balloon_config {
 	/* Number of pages host wants Guest to give up. */
 	uint32_t num_pages;
 	/* Number of pages we've actually got in balloon. */
 	uint32_t actual;
+	/* Free page report command id, readonly by guest */
+	uint32_t free_page_report_cmd_id;
+	/* Stores PAGE_POISON if page poisoning is in use */
+	uint32_t poison_val;
 };
 
 #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
index af49e19..16a2aae 100644
--- a/include/sysemu/balloon.h
+++ b/include/sysemu/balloon.h
@@ -18,11 +18,22 @@
 
 typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
 typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
+typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
+typedef void (QEMUBalloonFreePageStart)(void *opaque);
+typedef void (QEMUBalloonFreePageStop)(void *opaque);
 
-int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
-			     QEMUBalloonStatus *stat_func, void *opaque);
 void qemu_remove_balloon_handler(void *opaque);
 bool qemu_balloon_is_inhibited(void);
 void qemu_balloon_inhibit(bool state);
+bool balloon_free_page_support(void);
+void balloon_free_page_start(void);
+void balloon_free_page_stop(void);
+
+void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
+                              QEMUBalloonStatus *stat_fn,
+                              QEMUBalloonFreePageSupport *free_page_support_fn,
+                              QEMUBalloonFreePageStart *free_page_start_fn,
+                              QEMUBalloonFreePageStop *free_page_stop_fn,
+                              void *opaque);
 
 #endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [PATCH v4 4/4] migration: use the free page hint feature from balloon
  2018-03-07 12:34 ` [virtio-dev] " Wei Wang
@ 2018-03-07 12:34   ` Wei Wang
  -1 siblings, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-07 12:34 UTC (permalink / raw)
  To: qemu-devel, virtio-dev, mst, quintela, dgilbert
  Cc: pbonzini, wei.w.wang, liliang.opensource, yang.zhang.wz,
	quan.xu0, nilal, riel

Start the free page optimization after the migration bitmap is
synchronized. This can't be used in the stop&copy phase since the guest
is paused. Make sure the guest reporting has stopped before
synchronizing the migration dirty bitmap. Currently, the optimization is
added to precopy only.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
---
 migration/ram.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index e172798..7b4c9b1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -51,6 +51,8 @@
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
 #include "migration/block.h"
+#include "sysemu/balloon.h"
+#include "sysemu/sysemu.h"
 
 /***********************************************************/
 /* ram save/restore */
@@ -208,6 +210,8 @@ struct RAMState {
     uint32_t last_version;
     /* We are in the first round */
     bool ram_bulk_stage;
+    /* The free pages optimization feature is supported */
+    bool free_page_support;
     /* How many times we have dirty too many pages */
     int dirty_rate_high_cnt;
     /* these variables are used for bitmap sync */
@@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     unsigned long *bitmap = rb->bmap;
     unsigned long next;
 
-    if (rs->ram_bulk_stage && start > 0) {
+    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
         next = start + 1;
     } else {
         next = find_next_bit(bitmap, size, start);
@@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
     int64_t end_time;
     uint64_t bytes_xfer_now;
 
+    if (rs->free_page_support) {
+        balloon_free_page_stop();
+    }
+
     ram_counters.dirty_sync_count++;
 
     if (!rs->time_last_bitmap_sync) {
@@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
     if (migrate_use_events()) {
         qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
     }
+
+    if (rs->free_page_support && runstate_is_running()) {
+        balloon_free_page_start();
+    }
 }
 
 /**
@@ -1656,6 +1668,8 @@ static void ram_state_reset(RAMState *rs)
     rs->last_page = 0;
     rs->last_version = ram_list.version;
     rs->ram_bulk_stage = true;
+    rs->free_page_support = balloon_free_page_support() &
+                            !migration_in_postcopy();
 }
 
 #define MAX_WAIT 50 /* ms, half buffered_file limit */
@@ -2330,6 +2344,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
 
     ret = qemu_file_get_error(f);
     if (ret < 0) {
+        if (rs->free_page_support) {
+            balloon_free_page_stop();
+        }
         return ret;
     }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] migration: use the free page hint feature from balloon
  2018-03-07 12:34   ` [virtio-dev] " Wei Wang
@ 2018-03-13 16:35     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-13 16:35 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 07, 2018 at 08:34:25PM +0800, Wei Wang wrote:
> Start the free page optimization after the migration bitmap is
> synchronized. This can't be used in the stop&copy phase since the guest
> is paused. Make sure the guest reporting has stopped before
> synchronizing the migration dirty bitmap. Currently, the optimization is
> added to precopy only.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> ---
>  migration/ram.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index e172798..7b4c9b1 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -51,6 +51,8 @@
>  #include "qemu/rcu_queue.h"
>  #include "migration/colo.h"
>  #include "migration/block.h"
> +#include "sysemu/balloon.h"
> +#include "sysemu/sysemu.h"
>  
>  /***********************************************************/
>  /* ram save/restore */
> @@ -208,6 +210,8 @@ struct RAMState {
>      uint32_t last_version;
>      /* We are in the first round */
>      bool ram_bulk_stage;
> +    /* The free pages optimization feature is supported */
> +    bool free_page_support;
>      /* How many times we have dirty too many pages */
>      int dirty_rate_high_cnt;
>      /* these variables are used for bitmap sync */
> @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>      unsigned long *bitmap = rb->bmap;
>      unsigned long next;
>  
> -    if (rs->ram_bulk_stage && start > 0) {
> +    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
>          next = start + 1;
>      } else {
>          next = find_next_bit(bitmap, size, start);
> @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
>      int64_t end_time;
>      uint64_t bytes_xfer_now;
>  
> +    if (rs->free_page_support) {
> +        balloon_free_page_stop();
> +    }
> +
>      ram_counters.dirty_sync_count++;
>  
>      if (!rs->time_last_bitmap_sync) {
> @@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
>      if (migrate_use_events()) {
>          qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
>      }
> +
> +    if (rs->free_page_support && runstate_is_running()) {
> +        balloon_free_page_start();
> +    }
>  }

I think some of these conditions should go into
balloon_free_page_start/stop.

Checking runstate is generally problematic unless you also handle run
state change notifiers, since the run state can be manipulated from QMP.

>  
>  /**
> @@ -1656,6 +1668,8 @@ static void ram_state_reset(RAMState *rs)
>      rs->last_page = 0;
>      rs->last_version = ram_list.version;
>      rs->ram_bulk_stage = true;
> +    rs->free_page_support = balloon_free_page_support() &
> +                            !migration_in_postcopy();

Probably &&?

>  }
>  
>  #define MAX_WAIT 50 /* ms, half buffered_file limit */
> @@ -2330,6 +2344,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  
>      ret = qemu_file_get_error(f);
>      if (ret < 0) {
> +        if (rs->free_page_support) {
> +            balloon_free_page_stop();
> +        }
>          return ret;
>      }
>  
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [Qemu-devel] [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-07 12:34   ` [virtio-dev] " Wei Wang
@ 2018-03-13 16:49     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-13 16:49 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
> The new feature enables the virtio-balloon device to receive hints of
> guest free pages from the free page vq.
> 
> balloon_free_page_start - start guest free page hint reporting.
> balloon_free_page_stop - stop guest free page hint reporting.
> 
> Note: balloon will report pages which were free at the time
> of this call. As the reporting happens asynchronously, dirty bit logging
> must be enabled before this call is made. Guest reporting must be
> disabled before the migration dirty bitmap is synchronized.

Add this usage documentation in code as well, not just
in the commit log.

Another limitation seems to be that the machine needs
to be running. Ideally, though, you should not teach callers
about this limitation.

> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>

I find it suspicious that neither the unrealize nor the reset
functions have been touched at all.
Are you sure you have thought through scenarios like
hot-unplug, or the guest disabling the device?
  
> ---
>  balloon.c                                       |  49 +++++--
>  hw/virtio/virtio-balloon.c                      | 183 +++++++++++++++++++++---
>  include/hw/virtio/virtio-balloon.h              |  15 +-
>  include/standard-headers/linux/virtio_balloon.h |   7 +
>  include/sysemu/balloon.h                        |  15 +-
>  5 files changed, 240 insertions(+), 29 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index d8dd6fe..b0b0749 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -36,6 +36,9 @@
>  
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
> +static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
> +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
>  
> @@ -64,19 +67,42 @@ static bool have_balloon(Error **errp)
>      return true;
>  }
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque)
> +bool balloon_free_page_support(void)
>  {
> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> -        /* We're already registered one balloon handler.  How many can
> -         * a guest really have?
> -         */
> -        return -1;
> +    return balloon_free_page_support_fn &&
> +           balloon_free_page_support_fn(balloon_opaque);
> +}
> +
> +void balloon_free_page_start(void)
> +{
> +    balloon_free_page_start_fn(balloon_opaque);
> +}
> +
> +void balloon_free_page_stop(void)
> +{
> +    balloon_free_page_stop_fn(balloon_opaque);
> +}
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque)
> +{
> +    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
> +        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
> +        balloon_opaque) {
> +        /* We already registered one balloon handler. */
> +        return;
>      }
> -    balloon_event_fn = event_func;
> -    balloon_stat_fn = stat_func;
> +
> +    balloon_event_fn = event_fn;
> +    balloon_stat_fn = stat_fn;
> +    balloon_free_page_support_fn = free_page_support_fn;
> +    balloon_free_page_start_fn = free_page_start_fn;
> +    balloon_free_page_stop_fn = free_page_stop_fn;
>      balloon_opaque = opaque;
> -    return 0;
>  }
>  
>  void qemu_remove_balloon_handler(void *opaque)
> @@ -86,6 +112,9 @@ void qemu_remove_balloon_handler(void *opaque)
>      }
>      balloon_event_fn = NULL;
>      balloon_stat_fn = NULL;
> +    balloon_free_page_support_fn = NULL;
> +    balloon_free_page_start_fn = NULL;
> +    balloon_free_page_stop_fn = NULL;
>      balloon_opaque = NULL;
>  }
>  
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 4822449..48ed2ec 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -31,6 +31,7 @@
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "migration/misc.h"
>  
>  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
>  
> @@ -308,6 +309,111 @@ out:
>      }
>  }
>  
> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +    VirtQueueElement *elem;
> +    VirtIOBalloon *dev = opaque;
> +    VirtQueue *vq = dev->free_page_vq;
> +    uint32_t id;
> +    size_t size;

What makes it safe to poke at this device from multiple threads?
I think it would be safer to do it from e.g. a BH (bottom half).

> +
> +    /*
> +     * Poll the vq till the status changed to STOP. This happens when
> +     * the guest finishes reporting hints or the migration thread actively
> +     * stops the reporting.
> +     */
> +    while (dev->free_page_report_status < FREE_PAGE_REPORT_S_STOP) {
> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +        if (!elem) {
> +            continue;
> +        }
> +
> +        if (elem->out_num) {
> +            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
> +            virtqueue_push(vq, elem, size);
> +            g_free(elem);
> +            if (unlikely(size != sizeof(id))) {
> +                warn_report("%s: received an incorrect cmd id", __func__);
> +                break;
> +            }
> +            if (id == dev->free_page_report_cmd_id) {
> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> +            } else if (dev->free_page_report_status ==
> +                       FREE_PAGE_REPORT_S_START) {
> +                /*
> +                 * Stop the optimization only when it has started. This avoids
> +                 * obsolete stop sign for the previous command.

obsolete -> a stale

> +                 */
> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +                break;
> +            }
> +        }
> +
> +        if (elem->in_num) {
> +            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
> +                !dev->poison_val) {
> +                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
> +                                          elem->in_sg[0].iov_len);
> +            }
> +            virtqueue_push(vq, elem, 0);
> +            g_free(elem);
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static bool virtio_balloon_free_page_support(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> +}
> +
> +static void virtio_balloon_free_page_start(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    if (unlikely(s->free_page_report_cmd_id == UINT_MAX)) {
> +        s->free_page_report_cmd_id =
> +                       VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> +    } else {
> +        s->free_page_report_cmd_id++;
> +    }
> +
> +    s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
> +    virtio_notify_config(vdev);
> +    qemu_thread_create(&s->free_page_thread, "free_page_optimization_thread",
> +                       virtio_balloon_poll_free_page_hints, s,
> +                       QEMU_THREAD_JOINABLE);
> +}
> +
> +static void virtio_balloon_free_page_stop(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    switch (s->free_page_report_status) {
> +    case FREE_PAGE_REPORT_S_REQUESTED:
> +    case FREE_PAGE_REPORT_S_START:
> +        /*
> +         * The guest hasn't done the reporting, so host sends a notification
> +         * to the guest to actively stop the reporting before joining the
> +         * optimization thread.
> +         */
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        virtio_notify_config(vdev);
> +    case FREE_PAGE_REPORT_S_STOP:
> +        /* The guest has stopped the reporting. Join the optimization thread */
> +        qemu_thread_join(&s->free_page_thread);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +    case FREE_PAGE_REPORT_S_EXIT:
> +        /* The optimization thread has gone. No actions needded so far. */
> +        break;
> +    }
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -315,6 +421,15 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    config.poison_val = cpu_to_le32(dev->poison_val);
> +
> +    if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> +    } else {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(dev->free_page_report_cmd_id);
> +    }
>  
>      trace_virtio_balloon_get_config(config.num_pages, config.actual);
>      memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
> @@ -368,6 +483,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
>                          &error_abort);
>      }
> +    dev->poison_val = le32_to_cpu(config.poison_val);
>      trace_virtio_balloon_set_config(dev->actual, oldactual);
>  }
>  
> @@ -377,6 +493,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +
> +    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> +    }
> +
>      return f;
>  }
>  
> @@ -413,6 +534,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
>      return 0;
>  }
>  
> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> +    .name = "virtio-balloon-device/free-page-report",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = virtio_balloon_free_page_support,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_virtio_balloon_device = {
>      .name = "virtio-balloon-device",
>      .version_id = 1,
> @@ -423,30 +556,30 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>          VMSTATE_UINT32(actual, VirtIOBalloon),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription * []) {
> +        &vmstate_virtio_balloon_free_page_report,
> +        NULL
> +    }
>  };
>  
>  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> -    int ret;
>  
>      virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
>                  sizeof(struct virtio_balloon_config));
>  
> -    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> -
> -    if (ret < 0) {
> -        error_setg(errp, "Only one balloon device is supported");
> -        virtio_cleanup(vdev);
> -        return;
> -    }
> -
>      s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> -
> +    if (virtio_has_feature(s->host_features,
> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +        s->free_page_report_cmd_id =
> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
> +    }
>      reset_stats(s);
>  }
>
> @@ -475,11 +608,27 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> -    if (!s->stats_vq_elem && vdev->vm_running &&
> -        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
> -        /* poll stats queue for the element we have discarded when the VM
> -         * was stopped */
> -        virtio_balloon_receive_stats(vdev, s->svq);
> +    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +        if (!s->stats_vq_elem && vdev->vm_running &&
> +            virtqueue_rewind(s->svq, 1)) {
> +            /*
> +             * Poll stats queue for the element we have discarded when the VM
> +             * was stopped.
> +             */
> +            virtio_balloon_receive_stats(vdev, s->svq);
> +        }
> +
> +        if (virtio_balloon_free_page_support(s)) {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat,
> +                                     virtio_balloon_free_page_support,
> +                                     virtio_balloon_free_page_start,
> +                                     virtio_balloon_free_page_stop,
> +                                     s);
> +        } else {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat, NULL, NULL, NULL, s);
> +        }
>      }
>  }
>  
> @@ -509,6 +658,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
>  static Property virtio_balloon_properties[] = {
>      DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> +    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 1ea13bd..12fde2f 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -23,6 +23,8 @@
>  #define VIRTIO_BALLOON(obj) \
>          OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
> +
>  typedef struct virtio_balloon_stat VirtIOBalloonStat;
>  
>  typedef struct virtio_balloon_stat_modern {
> @@ -31,15 +33,26 @@ typedef struct virtio_balloon_stat_modern {
>         uint64_t val;
>  } VirtIOBalloonStatModern;
>  
> +enum virtio_balloon_free_page_report_status {
> +    FREE_PAGE_REPORT_S_REQUESTED,
> +    FREE_PAGE_REPORT_S_START,
> +    FREE_PAGE_REPORT_S_STOP,
> +    FREE_PAGE_REPORT_S_EXIT,
> +};
> +
>  typedef struct VirtIOBalloon {
>      VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq;
> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
> +    uint32_t free_page_report_status;
>      uint32_t num_pages;
>      uint32_t actual;
> +    uint32_t free_page_report_cmd_id;
> +    uint32_t poison_val;
>      uint64_t stats[VIRTIO_BALLOON_S_NR];
>      VirtQueueElement *stats_vq_elem;
>      size_t stats_vq_offset;
>      QEMUTimer *stats_timer;
> +    QemuThread free_page_thread;
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 7b0a41b..f89e80f 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -34,15 +34,22 @@
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> +#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
> +#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
>  struct virtio_balloon_config {
>  	/* Number of pages host wants Guest to give up. */
>  	uint32_t num_pages;
>  	/* Number of pages we've actually got in balloon. */
>  	uint32_t actual;
> +	/* Free page report command id, readonly by guest */
> +	uint32_t free_page_report_cmd_id;
> +	/* Stores PAGE_POISON if page poisoning is in use */
> +	uint32_t poison_val;
>  };
>  
>  #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index af49e19..16a2aae 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -18,11 +18,22 @@
>  
>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
> +typedef void (QEMUBalloonFreePageStart)(void *opaque);
> +typedef void (QEMUBalloonFreePageStop)(void *opaque);
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -			     QEMUBalloonStatus *stat_func, void *opaque);
>  void qemu_remove_balloon_handler(void *opaque);
>  bool qemu_balloon_is_inhibited(void);
>  void qemu_balloon_inhibit(bool state);
> +bool balloon_free_page_support(void);
> +void balloon_free_page_start(void);
> +void balloon_free_page_stop(void);
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque);
>  
>  #endif
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
@ 2018-03-13 16:49     ` Michael S. Tsirkin
  0 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-13 16:49 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
> The new feature enables the virtio-balloon device to receive hints of
> guest free pages from the free page vq.
> 
> balloon_free_page_start - start guest free page hint reporting.
> balloon_free_page_stop - stop guest free page hint reporting.
> 
> Note: balloon will report pages which were free at the time
> of this call. As the reporting happens asynchronously, dirty bit logging
> must be enabled before this call is made. Guest reporting must be
> disabled before the migration dirty bitmap is synchronized.

Add this usage documentation in code as well, not just
in the commit log.

Another limitation seems to be that machine needs
to be running. I think ideally you should not teach callers
about this limitation though.

> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>

I find it suspicious that neither unrealize nor reset
functions have been touched at all.
Are you sure you have thought through scenarious like
hot-unplug or disabling the device by guest?
  
> ---
>  balloon.c                                       |  49 +++++--
>  hw/virtio/virtio-balloon.c                      | 183 +++++++++++++++++++++---
>  include/hw/virtio/virtio-balloon.h              |  15 +-
>  include/standard-headers/linux/virtio_balloon.h |   7 +
>  include/sysemu/balloon.h                        |  15 +-
>  5 files changed, 240 insertions(+), 29 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index d8dd6fe..b0b0749 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -36,6 +36,9 @@
>  
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
> +static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
> +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
>  
> @@ -64,19 +67,42 @@ static bool have_balloon(Error **errp)
>      return true;
>  }
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -                             QEMUBalloonStatus *stat_func, void *opaque)
> +bool balloon_free_page_support(void)
>  {
> -    if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> -        /* We're already registered one balloon handler.  How many can
> -         * a guest really have?
> -         */
> -        return -1;
> +    return balloon_free_page_support_fn &&
> +           balloon_free_page_support_fn(balloon_opaque);
> +}
> +
> +void balloon_free_page_start(void)
> +{
> +    balloon_free_page_start_fn(balloon_opaque);
> +}
> +
> +void balloon_free_page_stop(void)
> +{
> +    balloon_free_page_stop_fn(balloon_opaque);
> +}
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque)
> +{
> +    if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn ||
> +        balloon_free_page_start_fn || balloon_free_page_stop_fn ||
> +        balloon_opaque) {
> +        /* We already registered one balloon handler. */
> +        return;
>      }
> -    balloon_event_fn = event_func;
> -    balloon_stat_fn = stat_func;
> +
> +    balloon_event_fn = event_fn;
> +    balloon_stat_fn = stat_fn;
> +    balloon_free_page_support_fn = free_page_support_fn;
> +    balloon_free_page_start_fn = free_page_start_fn;
> +    balloon_free_page_stop_fn = free_page_stop_fn;
>      balloon_opaque = opaque;
> -    return 0;
>  }
>  
>  void qemu_remove_balloon_handler(void *opaque)
> @@ -86,6 +112,9 @@ void qemu_remove_balloon_handler(void *opaque)
>      }
>      balloon_event_fn = NULL;
>      balloon_stat_fn = NULL;
> +    balloon_free_page_support_fn = NULL;
> +    balloon_free_page_start_fn = NULL;
> +    balloon_free_page_stop_fn = NULL;
>      balloon_opaque = NULL;
>  }
>  
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 4822449..48ed2ec 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -31,6 +31,7 @@
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "migration/misc.h"
>  
>  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
>  
> @@ -308,6 +309,111 @@ out:
>      }
>  }
>  
> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +    VirtQueueElement *elem;
> +    VirtIOBalloon *dev = opaque;
> +    VirtQueue *vq = dev->free_page_vq;
> +    uint32_t id;
> +    size_t size;

What makes it safe to poke at this device from multiple threads?
I think that it would be safer to do it from e.g. BH.

> +
> +    /*
> +     * Poll the vq till the status changed to STOP. This happens when
> +     * the guest finishes reporting hints or the migration thread actively
> +     * stops the reporting.
> +     */
> +    while (dev->free_page_report_status < FREE_PAGE_REPORT_S_STOP) {
> +        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
> +        if (!elem) {
> +            continue;
> +        }
> +
> +        if (elem->out_num) {
> +            size = iov_to_buf(elem->out_sg, elem->out_num, 0, &id, sizeof(id));
> +            virtqueue_push(vq, elem, size);
> +            g_free(elem);
> +            if (unlikely(size != sizeof(id))) {
> +                warn_report("%s: received an incorrect cmd id", __func__);
> +                break;
> +            }
> +            if (id == dev->free_page_report_cmd_id) {
> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> +            } else if (dev->free_page_report_status ==
> +                       FREE_PAGE_REPORT_S_START) {
> +                /*
> +                 * Stop the optimization only when it has started. This avoids
> +                 * obsolete stop sign for the previous command.

obsolete -> a stale

> +                 */
> +                dev->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +                break;
> +            }
> +        }
> +
> +        if (elem->in_num) {
> +            if (dev->free_page_report_status == FREE_PAGE_REPORT_S_START &&
> +                !dev->poison_val) {
> +                qemu_guest_free_page_hint(elem->in_sg[0].iov_base,
> +                                          elem->in_sg[0].iov_len);
> +            }
> +            virtqueue_push(vq, elem, 0);
> +            g_free(elem);
> +        }
> +    }
> +    return NULL;
> +}
> +
> +static bool virtio_balloon_free_page_support(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT);
> +}
> +
> +static void virtio_balloon_free_page_start(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    if (unlikely(s->free_page_report_cmd_id == UINT_MAX)) {
> +        s->free_page_report_cmd_id =
> +                       VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> +    } else {
> +        s->free_page_report_cmd_id++;
> +    }
> +
> +    s->free_page_report_status = FREE_PAGE_REPORT_S_REQUESTED;
> +    virtio_notify_config(vdev);
> +    qemu_thread_create(&s->free_page_thread, "free_page_optimization_thread",
> +                       virtio_balloon_poll_free_page_hints, s,
> +                       QEMU_THREAD_JOINABLE);
> +}
> +
> +static void virtio_balloon_free_page_stop(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    switch (s->free_page_report_status) {
> +    case FREE_PAGE_REPORT_S_REQUESTED:
> +    case FREE_PAGE_REPORT_S_START:
> +        /*
> +         * The guest hasn't done the reporting, so host sends a notification
> +         * to the guest to actively stop the reporting before joining the
> +         * optimization thread.
> +         */
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        virtio_notify_config(vdev);
> +    case FREE_PAGE_REPORT_S_STOP:
> +        /* The guest has stopped the reporting. Join the optimization thread */
> +        qemu_thread_join(&s->free_page_thread);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +    case FREE_PAGE_REPORT_S_EXIT:
> +        /* The optimization thread has gone. No action needed so far. */
> +        break;
> +    }
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -315,6 +421,15 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    config.poison_val = cpu_to_le32(dev->poison_val);
> +
> +    if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> +    } else {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(dev->free_page_report_cmd_id);
> +    }
>  
>      trace_virtio_balloon_get_config(config.num_pages, config.actual);
>      memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
> @@ -368,6 +483,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
>                          &error_abort);
>      }
> +    dev->poison_val = le32_to_cpu(config.poison_val);
>      trace_virtio_balloon_set_config(dev->actual, oldactual);
>  }
>  
> @@ -377,6 +493,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +
> +    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> +    }
> +
>      return f;
>  }
>  
> @@ -413,6 +534,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
>      return 0;
>  }
>  
> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> +    .name = "virtio-balloon-device/free-page-report",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = virtio_balloon_free_page_support,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_virtio_balloon_device = {
>      .name = "virtio-balloon-device",
>      .version_id = 1,
> @@ -423,30 +556,30 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>          VMSTATE_UINT32(actual, VirtIOBalloon),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription * []) {
> +        &vmstate_virtio_balloon_free_page_report,
> +        NULL
> +    }
>  };
>  
>  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> -    int ret;
>  
>      virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
>                  sizeof(struct virtio_balloon_config));
>  
> -    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> -
> -    if (ret < 0) {
> -        error_setg(errp, "Only one balloon device is supported");
> -        virtio_cleanup(vdev);
> -        return;
> -    }
> -
>      s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> -
> +    if (virtio_has_feature(s->host_features,
> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +        s->free_page_report_cmd_id =
> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
> +    }
>      reset_stats(s);
>  }
>
> @@ -475,11 +608,27 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> -    if (!s->stats_vq_elem && vdev->vm_running &&
> -        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
> -        /* poll stats queue for the element we have discarded when the VM
> -         * was stopped */
> -        virtio_balloon_receive_stats(vdev, s->svq);
> +    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +        if (!s->stats_vq_elem && vdev->vm_running &&
> +            virtqueue_rewind(s->svq, 1)) {
> +            /*
> +             * Poll stats queue for the element we have discarded when the VM
> +             * was stopped.
> +             */
> +            virtio_balloon_receive_stats(vdev, s->svq);
> +        }
> +
> +        if (virtio_balloon_free_page_support(s)) {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat,
> +                                     virtio_balloon_free_page_support,
> +                                     virtio_balloon_free_page_start,
> +                                     virtio_balloon_free_page_stop,
> +                                     s);
> +        } else {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat, NULL, NULL, NULL, s);
> +        }
>      }
>  }
>  
> @@ -509,6 +658,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
>  static Property virtio_balloon_properties[] = {
>      DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> +    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 1ea13bd..12fde2f 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -23,6 +23,8 @@
>  #define VIRTIO_BALLOON(obj) \
>          OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
> +
>  typedef struct virtio_balloon_stat VirtIOBalloonStat;
>  
>  typedef struct virtio_balloon_stat_modern {
> @@ -31,15 +33,26 @@ typedef struct virtio_balloon_stat_modern {
>         uint64_t val;
>  } VirtIOBalloonStatModern;
>  
> +enum virtio_balloon_free_page_report_status {
> +    FREE_PAGE_REPORT_S_REQUESTED,
> +    FREE_PAGE_REPORT_S_START,
> +    FREE_PAGE_REPORT_S_STOP,
> +    FREE_PAGE_REPORT_S_EXIT,
> +};
> +
>  typedef struct VirtIOBalloon {
>      VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq;
> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
> +    uint32_t free_page_report_status;
>      uint32_t num_pages;
>      uint32_t actual;
> +    uint32_t free_page_report_cmd_id;
> +    uint32_t poison_val;
>      uint64_t stats[VIRTIO_BALLOON_S_NR];
>      VirtQueueElement *stats_vq_elem;
>      size_t stats_vq_offset;
>      QEMUTimer *stats_timer;
> +    QemuThread free_page_thread;
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 7b0a41b..f89e80f 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -34,15 +34,22 @@
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> +#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
> +#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
>  struct virtio_balloon_config {
>  	/* Number of pages host wants Guest to give up. */
>  	uint32_t num_pages;
>  	/* Number of pages we've actually got in balloon. */
>  	uint32_t actual;
> +	/* Free page report command id, readonly by guest */
> +	uint32_t free_page_report_cmd_id;
> +	/* Stores PAGE_POISON if page poisoning is in use */
> +	uint32_t poison_val;
>  };
>  
>  #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index af49e19..16a2aae 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -18,11 +18,22 @@
>  
>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
> +typedef void (QEMUBalloonFreePageStart)(void *opaque);
> +typedef void (QEMUBalloonFreePageStop)(void *opaque);
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -			     QEMUBalloonStatus *stat_func, void *opaque);
>  void qemu_remove_balloon_handler(void *opaque);
>  bool qemu_balloon_is_inhibited(void);
>  void qemu_balloon_inhibit(bool state);
> +bool balloon_free_page_support(void);
> +void balloon_free_page_start(void);
> +void balloon_free_page_stop(void);
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque);
>  
>  #endif
> -- 
> 1.8.3.1

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] migration: use the free page hint feature from balloon
  2018-03-13 16:35     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-14  2:41       ` Wei Wang
  -1 siblings, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-14  2:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/14/2018 12:35 AM, Michael S. Tsirkin wrote:
> On Wed, Mar 07, 2018 at 08:34:25PM +0800, Wei Wang wrote:
>> Start the free page optimization after the migration bitmap is
>> synchronized. This can't be used in the stop&copy phase since the guest
>> is paused. Make sure the guest reporting has stopped before
>> synchronizing the migration dirty bitmap. Currently, the optimization is
>> added to precopy only.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>   migration/ram.c | 19 ++++++++++++++++++-
>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index e172798..7b4c9b1 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -51,6 +51,8 @@
>>   #include "qemu/rcu_queue.h"
>>   #include "migration/colo.h"
>>   #include "migration/block.h"
>> +#include "sysemu/balloon.h"
>> +#include "sysemu/sysemu.h"
>>   
>>   /***********************************************************/
>>   /* ram save/restore */
>> @@ -208,6 +210,8 @@ struct RAMState {
>>       uint32_t last_version;
>>       /* We are in the first round */
>>       bool ram_bulk_stage;
>> +    /* The free pages optimization feature is supported */
>> +    bool free_page_support;
>>       /* How many times we have dirty too many pages */
>>       int dirty_rate_high_cnt;
>>       /* these variables are used for bitmap sync */
>> @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>>       unsigned long *bitmap = rb->bmap;
>>       unsigned long next;
>>   
>> -    if (rs->ram_bulk_stage && start > 0) {
>> +    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
>>           next = start + 1;
>>       } else {
>>           next = find_next_bit(bitmap, size, start);
>> @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
>>       int64_t end_time;
>>       uint64_t bytes_xfer_now;
>>   
>> +    if (rs->free_page_support) {
>> +        balloon_free_page_stop();
>> +    }
>> +
>>       ram_counters.dirty_sync_count++;
>>   
>>       if (!rs->time_last_bitmap_sync) {
>> @@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
>>       if (migrate_use_events()) {
>>           qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
>>       }
>> +
>> +    if (rs->free_page_support && runstate_is_running()) {
>> +        balloon_free_page_start();
>> +    }
>>   }
> I think some of these conditions should go into
> balloon_free_page_start/stop.
>
> Checking runstate is generally problematic unless you
> also handle run state change notifiers as it can
> be manipulated from QMP.

How about moving the runstate check into
virtio_balloon_poll_free_page_hints:

    while (dev->free_page_report_status < FREE_PAGE_REPORT_S_STOP &&
           runstate_is_running()) {
        ...
    }

In this case, I think we won't need a notifier - if the run state is
changed by QMP, the optimization thread will simply exit.


>>   
>>   /**
>> @@ -1656,6 +1668,8 @@ static void ram_state_reset(RAMState *rs)
>>       rs->last_page = 0;
>>       rs->last_version = ram_list.version;
>>       rs->ram_bulk_stage = true;
>> +    rs->free_page_support = balloon_free_page_support() &
>> +                            !migration_in_postcopy();
> Probably &&?
>

OK, will use &&. (Both work well here actually, since all of the values 
here are boolean)


Best,
Wei

^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-13 16:49     ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-14  2:43       ` Wei Wang
  -1 siblings, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-14  2:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:
> On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> Signed-off-by: Liang Li <liang.z.li@intel.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> CC: Juan Quintela <quintela@redhat.com>
> I find it suspicious that neither unrealize nor reset
> functions have been touched at all.
> Are you sure you have thought through scenarious like
> hot-unplug or disabling the device by guest?
>    

OK. I think we can call balloon_free_page_stop in unrealize and reset.


>   
> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +    VirtQueueElement *elem;
> +    VirtIOBalloon *dev = opaque;
> +    VirtQueue *vq = dev->free_page_vq;
> +    uint32_t id;
> +    size_t size;
> What makes it safe to poke at this device from multiple threads?
> I think that it would be safer to do it from e.g. BH.
>

Actually the free_page_optimization thread is the only user of 
free_page_vq, and there is only one optimization thread at a time. Would 
this be safe enough?

Best,
Wei

^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [Qemu-devel] [PATCH v4 4/4] migration: use the free page hint feature from balloon
  2018-03-14  2:41       ` [virtio-dev] " Wei Wang
@ 2018-03-14  2:51         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-14  2:51 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 14, 2018 at 10:41:36AM +0800, Wei Wang wrote:
> On 03/14/2018 12:35 AM, Michael S. Tsirkin wrote:
> > On Wed, Mar 07, 2018 at 08:34:25PM +0800, Wei Wang wrote:
> > > Start the free page optimization after the migration bitmap is
> > > synchronized. This can't be used in the stop&copy phase since the guest
> > > is paused. Make sure the guest reporting has stopped before
> > > synchronizing the migration dirty bitmap. Currently, the optimization is
> > > added to precopy only.
> > > 
> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > CC: Juan Quintela <quintela@redhat.com>
> > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >   migration/ram.c | 19 ++++++++++++++++++-
> > >   1 file changed, 18 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index e172798..7b4c9b1 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -51,6 +51,8 @@
> > >   #include "qemu/rcu_queue.h"
> > >   #include "migration/colo.h"
> > >   #include "migration/block.h"
> > > +#include "sysemu/balloon.h"
> > > +#include "sysemu/sysemu.h"
> > >   /***********************************************************/
> > >   /* ram save/restore */
> > > @@ -208,6 +210,8 @@ struct RAMState {
> > >       uint32_t last_version;
> > >       /* We are in the first round */
> > >       bool ram_bulk_stage;
> > > +    /* The free pages optimization feature is supported */
> > > +    bool free_page_support;
> > >       /* How many times we have dirty too many pages */
> > >       int dirty_rate_high_cnt;
> > >       /* these variables are used for bitmap sync */
> > > @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
> > >       unsigned long *bitmap = rb->bmap;
> > >       unsigned long next;
> > > -    if (rs->ram_bulk_stage && start > 0) {
> > > +    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
> > >           next = start + 1;
> > >       } else {
> > >           next = find_next_bit(bitmap, size, start);
> > > @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
> > >       int64_t end_time;
> > >       uint64_t bytes_xfer_now;
> > > +    if (rs->free_page_support) {
> > > +        balloon_free_page_stop();
> > > +    }
> > > +
> > >       ram_counters.dirty_sync_count++;
> > >       if (!rs->time_last_bitmap_sync) {
> > > @@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
> > >       if (migrate_use_events()) {
> > >           qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
> > >       }
> > > +
> > > +    if (rs->free_page_support && runstate_is_running()) {
> > > +        balloon_free_page_start();
> > > +    }
> > >   }
> > I think some of these conditions should go into
> > balloon_free_page_start/stop.
> > 
> > Checking runstate is generally problematic unless you
> > also handle run state change notifiers as it can
> > be manipulated from QMP.
> 
> How about moving the check of runstate to
> virtio_balloon_poll_free_page_hints:
> 
> while (dev->free_page_report_status < FREE_PAGE_REPORT_S_STOP &&
> runstate_is_running()) {
> ...
> }

Hard to tell at the outset. E.g. why is just the stop path affected? Please
add comments explaining what happens if the VM is not running when start or
stop is called.


> In this case, I think we won't need a notifier - if the run state is changed
> by qmp, the optimization thread will just exit.

But you need to wake it up and notify the guest presumably?

> 
> > >   /**
> > > @@ -1656,6 +1668,8 @@ static void ram_state_reset(RAMState *rs)
> > >       rs->last_page = 0;
> > >       rs->last_version = ram_list.version;
> > >       rs->ram_bulk_stage = true;
> > > +    rs->free_page_support = balloon_free_page_support() &
> > > +                            !migration_in_postcopy();
> > Probably &&?
> > 
> 
> OK, will use &&. (Both work well here actually, since all of the values here
> are boolean)
> 
> 
> Best,
> Wei

^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-14  2:43       ` Wei Wang
@ 2018-03-14  2:53         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-14  2:53 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 14, 2018 at 10:43:01AM +0800, Wei Wang wrote:
> On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:
> > On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
> > 
> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > CC: Juan Quintela <quintela@redhat.com>
> > I find it suspicious that neither unrealize nor reset
> > functions have been touched at all.
> > Are you sure you have thought through scenarios like
> > hot-unplug or disabling the device by guest?
> 
> OK. I think we can call balloon_free_page_stop in unrealize and reset.
> 
> 
> > +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> > +{
> > +    VirtQueueElement *elem;
> > +    VirtIOBalloon *dev = opaque;
> > +    VirtQueue *vq = dev->free_page_vq;
> > +    uint32_t id;
> > +    size_t size;
> > What makes it safe to poke at this device from multiple threads?
> > I think that it would be safer to do it from e.g. BH.
> > 
> 
> Actually the free_page_optimization thread is the only user of free_page_vq,
> and there is only one optimization thread each time. Would this be safe
> enough?
> 
> Best,
> Wei

Aren't there other fields there? Also things like reset affect all VQs.

-- 
MST

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-14  2:53         ` Michael S. Tsirkin
@ 2018-03-14  6:03           ` Wei Wang
  -1 siblings, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-14  6:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/14/2018 10:53 AM, Michael S. Tsirkin wrote:
> On Wed, Mar 14, 2018 at 10:43:01AM +0800, Wei Wang wrote:
>> On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:
>>> On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
>>>
>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>> Signed-off-by: Liang Li <liang.z.li@intel.com>
>>>> CC: Michael S. Tsirkin <mst@redhat.com>
>>>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> CC: Juan Quintela <quintela@redhat.com>
>>> I find it suspicious that neither unrealize nor reset
>>> functions have been touched at all.
>>> Are you sure you have thought through scenarios like
>>> hot-unplug or disabling the device by guest?
>> OK. I think we can call balloon_free_page_stop in unrealize and reset.
>>
>>
>>> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
>>> +{
>>> +    VirtQueueElement *elem;
>>> +    VirtIOBalloon *dev = opaque;
>>> +    VirtQueue *vq = dev->free_page_vq;
>>> +    uint32_t id;
>>> +    size_t size;
>>> What makes it safe to poke at this device from multiple threads?
>>> I think that it would be safer to do it from e.g. BH.
>>>
>> Actually the free_page_optimization thread is the only user of free_page_vq,
>> and there is only one optimization thread each time. Would this be safe
>> enough?
>>
>> Best,
>> Wei
> Aren't there other fields there? Also things like reset affect all VQs.
>

Yes. But I think BHs are used to avoid re-entrancy, which isn't the 
issue here.

The potential issue I could observe here is that 
"dev->free_page_report_status" is read and written by the optimization 
thread, and it is also modified by the migration thread and reset via 
virtio_balloon_free_page_stop.

How about adding a QEMU SpinLock, like this:

virtio_balloon_poll_free_page_hints()
{
    while (1) {
        qemu_spin_lock();
        /* If the status has been changed to STOP or EXIT, or the VM is
         * stopped, just exit. */
        if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP ||
            !runstate_is_running()) {
            qemu_spin_unlock();
            break;
        }
        ....
        qemu_spin_unlock();
    }
}


Best,
Wei

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] migration: use the free page hint feature from balloon
  2018-03-14  2:51         ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-14  6:50           ` Wei Wang
  -1 siblings, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-14  6:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/14/2018 10:51 AM, Michael S. Tsirkin wrote:
> On Wed, Mar 14, 2018 at 10:41:36AM +0800, Wei Wang wrote:
>> On 03/14/2018 12:35 AM, Michael S. Tsirkin wrote:
>>> On Wed, Mar 07, 2018 at 08:34:25PM +0800, Wei Wang wrote:
>>>> Start the free page optimization after the migration bitmap is
>>>> synchronized. This can't be used in the stop&copy phase since the guest
>>>> is paused. Make sure the guest reporting has stopped before
>>>> synchronizing the migration dirty bitmap. Currently, the optimization is
>>>> added to precopy only.
>>>>
>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> CC: Juan Quintela <quintela@redhat.com>
>>>> CC: Michael S. Tsirkin <mst@redhat.com>
>>>> ---
>>>>    migration/ram.c | 19 ++++++++++++++++++-
>>>>    1 file changed, 18 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index e172798..7b4c9b1 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -51,6 +51,8 @@
>>>>    #include "qemu/rcu_queue.h"
>>>>    #include "migration/colo.h"
>>>>    #include "migration/block.h"
>>>> +#include "sysemu/balloon.h"
>>>> +#include "sysemu/sysemu.h"
>>>>    /***********************************************************/
>>>>    /* ram save/restore */
>>>> @@ -208,6 +210,8 @@ struct RAMState {
>>>>        uint32_t last_version;
>>>>        /* We are in the first round */
>>>>        bool ram_bulk_stage;
>>>> +    /* The free pages optimization feature is supported */
>>>> +    bool free_page_support;
>>>>        /* How many times we have dirty too many pages */
>>>>        int dirty_rate_high_cnt;
>>>>        /* these variables are used for bitmap sync */
>>>> @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>>>>        unsigned long *bitmap = rb->bmap;
>>>>        unsigned long next;
>>>> -    if (rs->ram_bulk_stage && start > 0) {
>>>> +    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
>>>>            next = start + 1;
>>>>        } else {
>>>>            next = find_next_bit(bitmap, size, start);
>>>> @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
>>>>        int64_t end_time;
>>>>        uint64_t bytes_xfer_now;
>>>> +    if (rs->free_page_support) {
>>>> +        balloon_free_page_stop();
>>>> +    }
>>>> +
>>>>        ram_counters.dirty_sync_count++;
>>>>        if (!rs->time_last_bitmap_sync) {
>>>> @@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
>>>>        if (migrate_use_events()) {
>>>>            qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
>>>>        }
>>>> +
>>>> +    if (rs->free_page_support && runstate_is_running()) {
>>>> +        balloon_free_page_start();
>>>> +    }
>>>>    }
>>> I think some of these conditions should go into
>>> balloon_free_page_start/stop.
>>>
>>> Checking runstate is generally problematic unless you
>>> also handle run state change notifiers as it can
>>> be manipulated from QMP.
>> How about moving the check of runstate to
>> virtio_balloon_poll_free_page_hints:
>>
>> while (dev->free_page_report_status < FREE_PAGE_REPORT_S_STOP &&
>> runstate_is_running()) {
>> ...
>> }
> Hard to tell on the outset. E.g. why is just stop affected?  Pls add
> comments explaining what happens if VM is not running when start or stop
> is called.
>
>
>> In this case, I think we won't need a notifier - if the run state is changed
>> by qmp, the optimization thread will just exit.
> But you need to wake it up and notify the guest presumably?
>


I think it's not necessary to wake it up: when the VM is not 
running, no hints are reported to the vq, and the optimization 
thread exits, so there is no issue in that case.
Probably we can add a notifier which calls 
virtio_balloon_free_page_stop() when QMP wakes up the VM.

Best,
Wei

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-14  6:03           ` Wei Wang
@ 2018-03-14 14:12             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-14 14:12 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 14, 2018 at 02:03:19PM +0800, Wei Wang wrote:
> On 03/14/2018 10:53 AM, Michael S. Tsirkin wrote:
> > On Wed, Mar 14, 2018 at 10:43:01AM +0800, Wei Wang wrote:
> > > On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
> > > > 
> > > > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > > > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > > > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > CC: Juan Quintela <quintela@redhat.com>
> > > > I find it suspicious that neither unrealize nor reset
> > > > functions have been touched at all.
> > > > Are you sure you have thought through scenarios like
> > > > hot-unplug or disabling the device by guest?
> > > OK. I think we can call balloon_free_page_stop in unrealize and reset.
> > > 
> > > 
> > > > +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> > > > +{
> > > > +    VirtQueueElement *elem;
> > > > +    VirtIOBalloon *dev = opaque;
> > > > +    VirtQueue *vq = dev->free_page_vq;
> > > > +    uint32_t id;
> > > > +    size_t size;
> > > > What makes it safe to poke at this device from multiple threads?
> > > > I think that it would be safer to do it from e.g. BH.
> > > > 
> > > Actually the free_page_optimization thread is the only user of free_page_vq,
> > > and there is only one optimization thread each time. Would this be safe
> > > enough?
> > > 
> > > Best,
> > > Wei
> > Aren't there other fields there? Also things like reset affect all VQs.
> > 
> 
> Yes. But I think BHs are used to avoid re-entrancy, which isn't the issue
> here.

Since you are adding locks to address the issue - doesn't this imply
reentrancy is exactly the issue?

> The potential issue I could observe here is that
> "dev->free_page_report_status" is read and written by the optimization
> thread, and it is also modified by the migration thread and reset via
> virtio_balloon_free_page_stop.
> 
> How about adding a QEMU SpinLock, like this:
> 
> virtio_balloon_poll_free_page_hints()
> {
> 
>     while (1) {
>         qemu_spin_lock();
>         /* If the status has been changed to STOP or EXIT, or the VM is
> stopped, just exit */
>         if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP ||
> !runstate_is_running()) {
>             qemu_spin_unlock();
>             break;
>         }
>         ....
>         qemu_spin_unlock();
>     }
> }
> 
> 
> Best,
> Wei

That will address the issue but it does look weird.

-- 
MST

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] migration: use the free page hint feature from balloon
  2018-03-14  6:50           ` [virtio-dev] " Wei Wang
@ 2018-03-14 14:45             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-14 14:45 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 14, 2018 at 02:50:44PM +0800, Wei Wang wrote:
> On 03/14/2018 10:51 AM, Michael S. Tsirkin wrote:
> > On Wed, Mar 14, 2018 at 10:41:36AM +0800, Wei Wang wrote:
> > > On 03/14/2018 12:35 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 07, 2018 at 08:34:25PM +0800, Wei Wang wrote:
> > > > > Start the free page optimization after the migration bitmap is
> > > > > synchronized. This can't be used in the stop&copy phase since the guest
> > > > > is paused. Make sure the guest reporting has stopped before
> > > > > synchronizing the migration dirty bitmap. Currently, the optimization is
> > > > > added to precopy only.
> > > > > 
> > > > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > CC: Juan Quintela <quintela@redhat.com>
> > > > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > >    migration/ram.c | 19 ++++++++++++++++++-
> > > > >    1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > > index e172798..7b4c9b1 100644
> > > > > --- a/migration/ram.c
> > > > > +++ b/migration/ram.c
> > > > > @@ -51,6 +51,8 @@
> > > > >    #include "qemu/rcu_queue.h"
> > > > >    #include "migration/colo.h"
> > > > >    #include "migration/block.h"
> > > > > +#include "sysemu/balloon.h"
> > > > > +#include "sysemu/sysemu.h"
> > > > >    /***********************************************************/
> > > > >    /* ram save/restore */
> > > > > @@ -208,6 +210,8 @@ struct RAMState {
> > > > >        uint32_t last_version;
> > > > >        /* We are in the first round */
> > > > >        bool ram_bulk_stage;
> > > > > +    /* The free pages optimization feature is supported */
> > > > > +    bool free_page_support;
> > > > >        /* How many times we have dirty too many pages */
> > > > >        int dirty_rate_high_cnt;
> > > > >        /* these variables are used for bitmap sync */
> > > > > @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
> > > > >        unsigned long *bitmap = rb->bmap;
> > > > >        unsigned long next;
> > > > > -    if (rs->ram_bulk_stage && start > 0) {
> > > > > +    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
> > > > >            next = start + 1;
> > > > >        } else {
> > > > >            next = find_next_bit(bitmap, size, start);
> > > > > @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
> > > > >        int64_t end_time;
> > > > >        uint64_t bytes_xfer_now;
> > > > > +    if (rs->free_page_support) {
> > > > > +        balloon_free_page_stop();
> > > > > +    }
> > > > > +
> > > > >        ram_counters.dirty_sync_count++;
> > > > >        if (!rs->time_last_bitmap_sync) {
> > > > > @@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
> > > > >        if (migrate_use_events()) {
> > > > >            qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
> > > > >        }
> > > > > +
> > > > > +    if (rs->free_page_support && runstate_is_running()) {
> > > > > +        balloon_free_page_start();
> > > > > +    }
> > > > >    }
> > > > I think some of these conditions should go into
> > > > balloon_free_page_start/stop.
> > > > 
> > > > Checking runstate is generally problematic unless you
> > > > also handle run state change notifiers as it can
> > > > be manipulated from QMP.
> > > How about moving the check of runstate to
> > > virtio_balloon_poll_free_page_hints:
> > > 
> > > while (dev->free_page_report_status < FREE_PAGE_REPORT_S_STOP &&
> > > runstate_is_running()) {
> > > ...
> > > }
> > Hard to tell on the outset. E.g. why is just stop affected?  Pls add
> > comments explaining what happens if VM is not running when start or stop
> > is called.
> > 
> > 
> > > In this case, I think we won't need a notifier - if the run state is changed
> > > by qmp, the optimization thread will just exit.
> > But you need to wake it up and notify the guest presumably?
> > 
> 
> 
> I think it's not necessary to wake it up, because when the VM is not
> running, there will be no hints reported to the vq, and the optimization
> thread exits. (there is no issue in that case)
> Probably we can add a notifier which calls virtio_balloon_free_page_stop()
> when qmp wakes up the VM.
> 
> Best,
> Wei

The set_status callback is invoked, so maybe you can use that.

It might also be a good idea to handle a bunch of other corner
cases, e.g. if the guest driver is loaded when migration
is already in progress.

-- 
MST

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [virtio-dev] Re: [PATCH v4 4/4] migration: use the free page hint feature from balloon
@ 2018-03-14 14:45             ` Michael S. Tsirkin
  0 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-14 14:45 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 14, 2018 at 02:50:44PM +0800, Wei Wang wrote:
> On 03/14/2018 10:51 AM, Michael S. Tsirkin wrote:
> > On Wed, Mar 14, 2018 at 10:41:36AM +0800, Wei Wang wrote:
> > > On 03/14/2018 12:35 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 07, 2018 at 08:34:25PM +0800, Wei Wang wrote:
> > > > > Start the free page optimization after the migration bitmap is
> > > > > synchronized. This can't be used in the stop&copy phase since the guest
> > > > > is paused. Make sure the guest reporting has stopped before
> > > > > synchronizing the migration dirty bitmap. Currently, the optimization is
> > > > > added to precopy only.
> > > > > 
> > > > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > CC: Juan Quintela <quintela@redhat.com>
> > > > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > >    migration/ram.c | 19 ++++++++++++++++++-
> > > > >    1 file changed, 18 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/migration/ram.c b/migration/ram.c
> > > > > index e172798..7b4c9b1 100644
> > > > > --- a/migration/ram.c
> > > > > +++ b/migration/ram.c
> > > > > @@ -51,6 +51,8 @@
> > > > >    #include "qemu/rcu_queue.h"
> > > > >    #include "migration/colo.h"
> > > > >    #include "migration/block.h"
> > > > > +#include "sysemu/balloon.h"
> > > > > +#include "sysemu/sysemu.h"
> > > > >    /***********************************************************/
> > > > >    /* ram save/restore */
> > > > > @@ -208,6 +210,8 @@ struct RAMState {
> > > > >        uint32_t last_version;
> > > > >        /* We are in the first round */
> > > > >        bool ram_bulk_stage;
> > > > > +    /* The free pages optimization feature is supported */
> > > > > +    bool free_page_support;
> > > > >        /* How many times we have dirty too many pages */
> > > > >        int dirty_rate_high_cnt;
> > > > >        /* these variables are used for bitmap sync */
> > > > > @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
> > > > >        unsigned long *bitmap = rb->bmap;
> > > > >        unsigned long next;
> > > > > -    if (rs->ram_bulk_stage && start > 0) {
> > > > > +    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
> > > > >            next = start + 1;
> > > > >        } else {
> > > > >            next = find_next_bit(bitmap, size, start);
> > > > > @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
> > > > >        int64_t end_time;
> > > > >        uint64_t bytes_xfer_now;
> > > > > +    if (rs->free_page_support) {
> > > > > +        balloon_free_page_stop();
> > > > > +    }
> > > > > +
> > > > >        ram_counters.dirty_sync_count++;
> > > > >        if (!rs->time_last_bitmap_sync) {
> > > > > @@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
> > > > >        if (migrate_use_events()) {
> > > > >            qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
> > > > >        }
> > > > > +
> > > > > +    if (rs->free_page_support && runstate_is_running()) {
> > > > > +        balloon_free_page_start();
> > > > > +    }
> > > > >    }
> > > > I think some of these conditions should go into
> > > > balloon_free_page_start/stop.
> > > > 
> > > > Checking runstate is generally problematic unless you
> > > > also handle run state change notifiers as it can
> > > > be manipulated from QMP.
> > > How about moving the check of runstate to
> > > virtio_balloon_poll_free_page_hints:
> > > 
> > > while (dev->free_page_report_status < FREE_PAGE_REPORT_S_STOP &&
> > > runstate_is_running()) {
> > > ...
> > > }
> > Hard to tell at the outset. E.g. why is just stop affected?  Please add
> > comments explaining what happens if the VM is not running when start or
> > stop is called.
> > 
> > 
> > > In this case, I think we won't need a notifier - if the run state is
> > > changed via QMP, the optimization thread will just exit.
> > But you need to wake it up and notify the guest presumably?
> > 
> 
> 
> I think it's not necessary to wake it up, because when the VM is not
> running, there will be no hints reported to the vq, and the optimization
> thread exits. (there is no issue in that case)
> Probably we can add a notifier which calls virtio_balloon_free_page_stop()
> when qmp wakes up the VM.
> 
> Best,
> Wei

set_status callback is invoked so you can use that maybe.

Might be a good idea to handle a bunch of other corner
cases e.g. if guest driver is loaded when migration
is already in progress.

-- 
MST

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/4] bitmap: bitmap_count_one_with_offset
  2018-03-07 12:34   ` [virtio-dev] " Wei Wang
  (?)
@ 2018-03-14 16:20   ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 56+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-14 16:20 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

* Wei Wang (wei.w.wang@intel.com) wrote:
> Count the number of 1s in a bitmap starting from an offset.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  include/qemu/bitmap.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> index 509eedd..e3f31f1 100644
> --- a/include/qemu/bitmap.h
> +++ b/include/qemu/bitmap.h
> @@ -228,6 +228,19 @@ static inline long bitmap_count_one(const unsigned long *bitmap, long nbits)
>      }
>  }
>  
> +static inline long bitmap_count_one_with_offset(const unsigned long *bitmap,
> +                                                long offset, long nbits)
> +{
> +    long aligned_offset = QEMU_ALIGN_DOWN(offset, BITS_PER_LONG);
> +    long redundant_bits = offset - aligned_offset;
> +    long bits_to_count = nbits + redundant_bits;
> +    const unsigned long *bitmap_start = bitmap +
> +                                        aligned_offset / BITS_PER_LONG;
> +
> +    return bitmap_count_one(bitmap_start, bits_to_count) -
> +           bitmap_count_one(bitmap_start, redundant_bits);
> +}
> +
>  void bitmap_set(unsigned long *map, long i, long len);
>  void bitmap_set_atomic(unsigned long *map, long i, long len);
>  void bitmap_clear(unsigned long *map, long start, long nr);
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-03-07 12:34   ` [virtio-dev] " Wei Wang
  (?)
@ 2018-03-14 18:11   ` Dr. David Alan Gilbert
  2018-03-14 19:16       ` [virtio-dev] " Michael S. Tsirkin
  2018-03-15 10:52       ` [virtio-dev] " Wei Wang
  -1 siblings, 2 replies; 56+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-14 18:11 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

* Wei Wang (wei.w.wang@intel.com) wrote:
> This patch adds an API to clear bits corresponding to guest free pages
> from the dirty bitmap. Spilt the free page block if it crosses the QEMU
> RAMBlock boundary.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> ---
>  include/migration/misc.h |  2 ++
>  migration/ram.c          | 21 +++++++++++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/include/migration/misc.h b/include/migration/misc.h
> index 77fd4f5..fae1acf 100644
> --- a/include/migration/misc.h
> +++ b/include/migration/misc.h
> @@ -14,11 +14,13 @@
>  #ifndef MIGRATION_MISC_H
>  #define MIGRATION_MISC_H
>  
> +#include "exec/cpu-common.h"
>  #include "qemu/notify.h"
>  
>  /* migration/ram.c */
>  
>  void ram_mig_init(void);
> +void qemu_guest_free_page_hint(void *addr, size_t len);
>  
>  /* migration/block.c */
>  
> diff --git a/migration/ram.c b/migration/ram.c
> index 5e33e5c..e172798 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
>      return 0;
>  }
>  

This could do with some comments

> +void qemu_guest_free_page_hint(void *addr, size_t len)
> +{
> +    RAMBlock *block;
> +    ram_addr_t offset;
> +    size_t used_len, start, npages;

From your use I think the addr and len are coming raw from the guest;
so we need to take some care.

> +
> +    for (used_len = len; len > 0; len -= used_len) {

That initialisation of used_len is unusual; I'd rather put that
in the body.

> +        block = qemu_ram_block_from_host(addr, false, &offset);

Check that block != NULL here

> +        if (unlikely(offset + len > block->used_length)) {

I think to make that overflow safe, that should be:
  if (len > (block->used_length - offset)) {

But we'll need another test before it, because qemu_ram_block_from_host
seems to check max_length not used_length, so we need to check
for offset > block->used_length first

> +            used_len = block->used_length - offset;
> +            addr += used_len;
> +        }
> +
> +        start = offset >> TARGET_PAGE_BITS;
> +        npages = used_len >> TARGET_PAGE_BITS;
> +        ram_state->migration_dirty_pages -=
> +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> +        bitmap_clear(block->bmap, start, npages);

If this is happening while the migration is running, this isn't safe -
the migration code could clear a bit at about the same point this
happens, so that the count returned by bitmap_count_one_with_offset
wouldn't match the word that was cleared by bitmap_clear.

The only way I can see to fix it is to run over the range using
bitmap_test_and_clear_atomic, using the return value to decrement
the number of dirty pages.
But you also need to be careful with the update of the
migration_dirty_pages value itself, because that's also being read
by the migration thread.

Dave

> +    }
> +}
> +
>  /*
>   * Each of ram_save_setup, ram_save_iterate and ram_save_complete has
>   * long-running RCU critical section.  When rcu-reclaims in the code
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-07 12:34   ` [virtio-dev] " Wei Wang
  (?)
  (?)
@ 2018-03-14 18:44   ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 56+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-14 18:44 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, mst, quintela, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

* Wei Wang (wei.w.wang@intel.com) wrote:
> The new feature enables the virtio-balloon device to receive hints of
> guest free pages from the free page vq.
> 
> balloon_free_page_start - start guest free page hint reporting.
> balloon_free_page_stop - stop guest free page hint reporting.
> 
> Note: balloon will report pages which were free at the time
> of this call. As the reporting happens asynchronously, dirty bit logging
> must be enabled before this call is made. Guest reporting must be
> disabled before the migration dirty bitmap is synchronized.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Signed-off-by: Liang Li <liang.z.li@intel.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> ---
>  balloon.c                                       |  49 +++++--
>  hw/virtio/virtio-balloon.c                      | 183 +++++++++++++++++++++---
>  include/hw/virtio/virtio-balloon.h              |  15 +-
>  include/standard-headers/linux/virtio_balloon.h |   7 +
>  include/sysemu/balloon.h                        |  15 +-
>  5 files changed, 240 insertions(+), 29 deletions(-)

<snip>

> +    qemu_thread_create(&s->free_page_thread, "free_page_optimization_thread",

Note the maximum size of thread name is normally 14 chars, and also we
don't need to say 'thread' since we know it's a thread; I suggest
shortening it to "free_page_opt" or "balloon_fpo"

Dave

> +                       virtio_balloon_poll_free_page_hints, s,
> +                       QEMU_THREAD_JOINABLE);
> +}
> +
> +static void virtio_balloon_free_page_stop(void *opaque)
> +{
> +    VirtIOBalloon *s = opaque;
> +    VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +
> +    switch (s->free_page_report_status) {
> +    case FREE_PAGE_REPORT_S_REQUESTED:
> +    case FREE_PAGE_REPORT_S_START:
> +        /*
> +         * The guest hasn't done the reporting, so host sends a notification
> +         * to the guest to actively stop the reporting before joining the
> +         * optimization thread.
> +         */
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> +        virtio_notify_config(vdev);
> +    case FREE_PAGE_REPORT_S_STOP:
> +        /* The guest has stopped the reporting. Join the optimization thread */
> +        qemu_thread_join(&s->free_page_thread);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +    case FREE_PAGE_REPORT_S_EXIT:
> +        /* The optimization thread has gone. No actions needed so far. */
> +        break;
> +    }
> +}
> +
>  static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  {
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> @@ -315,6 +421,15 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    config.poison_val = cpu_to_le32(dev->poison_val);
> +
> +    if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP) {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID);
> +    } else {
> +        config.free_page_report_cmd_id =
> +                       cpu_to_le32(dev->free_page_report_cmd_id);
> +    }
>  
>      trace_virtio_balloon_get_config(config.num_pages, config.actual);
>      memcpy(config_data, &config, sizeof(struct virtio_balloon_config));
> @@ -368,6 +483,7 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT),
>                          &error_abort);
>      }
> +    dev->poison_val = le32_to_cpu(config.poison_val);
>      trace_virtio_balloon_set_config(dev->actual, oldactual);
>  }
>  
> @@ -377,6 +493,11 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +
> +    if (dev->host_features & 1ULL << VIRTIO_BALLOON_F_FREE_PAGE_HINT) {
> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> +    }
> +
>      return f;
>  }
>  
> @@ -413,6 +534,18 @@ static int virtio_balloon_post_load_device(void *opaque, int version_id)
>      return 0;
>  }
>  
> +static const VMStateDescription vmstate_virtio_balloon_free_page_report = {
> +    .name = "virtio-balloon-device/free-page-report",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = virtio_balloon_free_page_support,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(free_page_report_cmd_id, VirtIOBalloon),
> +        VMSTATE_UINT32(poison_val, VirtIOBalloon),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_virtio_balloon_device = {
>      .name = "virtio-balloon-device",
>      .version_id = 1,
> @@ -423,30 +556,30 @@ static const VMStateDescription vmstate_virtio_balloon_device = {
>          VMSTATE_UINT32(actual, VirtIOBalloon),
>          VMSTATE_END_OF_LIST()
>      },
> +    .subsections = (const VMStateDescription * []) {
> +        &vmstate_virtio_balloon_free_page_report,
> +        NULL
> +    }
>  };
>  
>  static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
>  {
>      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>      VirtIOBalloon *s = VIRTIO_BALLOON(dev);
> -    int ret;
>  
>      virtio_init(vdev, "virtio-balloon", VIRTIO_ID_BALLOON,
>                  sizeof(struct virtio_balloon_config));
>  
> -    ret = qemu_add_balloon_handler(virtio_balloon_to_target,
> -                                   virtio_balloon_stat, s);
> -
> -    if (ret < 0) {
> -        error_setg(errp, "Only one balloon device is supported");
> -        virtio_cleanup(vdev);
> -        return;
> -    }
> -
>      s->ivq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
>      s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
> -
> +    if (virtio_has_feature(s->host_features,
> +                           VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE, NULL);
> +        s->free_page_report_status = FREE_PAGE_REPORT_S_EXIT;
> +        s->free_page_report_cmd_id =
> +                           VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN - 1;
> +    }
>      reset_stats(s);
>  }
>  
> @@ -475,11 +608,27 @@ static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
>  
> -    if (!s->stats_vq_elem && vdev->vm_running &&
> -        (status & VIRTIO_CONFIG_S_DRIVER_OK) && virtqueue_rewind(s->svq, 1)) {
> -        /* poll stats queue for the element we have discarded when the VM
> -         * was stopped */
> -        virtio_balloon_receive_stats(vdev, s->svq);
> +    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +        if (!s->stats_vq_elem && vdev->vm_running &&
> +            virtqueue_rewind(s->svq, 1)) {
> +            /*
> +             * Poll stats queue for the element we have discarded when the VM
> +             * was stopped.
> +             */
> +            virtio_balloon_receive_stats(vdev, s->svq);
> +        }
> +
> +        if (virtio_balloon_free_page_support(s)) {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat,
> +                                     virtio_balloon_free_page_support,
> +                                     virtio_balloon_free_page_start,
> +                                     virtio_balloon_free_page_stop,
> +                                     s);
> +        } else {
> +            qemu_add_balloon_handler(virtio_balloon_to_target,
> +                                     virtio_balloon_stat, NULL, NULL, NULL, s);
> +        }
>      }
>  }
>  
> @@ -509,6 +658,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
>  static Property virtio_balloon_properties[] = {
>      DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> +    DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> index 1ea13bd..12fde2f 100644
> --- a/include/hw/virtio/virtio-balloon.h
> +++ b/include/hw/virtio/virtio-balloon.h
> @@ -23,6 +23,8 @@
>  #define VIRTIO_BALLOON(obj) \
>          OBJECT_CHECK(VirtIOBalloon, (obj), TYPE_VIRTIO_BALLOON)
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN 0x80000000
> +
>  typedef struct virtio_balloon_stat VirtIOBalloonStat;
>  
>  typedef struct virtio_balloon_stat_modern {
> @@ -31,15 +33,26 @@ typedef struct virtio_balloon_stat_modern {
>         uint64_t val;
>  } VirtIOBalloonStatModern;
>  
> +enum virtio_balloon_free_page_report_status {
> +    FREE_PAGE_REPORT_S_REQUESTED,
> +    FREE_PAGE_REPORT_S_START,
> +    FREE_PAGE_REPORT_S_STOP,
> +    FREE_PAGE_REPORT_S_EXIT,
> +};
> +
>  typedef struct VirtIOBalloon {
>      VirtIODevice parent_obj;
> -    VirtQueue *ivq, *dvq, *svq;
> +    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
> +    uint32_t free_page_report_status;
>      uint32_t num_pages;
>      uint32_t actual;
> +    uint32_t free_page_report_cmd_id;
> +    uint32_t poison_val;
>      uint64_t stats[VIRTIO_BALLOON_S_NR];
>      VirtQueueElement *stats_vq_elem;
>      size_t stats_vq_offset;
>      QEMUTimer *stats_timer;
> +    QemuThread free_page_thread;
>      int64_t stats_last_update;
>      int64_t stats_poll_interval;
>      uint32_t host_features;
> diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
> index 7b0a41b..f89e80f 100644
> --- a/include/standard-headers/linux/virtio_balloon.h
> +++ b/include/standard-headers/linux/virtio_balloon.h
> @@ -34,15 +34,22 @@
>  #define VIRTIO_BALLOON_F_MUST_TELL_HOST	0 /* Tell before reclaiming pages */
>  #define VIRTIO_BALLOON_F_STATS_VQ	1 /* Memory Stats virtqueue */
>  #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
> +#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
> +#define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
>  
>  /* Size of a PFN in the balloon interface. */
>  #define VIRTIO_BALLOON_PFN_SHIFT 12
>  
> +#define VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID 0
>  struct virtio_balloon_config {
>  	/* Number of pages host wants Guest to give up. */
>  	uint32_t num_pages;
>  	/* Number of pages we've actually got in balloon. */
>  	uint32_t actual;
> +	/* Free page report command id, readonly by guest */
> +	uint32_t free_page_report_cmd_id;
> +	/* Stores PAGE_POISON if page poisoning is in use */
> +	uint32_t poison_val;
>  };
>  
>  #define VIRTIO_BALLOON_S_SWAP_IN  0   /* Amount of memory swapped in */
> diff --git a/include/sysemu/balloon.h b/include/sysemu/balloon.h
> index af49e19..16a2aae 100644
> --- a/include/sysemu/balloon.h
> +++ b/include/sysemu/balloon.h
> @@ -18,11 +18,22 @@
>  
>  typedef void (QEMUBalloonEvent)(void *opaque, ram_addr_t target);
>  typedef void (QEMUBalloonStatus)(void *opaque, BalloonInfo *info);
> +typedef bool (QEMUBalloonFreePageSupport)(void *opaque);
> +typedef void (QEMUBalloonFreePageStart)(void *opaque);
> +typedef void (QEMUBalloonFreePageStop)(void *opaque);
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> -			     QEMUBalloonStatus *stat_func, void *opaque);
>  void qemu_remove_balloon_handler(void *opaque);
>  bool qemu_balloon_is_inhibited(void);
>  void qemu_balloon_inhibit(bool state);
> +bool balloon_free_page_support(void);
> +void balloon_free_page_start(void);
> +void balloon_free_page_stop(void);
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +                              QEMUBalloonStatus *stat_fn,
> +                              QEMUBalloonFreePageSupport *free_page_support_fn,
> +                              QEMUBalloonFreePageStart *free_page_start_fn,
> +                              QEMUBalloonFreePageStop *free_page_stop_fn,
> +                              void *opaque);
>  
>  #endif
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-03-14 18:11   ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-14 19:16       ` Michael S. Tsirkin
  2018-03-15 10:52       ` [virtio-dev] " Wei Wang
  1 sibling, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-14 19:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > +            used_len = block->used_length - offset;
> > +            addr += used_len;
> > +        }
> > +
> > +        start = offset >> TARGET_PAGE_BITS;
> > +        npages = used_len >> TARGET_PAGE_BITS;
> > +        ram_state->migration_dirty_pages -=
> > +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> > +        bitmap_clear(block->bmap, start, npages);
> 
> If this is happening while the migration is running, this isn't safe -
> the migration code could clear a bit at about the same point this
> happens, so that the count returned by bitmap_count_one_with_offset
> wouldn't match the word that was cleared by bitmap_clear.
> 
> The only way I can see to fix it is to run over the range using
> bitmap_test_and_clear_atomic, using the return value to decrement
> the number of dirty pages.
> But you also need to be careful with the update of the
> migration_dirty_pages value itself, because that's also being read
> by the migration thread.
> 
> Dave

I see that there's migration_bitmap_sync but it does not seem to be
taken on all paths. E.g. migration_bitmap_clear_dirty and
migration_bitmap_find_dirty are called without that lock sometimes.
Thoughts?

-- 
MST


* [virtio-dev] Re: [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
@ 2018-03-14 19:16       ` Michael S. Tsirkin
  0 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-14 19:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > +            used_len = block->used_length - offset;
> > +            addr += used_len;
> > +        }
> > +
> > +        start = offset >> TARGET_PAGE_BITS;
> > +        npages = used_len >> TARGET_PAGE_BITS;
> > +        ram_state->migration_dirty_pages -=
> > +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> > +        bitmap_clear(block->bmap, start, npages);
> 
> If this is happening while the migration is running, this isn't safe -
> the migration code could clear a bit at about the same point this
> happens, so that the count returned by bitmap_count_one_with_offset
> wouldn't match the word that was cleared by bitmap_clear.
> 
> The only way I can see to fix it is to run over the range using
> bitmap_test_and_clear_atomic, using the return value to decrement
> the number of dirty pages.
> But you also need to be careful with the update of the
> migration_dirty_pages value itself, because that's also being read
> by the migration thread.
> 
> Dave

I see that there's migration_bitmap_sync but it does not seem to be
taken on all paths. E.g. migration_bitmap_clear_dirty and
migration_bitmap_find_dirty are called without that lock sometimes.
Thoughts?

-- 
MST



* Re: [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-03-14 19:16       ` [virtio-dev] " Michael S. Tsirkin
  (?)
@ 2018-03-14 19:42       ` Dr. David Alan Gilbert
  2018-03-14 20:38           ` [virtio-dev] " Michael S. Tsirkin
  2018-03-15 11:10           ` [virtio-dev] " Wei Wang
  -1 siblings, 2 replies; 56+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-14 19:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > > +            used_len = block->used_length - offset;
> > > +            addr += used_len;
> > > +        }
> > > +
> > > +        start = offset >> TARGET_PAGE_BITS;
> > > +        npages = used_len >> TARGET_PAGE_BITS;
> > > +        ram_state->migration_dirty_pages -=
> > > +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> > > +        bitmap_clear(block->bmap, start, npages);
> > 
> > If this is happening while the migration is running, this isn't safe -
> > the migration code could clear a bit at about the same point this
> > happens, so that the count returned by bitmap_count_one_with_offset
> > wouldn't match the word that was cleared by bitmap_clear.
> > 
> > The only way I can see to fix it is to run over the range using
> > bitmap_test_and_clear_atomic, using the return value to decrement
> > the number of dirty pages.
> > But you also need to be careful with the update of the
> > migration_dirty_pages value itself, because that's also being read
> > by the migration thread.
> > 
> > Dave
> 
> I see that there's migration_bitmap_sync but it does not seem to be

Do you mean bitmap_mutex?

> taken on all paths. E.g. migration_bitmap_clear_dirty and
> migration_bitmap_find_dirty are called without that lock sometimes.
> Thoughts?

Hmm, that doesn't seem to protect much at all!  It looks like it was
originally added to handle hotplug causing the bitmaps to be resized;
that extension code was removed in 66103a5 so that lock can probably go.

I don't see how the lock would help us though; the migration thread is
scanning it most of the time so would have to have the lock held
most of the time.

Dave

> -- 
> MST
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 4/4] migration: use the free page hint feature from balloon
  2018-03-07 12:34   ` [virtio-dev] " Wei Wang
  (?)
  (?)
@ 2018-03-14 19:49   ` Dr. David Alan Gilbert
  2018-03-16 11:20     ` Wei Wang
  -1 siblings, 1 reply; 56+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-14 19:49 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, mst, quintela, pbonzini, liliang.opensource,
	yang.zhang.wz, quan.xu0

* Wei Wang (wei.w.wang@intel.com) wrote:
> Start the free page optimization after the migration bitmap is
> synchronized. This can't be used in the stop&copy phase since the guest
> is paused. Make sure the guest reporting has stopped before
> synchronizing the migration dirty bitmap. Currently, the optimization is
> added to precopy only.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> CC: Juan Quintela <quintela@redhat.com>
> CC: Michael S. Tsirkin <mst@redhat.com>
> ---
>  migration/ram.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index e172798..7b4c9b1 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -51,6 +51,8 @@
>  #include "qemu/rcu_queue.h"
>  #include "migration/colo.h"
>  #include "migration/block.h"
> +#include "sysemu/balloon.h"
> +#include "sysemu/sysemu.h"
>  
>  /***********************************************************/
>  /* ram save/restore */
> @@ -208,6 +210,8 @@ struct RAMState {
>      uint32_t last_version;
>      /* We are in the first round */
>      bool ram_bulk_stage;
> +    /* The free pages optimization feature is supported */
> +    bool free_page_support;
>      /* How many times we have dirty too many pages */
>      int dirty_rate_high_cnt;
>      /* these variables are used for bitmap sync */
> @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>      unsigned long *bitmap = rb->bmap;
>      unsigned long next;
>  
> -    if (rs->ram_bulk_stage && start > 0) {
> +    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
>          next = start + 1;

An easier thing is just to clear the ram_bulk_stage flag (and if you're
doing it in the middle of the migration you need to reset some of the
pointers; see postcopy_start for an example).

>      } else {
>          next = find_next_bit(bitmap, size, start);
> @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
>      int64_t end_time;
>      uint64_t bytes_xfer_now;
>  
> +    if (rs->free_page_support) {
> +        balloon_free_page_stop();

Does balloon_free_page_stop cause it to immediately stop, or does it
just ask nicely?   Could a slow guest keep pumping advice to us even
when it was stopped?

> +    }
> +
>      ram_counters.dirty_sync_count++;
>  
>      if (!rs->time_last_bitmap_sync) {
> @@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
>      if (migrate_use_events()) {
>          qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
>      }
> +
> +    if (rs->free_page_support && runstate_is_running()) {
> +        balloon_free_page_start();
> +    }
>  }
>  
>  /**
> @@ -1656,6 +1668,8 @@ static void ram_state_reset(RAMState *rs)
>      rs->last_page = 0;
>      rs->last_version = ram_list.version;
>      rs->ram_bulk_stage = true;
> +    rs->free_page_support = balloon_free_page_support() &&
> +                            !migration_in_postcopy();

That's probably the wrong test for postcopy; I think it'll always
be false there.  Using migrate_postcopy_ram() tells you whether
postcopy-ram is enabled; although not necessarily in use at that
point.

Dave

>  }
>  
>  #define MAX_WAIT 50 /* ms, half buffered_file limit */
> @@ -2330,6 +2344,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  
>      ret = qemu_file_get_error(f);
>      if (ret < 0) {
> +        if (rs->free_page_support) {
> +            balloon_free_page_stop();
> +        }
>          return ret;
>      }
>  
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-03-14 19:42       ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-14 20:38           ` Michael S. Tsirkin
  2018-03-15 11:10           ` [virtio-dev] " Wei Wang
  1 sibling, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-14 20:38 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Wei Wang, qemu-devel, virtio-dev, quintela, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Wed, Mar 14, 2018 at 07:42:59PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
> > > > +            used_len = block->used_length - offset;
> > > > +            addr += used_len;
> > > > +        }
> > > > +
> > > > +        start = offset >> TARGET_PAGE_BITS;
> > > > +        npages = used_len >> TARGET_PAGE_BITS;
> > > > +        ram_state->migration_dirty_pages -=
> > > > +                      bitmap_count_one_with_offset(block->bmap, start, npages);
> > > > +        bitmap_clear(block->bmap, start, npages);
> > > 
> > > If this is happening while the migration is running, this isn't safe -
> > > the migration code could clear a bit at about the same point this
> > > happens, so that the count returned by bitmap_count_one_with_offset
> > > wouldn't match the word that was cleared by bitmap_clear.
> > > 
> > > The only way I can see to fix it is to run over the range using
> > > bitmap_test_and_clear_atomic, using the return value to decrement
> > > the number of dirty pages.
> > > But you also need to be careful with the update of the
> > > migration_dirty_pages value itself, because that's also being read
> > > by the migration thread.
> > > 
> > > Dave
> > 
> > I see that there's migration_bitmap_sync but it does not seem to be
> 
> Do you mean bitmap_mutex?

Yes. Sorry.

> > taken on all paths. E.g. migration_bitmap_clear_dirty and
> > migration_bitmap_find_dirty are called without that lock sometimes.
> > Thoughts?
> 
> Hmm, that doesn't seem to protect much at all!  It looks like it was
> originally added to handle hotplug causing the bitmaps to be resized;
> that extension code was removed in 66103a5 so that lock can probably go.
> 
> I don't see how the lock would help us though; the migration thread is
> scanning it most of the time so would have to have the lock held
> most of the time.
> 
> Dave
> 
> > -- 
> > MST
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-14 14:12             ` Michael S. Tsirkin
@ 2018-03-15  1:15               ` Wei Wang
  -1 siblings, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-15  1:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/14/2018 10:12 PM, Michael S. Tsirkin wrote:
> On Wed, Mar 14, 2018 at 02:03:19PM +0800, Wei Wang wrote:
>> On 03/14/2018 10:53 AM, Michael S. Tsirkin wrote:
>>> On Wed, Mar 14, 2018 at 10:43:01AM +0800, Wei Wang wrote:
>>>> On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:
>>>>> On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
>>>>>
>>>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>>>> Signed-off-by: Liang Li <liang.z.li@intel.com>
>>>>>> CC: Michael S. Tsirkin <mst@redhat.com>
>>>>>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>>> CC: Juan Quintela <quintela@redhat.com>
>>>>> I find it suspicious that neither unrealize nor reset
>>>>> functions have been touched at all.
>>>>> Are you sure you have thought through scenarious like
>>>>> hot-unplug or disabling the device by guest?
>>>> OK. I think we can call balloon_free_page_stop in unrealize and reset.
>>>>
>>>>
>>>>> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
>>>>> +{
>>>>> +    VirtQueueElement *elem;
>>>>> +    VirtIOBalloon *dev = opaque;
>>>>> +    VirtQueue *vq = dev->free_page_vq;
>>>>> +    uint32_t id;
>>>>> +    size_t size;
>>>>> What makes it safe to poke at this device from multiple threads?
>>>>> I think that it would be safer to do it from e.g. BH.
>>>>>
>>>> Actually the free_page_optimization thread is the only user of free_page_vq,
>>>> and there is only one optimization thread each time. Would this be safe
>>>> enough?
>>>>
>>>> Best,
>>>> Wei
>>> Aren't there other fields there? Also things like reset affect all VQs.
>>>
>> Yes. But I think BHs are used to avoid re-entrancy, which isn't the issue
>> here.
> Since you are adding locks to address the issue - doesn't this imply
> reentrancy is exactly the issue?

Not really. The lock isn't intended for any reentrancy issues, since 
there will be only one run of the virtio_balloon_poll_free_page_hints 
function at any given time. Instead, the lock is used to synchronize 
virtio_balloon_poll_free_page_hints and virtio_balloon_free_page_stop to 
access dev->free_page_report_status. Please see the whole picture below:

virtio_balloon_poll_free_page_hints()
{

     while (1) {
         qemu_spin_lock();
         if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP ||
             !runstate_is_running()) {
             qemu_spin_unlock();
             break;
         }
         ...
         if (id == dev->free_page_report_cmd_id) {
==>        dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
         ...
         qemu_spin_unlock();
     }
}


static void virtio_balloon_free_page_stop(void *opaque)
{
     VirtIOBalloon *s = opaque;
     VirtIODevice *vdev = VIRTIO_DEVICE(s);

     qemu_spin_lock();
...
==>       s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
     ...
     qemu_spin_unlock();
}


Without the lock, there is a theoretical possibility that the STOP
assignment below is overridden by the START assignment above. In that
case, virtio_balloon_free_page_stop does not effectively stop
virtio_balloon_poll_free_page_hints.
I don't think this issue could be solved by BHs.

Best,
Wei


* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-15  1:15               ` Wei Wang
@ 2018-03-15  2:47                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-15  2:47 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Thu, Mar 15, 2018 at 09:15:48AM +0800, Wei Wang wrote:
> On 03/14/2018 10:12 PM, Michael S. Tsirkin wrote:
> > On Wed, Mar 14, 2018 at 02:03:19PM +0800, Wei Wang wrote:
> > > On 03/14/2018 10:53 AM, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 14, 2018 at 10:43:01AM +0800, Wei Wang wrote:
> > > > > On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
> > > > > > 
> > > > > > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > > > > > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > > > > > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > > > CC: Juan Quintela <quintela@redhat.com>
> > > > > > I find it suspicious that neither unrealize nor reset
> > > > > > functions have been touched at all.
> > > > > > Are you sure you have thought through scenarious like
> > > > > > hot-unplug or disabling the device by guest?
> > > > > OK. I think we can call balloon_free_page_stop in unrealize and reset.
> > > > > 
> > > > > 
> > > > > > +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> > > > > > +{
> > > > > > +    VirtQueueElement *elem;
> > > > > > +    VirtIOBalloon *dev = opaque;
> > > > > > +    VirtQueue *vq = dev->free_page_vq;
> > > > > > +    uint32_t id;
> > > > > > +    size_t size;
> > > > > > What makes it safe to poke at this device from multiple threads?
> > > > > > I think that it would be safer to do it from e.g. BH.
> > > > > > 
> > > > > Actually the free_page_optimization thread is the only user of free_page_vq,
> > > > > and there is only one optimization thread each time. Would this be safe
> > > > > enough?
> > > > > 
> > > > > Best,
> > > > > Wei
> > > > Aren't there other fields there? Also things like reset affect all VQs.
> > > > 
> > > Yes. But I think BHs are used to avoid re-entrancy, which isn't the issue
> > > here.
> > Since you are adding locks to address the issue - doesn't this imply
> > reentrancy is exactly the issue?
> 
> Not really. The lock isn't intended for any reentrancy issues, since there
> will be only one run of the virtio_balloon_poll_free_page_hints function at
> any given time. Instead, the lock is used to synchronize
> virtio_balloon_poll_free_page_hints and virtio_balloon_free_page_stop to
> access dev->free_page_report_status.

I wonder whether that's enough. E.g. is there a race with guest
trying to reset the device? That resets all VQs you know.


> Please see the whole picture below:
> 
> virtio_balloon_poll_free_page_hints()
> {
> 
>     while (1) {
>         qemu_spin_lock();
>         if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP ||
>             !runstate_is_running()) {
>             qemu_spin_unlock();
>             break;
>         }
>         ...
>         if (id == dev->free_page_report_cmd_id) {
> ==>        dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
>         ...
>         qemu_spin_unlock();
>     }
> }
> 
> 
> static void virtio_balloon_free_page_stop(void *opaque)
> {
>     VirtIOBalloon *s = opaque;
>     VirtIODevice *vdev = VIRTIO_DEVICE(s);
> 
>     qemu_spin_lock();
> ...
> ==>       s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
>     ...
>     qemu_spin_unlock();
> }
> 
> 
> Without the lock, there is a theoretical possibility that the STOP
> assignment below is overridden by the START assignment above. In that
> case, virtio_balloon_free_page_stop does not effectively stop
> virtio_balloon_poll_free_page_hints.
> I don't think this issue could be solved by BHs.
> 
> Best,
> Wei

Don't all BHs run under the BQL?


* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-15  2:47                 ` Michael S. Tsirkin
@ 2018-03-15 10:24                   ` Wei Wang
  -1 siblings, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-15 10:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/15/2018 10:47 AM, Michael S. Tsirkin wrote:
> On Thu, Mar 15, 2018 at 09:15:48AM +0800, Wei Wang wrote:
>> On 03/14/2018 10:12 PM, Michael S. Tsirkin wrote:
>>> On Wed, Mar 14, 2018 at 02:03:19PM +0800, Wei Wang wrote:
>>>> On 03/14/2018 10:53 AM, Michael S. Tsirkin wrote:
>>>>> On Wed, Mar 14, 2018 at 10:43:01AM +0800, Wei Wang wrote:
>>>>>> On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
>>>>>>>
>>>>>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>>>>>> Signed-off-by: Liang Li <liang.z.li@intel.com>
>>>>>>>> CC: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>>>>>> CC: Juan Quintela <quintela@redhat.com>
>>>>>>> I find it suspicious that neither unrealize nor reset
>>>>>>> functions have been touched at all.
>>>>>>> Are you sure you have thought through scenarious like
>>>>>>> hot-unplug or disabling the device by guest?
>>>>>> OK. I think we can call balloon_free_page_stop in unrealize and reset.
>>>>>>
>>>>>>
>>>>>>> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
>>>>>>> +{
>>>>>>> +    VirtQueueElement *elem;
>>>>>>> +    VirtIOBalloon *dev = opaque;
>>>>>>> +    VirtQueue *vq = dev->free_page_vq;
>>>>>>> +    uint32_t id;
>>>>>>> +    size_t size;
>>>>>>> What makes it safe to poke at this device from multiple threads?
>>>>>>> I think that it would be safer to do it from e.g. BH.
>>>>>>>
>>>>>> Actually the free_page_optimization thread is the only user of free_page_vq,
>>>>>> and there is only one optimization thread each time. Would this be safe
>>>>>> enough?
>>>>>>
>>>>>> Best,
>>>>>> Wei
>>>>> Aren't there other fields there? Also things like reset affect all VQs.
>>>>>
>>>> Yes. But I think BHs are used to avoid re-entrancy, which isn't the issue
>>>> here.
>>> Since you are adding locks to address the issue - doesn't this imply
>>> reentrancy is exactly the issue?
>> Not really. The lock isn't intended for any reentrancy issues, since there
>> will be only one run of the virtio_balloon_poll_free_page_hints function at
>> any given time. Instead, the lock is used to synchronize
>> virtio_balloon_poll_free_page_hints and virtio_balloon_free_page_stop to
>> access dev->free_page_report_status.
> I wonder whether that's enough. E.g. is there a race with guest
> trying to reset the device? That resets all VQs you know.

I think that's OK - we will call virtio_balloon_free_page_stop in the 
device reset function, and the qemu_thread_join() in 
virtio_balloon_free_page_stop will wait until the optimization thread 
exits. That is, the reset proceeds only after the optimization thread 
has exited.


>
>
>> Please see the whole picture below:
>>
>> virtio_balloon_poll_free_page_hints()
>> {
>>
>>      while (1) {
>>          qemu_spin_lock();
>>          if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP ||
>>              !runstate_is_running()) {
>>              qemu_spin_unlock();
>>              break;
>>          }
>>          ...
>>          if (id == dev->free_page_report_cmd_id) {
>> ==>        dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
>>          ...
>>          qemu_spin_unlock();
>>      }
>> }
>>
>>
>> static void virtio_balloon_free_page_stop(void *opaque)
>> {
>>      VirtIOBalloon *s = opaque;
>>      VirtIODevice *vdev = VIRTIO_DEVICE(s);
>>
>>      qemu_spin_lock();
>> ...
>> ==>       s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
>>      ...
>>      qemu_spin_unlock();
>> }
>>
>>
>> Without the lock, there is a theoretical possibility that the STOP
>> assignment below is overridden by the START assignment above. In that
>> case, virtio_balloon_free_page_stop does not effectively stop
>> virtio_balloon_poll_free_page_hints.
>> I don't think this issue could be solved by BHs.
>>
>> Best,
>> Wei
> Don't all BHs run under the BQL?

Actually, virtio_balloon_free_page_stop is called by the migration 
thread (not by a BH). Even if we guarantee that the migration thread 
calls virtio_balloon_free_page_stop under the BQL, the BQL is still too 
coarse for our case. Imagine this: when the migration thread calls 
virtio_balloon_free_page_stop to stop the reporting, it is blocked by 
the BQL because virtio_balloon_poll_free_page_hints is in progress with 
the BQL held, so the migration thread can't proceed until 
virtio_balloon_poll_free_page_hints exits (i.e. has drained all the 
hints). That isn't our intention - we want the migration thread to stop 
the guest reporting immediately.
So I think the small lock above is better (it is held for only one hint 
at a time).

Best,
Wei


* Re: [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-03-14 18:11   ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-15 10:52       ` Wei Wang
  2018-03-15 10:52       ` [virtio-dev] " Wei Wang
  1 sibling, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-15 10:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, virtio-dev, mst, quintela, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On 03/15/2018 02:11 AM, Dr. David Alan Gilbert wrote:
> * Wei Wang (wei.w.wang@intel.com) wrote:
>> This patch adds an API to clear bits corresponding to guest free pages
>> from the dirty bitmap. Split the free page block if it crosses the QEMU
>> RAMBlock boundary.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>   include/migration/misc.h |  2 ++
>>   migration/ram.c          | 21 +++++++++++++++++++++
>>   2 files changed, 23 insertions(+)
>>
>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>> index 77fd4f5..fae1acf 100644
>> --- a/include/migration/misc.h
>> +++ b/include/migration/misc.h
>> @@ -14,11 +14,13 @@
>>   #ifndef MIGRATION_MISC_H
>>   #define MIGRATION_MISC_H
>>   
>> +#include "exec/cpu-common.h"
>>   #include "qemu/notify.h"
>>   
>>   /* migration/ram.c */
>>   
>>   void ram_mig_init(void);
>> +void qemu_guest_free_page_hint(void *addr, size_t len);
>>   
>>   /* migration/block.c */
>>   
>> diff --git a/migration/ram.c b/migration/ram.c
>> index 5e33e5c..e172798 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
>>       return 0;
>>   }
>>   
> This could do with some comments

OK, I'll add some.

>
>> +void qemu_guest_free_page_hint(void *addr, size_t len)
>> +{
>> +    RAMBlock *block;
>> +    ram_addr_t offset;
>> +    size_t used_len, start, npages;
>  From your use I think the addr and len are coming raw from the guest;
> so we need to take some care.
>

Actually, the "addr" here is already the host address that corresponds 
to the guest free page; it comes from elem->in_sg[0].iov_base.

>
>> +        if (unlikely(offset + len > block->used_length)) {
> I think to make that overflow safe, that should be:
>    if (len > (block->used_length - offset)) {
>
> But we'll need another test before it, because qemu_ram_block_from_host
> seems to check max_length not used_length, so we need to check
> for offset > block->used_length first

OK, how about adding an assert, like this:

block = qemu_ram_block_from_host(addr, false, &offset);
if (!block)
    ...
assert(offset < block->used_length);

The address corresponds to a guest free page, which means it should be 
within used_length. If not, something weird has happened, and I think 
we'd better assert in that case.

Best,
Wei

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-03-14 19:42       ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-15 11:10           ` Wei Wang
  2018-03-15 11:10           ` [virtio-dev] " Wei Wang
  1 sibling, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-15 11:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Michael S. Tsirkin
  Cc: qemu-devel, virtio-dev, quintela, pbonzini, liliang.opensource,
	yang.zhang.wz, quan.xu0, nilal, riel

On 03/15/2018 03:42 AM, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
>> On Wed, Mar 14, 2018 at 06:11:37PM +0000, Dr. David Alan Gilbert wrote:
>>>> +            used_len = block->used_length - offset;
>>>> +            addr += used_len;
>>>> +        }
>>>> +
>>>> +        start = offset >> TARGET_PAGE_BITS;
>>>> +        npages = used_len >> TARGET_PAGE_BITS;
>>>> +        ram_state->migration_dirty_pages -=
>>>> +                      bitmap_count_one_with_offset(block->bmap, start, npages);
>>>> +        bitmap_clear(block->bmap, start, npages);
>>> If this is happening while the migration is running, this isn't safe -
>>> the migration code could clear a bit at about the same point this
>>> happens, so that the count returned by bitmap_count_one_with_offset
>>> wouldn't match the word that was cleared by bitmap_clear.
>>>
>>> The only way I can see to fix it is to run over the range using
>>> bitmap_test_and_clear_atomic, using the return value to decrement
>>> the number of dirty pages.
>>> But you also need to be careful with the update of the
>>> migration_dirty_pages value itself, because that's also being read
>>> by the migration thread.
>>>
>>> Dave
>> I see that there's migration_bitmap_sync but it does not seem to be
> Do you mean bitmap_mutex?
>
>> taken on all paths. E.g. migration_bitmap_clear_dirty and
>> migration_bitmap_find_dirty are called without that lock sometimes.
>> Thoughts?

Right. The bitmap_mutex claims to protect modification of the bitmap, but 
migration_bitmap_clear_dirty doesn't strictly follow that rule.

> Hmm, that doesn't seem to protect much at all!  It looks like it was
> originally added to handle hotplug causing the bitmaps to be resized;
> that extension code was removed in 66103a5 so that lock can probably go.
>
> I don't see how the lock would help us though; the migration thread is
> scanning it most of the time so would have to have the lock held
> most of the time.
>



How about adding the lock to migration_bitmap_clear_dirty, and we will 
have something like this:

migration_bitmap_clear_dirty()
{
    qemu_mutex_lock(&rs->bitmap_mutex);
    ret = test_and_clear_bit(page, rb->bmap);
    if (ret) {
        rs->migration_dirty_pages--;
    }
    ...
    qemu_mutex_unlock(&rs->bitmap_mutex);
}


qemu_guest_free_page_hint()
{
    qemu_mutex_lock(&rs->bitmap_mutex);
    ...
    ram_state->migration_dirty_pages -=
        bitmap_count_one_with_offset(block->bmap, start, npages);
    bitmap_clear(block->bmap, start, npages);
    qemu_mutex_unlock(&rs->bitmap_mutex);
}


The migration thread will hold the lock only while it clears a bit from 
the bitmap. Or would you consider changing it to qemu_spin_lock?

Best,
Wei

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-03-15 10:52       ` [virtio-dev] " Wei Wang
@ 2018-03-15 13:50         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-15 13:50 UTC (permalink / raw)
  To: Wei Wang
  Cc: Dr. David Alan Gilbert, qemu-devel, virtio-dev, quintela,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel

On Thu, Mar 15, 2018 at 06:52:41PM +0800, Wei Wang wrote:
> On 03/15/2018 02:11 AM, Dr. David Alan Gilbert wrote:
> > * Wei Wang (wei.w.wang@intel.com) wrote:
> > > This patch adds an API to clear bits corresponding to guest free pages
> > > from the dirty bitmap. Spilt the free page block if it crosses the QEMU
> > > RAMBlock boundary.
> > > 
> > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > CC: Juan Quintela <quintela@redhat.com>
> > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >   include/migration/misc.h |  2 ++
> > >   migration/ram.c          | 21 +++++++++++++++++++++
> > >   2 files changed, 23 insertions(+)
> > > 
> > > diff --git a/include/migration/misc.h b/include/migration/misc.h
> > > index 77fd4f5..fae1acf 100644
> > > --- a/include/migration/misc.h
> > > +++ b/include/migration/misc.h
> > > @@ -14,11 +14,13 @@
> > >   #ifndef MIGRATION_MISC_H
> > >   #define MIGRATION_MISC_H
> > > +#include "exec/cpu-common.h"
> > >   #include "qemu/notify.h"
> > >   /* migration/ram.c */
> > >   void ram_mig_init(void);
> > > +void qemu_guest_free_page_hint(void *addr, size_t len);
> > >   /* migration/block.c */
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index 5e33e5c..e172798 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
> > >       return 0;
> > >   }
> > This could do with some comments
> 
> OK, I'll add some.
> 
> > 
> > > +void qemu_guest_free_page_hint(void *addr, size_t len)
> > > +{
> > > +    RAMBlock *block;
> > > +    ram_addr_t offset;
> > > +    size_t used_len, start, npages;
> >  From your use I think the addr and len are coming raw from the guest;
> > so we need to take some care.
> > 
> 
> Actually the "addr" here has been the host address that corresponds to the
> guest free page. It's from elem->in_sg[0].iov_base.
> 
> > 
> > > +        if (unlikely(offset + len > block->used_length)) {
> > I think to make that overflow safe, that should be:
> >    if (len > (block->used_length - offset)) {
> > 
> > But we'll need another test before it, because qemu_ram_block_from_host
> > seems to check max_length not used_length, so we need to check
> > for offset > block->used_length first
> 
> OK, how about adding an assert above, like this:
> 
> block = qemu_ram_block_from_host(addr, false, &offset);
> assert (offset  < block->used_length );
> if (!block)
>     ...
> 
> The address corresponds to a guest free page, which means it should be
> within used_length. If not, something weird happens, I think we'd better to
> assert it in that case.
> 
> Best,
> Wei

What if memory has been removed by hotunplug after guest sent the
free page notification?

This seems to actually be likely to happen as memory being unplugged
would typically be mostly free.

-- 
MST

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
  2018-03-15 10:24                   ` Wei Wang
@ 2018-03-15 13:53                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 56+ messages in thread
From: Michael S. Tsirkin @ 2018-03-15 13:53 UTC (permalink / raw)
  To: Wei Wang
  Cc: qemu-devel, virtio-dev, quintela, dgilbert, pbonzini,
	liliang.opensource, yang.zhang.wz, quan.xu0, nilal, riel

On Thu, Mar 15, 2018 at 06:24:16PM +0800, Wei Wang wrote:
> On 03/15/2018 10:47 AM, Michael S. Tsirkin wrote:
> > On Thu, Mar 15, 2018 at 09:15:48AM +0800, Wei Wang wrote:
> > > On 03/14/2018 10:12 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 14, 2018 at 02:03:19PM +0800, Wei Wang wrote:
> > > > > On 03/14/2018 10:53 AM, Michael S. Tsirkin wrote:
> > > > > > On Wed, Mar 14, 2018 at 10:43:01AM +0800, Wei Wang wrote:
> > > > > > > On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
> > > > > > > > 
> > > > > > > > > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > > > > > > > Signed-off-by: Liang Li <liang.z.li@intel.com>
> > > > > > > > > CC: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > > > CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > > > > > CC: Juan Quintela <quintela@redhat.com>
> > > > > > > > I find it suspicious that neither unrealize nor reset
> > > > > > > > functions have been touched at all.
> > > > > > > > Are you sure you have thought through scenarious like
> > > > > > > > hot-unplug or disabling the device by guest?
> > > > > > > OK. I think we can call balloon_free_page_stop in unrealize and reset.
> > > > > > > 
> > > > > > > 
> > > > > > > > +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> > > > > > > > +{
> > > > > > > > +    VirtQueueElement *elem;
> > > > > > > > +    VirtIOBalloon *dev = opaque;
> > > > > > > > +    VirtQueue *vq = dev->free_page_vq;
> > > > > > > > +    uint32_t id;
> > > > > > > > +    size_t size;
> > > > > > > > What makes it safe to poke at this device from multiple threads?
> > > > > > > > I think that it would be safer to do it from e.g. BH.
> > > > > > > > 
> > > > > > > Actually the free_page_optimization thread is the only user of free_page_vq,
> > > > > > > and there is only one optimization thread each time. Would this be safe
> > > > > > > enough?
> > > > > > > 
> > > > > > > Best,
> > > > > > > Wei
> > > > > > Aren't there other fields there? Also things like reset affect all VQs.
> > > > > > 
> > > > > Yes. But I think BHs are used to avoid re-entrancy, which isn't the issue
> > > > > here.
> > > > Since you are adding locks to address the issue - doesn't this imply
> > > > reentrancy is exactly the issue?
> > > Not really. The lock isn't intended for any reentrancy issues, since there
> > > will be only one run of the virtio_balloon_poll_free_page_hints function at
> > > any given time. Instead, the lock is used to synchronize
> > > virtio_balloon_poll_free_page_hints and virtio_balloon_free_page_stop to
> > > access dev->free_page_report_status.
> > I wonder whether that's enough. E.g. is there a race with guest
> > trying to reset the device? That resets all VQs you know.
> 
> I think that's OK - we will call virtio_balloon_free_page_stop in the device
> reset function, and qemu_thread_join() in virtio_balloon_free_page_stop will
> wait till the optimization thread exits. That is, the reset will proceed
> after the optimization thread exits.
> 
> 
> > 
> > 
> > > Please see the whole picture below:
> > > 
> > > virtio_balloon_poll_free_page_hints()
> > > {
> > > 
> > >      while (1) {
> > >          qemu_spin_lock();
> > >          if (dev->free_page_report_status >= FREE_PAGE_REPORT_S_STOP ||
> > >              !runstate_is_running()) {
> > >              qemu_spin_unlock();
> > >              break;
> > >          }
> > >          ...
> > >          if (id == dev->free_page_report_cmd_id) {
> > > ==>        dev->free_page_report_status = FREE_PAGE_REPORT_S_START;
> > >          ...
> > >          qemu_spin_unlock();
> > >      }
> > > }
> > > 
> > > 
> > > static void virtio_balloon_free_page_stop(void *opaque)
> > > {
> > >      VirtIOBalloon *s = opaque;
> > >      VirtIODevice *vdev = VIRTIO_DEVICE(s);
> > > 
> > >      qemu_spin_lock();
> > > ...
> > > ==>       s->free_page_report_status = FREE_PAGE_REPORT_S_STOP;
> > >      ...
> > >      qemu_spin_unlock();
> > > }
> > > 
> > > 
> > > Without the lock, there are theoretical possibilities that assigning STOP
> > > below is overridden by START above. In that
> > > case,virtio_balloon_free_page_stop does not effectively stop
> > > virtio_balloon_poll_free_page_hints.
> > > I think this issue couldn't be solved by BHs.
> > > 
> > > Best,
> > > Wei
> > Don't all BHs run under the BQL?
> 
> Actually the virtio_balloon_free_page_stop is called by the migration thread
> (instead of a BH). Even we guarantee the migration thread calls
> virtio_balloon_free_page_stop under BQL, the BQL is still too big for our
> case. Imagine this case: when the migration thread calls
> virtio_balloon_free_page_stop to stop the reporting, it blocks by BQL as
> virtio_balloon_poll_free_page_hints is in progress with BQL held, and the
> migration thread won't proceed untill virtio_balloon_poll_free_page_hints
> exits (i.e. getting all the hints). I think this isn't our intention - we
> basically want the migration thread to stop the guest reporting immediately.
> So I think the small lock above is better (it locks for only one hint).
> 
> Best,
> Wei

I am just saying that all these locking and ownership tricks need to be
documented. BH would be much simpler.

-- 
MST

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/4] migration: use the free page hint feature from balloon
  2018-03-14 19:49   ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2018-03-16 11:20     ` Wei Wang
  0 siblings, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-16 11:20 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, mst, quintela, pbonzini, liliang.opensource,
	yang.zhang.wz, quan.xu0

On 03/15/2018 03:49 AM, Dr. David Alan Gilbert wrote:
> * Wei Wang (wei.w.wang@intel.com) wrote:
>> Start the free page optimization after the migration bitmap is
>> synchronized. This can't be used in the stop&copy phase since the guest
>> is paused. Make sure the guest reporting has stopped before
>> synchronizing the migration dirty bitmap. Currently, the optimization is
>> added to precopy only.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: Michael S. Tsirkin <mst@redhat.com>
>> ---
>>   migration/ram.c | 19 ++++++++++++++++++-
>>   1 file changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c
>> index e172798..7b4c9b1 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -51,6 +51,8 @@
>>   #include "qemu/rcu_queue.h"
>>   #include "migration/colo.h"
>>   #include "migration/block.h"
>> +#include "sysemu/balloon.h"
>> +#include "sysemu/sysemu.h"
>>   
>>   /***********************************************************/
>>   /* ram save/restore */
>> @@ -208,6 +210,8 @@ struct RAMState {
>>       uint32_t last_version;
>>       /* We are in the first round */
>>       bool ram_bulk_stage;
>> +    /* The free pages optimization feature is supported */
>> +    bool free_page_support;
>>       /* How many times we have dirty too many pages */
>>       int dirty_rate_high_cnt;
>>       /* these variables are used for bitmap sync */
>> @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>>       unsigned long *bitmap = rb->bmap;
>>       unsigned long next;
>>   
>> -    if (rs->ram_bulk_stage && start > 0) {
>> +    if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
>>           next = start + 1;
> An easier thing is just to clear the ram_bulk_stage flag (and if you're
> doing it in the middle of the migration you need to reset some of the
> pointers; see postcopy_start for an example).
>
>>       } else {
>>           next = find_next_bit(bitmap, size, start);
>> @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
>>       int64_t end_time;
>>       uint64_t bytes_xfer_now;
>>   
>> +    if (rs->free_page_support) {
>> +        balloon_free_page_stop();
> Does balloon_free_page_stop cause it to immediately stop, or does it
> just ask nicely?   Could a slow guest keep pumping advice to us even
> when it was stopped?
>

Yes, balloon_free_page_stop causes the optimization thread to exit
immediately; it doesn't rely on anything from the guest.
The guest won't keep reporting, because before the optimization thread
exits, it sends a stop command to the guest to stop reporting (without
waiting for any ACKs, as those aren't actually needed).

I've also applied the other comments in the new version; please check the
v5 patches. Thanks.

Best,
Wei
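
The stop behavior described above -- raise a stop request that makes the
polling thread exit at once, and tell the guest to stop reporting without
waiting for an ACK -- could be sketched roughly as below. All names here
are hypothetical stand-ins for illustration, not the actual QEMU
implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stop flag; in QEMU the stop command would also be
 * written to the device's free-page config for the guest to see. */
static atomic_bool free_page_stop_requested = false;

static void balloon_free_page_stop_sketch(void)
{
    /* Ask the guest to stop reporting; do NOT wait for an ACK.
     * Setting the flag is enough to make the polling loop exit. */
    atomic_store(&free_page_stop_requested, true);
}

static bool poll_one_hint_sketch(void)
{
    if (atomic_load(&free_page_stop_requested)) {
        return false;   /* optimization thread exits immediately */
    }
    /* ... pop one hint from the virtqueue and clear dirty bits ... */
    return true;
}
```

The key property is that nothing in the exit path depends on a guest
response, so a slow or stuck guest cannot delay migration_bitmap_sync.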

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap
  2018-03-15 13:50         ` [virtio-dev] " Michael S. Tsirkin
@ 2018-03-16 11:24           ` Wei Wang
  -1 siblings, 0 replies; 56+ messages in thread
From: Wei Wang @ 2018-03-16 11:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Dr. David Alan Gilbert, qemu-devel, virtio-dev, quintela,
	pbonzini, liliang.opensource, yang.zhang.wz, quan.xu0, nilal,
	riel

On 03/15/2018 09:50 PM, Michael S. Tsirkin wrote:
> On Thu, Mar 15, 2018 at 06:52:41PM +0800, Wei Wang wrote:
>> On 03/15/2018 02:11 AM, Dr. David Alan Gilbert wrote:
>>> * Wei Wang (wei.w.wang@intel.com) wrote:
>>>> This patch adds an API to clear bits corresponding to guest free pages
>>>> from the dirty bitmap. Split the free page block if it crosses the QEMU
>>>> RAMBlock boundary.
>>>>
>>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>>> CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> CC: Juan Quintela <quintela@redhat.com>
>>>> CC: Michael S. Tsirkin <mst@redhat.com>
>>>> ---
>>>>    include/migration/misc.h |  2 ++
>>>>    migration/ram.c          | 21 +++++++++++++++++++++
>>>>    2 files changed, 23 insertions(+)
>>>>
>>>> diff --git a/include/migration/misc.h b/include/migration/misc.h
>>>> index 77fd4f5..fae1acf 100644
>>>> --- a/include/migration/misc.h
>>>> +++ b/include/migration/misc.h
>>>> @@ -14,11 +14,13 @@
>>>>    #ifndef MIGRATION_MISC_H
>>>>    #define MIGRATION_MISC_H
>>>> +#include "exec/cpu-common.h"
>>>>    #include "qemu/notify.h"
>>>>    /* migration/ram.c */
>>>>    void ram_mig_init(void);
>>>> +void qemu_guest_free_page_hint(void *addr, size_t len);
>>>>    /* migration/block.c */
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index 5e33e5c..e172798 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -2189,6 +2189,27 @@ static int ram_init_all(RAMState **rsp)
>>>>        return 0;
>>>>    }
>>> This could do with some comments
>> OK, I'll add some.
>>
>>>> +void qemu_guest_free_page_hint(void *addr, size_t len)
>>>> +{
>>>> +    RAMBlock *block;
>>>> +    ram_addr_t offset;
>>>> +    size_t used_len, start, npages;
>>>   From your use I think the addr and len are coming raw from the guest;
>>> so we need to take some care.
>>>
>> Actually, the "addr" here is already the host address that corresponds
>> to the guest free page. It comes from elem->in_sg[0].iov_base.
>>
>>>> +        if (unlikely(offset + len > block->used_length)) {
>>> I think to make that overflow safe, that should be:
>>>     if (len > (block->used_length - offset)) {
>>>
>>> But we'll need another test before it, because qemu_ram_block_from_host
>>> seems to check max_length not used_length, so we need to check
>>> for offset > block->used_length first
>> OK, how about adding an assert, like this:
>>
>> block = qemu_ram_block_from_host(addr, false, &offset);
>> if (!block)
>>      ...
>> assert(offset < block->used_length);
>>
>> The address corresponds to a guest free page, so it should fall within
>> used_length. If it doesn't, something weird has happened, and I think
>> we'd better assert in that case.
>>
>> Best,
>> Wei
> What if memory has been removed by hotunplug after guest sent the
> free page notification?
>
> This seems to actually be likely to happen as memory being unplugged
> would typically be mostly free.


OK, thanks for the reminder. Instead of using an assert, I think we can 
let the function just return if (offset > block->used_length).

Best,
Wei
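
The bounds handling this sub-thread converges on -- look up the RAMBlock,
bail out when the hinted range no longer falls inside used_length (e.g.
after memory hot-unplug), and compare lengths in an overflow-safe order --
could be sketched as below. This is a simplified illustration with a
hypothetical stand-in for QEMU's RAMBlock, not the actual patch code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical, simplified stand-in for QEMU's RAMBlock: only the
 * field relevant to the bounds discussion is kept. */
typedef struct {
    size_t used_length;
} RAMBlockStub;

/*
 * Overflow-safe check of a guest-provided (offset, len) against a block.
 * The comparison is written as len vs (used_length - offset), which is
 * only valid after confirming offset < used_length -- the ordering
 * discussed above. Returns the number of usable bytes in this block;
 * 0 means the hint should simply be ignored.
 */
static size_t hint_usable_len(const RAMBlockStub *block, size_t offset,
                              size_t len)
{
    if (block == NULL) {
        return 0;   /* host address not backed by any RAMBlock */
    }
    if (offset >= block->used_length) {
        return 0;   /* e.g. memory unplugged after the guest sent the hint */
    }
    /* Clamp without ever computing offset + len, so no overflow. */
    return MIN(len, block->used_length - offset);
}
```

Returning 0 rather than asserting matches the conclusion above: a stale
hint for unplugged memory is a legitimate race, not a fatal error.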

^ permalink raw reply	[flat|nested] 56+ messages in thread


end of thread, other threads:[~2018-03-16 11:22 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-07 12:34 [Qemu-devel] [PATCH v4 0/4] virtio-balloon: free page hint reporting support Wei Wang
2018-03-07 12:34 ` [virtio-dev] " Wei Wang
2018-03-07 12:34 ` [Qemu-devel] [PATCH v4 1/4] bitmap: bitmap_count_one_with_offset Wei Wang
2018-03-07 12:34   ` [virtio-dev] " Wei Wang
2018-03-14 16:20   ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-07 12:34 ` [Qemu-devel] [PATCH v4 2/4] migration: API to clear bits of guest free pages from the dirty bitmap Wei Wang
2018-03-07 12:34   ` [virtio-dev] " Wei Wang
2018-03-14 18:11   ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-14 19:16     ` Michael S. Tsirkin
2018-03-14 19:16       ` [virtio-dev] " Michael S. Tsirkin
2018-03-14 19:42       ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-14 20:38         ` Michael S. Tsirkin
2018-03-14 20:38           ` [virtio-dev] " Michael S. Tsirkin
2018-03-15 11:10         ` [Qemu-devel] " Wei Wang
2018-03-15 11:10           ` [virtio-dev] " Wei Wang
2018-03-15 10:52     ` [Qemu-devel] " Wei Wang
2018-03-15 10:52       ` [virtio-dev] " Wei Wang
2018-03-15 13:50       ` [Qemu-devel] " Michael S. Tsirkin
2018-03-15 13:50         ` [virtio-dev] " Michael S. Tsirkin
2018-03-16 11:24         ` [Qemu-devel] " Wei Wang
2018-03-16 11:24           ` [virtio-dev] " Wei Wang
2018-03-07 12:34 ` [Qemu-devel] [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT Wei Wang
2018-03-07 12:34   ` [virtio-dev] " Wei Wang
2018-03-13 16:49   ` [Qemu-devel] " Michael S. Tsirkin
2018-03-13 16:49     ` [virtio-dev] " Michael S. Tsirkin
2018-03-14  2:43     ` [Qemu-devel] " Wei Wang
2018-03-14  2:43       ` Wei Wang
2018-03-14  2:53       ` [Qemu-devel] " Michael S. Tsirkin
2018-03-14  2:53         ` Michael S. Tsirkin
2018-03-14  6:03         ` [Qemu-devel] " Wei Wang
2018-03-14  6:03           ` Wei Wang
2018-03-14 14:12           ` [Qemu-devel] " Michael S. Tsirkin
2018-03-14 14:12             ` Michael S. Tsirkin
2018-03-15  1:15             ` [Qemu-devel] " Wei Wang
2018-03-15  1:15               ` Wei Wang
2018-03-15  2:47               ` [Qemu-devel] " Michael S. Tsirkin
2018-03-15  2:47                 ` Michael S. Tsirkin
2018-03-15 10:24                 ` [Qemu-devel] " Wei Wang
2018-03-15 10:24                   ` Wei Wang
2018-03-15 13:53                   ` [Qemu-devel] " Michael S. Tsirkin
2018-03-15 13:53                     ` Michael S. Tsirkin
2018-03-14 18:44   ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-07 12:34 ` [Qemu-devel] [PATCH v4 4/4] migration: use the free page hint feature from balloon Wei Wang
2018-03-07 12:34   ` [virtio-dev] " Wei Wang
2018-03-13 16:35   ` [Qemu-devel] " Michael S. Tsirkin
2018-03-13 16:35     ` [virtio-dev] " Michael S. Tsirkin
2018-03-14  2:41     ` [Qemu-devel] " Wei Wang
2018-03-14  2:41       ` [virtio-dev] " Wei Wang
2018-03-14  2:51       ` [Qemu-devel] " Michael S. Tsirkin
2018-03-14  2:51         ` [virtio-dev] " Michael S. Tsirkin
2018-03-14  6:50         ` [Qemu-devel] " Wei Wang
2018-03-14  6:50           ` [virtio-dev] " Wei Wang
2018-03-14 14:45           ` [Qemu-devel] " Michael S. Tsirkin
2018-03-14 14:45             ` [virtio-dev] " Michael S. Tsirkin
2018-03-14 19:49   ` [Qemu-devel] " Dr. David Alan Gilbert
2018-03-16 11:20     ` Wei Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.