All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v19 QEMU 0/4] virtio-balloon: add support for free page reporting
@ 2020-04-10  3:41 ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

---

Alexander Duyck (4):
      virtio-balloon: Implement support for page poison tracking feature
      linux-headers: update to contain virito-balloon free page reporting
      virtio-balloon: Provide an interface for free page reporting
      memory: Do not allow direct write access to rom_device regions


 hw/virtio/virtio-balloon.c                      |   85 ++++++++++++++++++++++-
 include/exec/memory.h                           |    4 +
 include/hw/virtio/virtio-balloon.h              |    3 +
 include/standard-headers/linux/virtio_balloon.h |    1 
 4 files changed, 86 insertions(+), 7 deletions(-)

--


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] [PATCH v19 QEMU 0/4] virtio-balloon: add support for free page reporting
@ 2020-04-10  3:41 ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

This series provides an asynchronous means of reporting free guest pages
to QEMU through virtio-balloon so that the memory associated with those
pages can be dropped and reused by other processes and/or guests on the
host. Using this it is possible to avoid unnecessary I/O to disk and
greatly improve performance in the case of memory overcommit on the host.

I originally submitted this patch series back on February 11th 2020[1],
but at that time I was focused primarily on the kernel portion of this
patch set. However as of April 7th those patches are now included in
Linus's kernel tree[2] and so I am submitting the QEMU pieces for
inclusion.

[1]: https://lore.kernel.org/lkml/20200211224416.29318.44077.stgit@localhost.localdomain/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b0c504f154718904ae49349147e3b7e6ae91ffdc

Changes from v17:
Fixed typo in patch 1 title
Addressed white-space issues reported via checkpatch
Added braces {} for two if statements to match expected coding style

Changes from v18:
Updated patches 2 and 3 based on input from dhildenb
Added comment to patch 2 describing what keeps us from reporting a bad page
Added patch to address issue with ROM devices being directly writable

---

Alexander Duyck (4):
      virtio-balloon: Implement support for page poison tracking feature
      linux-headers: update to contain virito-balloon free page reporting
      virtio-balloon: Provide an interface for free page reporting
      memory: Do not allow direct write access to rom_device regions


 hw/virtio/virtio-balloon.c                      |   85 ++++++++++++++++++++++-
 include/exec/memory.h                           |    4 +
 include/hw/virtio/virtio-balloon.h              |    3 +
 include/standard-headers/linux/virtio_balloon.h |    1 
 4 files changed, 86 insertions(+), 7 deletions(-)

--

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-10  3:41 ` [virtio-dev] " Alexander Duyck
@ 2020-04-10  3:41   ` Alexander Duyck
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

We need to make certain to advertise support for page poison tracking if
we want to actually get data on if the guest will be poisoning pages. So
if free page hinting is active we should add page poisoning support and
let the guest disable it if it isn't using it.

Page poisoning will result in a page being dirtied on free. As such we
cannot really avoid having to copy the page at least one more time since
we will need to write the poison value to the destination. As such we can
just ignore free page hinting if page poisoning is enabled as it will
actually reduce the work we have to do.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 hw/virtio/virtio-balloon.c         |   26 ++++++++++++++++++++++----
 include/hw/virtio/virtio-balloon.h |    1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..1c6d36a29a04 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
         return;
     }
 
+    /*
+     * If page poisoning is enabled then we probably shouldn't bother with
+     * the hinting since the poisoning will dirty the page and invalidate
+     * the work we are doing anyway.
+     */
+    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+        return;
+    }
+
     if (s->free_page_report_cmd_id == UINT_MAX) {
         s->free_page_report_cmd_id =
                        VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
@@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
     if (s->qemu_4_0_config_size) {
         return sizeof(struct virtio_balloon_config);
     }
-    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
+    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
+        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
         return sizeof(struct virtio_balloon_config);
     }
-    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-        return offsetof(struct virtio_balloon_config, poison_val);
-    }
     return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
 }
 
@@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
+    config.poison_val = cpu_to_le32(dev->poison_val);
 
     if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
         config.free_page_report_cmd_id =
@@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
         qapi_event_send_balloon_change(vm_ram_size -
                         ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
     }
+    dev->poison_val = virtio_vdev_has_feature(vdev,
+                                              VIRTIO_BALLOON_F_PAGE_POISON) ?
+                      le32_to_cpu(config.poison_val) : 0;
     trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
+    }
 
     return f;
 }
@@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
         g_free(s->stats_vq_elem);
         s->stats_vq_elem = NULL;
     }
+
+    s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
     DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
                     VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_PAGE_POISON, false),
     /* QEMU 4.0 accidentally changed the config size even when free-page-hint
      * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
      * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index d1c968d2376e..7fe78e5c14d7 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
     uint32_t host_features;
 
     bool qemu_4_0_config_size;
+    uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [virtio-dev] [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-10  3:41   ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

We need to make certain to advertise support for page poison tracking if
we want to actually get data on if the guest will be poisoning pages. So
if free page hinting is active we should add page poisoning support and
let the guest disable it if it isn't using it.

Page poisoning will result in a page being dirtied on free. As such we
cannot really avoid having to copy the page at least one more time since
we will need to write the poison value to the destination. As such we can
just ignore free page hinting if page poisoning is enabled as it will
actually reduce the work we have to do.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 hw/virtio/virtio-balloon.c         |   26 ++++++++++++++++++++++----
 include/hw/virtio/virtio-balloon.h |    1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..1c6d36a29a04 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
         return;
     }
 
+    /*
+     * If page poisoning is enabled then we probably shouldn't bother with
+     * the hinting since the poisoning will dirty the page and invalidate
+     * the work we are doing anyway.
+     */
+    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+        return;
+    }
+
     if (s->free_page_report_cmd_id == UINT_MAX) {
         s->free_page_report_cmd_id =
                        VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
@@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
     if (s->qemu_4_0_config_size) {
         return sizeof(struct virtio_balloon_config);
     }
-    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
+    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
+        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
         return sizeof(struct virtio_balloon_config);
     }
-    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
-        return offsetof(struct virtio_balloon_config, poison_val);
-    }
     return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
 }
 
@@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
 
     config.num_pages = cpu_to_le32(dev->num_pages);
     config.actual = cpu_to_le32(dev->actual);
+    config.poison_val = cpu_to_le32(dev->poison_val);
 
     if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
         config.free_page_report_cmd_id =
@@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
         qapi_event_send_balloon_change(vm_ram_size -
                         ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
     }
+    dev->poison_val = virtio_vdev_has_feature(vdev,
+                                              VIRTIO_BALLOON_F_PAGE_POISON) ?
+                      le32_to_cpu(config.poison_val) : 0;
     trace_virtio_balloon_set_config(dev->actual, oldactual);
 }
 
@@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
+    }
 
     return f;
 }
@@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
         g_free(s->stats_vq_elem);
         s->stats_vq_elem = NULL;
     }
+
+    s->poison_val = 0;
 }
 
 static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
     DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
                     VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_PAGE_POISON, false),
     /* QEMU 4.0 accidentally changed the config size even when free-page-hint
      * is disabled, resulting in QEMU 3.1 migration incompatibility.  This
      * property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index d1c968d2376e..7fe78e5c14d7 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
     uint32_t host_features;
 
     bool qemu_4_0_config_size;
+    uint32_t poison_val;
 } VirtIOBalloon;
 
 #endif


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v19 QEMU 2/4] linux-headers: update to contain virito-balloon free page reporting
  2020-04-10  3:41 ` [virtio-dev] " Alexander Duyck
@ 2020-04-10  3:41   ` Alexander Duyck
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 include/standard-headers/linux/virtio_balloon.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..1c5f6d6f2de6 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING	5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [virtio-dev] [PATCH v19 QEMU 2/4] linux-headers: update to contain virito-balloon free page reporting
@ 2020-04-10  3:41   ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 include/standard-headers/linux/virtio_balloon.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..1c5f6d6f2de6 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
 #define VIRTIO_BALLOON_F_DEFLATE_ON_OOM	2 /* Deflate balloon on OOM */
 #define VIRTIO_BALLOON_F_FREE_PAGE_HINT	3 /* VQ to report free pages */
 #define VIRTIO_BALLOON_F_PAGE_POISON	4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING	5 /* Page reporting virtqueue */
 
 /* Size of a PFN in the balloon interface. */
 #define VIRTIO_BALLOON_PFN_SHIFT 12


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
  2020-04-10  3:41 ` [virtio-dev] " Alexander Duyck
@ 2020-04-10  3:41   ` Alexander Duyck
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 hw/virtio/virtio-balloon.c         |   63 +++++++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-balloon.h |    2 +
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1c6d36a29a04..86d8b48a8e3a 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
     balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+    VirtQueueElement *elem;
+
+    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
+        unsigned int i;
+
+        for (i = 0; i < elem->in_num; i++) {
+            void *addr = elem->in_sg[i].iov_base;
+            size_t size = elem->in_sg[i].iov_len;
+            ram_addr_t ram_offset;
+            size_t rb_page_size;
+            RAMBlock *rb;
+
+            if (qemu_balloon_is_inhibited() || dev->poison_val) {
+                continue;
+            }
+
+            /*
+             * There is no need to check the memory section to see if
+             * it is ram/readonly/romd like there is for handle_output
+             * below. If the region is not meant to be written to then
+             * address_space_map will have allocated a bounce buffer
+             * and it will be freed in address_space_unmap and trigger
+             * and unassigned_mem_write before failing to copy over the
+             * buffer. If more than one bad descriptor is provided it
+             * will return NULL after the first bounce buffer and fail
+             * to map any resources.
+             */
+            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
+            if (!rb) {
+                trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+                continue;
+            }
+
+            /* For now we will simply ignore unaligned memory regions */
+            rb_page_size = qemu_ram_pagesize(rb);
+            if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
+                continue;
+            }
+
+            ram_block_discard_range(rb, ram_offset, size);
+        }
+
+        virtqueue_push(vq, elem, 0);
+        virtio_notify(vdev, vq);
+        g_free(elem);
+    }
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -628,7 +679,8 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
         return sizeof(struct virtio_balloon_config);
     }
     if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
-        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+        virtio_has_feature(features, VIRTIO_BALLOON_F_REPORTING)) {
         return sizeof(struct virtio_balloon_config);
     }
     return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
@@ -717,7 +769,8 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
-    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+        virtio_has_feature(f, VIRTIO_BALLOON_F_REPORTING)) {
         virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
     }
 
@@ -807,6 +860,10 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+        s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+    }
+
     if (virtio_has_feature(s->host_features,
                            VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
         s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -940,6 +997,8 @@ static Property virtio_balloon_properties[] = {
      */
     DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
                      qemu_4_0_config_size, false),
+    DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_REPORTING, true),
     DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
                      IOThread *),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 7fe78e5c14d7..db5bf7127112 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
 
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    VirtQueue *ivq, *dvq, *svq, *free_page_vq, *rvq;
     uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [virtio-dev] [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
@ 2020-04-10  3:41   ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.

This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 hw/virtio/virtio-balloon.c         |   63 +++++++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-balloon.h |    2 +
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1c6d36a29a04..86d8b48a8e3a 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
     balloon_stats_change_timer(s, 0);
 }
 
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+    VirtQueueElement *elem;
+
+    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
+        unsigned int i;
+
+        for (i = 0; i < elem->in_num; i++) {
+            void *addr = elem->in_sg[i].iov_base;
+            size_t size = elem->in_sg[i].iov_len;
+            ram_addr_t ram_offset;
+            size_t rb_page_size;
+            RAMBlock *rb;
+
+            if (qemu_balloon_is_inhibited() || dev->poison_val) {
+                continue;
+            }
+
+            /*
+             * There is no need to check the memory section to see if
+             * it is ram/readonly/romd like there is for handle_output
+             * below. If the region is not meant to be written to then
+             * address_space_map will have allocated a bounce buffer
+             * and it will be freed in address_space_unmap and trigger
+             * and unassigned_mem_write before failing to copy over the
+             * buffer. If more than one bad descriptor is provided it
+             * will return NULL after the first bounce buffer and fail
+             * to map any resources.
+             */
+            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
+            if (!rb) {
+                trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+                continue;
+            }
+
+            /* For now we will simply ignore unaligned memory regions */
+            rb_page_size = qemu_ram_pagesize(rb);
+            if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
+                continue;
+            }
+
+            ram_block_discard_range(rb, ram_offset, size);
+        }
+
+        virtqueue_push(vq, elem, 0);
+        virtio_notify(vdev, vq);
+        g_free(elem);
+    }
+}
+
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -628,7 +679,8 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
         return sizeof(struct virtio_balloon_config);
     }
     if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
-        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+        virtio_has_feature(features, VIRTIO_BALLOON_F_REPORTING)) {
         return sizeof(struct virtio_balloon_config);
     }
     return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
@@ -717,7 +769,8 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
     VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
     f |= dev->host_features;
     virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
-    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+        virtio_has_feature(f, VIRTIO_BALLOON_F_REPORTING)) {
         virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
     }
 
@@ -807,6 +860,10 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
     s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
     s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
 
+    if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+        s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+    }
+
     if (virtio_has_feature(s->host_features,
                            VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
         s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -940,6 +997,8 @@ static Property virtio_balloon_properties[] = {
      */
     DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
                      qemu_4_0_config_size, false),
+    DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_REPORTING, true),
     DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
                      IOThread *),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 7fe78e5c14d7..db5bf7127112 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
 
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
-    VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+    VirtQueue *ivq, *dvq, *svq, *free_page_vq, *rvq;
     uint32_t free_page_report_status;
     uint32_t num_pages;
     uint32_t actual;


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
  2020-04-10  3:41 ` [virtio-dev] " Alexander Duyck
@ 2020-04-10  3:41   ` Alexander Duyck
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

According to the documentation in memory.h a ROM memory region will be
backed by RAM for reads, but is supposed to go through a callback for
writes. Currently we were not checking for the existence of the rom_device
flag when determining if we could perform a direct write or not.

To correct that add a check to memory_region_is_direct so that if the
memory region has the rom_device flag set we will return false for all
checks where is_write is set.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 include/exec/memory.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1614d9a02c0c..e000bd2f97b2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
     if (is_write) {
-        return memory_region_is_ram(mr) &&
-               !mr->readonly && !memory_region_is_ram_device(mr);
+        return memory_region_is_ram(mr) && !mr->readonly &&
+               !mr->rom_device && !memory_region_is_ram_device(mr);
     } else {
         return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
                memory_region_is_romd(mr);



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [virtio-dev] [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
@ 2020-04-10  3:41   ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-10  3:41 UTC (permalink / raw)
  To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

According to the documentation in memory.h a ROM memory region will be
backed by RAM for reads, but is supposed to go through a callback for
writes. Currently we were not checking for the existence of the rom_device
flag when determining if we could perform a direct write or not.

To correct that add a check to memory_region_is_direct so that if the
memory region has the rom_device flag set we will return false for all
checks where is_write is set.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 include/exec/memory.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1614d9a02c0c..e000bd2f97b2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
     if (is_write) {
-        return memory_region_is_ram(mr) &&
-               !mr->readonly && !memory_region_is_ram_device(mr);
+        return memory_region_is_ram(mr) && !mr->readonly &&
+               !mr->rom_device && !memory_region_is_ram_device(mr);
     } else {
         return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
                memory_region_is_romd(mr);


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
  2020-04-10  3:41   ` [virtio-dev] " Alexander Duyck
@ 2020-04-10 10:50     ` Paolo Bonzini
  -1 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2020-04-10 10:50 UTC (permalink / raw)
  To: Alexander Duyck, david, mst; +Cc: virtio-dev, qemu-devel

On 10/04/20 05:41, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> According to the documentation in memory.h a ROM memory region will be
> backed by RAM for reads, but is supposed to go through a callback for
> writes. Currently we were not checking for the existence of the rom_device
> flag when determining if we could perform a direct write or not.
> 
> To correct that add a check to memory_region_is_direct so that if the
> memory region has the rom_device flag set we will return false for all
> checks where is_write is set.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  include/exec/memory.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 1614d9a02c0c..e000bd2f97b2 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
>  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>  {
>      if (is_write) {
> -        return memory_region_is_ram(mr) &&
> -               !mr->readonly && !memory_region_is_ram_device(mr);
> +        return memory_region_is_ram(mr) && !mr->readonly &&
> +               !mr->rom_device && !memory_region_is_ram_device(mr);
>      } else {
>          return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
>                 memory_region_is_romd(mr);
> 

Good catch.  I queued this up for 5.0.

Thanks,

Paolo



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
@ 2020-04-10 10:50     ` Paolo Bonzini
  0 siblings, 0 replies; 45+ messages in thread
From: Paolo Bonzini @ 2020-04-10 10:50 UTC (permalink / raw)
  To: Alexander Duyck, david, mst; +Cc: virtio-dev, qemu-devel

On 10/04/20 05:41, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> According to the documentation in memory.h a ROM memory region will be
> backed by RAM for reads, but is supposed to go through a callback for
> writes. Currently we were not checking for the existence of the rom_device
> flag when determining if we could perform a direct write or not.
> 
> To correct that add a check to memory_region_is_direct so that if the
> memory region has the rom_device flag set we will return false for all
> checks where is_write is set.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  include/exec/memory.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 1614d9a02c0c..e000bd2f97b2 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
>  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>  {
>      if (is_write) {
> -        return memory_region_is_ram(mr) &&
> -               !mr->readonly && !memory_region_is_ram_device(mr);
> +        return memory_region_is_ram(mr) && !mr->readonly &&
> +               !mr->rom_device && !memory_region_is_ram_device(mr);
>      } else {
>          return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
>                 memory_region_is_romd(mr);
> 

Good catch.  I queued this up for 5.0.

Thanks,

Paolo


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
  2020-04-10 10:50     ` [virtio-dev] " Paolo Bonzini
@ 2020-04-13 22:48       ` Alexander Duyck
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-13 22:48 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin, David Hildenbrand
  Cc: virtio-dev, qemu-devel

On Fri, Apr 10, 2020 at 3:50 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/04/20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > According to the documentation in memory.h a ROM memory region will be
> > backed by RAM for reads, but is supposed to go through a callback for
> > writes. Currently we were not checking for the existence of the rom_device
> > flag when determining if we could perform a direct write or not.
> >
> > To correct that add a check to memory_region_is_direct so that if the
> > memory region has the rom_device flag set we will return false for all
> > checks where is_write is set.
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >  include/exec/memory.h |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 1614d9a02c0c..e000bd2f97b2 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
> >  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
> >  {
> >      if (is_write) {
> > -        return memory_region_is_ram(mr) &&
> > -               !mr->readonly && !memory_region_is_ram_device(mr);
> > +        return memory_region_is_ram(mr) && !mr->readonly &&
> > +               !mr->rom_device && !memory_region_is_ram_device(mr);
> >      } else {
> >          return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
> >                 memory_region_is_romd(mr);
> >
>
> Good catch.  I queued this up for 5.0.
>
> Thanks,
>
> Paolo

Thanks Paolo,

It looks like you only pulled this patch correct?

If so, David & Michael, do I need to resubmit the first 3 in this
series or can those be pulled separately?

Thanks.

Alex


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
@ 2020-04-13 22:48       ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-13 22:48 UTC (permalink / raw)
  To: Paolo Bonzini, Michael S. Tsirkin, David Hildenbrand
  Cc: virtio-dev, qemu-devel

On Fri, Apr 10, 2020 at 3:50 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/04/20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > According to the documentation in memory.h a ROM memory region will be
> > backed by RAM for reads, but is supposed to go through a callback for
> > writes. Currently we were not checking for the existence of the rom_device
> > flag when determining if we could perform a direct write or not.
> >
> > To correct that add a check to memory_region_is_direct so that if the
> > memory region has the rom_device flag set we will return false for all
> > checks where is_write is set.
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >  include/exec/memory.h |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 1614d9a02c0c..e000bd2f97b2 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
> >  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
> >  {
> >      if (is_write) {
> > -        return memory_region_is_ram(mr) &&
> > -               !mr->readonly && !memory_region_is_ram_device(mr);
> > +        return memory_region_is_ram(mr) && !mr->readonly &&
> > +               !mr->rom_device && !memory_region_is_ram_device(mr);
> >      } else {
> >          return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
> >                 memory_region_is_romd(mr);
> >
>
> Good catch.  I queued this up for 5.0.
>
> Thanks,
>
> Paolo

Thanks Paolo,

It looks like you only pulled this patch correct?

If so, David & Michael, do I need to resubmit the first 3 in this
series or can those be pulled separately?

Thanks.

Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
  2020-04-13 22:48       ` [virtio-dev] " Alexander Duyck
@ 2020-04-14  7:36         ` David Hildenbrand
  -1 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-14  7:36 UTC (permalink / raw)
  To: Alexander Duyck, Paolo Bonzini, Michael S. Tsirkin; +Cc: virtio-dev, qemu-devel

On 14.04.20 00:48, Alexander Duyck wrote:
> On Fri, Apr 10, 2020 at 3:50 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 10/04/20 05:41, Alexander Duyck wrote:
>>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>>
>>> According to the documentation in memory.h a ROM memory region will be
>>> backed by RAM for reads, but is supposed to go through a callback for
>>> writes. Currently we were not checking for the existence of the rom_device
>>> flag when determining if we could perform a direct write or not.
>>>
>>> To correct that add a check to memory_region_is_direct so that if the
>>> memory region has the rom_device flag set we will return false for all
>>> checks where is_write is set.
>>>
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>> ---
>>>  include/exec/memory.h |    4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 1614d9a02c0c..e000bd2f97b2 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
>>>  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>>>  {
>>>      if (is_write) {
>>> -        return memory_region_is_ram(mr) &&
>>> -               !mr->readonly && !memory_region_is_ram_device(mr);
>>> +        return memory_region_is_ram(mr) && !mr->readonly &&
>>> +               !mr->rom_device && !memory_region_is_ram_device(mr);
>>>      } else {
>>>          return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
>>>                 memory_region_is_romd(mr);
>>>
>>
>> Good catch.  I queued this up for 5.0.
>>
>> Thanks,
>>
>> Paolo
> 
> Thanks Paolo,
> 
> It looks like you only pulled this patch correct?
> 
> If so, David & Michael, do I need to resubmit the first 3 in this
> series or can those be pulled separately?

QEMU is currently in hard freeze. I'll have a final look over the
patches. If nothing jumps at me (and nothing changed upstream in the
meantime), Michael will queue them without a resend.

Thanks!


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
@ 2020-04-14  7:36         ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-14  7:36 UTC (permalink / raw)
  To: Alexander Duyck, Paolo Bonzini, Michael S. Tsirkin; +Cc: virtio-dev, qemu-devel

On 14.04.20 00:48, Alexander Duyck wrote:
> On Fri, Apr 10, 2020 at 3:50 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 10/04/20 05:41, Alexander Duyck wrote:
>>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>>
>>> According to the documentation in memory.h a ROM memory region will be
>>> backed by RAM for reads, but is supposed to go through a callback for
>>> writes. Currently we were not checking for the existence of the rom_device
>>> flag when determining if we could perform a direct write or not.
>>>
>>> To correct that add a check to memory_region_is_direct so that if the
>>> memory region has the rom_device flag set we will return false for all
>>> checks where is_write is set.
>>>
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>> ---
>>>  include/exec/memory.h |    4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 1614d9a02c0c..e000bd2f97b2 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
>>>  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>>>  {
>>>      if (is_write) {
>>> -        return memory_region_is_ram(mr) &&
>>> -               !mr->readonly && !memory_region_is_ram_device(mr);
>>> +        return memory_region_is_ram(mr) && !mr->readonly &&
>>> +               !mr->rom_device && !memory_region_is_ram_device(mr);
>>>      } else {
>>>          return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
>>>                 memory_region_is_romd(mr);
>>>
>>
>> Good catch.  I queued this up for 5.0.
>>
>> Thanks,
>>
>> Paolo
> 
> Thanks Paolo,
> 
> It looks like you only pulled this patch correct?
> 
> If so, David & Michael, do I need to resubmit the first 3 in this
> series or can those be pulled separately?

QEMU is currently in hard freeze. I'll have a final look over the
patches. If nothing jumps at me (and nothing changed upstream in the
meantime), Michael will queue them without a resend.

Thanks!


-- 
Thanks,

David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-10  3:41   ` [virtio-dev] " Alexander Duyck
@ 2020-04-15  8:08     ` David Hildenbrand
  -1 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-15  8:08 UTC (permalink / raw)
  To: Alexander Duyck, pbonzini, mst; +Cc: virtio-dev, qemu-devel

On 10.04.20 05:41, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> We need to make certain to advertise support for page poison tracking if
> we want to actually get data on if the guest will be poisoning pages. So
> if free page hinting is active we should add page poisoning support and
> let the guest disable it if it isn't using it.
> 
> Page poisoning will result in a page being dirtied on free. As such we
> cannot really avoid having to copy the page at least one more time since
> we will need to write the poison value to the destination. As such we can
> just ignore free page hinting if page poisoning is enabled as it will
> actually reduce the work we have to do.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  hw/virtio/virtio-balloon.c         |   26 ++++++++++++++++++++++----
>  include/hw/virtio/virtio-balloon.h |    1 +
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index a4729f7fc930..1c6d36a29a04 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
>          return;
>      }
>  
> +    /*
> +     * If page poisoning is enabled then we probably shouldn't bother with
> +     * the hinting since the poisoning will dirty the page and invalidate
> +     * the work we are doing anyway.
> +     */
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {

Why not check for the poison value instead? (as you do in patch #3) ?

> +        return;
> +    }
> +
>      if (s->free_page_report_cmd_id == UINT_MAX) {
>          s->free_page_report_cmd_id =
>                         VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;

We should rename all "free_page_report" stuff we can to
"free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
side :) ) before adding free page reporting .

(looking at the virtio-balloon linux header, it's also confusing but
we're stuck with that - maybe we should add better comments)


> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>      if (s->qemu_4_0_config_size) {
>          return sizeof(struct virtio_balloon_config);
>      }
> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>          return sizeof(struct virtio_balloon_config);
>      }
> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> -        return offsetof(struct virtio_balloon_config, poison_val);
> -    }

I am not sure this change is completely sane. Why is that necessary at all?

>      return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
>  }
>  
> @@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    config.poison_val = cpu_to_le32(dev->poison_val);
>  
>      if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
>          config.free_page_report_cmd_id =
> @@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>          qapi_event_send_balloon_change(vm_ram_size -
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
>      }
> +    dev->poison_val = virtio_vdev_has_feature(vdev,
> +                                              VIRTIO_BALLOON_F_PAGE_POISON) ?
> +                      le32_to_cpu(config.poison_val) : 0;

Can we just do a


dev->poison_val = 0;
if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
	dev->poison_val = le32_to_cpu(config.poison_val);
}

instead?


>      trace_virtio_balloon_set_config(dev->actual, oldactual);
>  }
>  
> @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> +    }
>  
>      return f;
>  }
> @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
>          g_free(s->stats_vq_elem);
>          s->stats_vq_elem = NULL;
>      }
> +
> +    s->poison_val = 0;
>  }
>  
>  static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>      DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> +    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_PAGE_POISON, false),

Just curious, why an x- feature?


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-15  8:08     ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-15  8:08 UTC (permalink / raw)
  To: Alexander Duyck, pbonzini, mst; +Cc: virtio-dev, qemu-devel

On 10.04.20 05:41, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> We need to make certain to advertise support for page poison tracking if
> we want to actually get data on if the guest will be poisoning pages. So
> if free page hinting is active we should add page poisoning support and
> let the guest disable it if it isn't using it.
> 
> Page poisoning will result in a page being dirtied on free. As such we
> cannot really avoid having to copy the page at least one more time since
> we will need to write the poison value to the destination. As such we can
> just ignore free page hinting if page poisoning is enabled as it will
> actually reduce the work we have to do.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  hw/virtio/virtio-balloon.c         |   26 ++++++++++++++++++++++----
>  include/hw/virtio/virtio-balloon.h |    1 +
>  2 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index a4729f7fc930..1c6d36a29a04 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
>          return;
>      }
>  
> +    /*
> +     * If page poisoning is enabled then we probably shouldn't bother with
> +     * the hinting since the poisoning will dirty the page and invalidate
> +     * the work we are doing anyway.
> +     */
> +    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {

Why not check for the poison value instead? (as you do in patch #3) ?

> +        return;
> +    }
> +
>      if (s->free_page_report_cmd_id == UINT_MAX) {
>          s->free_page_report_cmd_id =
>                         VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;

We should rename all "free_page_report" stuff we can to
"free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
side :) ) before adding free page reporting .

(looking at the virtio-balloon linux header, it's also confusing but
we're stuck with that - maybe we should add better comments)


> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>      if (s->qemu_4_0_config_size) {
>          return sizeof(struct virtio_balloon_config);
>      }
> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>          return sizeof(struct virtio_balloon_config);
>      }
> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> -        return offsetof(struct virtio_balloon_config, poison_val);
> -    }

I am not sure this change is completely sane. Why is that necessary at all?

>      return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
>  }
>  
> @@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>  
>      config.num_pages = cpu_to_le32(dev->num_pages);
>      config.actual = cpu_to_le32(dev->actual);
> +    config.poison_val = cpu_to_le32(dev->poison_val);
>  
>      if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
>          config.free_page_report_cmd_id =
> @@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
>          qapi_event_send_balloon_change(vm_ram_size -
>                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
>      }
> +    dev->poison_val = virtio_vdev_has_feature(vdev,
> +                                              VIRTIO_BALLOON_F_PAGE_POISON) ?
> +                      le32_to_cpu(config.poison_val) : 0;

Can we just do a


dev->poison_val = 0;
if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
	dev->poison_val = le32_to_cpu(config.poison_val);
}

instead?


>      trace_virtio_balloon_set_config(dev->actual, oldactual);
>  }
>  
> @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>      f |= dev->host_features;
>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> +    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> +    }
>  
>      return f;
>  }
> @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
>          g_free(s->stats_vq_elem);
>          s->stats_vq_elem = NULL;
>      }
> +
> +    s->poison_val = 0;
>  }
>  
>  static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>      DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
>                      VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> +    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
> +                    VIRTIO_BALLOON_F_PAGE_POISON, false),

Just curious, why an x- feature?


-- 
Thanks,

David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
  2020-04-10  3:41   ` [virtio-dev] " Alexander Duyck
@ 2020-04-15  8:17     ` David Hildenbrand
  -1 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-15  8:17 UTC (permalink / raw)
  To: Alexander Duyck, pbonzini, mst; +Cc: virtio-dev, qemu-devel

On 10.04.20 05:41, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> Add support for free page reporting. The idea is to function very similar
> to how the balloon works in that we basically end up madvising the page as
> not being used. However we don't really need to bother with any deflate
> type logic since the page will be faulted back into the guest when it is
> read or written to.
> 
> This provides a new way of letting the guest proactively report free
> pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> inflate/deflate that is triggered via the hypervisor explicitly.

Much better, thanks!

> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  hw/virtio/virtio-balloon.c         |   63 +++++++++++++++++++++++++++++++++++-
>  include/hw/virtio/virtio-balloon.h |    2 +
>  2 files changed, 62 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 1c6d36a29a04..86d8b48a8e3a 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
>      balloon_stats_change_timer(s, 0);
>  }
>  
> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> +    VirtQueueElement *elem;
> +
> +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
> +        unsigned int i;
> +
> +        for (i = 0; i < elem->in_num; i++) {
> +            void *addr = elem->in_sg[i].iov_base;
> +            size_t size = elem->in_sg[i].iov_len;
> +            ram_addr_t ram_offset;
> +            size_t rb_page_size;
> +            RAMBlock *rb;
> +
> +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
> +                continue;

actually, you want to do that in the outer loop, no?

> +            }
> +
> +            /*
> +             * There is no need to check the memory section to see if
> +             * it is ram/readonly/romd like there is for handle_output
> +             * below. If the region is not meant to be written to then
> +             * address_space_map will have allocated a bounce buffer
> +             * and it will be freed in address_space_unmap and trigger
> +             * and unassigned_mem_write before failing to copy over the
> +             * buffer. If more than one bad descriptor is provided it
> +             * will return NULL after the first bounce buffer and fail
> +             * to map any resources.
> +             */
> +            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> +            if (!rb) {
> +                trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> +                continue;
> +            }
> +
> +            /* For now we will simply ignore unaligned memory regions */
> +            rb_page_size = qemu_ram_pagesize(rb);
> +            if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {

/me thinks you can drop rb_page_size

I *think* there is still one remaining case to handle: Crossing RAM blocks.

Most probably you should check

/* For now, ignore crossing RAM blocks. */
if (ram_offset + size >= qemu_ram_get_used_length()) {
	continue;
}

otherwise ram_block_discard_range() will report an error.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
@ 2020-04-15  8:17     ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-15  8:17 UTC (permalink / raw)
  To: Alexander Duyck, pbonzini, mst; +Cc: virtio-dev, qemu-devel

On 10.04.20 05:41, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> Add support for free page reporting. The idea is to function very similar
> to how the balloon works in that we basically end up madvising the page as
> not being used. However we don't really need to bother with any deflate
> type logic since the page will be faulted back into the guest when it is
> read or written to.
> 
> This provides a new way of letting the guest proactively report free
> pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> inflate/deflate that is triggered via the hypervisor explicitly.

Much better, thanks!

> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  hw/virtio/virtio-balloon.c         |   63 +++++++++++++++++++++++++++++++++++-
>  include/hw/virtio/virtio-balloon.h |    2 +
>  2 files changed, 62 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 1c6d36a29a04..86d8b48a8e3a 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
>      balloon_stats_change_timer(s, 0);
>  }
>  
> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> +    VirtQueueElement *elem;
> +
> +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
> +        unsigned int i;
> +
> +        for (i = 0; i < elem->in_num; i++) {
> +            void *addr = elem->in_sg[i].iov_base;
> +            size_t size = elem->in_sg[i].iov_len;
> +            ram_addr_t ram_offset;
> +            size_t rb_page_size;
> +            RAMBlock *rb;
> +
> +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
> +                continue;

actually, you want to do that in the outer loop, no?

> +            }
> +
> +            /*
> +             * There is no need to check the memory section to see if
> +             * it is ram/readonly/romd like there is for handle_output
> +             * below. If the region is not meant to be written to then
> +             * address_space_map will have allocated a bounce buffer
> +             * and it will be freed in address_space_unmap and trigger
> +             * and unassigned_mem_write before failing to copy over the
> +             * buffer. If more than one bad descriptor is provided it
> +             * will return NULL after the first bounce buffer and fail
> +             * to map any resources.
> +             */
> +            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> +            if (!rb) {
> +                trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> +                continue;
> +            }
> +
> +            /* For now we will simply ignore unaligned memory regions */
> +            rb_page_size = qemu_ram_pagesize(rb);
> +            if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {

/me thinks you can drop rb_page_size

I *think* there is still one remaining case to handle: Crossing RAM blocks.

Most probably you should check

/* For now, ignore crossing RAM blocks. */
if (ram_offset + size >= qemu_ram_get_used_length()) {
	continue;
}

otherwise ram_block_discard_range() will report an error.

-- 
Thanks,

David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
  2020-04-15  8:17     ` [virtio-dev] " David Hildenbrand
@ 2020-04-15  9:03       ` David Hildenbrand
  -1 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-15  9:03 UTC (permalink / raw)
  To: Alexander Duyck, pbonzini, mst; +Cc: virtio-dev, qemu-devel

On 15.04.20 10:17, David Hildenbrand wrote:
> On 10.04.20 05:41, Alexander Duyck wrote:
>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>
>> Add support for free page reporting. The idea is to function very similar
>> to how the balloon works in that we basically end up madvising the page as
>> not being used. However we don't really need to bother with any deflate
>> type logic since the page will be faulted back into the guest when it is
>> read or written to.
>>
>> This provides a new way of letting the guest proactively report free
>> pages to the hypervisor, so the hypervisor can reuse them. In contrast to
>> inflate/deflate that is triggered via the hypervisor explicitly.
> 
> Much better, thanks!
> 
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> ---
>>  hw/virtio/virtio-balloon.c         |   63 +++++++++++++++++++++++++++++++++++-
>>  include/hw/virtio/virtio-balloon.h |    2 +
>>  2 files changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>> index 1c6d36a29a04..86d8b48a8e3a 100644
>> --- a/hw/virtio/virtio-balloon.c
>> +++ b/hw/virtio/virtio-balloon.c
>> @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
>>      balloon_stats_change_timer(s, 0);
>>  }
>>  
>> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
>> +{
>> +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>> +    VirtQueueElement *elem;
>> +
>> +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
>> +        unsigned int i;
>> +
>> +        for (i = 0; i < elem->in_num; i++) {
>> +            void *addr = elem->in_sg[i].iov_base;
>> +            size_t size = elem->in_sg[i].iov_len;
>> +            ram_addr_t ram_offset;
>> +            size_t rb_page_size;
>> +            RAMBlock *rb;
>> +
>> +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
>> +                continue;
> 
> actually, you want to do that in the outer loop, no?
> 
>> +            }
>> +
>> +            /*
>> +             * There is no need to check the memory section to see if
>> +             * it is ram/readonly/romd like there is for handle_output
>> +             * below. If the region is not meant to be written to then
>> +             * address_space_map will have allocated a bounce buffer
>> +             * and it will be freed in address_space_unmap and trigger
>> +             * and unassigned_mem_write before failing to copy over the
>> +             * buffer. If more than one bad descriptor is provided it
>> +             * will return NULL after the first bounce buffer and fail
>> +             * to map any resources.
>> +             */
>> +            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
>> +            if (!rb) {
>> +                trace_virtio_balloon_bad_addr(elem->in_addr[i]);
>> +                continue;
>> +            }
>> +
>> +            /* For now we will simply ignore unaligned memory regions */
>> +            rb_page_size = qemu_ram_pagesize(rb);
>> +            if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
> 
> /me thinks you can drop rb_page_size
> 
> I *think* there is still one remaining case to handle: Crossing RAM blocks.
> 
> Most probably you should check
> 
> /* For now, ignore crossing RAM blocks. */
> if (ram_offset + size >= qemu_ram_get_used_length()) {
> 	continue;

(should be an > I guess)


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
@ 2020-04-15  9:03       ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-15  9:03 UTC (permalink / raw)
  To: Alexander Duyck, pbonzini, mst; +Cc: virtio-dev, qemu-devel

On 15.04.20 10:17, David Hildenbrand wrote:
> On 10.04.20 05:41, Alexander Duyck wrote:
>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>
>> Add support for free page reporting. The idea is to function very similar
>> to how the balloon works in that we basically end up madvising the page as
>> not being used. However we don't really need to bother with any deflate
>> type logic since the page will be faulted back into the guest when it is
>> read or written to.
>>
>> This provides a new way of letting the guest proactively report free
>> pages to the hypervisor, so the hypervisor can reuse them. In contrast to
>> inflate/deflate that is triggered via the hypervisor explicitly.
> 
> Much better, thanks!
> 
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> ---
>>  hw/virtio/virtio-balloon.c         |   63 +++++++++++++++++++++++++++++++++++-
>>  include/hw/virtio/virtio-balloon.h |    2 +
>>  2 files changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>> index 1c6d36a29a04..86d8b48a8e3a 100644
>> --- a/hw/virtio/virtio-balloon.c
>> +++ b/hw/virtio/virtio-balloon.c
>> @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
>>      balloon_stats_change_timer(s, 0);
>>  }
>>  
>> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
>> +{
>> +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>> +    VirtQueueElement *elem;
>> +
>> +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
>> +        unsigned int i;
>> +
>> +        for (i = 0; i < elem->in_num; i++) {
>> +            void *addr = elem->in_sg[i].iov_base;
>> +            size_t size = elem->in_sg[i].iov_len;
>> +            ram_addr_t ram_offset;
>> +            size_t rb_page_size;
>> +            RAMBlock *rb;
>> +
>> +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
>> +                continue;
> 
> actually, you want to do that in the outer loop, no?
> 
>> +            }
>> +
>> +            /*
>> +             * There is no need to check the memory section to see if
>> +             * it is ram/readonly/romd like there is for handle_output
>> +             * below. If the region is not meant to be written to then
>> +             * address_space_map will have allocated a bounce buffer
>> +             * and it will be freed in address_space_unmap and trigger
>> +             * and unassigned_mem_write before failing to copy over the
>> +             * buffer. If more than one bad descriptor is provided it
>> +             * will return NULL after the first bounce buffer and fail
>> +             * to map any resources.
>> +             */
>> +            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
>> +            if (!rb) {
>> +                trace_virtio_balloon_bad_addr(elem->in_addr[i]);
>> +                continue;
>> +            }
>> +
>> +            /* For now we will simply ignore unaligned memory regions */
>> +            rb_page_size = qemu_ram_pagesize(rb);
>> +            if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
> 
> /me thinks you can drop rb_page_size
> 
> I *think* there is still one remaining case to handle: Crossing RAM blocks.
> 
> Most probably you should check
> 
> /* For now, ignore crossing RAM blocks. */
> if (ram_offset + size >= qemu_ram_get_used_length()) {
> 	continue;

(should be an > I guess)


-- 
Thanks,

David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
  2020-04-15  8:17     ` [virtio-dev] " David Hildenbrand
@ 2020-04-15 15:31       ` Alexander Duyck
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-15 15:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Paolo Bonzini, virtio-dev, Michael S. Tsirkin

On Wed, Apr 15, 2020 at 1:17 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.04.20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > Add support for free page reporting. The idea is to function very similar
> > to how the balloon works in that we basically end up madvising the page as
> > not being used. However we don't really need to bother with any deflate
> > type logic since the page will be faulted back into the guest when it is
> > read or written to.
> >
> > This provides a new way of letting the guest proactively report free
> > pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> > inflate/deflate that is triggered via the hypervisor explicitly.
>
> Much better, thanks!
>
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >  hw/virtio/virtio-balloon.c         |   63 +++++++++++++++++++++++++++++++++++-
> >  include/hw/virtio/virtio-balloon.h |    2 +
> >  2 files changed, 62 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 1c6d36a29a04..86d8b48a8e3a 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
> >      balloon_stats_change_timer(s, 0);
> >  }
> >
> > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> > +    VirtQueueElement *elem;
> > +
> > +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
> > +        unsigned int i;
> > +
> > +        for (i = 0; i < elem->in_num; i++) {
> > +            void *addr = elem->in_sg[i].iov_base;
> > +            size_t size = elem->in_sg[i].iov_len;
> > +            ram_addr_t ram_offset;
> > +            size_t rb_page_size;
> > +            RAMBlock *rb;
> > +
> > +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
> > +                continue;
>
> actually, you want to do that in the outer loop, no?

I'll move that. Odds are compiler was doing that anyway.

> > +            }
> > +
> > +            /*
> > +             * There is no need to check the memory section to see if
> > +             * it is ram/readonly/romd like there is for handle_output
> > +             * below. If the region is not meant to be written to then
> > +             * address_space_map will have allocated a bounce buffer
> > +             * and it will be freed in address_space_unmap and trigger
> > +             * and unassigned_mem_write before failing to copy over the
> > +             * buffer. If more than one bad descriptor is provided it
> > +             * will return NULL after the first bounce buffer and fail
> > +             * to map any resources.
> > +             */
> > +            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> > +            if (!rb) {
> > +                trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> > +                continue;
> > +            }
> > +
> > +            /* For now we will simply ignore unaligned memory regions */
> > +            rb_page_size = qemu_ram_pagesize(rb);
> > +            if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
>
> /me thinks you can drop rb_page_size

You mean just fold it into the statement? I guess I can do that.

> I *think* there is still one remaining case to handle: Crossing RAM blocks.
>
> Most probably you should check
>
> /* For now, ignore crossing RAM blocks. */
> if (ram_offset + size >= qemu_ram_get_used_length()) {
>         continue;
> }
>
> otherwise ram_block_discard_range() will report an error.

Makes sense. I can add that into the QEMU_IS_ALIGNED check.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
@ 2020-04-15 15:31       ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-15 15:31 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel

On Wed, Apr 15, 2020 at 1:17 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.04.20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > Add support for free page reporting. The idea is to function very similar
> > to how the balloon works in that we basically end up madvising the page as
> > not being used. However we don't really need to bother with any deflate
> > type logic since the page will be faulted back into the guest when it is
> > read or written to.
> >
> > This provides a new way of letting the guest proactively report free
> > pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> > inflate/deflate that is triggered via the hypervisor explicitly.
>
> Much better, thanks!
>
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >  hw/virtio/virtio-balloon.c         |   63 +++++++++++++++++++++++++++++++++++-
> >  include/hw/virtio/virtio-balloon.h |    2 +
> >  2 files changed, 62 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 1c6d36a29a04..86d8b48a8e3a 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
> >      balloon_stats_change_timer(s, 0);
> >  }
> >
> > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> > +    VirtQueueElement *elem;
> > +
> > +    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
> > +        unsigned int i;
> > +
> > +        for (i = 0; i < elem->in_num; i++) {
> > +            void *addr = elem->in_sg[i].iov_base;
> > +            size_t size = elem->in_sg[i].iov_len;
> > +            ram_addr_t ram_offset;
> > +            size_t rb_page_size;
> > +            RAMBlock *rb;
> > +
> > +            if (qemu_balloon_is_inhibited() || dev->poison_val) {
> > +                continue;
>
> actually, you want to do that in the outer loop, no?

I'll move that. Odds are compiler was doing that anyway.

> > +            }
> > +
> > +            /*
> > +             * There is no need to check the memory section to see if
> > +             * it is ram/readonly/romd like there is for handle_output
> > +             * below. If the region is not meant to be written to then
> > +             * address_space_map will have allocated a bounce buffer
> > +             * and it will be freed in address_space_unmap and trigger
> > +             * and unassigned_mem_write before failing to copy over the
> > +             * buffer. If more than one bad descriptor is provided it
> > +             * will return NULL after the first bounce buffer and fail
> > +             * to map any resources.
> > +             */
> > +            rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> > +            if (!rb) {
> > +                trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> > +                continue;
> > +            }
> > +
> > +            /* For now we will simply ignore unaligned memory regions */
> > +            rb_page_size = qemu_ram_pagesize(rb);
> > +            if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
>
> /me thinks you can drop rb_page_size

You mean just fold it into the statement? I guess I can do that.

> I *think* there is still one remaining case to handle: Crossing RAM blocks.
>
> Most probably you should check
>
> /* For now, ignore crossing RAM blocks. */
> if (ram_offset + size >= qemu_ram_get_used_length()) {
>         continue;
> }
>
> otherwise ram_block_discard_range() will report an error.

Makes sense. I can add that into the QEMU_IS_ALIGNED check.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-15  8:08     ` [virtio-dev] " David Hildenbrand
@ 2020-04-15 17:17       ` Alexander Duyck
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-15 17:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, Paolo Bonzini, virtio-dev, Michael S. Tsirkin

On Wed, Apr 15, 2020 at 1:08 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.04.20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > We need to make certain to advertise support for page poison tracking if
> > we want to actually get data on if the guest will be poisoning pages. So
> > if free page hinting is active we should add page poisoning support and
> > let the guest disable it if it isn't using it.
> >
> > Page poisoning will result in a page being dirtied on free. As such we
> > cannot really avoid having to copy the page at least one more time since
> > we will need to write the poison value to the destination. As such we can
> > just ignore free page hinting if page poisoning is enabled as it will
> > actually reduce the work we have to do.
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >  hw/virtio/virtio-balloon.c         |   26 ++++++++++++++++++++++----
> >  include/hw/virtio/virtio-balloon.h |    1 +
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index a4729f7fc930..1c6d36a29a04 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
> >          return;
> >      }
> >
> > +    /*
> > +     * If page poisoning is enabled then we probably shouldn't bother with
> > +     * the hinting since the poisoning will dirty the page and invalidate
> > +     * the work we are doing anyway.
> > +     */
> > +    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
>
> Why not check for the poison value instead? (as you do in patch #3) ?

So if I recall correctly the vdev has feature requires the host to
indicate that the feature is in use. If page poisoning is not enabled
on the host then it clears the flag on its end and we can proceed with
the feature.

The comment above explains the "why". Basically poisoning a page will
dirty it. So why hint a page as free when that will drop it back into
the guest and result in it being dirtied again. What you end up with
is all the pages that were temporarily placed in the balloon are dirty
after the hinting report is finished at which point you made things
worse instead of helping to improve them.

>
> > +        return;
> > +    }
> > +
> >      if (s->free_page_report_cmd_id == UINT_MAX) {
> >          s->free_page_report_cmd_id =
> >                         VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>
> We should rename all "free_page_report" stuff we can to
> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> side :) ) before adding free page reporting .
>
> (looking at the virtio-balloon linux header, it's also confusing but
> we're stuck with that - maybe we should add better comments)

Are we stuck? Couldn't we just convert it to an anonymous union with
free_page_hint_cmd_id and then use that where needed?

> > @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> >      if (s->qemu_4_0_config_size) {
> >          return sizeof(struct virtio_balloon_config);
> >      }
> > -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> > +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> > +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >          return sizeof(struct virtio_balloon_config);
> >      }
> > -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > -        return offsetof(struct virtio_balloon_config, poison_val);
> > -    }
>
> I am not sure this change is completely sane. Why is that necessary at all?

The poison_val is stored at the end of the structure and is required
in order to make free page hinting to work. What this change is doing
is forcing it so that we report the config size as the full size if
either poisoning or hinting are enabled since the poison val is the
last member of the config structure.

If the question is why bother reducing the size if free page hinting
is not present then I guess we could simplify this and just report
report the size of the config for all cases.

> >      return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
> >  }
> >
> > @@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
> >
> >      config.num_pages = cpu_to_le32(dev->num_pages);
> >      config.actual = cpu_to_le32(dev->actual);
> > +    config.poison_val = cpu_to_le32(dev->poison_val);
> >
> >      if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
> >          config.free_page_report_cmd_id =
> > @@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
> >          qapi_event_send_balloon_change(vm_ram_size -
> >                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> >      }
> > +    dev->poison_val = virtio_vdev_has_feature(vdev,
> > +                                              VIRTIO_BALLOON_F_PAGE_POISON) ?
> > +                      le32_to_cpu(config.poison_val) : 0;
>
> Can we just do a
>
>
> dev->poison_val = 0;
> if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
>         dev->poison_val = le32_to_cpu(config.poison_val);
> }
>
> instead?

I can change it to that if that is what is preferred.

> >      trace_virtio_balloon_set_config(dev->actual, oldactual);
> >  }
> >
> > @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
> >      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> >      f |= dev->host_features;
> >      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> > +    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> > +    }
> >
> >      return f;
> >  }
> > @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
> >          g_free(s->stats_vq_elem);
> >          s->stats_vq_elem = NULL;
> >      }
> > +
> > +    s->poison_val = 0;
> >  }
> >
> >  static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> > @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
> >                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> >      DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> >                      VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> > +    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
> > +                    VIRTIO_BALLOON_F_PAGE_POISON, false),
>
> Just curious, why an x- feature?

It was something I didn't expect the users to enable. It gets enabled
when either free page hinting or free page reporting is enabled. So if
you look you will see that in virtio_balloon_get_features the page
poison feature is added if free page hinting is present. The guest
will clear the feature bit if poisoning is not enabled in the guest.
That results in the vdev getting the bit cleared.

Part of it was also about making this work with the existing feature
code as it had been added to the upstream kernel.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-15 17:17       ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-15 17:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel

On Wed, Apr 15, 2020 at 1:08 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.04.20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > We need to make certain to advertise support for page poison tracking if
> > we want to actually get data on if the guest will be poisoning pages. So
> > if free page hinting is active we should add page poisoning support and
> > let the guest disable it if it isn't using it.
> >
> > Page poisoning will result in a page being dirtied on free. As such we
> > cannot really avoid having to copy the page at least one more time since
> > we will need to write the poison value to the destination. As such we can
> > just ignore free page hinting if page poisoning is enabled as it will
> > actually reduce the work we have to do.
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> >  hw/virtio/virtio-balloon.c         |   26 ++++++++++++++++++++++----
> >  include/hw/virtio/virtio-balloon.h |    1 +
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index a4729f7fc930..1c6d36a29a04 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
> >          return;
> >      }
> >
> > +    /*
> > +     * If page poisoning is enabled then we probably shouldn't bother with
> > +     * the hinting since the poisoning will dirty the page and invalidate
> > +     * the work we are doing anyway.
> > +     */
> > +    if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
>
> Why not check for the poison value instead? (as you do in patch #3) ?

So if I recall correctly the vdev has feature requires the host to
indicate that the feature is in use. If page poisoning is not enabled
on the host then it clears the flag on its end and we can proceed with
the feature.

The comment above explains the "why". Basically poisoning a page will
dirty it. So why hint a page as free when that will drop it back into
the guest and result in it being dirtied again. What you end up with
is all the pages that were temporarily placed in the balloon are dirty
after the hinting report is finished at which point you made things
worse instead of helping to improve them.

>
> > +        return;
> > +    }
> > +
> >      if (s->free_page_report_cmd_id == UINT_MAX) {
> >          s->free_page_report_cmd_id =
> >                         VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>
> We should rename all "free_page_report" stuff we can to
> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> side :) ) before adding free page reporting .
>
> (looking at the virtio-balloon linux header, it's also confusing but
> we're stuck with that - maybe we should add better comments)

Are we stuck? Couldn't we just convert it to an anonymous union with
free_page_hint_cmd_id and then use that where needed?

> > @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> >      if (s->qemu_4_0_config_size) {
> >          return sizeof(struct virtio_balloon_config);
> >      }
> > -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> > +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> > +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >          return sizeof(struct virtio_balloon_config);
> >      }
> > -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > -        return offsetof(struct virtio_balloon_config, poison_val);
> > -    }
>
> I am not sure this change is completely sane. Why is that necessary at all?

The poison_val is stored at the end of the structure and is required
in order to make free page hinting to work. What this change is doing
is forcing it so that we report the config size as the full size if
either poisoning or hinting are enabled since the poison val is the
last member of the config structure.

If the question is why bother reducing the size if free page hinting
is not present then I guess we could simplify this and just report
report the size of the config for all cases.

> >      return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
> >  }
> >
> > @@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
> >
> >      config.num_pages = cpu_to_le32(dev->num_pages);
> >      config.actual = cpu_to_le32(dev->actual);
> > +    config.poison_val = cpu_to_le32(dev->poison_val);
> >
> >      if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
> >          config.free_page_report_cmd_id =
> > @@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
> >          qapi_event_send_balloon_change(vm_ram_size -
> >                          ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> >      }
> > +    dev->poison_val = virtio_vdev_has_feature(vdev,
> > +                                              VIRTIO_BALLOON_F_PAGE_POISON) ?
> > +                      le32_to_cpu(config.poison_val) : 0;
>
> Can we just do a
>
>
> dev->poison_val = 0;
> if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
>         dev->poison_val = le32_to_cpu(config.poison_val);
> }
>
> instead?

I can change it to that if that is what is preferred.

> >      trace_virtio_balloon_set_config(dev->actual, oldactual);
> >  }
> >
> > @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
> >      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> >      f |= dev->host_features;
> >      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> > +    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> > +    }
> >
> >      return f;
> >  }
> > @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
> >          g_free(s->stats_vq_elem);
> >          s->stats_vq_elem = NULL;
> >      }
> > +
> > +    s->poison_val = 0;
> >  }
> >
> >  static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> > @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
> >                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> >      DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> >                      VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> > +    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
> > +                    VIRTIO_BALLOON_F_PAGE_POISON, false),
>
> Just curious, why an x- feature?

It was something I didn't expect the users to enable. It gets enabled
when either free page hinting or free page reporting is enabled. So if
you look you will see that in virtio_balloon_get_features the page
poison feature is added if free page hinting is present. The guest
will clear the feature bit if poisoning is not enabled in the guest.
That results in the vdev getting the bit cleared.

Part of it was also about making this work with the existing feature
code as it had been added to the upstream kernel.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-15 17:17       ` [virtio-dev] " Alexander Duyck
@ 2020-04-15 18:16         ` David Hildenbrand
  -1 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-15 18:16 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: qemu-devel, Paolo Bonzini, virtio-dev, Michael S. Tsirkin

> 
> The comment above explains the "why". Basically poisoning a page will
> dirty it. So why hint a page as free when that will drop it back into
> the guest and result in it being dirtied again. What you end up with
> is all the pages that were temporarily placed in the balloon are dirty
> after the hinting report is finished at which point you made things
> worse instead of helping to improve them.

Let me think this through. What happens on free page hinting is that

a) we tell the guest that migration starts
b) it allocates pages (alloc_pages()), sends them to us and adds them to
   a list
b) we exclude these pages on migration
c) we tell the guest that migration is over
d) the guest frees all allocated pages

The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
will fill all pages again with a pattern (or zero).

AFAIKs, it's either

1) Not performing free page hinting, migrating pages filled with a
poison value (or zero).
2) Performing free page hinting, not migrating pages filled with a
poison value (or zero), letting only the guest fill them again.

I have the feeling, that 2) is still better for migration, because we
don't migrate useless pages and let the guest reconstruct the content?
(having a poison value of zero might be debatable)

Can you tell me what I am missing? :)

> 
>>
>>> +        return;
>>> +    }
>>> +
>>>      if (s->free_page_report_cmd_id == UINT_MAX) {
>>>          s->free_page_report_cmd_id =
>>>                         VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>>
>> We should rename all "free_page_report" stuff we can to
>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
>> side :) ) before adding free page reporting .
>>
>> (looking at the virtio-balloon linux header, it's also confusing but
>> we're stuck with that - maybe we should add better comments)
> 
> Are we stuck? Couldn't we just convert it to an anonymous union with
> free_page_hint_cmd_id and then use that where needed?

Saw your patch already :)

> 
>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>>>      if (s->qemu_4_0_config_size) {
>>>          return sizeof(struct virtio_balloon_config);
>>>      }
>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
>>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
>>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>          return sizeof(struct virtio_balloon_config);
>>>      }
>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>> -        return offsetof(struct virtio_balloon_config, poison_val);
>>> -    }
>>
>> I am not sure this change is completely sane. Why is that necessary at all?
> 
> The poison_val is stored at the end of the structure and is required
> in order to make free page hinting to work. What this change is doing

Required to make it work? In the kernel,

commit 2e991629bcf55a43681aec1ee096eeb03cf81709
Author: Wei Wang <wei.w.wang@intel.com>
Date:   Mon Aug 27 09:32:19 2018 +0800

    virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

was merged after

commit 86a559787e6f5cf662c081363f64a20cad654195
Author: Wei Wang <wei.w.wang@intel.com>
Date:   Mon Aug 27 09:32:17 2018 +0800

    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

So I assume it's perfectly fine to not have it.

I'd say it's the responsibility of the guest to *not* use
VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
into the foot.

> is forcing it so that we report the config size as the full size if
> either poisoning or hinting are enabled since the poison val is the
> last member of the config structure.
> 
> If the question is why bother reducing the size if free page hinting
> is not present then I guess we could simplify this and just report
> report the size of the config for all cases.

I guess the idea is that if you migrate from one QEMU to another, your
config size will not suddenly change (fenced by a feature set that will
not change during migration, observable by a running guest).

My guess would be that we cannot simply change existing configurations
(defined via feature bits) as you do here - see e.g., qemu-4-0-config-size.

[...]

>>>
>>> @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>>>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>>>      f |= dev->host_features;
>>>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
>>> +    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
>>> +    }
>>>
>>>      return f;
>>>  }
>>> @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
>>>          g_free(s->stats_vq_elem);
>>>          s->stats_vq_elem = NULL;
>>>      }
>>> +
>>> +    s->poison_val = 0;
>>>  }
>>>
>>>  static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>>> @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
>>>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>>>      DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
>>>                      VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>>> +    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
>>> +                    VIRTIO_BALLOON_F_PAGE_POISON, false),
>>
>> Just curious, why an x- feature?
> 
> It was something I didn't expect the users to enable. It gets enabled
> when either free page hinting or free page reporting is enabled. So if
> you look you will see that in virtio_balloon_get_features the page
> poison feature is added if free page hinting is present. The guest
> will clear the feature bit if poisoning is not enabled in the guest.
> That results in the vdev getting the bit cleared.

The weird thing is that setting it to "false" will still enable it
automatically, depending on other features. Not sure how helpful that is.

I'd prefer to simply always enable it, similar to
VIRTIO_BALLOON_F_STATS_VQ - but it's late and I am confused how
migration with compat machines will work. :)

So, I wonder if we need this feature bit here at all.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-15 18:16         ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-15 18:16 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel

> 
> The comment above explains the "why". Basically poisoning a page will
> dirty it. So why hint a page as free when that will drop it back into
> the guest and result in it being dirtied again. What you end up with
> is all the pages that were temporarily placed in the balloon are dirty
> after the hinting report is finished at which point you made things
> worse instead of helping to improve them.

Let me think this through. What happens on free page hinting is that

a) we tell the guest that migration starts
b) it allocates pages (alloc_pages()), sends them to us and adds them to
   a list
b) we exclude these pages on migration
c) we tell the guest that migration is over
d) the guest frees all allocated pages

The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
will fill all pages again with a pattern (or zero).

AFAIKs, it's either

1) Not performing free page hinting, migrating pages filled with a
poison value (or zero).
2) Performing free page hinting, not migrating pages filled with a
poison value (or zero), letting only the guest fill them again.

I have the feeling, that 2) is still better for migration, because we
don't migrate useless pages and let the guest reconstruct the content?
(having a poison value of zero might be debatable)

Can you tell me what I am missing? :)

> 
>>
>>> +        return;
>>> +    }
>>> +
>>>      if (s->free_page_report_cmd_id == UINT_MAX) {
>>>          s->free_page_report_cmd_id =
>>>                         VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>>
>> We should rename all "free_page_report" stuff we can to
>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
>> side :) ) before adding free page reporting .
>>
>> (looking at the virtio-balloon linux header, it's also confusing but
>> we're stuck with that - maybe we should add better comments)
> 
> Are we stuck? Couldn't we just convert it to an anonymous union with
> free_page_hint_cmd_id and then use that where needed?

Saw your patch already :)

> 
>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>>>      if (s->qemu_4_0_config_size) {
>>>          return sizeof(struct virtio_balloon_config);
>>>      }
>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
>>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
>>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>          return sizeof(struct virtio_balloon_config);
>>>      }
>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>> -        return offsetof(struct virtio_balloon_config, poison_val);
>>> -    }
>>
>> I am not sure this change is completely sane. Why is that necessary at all?
> 
> The poison_val is stored at the end of the structure and is required
> in order to make free page hinting to work. What this change is doing

Required to make it work? In the kernel,

commit 2e991629bcf55a43681aec1ee096eeb03cf81709
Author: Wei Wang <wei.w.wang@intel.com>
Date:   Mon Aug 27 09:32:19 2018 +0800

    virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON

was merged after

commit 86a559787e6f5cf662c081363f64a20cad654195
Author: Wei Wang <wei.w.wang@intel.com>
Date:   Mon Aug 27 09:32:17 2018 +0800

    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

So I assume it's perfectly fine to not have it.

I'd say it's the responsibility of the guest to *not* use
VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
into the foot.

> is forcing it so that we report the config size as the full size if
> either poisoning or hinting are enabled since the poison val is the
> last member of the config structure.
> 
> If the question is why bother reducing the size if free page hinting
> is not present then I guess we could simplify this and just report
> report the size of the config for all cases.

I guess the idea is that if you migrate from one QEMU to another, your
config size will not suddenly change (fenced by a feature set that will
not change during migration, observable by a running guest).

My guess would be that we cannot simply change existing configurations
(defined via feature bits) as you do here - see e.g., qemu-4-0-config-size.

[...]

>>>
>>> @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>>>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>>>      f |= dev->host_features;
>>>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
>>> +    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
>>> +    }
>>>
>>>      return f;
>>>  }
>>> @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
>>>          g_free(s->stats_vq_elem);
>>>          s->stats_vq_elem = NULL;
>>>      }
>>> +
>>> +    s->poison_val = 0;
>>>  }
>>>
>>>  static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>>> @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
>>>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>>>      DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
>>>                      VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>>> +    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
>>> +                    VIRTIO_BALLOON_F_PAGE_POISON, false),
>>
>> Just curious, why an x- feature?
> 
> It was something I didn't expect the users to enable. It gets enabled
> when either free page hinting or free page reporting is enabled. So if
> you look you will see that in virtio_balloon_get_features the page
> poison feature is added if free page hinting is present. The guest
> will clear the feature bit if poisoning is not enabled in the guest.
> That results in the vdev getting the bit cleared.

The weird thing is that setting it to "false" will still enable it
automatically, depending on other features. Not sure how helpful that is.

I'd prefer to simply always enable it, similar to
VIRTIO_BALLOON_F_STATS_VQ - but it's late and I am confused how
migration with compat machines will work. :)

So, I wonder if we need this feature bit here at all.

-- 
Thanks,

David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-15 18:16         ` [virtio-dev] " David Hildenbrand
@ 2020-04-15 19:28           ` Alexander Duyck
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-15 19:28 UTC (permalink / raw)
  To: David Hildenbrand, Wang, Wei W
  Cc: qemu-devel, Paolo Bonzini, virtio-dev, Michael S. Tsirkin

On Wed, Apr 15, 2020 at 11:17 AM David Hildenbrand <david@redhat.com> wrote:
>
> >
> > The comment above explains the "why". Basically poisoning a page will
> > dirty it. So why hint a page as free when that will drop it back into
> > the guest and result in it being dirtied again. What you end up with
> > is all the pages that were temporarily placed in the balloon are dirty
> > after the hinting report is finished at which point you made things
> > worse instead of helping to improve them.
>
> Let me think this through. What happens on free page hinting is that
>
> a) we tell the guest that migration starts
> b) it allocates pages (alloc_pages()), sends them to us and adds them to
>    a list
> b) we exclude these pages on migration
> c) we tell the guest that migration is over
> d) the guest frees all allocated pages
>
> The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
> will fill all pages again with a pattern (or zero).

They should have already been filled with the poison pattern before we
got to d). A bigger worry is that we at some point in the future
update the kernel so that d) doesn't trigger re-poisoning, in which
case the pages won't be marked as dirty, we will have skipped the
migration, and we have no idea what will be in the pages at the end.

> AFAIKs, it's either
>
> 1) Not performing free page hinting, migrating pages filled with a
> poison value (or zero).
> 2) Performing free page hinting, not migrating pages filled with a
> poison value (or zero), letting only the guest fill them again.
>
> I have the feeling, that 2) is still better for migration, because we
> don't migrate useless pages and let the guest reconstruct the content?
> (having a poison value of zero might be debatable)
>
> Can you tell me what I am missing? :)

The goal of the free page hinting was to reduce the number of pages
that have to be migrated, in the second case you point out is is
basically deferring it and will make the post-copy quite more
expensive since all of the free memory will be pushed to the post-copy
which I would think would be undesirable since it means the VM would
have to be down for a greater amount of time with the poisoning
enabled.

The worst case scenario would be one in which the VM was suspended for
migration while there were still pages in the balloon that needed to
be drained. In such a case you would have them in an indeterminate
state since the last page poisoning for them might have been ignored
since they were marked as "free", so they are just going to be
whatever value they were if they had been migrated at all.

> >
> >>
> >>> +        return;
> >>> +    }
> >>> +
> >>>      if (s->free_page_report_cmd_id == UINT_MAX) {
> >>>          s->free_page_report_cmd_id =
> >>>                         VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> >>
> >> We should rename all "free_page_report" stuff we can to
> >> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> >> side :) ) before adding free page reporting .
> >>
> >> (looking at the virtio-balloon linux header, it's also confusing but
> >> we're stuck with that - maybe we should add better comments)
> >
> > Are we stuck? Couldn't we just convert it to an anonymous union with
> > free_page_hint_cmd_id and then use that where needed?
>
> Saw your patch already :)
>
> >
> >>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> >>>      if (s->qemu_4_0_config_size) {
> >>>          return sizeof(struct virtio_balloon_config);
> >>>      }
> >>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> >>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> >>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>          return sizeof(struct virtio_balloon_config);
> >>>      }
> >>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>> -        return offsetof(struct virtio_balloon_config, poison_val);
> >>> -    }
> >>
> >> I am not sure this change is completely sane. Why is that necessary at all?
> >
> > The poison_val is stored at the end of the structure and is required
> > in order to make free page hinting to work. What this change is doing
>
> Required to make it work? In the kernel,
>
> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
> Author: Wei Wang <wei.w.wang@intel.com>
> Date:   Mon Aug 27 09:32:19 2018 +0800
>
>     virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>
> was merged after
>
> commit 86a559787e6f5cf662c081363f64a20cad654195
> Author: Wei Wang <wei.w.wang@intel.com>
> Date:   Mon Aug 27 09:32:17 2018 +0800
>
>     virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>
> So I assume it's perfectly fine to not have it.
>
> I'd say it's the responsibility of the guest to *not* use
> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
> into the foot.

Based on the timing I am guessing the page poisoning was in the same
patch set as the free page hinting since there is only 2 seconds
between when the two are merged. If I recall the page poisoning logic
was added after the issue was pointed out that it needed to address
it.

I can probably make some changes to virtballoon_validate in the kernel
driver to address the fact that if we are poisoning pages we need to
either have the PAGE_POISON feature or we need to disable page
reporting and page hinting.

> > is forcing it so that we report the config size as the full size if
> > either poisoning or hinting are enabled since the poison val is the
> > last member of the config structure.
> >
> > If the question is why bother reducing the size if free page hinting
> > is not present then I guess we could simplify this and just report
> > report the size of the config for all cases.
>
> I guess the idea is that if you migrate from one QEMU to another, your
> config size will not suddenly change (fenced by a feature set that will
> not change during migration, observable by a running guest).
>
> My guess would be that we cannot simply change existing configurations
> (defined via feature bits) as you do here - see e.g., qemu-4-0-config-size.

Okay, I guess I can revert that bit.

> [...]
>
> >>>
> >>> @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
> >>>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> >>>      f |= dev->host_features;
> >>>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> >>> +    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> >>> +    }
> >>>
> >>>      return f;
> >>>  }
> >>> @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
> >>>          g_free(s->stats_vq_elem);
> >>>          s->stats_vq_elem = NULL;
> >>>      }
> >>> +
> >>> +    s->poison_val = 0;
> >>>  }
> >>>
> >>>  static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> >>> @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
> >>>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> >>>      DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> >>>                      VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> >>> +    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
> >>> +                    VIRTIO_BALLOON_F_PAGE_POISON, false),
> >>
> >> Just curious, why an x- feature?
> >
> > It was something I didn't expect the users to enable. It gets enabled
> > when either free page hinting or free page reporting is enabled. So if
> > you look you will see that in virtio_balloon_get_features the page
> > poison feature is added if free page hinting is present. The guest
> > will clear the feature bit if poisoning is not enabled in the guest.
> > That results in the vdev getting the bit cleared.
>
> The weird thing is that setting it to "false" will still enable it
> automatically, depending on other features. Not sure how helpful that is.
>
> I'd prefer to simply always enable it, similar to
> VIRTIO_BALLOON_F_STATS_VQ - but it's late and I am confused how
> migration with compat machines will work. :)
>
> So, I wonder if we need this feature bit here at all.

I can drop it. I don't really recall why I added that piece.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-15 19:28           ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-15 19:28 UTC (permalink / raw)
  To: David Hildenbrand, Wang, Wei W
  Cc: Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel

On Wed, Apr 15, 2020 at 11:17 AM David Hildenbrand <david@redhat.com> wrote:
>
> >
> > The comment above explains the "why". Basically poisoning a page will
> > dirty it. So why hint a page as free when that will drop it back into
> > the guest and result in it being dirtied again. What you end up with
> > is all the pages that were temporarily placed in the balloon are dirty
> > after the hinting report is finished at which point you made things
> > worse instead of helping to improve them.
>
> Let me think this through. What happens on free page hinting is that
>
> a) we tell the guest that migration starts
> b) it allocates pages (alloc_pages()), sends them to us and adds them to
>    a list
> b) we exclude these pages on migration
> c) we tell the guest that migration is over
> d) the guest frees all allocated pages
>
> The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
> will fill all pages again with a pattern (or zero).

They should have already been filled with the poison pattern before we
got to d). A bigger worry is that we at some point in the future
update the kernel so that d) doesn't trigger re-poisoning, in which
case the pages won't be marked as dirty, we will have skipped the
migration, and we have no idea what will be in the pages at the end.

> AFAIKs, it's either
>
> 1) Not performing free page hinting, migrating pages filled with a
> poison value (or zero).
> 2) Performing free page hinting, not migrating pages filled with a
> poison value (or zero), letting only the guest fill them again.
>
> I have the feeling, that 2) is still better for migration, because we
> don't migrate useless pages and let the guest reconstruct the content?
> (having a poison value of zero might be debatable)
>
> Can you tell me what I am missing? :)

The goal of the free page hinting was to reduce the number of pages
that have to be migrated, in the second case you point out is is
basically deferring it and will make the post-copy quite more
expensive since all of the free memory will be pushed to the post-copy
which I would think would be undesirable since it means the VM would
have to be down for a greater amount of time with the poisoning
enabled.

The worst case scenario would be one in which the VM was suspended for
migration while there were still pages in the balloon that needed to
be drained. In such a case you would have them in an indeterminate
state since the last page poisoning for them might have been ignored
since they were marked as "free", so they are just going to be
whatever value they were if they had been migrated at all.

> >
> >>
> >>> +        return;
> >>> +    }
> >>> +
> >>>      if (s->free_page_report_cmd_id == UINT_MAX) {
> >>>          s->free_page_report_cmd_id =
> >>>                         VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> >>
> >> We should rename all "free_page_report" stuff we can to
> >> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> >> side :) ) before adding free page reporting .
> >>
> >> (looking at the virtio-balloon linux header, it's also confusing but
> >> we're stuck with that - maybe we should add better comments)
> >
> > Are we stuck? Couldn't we just convert it to an anonymous union with
> > free_page_hint_cmd_id and then use that where needed?
>
> Saw your patch already :)
>
> >
> >>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> >>>      if (s->qemu_4_0_config_size) {
> >>>          return sizeof(struct virtio_balloon_config);
> >>>      }
> >>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> >>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> >>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>          return sizeof(struct virtio_balloon_config);
> >>>      }
> >>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>> -        return offsetof(struct virtio_balloon_config, poison_val);
> >>> -    }
> >>
> >> I am not sure this change is completely sane. Why is that necessary at all?
> >
> > The poison_val is stored at the end of the structure and is required
> > in order to make free page hinting to work. What this change is doing
>
> Required to make it work? In the kernel,
>
> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
> Author: Wei Wang <wei.w.wang@intel.com>
> Date:   Mon Aug 27 09:32:19 2018 +0800
>
>     virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>
> was merged after
>
> commit 86a559787e6f5cf662c081363f64a20cad654195
> Author: Wei Wang <wei.w.wang@intel.com>
> Date:   Mon Aug 27 09:32:17 2018 +0800
>
>     virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>
> So I assume it's perfectly fine to not have it.
>
> I'd say it's the responsibility of the guest to *not* use
> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
> into the foot.

Based on the timing I am guessing the page poisoning was in the same
patch set as the free page hinting since there is only 2 seconds
between when the two are merged. If I recall the page poisoning logic
was added after the issue was pointed out that it needed to address
it.

I can probably make some changes to virtballoon_validate in the kernel
driver to address the fact that if we are poisoning pages we need to
either have the PAGE_POISON feature or we need to disable page
reporting and page hinting.

> > is forcing it so that we report the config size as the full size if
> > either poisoning or hinting are enabled since the poison val is the
> > last member of the config structure.
> >
> > If the question is why bother reducing the size if free page hinting
> > is not present then I guess we could simplify this and just report
> > report the size of the config for all cases.
>
> I guess the idea is that if you migrate from one QEMU to another, your
> config size will not suddenly change (fenced by a feature set that will
> not change during migration, observable by a running guest).
>
> My guess would be that we cannot simply change existing configurations
> (defined via feature bits) as you do here - see e.g., qemu-4-0-config-size.

Okay, I guess I can revert that bit.

> [...]
>
> >>>
> >>> @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
> >>>      VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> >>>      f |= dev->host_features;
> >>>      virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> >>> +    if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>> +        virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> >>> +    }
> >>>
> >>>      return f;
> >>>  }
> >>> @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
> >>>          g_free(s->stats_vq_elem);
> >>>          s->stats_vq_elem = NULL;
> >>>      }
> >>> +
> >>> +    s->poison_val = 0;
> >>>  }
> >>>
> >>>  static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> >>> @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
> >>>                      VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> >>>      DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> >>>                      VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> >>> +    DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
> >>> +                    VIRTIO_BALLOON_F_PAGE_POISON, false),
> >>
> >> Just curious, why an x- feature?
> >
> > It was something I didn't expect the users to enable. It gets enabled
> > when either free page hinting or free page reporting is enabled. So if
> > you look you will see that in virtio_balloon_get_features the page
> > poison feature is added if free page hinting is present. The guest
> > will clear the feature bit if poisoning is not enabled in the guest.
> > That results in the vdev getting the bit cleared.
>
> The weird thing is that setting it to "false" will still enable it
> automatically, depending on other features. Not sure how helpful that is.
>
> I'd prefer to simply always enable it, similar to
> VIRTIO_BALLOON_F_STATS_VQ - but it's late and I am confused how
> migration with compat machines will work. :)
>
> So, I wonder if we need this feature bit here at all.

I can drop it. I don't really recall why I added that piece.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-15 19:28           ` [virtio-dev] " Alexander Duyck
  (?)
@ 2020-04-15 19:46           ` David Hildenbrand
  2020-04-15 21:16               ` [virtio-dev] " Alexander Duyck
  -1 siblings, 1 reply; 45+ messages in thread
From: David Hildenbrand @ 2020-04-15 19:46 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: virtio-dev, Michael S. Tsirkin, David Hildenbrand, qemu-devel,
	Wang, Wei W, Paolo Bonzini



> Am 15.04.2020 um 21:29 schrieb Alexander Duyck <alexander.duyck@gmail.com>:
> 
> On Wed, Apr 15, 2020 at 11:17 AM David Hildenbrand <david@redhat.com> wrote:
>> 
>>> 
>>> The comment above explains the "why". Basically poisoning a page will
>>> dirty it. So why hint a page as free when that will drop it back into
>>> the guest and result in it being dirtied again. What you end up with
>>> is all the pages that were temporarily placed in the balloon are dirty
>>> after the hinting report is finished at which point you made things
>>> worse instead of helping to improve them.
>> 
>> Let me think this through. What happens on free page hinting is that
>> 
>> a) we tell the guest that migration starts
>> b) it allocates pages (alloc_pages()), sends them to us and adds them to
>>   a list
>> b) we exclude these pages on migration
>> c) we tell the guest that migration is over
>> d) the guest frees all allocated pages
>> 
>> The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
>> will fill all pages again with a pattern (or zero).
> 
> They should have already been filled with the poison pattern before we
> got to d). A bigger worry is that we at some point in the future
> update the kernel so that d) doesn't trigger re-poisoning, in which
> case the pages won't be marked as dirty, we will have skipped the
> migration, and we have no idea what will be in the pages at the end.
> 
>> AFAIKs, it's either
>> 
>> 1) Not performing free page hinting, migrating pages filled with a
>> poison value (or zero).
>> 2) Performing free page hinting, not migrating pages filled with a
>> poison value (or zero), letting only the guest fill them again.
>> 
>> I have the feeling, that 2) is still better for migration, because we
>> don't migrate useless pages and let the guest reconstruct the content?
>> (having a poison value of zero might be debatable)
>> 
>> Can you tell me what I am missing? :)
> 
> The goal of the free page hinting was to reduce the number of pages
> that have to be migrated, in the second case you point out is is
> basically deferring it and will make the post-copy quite more
> expensive since all of the free memory will be pushed to the post-copy
> which I would think would be undesirable since it means the VM would
> have to be down for a greater amount of time with the poisoning
> enabled.

Postcopy is a very good point, bought!

But (what you wrote above) it sounds like that this is really what we *have to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was documented). We should rephrase the comment then.

I could imagine that we also have to care about ordinary page inflation/deflation when handling page poisoning. (Iow, don‘t inflate/deflate if set for now).

> 
> The worst case scenario would be one in which the VM was suspended for
> migration while there were still pages in the balloon that needed to
> be drained. In such a case you would have them in an indeterminate
> state since the last page poisoning for them might have been ignored
> since they were marked as "free", so they are just going to be
> whatever value they were if they had been migrated at all.
> 
>>> 
>>>> 
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>     if (s->free_page_report_cmd_id == UINT_MAX) {
>>>>>         s->free_page_report_cmd_id =
>>>>>                        VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>>>> 
>>>> We should rename all "free_page_report" stuff we can to
>>>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
>>>> side :) ) before adding free page reporting .
>>>> 
>>>> (looking at the virtio-balloon linux header, it's also confusing but
>>>> we're stuck with that - maybe we should add better comments)
>>> 
>>> Are we stuck? Couldn't we just convert it to an anonymous union with
>>> free_page_hint_cmd_id and then use that where needed?
>> 
>> Saw your patch already :)
>> 
>>> 
>>>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>>>>>     if (s->qemu_4_0_config_size) {
>>>>>         return sizeof(struct virtio_balloon_config);
>>>>>     }
>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
>>>>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
>>>>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>>>         return sizeof(struct virtio_balloon_config);
>>>>>     }
>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>>> -        return offsetof(struct virtio_balloon_config, poison_val);
>>>>> -    }
>>>> 
>>>> I am not sure this change is completely sane. Why is that necessary at all?
>>> 
>>> The poison_val is stored at the end of the structure and is required
>>> in order to make free page hinting to work. What this change is doing
>> 
>> Required to make it work? In the kernel,
>> 
>> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
>> Author: Wei Wang <wei.w.wang@intel.com>
>> Date:   Mon Aug 27 09:32:19 2018 +0800
>> 
>>    virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>> 
>> was merged after
>> 
>> commit 86a559787e6f5cf662c081363f64a20cad654195
>> Author: Wei Wang <wei.w.wang@intel.com>
>> Date:   Mon Aug 27 09:32:17 2018 +0800
>> 
>>    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>> 
>> So I assume it's perfectly fine to not have it.
>> 
>> I'd say it's the responsibility of the guest to *not* use
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
>> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
>> into the foot.
> 
> Based on the timing I am guessing the page poisoning was in the same
> patch set as the free page hinting since there is only 2 seconds
> between when the two are merged. If I recall the page poisoning logic
> was added after the issue was pointed out that it needed to address
> it.
> 

Yeah, but any other OS implementing this, not caring about page poisoning wouldn‘t need it. Maybe there is something in the spec.

Mental note: we‘ll have to migrate the poison value if not already done (writing on my mobile).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-15 19:46           ` David Hildenbrand
@ 2020-04-15 21:16               ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-15 21:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: virtio-dev, Michael S. Tsirkin, David Hildenbrand, qemu-devel,
	Wang, Wei W, Paolo Bonzini

On Wed, Apr 15, 2020 at 12:46 PM David Hildenbrand <dhildenb@redhat.com> wrote:
>
>
>
> > Am 15.04.2020 um 21:29 schrieb Alexander Duyck <alexander.duyck@gmail.com>:
> >
> > On Wed, Apr 15, 2020 at 11:17 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >>>
> >>> The comment above explains the "why". Basically poisoning a page will
> >>> dirty it. So why hint a page as free when that will drop it back into
> >>> the guest and result in it being dirtied again. What you end up with
> >>> is all the pages that were temporarily placed in the balloon are dirty
> >>> after the hinting report is finished at which point you made things
> >>> worse instead of helping to improve them.
> >>
> >> Let me think this through. What happens on free page hinting is that
> >>
> >> a) we tell the guest that migration starts
> >> b) it allocates pages (alloc_pages()), sends them to us and adds them to
> >>   a list
> >> b) we exclude these pages on migration
> >> c) we tell the guest that migration is over
> >> d) the guest frees all allocated pages
> >>
> >> The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
> >> will fill all pages again with a pattern (or zero).
> >
> > They should have already been filled with the poison pattern before we
> > got to d). A bigger worry is that we at some point in the future
> > update the kernel so that d) doesn't trigger re-poisoning, in which
> > case the pages won't be marked as dirty, we will have skipped the
> > migration, and we have no idea what will be in the pages at the end.
> >
> >> AFAIKs, it's either
> >>
> >> 1) Not performing free page hinting, migrating pages filled with a
> >> poison value (or zero).
> >> 2) Performing free page hinting, not migrating pages filled with a
> >> poison value (or zero), letting only the guest fill them again.
> >>
> >> I have the feeling, that 2) is still better for migration, because we
> >> don't migrate useless pages and let the guest reconstruct the content?
> >> (having a poison value of zero might be debatable)
> >>
> >> Can you tell me what I am missing? :)
> >
> > The goal of the free page hinting was to reduce the number of pages
> > that have to be migrated, in the second case you point out is is
> > basically deferring it and will make the post-copy quite more
> > expensive since all of the free memory will be pushed to the post-copy
> > which I would think would be undesirable since it means the VM would
> > have to be down for a greater amount of time with the poisoning
> > enabled.
>
> Postcopy is a very good point, bought!
>
> But (what you wrote above) it sounds like that this is really what we *have to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was documented). We should rephrase the comment then.

Do you have a link to the spec that I could look at? I am not hopeful
that this will be listed there since the poison side of QEMU was never
implemented. The flag is only here because it was copied over in the
kernel header.

> I could imagine that we also have to care about ordinary page inflation/deflation when handling page poisoning. (Iow, don‘t inflate/deflate if set for now).

The problem is this was broken from the start for that. The issue is
that the poison feature was wrapped up within the page hinting
feature. So as a result enabling it for a VM that doesn't recognize
the feature independently would likely leave the poison value
uninitialized and flagging as though a value of 0 is used. In addition
the original poison checking wasn't making sure that the page wasn't
init_on_free which has the same effect as page poisoning but isn't
page poisoning.

> >
> > The worst case scenario would be one in which the VM was suspended for
> > migration while there were still pages in the balloon that needed to
> > be drained. In such a case you would have them in an indeterminate
> > state since the last page poisoning for them might have been ignored
> > since they were marked as "free", so they are just going to be
> > whatever value they were if they had been migrated at all.
> >
> >>>
> >>>>
> >>>>> +        return;
> >>>>> +    }
> >>>>> +
> >>>>>     if (s->free_page_report_cmd_id == UINT_MAX) {
> >>>>>         s->free_page_report_cmd_id =
> >>>>>                        VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> >>>>
> >>>> We should rename all "free_page_report" stuff we can to
> >>>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> >>>> side :) ) before adding free page reporting .
> >>>>
> >>>> (looking at the virtio-balloon linux header, it's also confusing but
> >>>> we're stuck with that - maybe we should add better comments)
> >>>
> >>> Are we stuck? Couldn't we just convert it to an anonymous union with
> >>> free_page_hint_cmd_id and then use that where needed?
> >>
> >> Saw your patch already :)
> >>
> >>>
> >>>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> >>>>>     if (s->qemu_4_0_config_size) {
> >>>>>         return sizeof(struct virtio_balloon_config);
> >>>>>     }
> >>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> >>>>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> >>>>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>>>         return sizeof(struct virtio_balloon_config);
> >>>>>     }
> >>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>>> -        return offsetof(struct virtio_balloon_config, poison_val);
> >>>>> -    }
> >>>>
> >>>> I am not sure this change is completely sane. Why is that necessary at all?
> >>>
> >>> The poison_val is stored at the end of the structure and is required
> >>> in order to make free page hinting to work. What this change is doing
> >>
> >> Required to make it work? In the kernel,
> >>
> >> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
> >> Author: Wei Wang <wei.w.wang@intel.com>
> >> Date:   Mon Aug 27 09:32:19 2018 +0800
> >>
> >>    virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
> >>
> >> was merged after
> >>
> >> commit 86a559787e6f5cf662c081363f64a20cad654195
> >> Author: Wei Wang <wei.w.wang@intel.com>
> >> Date:   Mon Aug 27 09:32:17 2018 +0800
> >>
> >>    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >>
> >> So I assume it's perfectly fine to not have it.
> >>
> >> I'd say it's the responsibility of the guest to *not* use
> >> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
> >> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
> >> into the foot.
> >
> > Based on the timing I am guessing the page poisoning was in the same
> > patch set as the free page hinting since there is only 2 seconds
> > between when the two are merged. If I recall the page poisoning logic
> > was added after the issue was pointed out that it needed to address
> > it.
> >
>
> Yeah, but any other OS implementing this, not caring about page poisoning wouldn‘t need it. Maybe there is something in the spec.
>
> Mental note: we‘ll have to migrate the poison value if not already done (writing on my mobile).

Right. We need to make sure any poison or init on free is migrated
over to the destination before we can say we are going to skip the
migration. If anything what I probably ought to look into would be if
we could change the code so that if we have a hint the page is unused,
poison is enabled, and the value is 0 we just ship over a 0 page
instead of giving up on hinting entirely.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-15 21:16               ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-15 21:16 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: David Hildenbrand, Wang, Wei W, Paolo Bonzini,
	Michael S. Tsirkin, virtio-dev, qemu-devel

On Wed, Apr 15, 2020 at 12:46 PM David Hildenbrand <dhildenb@redhat.com> wrote:
>
>
>
> > Am 15.04.2020 um 21:29 schrieb Alexander Duyck <alexander.duyck@gmail.com>:
> >
> > On Wed, Apr 15, 2020 at 11:17 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >>>
> >>> The comment above explains the "why". Basically poisoning a page will
> >>> dirty it. So why hint a page as free when that will drop it back into
> >>> the guest and result in it being dirtied again. What you end up with
> >>> is all the pages that were temporarily placed in the balloon are dirty
> >>> after the hinting report is finished at which point you made things
> >>> worse instead of helping to improve them.
> >>
> >> Let me think this through. What happens on free page hinting is that
> >>
> >> a) we tell the guest that migration starts
> >> b) it allocates pages (alloc_pages()), sends them to us and adds them to
> >>   a list
> >> b) we exclude these pages on migration
> >> c) we tell the guest that migration is over
> >> d) the guest frees all allocated pages
> >>
> >> The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
> >> will fill all pages again with a pattern (or zero).
> >
> > They should have already been filled with the poison pattern before we
> > got to d). A bigger worry is that we at some point in the future
> > update the kernel so that d) doesn't trigger re-poisoning, in which
> > case the pages won't be marked as dirty, we will have skipped the
> > migration, and we have no idea what will be in the pages at the end.
> >
> >> AFAIKs, it's either
> >>
> >> 1) Not performing free page hinting, migrating pages filled with a
> >> poison value (or zero).
> >> 2) Performing free page hinting, not migrating pages filled with a
> >> poison value (or zero), letting only the guest fill them again.
> >>
> >> I have the feeling, that 2) is still better for migration, because we
> >> don't migrate useless pages and let the guest reconstruct the content?
> >> (having a poison value of zero might be debatable)
> >>
> >> Can you tell me what I am missing? :)
> >
> > The goal of the free page hinting was to reduce the number of pages
> > that have to be migrated, in the second case you point out is is
> > basically deferring it and will make the post-copy quite more
> > expensive since all of the free memory will be pushed to the post-copy
> > which I would think would be undesirable since it means the VM would
> > have to be down for a greater amount of time with the poisoning
> > enabled.
>
> Postcopy is a very good point, bought!
>
> But (what you wrote above) it sounds like that this is really what we *have to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was documented). We should rephrase the comment then.

Do you have a link to the spec that I could look at? I am not hopeful
that this will be listed there since the poison side of QEMU was never
implemented. The flag is only here because it was copied over in the
kernel header.

> I could imagine that we also have to care about ordinary page inflation/deflation when handling page poisoning. (Iow, don‘t inflate/deflate if set for now).

The problem is this was broken from the start for that. The issue is
that the poison feature was wrapped up within the page hinting
feature. So as a result enabling it for a VM that doesn't recognize
the feature independently would likely leave the poison value
uninitialized and flagging as though a value of 0 is used. In addition
the original poison checking wasn't making sure that the page wasn't
init_on_free which has the same effect as page poisoning but isn't
page poisoning.

> >
> > The worst case scenario would be one in which the VM was suspended for
> > migration while there were still pages in the balloon that needed to
> > be drained. In such a case you would have them in an indeterminate
> > state since the last page poisoning for them might have been ignored
> > since they were marked as "free", so they are just going to be
> > whatever value they were if they had been migrated at all.
> >
> >>>
> >>>>
> >>>>> +        return;
> >>>>> +    }
> >>>>> +
> >>>>>     if (s->free_page_report_cmd_id == UINT_MAX) {
> >>>>>         s->free_page_report_cmd_id =
> >>>>>                        VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> >>>>
> >>>> We should rename all "free_page_report" stuff we can to
> >>>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> >>>> side :) ) before adding free page reporting .
> >>>>
> >>>> (looking at the virtio-balloon linux header, it's also confusing but
> >>>> we're stuck with that - maybe we should add better comments)
> >>>
> >>> Are we stuck? Couldn't we just convert it to an anonymous union with
> >>> free_page_hint_cmd_id and then use that where needed?
> >>
> >> Saw your patch already :)
> >>
> >>>
> >>>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> >>>>>     if (s->qemu_4_0_config_size) {
> >>>>>         return sizeof(struct virtio_balloon_config);
> >>>>>     }
> >>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> >>>>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> >>>>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>>>         return sizeof(struct virtio_balloon_config);
> >>>>>     }
> >>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>>> -        return offsetof(struct virtio_balloon_config, poison_val);
> >>>>> -    }
> >>>>
> >>>> I am not sure this change is completely sane. Why is that necessary at all?
> >>>
> >>> The poison_val is stored at the end of the structure and is required
> >>> in order to make free page hinting to work. What this change is doing
> >>
> >> Required to make it work? In the kernel,
> >>
> >> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
> >> Author: Wei Wang <wei.w.wang@intel.com>
> >> Date:   Mon Aug 27 09:32:19 2018 +0800
> >>
> >>    virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
> >>
> >> was merged after
> >>
> >> commit 86a559787e6f5cf662c081363f64a20cad654195
> >> Author: Wei Wang <wei.w.wang@intel.com>
> >> Date:   Mon Aug 27 09:32:17 2018 +0800
> >>
> >>    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >>
> >> So I assume it's perfectly fine to not have it.
> >>
> >> I'd say it's the responsibility of the guest to *not* use
> >> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
> >> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
> >> into the foot.
> >
> > Based on the timing I am guessing the page poisoning was in the same
> > patch set as the free page hinting since there is only 2 seconds
> > between when the two are merged. If I recall the page poisoning logic
> > was added after the issue was pointed out that it needed to address
> > it.
> >
>
> Yeah, but any other OS implementing this, not caring about page poisoning wouldn‘t need it. Maybe there is something in the spec.
>
> Mental note: we‘ll have to migrate the poison value if not already done (writing on my mobile).

Right. We need to make sure any poison or init on free is migrated
over to the destination before we can say we are going to skip the
migration. If anything what I probably ought to look into would be if
we could change the code so that if we have a hint the page is unused,
poison is enabled, and the value is 0 we just ship over a 0 page
instead of giving up on hinting entirely.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-15 21:16               ` [virtio-dev] " Alexander Duyck
@ 2020-04-16  8:18                 ` David Hildenbrand
  -1 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-16  8:18 UTC (permalink / raw)
  To: Alexander Duyck, David Hildenbrand
  Cc: qemu-devel, Paolo Bonzini, Wang, Wei W, virtio-dev, Michael S. Tsirkin

>>
>> Postcopy is a very good point, bought!
>>
>> But (what you wrote above) it sounds like that this is really what we *have to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was documented). We should rephrase the comment then.
> 
> Do you have a link to the spec that I could look at? I am not hopeful
> that this will be listed there since the poison side of QEMU was never
> implemented. The flag is only here because it was copied over in the
> kernel header.

Introducing a feature without

a) specification what it does
b) implementation (of both sides) showing what has to be done
c) any kind of documentation of what needs to be done

just horrible.

The latest-greatest spec lives in

https://github.com/oasis-tcs/virtio-spec.git

I can't spot any signs of free page hinting/page poisioning. :(

We should document our result of page poisoning, free page hinting, and
free page reporting there as well. I hope you'll have time for the latter.

-------------------------------------------------------------------------
Semantics of VIRTIO_BALLOON_F_PAGE_POISON
-------------------------------------------------------------------------

"The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
guest is using page poisoning. Guest writes to the poison_val config
field to tell host about the page poisoning value that is in use."
-> Very little information, no signs about what has to be done.

"Let the hypervisor know that we are expecting a specific value to be
written back in balloon pages."
-> Okay, that talks about "balloon pages", which would include right now
-- pages "inflated" and then "deflated" using free page hinting
-- pages "inflated" and then "deflated" using oridnary inflate/deflate
   queue
-- pages "inflated" using free page reporting and automatically
   "deflated" on access

So VIRTIO_BALLOON_F_PAGE_POISON really means "whenever the guest
deflates a page (either explicitly, or implicitly with free page
reporting), it is filled with "poison_val".

And I would add

"However, if the inflated page was not filled with "poison_val" when
inflating, it's not predictable if the original page or a page filled
with "poison_val" is returned."

Which would cover the "we did not discard the page in the hypervisor, so
the original page is still there".


We should also document what is expected to happen if "poison_val" is
suddenly changed by the guest at one point in time again. (e.g., not
supported, unexpected things can happen, etc.) Also, we might have to
document that a device reset resets the poison_val to 0. (not sure if
that's really necessary)

-------------------------------------------------------------------------
What we have to do in the guest/Linux:
-------------------------------------------------------------------------

A guest which relies on this (esp., free page reporting in Linux only,
right?), has to not use/advertise VIRTIO_BALLOON_F_REPORTING *in case
VIRTIO_BALLOON_F_PAGE_POISON is not provided* by the host. AFAIKS,
ordinary inflation/deflation and free page hinting does currently not
care, as we go via free_page(), so that is currently fine AFAIKs.

-------------------------------------------------------------------------
What we have to do in the hypervisor/QEMU:
-------------------------------------------------------------------------

With VIRTIO_BALLOON_F_PAGE_POISON, we could provide free page reporting
easily IFF "page_val"==0. However, as you said, it will currently be
expensive in case of postcopy, as the guest still zeroes out all pages.
Document that properly.

With VIRTIO_BALLOON_F_PAGE_POISON, don't inflate any pages right now
(not discarding anything), OR fill the pages with poison_val when
deflating. I guess the latter would be better - even if current Linux
does not need it, it's what we are expected to do AFAIKS.

With VIRTIO_BALLOON_F_PAGE_POISON and page_val != 0, discard all free
page reporting attempts. (== what your patch #3 does). Document that
properly.


Makes sense? See below for guest migration thingies.

> 
>> I could imagine that we also have to care about ordinary page inflation/deflation when handling page poisoning. (Iow, don‘t inflate/deflate if set for now).
> 
> The problem is this was broken from the start for that. The issue is
> that the poison feature was wrapped up within the page hinting
> feature. So as a result enabling it for a VM that doesn't recognize
> the feature independently would likely leave the poison value
> uninitialized and flagging as though a value of 0 is used. In addition
> the original poison checking wasn't making sure that the page wasn't
> init_on_free which has the same effect as page poisoning but isn't
> page poisoning.

If the guest agrees to have VIRTIO_BALLOON_F_PAGE_POISON but does not
initialize poison_val, then it's a guest bug, I wouldn't worry about
that for now.

> 
>>>
>>> The worst case scenario would be one in which the VM was suspended for
>>> migration while there were still pages in the balloon that needed to
>>> be drained. In such a case you would have them in an indeterminate
>>> state since the last page poisoning for them might have been ignored
>>> since they were marked as "free", so they are just going to be
>>> whatever value they were if they had been migrated at all.
>>>
>>>>>
>>>>>>
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>     if (s->free_page_report_cmd_id == UINT_MAX) {
>>>>>>>         s->free_page_report_cmd_id =
>>>>>>>                        VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>>>>>>
>>>>>> We should rename all "free_page_report" stuff we can to
>>>>>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
>>>>>> side :) ) before adding free page reporting .
>>>>>>
>>>>>> (looking at the virtio-balloon linux header, it's also confusing but
>>>>>> we're stuck with that - maybe we should add better comments)
>>>>>
>>>>> Are we stuck? Couldn't we just convert it to an anonymous union with
>>>>> free_page_hint_cmd_id and then use that where needed?
>>>>
>>>> Saw your patch already :)
>>>>
>>>>>
>>>>>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>>>>>>>     if (s->qemu_4_0_config_size) {
>>>>>>>         return sizeof(struct virtio_balloon_config);
>>>>>>>     }
>>>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
>>>>>>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
>>>>>>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>>>>>         return sizeof(struct virtio_balloon_config);
>>>>>>>     }
>>>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>>>>> -        return offsetof(struct virtio_balloon_config, poison_val);
>>>>>>> -    }
>>>>>>
>>>>>> I am not sure this change is completely sane. Why is that necessary at all?
>>>>>
>>>>> The poison_val is stored at the end of the structure and is required
>>>>> in order to make free page hinting to work. What this change is doing
>>>>
>>>> Required to make it work? In the kernel,
>>>>
>>>> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
>>>> Author: Wei Wang <wei.w.wang@intel.com>
>>>> Date:   Mon Aug 27 09:32:19 2018 +0800
>>>>
>>>>    virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>>>>
>>>> was merged after
>>>>
>>>> commit 86a559787e6f5cf662c081363f64a20cad654195
>>>> Author: Wei Wang <wei.w.wang@intel.com>
>>>> Date:   Mon Aug 27 09:32:17 2018 +0800
>>>>
>>>>    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>>>>
>>>> So I assume it's perfectly fine to not have it.
>>>>
>>>> I'd say it's the responsibility of the guest to *not* use
>>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
>>>> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
>>>> into the foot.
>>>
>>> Based on the timing I am guessing the page poisoning was in the same
>>> patch set as the free page hinting since there is only 2 seconds
>>> between when the two are merged. If I recall the page poisoning logic
>>> was added after the issue was pointed out that it needed to address
>>> it.
>>>
>>
>> Yeah, but any other OS implementing this, not caring about page poisoning wouldn‘t need it. Maybe there is something in the spec.
>>
>> Mental note: we‘ll have to migrate the poison value if not already done (writing on my mobile).
> 
> Right. We need to make sure any poison or init on free is migrated
> over to the destination before we can say we are going to skip the
> migration. If anything what I probably ought to look into would be if
> we could change the code so that if we have a hint the page is unused,
> poison is enabled, and the value is 0 we just ship over a 0 page
> instead of giving up on hinting entirely.
> 

Yeah, we have to migrate poison_val from source to destination. Also, we
should worry about us losing the page poisoning feature when migrating
from source to destination.

Thinking about it, it might make sense to completely decouple page
poisoning here. Assign it a separate property (as you did), default
enable it, but disable it via QEMU compat machines.

Then, we won't lose the feature when migrating.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-16  8:18                 ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-16  8:18 UTC (permalink / raw)
  To: Alexander Duyck, David Hildenbrand
  Cc: Wang, Wei W, Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel

>>
>> Postcopy is a very good point, bought!
>>
>> But (what you wrote above) it sounds like that this is really what we *have to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was documented). We should rephrase the comment then.
> 
> Do you have a link to the spec that I could look at? I am not hopeful
> that this will be listed there since the poison side of QEMU was never
> implemented. The flag is only here because it was copied over in the
> kernel header.

Introducing a feature without

a) specification what it does
b) implementation (of both sides) showing what has to be done
c) any kind of documentation of what needs to be done

just horrible.

The latest-greatest spec lives in

https://github.com/oasis-tcs/virtio-spec.git

I can't spot any signs of free page hinting/page poisioning. :(

We should document our result of page poisoning, free page hinting, and
free page reporting there as well. I hope you'll have time for the latter.

-------------------------------------------------------------------------
Semantics of VIRTIO_BALLOON_F_PAGE_POISON
-------------------------------------------------------------------------

"The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
guest is using page poisoning. Guest writes to the poison_val config
field to tell host about the page poisoning value that is in use."
-> Very little information, no signs about what has to be done.

"Let the hypervisor know that we are expecting a specific value to be
written back in balloon pages."
-> Okay, that talks about "balloon pages", which would include right now
-- pages "inflated" and then "deflated" using free page hinting
-- pages "inflated" and then "deflated" using oridnary inflate/deflate
   queue
-- pages "inflated" using free page reporting and automatically
   "deflated" on access

So VIRTIO_BALLOON_F_PAGE_POISON really means "whenever the guest
deflates a page (either explicitly, or implicitly with free page
reporting), it is filled with "poison_val".

And I would add

"However, if the inflated page was not filled with "poison_val" when
inflating, it's not predictable if the original page or a page filled
with "poison_val" is returned."

Which would cover the "we did not discard the page in the hypervisor, so
the original page is still there".


We should also document what is expected to happen if "poison_val" is
suddenly changed by the guest at one point in time again. (e.g., not
supported, unexpected things can happen, etc.) Also, we might have to
document that a device reset resets the poison_val to 0. (not sure if
that's really necessary)

-------------------------------------------------------------------------
What we have to do in the guest/Linux:
-------------------------------------------------------------------------

A guest which relies on this (esp., free page reporting in Linux only,
right?), has to not use/advertise VIRTIO_BALLOON_F_REPORTING *in case
VIRTIO_BALLOON_F_PAGE_POISON is not provided* by the host. AFAIKS,
ordinary inflation/deflation and free page hinting does currently not
care, as we go via free_page(), so that is currently fine AFAIKs.

-------------------------------------------------------------------------
What we have to do in the hypervisor/QEMU:
-------------------------------------------------------------------------

With VIRTIO_BALLOON_F_PAGE_POISON, we could provide free page reporting
easily IFF "page_val"==0. However, as you said, it will currently be
expensive in case of postcopy, as the guest still zeroes out all pages.
Document that properly.

With VIRTIO_BALLOON_F_PAGE_POISON, don't inflate any pages right now
(not discarding anything), OR fill the pages with poison_val when
deflating. I guess the latter would be better - even if current Linux
does not need it, it's what we are expected to do AFAIKS.

With VIRTIO_BALLOON_F_PAGE_POISON and page_val != 0, discard all free
page reporting attempts. (== what your patch #3 does). Document that
properly.


Makes sense? See below for guest migration thingies.

> 
>> I could imagine that we also have to care about ordinary page inflation/deflation when handling page poisoning. (Iow, don‘t inflate/deflate if set for now).
> 
> The problem is this was broken from the start for that. The issue is
> that the poison feature was wrapped up within the page hinting
> feature. So as a result enabling it for a VM that doesn't recognize
> the feature independently would likely leave the poison value
> uninitialized and flagging as though a value of 0 is used. In addition
> the original poison checking wasn't making sure that the page wasn't
> init_on_free which has the same effect as page poisoning but isn't
> page poisoning.

If the guest agrees to have VIRTIO_BALLOON_F_PAGE_POISON but does not
initialize poison_val, then it's a guest bug, I wouldn't worry about
that for now.

> 
>>>
>>> The worst case scenario would be one in which the VM was suspended for
>>> migration while there were still pages in the balloon that needed to
>>> be drained. In such a case you would have them in an indeterminate
>>> state since the last page poisoning for them might have been ignored
>>> since they were marked as "free", so they are just going to be
>>> whatever value they were if they had been migrated at all.
>>>
>>>>>
>>>>>>
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>     if (s->free_page_report_cmd_id == UINT_MAX) {
>>>>>>>         s->free_page_report_cmd_id =
>>>>>>>                        VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>>>>>>
>>>>>> We should rename all "free_page_report" stuff we can to
>>>>>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
>>>>>> side :) ) before adding free page reporting .
>>>>>>
>>>>>> (looking at the virtio-balloon linux header, it's also confusing but
>>>>>> we're stuck with that - maybe we should add better comments)
>>>>>
>>>>> Are we stuck? Couldn't we just convert it to an anonymous union with
>>>>> free_page_hint_cmd_id and then use that where needed?
>>>>
>>>> Saw your patch already :)
>>>>
>>>>>
>>>>>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>>>>>>>     if (s->qemu_4_0_config_size) {
>>>>>>>         return sizeof(struct virtio_balloon_config);
>>>>>>>     }
>>>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
>>>>>>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
>>>>>>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>>>>>         return sizeof(struct virtio_balloon_config);
>>>>>>>     }
>>>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>>>>>> -        return offsetof(struct virtio_balloon_config, poison_val);
>>>>>>> -    }
>>>>>>
>>>>>> I am not sure this change is completely sane. Why is that necessary at all?
>>>>>
>>>>> The poison_val is stored at the end of the structure and is required
>>>>> in order to make free page hinting to work. What this change is doing
>>>>
>>>> Required to make it work? In the kernel,
>>>>
>>>> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
>>>> Author: Wei Wang <wei.w.wang@intel.com>
>>>> Date:   Mon Aug 27 09:32:19 2018 +0800
>>>>
>>>>    virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>>>>
>>>> was merged after
>>>>
>>>> commit 86a559787e6f5cf662c081363f64a20cad654195
>>>> Author: Wei Wang <wei.w.wang@intel.com>
>>>> Date:   Mon Aug 27 09:32:17 2018 +0800
>>>>
>>>>    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>>>>
>>>> So I assume it's perfectly fine to not have it.
>>>>
>>>> I'd say it's the responsibility of the guest to *not* use
>>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
>>>> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
>>>> into the foot.
>>>
>>> Based on the timing I am guessing the page poisoning was in the same
>>> patch set as the free page hinting since there is only 2 seconds
>>> between when the two are merged. If I recall the page poisoning logic
>>> was added after the issue was pointed out that it needed to address
>>> it.
>>>
>>
>> Yeah, but any other OS implementing this, not caring about page poisoning wouldn‘t need it. Maybe there is something in the spec.
>>
>> Mental note: we‘ll have to migrate the poison value if not already done (writing on my mobile).
> 
> Right. We need to make sure any poison or init on free is migrated
> over to the destination before we can say we are going to skip the
> migration. If anything what I probably ought to look into would be if
> we could change the code so that if we have a hint the page is unused,
> poison is enabled, and the value is 0 we just ship over a 0 page
> instead of giving up on hinting entirely.
> 

Yeah, we have to migrate poison_val from source to destination. Also, we
should worry about us losing the page poisoning feature when migrating
from source to destination.

Thinking about it, it might make sense to completely decouple page
poisoning here. Assign it a separate property (as you did), default
enable it, but disable it via QEMU compat machines.

Then, we won't lose the feature when migrating.

-- 
Thanks,

David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-16  8:18                 ` [virtio-dev] " David Hildenbrand
@ 2020-04-16  8:36                   ` David Hildenbrand
  -1 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-16  8:36 UTC (permalink / raw)
  To: Alexander Duyck, David Hildenbrand
  Cc: qemu-devel, Paolo Bonzini, Wang, Wei W, virtio-dev, Michael S. Tsirkin

On 16.04.20 10:18, David Hildenbrand wrote:
>>>
>>> Postcopy is a very good point, bought!
>>>
>>> But (what you wrote above) it sounds like that this is really what we *have to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was documented). We should rephrase the comment then.
>>
>> Do you have a link to the spec that I could look at? I am not hopeful
>> that this will be listed there since the poison side of QEMU was never
>> implemented. The flag is only here because it was copied over in the
>> kernel header.
> 
> Introducing a feature without
> 
> a) specification what it does
> b) implementation (of both sides) showing what has to be done
> c) any kind of documentation of what needs to be done
> 
> just horrible.
> 
> The latest-greatest spec lives in
> 
> https://github.com/oasis-tcs/virtio-spec.git
> 
> I can't spot any signs of free page hinting/page poisioning. :(
> 
> We should document our result of page poisoning, free page hinting, and
> free page reporting there as well. I hope you'll have time for the latter.
> 
> -------------------------------------------------------------------------
> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
> -------------------------------------------------------------------------
> 
> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
> guest is using page poisoning. Guest writes to the poison_val config
> field to tell host about the page poisoning value that is in use."
> -> Very little information, no signs about what has to be done.
> 
> "Let the hypervisor know that we are expecting a specific value to be
> written back in balloon pages."
> -> Okay, that talks about "balloon pages", which would include right now
> -- pages "inflated" and then "deflated" using free page hinting
> -- pages "inflated" and then "deflated" using oridnary inflate/deflate
>    queue
> -- pages "inflated" using free page reporting and automatically
>    "deflated" on access
> 
> So VIRTIO_BALLOON_F_PAGE_POISON really means "whenever the guest
> deflates a page (either explicitly, or implicitly with free page
> reporting), it is filled with "poison_val".
> 
> And I would add
> 
> "However, if the inflated page was not filled with "poison_val" when
> inflating, it's not predictable if the original page or a page filled
> with "poison_val" is returned."
> 
> Which would cover the "we did not discard the page in the hypervisor, so
> the original page is still there".
> 
> 
> We should also document what is expected to happen if "poison_val" is
> suddenly changed by the guest at one point in time again. (e.g., not
> supported, unexpected things can happen, etc.) Also, we might have to
> document that a device reset resets the poison_val to 0. (not sure if
> that's really necessary)

Looking at the spec, we will only have to care about
"VIRTIO_BALLOON_F_MUST_TELL_HOST". Especially:

"If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiated, the
driver MUST NOT use pages from the balloon until the device has
acknowledged the deflate request. Otherwise, if the
VIRTIO_BALLOON_F_MUST_TELL_HOST feature is not negotiated, the driver
MAY begin to re-use pages previously given to the balloon before the
device has acknowledged the deflate request."

So I guess, in QEMU, if
- VIRTIO_BALLOON_F_MUST_TELL_HOST is not set (== currently always)
- VIRTIO_BALLOON_F_PAGE_POISON is set
- poison_val is != 0
then, don't discard any pages, because we cannot fill the page reliably
during a deflation request (== guest could already be reusing the page
and expecting a certain value).

In QEMU, we should always set VIRTIO_BALLOON_F_MUST_TELL_HOST when
setting VIRTIO_BALLOON_F_PAGE_POISON.

In the spec, we should document that VIRTIO_BALLOON_F_PAGE_POISON should
only be used with VIRTIO_BALLOON_F_MUST_TELL_HOST or sth like that ...

confusing stuff. Let me know what you think.


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-16  8:36                   ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-16  8:36 UTC (permalink / raw)
  To: Alexander Duyck, David Hildenbrand
  Cc: Wang, Wei W, Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel

On 16.04.20 10:18, David Hildenbrand wrote:
>>>
>>> Postcopy is a very good point, bought!
>>>
>>> But (what you wrote above) it sounds like that this is really what we *have to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was documented). We should rephrase the comment then.
>>
>> Do you have a link to the spec that I could look at? I am not hopeful
>> that this will be listed there since the poison side of QEMU was never
>> implemented. The flag is only here because it was copied over in the
>> kernel header.
> 
> Introducing a feature without
> 
> a) specification what it does
> b) implementation (of both sides) showing what has to be done
> c) any kind of documentation of what needs to be done
> 
> just horrible.
> 
> The latest-greatest spec lives in
> 
> https://github.com/oasis-tcs/virtio-spec.git
> 
> I can't spot any signs of free page hinting/page poisioning. :(
> 
> We should document our result of page poisoning, free page hinting, and
> free page reporting there as well. I hope you'll have time for the latter.
> 
> -------------------------------------------------------------------------
> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
> -------------------------------------------------------------------------
> 
> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
> guest is using page poisoning. Guest writes to the poison_val config
> field to tell host about the page poisoning value that is in use."
> -> Very little information, no signs about what has to be done.
> 
> "Let the hypervisor know that we are expecting a specific value to be
> written back in balloon pages."
> -> Okay, that talks about "balloon pages", which would include right now
> -- pages "inflated" and then "deflated" using free page hinting
> -- pages "inflated" and then "deflated" using oridnary inflate/deflate
>    queue
> -- pages "inflated" using free page reporting and automatically
>    "deflated" on access
> 
> So VIRTIO_BALLOON_F_PAGE_POISON really means "whenever the guest
> deflates a page (either explicitly, or implicitly with free page
> reporting), it is filled with "poison_val".
> 
> And I would add
> 
> "However, if the inflated page was not filled with "poison_val" when
> inflating, it's not predictable if the original page or a page filled
> with "poison_val" is returned."
> 
> Which would cover the "we did not discard the page in the hypervisor, so
> the original page is still there".
> 
> 
> We should also document what is expected to happen if "poison_val" is
> suddenly changed by the guest at one point in time again. (e.g., not
> supported, unexpected things can happen, etc.) Also, we might have to
> document that a device reset resets the poison_val to 0. (not sure if
> that's really necessary)

Looking at the spec, we will only have to care about
"VIRTIO_BALLOON_F_MUST_TELL_HOST". Especially:

"If the VIRTIO_BALLOON_F_MUST_TELL_HOST feature is negotiated, the
driver MUST NOT use pages from the balloon until the device has
acknowledged the deflate request. Otherwise, if the
VIRTIO_BALLOON_F_MUST_TELL_HOST feature is not negotiated, the driver
MAY begin to re-use pages previously given to the balloon before the
device has acknowledged the deflate request."

So I guess, in QEMU, if
- VIRTIO_BALLOON_F_MUST_TELL_HOST is not set (== currently always)
- VIRTIO_BALLOON_F_PAGE_POISON is set
- poison_val is != 0
then, don't discard any pages, because we cannot fill the page reliably
during a deflation request (== guest could already be reusing the page
and expecting a certain value).

In QEMU, we should always set VIRTIO_BALLOON_F_MUST_TELL_HOST when
setting VIRTIO_BALLOON_F_PAGE_POISON.

In the spec, we should document that VIRTIO_BALLOON_F_PAGE_POISON should
only be used with VIRTIO_BALLOON_F_MUST_TELL_HOST or sth like that ...

confusing stuff. Let me know what you think.


-- 
Thanks,

David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-16  8:18                 ` [virtio-dev] " David Hildenbrand
@ 2020-04-16 14:33                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2020-04-16 14:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: virtio-dev, qemu-devel, Alexander Duyck, Wang, Wei W,
	David Hildenbrand, Paolo Bonzini

On Thu, Apr 16, 2020 at 10:18:49AM +0200, David Hildenbrand wrote:
> >>
> >> Postcopy is a very good point, bought!
> >>
> >> But (what you wrote above) it sounds like that this is really what we *have to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was documented). We should rephrase the comment then.
> > 
> > Do you have a link to the spec that I could look at? I am not hopeful
> > that this will be listed there since the poison side of QEMU was never
> > implemented. The flag is only here because it was copied over in the
> > kernel header.
> 
> Introducing a feature without
> 
> a) specification what it does
> b) implementation (of both sides) showing what has to be done
> c) any kind of documentation of what needs to be done
> 
> just horrible.
> 
> The latest-greatest spec lives in
> 
> https://github.com/oasis-tcs/virtio-spec.git
> 
> I can't spot any signs of free page hinting/page poisioning. :(

Right. I merged the hinting support in Linux on the hope that
spec and qemu upstream will surface later. It seemed so
since IIRC there were some drafts (even though I don't
have any links to hand). Unfortunately neither happened.



> We should document our result of page poisoning, free page hinting, and
> free page reporting there as well. I hope you'll have time for the latter.
> 
> -------------------------------------------------------------------------
> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
> -------------------------------------------------------------------------
> 
> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
> guest is using page poisoning. Guest writes to the poison_val config
> field to tell host about the page poisoning value that is in use."
> -> Very little information, no signs about what has to be done.

I think it's an informational field. Knowing that free pages
are full of a specific pattern can be handy for the hypervisor
for a variety of reasons. E.g. compression/deduplication?


> "Let the hypervisor know that we are expecting a specific value to be
> written back in balloon pages."



> -> Okay, that talks about "balloon pages", which would include right now
> -- pages "inflated" and then "deflated" using free page hinting
> -- pages "inflated" and then "deflated" using oridnary inflate/deflate
>    queue

ATM, in this case driver calls "free" and that fills page with the
poison value.

> -- pages "inflated" using free page reporting and automatically
>    "deflated" on access
> 
> So VIRTIO_BALLOON_F_PAGE_POISON really means "whenever the guest
> deflates a page (either explicitly, or implicitly with free page
> reporting), it is filled with "poison_val".

It might be a valid optimization to allow driver to skip
poisoning of freed pages in this case.

> And I would add
> 
> "However, if the inflated page was not filled with "poison_val" when
> inflating, it's not predictable if the original page or a page filled
> with "poison_val" is returned."
> 
> Which would cover the "we did not discard the page in the hypervisor, so
> the original page is still there".
> 
> 
> We should also document what is expected to happen if "poison_val" is
> suddenly changed by the guest at one point in time again. (e.g., not
> supported, unexpected things can happen, etc.)

Right. I think we should require that this can only be changed
before features have been negotiated.
That is the only point where hypervisor can still fail
gracefully (i.e. fail FEATURES_OK).


> Also, we might have to
> document that a device reset resets the poison_val to 0. (not sure if
> that's really necessary)

Probably yes, for things like kexec.

> -------------------------------------------------------------------------
> What we have to do in the guest/Linux:
> -------------------------------------------------------------------------
> 
> A guest which relies on this (esp., free page reporting in Linux only,
> right?), has to not use/advertise VIRTIO_BALLOON_F_REPORTING *in case
> VIRTIO_BALLOON_F_PAGE_POISON is not provided* by the host. AFAIKS,
> ordinary inflation/deflation and free page hinting does currently not
> care, as we go via free_page(), so that is currently fine AFAIKs.
> 
> -------------------------------------------------------------------------
> What we have to do in the hypervisor/QEMU:
> -------------------------------------------------------------------------
> 
> With VIRTIO_BALLOON_F_PAGE_POISON, we could provide free page reporting
> easily IFF "page_val"==0. However, as you said, it will currently be
> expensive in case of postcopy, as the guest still zeroes out all pages.
> Document that properly.
> 
> With VIRTIO_BALLOON_F_PAGE_POISON, don't inflate any pages right now
> (not discarding anything), OR fill the pages with poison_val when
> deflating. I guess the latter would be better - even if current Linux
> does not need it, it's what we are expected to do AFAIKS.
> 
> With VIRTIO_BALLOON_F_PAGE_POISON and page_val != 0, discard all free
> page reporting attempts. (== what your patch #3 does). Document that
> properly.
> 
> 
> Makes sense? See below for guest migration thingies.
> 
> > 
> >> I could imagine that we also have to care about ordinary page inflation/deflation when handling page poisoning. (Iow, don‘t inflate/deflate if set for now).
> > 
> > The problem is this was broken from the start for that. The issue is
> > that the poison feature was wrapped up within the page hinting
> > feature. So as a result enabling it for a VM that doesn't recognize
> > the feature independently would likely leave the poison value
> > uninitialized and flagging as though a value of 0 is used. In addition
> > the original poison checking wasn't making sure that the page wasn't
> > init_on_free which has the same effect as page poisoning but isn't
> > page poisoning.
> 
> If the guest agrees to have VIRTIO_BALLOON_F_PAGE_POISON but does not
> initialize poison_val, then it's a guest bug, I wouldn't worry about
> that for now.
> 
> > 
> >>>
> >>> The worst case scenario would be one in which the VM was suspended for
> >>> migration while there were still pages in the balloon that needed to
> >>> be drained. In such a case you would have them in an indeterminate
> >>> state since the last page poisoning for them might have been ignored
> >>> since they were marked as "free", so they are just going to be
> >>> whatever value they were if they had been migrated at all.
> >>>
> >>>>>
> >>>>>>
> >>>>>>> +        return;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>>     if (s->free_page_report_cmd_id == UINT_MAX) {
> >>>>>>>         s->free_page_report_cmd_id =
> >>>>>>>                        VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> >>>>>>
> >>>>>> We should rename all "free_page_report" stuff we can to
> >>>>>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> >>>>>> side :) ) before adding free page reporting .
> >>>>>>
> >>>>>> (looking at the virtio-balloon linux header, it's also confusing but
> >>>>>> we're stuck with that - maybe we should add better comments)
> >>>>>
> >>>>> Are we stuck? Couldn't we just convert it to an anonymous union with
> >>>>> free_page_hint_cmd_id and then use that where needed?
> >>>>
> >>>> Saw your patch already :)
> >>>>
> >>>>>
> >>>>>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> >>>>>>>     if (s->qemu_4_0_config_size) {
> >>>>>>>         return sizeof(struct virtio_balloon_config);
> >>>>>>>     }
> >>>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> >>>>>>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> >>>>>>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>>>>>         return sizeof(struct virtio_balloon_config);
> >>>>>>>     }
> >>>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>>>>> -        return offsetof(struct virtio_balloon_config, poison_val);
> >>>>>>> -    }
> >>>>>>
> >>>>>> I am not sure this change is completely sane. Why is that necessary at all?
> >>>>>
> >>>>> The poison_val is stored at the end of the structure and is required
> >>>>> in order to make free page hinting to work. What this change is doing
> >>>>
> >>>> Required to make it work? In the kernel,
> >>>>
> >>>> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
> >>>> Author: Wei Wang <wei.w.wang@intel.com>
> >>>> Date:   Mon Aug 27 09:32:19 2018 +0800
> >>>>
> >>>>    virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
> >>>>
> >>>> was merged after
> >>>>
> >>>> commit 86a559787e6f5cf662c081363f64a20cad654195
> >>>> Author: Wei Wang <wei.w.wang@intel.com>
> >>>> Date:   Mon Aug 27 09:32:17 2018 +0800
> >>>>
> >>>>    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >>>>
> >>>> So I assume it's perfectly fine to not have it.
> >>>>
> >>>> I'd say it's the responsibility of the guest to *not* use
> >>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
> >>>> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
> >>>> into the foot.
> >>>
> >>> Based on the timing I am guessing the page poisoning was in the same
> >>> patch set as the free page hinting since there is only 2 seconds
> >>> between when the two are merged. If I recall the page poisoning logic
> >>> was added after the issue was pointed out that it needed to address
> >>> it.
> >>>
> >>
> >> Yeah, but any other OS implementing this, not caring about page poisoning wouldn‘t need it. Maybe there is something in the spec.
> >>
> >> Mental note: we‘ll have to migrate the poison value if not already done (writing on my mobile).
> > 
> > Right. We need to make sure any poison or init on free is migrated
> > over to the destination before we can say we are going to skip the
> > migration. If anything what I probably ought to look into would be if
> > we could change the code so that if we have a hint the page is unused,
> > poison is enabled, and the value is 0 we just ship over a 0 page
> > instead of giving up on hinting entirely.
> > 
> 
> Yeah, we have to migrate poison_val from source to destination. Also, we
> should worry about us losing the page poisoning feature when migrating
> from source to destination.
> 
> Thinking about it, it might make sense to completely decouple page
> poisoning here. Assign it a separate property (as you did), default
> enable it, but disable it via QEMU compat machines.
> 
> Then, we won't lose the feature when migrating.
> 
> -- 
> Thanks,
> 
> David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-16 14:33                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 45+ messages in thread
From: Michael S. Tsirkin @ 2020-04-16 14:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Alexander Duyck, David Hildenbrand, Wang, Wei W, Paolo Bonzini,
	virtio-dev, qemu-devel

On Thu, Apr 16, 2020 at 10:18:49AM +0200, David Hildenbrand wrote:
> >>
> >> Postcopy is a very good point, bought!
> >>
> >> But (what you wrote above) it sounds like that this is really what we *have to* do, not an optimization. I‘ll double check the spec tomorrow (hopefully it was documented). We should rephrase the comment then.
> > 
> > Do you have a link to the spec that I could look at? I am not hopeful
> > that this will be listed there since the poison side of QEMU was never
> > implemented. The flag is only here because it was copied over in the
> > kernel header.
> 
> Introducing a feature without
> 
> a) specification what it does
> b) implementation (of both sides) showing what has to be done
> c) any kind of documentation of what needs to be done
> 
> just horrible.
> 
> The latest-greatest spec lives in
> 
> https://github.com/oasis-tcs/virtio-spec.git
> 
> I can't spot any signs of free page hinting/page poisioning. :(

Right. I merged the hinting support in Linux on the hope that
spec and qemu upstream will surface later. It seemed so
since IIRC there were some drafts (even though I don't
have any links to hand). Unfortunately neither happened.



> We should document our result of page poisoning, free page hinting, and
> free page reporting there as well. I hope you'll have time for the latter.
> 
> -------------------------------------------------------------------------
> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
> -------------------------------------------------------------------------
> 
> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
> guest is using page poisoning. Guest writes to the poison_val config
> field to tell host about the page poisoning value that is in use."
> -> Very little information, no signs about what has to be done.

I think it's an informational field. Knowing that free pages
are full of a specific pattern can be handy for the hypervisor
for a variety of reasons. E.g. compression/deduplication?


> "Let the hypervisor know that we are expecting a specific value to be
> written back in balloon pages."



> -> Okay, that talks about "balloon pages", which would include right now
> -- pages "inflated" and then "deflated" using free page hinting
> -- pages "inflated" and then "deflated" using oridnary inflate/deflate
>    queue

ATM, in this case driver calls "free" and that fills page with the
poison value.

> -- pages "inflated" using free page reporting and automatically
>    "deflated" on access
> 
> So VIRTIO_BALLOON_F_PAGE_POISON really means "whenever the guest
> deflates a page (either explicitly, or implicitly with free page
> reporting), it is filled with "poison_val".

It might be a valid optimization to allow driver to skip
poisoning of freed pages in this case.

> And I would add
> 
> "However, if the inflated page was not filled with "poison_val" when
> inflating, it's not predictable if the original page or a page filled
> with "poison_val" is returned."
> 
> Which would cover the "we did not discard the page in the hypervisor, so
> the original page is still there".
> 
> 
> We should also document what is expected to happen if "poison_val" is
> suddenly changed by the guest at one point in time again. (e.g., not
> supported, unexpected things can happen, etc.)

Right. I think we should require that this can only be changed
before features have been negotiated.
That is the only point where hypervisor can still fail
gracefully (i.e. fail FEATURES_OK).


> Also, we might have to
> document that a device reset resets the poison_val to 0. (not sure if
> that's really necessary)

Probably yes, for things like kexec.

> -------------------------------------------------------------------------
> What we have to do in the guest/Linux:
> -------------------------------------------------------------------------
> 
> A guest which relies on this (esp., free page reporting in Linux only,
> right?), has to not use/advertise VIRTIO_BALLOON_F_REPORTING *in case
> VIRTIO_BALLOON_F_PAGE_POISON is not provided* by the host. AFAIKS,
> ordinary inflation/deflation and free page hinting does currently not
> care, as we go via free_page(), so that is currently fine AFAIKs.
> 
> -------------------------------------------------------------------------
> What we have to do in the hypervisor/QEMU:
> -------------------------------------------------------------------------
> 
> With VIRTIO_BALLOON_F_PAGE_POISON, we could provide free page reporting
> easily IFF "page_val"==0. However, as you said, it will currently be
> expensive in case of postcopy, as the guest still zeroes out all pages.
> Document that properly.
> 
> With VIRTIO_BALLOON_F_PAGE_POISON, don't inflate any pages right now
> (not discarding anything), OR fill the pages with poison_val when
> deflating. I guess the latter would be better - even if current Linux
> does not need it, it's what we are expected to do AFAIKS.
> 
> With VIRTIO_BALLOON_F_PAGE_POISON and page_val != 0, discard all free
> page reporting attempts. (== what your patch #3 does). Document that
> properly.
> 
> 
> Makes sense? See below for guest migration thingies.
> 
> > 
> >> I could imagine that we also have to care about ordinary page inflation/deflation when handling page poisoning. (Iow, don‘t inflate/deflate if set for now).
> > 
> > The problem is this was broken from the start for that. The issue is
> > that the poison feature was wrapped up within the page hinting
> > feature. So as a result enabling it for a VM that doesn't recognize
> > the feature independently would likely leave the poison value
> > uninitialized and flagging as though a value of 0 is used. In addition
> > the original poison checking wasn't making sure that the page wasn't
> > init_on_free which has the same effect as page poisoning but isn't
> > page poisoning.
> 
> If the guest agrees to have VIRTIO_BALLOON_F_PAGE_POISON but does not
> initialize poison_val, then it's a guest bug, I wouldn't worry about
> that for now.
> 
> > 
> >>>
> >>> The worst case scenario would be one in which the VM was suspended for
> >>> migration while there were still pages in the balloon that needed to
> >>> be drained. In such a case you would have them in an indeterminate
> >>> state since the last page poisoning for them might have been ignored
> >>> since they were marked as "free", so they are just going to be
> >>> whatever value they were if they had been migrated at all.
> >>>
> >>>>>
> >>>>>>
> >>>>>>> +        return;
> >>>>>>> +    }
> >>>>>>> +
> >>>>>>>     if (s->free_page_report_cmd_id == UINT_MAX) {
> >>>>>>>         s->free_page_report_cmd_id =
> >>>>>>>                        VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> >>>>>>
> >>>>>> We should rename all "free_page_report" stuff we can to
> >>>>>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> >>>>>> side :) ) before adding free page reporting .
> >>>>>>
> >>>>>> (looking at the virtio-balloon linux header, it's also confusing but
> >>>>>> we're stuck with that - maybe we should add better comments)
> >>>>>
> >>>>> Are we stuck? Couldn't we just convert it to an anonymous union with
> >>>>> free_page_hint_cmd_id and then use that where needed?
> >>>>
> >>>> Saw your patch already :)
> >>>>
> >>>>>
> >>>>>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> >>>>>>>     if (s->qemu_4_0_config_size) {
> >>>>>>>         return sizeof(struct virtio_balloon_config);
> >>>>>>>     }
> >>>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> >>>>>>> +    if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> >>>>>>> +        virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>>>>>         return sizeof(struct virtio_balloon_config);
> >>>>>>>     }
> >>>>>>> -    if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>>>>>> -        return offsetof(struct virtio_balloon_config, poison_val);
> >>>>>>> -    }
> >>>>>>
> >>>>>> I am not sure this change is completely sane. Why is that necessary at all?
> >>>>>
> >>>>> The poison_val is stored at the end of the structure and is required
> >>>>> in order to make free page hinting to work. What this change is doing
> >>>>
> >>>> Required to make it work? In the kernel,
> >>>>
> >>>> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
> >>>> Author: Wei Wang <wei.w.wang@intel.com>
> >>>> Date:   Mon Aug 27 09:32:19 2018 +0800
> >>>>
> >>>>    virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
> >>>>
> >>>> was merged after
> >>>>
> >>>> commit 86a559787e6f5cf662c081363f64a20cad654195
> >>>> Author: Wei Wang <wei.w.wang@intel.com>
> >>>> Date:   Mon Aug 27 09:32:17 2018 +0800
> >>>>
> >>>>    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >>>>
> >>>> So I assume it's perfectly fine to not have it.
> >>>>
> >>>> I'd say it's the responsibility of the guest to *not* use
> >>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
> >>>> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
> >>>> into the foot.
> >>>
> >>> Based on the timing I am guessing the page poisoning was in the same
> >>> patch set as the free page hinting since there is only 2 seconds
> >>> between when the two are merged. If I recall the page poisoning logic
> >>> was added after the issue was pointed out that it needed to address
> >>> it.
> >>>
> >>
> >> Yeah, but any other OS implementing this, not caring about page poisoning wouldn‘t need it. Maybe there is something in the spec.
> >>
> >> Mental note: we‘ll have to migrate the poison value if not already done (writing on my mobile).
> > 
> > Right. We need to make sure any poison or init on free is migrated
> > over to the destination before we can say we are going to skip the
> > migration. If anything what I probably ought to look into would be if
> > we could change the code so that if we have a hint the page is unused,
> > poison is enabled, and the value is 0 we just ship over a 0 page
> > instead of giving up on hinting entirely.
> > 
> 
> Yeah, we have to migrate poison_val from source to destination. Also, we
> should worry about us losing the page poisoning feature when migrating
> from source to destination.
> 
> Thinking about it, it might make sense to completely decouple page
> poisoning here. Assign it a separate property (as you did), default
> enable it, but disable it via QEMU compat machines.
> 
> Then, we won't lose the feature when migrating.
> 
> -- 
> Thanks,
> 
> David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-16 14:33                   ` [virtio-dev] " Michael S. Tsirkin
@ 2020-04-16 14:55                     ` David Hildenbrand
  -1 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-16 14:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, qemu-devel, Alexander Duyck, Wang, Wei W,
	David Hildenbrand, Paolo Bonzini

>> We should document our result of page poisoning, free page hinting, and
>> free page reporting there as well. I hope you'll have time for the latter.
>>
>> -------------------------------------------------------------------------
>> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
>> -------------------------------------------------------------------------
>>
>> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
>> guest is using page poisoning. Guest writes to the poison_val config
>> field to tell host about the page poisoning value that is in use."
>> -> Very little information, no signs about what has to be done.
> 
> I think it's an informational field. Knowing that free pages
> are full of a specific pattern can be handy for the hypervisor
> for a variety of reasons. E.g. compression/deduplication?

I was referring to the documentation of the feature and what we
(hypervisor) are expected to do (in regards to inflation/deflation).

Yes, it might be valuable to know that the guest is using poisoning. I
assume compression/deduplication (IOW KSM) will figure out themselves
that such pages are equal.

>> "Let the hypervisor know that we are expecting a specific value to be
>> written back in balloon pages."
> 
> 
> 
>> -> Okay, that talks about "balloon pages", which would include right now
>> -- pages "inflated" and then "deflated" using free page hinting
>> -- pages "inflated" and then "deflated" using oridnary inflate/deflate
>>    queue
> 
> ATM, in this case driver calls "free" and that fills page with the
> poison value.

Yes, that's what I mentioned somehwere, it's currently done by Linux and ...

> 
> It might be a valid optimization to allow driver to skip
> poisoning of freed pages in this case.

... we should prepare for that :)

> 
>> And I would add
>>
>> "However, if the inflated page was not filled with "poison_val" when
>> inflating, it's not predictable if the original page or a page filled
>> with "poison_val" is returned."
>>
>> Which would cover the "we did not discard the page in the hypervisor, so
>> the original page is still there".
>>
>>
>> We should also document what is expected to happen if "poison_val" is
>> suddenly changed by the guest at one point in time again. (e.g., not
>> supported, unexpected things can happen, etc.)
> 
> Right. I think we should require that this can only be changed
> before features have been negotiated.
> That is the only point where hypervisor can still fail
> gracefully (i.e. fail FEATURES_OK).

Agreed.

I can totally understand if Alex would want to stop working on
VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
enable free page reporting in case we don't have
VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-16 14:55                     ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-16 14:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, qemu-devel, Alexander Duyck, Wang, Wei W,
	David Hildenbrand, Paolo Bonzini

>> We should document our result of page poisoning, free page hinting, and
>> free page reporting there as well. I hope you'll have time for the latter.
>>
>> -------------------------------------------------------------------------
>> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
>> -------------------------------------------------------------------------
>>
>> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
>> guest is using page poisoning. Guest writes to the poison_val config
>> field to tell host about the page poisoning value that is in use."
>> -> Very little information, no signs about what has to be done.
> 
> I think it's an informational field. Knowing that free pages
> are full of a specific pattern can be handy for the hypervisor
> for a variety of reasons. E.g. compression/deduplication?

I was referring to the documentation of the feature and what we
(hypervisor) are expected to do (in regards to inflation/deflation).

Yes, it might be valuable to know that the guest is using poisoning. I
assume compression/deduplication (IOW KSM) will figure out themselves
that such pages are equal.

>> "Let the hypervisor know that we are expecting a specific value to be
>> written back in balloon pages."
> 
> 
> 
>> -> Okay, that talks about "balloon pages", which would include right now
>> -- pages "inflated" and then "deflated" using free page hinting
>> -- pages "inflated" and then "deflated" using oridnary inflate/deflate
>>    queue
> 
> ATM, in this case driver calls "free" and that fills page with the
> poison value.

Yes, that's what I mentioned somehwere, it's currently done by Linux and ...

> 
> It might be a valid optimization to allow driver to skip
> poisoning of freed pages in this case.

... we should prepare for that :)

> 
>> And I would add
>>
>> "However, if the inflated page was not filled with "poison_val" when
>> inflating, it's not predictable if the original page or a page filled
>> with "poison_val" is returned."
>>
>> Which would cover the "we did not discard the page in the hypervisor, so
>> the original page is still there".
>>
>>
>> We should also document what is expected to happen if "poison_val" is
>> suddenly changed by the guest at one point in time again. (e.g., not
>> supported, unexpected things can happen, etc.)
> 
> Right. I think we should require that this can only be changed
> before features have been negotiated.
> That is the only point where hypervisor can still fail
> gracefully (i.e. fail FEATURES_OK).

Agreed.

I can totally understand if Alex would want to stop working on
VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
enable free page reporting in case we don't have
VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)

-- 
Thanks,

David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-16 14:55                     ` [virtio-dev] " David Hildenbrand
@ 2020-04-16 18:21                       ` Alexander Duyck
  -1 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-16 18:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: virtio-dev, Michael S. Tsirkin, qemu-devel, Wang, Wei W,
	David Hildenbrand, Paolo Bonzini

On Thu, Apr 16, 2020 at 7:55 AM David Hildenbrand <david@redhat.com> wrote:
>
> >> We should document our result of page poisoning, free page hinting, and
> >> free page reporting there as well. I hope you'll have time for the latter.
> >>
> >> -------------------------------------------------------------------------
> >> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
> >> -------------------------------------------------------------------------
> >>
> >> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
> >> guest is using page poisoning. Guest writes to the poison_val config
> >> field to tell host about the page poisoning value that is in use."
> >> -> Very little information, no signs about what has to be done.
> >
> > I think it's an informational field. Knowing that free pages
> > are full of a specific pattern can be handy for the hypervisor
> > for a variety of reasons. E.g. compression/deduplication?
>
> I was referring to the documentation of the feature and what we
> (hypervisor) are expected to do (in regards to inflation/deflation).
>
> Yes, it might be valuable to know that the guest is using poisoning. I
> assume compression/deduplication (IOW KSM) will figure out themselves
> that such pages are equal.

The other thing to keep in mind is that the poison value only really
comes into play with hinting/reporting. In the case of the standard
balloon the pages are considered allocated from the guest's
perspective until the balloon is deflated. Then any poison/init will
occur over again anyway so I don't think the standard balloon should
really care.

For hinting it somewhat depends. Currently the implementation is
inflating a balloon so having poisoning or init_on_free means it is
written to immediately after it is freed so it defeats the purpose of
the hinting. However that is a Linux implementation issue, not
necessarily an issue with the QEMU implementation. As such may be I
should fix that in the Linux driver since that has been ignored in
QEMU up until now anyway. The more interesting bit is what should the
behavior be from the hypervisor when a page is marked as being hinted.
I think right now the behavior is to just not migrate the page. I
wonder though if we shouldn't instead just consider the page a zero
page, and then maybe modify the zero page behavior for the case where
we know page poisoning is enabled.

For reporting it is a matter of tracking the contents. We don't want
to modify the contents in any way as we are attempting to essentially
do in-place tracking of the page. So if it is poisoned or initialized
it needs to stay in that state so we cannot invalidate the page if
doing so will cause it to lose state information.

> >> "Let the hypervisor know that we are expecting a specific value to be
> >> written back in balloon pages."
> >
> >
> >
> >> -> Okay, that talks about "balloon pages", which would include right now
> >> -- pages "inflated" and then "deflated" using free page hinting
> >> -- pages "inflated" and then "deflated" using oridnary inflate/deflate
> >>    queue
> >
> > ATM, in this case driver calls "free" and that fills page with the
> > poison value.
>
> Yes, that's what I mentioned somehwere, it's currently done by Linux and ...
>
> >
> > It might be a valid optimization to allow driver to skip
> > poisoning of freed pages in this case.
>
> ... we should prepare for that :)

Agreed.

> >
> >> And I would add
> >>
> >> "However, if the inflated page was not filled with "poison_val" when
> >> inflating, it's not predictable if the original page or a page filled
> >> with "poison_val" is returned."
> >>
> >> Which would cover the "we did not discard the page in the hypervisor, so
> >> the original page is still there".
> >>
> >>
> >> We should also document what is expected to happen if "poison_val" is
> >> suddenly changed by the guest at one point in time again. (e.g., not
> >> supported, unexpected things can happen, etc.)
> >
> > Right. I think we should require that this can only be changed
> > before features have been negotiated.
> > That is the only point where hypervisor can still fail
> > gracefully (i.e. fail FEATURES_OK).
>
> Agreed.

I believe that is the current behavior. Essentially if poisoning
enabled then the feature flag is left set. I think the one change I
will make in the driver is that if poisoning is enabled in the kernel,
but PAGE_POISON is not available as a feature, I am going to disable
both the reporting and hinting features in virtballoon_validate.

> I can totally understand if Alex would want to stop working on
> VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
> enable free page reporting in case we don't have
> VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)

I already have a patch for that.

The bigger issue is how to deal with the PAGE_POISON being enabled
with FREE_PAGE_HINTING. The legacy code at this point is just broken
and is plowing through with FREE_PAGE_HINTING while it is enabled.
That is safe for now because it is using a balloon, the side effect is
that it is going to defer migration. If it switches to a page
reporting type setup at some point in the future we would need to make
sure something is written to the other end to identify the poison/zero
pages.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-16 18:21                       ` Alexander Duyck
  0 siblings, 0 replies; 45+ messages in thread
From: Alexander Duyck @ 2020-04-16 18:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michael S. Tsirkin, virtio-dev, qemu-devel, Wang, Wei W,
	David Hildenbrand, Paolo Bonzini

On Thu, Apr 16, 2020 at 7:55 AM David Hildenbrand <david@redhat.com> wrote:
>
> >> We should document our result of page poisoning, free page hinting, and
> >> free page reporting there as well. I hope you'll have time for the latter.
> >>
> >> -------------------------------------------------------------------------
> >> Semantics of VIRTIO_BALLOON_F_PAGE_POISON
> >> -------------------------------------------------------------------------
> >>
> >> "The VIRTIO_BALLOON_F_PAGE_POISON feature bit is used to indicate if the
> >> guest is using page poisoning. Guest writes to the poison_val config
> >> field to tell host about the page poisoning value that is in use."
> >> -> Very little information, no signs about what has to be done.
> >
> > I think it's an informational field. Knowing that free pages
> > are full of a specific pattern can be handy for the hypervisor
> > for a variety of reasons. E.g. compression/deduplication?
>
> I was referring to the documentation of the feature and what we
> (hypervisor) are expected to do (in regards to inflation/deflation).
>
> Yes, it might be valuable to know that the guest is using poisoning. I
> assume compression/deduplication (IOW KSM) will figure out themselves
> that such pages are equal.

The other thing to keep in mind is that the poison value only really
comes into play with hinting/reporting. In the case of the standard
balloon the pages are considered allocated from the guest's
perspective until the balloon is deflated. Then any poison/init will
occur over again anyway so I don't think the standard balloon should
really care.

For hinting it somewhat depends. Currently the implementation is
inflating a balloon so having poisoning or init_on_free means it is
written to immediately after it is freed so it defeats the purpose of
the hinting. However that is a Linux implementation issue, not
necessarily an issue with the QEMU implementation. As such may be I
should fix that in the Linux driver since that has been ignored in
QEMU up until now anyway. The more interesting bit is what should the
behavior be from the hypervisor when a page is marked as being hinted.
I think right now the behavior is to just not migrate the page. I
wonder though if we shouldn't instead just consider the page a zero
page, and then maybe modify the zero page behavior for the case where
we know page poisoning is enabled.

For reporting it is a matter of tracking the contents. We don't want
to modify the contents in any way as we are attempting to essentially
do in-place tracking of the page. So if it is poisoned or initialized
it needs to stay in that state so we cannot invalidate the page if
doing so will cause it to lose state information.

> >> "Let the hypervisor know that we are expecting a specific value to be
> >> written back in balloon pages."
> >
> >
> >
> >> -> Okay, that talks about "balloon pages", which would include right now
> >> -- pages "inflated" and then "deflated" using free page hinting
> >> -- pages "inflated" and then "deflated" using oridnary inflate/deflate
> >>    queue
> >
> > ATM, in this case driver calls "free" and that fills page with the
> > poison value.
>
> Yes, that's what I mentioned somehwere, it's currently done by Linux and ...
>
> >
> > It might be a valid optimization to allow driver to skip
> > poisoning of freed pages in this case.
>
> ... we should prepare for that :)

Agreed.

> >
> >> And I would add
> >>
> >> "However, if the inflated page was not filled with "poison_val" when
> >> inflating, it's not predictable if the original page or a page filled
> >> with "poison_val" is returned."
> >>
> >> Which would cover the "we did not discard the page in the hypervisor, so
> >> the original page is still there".
> >>
> >>
> >> We should also document what is expected to happen if "poison_val" is
> >> suddenly changed by the guest at one point in time again. (e.g., not
> >> supported, unexpected things can happen, etc.)
> >
> > Right. I think we should require that this can only be changed
> > before features have been negotiated.
> > That is the only point where hypervisor can still fail
> > gracefully (i.e. fail FEATURES_OK).
>
> Agreed.

I believe that is the current behavior. Essentially if poisoning
enabled then the feature flag is left set. I think the one change I
will make in the driver is that if poisoning is enabled in the kernel,
but PAGE_POISON is not available as a feature, I am going to disable
both the reporting and hinting features in virtballoon_validate.

> I can totally understand if Alex would want to stop working on
> VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
> enable free page reporting in case we don't have
> VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)

I already have a patch for that.

The bigger issue is how to deal with the PAGE_POISON being enabled
with FREE_PAGE_HINTING. The legacy code at this point is just broken
and is plowing through with FREE_PAGE_HINTING while it is enabled.
That is safe for now because it is using a balloon, the side effect is
that it is going to defer migration. If it switches to a page
reporting type setup at some point in the future we would need to make
sure something is written to the other end to identify the poison/zero
pages.

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
  2020-04-16 18:21                       ` [virtio-dev] " Alexander Duyck
@ 2020-04-16 18:33                         ` David Hildenbrand
  -1 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-16 18:33 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: virtio-dev, Michael S. Tsirkin, qemu-devel, Wang, Wei W,
	David Hildenbrand, Paolo Bonzini

> The other thing to keep in mind is that the poison value only really
> comes into play with hinting/reporting. In the case of the standard
> balloon the pages are considered allocated from the guest's

Currently just as free page hinting IMHO. They are temporarily
considered allocated.

> perspective until the balloon is deflated. Then any poison/init will
> occur over again anyway so I don't think the standard balloon should
> really care.

I think we should make this consistent. And as we discuss below, this
allows for a nice optimization in the guest even for ordinary
inflation/deflation (no need to zero out/poison again when giving the
pages back to the buddy).

> 
> For hinting it somewhat depends. Currently the implementation is
> inflating a balloon so having poisoning or init_on_free means it is
> written to immediately after it is freed so it defeats the purpose of
> the hinting. However that is a Linux implementation issue, not

Yeah, and as we discuss below, we can optimize that later in Linux. It's
sub-optimal, I agree.

> necessarily an issue with the QEMU implementation. As such may be I
> should fix that in the Linux driver since that has been ignored in
> QEMU up until now anyway. The more interesting bit is what should the
> behavior be from the hypervisor when a page is marked as being hinted.
> I think right now the behavior is to just not migrate the page. I
> wonder though if we shouldn't instead just consider the page a zero
> page, and then maybe modify the zero page behavior for the case where
> we know page poisoning is enabled.

I consider that maybe future work. Let's keep it simple for now (iow,
try to get page poisoning handling right first). The optimize the guest
handling on balloon deflation / end of free page hinting.

[...]

>> I can totally understand if Alex would want to stop working on
>> VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
>> enable free page reporting in case we don't have
>> VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)
> 
> I already have a patch for that.
> 
> The bigger issue is how to deal with the PAGE_POISON being enabled
> with FREE_PAGE_HINTING. The legacy code at this point is just broken
> and is plowing through with FREE_PAGE_HINTING while it is enabled.
> That is safe for now because it is using a balloon, the side effect is
> that it is going to defer migration. If it switches to a page
> reporting type setup at some point in the future we would need to make
> sure something is written to the other end to identify the poison/zero
> pages.


I think we don't have to worry about that for now. Might be sub-optimal,
but then, I don't think actual page poisoning isn't all that frequently
used in production setups.


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
@ 2020-04-16 18:33                         ` David Hildenbrand
  0 siblings, 0 replies; 45+ messages in thread
From: David Hildenbrand @ 2020-04-16 18:33 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Michael S. Tsirkin, virtio-dev, qemu-devel, Wang, Wei W,
	David Hildenbrand, Paolo Bonzini

> The other thing to keep in mind is that the poison value only really
> comes into play with hinting/reporting. In the case of the standard
> balloon the pages are considered allocated from the guest's

Currently just as free page hinting IMHO. They are temporarily
considered allocated.

> perspective until the balloon is deflated. Then any poison/init will
> occur over again anyway so I don't think the standard balloon should
> really care.

I think we should make this consistent. And as we discuss below, this
allows for a nice optimization in the guest even for ordinary
inflation/deflation (no need to zero out/poison again when giving the
pages back to the buddy).

> 
> For hinting it somewhat depends. Currently the implementation is
> inflating a balloon so having poisoning or init_on_free means it is
> written to immediately after it is freed so it defeats the purpose of
> the hinting. However that is a Linux implementation issue, not

Yeah, and as we discuss below, we can optimize that later in Linux. It's
sub-optimal, I agree.

> necessarily an issue with the QEMU implementation. As such may be I
> should fix that in the Linux driver since that has been ignored in
> QEMU up until now anyway. The more interesting bit is what should the
> behavior be from the hypervisor when a page is marked as being hinted.
> I think right now the behavior is to just not migrate the page. I
> wonder though if we shouldn't instead just consider the page a zero
> page, and then maybe modify the zero page behavior for the case where
> we know page poisoning is enabled.

I consider that maybe future work. Let's keep it simple for now (iow,
try to get page poisoning handling right first). The optimize the guest
handling on balloon deflation / end of free page hinting.

[...]

>> I can totally understand if Alex would want to stop working on
>> VIRTIO_BALLOON_F_PAGE_POISON at this point and only fix the guest to not
>> enable free page reporting in case we don't have
>> VIRTIO_BALLOON_F_PAGE_POISON (unless that's already done), lol. :)
> 
> I already have a patch for that.
> 
> The bigger issue is how to deal with the PAGE_POISON being enabled
> with FREE_PAGE_HINTING. The legacy code at this point is just broken
> and is plowing through with FREE_PAGE_HINTING while it is enabled.
> That is safe for now because it is using a balloon, the side effect is
> that it is going to defer migration. If it switches to a page
> reporting type setup at some point in the future we would need to make
> sure something is written to the other end to identify the poison/zero
> pages.


I think we don't have to worry about that for now. Might be sub-optimal,
but then, I don't think actual page poisoning isn't all that frequently
used in production setups.


-- 
Thanks,

David / dhildenb


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2020-04-16 18:35 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-10  3:41 [PATCH v19 QEMU 0/4] virtio-balloon: add support for free page reporting Alexander Duyck
2020-04-10  3:41 ` [virtio-dev] " Alexander Duyck
2020-04-10  3:41 ` [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature Alexander Duyck
2020-04-10  3:41   ` [virtio-dev] " Alexander Duyck
2020-04-15  8:08   ` David Hildenbrand
2020-04-15  8:08     ` [virtio-dev] " David Hildenbrand
2020-04-15 17:17     ` Alexander Duyck
2020-04-15 17:17       ` [virtio-dev] " Alexander Duyck
2020-04-15 18:16       ` David Hildenbrand
2020-04-15 18:16         ` [virtio-dev] " David Hildenbrand
2020-04-15 19:28         ` Alexander Duyck
2020-04-15 19:28           ` [virtio-dev] " Alexander Duyck
2020-04-15 19:46           ` David Hildenbrand
2020-04-15 21:16             ` Alexander Duyck
2020-04-15 21:16               ` [virtio-dev] " Alexander Duyck
2020-04-16  8:18               ` David Hildenbrand
2020-04-16  8:18                 ` [virtio-dev] " David Hildenbrand
2020-04-16  8:36                 ` David Hildenbrand
2020-04-16  8:36                   ` [virtio-dev] " David Hildenbrand
2020-04-16 14:33                 ` Michael S. Tsirkin
2020-04-16 14:33                   ` [virtio-dev] " Michael S. Tsirkin
2020-04-16 14:55                   ` David Hildenbrand
2020-04-16 14:55                     ` [virtio-dev] " David Hildenbrand
2020-04-16 18:21                     ` Alexander Duyck
2020-04-16 18:21                       ` [virtio-dev] " Alexander Duyck
2020-04-16 18:33                       ` David Hildenbrand
2020-04-16 18:33                         ` [virtio-dev] " David Hildenbrand
2020-04-10  3:41 ` [PATCH v19 QEMU 2/4] linux-headers: update to contain virito-balloon free page reporting Alexander Duyck
2020-04-10  3:41   ` [virtio-dev] " Alexander Duyck
2020-04-10  3:41 ` [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for " Alexander Duyck
2020-04-10  3:41   ` [virtio-dev] " Alexander Duyck
2020-04-15  8:17   ` David Hildenbrand
2020-04-15  8:17     ` [virtio-dev] " David Hildenbrand
2020-04-15  9:03     ` David Hildenbrand
2020-04-15  9:03       ` [virtio-dev] " David Hildenbrand
2020-04-15 15:31     ` Alexander Duyck
2020-04-15 15:31       ` [virtio-dev] " Alexander Duyck
2020-04-10  3:41 ` [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions Alexander Duyck
2020-04-10  3:41   ` [virtio-dev] " Alexander Duyck
2020-04-10 10:50   ` Paolo Bonzini
2020-04-10 10:50     ` [virtio-dev] " Paolo Bonzini
2020-04-13 22:48     ` Alexander Duyck
2020-04-13 22:48       ` [virtio-dev] " Alexander Duyck
2020-04-14  7:36       ` David Hildenbrand
2020-04-14  7:36         ` [virtio-dev] " David Hildenbrand

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.