All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFCv2 for-4.0 0/5] Improve balloon handling of pagesizes other than 4kiB
@ 2018-12-05  5:06 David Gibson
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate David Gibson
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: David Gibson @ 2018-12-05  5:06 UTC (permalink / raw)
  To: dhildenb, imammedo, ehabkost
  Cc: mst, pbonzini, qemu-devel, qemu-ppc, David Gibson

The virtio-balloon devices was never really thought out for cases
other than 4kiB pagesize on both guest and host.  It works in some
cases, but in others can be ineffectual or even cause guest memory
corruption.

This series makes a handful of preliminary cleanups, then makes a
change to safely, though not perfectly, handle cases with non 4kiB
pagesizes.

Changes since RFCv1:
 * Tweak warning reports in 5/5
 * More detailed motiviation in commit message for 1/5
 * Assorted minor adjustments

David Gibson (5):
  virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
  virtio-balloon: Corrections to address verification
  virtio-balloon: Rework ballon_page() interface
  virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
  virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size

 hw/virtio/virtio-balloon.c         | 100 ++++++++++++++++++++++++-----
 include/hw/virtio/virtio-balloon.h |   3 +
 2 files changed, 87 insertions(+), 16 deletions(-)

-- 
2.19.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Qemu-devel] [RFCv2 for-4.0 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate
  2018-12-05  5:06 [Qemu-devel] [RFCv2 for-4.0 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
@ 2018-12-05  5:06 ` David Gibson
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 2/5] virtio-balloon: Corrections to address verification David Gibson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2018-12-05  5:06 UTC (permalink / raw)
  To: dhildenb, imammedo, ehabkost
  Cc: mst, pbonzini, qemu-devel, qemu-ppc, David Gibson

When the balloon is inflated, we discard memory place in it using madvise()
with MADV_DONTNEED.  And when we deflate it we use MADV_WILLNEED, which
sounds like it makes sense but is actually unnecessary.

The misleadingly named MADV_DONTNEED just discards the memory in question,
it doesn't set any persistent state on it in-kernel; all that's necessary
to bring the memory back is to touch it.  MADV_WILLNEED in contrast
specifically says that the memory will be used soon and faults it in.

This patch simplify's the balloon operation by dropping the madvise()
on deflate.  This might have an impact on performance - it will move a
delay at deflate time until that memory is actually touched, which
might be more latency sensitive.  However:

  * Memory that's being given back to the guest by deflating the
    balloon *might* be used soon, but it equally could just sit around
    in the guest's pools until needed (or even be faulted out again if
    the host is under memory pressure).

  * Usually, the timescale over which you'll be adjusting the balloon
    is long enough that a few extra faults after deflation aren't
    going to make a difference.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1728e4f83a..6ec4bcf4e1 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -35,9 +35,8 @@
 
 static void balloon_page(void *addr, int deflate)
 {
-    if (!qemu_balloon_is_inhibited()) {
-        qemu_madvise(addr, BALLOON_PAGE_SIZE,
-                deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
+    if (!qemu_balloon_is_inhibited() && !deflate) {
+        qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
     }
 }
 
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [RFCv2 for-4.0 2/5] virtio-balloon: Corrections to address verification
  2018-12-05  5:06 [Qemu-devel] [RFCv2 for-4.0 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate David Gibson
@ 2018-12-05  5:06 ` David Gibson
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 3/5] virtio-balloon: Rework ballon_page() interface David Gibson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2018-12-05  5:06 UTC (permalink / raw)
  To: dhildenb, imammedo, ehabkost
  Cc: mst, pbonzini, qemu-devel, qemu-ppc, David Gibson

The virtio-balloon device's verification of the address given to it by the
guest has a number of faults:
    * The addresses here are guest physical addresses, which should be
      'hwaddr' rather than 'ram_addr_t' (the distinction is admittedly
      pretty subtle and confusing)
    * We don't check for section.mr being NULL, which is the main way that
      memory_region_find() reports basic failures.  We really need to check
      that before looking at any other section fields, because
      memory_region_find() doesn't initialize them on the failure path
    * We're passing a length of '1' to memory_region_find(), but really the
      guest is requesting that we put the entire page into the balloon,
      so it makes more sense to call it with BALLOON_PAGE_SIZE

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 6ec4bcf4e1..e8611aab0e 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -221,17 +221,20 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
         }
 
         while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
-            ram_addr_t pa;
-            ram_addr_t addr;
+            hwaddr pa;
+            hwaddr addr;
             int p = virtio_ldl_p(vdev, &pfn);
 
-            pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
+            pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
             offset += 4;
 
-            /* FIXME: remove get_system_memory(), but how? */
-            section = memory_region_find(get_system_memory(), pa, 1);
-            if (!int128_nz(section.size) ||
-                !memory_region_is_ram(section.mr) ||
+            section = memory_region_find(get_system_memory(), pa,
+                                         BALLOON_PAGE_SIZE);
+            if (!section.mr) {
+                trace_virtio_balloon_bad_addr(pa);
+                continue;
+            }
+            if (!memory_region_is_ram(section.mr) ||
                 memory_region_is_rom(section.mr) ||
                 memory_region_is_romd(section.mr)) {
                 trace_virtio_balloon_bad_addr(pa);
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [RFCv2 for-4.0 3/5] virtio-balloon: Rework ballon_page() interface
  2018-12-05  5:06 [Qemu-devel] [RFCv2 for-4.0 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate David Gibson
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 2/5] virtio-balloon: Corrections to address verification David Gibson
@ 2018-12-05  5:06 ` David Gibson
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise() David Gibson
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size David Gibson
  4 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2018-12-05  5:06 UTC (permalink / raw)
  To: dhildenb, imammedo, ehabkost
  Cc: mst, pbonzini, qemu-devel, qemu-ppc, David Gibson

This replaces the balloon_page() internal interface with
ballon_inflate_page(), with a slightly different interface.  The new
interface will make future alterations simpler.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index e8611aab0e..c3a19aa27d 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -33,11 +33,12 @@
 
 #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
 
-static void balloon_page(void *addr, int deflate)
+static void balloon_inflate_page(VirtIOBalloon *balloon,
+                                 MemoryRegion *mr, hwaddr offset)
 {
-    if (!qemu_balloon_is_inhibited() && !deflate) {
-        qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
-    }
+    void *addr = memory_region_get_ram_ptr(mr) + offset;
+
+    qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
 }
 
 static const char *balloon_stat_names[] = {
@@ -222,7 +223,6 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 
         while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
             hwaddr pa;
-            hwaddr addr;
             int p = virtio_ldl_p(vdev, &pfn);
 
             pa = (hwaddr) p << VIRTIO_BALLOON_PFN_SHIFT;
@@ -244,11 +244,9 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 
             trace_virtio_balloon_handle_output(memory_region_name(section.mr),
                                                pa);
-            /* Using memory_region_get_ram_ptr is bending the rules a bit, but
-               should be OK because we only want a single page.  */
-            addr = section.offset_within_region;
-            balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
-                         !!(vq == s->dvq));
+            if (!qemu_balloon_is_inhibited() && vq != s->dvq) {
+                balloon_inflate_page(s, section.mr, section.offset_within_region);
+            }
             memory_region_unref(section.mr);
         }
 
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
  2018-12-05  5:06 [Qemu-devel] [RFCv2 for-4.0 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
                   ` (2 preceding siblings ...)
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 3/5] virtio-balloon: Rework ballon_page() interface David Gibson
@ 2018-12-05  5:06 ` David Gibson
  2018-12-05  7:59   ` David Hildenbrand
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size David Gibson
  4 siblings, 1 reply; 8+ messages in thread
From: David Gibson @ 2018-12-05  5:06 UTC (permalink / raw)
  To: dhildenb, imammedo, ehabkost
  Cc: mst, pbonzini, qemu-devel, qemu-ppc, David Gibson

Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually
discard RAM pages inserted into the balloon.  This is basically a Linux
only interface (MADV_DONTNEED exists on some other platforms, but doesn't
always have the same semantics).  It also doesn't work on hugepages and has
some other limitations.

It turns out that postcopy also needs to discard chunks of memory, and uses
a better interface for it: ram_block_discard_range().  It doesn't cover
every case, but it covers more than going direct to madvise() and this
gives us a single place to update for more possibilities in future.

There are some subtleties here to maintain the current balloon behaviour:

* For now, we just ignore requests to balloon in a hugepage backed region.
  That matches current behaviour, because MADV_DONTNEED on a hugepage would
  simply fail, and we ignore the error.

* If host page size is > BALLOON_PAGE_SIZE we can frequently call this on
  non-host-page-aligned addresses.  These would also fail in madvise(),
  which we then ignored.  ram_block_discard_range() error_report()s calls
  on unaligned addresses, so we explicitly check that case to avoid
  spamming the logs.

* We now call ram_block_discard_range() with the *host* page size, whereas
  we previously called madvise() with BALLOON_PAGE_SIZE.  Surprisingly,
  this also matches existing behaviour.  Although the kernel fails madvise
  on unaligned addresses, it will round unaligned sizes *up* to the host
  page size.  Yes, this means that if BALLOON_PAGE_SIZE < guest page size
  we can incorrectly discard more memory than the guest asked us to.  I'm
  planning to address that soon.

Errors other than the ones discussed above, will now be reported by
ram_block_discard_range(), rather than silently ignored, which means we
have a much better chance of seeing when something is going wrong.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index c3a19aa27d..4435905c87 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -37,8 +37,29 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
                                  MemoryRegion *mr, hwaddr offset)
 {
     void *addr = memory_region_get_ram_ptr(mr) + offset;
+    RAMBlock *rb;
+    size_t rb_page_size;
+    ram_addr_t ram_offset;
 
-    qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
+    /* XXX is there a better way to get to the RAMBlock than via a
+     * host address? */
+    rb = qemu_ram_block_from_host(addr, false, &ram_offset);
+    rb_page_size = qemu_ram_pagesize(rb);
+
+    /* Silently ignore hugepage RAM blocks */
+    if (rb_page_size != getpagesize()) {
+        return;
+    }
+
+    /* Silently ignore unaligned requests */
+    if (ram_offset & (rb_page_size - 1)) {
+        return;
+    }
+
+    ram_block_discard_range(rb, ram_offset, rb_page_size);
+    /* We ignore errors from ram_block_discard_range(), because it has
+     * already reported them, and failing to discard a balloon page is
+     * not fatal */
 }
 
 static const char *balloon_stat_names[] = {
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [RFCv2 for-4.0 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
  2018-12-05  5:06 [Qemu-devel] [RFCv2 for-4.0 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
                   ` (3 preceding siblings ...)
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise() David Gibson
@ 2018-12-05  5:06 ` David Gibson
  4 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2018-12-05  5:06 UTC (permalink / raw)
  To: dhildenb, imammedo, ehabkost
  Cc: mst, pbonzini, qemu-devel, qemu-ppc, David Gibson

The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
we can only actually discard memory in units of the host page size.

Now, we handle this very badly: we silently ignore balloon requests that
aren't host page aligned, and for requests that are host page aligned we
discard the entire host page.  The latter can corrupt guest memory if its
page size is smaller than the host's.

The obvious choice would be to disable the balloon if the host page size is
not 4kiB.  However, that would break the special case where host and guest
have the same page size, but that's larger than 4kiB.  That case currently
works by accident[1] - and is used in practice on many production POWER
systems where 64kiB has long been the Linux default page size on both host
and guest.

To make the balloon safe, without breaking that useful special case, we
need to accumulate 4kiB balloon requests until we have a whole contiguous
host page to discard.

We could in principle do that across all guest memory, but it would require
a large bitmap to track.  This patch represents a compromise: we track
ballooned subpages for a single contiguous host page at a time.  This means
that if the guest discards all 4kiB chunks of a host page in succession,
we will discard it.  This is the expected behaviour in the (host page) ==
(guest page) != 4kiB case we want to support.

If the guest scatters 4kiB requests across different host pages, we don't
discard anything, and issue a warning.  Not ideal, but at least we don't
corrupt guest memory as the previous version could.

Warning reporting is kind of a compromise here.  Determining whether we're
in a problematic state at realize() time is tricky, because we'd have to
look at the host pagesizes of all memory backends, but we can't really know
if some of those backends could be for special purpose memory that's not
subject to ballooning.

Reporting only when the guest tries to balloon a partial page also isn't
great because if the guest page size happens to line up it won't indicate
that we're in a non ideal situation.  It could also cause alarming repeated
warnings whenever a migration is attempted.

So, what we do is warn the first time the guest attempts balloon a partial
host page, whether or not it will end up ballooning the rest of the page
immediately afterwards.

[1] Because when the guest attempts to balloon a page, it will submit
    requests for each 4kiB subpage.  Most will be ignored, but the one
    which happens to be host page aligned will discard the whole lot.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/virtio/virtio-balloon.c         | 67 +++++++++++++++++++++++++-----
 include/hw/virtio/virtio-balloon.h |  3 ++
 2 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 4435905c87..39573ef2e3 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -33,33 +33,80 @@
 
 #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
 
+typedef struct PartiallyBalloonedPage {
+    RAMBlock *rb;
+    ram_addr_t base;
+    unsigned long bitmap[];
+} PartiallyBalloonedPage;
+
 static void balloon_inflate_page(VirtIOBalloon *balloon,
                                  MemoryRegion *mr, hwaddr offset)
 {
     void *addr = memory_region_get_ram_ptr(mr) + offset;
     RAMBlock *rb;
     size_t rb_page_size;
-    ram_addr_t ram_offset;
+    int subpages;
+    ram_addr_t ram_offset, host_page_base;
 
     /* XXX is there a better way to get to the RAMBlock than via a
      * host address? */
     rb = qemu_ram_block_from_host(addr, false, &ram_offset);
     rb_page_size = qemu_ram_pagesize(rb);
+    host_page_base = ram_offset & ~(rb_page_size - 1);
+
+    if (rb_page_size == BALLOON_PAGE_SIZE) {
+        /* Easy case */
 
-    /* Silently ignore hugepage RAM blocks */
-    if (rb_page_size != getpagesize()) {
+        ram_block_discard_range(rb, ram_offset, rb_page_size);
+        /* We ignore errors from ram_block_discard_range(), because it
+         * has already reported them, and failing to discard a balloon
+         * page is not fatal */
         return;
     }
 
-    /* Silently ignore unaligned requests */
-    if (ram_offset & (rb_page_size - 1)) {
-        return;
+    /* Hard case
+     *
+     * We've put a piece of a larger host page into the balloon - we
+     * need to keep track until we have a whole host page to
+     * discard
+     */
+    subpages = rb_page_size / BALLOON_PAGE_SIZE;
+
+    if (balloon->pbp
+        && (rb != balloon->pbp->rb
+            || host_page_base != balloon->pbp->base)) {
+        /* We've partially ballooned part of a host page, but now
+         * we're trying to balloon part of a different one.  Too hard,
+         * give up on the old partial page */
+        warn_report("Unable to insert a partial page into virtio-balloon");
+        free(balloon->pbp);
+        balloon->pbp = NULL;
     }
 
-    ram_block_discard_range(rb, ram_offset, rb_page_size);
-    /* We ignore errors from ram_block_discard_range(), because it has
-     * already reported them, and failing to discard a balloon page is
-     * not fatal */
+    if (!balloon->pbp) {
+        /* Starting on a new host page */
+        size_t bitlen = BITS_TO_LONGS(subpages) * sizeof(unsigned long);
+        balloon->pbp = g_malloc0(sizeof(PartiallyBalloonedPage) + bitlen);
+        balloon->pbp->rb = rb;
+        balloon->pbp->base = host_page_base;
+    }
+
+    bitmap_set(balloon->pbp->bitmap,
+               (ram_offset - balloon->pbp->base) / BALLOON_PAGE_SIZE,
+               subpages);
+
+    if (bitmap_full(balloon->pbp->bitmap, subpages)) {
+        /* We've accumulated a full host page, we can actually discard
+         * it now */
+
+        ram_block_discard_range(rb, balloon->pbp->base, rb_page_size);
+        /* We ignore errors from ram_block_discard_range(), because it
+         * has already reported them, and failing to discard a balloon
+         * page is not fatal */
+
+        free(balloon->pbp);
+        balloon->pbp = NULL;
+    }
 }
 
 static const char *balloon_stat_names[] = {
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index e0df3528c8..99dcd6d105 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -30,6 +30,8 @@ typedef struct virtio_balloon_stat_modern {
        uint64_t val;
 } VirtIOBalloonStatModern;
 
+typedef struct PartiallyBalloonedPage PartiallyBalloonedPage;
+
 typedef struct VirtIOBalloon {
     VirtIODevice parent_obj;
     VirtQueue *ivq, *dvq, *svq;
@@ -42,6 +44,7 @@ typedef struct VirtIOBalloon {
     int64_t stats_last_update;
     int64_t stats_poll_interval;
     uint32_t host_features;
+    PartiallyBalloonedPage *pbp;
 } VirtIOBalloon;
 
 #endif
-- 
2.19.2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
  2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise() David Gibson
@ 2018-12-05  7:59   ` David Hildenbrand
  2018-12-05 22:56     ` David Gibson
  0 siblings, 1 reply; 8+ messages in thread
From: David Hildenbrand @ 2018-12-05  7:59 UTC (permalink / raw)
  To: David Gibson, dhildenb, imammedo, ehabkost
  Cc: mst, pbonzini, qemu-devel, qemu-ppc

On 05.12.18 06:06, David Gibson wrote:
> Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually
> discard RAM pages inserted into the balloon.  This is basically a Linux
> only interface (MADV_DONTNEED exists on some other platforms, but doesn't
> always have the same semantics).  It also doesn't work on hugepages and has
> some other limitations.
> 
> It turns out that postcopy also needs to discard chunks of memory, and uses
> a better interface for it: ram_block_discard_range().  It doesn't cover
> every case, but it covers more than going direct to madvise() and this
> gives us a single place to update for more possibilities in future.
> 
> There are some subtleties here to maintain the current balloon behaviour:
> 
> * For now, we just ignore requests to balloon in a hugepage backed region.
>   That matches current behaviour, because MADV_DONTNEED on a hugepage would
>   simply fail, and we ignore the error.
> 
> * If host page size is > BALLOON_PAGE_SIZE we can frequently call this on
>   non-host-page-aligned addresses.  These would also fail in madvise(),
>   which we then ignored.  ram_block_discard_range() error_report()s calls
>   on unaligned addresses, so we explicitly check that case to avoid
>   spamming the logs.
> 
> * We now call ram_block_discard_range() with the *host* page size, whereas
>   we previously called madvise() with BALLOON_PAGE_SIZE.  Surprisingly,
>   this also matches existing behaviour.  Although the kernel fails madvise
>   on unaligned addresses, it will round unaligned sizes *up* to the host
>   page size.  Yes, this means that if BALLOON_PAGE_SIZE < guest page size
>   we can incorrectly discard more memory than the guest asked us to.  I'm
>   planning to address that soon.
> 
> Errors other than the ones discussed above, will now be reported by
> ram_block_discard_range(), rather than silently ignored, which means we
> have a much better chance of seeing when something is going wrong.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/virtio/virtio-balloon.c | 23 ++++++++++++++++++++++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index c3a19aa27d..4435905c87 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -37,8 +37,29 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
>                                   MemoryRegion *mr, hwaddr offset)
>  {
>      void *addr = memory_region_get_ram_ptr(mr) + offset;
> +    RAMBlock *rb;
> +    size_t rb_page_size;
> +    ram_addr_t ram_offset;
>  
> -    qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
> +    /* XXX is there a better way to get to the RAMBlock than via a
> +     * host address? */

We have qemu_get_ram_block(). That one should work as long as we know
that it is a valid guest ram address. (would have to make it !static)

Then we would only have to pass to this function the original ram_addr_t
handed over by the guest (which looks somewhat cleaner to me than going
via memory regions)

> +    rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> +    rb_page_size = qemu_ram_pagesize(rb);
> +
> +    /* Silently ignore hugepage RAM blocks */
> +    if (rb_page_size != getpagesize()) {
> +        return;
> +    }
> +
> +    /* Silently ignore unaligned requests */
> +    if (ram_offset & (rb_page_size - 1)) {
> +        return;
> +    }
> +
> +    ram_block_discard_range(rb, ram_offset, rb_page_size);
> +    /* We ignore errors from ram_block_discard_range(), because it has
> +     * already reported them, and failing to discard a balloon page is
> +     * not fatal */
>  }
>  
>  static const char *balloon_stat_names[] = {
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise()
  2018-12-05  7:59   ` David Hildenbrand
@ 2018-12-05 22:56     ` David Gibson
  0 siblings, 0 replies; 8+ messages in thread
From: David Gibson @ 2018-12-05 22:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: dhildenb, imammedo, ehabkost, mst, pbonzini, qemu-devel, qemu-ppc

[-- Attachment #1: Type: text/plain, Size: 4542 bytes --]

On Wed, Dec 05, 2018 at 08:59:06AM +0100, David Hildenbrand wrote:
> On 05.12.18 06:06, David Gibson wrote:
> > Currently, virtio-balloon uses madvise() with MADV_DONTNEED to actually
> > discard RAM pages inserted into the balloon.  This is basically a Linux
> > only interface (MADV_DONTNEED exists on some other platforms, but doesn't
> > always have the same semantics).  It also doesn't work on hugepages and has
> > some other limitations.
> > 
> > It turns out that postcopy also needs to discard chunks of memory, and uses
> > a better interface for it: ram_block_discard_range().  It doesn't cover
> > every case, but it covers more than going direct to madvise() and this
> > gives us a single place to update for more possibilities in future.
> > 
> > There are some subtleties here to maintain the current balloon behaviour:
> > 
> > * For now, we just ignore requests to balloon in a hugepage backed region.
> >   That matches current behaviour, because MADV_DONTNEED on a hugepage would
> >   simply fail, and we ignore the error.
> > 
> > * If host page size is > BALLOON_PAGE_SIZE we can frequently call this on
> >   non-host-page-aligned addresses.  These would also fail in madvise(),
> >   which we then ignored.  ram_block_discard_range() error_report()s calls
> >   on unaligned addresses, so we explicitly check that case to avoid
> >   spamming the logs.
> > 
> > * We now call ram_block_discard_range() with the *host* page size, whereas
> >   we previously called madvise() with BALLOON_PAGE_SIZE.  Surprisingly,
> >   this also matches existing behaviour.  Although the kernel fails madvise
> >   on unaligned addresses, it will round unaligned sizes *up* to the host
> >   page size.  Yes, this means that if BALLOON_PAGE_SIZE < guest page size
> >   we can incorrectly discard more memory than the guest asked us to.  I'm
> >   planning to address that soon.
> > 
> > Errors other than the ones discussed above, will now be reported by
> > ram_block_discard_range(), rather than silently ignored, which means we
> > have a much better chance of seeing when something is going wrong.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/virtio/virtio-balloon.c | 23 ++++++++++++++++++++++-
> >  1 file changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index c3a19aa27d..4435905c87 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -37,8 +37,29 @@ static void balloon_inflate_page(VirtIOBalloon *balloon,
> >                                   MemoryRegion *mr, hwaddr offset)
> >  {
> >      void *addr = memory_region_get_ram_ptr(mr) + offset;
> > +    RAMBlock *rb;
> > +    size_t rb_page_size;
> > +    ram_addr_t ram_offset;
> >  
> > -    qemu_madvise(addr, BALLOON_PAGE_SIZE, QEMU_MADV_DONTNEED);
> > +    /* XXX is there a better way to get to the RAMBlock than via a
> > +     * host address? */
> 
> We have qemu_get_ram_block(). That one should work as long as we know
> that it is a valid guest ram address. (would have to make it !static)
> 
> Then we would only have to pass to this function the original ram_addr_t
> handed over by the guest (which looks somewhat cleaner to me than going
> via memory regions)

So, I didn't use that because it's a hwaddr, not a ram_addr_t that the
guest gives us.  I think they have the same value for guest RAM
addresses, but I wasn't sure if it was safe to rely on that.

> 
> > +    rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> > +    rb_page_size = qemu_ram_pagesize(rb);
> > +
> > +    /* Silently ignore hugepage RAM blocks */
> > +    if (rb_page_size != getpagesize()) {
> > +        return;
> > +    }
> > +
> > +    /* Silently ignore unaligned requests */
> > +    if (ram_offset & (rb_page_size - 1)) {
> > +        return;
> > +    }
> > +
> > +    ram_block_discard_range(rb, ram_offset, rb_page_size);
> > +    /* We ignore errors from ram_block_discard_range(), because it has
> > +     * already reported them, and failing to discard a balloon page is
> > +     * not fatal */
> >  }
> >  
> >  static const char *balloon_stat_names[] = {
> > 
> 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-12-05 22:57 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-05  5:06 [Qemu-devel] [RFCv2 for-4.0 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate David Gibson
2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 2/5] virtio-balloon: Corrections to address verification David Gibson
2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 3/5] virtio-balloon: Rework ballon_page() interface David Gibson
2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise() David Gibson
2018-12-05  7:59   ` David Hildenbrand
2018-12-05 22:56     ` David Gibson
2018-12-05  5:06 ` [Qemu-devel] [RFCv2 for-4.0 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.