All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: David Hildenbrand <david@redhat.com>
Cc: dhildenb@redhat.com, imammedo@redhat.com, ehabkost@redhat.com,
	mst@redhat.com, pbonzini@redhat.com, qemu-devel@nongnu.org,
	qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [RFC 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size
Date: Sat, 13 Oct 2018 17:40:09 +1100	[thread overview]
Message-ID: <20181013064009.GG16167@umbus.fritz.box> (raw)
In-Reply-To: <f21b8a98-3401-cf82-749d-2101fbe69a33@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 8624 bytes --]

On Fri, Oct 12, 2018 at 10:06:50AM +0200, David Hildenbrand wrote:
> On 12/10/2018 05:24, David Gibson wrote:
> > The virtio-balloon always works in units of 4kiB (BALLOON_PAGE_SIZE), but
> > on the host side, we can only actually discard memory in units of the host
> > page size.
> > 
> > At present we handle this very badly: we silently ignore balloon requests
> > that aren't host page aligned, and for requests that are host page aligned
> > we discard the entire host page.  The latter potentially corrupts guest
> > memory if its page size is smaller than the host's.
> > 
> > We could just disable the balloon if the host page size is not 4kiB, but
> > that would break a the special case where host and guest have the same page
> > size, but that's larger than 4kiB.  Thius case currently works by accident:
> > when the guest puts its page into the balloon, it will submit balloon
> > requests for each 4kiB subpage.  Most will be ignored, but the one which
> > happens to be host page aligned will discard the whole lot.
> 
> Actually I would vote for disabling it if the host page size is not 4k.
> This is broken by design and should not be worked around.

I really don't think that's feasible.  This is not some theoretical
edge case: essentially every KVM on POWER system out there with
balloon enabled is in this situation.  And AIUI, a bunch of common
management layers routinely enable the balloon.  Just disabling this
for non-4k systems will break existing, supported production setups.

> I consider 64k a similar case as huge pages in the hypervisor (from a
> virtio-balloon interface point of view - which is broken by design for
> these cases).

Technically it is very similar, but again, this is in routine use on
real systems out there.

> > This occurs in practice routinely for POWER KVM systems, since both host
> > and guest typically use 64kiB pages.
> > 
> > To make this safe, without breaking that useful case, we need to
> > accumulate 4kiB balloon requests until we have a whole contiguous host page
> > at which point we can discard it.
> 
> ... and if you have a 4k guest it will just randomly report pages and
> your 67 LOC tweak is useless.

Yes.

> And this can and will easily happen.

What cases are you thinking of?  For POWER at least, 4k on 64k will be
vastly less common than 64k on 64k.

> > We could in principle do that across all guest memory, but it would require
> > a large bitmap to track.  This patch represents a compromise: instead we
> 
> Oh god no, no tracking of all memory. (size of bitmap, memory hotplug,
> ... migration?)

Quite, that's why I didn't do it.  Although, I don't actually think
migration is such an issue: we *already* essentially lose track of
which pages are inside the balloon across migration.

> > track ballooned subpages for a single contiguous host page at a time.  This
> > means that if the guest discards all 4kiB chunks of a host page in
> > succession, we will discard it.  In particular that means the balloon will
> > continue to work for the (host page size) == (guest page size) > 4kiB case.
> > 
> > If the guest scatters 4kiB requests across different host pages, we don't
> > discard anything, and issue a warning.  Not ideal, but at least we don't
> > corrupt guest memory as the previous version could.
> 
> My take would be to somehow disable the balloon on the hypervisor side
> in case the host page size is not 4k. Like not allowing to realize it.
> No corruptions, no warnings people will ignore.

No, that's even worse than just having it silently do nothing on
non-4k setups.  Silently being useless we might get away with, we'll
just waste memory, but failing the device load will absolutely break
existing setups.

> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/virtio/virtio-balloon.c         | 67 +++++++++++++++++++++++++-----
> >  include/hw/virtio/virtio-balloon.h |  3 ++
> >  2 files changed, 60 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 4435905c87..39573ef2e3 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -33,33 +33,80 @@
> >  
> >  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
> >  
> > +typedef struct PartiallyBalloonedPage {
> > +    RAMBlock *rb;
> > +    ram_addr_t base;
> > +    unsigned long bitmap[];
> > +} PartiallyBalloonedPage;
> > +
> >  static void balloon_inflate_page(VirtIOBalloon *balloon,
> >                                   MemoryRegion *mr, hwaddr offset)
> >  {
> >      void *addr = memory_region_get_ram_ptr(mr) + offset;
> >      RAMBlock *rb;
> >      size_t rb_page_size;
> > -    ram_addr_t ram_offset;
> > +    int subpages;
> > +    ram_addr_t ram_offset, host_page_base;
> >  
> >      /* XXX is there a better way to get to the RAMBlock than via a
> >       * host address? */
> >      rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> >      rb_page_size = qemu_ram_pagesize(rb);
> > +    host_page_base = ram_offset & ~(rb_page_size - 1);
> > +
> > +    if (rb_page_size == BALLOON_PAGE_SIZE) {
> > +        /* Easy case */
> >  
> > -    /* Silently ignore hugepage RAM blocks */
> > -    if (rb_page_size != getpagesize()) {
> > +        ram_block_discard_range(rb, ram_offset, rb_page_size);
> > +        /* We ignore errors from ram_block_discard_range(), because it
> > +         * has already reported them, and failing to discard a balloon
> > +         * page is not fatal */
> >          return;
> >      }
> >  
> > -    /* Silently ignore unaligned requests */
> > -    if (ram_offset & (rb_page_size - 1)) {
> > -        return;
> > +    /* Hard case
> > +     *
> > +     * We've put a piece of a larger host page into the balloon - we
> > +     * need to keep track until we have a whole host page to
> > +     * discard
> > +     */
> > +    subpages = rb_page_size / BALLOON_PAGE_SIZE;
> > +
> > +    if (balloon->pbp
> > +        && (rb != balloon->pbp->rb
> > +            || host_page_base != balloon->pbp->base)) {
> > +        /* We've partially ballooned part of a host page, but now
> > +         * we're trying to balloon part of a different one.  Too hard,
> > +         * give up on the old partial page */
> > +        warn_report("Unable to insert a partial page into virtio-balloon");
> 
> I am pretty sure that you can create misleading warnings in case you
> migrate at the wrong time. (migrate while half the 64k page is inflated,
> on the new host the other part is inflated - a warning when switching to
> another 64k page).

Yes we can get bogus warnings across migration with this.  I was
considering that an acceptable price, but I'm open to better
alternatives.

> > +        free(balloon->pbp);
> > +        balloon->pbp = NULL;
> >      }
> >  
> > -    ram_block_discard_range(rb, ram_offset, rb_page_size);
> > -    /* We ignore errors from ram_block_discard_range(), because it has
> > -     * already reported them, and failing to discard a balloon page is
> > -     * not fatal */
> > +    if (!balloon->pbp) {
> > +        /* Starting on a new host page */
> > +        size_t bitlen = BITS_TO_LONGS(subpages) * sizeof(unsigned long);
> > +        balloon->pbp = g_malloc0(sizeof(PartiallyBalloonedPage) + bitlen);
> > +        balloon->pbp->rb = rb;
> > +        balloon->pbp->base = host_page_base;
> > +    }
> > +
> > +    bitmap_set(balloon->pbp->bitmap,
> > +               (ram_offset - balloon->pbp->base) / BALLOON_PAGE_SIZE,
> > +               subpages);
> > +
> > +    if (bitmap_full(balloon->pbp->bitmap, subpages)) {
> > +        /* We've accumulated a full host page, we can actually discard
> > +         * it now */
> > +
> > +        ram_block_discard_range(rb, balloon->pbp->base, rb_page_size);
> > +        /* We ignore errors from ram_block_discard_range(), because it
> > +         * has already reported them, and failing to discard a balloon
> > +         * page is not fatal */
> > +
> > +        free(balloon->pbp);
> > +        balloon->pbp = NULL;
> > +    }
> >  }
> No, not a fan of this approach.

I can see why, but I really can't see what else to do without breaking
existing, supported, working (albeit by accident) setups.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2018-10-13  6:41 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-12  3:24 [Qemu-devel] [RFC 0/5] Improve balloon handling of pagesizes other than 4kiB David Gibson
2018-10-12  3:24 ` [Qemu-devel] [RFC 1/5] virtio-balloon: Remove unnecessary MADV_WILLNEED on deflate David Gibson
2018-10-12  7:40   ` David Hildenbrand
2018-10-13  6:26     ` David Gibson
2018-10-12 17:41   ` Richard Henderson
2018-10-12 17:59     ` Eric Blake
2018-10-13  6:23     ` David Gibson
2018-10-12 18:05   ` Michael S. Tsirkin
2018-10-15  6:54     ` David Hildenbrand
2018-10-15 10:43       ` Michael S. Tsirkin
2018-10-15 11:14         ` David Hildenbrand
2018-12-04  4:26         ` David Gibson
2018-10-12  3:24 ` [Qemu-devel] [RFC 2/5] virtio-balloon: Corrections to address verification David Gibson
2018-10-12  7:44   ` David Hildenbrand
2018-10-13  6:25     ` David Gibson
2018-10-12  3:24 ` [Qemu-devel] [RFC 3/5] virtio-balloon: Rework ballon_page() interface David Gibson
2018-10-12  7:46   ` David Hildenbrand
2018-10-13  6:29     ` David Gibson
2018-10-12  3:24 ` [Qemu-devel] [RFC 4/5] virtio-balloon: Use ram_block_discard_range() instead of raw madvise() David Gibson
2018-10-12  3:24 ` [Qemu-devel] [RFC 5/5] virtio-balloon: Safely handle BALLOON_PAGE_SIZE < host page size David Gibson
2018-10-12  8:06   ` David Hildenbrand
2018-10-13  6:40     ` David Gibson [this message]
2018-10-15  7:08       ` David Hildenbrand
2018-10-17  3:28         ` David Gibson
2018-10-17  9:56           ` David Hildenbrand
2018-10-23  8:02             ` David Gibson
2018-10-23 15:13               ` David Hildenbrand
2018-10-12  8:32   ` David Hildenbrand
2018-10-13  6:41     ` David Gibson
2018-10-12 17:26 ` [Qemu-devel] [RFC 0/5] Improve balloon handling of pagesizes other than 4kiB Michael S. Tsirkin
2018-10-17  3:31   ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181013064009.GG16167@umbus.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=david@redhat.com \
    --cc=dhildenb@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.