All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Thomas Huth <thuth@redhat.com>
Cc: drjones@redhat.com, qemu-devel@nongnu.org, dgilbert@redhat.com,
	jitendra.kolhe@hpe.com, wehuang@redhat.com, amit.shah@redhat.com,
	dgibson@redhat.com
Subject: Re: [Qemu-devel] [PATCH] hw/virtio/balloon: Fixes for different host page sizes
Date: Wed, 13 Apr 2016 20:55:39 +0300	[thread overview]
Message-ID: <20160413205524-mutt-send-email-mst@redhat.com> (raw)
In-Reply-To: <570E8404.10503@redhat.com>

On Wed, Apr 13, 2016 at 07:38:12PM +0200, Thomas Huth wrote:
> On 13.04.2016 19:07, Michael S. Tsirkin wrote:
> > On Wed, Apr 13, 2016 at 04:51:49PM +0200, Thomas Huth wrote:
> >> On 13.04.2016 15:15, Michael S. Tsirkin wrote:
> >>> On Wed, Apr 13, 2016 at 01:52:44PM +0200, Thomas Huth wrote:
> ...
> >>>> Then, there's yet another problem: If the host page size is bigger
> >>>> than the 4k balloon page size, we can not simply call madvise() on
> >>>> each of the 4k balloon addresses that we get from the guest - since
> >>>> the madvise() always evicts the whole host page, not only a 4k area!
> >>>>
> >>>> So in this case, we've got to track the 4k fragments of a host page
> >>>> and only call madvise(DONTNEED) when all fragments have been collected.
> >>>> This of course only works fine if the guest sends consecutive 4k
> >>>> fragments - which is the case in the most important scenarios that
> >>>> I try to address here (like a ppc64 guest with 64k page size that
> >>>> is running on a ppc64 host with 64k page size). In case the guest
> >>>> uses a page size that is smaller than the host page size, we might
> >>>> need to add some more additional logic here later to increase the
> >>>> probability of being able to release memory, but at least the guest
> >>>> should now not crash anymore due to unintentionally evicted pages.
> >> ...
> >>>>  static void virtio_balloon_instance_init(Object *obj)
> >>>> diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
> >>>> index 35f62ac..04b7c0c 100644
> >>>> --- a/include/hw/virtio/virtio-balloon.h
> >>>> +++ b/include/hw/virtio/virtio-balloon.h
> >>>> @@ -43,6 +43,9 @@ typedef struct VirtIOBalloon {
> >>>>      int64_t stats_last_update;
> >>>>      int64_t stats_poll_interval;
> >>>>      uint32_t host_features;
> >>>> +    void *current_addr;
> >>>> +    unsigned long *fragment_bits;
> >>>> +    int fragment_bits_size;
> >>>>  } VirtIOBalloon;
> >>>>  
> >>>>  #endif
> >>>
> >>> It looks like fragment_bits would have to be migrated.
> >>> Which is a lot of complexity.
> ...
> >>> How about we just skip madvise if host page size is > balloon
> >>> page size, for 2.6?
> >>
> >> That would mean a regression compared to what we have today. Currently,
> >> the ballooning is working OK for 64k guests on a 64k ppc host - rather
> >> by chance than on purpose, but it's working. The guest is always sending
> >> all the 4k fragments of a 64k page, and QEMU is trying to call madvise()
> >> for every one of them, but the kernel is ignoring madvise() on
> >> non-64k-aligned addresses, so we end up with a situation where the
> >> madvise() frees a whole 64k page which is also declared as free by the
> >> guest.
> >>
> >> I think we should either take this patch as it is right now (without
> >> adding extra code for migration) and later update it to the bitmap code
> >> by Jitendra Kolhe, or omit it completely (leaving 4k guests broken) and
> >> fix it properly after the bitmap code has been applied. But disabling
> >> the balloon code for 64k guests on 64k hosts completely does not sound
> >> very appealing to me. What do you think?
> >>
> >>  Thomas
> > 
> > True. As simple a hack - how about disabling madvise when host page size >
> > target page size?
> 
> That could work - but is there a generic way in QEMU to get the current
> page size from a guest (since this might differ from TARGET_PAGE_SIZE)?
> Or would that mean to pollute the virtio-balloon code with ugly #ifdefs?
> 
>  Thomas

let's just use TARGET_PAGE_SIZE, that's the best I can think of.

  reply	other threads:[~2016-04-13 17:55 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-13 11:52 [Qemu-devel] [PATCH] hw/virtio/balloon: Fixes for different host page sizes Thomas Huth
2016-04-13 12:37 ` Andrew Jones
2016-04-13 13:15 ` Michael S. Tsirkin
2016-04-13 14:51   ` Thomas Huth
2016-04-13 17:07     ` Michael S. Tsirkin
2016-04-13 17:38       ` Thomas Huth
2016-04-13 17:55         ` Michael S. Tsirkin [this message]
2016-04-13 18:11           ` Thomas Huth
2016-04-13 18:14             ` Michael S. Tsirkin
2016-04-14  3:45               ` David Gibson
2016-04-13 18:21             ` Andrew Jones
2016-04-14 11:47     ` Dr. David Alan Gilbert
2016-04-14 12:19       ` Thomas Huth
2016-04-14 18:34         ` Dr. David Alan Gilbert
2016-04-15  4:26           ` David Gibson
2016-05-23  6:25     ` Jitendra Kolhe
2016-04-14  3:39   ` David Gibson
2016-04-14  3:37 ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160413205524-mutt-send-email-mst@redhat.com \
    --to=mst@redhat.com \
    --cc=amit.shah@redhat.com \
    --cc=dgibson@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=drjones@redhat.com \
    --cc=jitendra.kolhe@hpe.com \
    --cc=qemu-devel@nongnu.org \
    --cc=thuth@redhat.com \
    --cc=wehuang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.