From: Liang Li <liang.z.li@intel.com>
To: qemu-devel@nongnu.org
Cc: mst@redhat.com, pbonzini@redhat.com, quintela@redhat.com,
	amit.shah@redhat.com, kvm@vger.kernel.org, dgilbert@redhat.com,
	thuth@redhat.com, virtio-dev@lists.oasis-open.org,
	dave.hansen@intel.com, Liang Li <liang.z.li@intel.com>
Subject: [PATCH qemu v3 2/6] virtio-balloon: speed up inflating & deflating process
Date: Fri, 21 Oct 2016 14:48:20 +0800	[thread overview]
Message-ID: <1477032504-12745-3-git-send-email-liang.z.li@intel.com> (raw)
In-Reply-To: <1477032504-12745-1-git-send-email-liang.z.li@intel.com>

The current virtio-balloon implementation is not very
efficient. Profiling an inflate of the balloon to 7 GB of
an 8 GB idle guest shows the time spent in each stage:

a. allocating pages (6.5%)
b. sending PFNs to host (68.3%)
c. address translation (6.1%)
d. madvise (19%)

The whole inflating process takes about 4126 ms to complete.
Profiling shows that stages b and d are the bottlenecks.

Using a bitmap to send the page info instead of an array of
PFNs reduces the overhead of stage b considerably. Furthermore,
doing the address translation and the madvise() calls on bulk
ranges of RAM pages, instead of page by page, also greatly
reduces the overhead of stages c and d.
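
The bitmap message starts with a small header; its canonical
layout is added to the imported linux headers in patch 1/6, so
the sketch below shows only the fields this patch actually reads
(field order and widths here are assumptions, not the real
definition):

    /* Hypothetical subset of the balloon_bmap_hdr added in patch
     * 1/6; see the updated linux header for the real layout. */
    #include <stdint.h>

    struct balloon_bmap_hdr_sketch {
        uint64_t start_pfn;   /* guest PFN where the bitmap begins */
        uint16_t page_shift;  /* log2 of the page size described */
        uint64_t bmap_len;    /* length of the bitmap, in bytes */
    };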

This patch is the QEMU (device) side of the optimization: it
speeds up the inflating & deflating process by adding a new
feature bit to the virtio-balloon device. With this feature
negotiated, inflating the balloon to 7 GB of an 8 GB idle
guest takes only 590 ms, a performance improvement of about
85%.
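
To show where the stage b and stage d savings come from, below is
a small stand-alone sketch (plain C, not QEMU code; the page count
and bitmap value are made up) that walks a one-byte bitmap and
issues a single madvise() per run of set bits, the same pattern
balloon_bulk_pages() implements with find_next_bit() and
find_next_zero_bit():

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define PAGE_SHIFT 12
    #define NPAGES     8

    int main(void)
    {
        size_t len = (size_t)NPAGES << PAGE_SHIFT;
        uint8_t *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        uint8_t bitmap = 0x3c;       /* pages 2..5 are ballooned out */
        int i = 0;

        if (base == MAP_FAILED) {
            return 1;
        }
        memset(base, 0xaa, len);     /* fault every page in */

        while (i < NPAGES) {
            int run;

            if (!(bitmap & (1u << i))) {
                i++;                 /* skip runs of clear bits */
                continue;
            }
            run = i;                 /* find the end of this run */
            while (run < NPAGES && (bitmap & (1u << run))) {
                run++;
            }
            /* one syscall per run instead of one per page */
            madvise(base + ((size_t)i << PAGE_SHIFT),
                    (size_t)(run - i) << PAGE_SHIFT, MADV_DONTNEED);
            printf("madvise pages [%d, %d)\n", i, run);
            i = run;
        }
        munmap(base, len);
        return 0;
    }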

TODO: optimize stage a by allocating/freeing a chunk of pages
instead of a single page at a time.
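
For reference, one hypothetical shape of that guest-side change
(Linux kernel code, not part of this QEMU patch; the chunk order
and fallback policy are assumptions):

    #include <linux/gfp.h>
    #include <linux/mm.h>

    #define BALLOON_CHUNK_ORDER 4   /* 16 pages per attempt, assumed */

    /* Try to grab a high-order block; fall back to a single page. */
    static struct page *balloon_alloc_chunk(unsigned int *order)
    {
        struct page *page = alloc_pages(GFP_HIGHUSER_MOVABLE |
                                        __GFP_NOWARN,
                                        BALLOON_CHUNK_ORDER);
        if (page) {
            *order = BALLOON_CHUNK_ORDER;
            return page;
        }
        *order = 0;
        return alloc_page(GFP_HIGHUSER_MOVABLE);
    }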

Signed-off-by: Liang Li <liang.z.li@intel.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/virtio-balloon.c | 152 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 131 insertions(+), 21 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1d77028..3fa80a4 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -52,6 +52,85 @@ static const char *balloon_stat_names[] = {
    [VIRTIO_BALLOON_S_NR] = NULL
 };
 
+static void do_balloon_bulk_pages(ram_addr_t base_pfn, uint16_t page_shift,
+                                  unsigned long len, bool deflate)
+{
+    ram_addr_t size, processed, chunk, base;
+    MemoryRegionSection section = {.mr = NULL};
+
+    size = len << page_shift;
+    base = base_pfn << page_shift;
+
+    for (processed = 0; processed < size; processed += chunk) {
+        section.mr = NULL;
+        chunk = size - processed;
+        while (chunk >= TARGET_PAGE_SIZE) {
+            section = memory_region_find(get_system_memory(),
+                                         base + processed, chunk);
+            if (!section.mr) {
+                chunk = QEMU_ALIGN_DOWN(chunk / 2, TARGET_PAGE_SIZE);
+            } else {
+                break;
+            }
+        }
+
+        if (section.mr &&
+            (int128_nz(section.size) && memory_region_is_ram(section.mr))) {
+            void *addr = memory_region_get_ram_ptr(section.mr) +
+                         section.offset_within_region;
+            /* don't madvise() beyond the section actually found */
+            chunk = MIN(chunk, int128_get64(section.size));
+            qemu_madvise(addr, chunk,
+                         deflate ? QEMU_MADV_WILLNEED : QEMU_MADV_DONTNEED);
+            memory_region_unref(section.mr);
+        } else {
+            if (section.mr) {
+                memory_region_unref(section.mr);
+            }
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "Invalid guest RAM range [0x" RAM_ADDR_FMT
+                          ", 0x" RAM_ADDR_FMT "]\n",
+                          base + processed, chunk);
+            chunk = TARGET_PAGE_SIZE;
+        }
+    }
+}
+
+static void balloon_bulk_pages(struct balloon_bmap_hdr *hdr,
+                               unsigned long *bitmap, bool deflate)
+{
+    ram_addr_t base_pfn = hdr->start_pfn;
+    uint16_t page_shift = hdr->page_shift;
+    unsigned long len = hdr->bmap_len;
+    unsigned long current = 0, end = len * BITS_PER_BYTE;
+
+    if (!qemu_balloon_is_inhibited() && (!kvm_enabled() ||
+                                         kvm_has_sync_mmu())) {
+        while (current < end) {
+            unsigned long one = find_next_bit(bitmap, end, current);
+
+            if (one < end) {
+                unsigned long pages, zero;
+
+                zero = find_next_zero_bit(bitmap, end, one + 1);
+                if (zero >= end) {
+                    pages = end - one;
+                } else {
+                    pages = zero - one;
+                }
+
+                if (pages) {
+                    do_balloon_bulk_pages(base_pfn + one, page_shift,
+                                          pages, deflate);
+                }
+                current = one + pages;
+            } else {
+                current = one;
+            }
+        }
+    }
+}
+
 /*
  * reset_stats - Mark all items in the stats array as unset
  *
@@ -72,6 +151,13 @@ static bool balloon_stats_supported(const VirtIOBalloon *s)
     return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_STATS_VQ);
 }
 
+static bool balloon_page_bitmap_supported(const VirtIOBalloon *s)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(s);
+
+    return virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_BITMAP);
+}
+
 static bool balloon_stats_enabled(const VirtIOBalloon *s)
 {
     return s->stats_poll_interval > 0;
@@ -213,32 +299,54 @@ static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     for (;;) {
         size_t offset = 0;
         uint32_t pfn;
+
         elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
         if (!elem) {
             return;
         }
 
-        while (iov_to_buf(elem->out_sg, elem->out_num, offset, &pfn, 4) == 4) {
-            ram_addr_t pa;
-            ram_addr_t addr;
-            int p = virtio_ldl_p(vdev, &pfn);
-
-            pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
-            offset += 4;
-
-            /* FIXME: remove get_system_memory(), but how? */
-            section = memory_region_find(get_system_memory(), pa, 1);
-            if (!int128_nz(section.size) || !memory_region_is_ram(section.mr))
-                continue;
-
-            trace_virtio_balloon_handle_output(memory_region_name(section.mr),
-                                               pa);
-            /* Using memory_region_get_ram_ptr is bending the rules a bit, but
-               should be OK because we only want a single page.  */
-            addr = section.offset_within_region;
-            balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
-                         !!(vq == s->dvq));
-            memory_region_unref(section.mr);
+        if (balloon_page_bitmap_supported(s)) {
+            struct balloon_bmap_hdr hdr;
+            uint64_t bmap_len;
+
+            iov_to_buf(elem->out_sg, elem->out_num, offset, &hdr, sizeof(hdr));
+            offset += sizeof(hdr);
+
+            bmap_len = hdr.bmap_len;
+            if (bmap_len > 0) {
+                unsigned long *bitmap = bitmap_new(bmap_len * BITS_PER_BYTE);
+                iov_to_buf(elem->out_sg, elem->out_num, offset,
+                           bitmap, bmap_len);
+
+                balloon_bulk_pages(&hdr, bitmap, !!(vq == s->dvq));
+                g_free(bitmap);
+            }
+        } else {
+            while (iov_to_buf(elem->out_sg, elem->out_num, offset,
+                              &pfn, 4) == 4) {
+                ram_addr_t pa;
+                ram_addr_t addr;
+                int p = virtio_ldl_p(vdev, &pfn);
+
+                pa = (ram_addr_t) p << VIRTIO_BALLOON_PFN_SHIFT;
+                offset += 4;
+
+                /* FIXME: remove get_system_memory(), but how? */
+                section = memory_region_find(get_system_memory(), pa, 1);
+                if (!int128_nz(section.size) ||
+                    !memory_region_is_ram(section.mr)) {
+                    continue;
+                }
+
+                trace_virtio_balloon_handle_output(memory_region_name(
+                                                            section.mr), pa);
+                /* Using memory_region_get_ram_ptr is bending the rules a bit,
+                 * but should be OK because we only want a single page.  */
+                addr = section.offset_within_region;
+                balloon_page(memory_region_get_ram_ptr(section.mr) + addr,
+                             !!(vq == s->dvq));
+                memory_region_unref(section.mr);
+            }
         }
 
         virtqueue_push(vq, elem, offset);
@@ -500,6 +608,8 @@ static const VMStateDescription vmstate_virtio_balloon = {
 static Property virtio_balloon_properties[] = {
     DEFINE_PROP_BIT("deflate-on-oom", VirtIOBalloon, host_features,
                     VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
+    DEFINE_PROP_BIT("page-bitmap", VirtIOBalloon, host_features,
+                    VIRTIO_BALLOON_F_PAGE_BITMAP, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
1.8.3.1

Thread overview: 20+ messages
2016-10-21  6:48 [PATCH qemu v3 0/6] Fast (de)inflating & fast live migration Liang Li
2016-10-21  6:48 ` [Qemu-devel] " Liang Li
2016-10-21  6:48 ` [PATCH qemu v3 1/6] virtio-balloon: update linux head file Liang Li
2016-10-21  6:48   ` [Qemu-devel] " Liang Li
2016-10-21  6:48 ` Liang Li [this message]
2016-10-21  6:48   ` [Qemu-devel] [PATCH qemu v3 2/6] virtio-balloon: speed up inflating & deflating process Liang Li
2016-10-21  6:48 ` [PATCH qemu v3 3/6] balloon: get free page info from guest Liang Li
2016-10-21  6:48   ` [Qemu-devel] " Liang Li
2016-10-21  6:48 ` [PATCH qemu v3 4/6] bitmap: Add a new bitmap_move function Liang Li
2016-10-21  6:48   ` [Qemu-devel] " Liang Li
2016-10-21  6:48 ` [PATCH qemu v3 5/6] kvm: Add two new arch specific functions Liang Li
2016-10-21  6:48   ` [Qemu-devel] " Liang Li
2016-10-21  6:48 ` [PATCH qemu v3 6/6] migration: skip free pages during live migration Liang Li
2016-10-21  6:48   ` [Qemu-devel] " Liang Li
2016-10-21  7:19 ` [Qemu-devel] [PATCH qemu v3 0/6] Fast (de)inflating & fast " no-reply
2016-10-21  7:19   ` no-reply
2016-10-21  7:22 ` no-reply
2016-10-21  7:22   ` no-reply
2016-10-30 22:05 ` Michael S. Tsirkin
2016-10-30 22:05   ` [Qemu-devel] " Michael S. Tsirkin
