* [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
@ 2016-06-28  9:01 Peter Lieven
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory Peter Lieven
                   ` (16 more replies)
  0 siblings, 17 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

I recently found that Qemu is using several hundred megabytes of RSS memory
more than older versions such as Qemu 2.2.0. So I started tracing
memory allocation and found 2 major reasons for this.

1) We changed the qemu coroutine pool to have a per-thread allocation pool and a
   global release pool. The chosen pool size and the changed algorithm can lead
   to up to 192 free coroutines with just a single iothread, each of them
   holding 1MB of stack memory (see the rough figure after this list).

2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced, which led to delayed freeing
   of memory. This led to higher heap allocations which could not effectively
   be returned to the kernel (most likely due to fragmentation).
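
   As a rough figure for 1), using the constants from util/qemu-coroutine.c: the
   shared release pool is capped at 2 * POOL_BATCH_SIZE = 128 coroutines and the
   iothread's allocation pool at another POOL_BATCH_SIZE = 64, so up to 192 idle
   coroutines times 1MB of stack is roughly 192MB kept allocated.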

The following series is what I came up with. Besides the coroutine patches I changed
some allocations to forcibly use mmap. None of these allocations is made repeatedly
during runtime, so the impact of using mmap should be negligible.

There are still some big malloc'ed allocations left which cannot easily be changed
(e.g. the pixman buffers in VNC). So it might be an idea to set a lower mmap threshold
for malloc, since this threshold seems to be on the order of several megabytes on
modern systems.
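
One way to experiment with that without touching individual call sites would be
glibc's mallopt(); a minimal sketch (the 128kB value is only an example, not a
tuned recommendation):

#include <malloc.h>

int main(void)
{
    /* serve allocations of 128kB and larger via mmap; setting the threshold
     * explicitly also disables glibc's dynamic threshold adjustment */
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);
    /* ... rest of the program ... */
    return 0;
}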

Peter Lieven (15):
  coroutine-ucontext: mmap stack memory
  coroutine-ucontext: add a switch to monitor maximum stack size
  coroutine-ucontext: reduce stack size to 64kB
  coroutine: add a knob to disable the shared release pool
  util: add a helper to mmap private anonymous memory
  exec: use mmap for subpages
  qapi: use mmap for QmpInputVisitor
  virtio: use mmap for VirtQueue
  loader: use mmap for ROMs
  vmware_svga: use mmap for scratch pad
  qom: use mmap for bigger Objects
  util: add a function to realloc mmapped memory
  exec: use mmap for PhysPageMap->nodes
  vnc-tight: make the encoding palette static
  vnc: use mmap for VncState

 configure                 | 33 ++++++++++++++++++--
 exec.c                    | 11 ++++---
 hw/core/loader.c          | 16 +++++-----
 hw/display/vmware_vga.c   |  3 +-
 hw/virtio/virtio.c        |  5 +--
 include/qemu/mmap-alloc.h |  7 +++++
 include/qom/object.h      |  1 +
 qapi/qmp-input-visitor.c  |  5 +--
 qom/object.c              | 20 ++++++++++--
 ui/vnc-enc-tight.c        | 21 ++++++-------
 ui/vnc.c                  |  5 +--
 ui/vnc.h                  |  1 +
 util/coroutine-ucontext.c | 66 +++++++++++++++++++++++++++++++++++++--
 util/mmap-alloc.c         | 27 ++++++++++++++++
 util/qemu-coroutine.c     | 79 ++++++++++++++++++++++++++---------------------
 15 files changed, 225 insertions(+), 75 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28 10:02   ` Peter Maydell
  2016-06-28 11:04   ` Paolo Bonzini
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 02/15] coroutine-ucontext: add a switch to monitor maximum stack size Peter Lieven
                   ` (15 subsequent siblings)
  16 siblings, 2 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

coroutine-ucontext currently allocates stack memory from the heap because on most systems the
stack size lies below the threshold for mmapping memory. This patch forces mmapping
of stacks to avoid large holes on the heap when a coroutine is deleted. It additionally
allows us to add a guard page at the bottom of the stack to avoid overflows.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Lieven <pl@kamp.de>
---
 util/coroutine-ucontext.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 2bb7e10..841e7db 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -80,9 +80,10 @@ static void coroutine_trampoline(int i0, int i1)
     }
 }
 
+#define COROUTINE_STACK_SIZE (1 << 20)
+
 Coroutine *qemu_coroutine_new(void)
 {
-    const size_t stack_size = 1 << 20;
     CoroutineUContext *co;
     ucontext_t old_uc, uc;
     sigjmp_buf old_env;
@@ -101,17 +102,32 @@ Coroutine *qemu_coroutine_new(void)
     }
 
     co = g_malloc0(sizeof(*co));
+
+#ifdef MAP_GROWSDOWN
+    co->stack = mmap(NULL, COROUTINE_STACK_SIZE, PROT_READ | PROT_WRITE,
+                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);
+    if (co->stack == MAP_FAILED) {
+        abort();
+    }
+    /* add a guard page at bottom of the stack */
+    if (mmap(co->stack, getpagesize(), PROT_NONE,
+        MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0) == MAP_FAILED) {
+        abort();
+    }
+#else
     co->stack = g_malloc(COROUTINE_STACK_SIZE);
+#endif
+
     co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
     uc.uc_link = &old_uc;
     uc.uc_stack.ss_sp = co->stack;
-    uc.uc_stack.ss_size = stack_size;
+    uc.uc_stack.ss_size = COROUTINE_STACK_SIZE;
     uc.uc_stack.ss_flags = 0;
 
 #ifdef CONFIG_VALGRIND_H
     co->valgrind_stack_id =
-        VALGRIND_STACK_REGISTER(co->stack, co->stack + stack_size);
+        VALGRIND_STACK_REGISTER(co->stack, co->stack + COROUTINE_STACK_SIZE);
 #endif
 
     arg.p = co;
@@ -149,7 +165,11 @@ void qemu_coroutine_delete(Coroutine *co_)
     valgrind_stack_deregister(co);
 #endif
 
+#ifdef MAP_GROWSDOWN
+    munmap(co->stack, COROUTINE_STACK_SIZE);
+#else
     g_free(co->stack);
+#endif
     g_free(co);
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 02/15] coroutine-ucontext: add a switch to monitor maximum stack size
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB Peter Lieven
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

this adds a debug configure switch to enable monitoring of the maximum
stack size used by all coroutines.
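
With the switch enabled, the destructor added below reports a line of the
following form via error_report() at process exit (the value is only
illustrative):

  coroutine-ucontext: max stack usage was less or equal to 4096 bytes.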

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 configure                 | 18 ++++++++++++++++++
 util/coroutine-ucontext.c | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/configure b/configure
index 5929aba..82bcc25 100755
--- a/configure
+++ b/configure
@@ -296,6 +296,7 @@ libiscsi=""
 libnfs=""
 coroutine=""
 coroutine_pool=""
+coroutine_stack_size_debug="no"
 seccomp=""
 glusterfs=""
 glusterfs_xlator_opt="no"
@@ -1004,6 +1005,8 @@ for opt do
   ;;
   --enable-coroutine-pool) coroutine_pool="yes"
   ;;
+  --enable-coroutine-stack-size-debug) coroutine_stack_size_debug="yes"
+  ;;
   --disable-docs) docs="no"
   ;;
   --enable-docs) docs="yes"
@@ -1361,6 +1364,8 @@ disabled with --disable-FEATURE, default is enabled if available:
                   (for reading bzip2-compressed dmg images)
   seccomp         seccomp support
   coroutine-pool  coroutine freelist (better performance)
+  coroutine-stack-size-debug
+                  report coroutine max stack usage (only for debugging)
   glusterfs       GlusterFS backend
   archipelago     Archipelago backend
   tpm             TPM support
@@ -4298,6 +4303,15 @@ fi
 if test "$coroutine" = "gthread" -a "$coroutine_pool" = "yes"; then
   error_exit "'gthread' coroutine backend does not support pool (use --disable-coroutine-pool)"
 fi
+if test "$coroutine_stack_size_debug" = "yes"; then
+  if test "$coroutine" != "ucontext"; then
+    error_exit "coroutine stack size debugging currently only works with ucontext"
+  fi
+  if test "$coroutine_pool" = "yes"; then
+    echo "WARN: disabling coroutine pool for stack size debugging"
+    coroutine_pool=no
+  fi
+fi
 
 ##########################################
 # check if we have open_by_handle_at
@@ -4866,6 +4880,7 @@ echo "QGA MSI support   $guest_agent_msi"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool    $coroutine_pool"
+echo "coroutine stack size debug $coroutine_stack_size_debug"
 echo "GlusterFS support $glusterfs"
 echo "Archipelago support $archipelago"
 echo "gcov              $gcov_tool"
@@ -5335,6 +5350,9 @@ if test "$coroutine_pool" = "yes" ; then
 else
   echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
 fi
+if test "$coroutine_stack_size_debug" = "yes" ; then
+  echo "CONFIG_COROUTINE_STACK_SIZE_DEBUG=y" >> $config_host_mak
+fi
 
 if test "$open_by_handle_at" = "yes" ; then
   echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 841e7db..27c61f3 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -31,6 +31,10 @@
 #include <valgrind/valgrind.h>
 #endif
 
+#ifdef CONFIG_COROUTINE_STACK_SIZE_DEBUG
+#include "qemu/error-report.h"
+#endif
+
 typedef struct {
     Coroutine base;
     void *stack;
@@ -48,6 +52,10 @@ typedef struct {
 static __thread CoroutineUContext leader;
 static __thread Coroutine *current;
 
+#ifdef CONFIG_COROUTINE_STACK_SIZE_DEBUG
+static uint32_t max_stack_usage;
+#endif
+
 /*
  * va_args to makecontext() must be type 'int', so passing
  * the pointer we need may require several int args. This
@@ -88,6 +96,9 @@ Coroutine *qemu_coroutine_new(void)
     ucontext_t old_uc, uc;
     sigjmp_buf old_env;
     union cc_arg arg = {0};
+#ifdef CONFIG_COROUTINE_STACK_SIZE_DEBUG
+    void *ptr;
+#endif
 
     /* The ucontext functions preserve signal masks which incurs a
      * system call overhead.  sigsetjmp(buf, 0)/siglongjmp() does not
@@ -118,6 +129,13 @@ Coroutine *qemu_coroutine_new(void)
     co->stack = g_malloc(COROUTINE_STACK_SIZE);
 #endif
 
+#ifdef CONFIG_COROUTINE_STACK_SIZE_DEBUG
+    for (ptr = co->stack + getpagesize();
+         ptr < co->stack + COROUTINE_STACK_SIZE; ptr += sizeof(u_int32_t)) {
+        *(u_int32_t *)ptr = 0xdeadbeaf;
+    }
+#endif
+
     co->base.entry_arg = &old_env; /* stash away our jmp_buf */
 
     uc.uc_link = &old_uc;
@@ -161,6 +179,20 @@ void qemu_coroutine_delete(Coroutine *co_)
 {
     CoroutineUContext *co = DO_UPCAST(CoroutineUContext, base, co_);
 
+#ifdef CONFIG_COROUTINE_STACK_SIZE_DEBUG
+    void *ptr;
+    for (ptr = co->stack + getpagesize();
+         ptr < co->stack + COROUTINE_STACK_SIZE; ptr += sizeof(u_int32_t)) {
+        if (*(u_int32_t *)ptr != 0xdeadbeaf) {
+            break;
+        }
+    }
+    /* we only want to estimate the max stack usage, the OR will overestimate
+     * the stack usage, but this is ok here and avoids the usage of a mutex */
+    atomic_or(&max_stack_usage,
+              COROUTINE_STACK_SIZE - (uintptr_t) (ptr - co->stack));
+#endif
+
 #ifdef CONFIG_VALGRIND_H
     valgrind_stack_deregister(co);
 #endif
@@ -210,3 +242,11 @@ bool qemu_in_coroutine(void)
 {
     return current && current->caller;
 }
+
+#ifdef CONFIG_COROUTINE_STACK_SIZE_DEBUG
+static void __attribute__((destructor)) print_max_stack_usage(void)
+{
+   error_report("coroutine-ucontext: max stack usage was less or equal to "
+                "%"PRIu32" bytes.", max_stack_usage);
+}
+#endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory Peter Lieven
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 02/15] coroutine-ucontext: add a switch to monitor maximum stack size Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28 10:54   ` Paolo Bonzini
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 04/15] coroutine: add a knob to disable the shared release pool Peter Lieven
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

evaluation with the recently introduced maximum stack size monitoring revealed
that the actually used stack size was never above 4kB, so allocating a 1MB stack
for each coroutine wastes a lot of memory. So reduce the stack size to
64kB, which should still give enough headroom.
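
A rough estimate of the effect, based on the pool figures from the cover letter:
with up to 192 pooled coroutines the reserved stack memory drops from about
192MB (192 * 1MB) to about 12MB (192 * 64kB). Only pages that are actually
touched count towards RSS, but the smaller reservation also bounds how much a
single coroutine can ever dirty before hitting the guard page.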

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 util/coroutine-ucontext.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 27c61f3..7f1d541 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -88,7 +88,7 @@ static void coroutine_trampoline(int i0, int i1)
     }
 }
 
-#define COROUTINE_STACK_SIZE (1 << 20)
+#define COROUTINE_STACK_SIZE (1 << 16)
 
 Coroutine *qemu_coroutine_new(void)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 04/15] coroutine: add a knob to disable the shared release pool
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (2 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28 10:41   ` Paolo Bonzini
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 05/15] util: add a helper to mmap private anonymous memory Peter Lieven
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

the current coroutine freelist implementation has 2 kinds of pools:
one release pool shared between all threads and additionally one
allocation pool per thread. The release pool is especially necessary
if a coroutine is created in a different thread than the one in which
it is released. This is e.g. the case if an IDE interface is used.

But in times of virtio and dataplane the release pool adds costs
which are not entirely necessary. First, if virtio is used the release
pool tends to fill up to 100% because all coroutines are first handed
back to the release pool. On coroutine creation a thread can steal the
release pool and make it its local allocation pool, but with mixed
I/O patterns the release pool ends up full of unused coroutines
while the alloc_pool has also grown to its maximum size.

So this patch introduces a knob to disable the release pool to avoid
this behaviour. If this switch is used it should be made sure that
all fast block devices use virtio and that each virtio device
has its own thread (dataplane).

An IDE cdrom might still be used, but coroutine creation for it will be
slow; a CDROM is considered slow anyway.
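
A setup like this would then be configured with something along the lines of
"./configure --disable-coroutine-release-pool" (the per-thread allocation pool
stays enabled unless --disable-coroutine-pool is given as well); both switches
are handled in the configure hunk below.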

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 configure             | 15 ++++++++--
 util/qemu-coroutine.c | 79 ++++++++++++++++++++++++++++-----------------------
 2 files changed, 56 insertions(+), 38 deletions(-)

diff --git a/configure b/configure
index 82bcc25..fb29034 100755
--- a/configure
+++ b/configure
@@ -296,6 +296,7 @@ libiscsi=""
 libnfs=""
 coroutine=""
 coroutine_pool=""
+coroutine_release_pool="yes"
 coroutine_stack_size_debug="no"
 seccomp=""
 glusterfs=""
@@ -1001,10 +1002,14 @@ for opt do
   ;;
   --with-coroutine=*) coroutine="$optarg"
   ;;
-  --disable-coroutine-pool) coroutine_pool="no"
+  --disable-coroutine-pool)
+      coroutine_pool="no"
+      coroutine_release_pool="no"
   ;;
   --enable-coroutine-pool) coroutine_pool="yes"
   ;;
+  --disable-coroutine-release-pool) coroutine_release_pool="no"
+  ;;
   --enable-coroutine-stack-size-debug) coroutine_stack_size_debug="yes"
   ;;
   --disable-docs) docs="no"
@@ -1364,6 +1369,7 @@ disabled with --disable-FEATURE, default is enabled if available:
                   (for reading bzip2-compressed dmg images)
   seccomp         seccomp support
   coroutine-pool  coroutine freelist (better performance)
+  coroutine-release-pool  coroutine freelist is shared between threads
   coroutine-stack-size-debug
                   report coroutine max stack usage (only for debugging)
   glusterfs       GlusterFS backend
@@ -4310,6 +4316,7 @@ if test "$coroutine_stack_size_debug" = "yes"; then
   if test "$coroutine_pool" = "yes"; then
     echo "WARN: disabling coroutine pool for stack size debugging"
     coroutine_pool=no
+    coroutine_release_pool=no
   fi
 fi
 
@@ -4880,6 +4887,7 @@ echo "QGA MSI support   $guest_agent_msi"
 echo "seccomp support   $seccomp"
 echo "coroutine backend $coroutine"
 echo "coroutine pool    $coroutine_pool"
+echo "coroutine release pool    $coroutine_release_pool"
 echo "coroutine stack size debug $coroutine_stack_size_debug"
 echo "GlusterFS support $glusterfs"
 echo "Archipelago support $archipelago"
@@ -5347,12 +5355,13 @@ fi
 echo "CONFIG_COROUTINE_BACKEND=$coroutine" >> $config_host_mak
 if test "$coroutine_pool" = "yes" ; then
   echo "CONFIG_COROUTINE_POOL=1" >> $config_host_mak
-else
-  echo "CONFIG_COROUTINE_POOL=0" >> $config_host_mak
 fi
 if test "$coroutine_stack_size_debug" = "yes" ; then
   echo "CONFIG_COROUTINE_STACK_SIZE_DEBUG=y" >> $config_host_mak
 fi
+if test "$coroutine_release_pool" = "yes"; then
+  echo "CONFIG_COROUTINE_RELEASE_POOL=y" >> $config_host_mak
+fi
 
 if test "$open_by_handle_at" = "yes" ; then
   echo "CONFIG_OPEN_BY_HANDLE=y" >> $config_host_mak
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 5816702..7dda0ca 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -20,13 +20,12 @@
 #include "qemu/coroutine.h"
 #include "qemu/coroutine_int.h"
 
+#ifdef CONFIG_COROUTINE_POOL
+/* per thread free list to speed up creation */
 enum {
     POOL_BATCH_SIZE = 64,
 };
 
-/** Free list to speed up creation */
-static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int release_pool_size;
 static __thread QSLIST_HEAD(, Coroutine) alloc_pool = QSLIST_HEAD_INITIALIZER(pool);
 static __thread unsigned int alloc_pool_size;
 static __thread Notifier coroutine_pool_cleanup_notifier;
@@ -41,35 +40,43 @@ static void coroutine_pool_cleanup(Notifier *n, void *value)
         qemu_coroutine_delete(co);
     }
 }
+#endif
+#ifdef CONFIG_COROUTINE_RELEASE_POOL
+/* add an additional shared release pool */
+static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
+static unsigned int release_pool_size;
+#endif
 
 Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
 {
     Coroutine *co = NULL;
 
-    if (CONFIG_COROUTINE_POOL) {
-        co = QSLIST_FIRST(&alloc_pool);
-        if (!co) {
-            if (release_pool_size > POOL_BATCH_SIZE) {
-                /* Slow path; a good place to register the destructor, too.  */
-                if (!coroutine_pool_cleanup_notifier.notify) {
-                    coroutine_pool_cleanup_notifier.notify = coroutine_pool_cleanup;
-                    qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
-                }
-
-                /* This is not exact; there could be a little skew between
-                 * release_pool_size and the actual size of release_pool.  But
-                 * it is just a heuristic, it does not need to be perfect.
-                 */
-                alloc_pool_size = atomic_xchg(&release_pool_size, 0);
-                QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
-                co = QSLIST_FIRST(&alloc_pool);
-            }
+#ifdef CONFIG_COROUTINE_POOL
+    co = QSLIST_FIRST(&alloc_pool);
+    if (!co) {
+        /* Slow path; a good place to register the destructor, too.  */
+        if (!coroutine_pool_cleanup_notifier.notify) {
+            coroutine_pool_cleanup_notifier.notify = coroutine_pool_cleanup;
+            qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
         }
-        if (co) {
-            QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
-            alloc_pool_size--;
+#ifdef CONFIG_COROUTINE_RELEASE_POOL
+        if (release_pool_size > POOL_BATCH_SIZE) {
+
+            /* This is not exact; there could be a little skew between
+             * release_pool_size and the actual size of release_pool.  But
+             * it is just a heuristic, it does not need to be perfect.
+             */
+            alloc_pool_size = atomic_xchg(&release_pool_size, 0);
+            QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
+            co = QSLIST_FIRST(&alloc_pool);
         }
+#endif
+    }
+    if (co) {
+        QSLIST_REMOVE_HEAD(&alloc_pool, pool_next);
+        alloc_pool_size--;
     }
+#endif
 
     if (!co) {
         co = qemu_coroutine_new();
@@ -84,18 +91,20 @@ static void coroutine_delete(Coroutine *co)
 {
     co->caller = NULL;
 
-    if (CONFIG_COROUTINE_POOL) {
-        if (release_pool_size < POOL_BATCH_SIZE * 2) {
-            QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
-            atomic_inc(&release_pool_size);
-            return;
-        }
-        if (alloc_pool_size < POOL_BATCH_SIZE) {
-            QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
-            alloc_pool_size++;
-            return;
-        }
+#ifdef CONFIG_COROUTINE_RELEASE_POOL
+    if (release_pool_size < POOL_BATCH_SIZE * 2) {
+        QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
+        atomic_inc(&release_pool_size);
+        return;
+    }
+#endif
+#ifdef CONFIG_COROUTINE_POOL
+    if (alloc_pool_size < POOL_BATCH_SIZE) {
+        QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
+        alloc_pool_size++;
+        return;
     }
+#endif
 
     qemu_coroutine_delete(co);
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 05/15] util: add a helper to mmap private anonymous memory
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (3 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 04/15] coroutine: add a knob to disable the shared release pool Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-10-16  2:10   ` Michael S. Tsirkin
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 06/15] exec: use mmap for subpages Peter Lieven
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 include/qemu/mmap-alloc.h |  6 ++++++
 util/mmap-alloc.c         | 17 +++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 0899b2f..a457721 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -9,4 +9,10 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
 
 void qemu_ram_munmap(void *ptr, size_t size);
 
+/* qemu_anon_ram_mmap maps private anonymous memory using mmap and
+ * aborts if the allocation fails. It's meant to act as a replacement
+ * for g_malloc0 and friends. */
+void *qemu_anon_ram_mmap(size_t size);
+void qemu_anon_ram_munmap(void *ptr, size_t size);
+
 #endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index 629d97a..c099858 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -107,3 +107,20 @@ void qemu_ram_munmap(void *ptr, size_t size)
         munmap(ptr, size + getpagesize());
     }
 }
+
+void *qemu_anon_ram_mmap(size_t size)
+{
+    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
+                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+    if (ptr == MAP_FAILED) {
+        abort();
+    }
+    return ptr;
+}
+
+void qemu_anon_ram_munmap(void *ptr, size_t size)
+{
+    if (ptr) {
+        munmap(ptr, size);
+    }
+}
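
For illustration, the later patches in this series use the pair roughly like
this ("Foo" stands in for any large, long-lived structure):

    Foo *f = qemu_anon_ram_mmap(sizeof(*f)); /* zero-filled, aborts on failure */
    /* ... use f for its whole lifetime ... */
    qemu_anon_ram_munmap(f, sizeof(*f));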
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 06/15] exec: use mmap for subpages
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (4 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 05/15] util: add a helper to mmap private anonymous memory Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28 10:48   ` Paolo Bonzini
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor Peter Lieven
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

a lot of subpages are created and freed at startup, but RCU delays
the freeing so the heap gets fragmented.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 exec.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 0122ef7..1b7be2a 100644
--- a/exec.c
+++ b/exec.c
@@ -49,6 +49,7 @@
 #include "exec/cpu-all.h"
 #include "qemu/rcu_queue.h"
 #include "qemu/main-loop.h"
+#include "qemu/mmap-alloc.h"
 #include "translate-all.h"
 #include "sysemu/replay.h"
 
@@ -1150,7 +1151,7 @@ static void phys_section_destroy(MemoryRegion *mr)
     if (have_sub_page) {
         subpage_t *subpage = container_of(mr, subpage_t, iomem);
         object_unref(OBJECT(&subpage->iomem));
-        g_free(subpage);
+        qemu_anon_ram_munmap(subpage, sizeof(subpage_t));
     }
 }
 
@@ -2270,7 +2271,7 @@ static subpage_t *subpage_init(AddressSpace *as, hwaddr base)
 {
     subpage_t *mmio;
 
-    mmio = g_malloc0(sizeof(subpage_t));
+    mmio = qemu_anon_ram_mmap(sizeof(subpage_t));
 
     mmio->as = as;
     mmio->base = base;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (5 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 06/15] exec: use mmap for subpages Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28  9:29   ` Dr. David Alan Gilbert
                     ` (2 more replies)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 08/15] virtio: use mmap for VirtQueue Peter Lieven
                   ` (9 subsequent siblings)
  16 siblings, 3 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

this struct is approx 75kB

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 qapi/qmp-input-visitor.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c
index aea90a1..b6f5dfd 100644
--- a/qapi/qmp-input-visitor.c
+++ b/qapi/qmp-input-visitor.c
@@ -17,6 +17,7 @@
 #include "qapi/qmp-input-visitor.h"
 #include "qapi/visitor-impl.h"
 #include "qemu/queue.h"
+#include "qemu/mmap-alloc.h"
 #include "qemu-common.h"
 #include "qapi/qmp/types.h"
 #include "qapi/qmp/qerror.h"
@@ -378,14 +379,14 @@ Visitor *qmp_input_get_visitor(QmpInputVisitor *v)
 void qmp_input_visitor_cleanup(QmpInputVisitor *v)
 {
     qobject_decref(v->root);
-    g_free(v);
+    qemu_anon_ram_munmap(v, sizeof(*v));
 }
 
 QmpInputVisitor *qmp_input_visitor_new(QObject *obj, bool strict)
 {
     QmpInputVisitor *v;
 
-    v = g_malloc0(sizeof(*v));
+    v = qemu_anon_ram_mmap(sizeof(*v));
 
     v->visitor.type = VISITOR_INPUT;
     v->visitor.start_struct = qmp_input_start_struct;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 08/15] virtio: use mmap for VirtQueue
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (6 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 09/15] loader: use mmap for ROMs Peter Lieven
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

a VirtQueue is approx. 128kB in size.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 hw/virtio/virtio.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7ed06ea..bf4bc4a 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -20,6 +20,7 @@
 #include "qemu/error-report.h"
 #include "hw/virtio/virtio.h"
 #include "qemu/atomic.h"
+#include "qemu/mmap-alloc.h"
 #include "hw/virtio/virtio-bus.h"
 #include "migration/migration.h"
 #include "hw/virtio/virtio-access.h"
@@ -1612,7 +1613,7 @@ void virtio_cleanup(VirtIODevice *vdev)
 {
     qemu_del_vm_change_state_handler(vdev->vmstate);
     g_free(vdev->config);
-    g_free(vdev->vq);
+    qemu_anon_ram_munmap(vdev->vq, sizeof(VirtQueue) * VIRTIO_QUEUE_MAX);
     g_free(vdev->vector_queues);
 }
 
@@ -1666,7 +1667,7 @@ void virtio_init(VirtIODevice *vdev, const char *name,
     vdev->isr = 0;
     vdev->queue_sel = 0;
     vdev->config_vector = VIRTIO_NO_VECTOR;
-    vdev->vq = g_malloc0(sizeof(VirtQueue) * VIRTIO_QUEUE_MAX);
+    vdev->vq = qemu_anon_ram_mmap(sizeof(VirtQueue) * VIRTIO_QUEUE_MAX);
     vdev->vm_running = runstate_is_running();
     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
         vdev->vq[i].vector = VIRTIO_NO_VECTOR;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 09/15] loader: use mmap for ROMs
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (7 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 08/15] virtio: use mmap for VirtQueue Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28 10:41   ` Paolo Bonzini
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 10/15] vmware_svga: use mmap for scratch pad Peter Lieven
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

a classic use for mmap here.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 hw/core/loader.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index 53e0e41..f217edc 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -55,6 +55,7 @@
 #include "exec/address-spaces.h"
 #include "hw/boards.h"
 #include "qemu/cutils.h"
+#include "qemu/mmap-alloc.h"
 
 #include <zlib.h>
 
@@ -837,7 +838,7 @@ int rom_add_file(const char *file, const char *fw_dir,
 {
     MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
     Rom *rom;
-    int rc, fd = -1;
+    int fd = -1;
     char devpath[100];
 
     rom = g_malloc0(sizeof(*rom));
@@ -867,12 +868,9 @@ int rom_add_file(const char *file, const char *fw_dir,
     }
 
     rom->datasize = rom->romsize;
-    rom->data     = g_malloc0(rom->datasize);
-    lseek(fd, 0, SEEK_SET);
-    rc = read(fd, rom->data, rom->datasize);
-    if (rc != rom->datasize) {
-        fprintf(stderr, "rom: file %-20s: read error: rc=%d (expected %zd)\n",
-                rom->name, rc, rom->datasize);
+    rom->data     = mmap(NULL, rom->datasize, PROT_READ, MAP_SHARED, fd, 0);
+    if (rom->data == MAP_FAILED) {
+        fprintf(stderr, "rom: file %-20s: mmap error\n", rom->name);
         goto err;
     }
     close(fd);
@@ -915,7 +913,7 @@ err:
     if (fd != -1)
         close(fd);
 
-    g_free(rom->data);
+    qemu_anon_ram_munmap(rom->data, rom->romsize);
     g_free(rom->path);
     g_free(rom->name);
     if (fw_dir) {
@@ -1013,7 +1011,7 @@ static void rom_reset(void *unused)
         }
         if (rom->isrom) {
             /* rom needs to be written only once */
-            g_free(rom->data);
+            qemu_anon_ram_munmap(rom->data, rom->datasize);
             rom->data = NULL;
         }
         /*
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 10/15] vmware_svga: use mmap for scratch pad
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (8 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 09/15] loader: use mmap for ROMs Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects Peter Lieven
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

the scratch pad is 256kB

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 hw/display/vmware_vga.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/display/vmware_vga.c b/hw/display/vmware_vga.c
index e51a05e..9942b2d 100644
--- a/hw/display/vmware_vga.c
+++ b/hw/display/vmware_vga.c
@@ -22,6 +22,7 @@
  * THE SOFTWARE.
  */
 #include "qemu/osdep.h"
+#include "qemu/mmap-alloc.h"
 #include "qapi/error.h"
 #include "hw/hw.h"
 #include "hw/loader.h"
@@ -1247,7 +1248,7 @@ static void vmsvga_init(DeviceState *dev, struct vmsvga_state_s *s,
                         MemoryRegion *address_space, MemoryRegion *io)
 {
     s->scratch_size = SVGA_SCRATCH_SIZE;
-    s->scratch = g_malloc(s->scratch_size * 4);
+    s->scratch = qemu_anon_ram_mmap(s->scratch_size * 4);
 
     s->vga.con = graphic_console_init(dev, 0, &vmsvga_ops, s);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (9 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 10/15] vmware_svga: use mmap for scratch pad Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28 10:08   ` Daniel P. Berrange
                     ` (2 more replies)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 12/15] util: add a function to realloc mmapped memory Peter Lieven
                   ` (5 subsequent siblings)
  16 siblings, 3 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 include/qom/object.h |  1 +
 qom/object.c         | 20 +++++++++++++++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/qom/object.h b/include/qom/object.h
index 2f8ac47..c612f3a 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -400,6 +400,7 @@ struct Object
     GHashTable *properties;
     uint32_t ref;
     Object *parent;
+    size_t instance_size;
 };
 
 /**
diff --git a/qom/object.c b/qom/object.c
index 9743ea4..203162b 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -15,6 +15,7 @@
 #include "qom/object.h"
 #include "qom/object_interfaces.h"
 #include "qemu/cutils.h"
+#include "qemu/mmap-alloc.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
 #include "qapi/string-input-visitor.h"
@@ -453,6 +454,12 @@ static void object_deinit(Object *obj, TypeImpl *type)
     }
 }
 
+static void object_munmap(void *opaque)
+{
+    Object *obj = opaque;
+    qemu_anon_ram_munmap(obj, obj->instance_size);
+}
+
 static void object_finalize(void *data)
 {
     Object *obj = data;
@@ -467,16 +474,23 @@ static void object_finalize(void *data)
     }
 }
 
+#define OBJECT_MMAP_THRESH 4096
+
 Object *object_new_with_type(Type type)
 {
     Object *obj;
 
     g_assert(type != NULL);
     type_initialize(type);
-
-    obj = g_malloc(type->instance_size);
+    if (type->instance_size < OBJECT_MMAP_THRESH) {
+        obj = g_malloc(type->instance_size);
+        obj->free = g_free;
+    } else {
+        obj = qemu_anon_ram_mmap(type->instance_size);
+        obj->free = object_munmap;
+    }
+    obj->instance_size = type->instance_size;
     object_initialize_with_type(obj, type->instance_size, type);
-    obj->free = g_free;
 
     return obj;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 12/15] util: add a function to realloc mmapped memory
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (10 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes Peter Lieven
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 include/qemu/mmap-alloc.h |  1 +
 util/mmap-alloc.c         | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index a457721..935a907 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -14,5 +14,6 @@ void qemu_ram_munmap(void *ptr, size_t size);
  * for g_malloc0 and friends. */
 void *qemu_anon_ram_mmap(size_t size);
 void qemu_anon_ram_munmap(void *ptr, size_t size);
+void *qemu_anon_ram_remap(void *old_ptr, size_t old_size, size_t new_size);
 
 #endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index c099858..5cbe1c5 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -124,3 +124,13 @@ void qemu_anon_ram_munmap(void *ptr, size_t size)
         munmap(ptr, size);
     }
 }
+
+void *qemu_anon_ram_remap(void *old_ptr, size_t old_size, size_t new_size)
+{
+    void *ptr = qemu_anon_ram_mmap(new_size);
+    if (old_ptr) {
+        memcpy(ptr, old_ptr, old_size);
+        qemu_anon_ram_munmap(old_ptr, old_size);
+    }
+    return ptr;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (11 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 12/15] util: add a function to realloc mmapped memory Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28 10:43   ` Paolo Bonzini
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 14/15] vnc-tight: make the encoding palette static Peter Lieven
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

this was causing serious fragmentation in conjunction with the
subpages since RCU was introduced. The node space was allocated
at approx 32kB, then reallocated to approx 75kB, and this happened a few
hundred times at startup. And thanks to RCU the freeing was delayed.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 exec.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 1b7be2a..b4bcf47 100644
--- a/exec.c
+++ b/exec.c
@@ -189,9 +189,11 @@ struct CPUAddressSpace {
 static void phys_map_node_reserve(PhysPageMap *map, unsigned nodes)
 {
     if (map->nodes_nb + nodes > map->nodes_nb_alloc) {
+        size_t old_size = map->nodes_nb_alloc * sizeof(Node);
         map->nodes_nb_alloc = MAX(map->nodes_nb_alloc * 2, 16);
         map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, map->nodes_nb + nodes);
-        map->nodes = g_renew(Node, map->nodes, map->nodes_nb_alloc);
+        map->nodes = qemu_anon_ram_remap(map->nodes, old_size,
+                                         sizeof(Node) * map->nodes_nb_alloc);
     }
 }
 
@@ -1162,7 +1164,7 @@ static void phys_sections_free(PhysPageMap *map)
         phys_section_destroy(section->mr);
     }
     g_free(map->sections);
-    g_free(map->nodes);
+    qemu_anon_ram_munmap(map->nodes, map->nodes_nb_alloc * sizeof(Node));
 }
 
 static void register_subpage(AddressSpaceDispatch *d, MemoryRegionSection *section)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 14/15] vnc-tight: make the encoding palette static
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (12 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28 11:12   ` Paolo Bonzini
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 15/15] vnc: use mmap for VncState Peter Lieven
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

for the calculation of the number of subcolors of each subrect a new palette was
allocated, memset to zero and then destroyed. Use a static palette for this instead.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 ui/vnc-enc-tight.c | 21 ++++++++++-----------
 ui/vnc.h           |  1 +
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/ui/vnc-enc-tight.c b/ui/vnc-enc-tight.c
index e5cba0e..d3a9cc5 100644
--- a/ui/vnc-enc-tight.c
+++ b/ui/vnc-enc-tight.c
@@ -349,7 +349,7 @@ tight_detect_smooth_image(VncState *vs, int w, int h)
     tight_fill_palette##bpp(VncState *vs, int x, int y,                 \
                             int max, size_t count,                      \
                             uint32_t *bg, uint32_t *fg,                 \
-                            VncPalette **palette) {                     \
+                            VncPalette *palette) {                      \
         uint##bpp##_t *data;                                            \
         uint##bpp##_t c0, c1, ci;                                       \
         int i, n0, n1;                                                  \
@@ -396,23 +396,23 @@ tight_detect_smooth_image(VncState *vs, int w, int h)
             return 0;                                                   \
         }                                                               \
                                                                         \
-        *palette = palette_new(max, bpp);                               \
-        palette_put(*palette, c0);                                      \
-        palette_put(*palette, c1);                                      \
-        palette_put(*palette, ci);                                      \
+        palette_init(palette, max, bpp);                                \
+        palette_put(palette, c0);                                       \
+        palette_put(palette, c1);                                       \
+        palette_put(palette, ci);                                       \
                                                                         \
         for (i++; i < count; i++) {                                     \
             if (data[i] == ci) {                                        \
                 continue;                                               \
             } else {                                                    \
                 ci = data[i];                                           \
-                if (!palette_put(*palette, (uint32_t)ci)) {             \
+                if (!palette_put(palette, (uint32_t)ci)) {              \
                     return 0;                                           \
                 }                                                       \
             }                                                           \
         }                                                               \
                                                                         \
-        return palette_size(*palette);                                  \
+        return palette_size(palette);                                   \
     }
 
 DEFINE_FILL_PALETTE_FUNCTION(8)
@@ -421,7 +421,7 @@ DEFINE_FILL_PALETTE_FUNCTION(32)
 
 static int tight_fill_palette(VncState *vs, int x, int y,
                               size_t count, uint32_t *bg, uint32_t *fg,
-                              VncPalette **palette)
+                              VncPalette *palette)
 {
     int max;
 
@@ -1459,7 +1459,7 @@ static int send_sub_rect_jpeg(VncState *vs, int x, int y, int w, int h,
 
 static int send_sub_rect(VncState *vs, int x, int y, int w, int h)
 {
-    VncPalette *palette = NULL;
+    VncPalette *palette = &vs->tight.palette;
     uint32_t bg = 0, fg = 0;
     int colors;
     int ret = 0;
@@ -1488,7 +1488,7 @@ static int send_sub_rect(VncState *vs, int x, int y, int w, int h)
     }
 #endif
 
-    colors = tight_fill_palette(vs, x, y, w * h, &bg, &fg, &palette);
+    colors = tight_fill_palette(vs, x, y, w * h, &bg, &fg, palette);
 
 #ifdef CONFIG_VNC_JPEG
     if (allow_jpeg && vs->tight.quality != (uint8_t)-1) {
@@ -1501,7 +1501,6 @@ static int send_sub_rect(VncState *vs, int x, int y, int w, int h)
     ret = send_sub_rect_nojpeg(vs, x, y, w, h, bg, fg, colors, palette);
 #endif
 
-    palette_destroy(palette);
     return ret;
 }
 
diff --git a/ui/vnc.h b/ui/vnc.h
index 6568bca..e24e2cc 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -201,6 +201,7 @@ typedef struct VncTight {
 #endif
     int levels[4];
     z_stream stream[4];
+    VncPalette palette;
 } VncTight;
 
 typedef struct VncHextile {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [Qemu-devel] [PATCH 15/15] vnc: use mmap for VncState
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (13 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 14/15] vnc-tight: make the encoding palette static Peter Lieven
@ 2016-06-28  9:01 ` Peter Lieven
  2016-06-28 11:37 ` [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Paolo Bonzini
  2016-10-12 21:18 ` Michael R. Hines
  16 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:01 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, mreitz, pbonzini, mst, dgilbert, peter.maydell, kraxel,
	Peter Lieven

the VncState is approx. 85kB

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 ui/vnc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index 95e4db7..bf87135 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -45,6 +45,7 @@
 #include "crypto/tlscredsx509.h"
 #include "qom/object_interfaces.h"
 #include "qemu/cutils.h"
+#include "qemu/mmap-alloc.h"
 
 #define VNC_REFRESH_INTERVAL_BASE GUI_REFRESH_INTERVAL_DEFAULT
 #define VNC_REFRESH_INTERVAL_INC  50
@@ -1234,7 +1235,7 @@ void vnc_disconnect_finish(VncState *vs)
     vs->ioc = NULL;
     object_unref(OBJECT(vs->sioc));
     vs->sioc = NULL;
-    g_free(vs);
+    qemu_anon_ram_munmap(vs, sizeof(VncState));
 }
 
 ssize_t vnc_client_io_error(VncState *vs, ssize_t ret, Error **errp)
@@ -2956,7 +2957,7 @@ static void vnc_refresh(DisplayChangeListener *dcl)
 static void vnc_connect(VncDisplay *vd, QIOChannelSocket *sioc,
                         bool skipauth, bool websocket)
 {
-    VncState *vs = g_new0(VncState, 1);
+    VncState *vs = qemu_anon_ram_mmap(sizeof(VncState));
     int i;
 
     vs->sioc = sioc;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor Peter Lieven
@ 2016-06-28  9:29   ` Dr. David Alan Gilbert
  2016-06-28  9:39     ` Peter Lieven
  2016-06-28 11:36   ` Paolo Bonzini
  2016-06-30 14:12   ` Markus Armbruster
  2 siblings, 1 reply; 78+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-28  9:29 UTC (permalink / raw)
  To: Peter Lieven
  Cc: qemu-devel, kwolf, mreitz, pbonzini, mst, peter.maydell, kraxel

* Peter Lieven (pl@kamp.de) wrote:
> this struct is approx 75kB

I wonder why it's so large.

The stack size in QmpInputVisitor; it's got a 1024 element stack
(QIV_STACK_SIZE) and I bet we never use anywhere near that.

But even then that's 1024 * a 3 pointer stack object, 24 bytes - 
I don't see where the rest of that 75kB comes from.
I'm a little wary about turning all these malloc's into mmap's
because we do seem to use things like input visitors for small
things; don't the cost of doing the mmap's add up in time
instead of space?

Dave


> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  qapi/qmp-input-visitor.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c
> index aea90a1..b6f5dfd 100644
> --- a/qapi/qmp-input-visitor.c
> +++ b/qapi/qmp-input-visitor.c
> @@ -17,6 +17,7 @@
>  #include "qapi/qmp-input-visitor.h"
>  #include "qapi/visitor-impl.h"
>  #include "qemu/queue.h"
> +#include "qemu/mmap-alloc.h"
>  #include "qemu-common.h"
>  #include "qapi/qmp/types.h"
>  #include "qapi/qmp/qerror.h"
> @@ -378,14 +379,14 @@ Visitor *qmp_input_get_visitor(QmpInputVisitor *v)
>  void qmp_input_visitor_cleanup(QmpInputVisitor *v)
>  {
>      qobject_decref(v->root);
> -    g_free(v);
> +    qemu_anon_ram_munmap(v, sizeof(*v));
>  }
>  
>  QmpInputVisitor *qmp_input_visitor_new(QObject *obj, bool strict)
>  {
>      QmpInputVisitor *v;
>  
> -    v = g_malloc0(sizeof(*v));
> +    v = qemu_anon_ram_mmap(sizeof(*v));
>  
>      v->visitor.type = VISITOR_INPUT;
>      v->visitor.start_struct = qmp_input_start_struct;
> -- 
> 1.9.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28  9:29   ` Dr. David Alan Gilbert
@ 2016-06-28  9:39     ` Peter Lieven
  2016-06-28 10:10       ` Daniel P. Berrange
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28  9:39 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, kwolf, mreitz, pbonzini, mst, peter.maydell, kraxel

Am 28.06.2016 um 11:29 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> this struct is approx 75kB
> I wonder why it's so large.
>
> The stack size in QmpInputVisitor; it's got a 1024 element stack
> (QIV_STACK_SIZE) and I bet we never use anywhere near that.
>
> But even then that's 1024 * a 3 pointer stack object, 24 bytes -
> I don't see where the rest of that 75kB comes from.

Sorry, I had a wrong size in mind. It's 24736 bytes. But that's
still larger than expected, right?

> I'm a little wary about turning all these malloc's into mmap's
> because we do seem to use things like input visitors for small
> things; don't the cost of doing the mmap's add up in time
> instead of space?

Sure, we should discuss this. The series should act as a base for
discussion.

None of the things I changed to mmap are continuously called
during VM runtime. So it most likely only happens at setup of the
vServer.

It seems that the PhysPageMap in exec.c had the worst impact.

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory Peter Lieven
@ 2016-06-28 10:02   ` Peter Maydell
  2016-06-28 10:21     ` Peter Lieven
  2016-06-28 11:04   ` Paolo Bonzini
  1 sibling, 1 reply; 78+ messages in thread
From: Peter Maydell @ 2016-06-28 10:02 UTC (permalink / raw)
  To: Peter Lieven
  Cc: QEMU Developers, Kevin Wolf, Max Reitz, Paolo Bonzini,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Gerd Hoffmann

On 28 June 2016 at 10:01, Peter Lieven <pl@kamp.de> wrote:
> coroutine-ucontext currently allocates stack memory from the heap because on most systems the
> stack size lies below the threshold for mmapping memory. This patch forces mmapping
> of stacks to avoid large holes on the heap when a coroutine is deleted. It additionally
> allows us to add a guard page at the bottom of the stack to avoid overflows.
>
> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  util/coroutine-ucontext.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
> index 2bb7e10..841e7db 100644
> --- a/util/coroutine-ucontext.c
> +++ b/util/coroutine-ucontext.c
> @@ -80,9 +80,10 @@ static void coroutine_trampoline(int i0, int i1)
>      }
>  }
>
> +#define COROUTINE_STACK_SIZE (1 << 20)
> +
>  Coroutine *qemu_coroutine_new(void)
>  {
> -    const size_t stack_size = 1 << 20;
>      CoroutineUContext *co;
>      ucontext_t old_uc, uc;
>      sigjmp_buf old_env;
> @@ -101,17 +102,32 @@ Coroutine *qemu_coroutine_new(void)
>      }
>
>      co = g_malloc0(sizeof(*co));
> +
> +#ifdef MAP_GROWSDOWN
> +    co->stack = mmap(NULL, COROUTINE_STACK_SIZE, PROT_READ | PROT_WRITE,
> +                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);
> +    if (co->stack == MAP_FAILED) {
> +        abort();
> +    }
> +    /* add a guard page at bottom of the stack */
> +    if (mmap(co->stack, getpagesize(), PROT_NONE,
> +        MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0) == MAP_FAILED) {
> +        abort();
> +    }
> +#else
>      co->stack = g_malloc(COROUTINE_STACK_SIZE);

I would just mmap() always; then we get the benefit of the
guard page even if there's no MAP_GROWSDOWN.

Also, does MAP_GROWSDOWN help with the RSS issues? I
noticed that glibc itself doesn't use it for pthread
stacks as far as I can tell, so maybe it's obsolete?
(Ulrich Drepper apparently thought so in 2008:
https://lwn.net/Articles/294001/ )

> +#endif

Can we abstract this out into an alloc/dealloc function, please?

/**
 * qemu_alloc_stack:
 * @sz: size of required stack in bytes
 *
 * Allocate memory that can be used as a stack, for instance for
 * coroutines. If the memory cannot be allocated, this function
 * will abort (like g_malloc()). The allocated stack should be
 * freed with qemu_free_stack().
 *
 * Returns: pointer to (the lowest address of) the stack memory.
 */
void *qemu_alloc_stack(size_t sz);

/**
 * qemu_free_stack:
 * @stack: stack to free
 *
 * Free a stack allocated via qemu_alloc_stack().
 */
void qemu_free_stack(void *stack);

util/coroutine-sigaltstack.c can then use the same function
for stack allocation.

I would put the implementation in util/oslib-posix.c and the
header in include/sysemu/os-posix.h, unless somebody has a
better idea.
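
A minimal sketch of what the implementation could look like (untested, and the
free function takes the size explicitly here, which differs from the prototype
above):

void *qemu_alloc_stack(size_t sz)
{
    /* reserve one extra page and turn it into a guard page at the lowest
     * address, since the stack grows downwards on the hosts we support */
    char *ptr = mmap(NULL, sz + getpagesize(), PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        abort();
    }
    if (mprotect(ptr, getpagesize(), PROT_NONE) != 0) {
        abort();
    }
    return ptr + getpagesize();
}

void qemu_free_stack(void *stack, size_t sz)
{
    munmap((char *)stack - getpagesize(), sz + getpagesize());
}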

> +
>      co->base.entry_arg = &old_env; /* stash away our jmp_buf */
>
>      uc.uc_link = &old_uc;
>      uc.uc_stack.ss_sp = co->stack;
> -    uc.uc_stack.ss_size = stack_size;
> +    uc.uc_stack.ss_size = COROUTINE_STACK_SIZE;

Because of the guard page, your code above isn't actually
allocating this much stack.

>      uc.uc_stack.ss_flags = 0;
>
>  #ifdef CONFIG_VALGRIND_H
>      co->valgrind_stack_id =
> -        VALGRIND_STACK_REGISTER(co->stack, co->stack + stack_size);
> +        VALGRIND_STACK_REGISTER(co->stack, co->stack + COROUTINE_STACK_SIZE);
>  #endif
>
>      arg.p = co;
> @@ -149,7 +165,11 @@ void qemu_coroutine_delete(Coroutine *co_)
>      valgrind_stack_deregister(co);
>  #endif
>
> +#ifdef MAP_GROWSDOWN
> +    munmap(co->stack, COROUTINE_STACK_SIZE);
> +#else
>      g_free(co->stack);
> +#endif
>      g_free(co);
>  }
>
> --
> 1.9.1

thanks
-- PMM

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects Peter Lieven
@ 2016-06-28 10:08   ` Daniel P. Berrange
  2016-06-28 10:10   ` Peter Maydell
  2016-06-28 10:42   ` Paolo Bonzini
  2 siblings, 0 replies; 78+ messages in thread
From: Daniel P. Berrange @ 2016-06-28 10:08 UTC (permalink / raw)
  To: Peter Lieven
  Cc: qemu-devel, kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel,
	pbonzini

On Tue, Jun 28, 2016 at 11:01:35AM +0200, Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  include/qom/object.h |  1 +
>  qom/object.c         | 20 +++++++++++++++++---
>  2 files changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qom/object.h b/include/qom/object.h
> index 2f8ac47..c612f3a 100644
> --- a/include/qom/object.h
> +++ b/include/qom/object.h
> @@ -400,6 +400,7 @@ struct Object
>      GHashTable *properties;
>      uint32_t ref;
>      Object *parent;
> +    size_t instance_size;

This is not required....

>  };
>  
>  /**
> diff --git a/qom/object.c b/qom/object.c
> index 9743ea4..203162b 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -15,6 +15,7 @@
>  #include "qom/object.h"
>  #include "qom/object_interfaces.h"
>  #include "qemu/cutils.h"
> +#include "qemu/mmap-alloc.h"
>  #include "qapi/visitor.h"
>  #include "qapi-visit.h"
>  #include "qapi/string-input-visitor.h"
> @@ -453,6 +454,12 @@ static void object_deinit(Object *obj, TypeImpl *type)
>      }
>  }
>  
> +static void object_munmap(void *opaque)
> +{
> +    Object *obj = opaque;
> +    qemu_anon_ram_munmap(obj, obj->instance_size);

...since you have an Object pointer, you can get the corresponding
Type,  obj->class->type, and thus directly access type->instance_size
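
For instance, something like this (sketch only, assuming object_munmap
stays in qom/object.c where TypeImpl is visible):

static void object_munmap(void *opaque)
{
    Object *obj = opaque;

    qemu_anon_ram_munmap(obj, obj->class->type->instance_size);
}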

> +}
> +
>  static void object_finalize(void *data)
>  {
>      Object *obj = data;
> @@ -467,16 +474,23 @@ static void object_finalize(void *data)
>      }
>  }
>  
> +#define OBJECT_MMAP_THRESH 4096
> +
>  Object *object_new_with_type(Type type)
>  {
>      Object *obj;
>  
>      g_assert(type != NULL);
>      type_initialize(type);
> -
> -    obj = g_malloc(type->instance_size);
> +    if (type->instance_size < OBJECT_MMAP_THRESH) {
> +        obj = g_malloc(type->instance_size);
> +        obj->free = g_free;
> +    } else {
> +        obj = qemu_anon_ram_mmap(type->instance_size);
> +        obj->free = object_munmap;
> +    }
> +    obj->instance_size = type->instance_size;
>      object_initialize_with_type(obj, type->instance_size, type);
> -    obj->free = g_free;
>  
>      return obj;


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects Peter Lieven
  2016-06-28 10:08   ` Daniel P. Berrange
@ 2016-06-28 10:10   ` Peter Maydell
  2016-06-28 10:19     ` Peter Lieven
  2016-06-28 10:42   ` Paolo Bonzini
  2 siblings, 1 reply; 78+ messages in thread
From: Peter Maydell @ 2016-06-28 10:10 UTC (permalink / raw)
  To: Peter Lieven
  Cc: QEMU Developers, Kevin Wolf, Max Reitz, Paolo Bonzini,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Gerd Hoffmann

On 28 June 2016 at 10:01, Peter Lieven <pl@kamp.de> wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  include/qom/object.h |  1 +
>  qom/object.c         | 20 +++++++++++++++++---
>  2 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/qom/object.h b/include/qom/object.h
> index 2f8ac47..c612f3a 100644
> --- a/include/qom/object.h
> +++ b/include/qom/object.h
> @@ -400,6 +400,7 @@ struct Object
>      GHashTable *properties;
>      uint32_t ref;
>      Object *parent;
> +    size_t instance_size;
>  };
>
>  /**
> diff --git a/qom/object.c b/qom/object.c
> index 9743ea4..203162b 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -15,6 +15,7 @@
>  #include "qom/object.h"
>  #include "qom/object_interfaces.h"
>  #include "qemu/cutils.h"
> +#include "qemu/mmap-alloc.h"
>  #include "qapi/visitor.h"
>  #include "qapi-visit.h"
>  #include "qapi/string-input-visitor.h"
> @@ -453,6 +454,12 @@ static void object_deinit(Object *obj, TypeImpl *type)
>      }
>  }
>
> +static void object_munmap(void *opaque)
> +{
> +    Object *obj = opaque;
> +    qemu_anon_ram_munmap(obj, obj->instance_size);
> +}
> +
>  static void object_finalize(void *data)
>  {
>      Object *obj = data;
> @@ -467,16 +474,23 @@ static void object_finalize(void *data)
>      }
>  }
>
> +#define OBJECT_MMAP_THRESH 4096
> +
>  Object *object_new_with_type(Type type)
>  {
>      Object *obj;
>
>      g_assert(type != NULL);
>      type_initialize(type);
> -
> -    obj = g_malloc(type->instance_size);
> +    if (type->instance_size < OBJECT_MMAP_THRESH) {
> +        obj = g_malloc(type->instance_size);
> +        obj->free = g_free;
> +    } else {
> +        obj = qemu_anon_ram_mmap(type->instance_size);
> +        obj->free = object_munmap;
> +    }
> +    obj->instance_size = type->instance_size;
>      object_initialize_with_type(obj, type->instance_size, type);
> -    obj->free = g_free;

This one I disagree with. We should trust the platform malloc
implementation (or g_malloc() in this case), or get it fixed
if it is broken somehow.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28  9:39     ` Peter Lieven
@ 2016-06-28 10:10       ` Daniel P. Berrange
  2016-06-28 10:17         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 78+ messages in thread
From: Daniel P. Berrange @ 2016-06-28 10:10 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Dr. David Alan Gilbert, kwolf, peter.maydell, mst, qemu-devel,
	mreitz, kraxel, pbonzini

On Tue, Jun 28, 2016 at 11:39:03AM +0200, Peter Lieven wrote:
> Am 28.06.2016 um 11:29 schrieb Dr. David Alan Gilbert:
> > * Peter Lieven (pl@kamp.de) wrote:
> > > this struct is approx 75kB
> > I wonder why it's so large.
> > 
> > The stack size in QmpInputVisitor; it's got a 1024 element stack
> > (QIV_STACK_SIZE) and I bet we never use anywhere near that.
> > 
> > But even then that's 1024 * a 3 pointer stack object, 24 bytes -
> > I don't see where the rest of that 75kB comes from.
> 
> Sorry, I had a wrong size in mind. It's 24736 bytes. But that's
> still larger than expected, right?
> 
> > I'm a little wary about turning all these malloc's into mmap's
> > because we do seem to use things like input visitors for small
> > things; don't the cost of doing the mmap's add up in time
> > instead of space?
> 
> Sure, we should discuss this. The series should act as a base for
> discussion.
> 
> None of the things I changed into mmap are continuously called
> during VM runtime. So it most likely only happens at setup of the
> vServer.

QmpInputVisitor is used to parse all QMP monitor commands, so will
be used continuously throughout the life of QEMU, often very frequently,
e.g. when migration is running many monitor commands per second are
expected.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28 10:10       ` Daniel P. Berrange
@ 2016-06-28 10:17         ` Dr. David Alan Gilbert
  2016-06-28 10:21           ` Daniel P. Berrange
  2016-06-28 14:10           ` Eric Blake
  0 siblings, 2 replies; 78+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-28 10:17 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Peter Lieven, kwolf, peter.maydell, mst, qemu-devel, mreitz,
	kraxel, pbonzini

* Daniel P. Berrange (berrange@redhat.com) wrote:
> On Tue, Jun 28, 2016 at 11:39:03AM +0200, Peter Lieven wrote:
> > Am 28.06.2016 um 11:29 schrieb Dr. David Alan Gilbert:
> > > * Peter Lieven (pl@kamp.de) wrote:
> > > > this struct is approx 75kB
> > > I wonder why it's so large.
> > > 
> > > The stack size in QmpInputVisitor; it's got a 1024 element stack
> > > (QIV_STACK_SIZE) and I bet we never use anywhere near that.
> > > 
> > > But even then that's 1024 * a 3 pointer stack object, 24 bytes -
> > > I don't see where the rest of that 75kB comes from.
> > 
> > Sorry, I had a wrong size in mind. It's 24736 bytes. But that's
> > still larger than expected, right?
> > 
> > > I'm a little wary about turning all these malloc's into mmap's
> > > because we do seem to use things like input visitors for small
> > > things; don't the cost of doing the mmap's add up in time
> > > instead of space?
> > 
> > Sure, we should discuss this. The series should act as a base for
> > discussion.
> > 
> > None of the things I changed into mmap are continuously called
> > during VM runtime. So it most likely only happens at setup of the
> > vServer.
> 
> QmpInputVisitor is used to parse all QMP monitor commands, so will
> be used continuously throughout life of QEMU, often very frequently.
> eg When migration is running many monitor commands per second are
> expected

Does the same input visitor get reused by each command?

Dave

> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects
  2016-06-28 10:10   ` Peter Maydell
@ 2016-06-28 10:19     ` Peter Lieven
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 10:19 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, Kevin Wolf, Max Reitz, Paolo Bonzini,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Gerd Hoffmann

Am 28.06.2016 um 12:10 schrieb Peter Maydell:
> On 28 June 2016 at 10:01, Peter Lieven <pl@kamp.de> wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>   include/qom/object.h |  1 +
>>   qom/object.c         | 20 +++++++++++++++++---
>>   2 files changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/qom/object.h b/include/qom/object.h
>> index 2f8ac47..c612f3a 100644
>> --- a/include/qom/object.h
>> +++ b/include/qom/object.h
>> @@ -400,6 +400,7 @@ struct Object
>>       GHashTable *properties;
>>       uint32_t ref;
>>       Object *parent;
>> +    size_t instance_size;
>>   };
>>
>>   /**
>> diff --git a/qom/object.c b/qom/object.c
>> index 9743ea4..203162b 100644
>> --- a/qom/object.c
>> +++ b/qom/object.c
>> @@ -15,6 +15,7 @@
>>   #include "qom/object.h"
>>   #include "qom/object_interfaces.h"
>>   #include "qemu/cutils.h"
>> +#include "qemu/mmap-alloc.h"
>>   #include "qapi/visitor.h"
>>   #include "qapi-visit.h"
>>   #include "qapi/string-input-visitor.h"
>> @@ -453,6 +454,12 @@ static void object_deinit(Object *obj, TypeImpl *type)
>>       }
>>   }
>>
>> +static void object_munmap(void *opaque)
>> +{
>> +    Object *obj = opaque;
>> +    qemu_anon_ram_munmap(obj, obj->instance_size);
>> +}
>> +
>>   static void object_finalize(void *data)
>>   {
>>       Object *obj = data;
>> @@ -467,16 +474,23 @@ static void object_finalize(void *data)
>>       }
>>   }
>>
>> +#define OBJECT_MMAP_THRESH 4096
>> +
>>   Object *object_new_with_type(Type type)
>>   {
>>       Object *obj;
>>
>>       g_assert(type != NULL);
>>       type_initialize(type);
>> -
>> -    obj = g_malloc(type->instance_size);
>> +    if (type->instance_size < OBJECT_MMAP_THRESH) {
>> +        obj = g_malloc(type->instance_size);
>> +        obj->free = g_free;
>> +    } else {
>> +        obj = qemu_anon_ram_mmap(type->instance_size);
>> +        obj->free = object_munmap;
>> +    }
>> +    obj->instance_size = type->instance_size;
>>       object_initialize_with_type(obj, type->instance_size, type);
>> -    obj->free = g_free;
> This one I disagree with. We should trust the platform malloc
> implementation (or g_malloc() in this case), or get it fixed
> if it is broken somehow.

The ptmalloc implementation can only return memory back to the
kernel if there is unallocated memory at the end of the heap. Holes
in between cannot be returned.

But I agree, I would appreciate it if malloc handled this.

Some Objects we allocate are very large. E.g. some are 70kB as far as
I remember.
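
(A related workaround, not part of this series, might be to lower glibc's
mmap threshold so that such large allocations are mmapped by malloc itself,
e.g. early in main():

#include <malloc.h>

mallopt(M_MMAP_THRESHOLD, 32 * 1024);   /* allocations >= 32kB go via mmap */

at the cost of more mmap/munmap syscalls for large short-lived allocations.)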

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory
  2016-06-28 10:02   ` Peter Maydell
@ 2016-06-28 10:21     ` Peter Lieven
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 10:21 UTC (permalink / raw)
  To: Peter Maydell
  Cc: QEMU Developers, Kevin Wolf, Max Reitz, Paolo Bonzini,
	Michael S. Tsirkin, Dr. David Alan Gilbert, Gerd Hoffmann

Am 28.06.2016 um 12:02 schrieb Peter Maydell:
> On 28 June 2016 at 10:01, Peter Lieven <pl@kamp.de> wrote:
>> coroutine-ucontext currently allocates stack memory from the heap as on most systems the
>> stack size lies below the threshold for mmapping memory. This patch forces mmapping
>> of stacks to avoid large holes on the heap when a coroutine is deleted. It additionally
>> allows us to add a guard page at the bottom of the stack to avoid overflows.
>>
>> Suggested-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>   util/coroutine-ucontext.c | 26 +++++++++++++++++++++++---
>>   1 file changed, 23 insertions(+), 3 deletions(-)
>>
>> diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
>> index 2bb7e10..841e7db 100644
>> --- a/util/coroutine-ucontext.c
>> +++ b/util/coroutine-ucontext.c
>> @@ -80,9 +80,10 @@ static void coroutine_trampoline(int i0, int i1)
>>       }
>>   }
>>
>> +#define COROUTINE_STACK_SIZE (1 << 20)
>> +
>>   Coroutine *qemu_coroutine_new(void)
>>   {
>> -    const size_t stack_size = 1 << 20;
>>       CoroutineUContext *co;
>>       ucontext_t old_uc, uc;
>>       sigjmp_buf old_env;
>> @@ -101,17 +102,32 @@ Coroutine *qemu_coroutine_new(void)
>>       }
>>
>>       co = g_malloc0(sizeof(*co));
>> +
>> +#ifdef MAP_GROWSDOWN
>> +    co->stack = mmap(NULL, COROUTINE_STACK_SIZE, PROT_READ | PROT_WRITE,
>> +                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);
>> +    if (co->stack == MAP_FAILED) {
>> +        abort();
>> +    }
>> +    /* add a guard page at bottom of the stack */
>> +    if (mmap(co->stack, getpagesize(), PROT_NONE,
>> +        MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0) == MAP_FAILED) {
>> +        abort();
>> +    }
>> +#else
>>       co->stack = g_malloc(stack_size);
> I would just mmap() always; then we get the benefit of the
> guard page even if there's no MAP_GROWSDOWN.
>
> Also, does MAP_GROWSDOWN help with the RSS issues? I
> noticed that glibc itself doesn't use it for pthread
> stacks as far as I can tell, so maybe it's obsolete?
> (Ulrich Drepper apparently thought so in 2008:
> https://lwn.net/Articles/294001/ )

I have seen this thread. MAP_GROWSDOWN does not seem to help
at all. Only reducing the stack size does.

>
>> +#endif
> Can we abstract this out into an alloc/dealloc function, please?
>
> /**
>   * qemu_alloc_stack:
>   * @sz: size of required stack in bytes
>   *
>   * Allocate memory that can be used as a stack, for instance for
>   * coroutines. If the memory cannot be allocated, this function
>   * will abort (like g_malloc()). The allocated stack should be
>   * freed with qemu_free_stack().
>   *
>   * Returns: pointer to (the lowest address of) the stack memory.
>   */
> void *qemu_alloc_stack(size_t sz);
>
> /**
>   * qemu_free_stack:
>   * @stack: stack to free
>   *
>   * Free a stack allocated via qemu_alloc_stack().
>   */
> void qemu_free_stack(void *stack);

We also need to pass the size for munmap.

>
> util/coroutine-sigaltstack.c can then use the same function
> for stack allocation.
>
> I would put the implementation in util/oslib-posix.c and the
> header in include/sysemu/os-posix.h, unless somebody has a
> better idea.
>
>> +
>>       co->base.entry_arg = &old_env; /* stash away our jmp_buf */
>>
>>       uc.uc_link = &old_uc;
>>       uc.uc_stack.ss_sp = co->stack;
>> -    uc.uc_stack.ss_size = stack_size;
>> +    uc.uc_stack.ss_size = COROUTINE_STACK_SIZE;
> Because of the guard page, your code above isn't actually
> allocating this much stack.

oh yes, you are right.

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28 10:17         ` Dr. David Alan Gilbert
@ 2016-06-28 10:21           ` Daniel P. Berrange
  2016-06-28 14:10           ` Eric Blake
  1 sibling, 0 replies; 78+ messages in thread
From: Daniel P. Berrange @ 2016-06-28 10:21 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Peter Lieven, kwolf, peter.maydell, mst, qemu-devel, mreitz,
	kraxel, pbonzini

On Tue, Jun 28, 2016 at 11:17:59AM +0100, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > On Tue, Jun 28, 2016 at 11:39:03AM +0200, Peter Lieven wrote:
> > > Am 28.06.2016 um 11:29 schrieb Dr. David Alan Gilbert:
> > > > * Peter Lieven (pl@kamp.de) wrote:
> > > > > this struct is approx 75kB
> > > > I wonder why it's so large.
> > > > 
> > > > The stack size in QmpInputVisitor; it's got a 1024 element stack
> > > > (QIV_STACK_SIZE) and I bet we never use anywhere near that.
> > > > 
> > > > But even then that's 1024 * a 3 pointer stack object, 24 bytes -
> > > > I don't see where the rest of that 75kB comes from.
> > > 
> > > Sorry, I had a wrong size in mind. It's 24736 bytes. But that's
> > > still larger than expected, right?
> > > 
> > > > I'm a little wary about turning all these malloc's into mmap's
> > > > because we do seem to use things like input visitors for small
> > > > things; don't the cost of doing the mmap's add up in time
> > > > instead of space?
> > > 
> > > Sure, we should discuss this. The series should act as a base for
> > > discussion.
> > > 
> > > None of the things I changed into mmap are continuously called
> > > during VM runtime. So it most likely only happens at setup of the
> > > vServer.
> > 
> > QmpInputVisitor is used to parse all QMP monitor commands, so will
> > be used continuously throughout life of QEMU, often very frequently.
> > eg When migration is running many monitor commands per second are
> > expected
> 
> Does the same input visitor get reused by each command?

Oops, actually no, I'm wrong, it's only used by the 'object_add' command
but that does allocate a new one for each invocation.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 04/15] coroutine: add a knob to disable the shared release pool
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 04/15] coroutine: add a knob to disable the shared release pool Peter Lieven
@ 2016-06-28 10:41   ` Paolo Bonzini
  2016-06-28 10:47     ` Peter Lieven
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 10:41 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> the current coroutine freelist implementation has 2 kinds of pools.
> One shared release pool between all threads and additionally one
> allocation pool per thread. The release pool is especially necessary
> if the coroutine is created in a different thread than it is released.
> This is e.g. the case if an IDE interface is used.
> 
> But in times of virtio and dataplane the release pool adds costs
> which are not entirely necessary. At first if virtio is used the release
> pool tends to fill up to 100% because all coroutines are first handed
> back to the release pool. On coroutine create a thread can steal this
> release pool and make it its local allocation pool, but during mixed
> I/O pattern at the end the release pool is full of useless coroutines
> and the alloc_pool has also filled to maximum size.

The cost is 2^16 bytes * 2^6 coroutines, that is 4 MB.  I don't think
this is worth an extra knob that no one will use...

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 09/15] loader: use mmap for ROMs
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 09/15] loader: use mmap for ROMs Peter Lieven
@ 2016-06-28 10:41   ` Paolo Bonzini
  2016-06-28 11:26     ` Peter Lieven
  2016-07-04  7:30     ` Peter Lieven
  0 siblings, 2 replies; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 10:41 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> a classic use for mmap here.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>

They are never freed, why does mmap help?

Paolo

> ---
>  hw/core/loader.c | 16 +++++++---------
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/core/loader.c b/hw/core/loader.c
> index 53e0e41..f217edc 100644
> --- a/hw/core/loader.c
> +++ b/hw/core/loader.c
> @@ -55,6 +55,7 @@
>  #include "exec/address-spaces.h"
>  #include "hw/boards.h"
>  #include "qemu/cutils.h"
> +#include "qemu/mmap-alloc.h"
>  
>  #include <zlib.h>
>  
> @@ -837,7 +838,7 @@ int rom_add_file(const char *file, const char *fw_dir,
>  {
>      MachineClass *mc = MACHINE_GET_CLASS(qdev_get_machine());
>      Rom *rom;
> -    int rc, fd = -1;
> +    int fd = -1;
>      char devpath[100];
>  
>      rom = g_malloc0(sizeof(*rom));
> @@ -867,12 +868,9 @@ int rom_add_file(const char *file, const char *fw_dir,
>      }
>  
>      rom->datasize = rom->romsize;
> -    rom->data     = g_malloc0(rom->datasize);
> -    lseek(fd, 0, SEEK_SET);
> -    rc = read(fd, rom->data, rom->datasize);
> -    if (rc != rom->datasize) {
> -        fprintf(stderr, "rom: file %-20s: read error: rc=%d (expected %zd)\n",
> -                rom->name, rc, rom->datasize);
> +    rom->data     = mmap(NULL, rom->datasize, PROT_READ, MAP_SHARED, fd, 0);
> +    if (rom->data == MAP_FAILED) {
> +        fprintf(stderr, "rom: file %-20s: mmap error\n", rom->name);
>          goto err;
>      }
>      close(fd);
> @@ -915,7 +913,7 @@ err:
>      if (fd != -1)
>          close(fd);
>  
> -    g_free(rom->data);
> +    qemu_anon_ram_munmap(rom->data, rom->romsize);
>      g_free(rom->path);
>      g_free(rom->name);
>      if (fw_dir) {
> @@ -1013,7 +1011,7 @@ static void rom_reset(void *unused)
>          }
>          if (rom->isrom) {
>              /* rom needs to be written only once */
> -            g_free(rom->data);
> +            qemu_anon_ram_munmap(rom->data, rom->datasize);
>              rom->data = NULL;
>          }
>          /*
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects Peter Lieven
  2016-06-28 10:08   ` Daniel P. Berrange
  2016-06-28 10:10   ` Peter Maydell
@ 2016-06-28 10:42   ` Paolo Bonzini
  2016-06-28 10:49     ` Peter Lieven
  2 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 10:42 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  include/qom/object.h |  1 +
>  qom/object.c         | 20 +++++++++++++++++---
>  2 files changed, 18 insertions(+), 3 deletions(-)

No, please---glibc should be fixed instead.

Paolo

> diff --git a/include/qom/object.h b/include/qom/object.h
> index 2f8ac47..c612f3a 100644
> --- a/include/qom/object.h
> +++ b/include/qom/object.h
> @@ -400,6 +400,7 @@ struct Object
>      GHashTable *properties;
>      uint32_t ref;
>      Object *parent;
> +    size_t instance_size;
>  };
>  
>  /**
> diff --git a/qom/object.c b/qom/object.c
> index 9743ea4..203162b 100644
> --- a/qom/object.c
> +++ b/qom/object.c
> @@ -15,6 +15,7 @@
>  #include "qom/object.h"
>  #include "qom/object_interfaces.h"
>  #include "qemu/cutils.h"
> +#include "qemu/mmap-alloc.h"
>  #include "qapi/visitor.h"
>  #include "qapi-visit.h"
>  #include "qapi/string-input-visitor.h"
> @@ -453,6 +454,12 @@ static void object_deinit(Object *obj, TypeImpl *type)
>      }
>  }
>  
> +static void object_munmap(void *opaque)
> +{
> +    Object *obj = opaque;
> +    qemu_anon_ram_munmap(obj, obj->instance_size);
> +}
> +
>  static void object_finalize(void *data)
>  {
>      Object *obj = data;
> @@ -467,16 +474,23 @@ static void object_finalize(void *data)
>      }
>  }
>  
> +#define OBJECT_MMAP_THRESH 4096
> +
>  Object *object_new_with_type(Type type)
>  {
>      Object *obj;
>  
>      g_assert(type != NULL);
>      type_initialize(type);
> -
> -    obj = g_malloc(type->instance_size);
> +    if (type->instance_size < OBJECT_MMAP_THRESH) {
> +        obj = g_malloc(type->instance_size);
> +        obj->free = g_free;
> +    } else {
> +        obj = qemu_anon_ram_mmap(type->instance_size);
> +        obj->free = object_munmap;
> +    }
> +    obj->instance_size = type->instance_size;
>      object_initialize_with_type(obj, type->instance_size, type);
> -    obj->free = g_free;
>  
>      return obj;
>  }
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes Peter Lieven
@ 2016-06-28 10:43   ` Paolo Bonzini
  2016-06-28 10:48     ` Peter Lieven
  2016-07-11  9:31     ` Peter Lieven
  0 siblings, 2 replies; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 10:43 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> this was causing serious fragmentation in conjunction with the
> subpages since RCU was introduced. The node space was allocated
> at approx 32kB then reallocated to approx 75kB and this happened a few hundred
> times at startup. And thanks to RCU the freeing was delayed.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>

The size of the node from the previous as->dispatch could be used as a
hint for the new one perhaps, avoiding the reallocation?
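
Roughly something like this (sketch only, hypothetical helper name, field
names as in exec.c from memory):

static void phys_map_init(PhysPageMap *map, PhysPageMap *prev)
{
    /* start out with as many nodes as the previous dispatch ended up
     * using, so the array is not grown (and reallocated) step by step
     * again on every rebuild */
    unsigned hint = prev ? prev->nodes_nb : 0;

    map->nodes_nb_alloc = hint;
    map->nodes = hint ? g_new(Node, hint) : NULL;
    map->nodes_nb = 0;
}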

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 04/15] coroutine: add a knob to disable the shared release pool
  2016-06-28 10:41   ` Paolo Bonzini
@ 2016-06-28 10:47     ` Peter Lieven
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 10:47 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 12:41 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> the current coroutine freelist implementation has 2 kinds of pools.
>> One shared release pool between all threads and additionally one
>> allocation pool per thread. The release pool is especially necessary
>> if the coroutine is created in a different thread than it is released.
>> This is e.g. the case if an IDE interface is used.
>>
>> But in times of virtio and dataplane the release pool adds costs
>> which are not entirely necessary. At first if virtio is used the release
>> pool tends to fill up to 100% because all coroutines are first handed
>> back to the release pool. On coroutine create a thread can steal this
>> release pool and make it its local allocation pool, but during mixed
>> I/O pattern at the end the release pool is full of useless coroutines
>> and the alloc_pool has also filled to maximum size.
> The cost is 2^16 bytes * 2^6 coroutines, that is 4 MB.  I don't think
> this is worth an extra knob that no one will use...

You are right, I had this patch prior to reducing the stack size.
In fact it's 2 * 2^6 coroutines. But maybe it's worth thinking about making
the release_pool the same max size as the alloc_pool? Otherwise when
the release_pool is stolen it could make the alloc_pool also that big.

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 06/15] exec: use mmap for subpages
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 06/15] exec: use mmap for subpages Peter Lieven
@ 2016-06-28 10:48   ` Paolo Bonzini
  0 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 10:48 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> a lot of subpages are created and freed at startup, but RCU delays
> the freeing so the heap gets fragmented.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>

I agree that subpages are bad for malloc because they are large (>
4KiB).  It is worth doing something special about them.  However, on
32-bit systems mmap-ing them has the same risk of fragmenting the
process address space, as malloc has of fragmenting the brk heap.

Allocation and freeing of subpages always happens under the BQL, so
perhaps a simple freelist is better?  Another interesting (but harder)
possibility could be to build the radix tree lazily.
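
A BQL-protected freelist could be as simple as this (sketch only; assumes
a new freelist_next pointer is added to subpage_t, and that the recycle
call replaces g_free() in the RCU reclaim path):

static subpage_t *subpage_freelist;            /* protected by the BQL */

static subpage_t *subpage_alloc(void)
{
    subpage_t *sp = subpage_freelist;

    if (sp) {
        subpage_freelist = sp->freelist_next;
        memset(sp, 0, sizeof(*sp));
    } else {
        sp = g_malloc0(sizeof(*sp));
    }
    return sp;
}

static void subpage_recycle(subpage_t *sp)
{
    sp->freelist_next = subpage_freelist;
    subpage_freelist = sp;
}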

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes
  2016-06-28 10:43   ` Paolo Bonzini
@ 2016-06-28 10:48     ` Peter Lieven
  2016-07-11  9:31     ` Peter Lieven
  1 sibling, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 10:48 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 12:43 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> this was causing serious fragmentation in conjunction with the
>> subpages since RCU was introduced. The node space was allocated
>> at approx 32kB then reallocated to approx 75kB and this happened a few hundred
>> times at startup. And thanks to RCU the freeing was delayed.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> The size of the node from the previous as->dispatch could be used as a
> hint for the new one perhaps, avoiding the reallocation?

I will figure that out. But still all the PhysPageMaps are allocated before
they are freed. Or are you fine with using mmap here?

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects
  2016-06-28 10:42   ` Paolo Bonzini
@ 2016-06-28 10:49     ` Peter Lieven
  2016-06-30 14:15       ` Markus Armbruster
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 10:49 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 12:42 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>   include/qom/object.h |  1 +
>>   qom/object.c         | 20 +++++++++++++++++---
>>   2 files changed, 18 insertions(+), 3 deletions(-)
> No, please---glibc should be fixed instead.

The objects we allocate are sometimes as big as 70kB...

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB Peter Lieven
@ 2016-06-28 10:54   ` Paolo Bonzini
  2016-06-28 10:57     ` Dr. David Alan Gilbert
  2016-06-28 11:13     ` Peter Lieven
  0 siblings, 2 replies; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 10:54 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> evaluation with the recently introduced maximum stack size monitoring revealed
> that the actual used stack size was never above 4kB so allocating 1MB stack
> for each coroutine is a lot of wasted memory. So reduce the stack size to
> 64kB which should still give enough head room.

If we make the stack this much smaller, there is a non-zero chance of
smashing it.  You must add a guard page if you do this (actually more
than one because QEMU will happily have stack frames as big as 16 KB).
The stack counts for RSS but it's not actually allocated memory, so why
does it matter?

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28 10:54   ` Paolo Bonzini
@ 2016-06-28 10:57     ` Dr. David Alan Gilbert
  2016-06-28 11:17       ` Peter Lieven
  2016-06-28 11:13     ` Peter Lieven
  1 sibling, 1 reply; 78+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-28 10:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Lieven, qemu-devel, kwolf, peter.maydell, mst, mreitz, kraxel

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 28/06/2016 11:01, Peter Lieven wrote:
> > evaluation with the recently introduced maximum stack size monitoring revealed
> > that the actual used stack size was never above 4kB so allocating 1MB stack
> > for each coroutine is a lot of wasted memory. So reduce the stack size to
> > 64kB which should still give enough head room.
> 
> If we make the stack this much smaller, there is a non-zero chance of
> smashing it.  You must add a guard page if you do this (actually more
> than one because QEMU will happily have stack frames as big as 16 KB).
> The stack counts for RSS but it's not actually allocated memory, so why
> does it matter?

I think I'd be interested in seeing the /proc/.../smaps before and after this
change to see if anything is visible and if we can see the difference
in rss etc.

Dave

> 
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory Peter Lieven
  2016-06-28 10:02   ` Peter Maydell
@ 2016-06-28 11:04   ` Paolo Bonzini
  1 sibling, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 11:04 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> +#ifdef MAP_GROWSDOWN
> +    co->stack = mmap(NULL, COROUTINE_STACK_SIZE, PROT_READ | PROT_WRITE,
> +                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);
> +    if (co->stack == MAP_FAILED) {
> +        abort();
> +    }
> +    /* add a guard page at bottom of the stack */
> +    if (mmap(co->stack, getpagesize(), PROT_NONE,
> +        MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0) == MAP_FAILED) {
> +        abort();
> +    }

Nevermind, you added a guard page!  Good! :)  And actually it looks like
the stack usage has been mostly tamed, at least for the block layer.

On the other hand MAP_GROWSDOWN automatically adds a guard page since
Linux 3.9 (see commit 09884964335e, "mm: do not grow the stack vma just
because of an overrun on preceding vma", 2013-02-27), so as it turns out
you don't even need the guard page!

Paolo

> +#else
>      co->stack = g_malloc(stack_size);
> +#endif

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 14/15] vnc-tight: make the encoding palette static
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 14/15] vnc-tight: make the encoding palette static Peter Lieven
@ 2016-06-28 11:12   ` Paolo Bonzini
  2016-06-28 11:18     ` Peter Lieven
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 11:12 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> @@ -201,6 +201,7 @@ typedef struct VncTight {
>  #endif
>      int levels[4];
>      z_stream stream[4];
> +    VncPalette palette;
>  } VncTight;

VncTight is copied back and forth in vnc_async_encoding_start and
vnc_async_encoding_end, so this should not be included in VncTight.
Perhaps however if you include a VncPalette* it allows reuse and avoids
fragmentation?  Or perhaps you can use a thread-local static variable?
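
The thread-local variant could be as small as (sketch; only valid if the
palette really is touched exclusively by the encoding thread):

/* reused across updates, neither copied with VncTight nor malloc'd
 * per framebuffer update */
static __thread VncPalette color_count_palette;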

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28 10:54   ` Paolo Bonzini
  2016-06-28 10:57     ` Dr. David Alan Gilbert
@ 2016-06-28 11:13     ` Peter Lieven
  2016-06-28 11:26       ` Paolo Bonzini
  1 sibling, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 11:13 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 12:54 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> evaluation with the recently introduced maximum stack size monitoring revealed
>> that the actual used stack size was never above 4kB so allocating 1MB stack
>> for each coroutine is a lot of wasted memory. So reduce the stack size to
>> 64kB which should still give enough head room.
> If we make the stack this much smaller, there is a non-zero chance of
> smashing it.  You must add a guard page if you do this (actually more
> than one because QEMU will happily have stack frames as big as 16 KB).
> The stack counts for RSS but it's not actually allocated memory, so why
> does it matter?

Is there an easy way to determine how much of the RSS is actually allocated?
I erroneously thought it was all allocated...

So as for the stack, is MAP_GROWSDOWN really important? Will the kernel
otherwise allocate all pages of the stack if the last page is written?

I am asking because I don't know if MAP_GROWSDOWN is a good idea, as Peter
mentioned there were discussions about deprecating it.

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28 10:57     ` Dr. David Alan Gilbert
@ 2016-06-28 11:17       ` Peter Lieven
  2016-06-28 11:35         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 11:17 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Paolo Bonzini
  Cc: qemu-devel, kwolf, peter.maydell, mst, mreitz, kraxel

Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>
>> On 28/06/2016 11:01, Peter Lieven wrote:
>>> evaluation with the recently introduced maximum stack size monitoring revealed
>>> that the actual used stack size was never above 4kB so allocating 1MB stack
>>> for each coroutine is a lot of wasted memory. So reduce the stack size to
>>> 64kB which should still give enough head room.
>> If we make the stack this much smaller, there is a non-zero chance of
>> smashing it.  You must add a guard page if you do this (actually more
>> than one because QEMU will happily have stack frames as big as 16 KB).
>> The stack counts for RSS but it's not actually allocated memory, so why
>> does it matter?
> I think I'd be interested in seeing the /proc/.../smaps before and after this
> change to see if anything is visible and if we can see the difference
> in rss etc.

Can you advise what in smaps should especially be looked at?

As for RSS I can report that the long-term usage is significantly lower.
I had the strange observation that when the VM has been running for some minutes
the RSS suddenly increases to the whole stack size.

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 14/15] vnc-tight: make the encoding palette static
  2016-06-28 11:12   ` Paolo Bonzini
@ 2016-06-28 11:18     ` Peter Lieven
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 11:18 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 13:12 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> @@ -201,6 +201,7 @@ typedef struct VncTight {
>>   #endif
>>       int levels[4];
>>       z_stream stream[4];
>> +    VncPalette palette;
>>   } VncTight;
> VncTight is copied back and forth in vnc_async_encoding_start and
> vnc_async_encoding_end, so this should not be included in VncTight.
> Perhaps however if you include a VncPalette* it allows reuse and avoids
> fragmentation?  Or perhaps you can use a thread-local static variable?

I will have a look. I missed the copying. However, this palette is only
used to count the number of distinct colors. So it should only be
used in the encoding thread.

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28 11:13     ` Peter Lieven
@ 2016-06-28 11:26       ` Paolo Bonzini
  0 siblings, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 11:26 UTC (permalink / raw)
  To: Peter Lieven
  Cc: qemu-devel, kwolf, peter maydell, mst, dgilbert, mreitz, kraxel



----- Original Message -----
> From: "Peter Lieven" <pl@kamp.de>
> To: "Paolo Bonzini" <pbonzini@redhat.com>, qemu-devel@nongnu.org
> Cc: kwolf@redhat.com, "peter maydell" <peter.maydell@linaro.org>, mst@redhat.com, dgilbert@redhat.com,
> mreitz@redhat.com, kraxel@redhat.com
> Sent: Tuesday, June 28, 2016 1:13:26 PM
> Subject: Re: [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
> 
> Am 28.06.2016 um 12:54 schrieb Paolo Bonzini:
> >
> > On 28/06/2016 11:01, Peter Lieven wrote:
> >> evaluation with the recently introduced maximum stack size monitoring
> >> revealed
> >> that the actual used stack size was never above 4kB so allocating 1MB
> >> stack
> >> for each coroutine is a lot of wasted memory. So reduce the stack size to
> >> 64kB which should still give enough head room.
> > If we make the stack this much smaller, there is a non-zero chance of
> > smashing it.  You must add a guard page if you do this (actually more
> > than one because QEMU will happily have stack frames as big as 16 KB).
> > The stack counts for RSS but it's not actually allocated memory, so why
> > does it matter?
> 
> Is there an easy way to determine how much of the RSS is actually
> allocated? I erroneously thought it was all allocated...
> 
> So as for the stack, the MAP_GROWSDOWN is it really important? Will the
> kernel
> allocate all pages of the stack otherwise if the last page is written?
> 
> I am asking because I don't know if MAP_GROWSDOWN is a good idea as Peter
> mentioned there were discussions to deprecate it.

I don't know, I found those discussions too.  However I've also seen
an interesting patch to ensure a guard page is kept at the bottom of the
VMA.

But thinking more about it, if you use MAP_GROWSDOWN you don't know anymore
where the bottom of the stack is, and you cannot free it correctly, can you?
Or am I completely misunderstanding the purpose of the flag?

I guess it's better to steer clear of it unless we're ready to look at
kernel code for a while...

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 09/15] loader: use mmap for ROMs
  2016-06-28 10:41   ` Paolo Bonzini
@ 2016-06-28 11:26     ` Peter Lieven
  2016-07-04  7:30     ` Peter Lieven
  1 sibling, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 11:26 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 12:41 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> a classic use for mmap here.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> They are never freed, why does mmap help?

Actually, it is freed. Adding some debug output to rom_add_file and rom_reset reveals the following:

rom load: /home/lieven/git/qemu/pc-bios/bios-256k.bin
rom load: /home/lieven/git/qemu/pc-bios/kvmvapic.bin
rom_reset

I think the rom is copied to system memory?

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28 11:17       ` Peter Lieven
@ 2016-06-28 11:35         ` Dr. David Alan Gilbert
  2016-06-28 12:09           ` Peter Lieven
  0 siblings, 1 reply; 78+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-28 11:35 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Paolo Bonzini, qemu-devel, kwolf, peter.maydell, mst, mreitz, kraxel

* Peter Lieven (pl@kamp.de) wrote:
> Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
> > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> > > 
> > > On 28/06/2016 11:01, Peter Lieven wrote:
> > > > evaluation with the recently introduced maximum stack size monitoring revealed
> > > > that the actual used stack size was never above 4kB so allocating 1MB stack
> > > > for each coroutine is a lot of wasted memory. So reduce the stack size to
> > > > 64kB which should still give enough head room.
> > > If we make the stack this much smaller, there is a non-zero chance of
> > > smashing it.  You must add a guard page if you do this (actually more
> > > than one because QEMU will happily have stack frames as big as 16 KB).
> > > The stack counts for RSS but it's not actually allocated memory, so why
> > > does it matter?
> > I think I'd be interested in seeing the /proc/.../smaps before and after this
> > change to see if anything is visible and if we can see the difference
> > in rss etc.
> 
> Can you advise what in smaps should be especially looked at.
> 
> As for RSS I can report that the long-term usage is significantly lower.
> I had the strange observation that when the VM has been running for some minutes
> the RSS suddenly increases to the whole stack size.

You can see the Rss of each mapping; if you knew where your stacks were
it would be easy to see if it was the stacks that were Rss and if
there was anything else odd about them.
If you set the mapping as growsdown then you can see the area that has a 'gd'
in its VmFlags.

Dave

> 
> Peter
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor Peter Lieven
  2016-06-28  9:29   ` Dr. David Alan Gilbert
@ 2016-06-28 11:36   ` Paolo Bonzini
  2016-06-28 14:14     ` Eric Blake
  2016-06-30 14:12   ` Markus Armbruster
  2 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 11:36 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> this struct is approx 75kB
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  qapi/qmp-input-visitor.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Can you change the stack to a QSLIST instead?  That's where most of the
waste comes from.
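
Something along these lines, perhaps (sketch only; entry contents guessed,
error handling omitted):

typedef struct StackEntry {
    QObject *obj;
    QSLIST_ENTRY(StackEntry) node;
} StackEntry;

/* in QmpInputVisitor, replacing the fixed stack[QIV_STACK_SIZE] array: */
QSLIST_HEAD(, StackEntry) stack;

static void qmp_input_push(QmpInputVisitor *qiv, QObject *obj)
{
    StackEntry *e = g_new0(StackEntry, 1);

    e->obj = obj;
    QSLIST_INSERT_HEAD(&qiv->stack, e, node);
}

static void qmp_input_pop(QmpInputVisitor *qiv)
{
    StackEntry *e = QSLIST_FIRST(&qiv->stack);

    QSLIST_REMOVE_HEAD(&qiv->stack, node);
    g_free(e);
}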

Thanks,

Paolo

> diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c
> index aea90a1..b6f5dfd 100644
> --- a/qapi/qmp-input-visitor.c
> +++ b/qapi/qmp-input-visitor.c
> @@ -17,6 +17,7 @@
>  #include "qapi/qmp-input-visitor.h"
>  #include "qapi/visitor-impl.h"
>  #include "qemu/queue.h"
> +#include "qemu/mmap-alloc.h"
>  #include "qemu-common.h"
>  #include "qapi/qmp/types.h"
>  #include "qapi/qmp/qerror.h"
> @@ -378,14 +379,14 @@ Visitor *qmp_input_get_visitor(QmpInputVisitor *v)
>  void qmp_input_visitor_cleanup(QmpInputVisitor *v)
>  {
>      qobject_decref(v->root);
> -    g_free(v);
> +    qemu_anon_ram_munmap(v, sizeof(*v));
>  }
>  
>  QmpInputVisitor *qmp_input_visitor_new(QObject *obj, bool strict)
>  {
>      QmpInputVisitor *v;
>  
> -    v = g_malloc0(sizeof(*v));
> +    v = qemu_anon_ram_mmap(sizeof(*v));
>  
>      v->visitor.type = VISITOR_INPUT;
>      v->visitor.start_struct = qmp_input_start_struct;
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (14 preceding siblings ...)
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 15/15] vnc: use mmap for VncState Peter Lieven
@ 2016-06-28 11:37 ` Paolo Bonzini
  2016-06-28 12:14   ` Peter Lieven
  2016-10-12 21:18 ` Michael R. Hines
  16 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 11:37 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 28/06/2016 11:01, Peter Lieven wrote:
> I recently found that Qemu is using several hundred megabytes of RSS memory
> more than older versions such as Qemu 2.2.0. So I started tracing
> memory allocation and found 2 major reasons for this.
> 
> 1) We changed the qemu coroutine pool to have a per thread and a global release
>    pool. The choosen poolsize and the changed algorithm could lead to up to
>    192 free coroutines with just a single iothread. Each of the coroutines
>    in the pool each having 1MB of stack memory.

But the fix, as you correctly note, is to reduce the stack size.  It
would be nice to compile block-obj-y with -Wstack-usage=2048 too.
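
For anyone who wants to try that locally, something like

  ./configure --extra-cflags='-Wstack-usage=2048' ...
  make 2>&1 | grep 'stack usage'

should list every function whose frame exceeds 2 kB (-Wstack-usage is a
GCC warning option that triggers per function once the given frame size
is exceeded).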

> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced which lead to delayed freeing
>    of memory. This lead to higher heap allocations which could not effectively
>    be returned to kernel (most likely due to fragmentation).

I agree that some of the exec.c allocations need some care, but I would
prefer to use a custom free list or lazy allocation instead of mmap.

Changing allocations to use mmap also is not really useful if you do it
for objects that are never freed (as in patches 8-9-10-15 at least, and
probably 11 too which is one of the most contentious).

In other words, the effort tracking down the allocation is really,
really appreciated.  But the patches look like you only had a hammer at
hand, and everything looked like a nail. :)

Paolo

> The following series is what I came up with. Beside the coroutine patches I changed
> some allocations to forcibly use mmap. All these allocations are not repeatly made
> during runtime so the impact of using mmap should be neglectible.
> 
> There are still some big malloced allocations left which cannot be easily changed
> (e.g. the pixman buffers in VNC). So it might an idea to set a lower mmap threshold for
> malloc since this threshold seems to be in the order of several Megabytes on modern systems.
> 
> Peter Lieven (15):
>   coroutine-ucontext: mmap stack memory
>   coroutine-ucontext: add a switch to monitor maximum stack size
>   coroutine-ucontext: reduce stack size to 64kB
>   coroutine: add a knob to disable the shared release pool
>   util: add a helper to mmap private anonymous memory
>   exec: use mmap for subpages
>   qapi: use mmap for QmpInputVisitor
>   virtio: use mmap for VirtQueue
>   loader: use mmap for ROMs
>   vmware_svga: use mmap for scratch pad
>   qom: use mmap for bigger Objects
>   util: add a function to realloc mmapped memory
>   exec: use mmap for PhysPageMap->nodes
>   vnc-tight: make the encoding palette static
>   vnc: use mmap for VncState
> 
>  configure                 | 33 ++++++++++++++++++--
>  exec.c                    | 11 ++++---
>  hw/core/loader.c          | 16 +++++-----
>  hw/display/vmware_vga.c   |  3 +-
>  hw/virtio/virtio.c        |  5 +--
>  include/qemu/mmap-alloc.h |  7 +++++
>  include/qom/object.h      |  1 +
>  qapi/qmp-input-visitor.c  |  5 +--
>  qom/object.c              | 20 ++++++++++--
>  ui/vnc-enc-tight.c        | 21 ++++++-------
>  ui/vnc.c                  |  5 +--
>  ui/vnc.h                  |  1 +
>  util/coroutine-ucontext.c | 66 +++++++++++++++++++++++++++++++++++++--
>  util/mmap-alloc.c         | 27 ++++++++++++++++
>  util/qemu-coroutine.c     | 79 ++++++++++++++++++++++++++---------------------
>  15 files changed, 225 insertions(+), 75 deletions(-)
> 

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28 11:35         ` Dr. David Alan Gilbert
@ 2016-06-28 12:09           ` Peter Lieven
  2016-06-28 14:20             ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 12:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Paolo Bonzini, qemu-devel, kwolf, peter.maydell, mst, mreitz, kraxel

Am 28.06.2016 um 13:35 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
>>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>> evaluation with the recently introduced maximum stack size monitoring revealed
>>>>> that the actual used stack size was never above 4kB so allocating 1MB stack
>>>>> for each coroutine is a lot of wasted memory. So reduce the stack size to
>>>>> 64kB which should still give enough head room.
>>>> If we make the stack this much smaller, there is a non-zero chance of
>>>> smashing it.  You must add a guard page if you do this (actually more
>>>> than one because QEMU will happily have stack frames as big as 16 KB).
>>>> The stack counts for RSS but it's not actually allocated memory, so why
>>>> does it matter?
>>> I think I'd be interested in seeing the /proc/.../smaps before and after this
>>> change to see if anything is visible and if we can see the difference
>>> in rss etc.
>> Can you advise what in smaps should be especially looked at.
>>
>> As for RSS I can report that the long-term usage is significantly lower.
>> I had the strange observation that when the VM has been running for some minutes
>> the RSS suddenly increases to the whole stack size.
> You can see the Rss of each mapping; if you knew where your stacks were
> it would be easy to see if it was the stacks that were Rss and if
> there was anything else odd about them.
> If you set hte mapping as growsdown then you can see the area that has a 'gd'
> in it's VmFlags.

Would you expect to see each 1MB allocation in smaps or is it possible that
the kernel merges some mappings to bigger ones?

And more importantly, if the regions are merged, Paolo's comment that we
do not need a guard page would not hold, because a coroutine stack could
grow into another coroutine's stack. Looking at the commit from Linus, it
would also be good if that guard page did not have the gd flag.

Some of the regions above 1024kB have an RSS of exactly 4kB * (Size / 1024kB),
which leads to the assumption that these are coroutine stacks where exactly one page
per stack has been allocated.

I am asking because this is, for example, what I see for a Qemu VM for mappings with the "gd" flag:

cat /proc/5031/smaps | grep -B18 gd
7f808aee7000-7f808b9e6000 rw-p 00000000 00:00 0
Size:              11264 kB
Rss:                  44 kB
Pss:                  44 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        44 kB
Referenced:           44 kB
Anonymous:            44 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f808bb01000-7f8090000000 rw-p 00000000 00:00 0
Size:              70656 kB
Rss:                 276 kB
Pss:                 276 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       276 kB
Referenced:          276 kB
Anonymous:           276 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f80940ff000-7f80943fe000 rw-p 00000000 00:00 0
Size:               3072 kB
Rss:                  12 kB
Pss:                  12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
Anonymous:            12 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8095700000-7f80957ff000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8097301000-7f8097400000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f80974df000-7f80975de000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
7f809760c000-7f809770b000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8097901000-7f8097a00000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8097b01000-7f8097c00000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8097d01000-7f8097e00000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f8197f01000-7f8198000000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                   4 kB
Pss:                   4 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         4 kB
Referenced:            4 kB
Anonymous:             4 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7f81b4001000-7f81b4200000 rw-p 00000000 00:00 0
Size:               2048 kB
Rss:                  20 kB
Pss:                  20 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        20 kB
Referenced:           20 kB
Anonymous:            20 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac sd
--
7ffd337e2000-7ffd33805000 rw-p 00000000 00:00 0                          [stack]
Size:                144 kB
Rss:                  64 kB
Pss:                  64 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        64 kB
Referenced:           64 kB
Anonymous:            64 kB
AnonHugePages:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-06-28 11:37 ` [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Paolo Bonzini
@ 2016-06-28 12:14   ` Peter Lieven
  2016-06-28 12:29     ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 12:14 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 13:37 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> I recently found that Qemu is using several hundred megabytes of RSS memory
>> more than older versions such as Qemu 2.2.0. So I started tracing
>> memory allocation and found 2 major reasons for this.
>>
>> 1) We changed the qemu coroutine pool to have a per thread and a global release
>>     pool. The choosen poolsize and the changed algorithm could lead to up to
>>     192 free coroutines with just a single iothread. Each of the coroutines
>>     in the pool each having 1MB of stack memory.
> But the fix, as you correctly note, is to reduce the stack size.  It
> would be nice to compile block-obj-y with -Wstack-usage=2048 too.

To reveal if there are any big stack allocations in the block layer?

It seems that reducing to 64kB breaks live migration in some (non-reproducible) cases.
The question is which way to go: reduce the stack size and fix the big stack allocations,
or keep the stack size at 1MB?

>
>> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced which lead to delayed freeing
>>     of memory. This lead to higher heap allocations which could not effectively
>>     be returned to kernel (most likely due to fragmentation).
> I agree that some of the exec.c allocations need some care, but I would
> prefer to use a custom free list or lazy allocation instead of mmap.

Wouldn't this only help if the elements from the free list were allocated using
mmap? The issue is that RCU delays the freeing, so the number of concurrent
allocations is high and then a bunch is freed at once. If the memory were malloced
it would still have caused trouble.

>
> Changing allocations to use mmap also is not really useful if you do it
> for objects that are never freed (as in patches 8-9-10-15 at least, and
> probably 11 too which is one of the most contentious).

9 actually frees the memory ;-)
15 frees the memory as soon as the vnc client disconnects.

For the others I agree. Whether the objects in Patch 11 are freed needs to be checked.

>
> In other words, the effort tracking down the allocation is really,
> really appreciated.  But the patches look like you only had a hammer at
> hand, and everything looked like a nail. :)

I just have observed that forcing ptmalloc to use mmap for everything
above 4kB significantly reduced the RSS usage.
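
For reference (not part of this series), that threshold can be forced either
via the environment, e.g.

MALLOC_MMAP_THRESHOLD_=4096 qemu-system-x86_64 ...

or programmatically with glibc's mallopt(3); a minimal sketch:

#include <malloc.h>

static void lower_mmap_threshold(void)
{
    /* Serve every allocation above 4kB with mmap so it is returned to the
     * kernel on free.  Setting the threshold explicitly also disables
     * glibc's dynamic threshold adjustment. */
    mallopt(M_MMAP_THRESHOLD, 4096);
}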

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-06-28 12:14   ` Peter Lieven
@ 2016-06-28 12:29     ` Paolo Bonzini
  2016-06-28 12:33       ` Peter Lieven
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 12:29 UTC (permalink / raw)
  To: Peter Lieven
  Cc: qemu-devel, kwolf, peter maydell, mst, dgilbert, mreitz, kraxel

> Am 28.06.2016 um 13:37 schrieb Paolo Bonzini:
> > On 28/06/2016 11:01, Peter Lieven wrote:
> >> I recently found that Qemu is using several hundred megabytes of RSS
> >> memory
> >> more than older versions such as Qemu 2.2.0. So I started tracing
> >> memory allocation and found 2 major reasons for this.
> >>
> >> 1) We changed the qemu coroutine pool to have a per thread and a global
> >> release
> >>     pool. The choosen poolsize and the changed algorithm could lead to up
> >>     to
> >>     192 free coroutines with just a single iothread. Each of the
> >>     coroutines
> >>     in the pool each having 1MB of stack memory.
> > But the fix, as you correctly note, is to reduce the stack size.  It
> > would be nice to compile block-obj-y with -Wstack-usage=2048 too.
> 
> To reveal if there are any big stack allocations in the block layer?

Yes.  Most should be fixed by now, but a handful are probably still there.
(definitely one in vvfat.c).
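
For what it's worth, one quick way to get such warnings tree-wide rather than
just for block-obj-y (exact invocation illustrative) would be:

./configure --extra-cflags="-Wstack-usage=2048"
make 2>&1 | grep 'stack usage'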

> As it seems reducing to 64kB breaks live migration in some (non reproducible) cases.

Does it hit the guard page?

> >> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced which lead to delayed
> >> freeing
> >>     of memory. This lead to higher heap allocations which could not
> >>     effectively
> >>     be returned to kernel (most likely due to fragmentation).
> > I agree that some of the exec.c allocations need some care, but I would
> > prefer to use a custom free list or lazy allocation instead of mmap.
> 
> This would only help if the elements from the free list would be allocated
> using mmap? The issue is that RCU delays the freeing so that the number of
> concurrent allocations is high and then a bunch is freed at once. If the memory
> was malloced it would still have caused trouble.

The free list should improve reuse and fragmentation.  I'll take a look at
lazy allocation of subpages, too.
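
A minimal sketch of the free-list idea (illustrative only; the names are made
up and callers are assumed to serialize access, e.g. under the big QEMU lock):

typedef struct FreeNode {
    struct FreeNode *next;
} FreeNode;

static FreeNode *free_list;

/* All nodes are assumed to have the same fixed size >= sizeof(FreeNode). */
static void *node_alloc(size_t size)
{
    FreeNode *n = free_list;

    if (n) {
        free_list = n->next;
        return n;
    }
    return g_malloc(size);
}

static void node_free(void *p)
{
    FreeNode *n = p;

    n->next = free_list;
    free_list = n;
}

That way a burst of RCU-deferred frees just refills the list instead of
churning the heap.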

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-06-28 12:29     ` Paolo Bonzini
@ 2016-06-28 12:33       ` Peter Lieven
  2016-06-28 12:56         ` Paolo Bonzini
  2016-06-28 12:56         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 12:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: qemu-devel, kwolf, peter maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 14:29 schrieb Paolo Bonzini:
>> Am 28.06.2016 um 13:37 schrieb Paolo Bonzini:
>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>> I recently found that Qemu is using several hundred megabytes of RSS
>>>> memory
>>>> more than older versions such as Qemu 2.2.0. So I started tracing
>>>> memory allocation and found 2 major reasons for this.
>>>>
>>>> 1) We changed the qemu coroutine pool to have a per thread and a global
>>>> release
>>>>      pool. The choosen poolsize and the changed algorithm could lead to up
>>>>      to
>>>>      192 free coroutines with just a single iothread. Each of the
>>>>      coroutines
>>>>      in the pool each having 1MB of stack memory.
>>> But the fix, as you correctly note, is to reduce the stack size.  It
>>> would be nice to compile block-obj-y with -Wstack-usage=2048 too.
>> To reveal if there are any big stack allocations in the block layer?
> Yes.  Most should be fixed by now, but a handful are probably still there.
> (definitely one in vvfat.c).
>
>> As it seems reducing to 64kB breaks live migration in some (non reproducible) cases.
> Does it hit the guard page?

How would that look like? I get segfaults like this:

segfault at 7f91aa642b78 ip 0000555ab714ef7d sp 00007f91aa642b50 error 6 in qemu-system-x86_64[555ab6f2c000+794000]

most of the time error 6. Sometimes error 7. segfault is near the sp.
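
(For reference, the "error" value in these messages is the x86 page-fault
error code: bit 0 = protection violation vs. page not present, bit 1 = write
access, bit 2 = user mode.  So error 6 is a user-mode write to an address
that is not mapped at all, which is what running off the end of the stack
mapping would look like, while error 7 is a user-mode write blocked by page
protections, e.g. hitting a PROT_NONE guard page.)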


>
>>>> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced which lead to delayed
>>>> freeing
>>>>      of memory. This lead to higher heap allocations which could not
>>>>      effectively
>>>>      be returned to kernel (most likely due to fragmentation).
>>> I agree that some of the exec.c allocations need some care, but I would
>>> prefer to use a custom free list or lazy allocation instead of mmap.
>> This would only help if the elements from the free list would be allocated
>> using mmap? The issue is that RCU delays the freeing so that the number of
>> concurrent allocations is high and then a bunch is freed at once. If the memory
>> was malloced it would still have caused trouble.
> The free list should improve reuse and fragmentation.  I'll take a look at
> lazy allocation of subpages, too.

Ok, that would be good. And for the PhysPageMap we use mmap and try to avoid
the realloc?

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-06-28 12:33       ` Peter Lieven
@ 2016-06-28 12:56         ` Paolo Bonzini
  2016-06-28 12:56         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2016-06-28 12:56 UTC (permalink / raw)
  To: Peter Lieven
  Cc: qemu-devel, kwolf, peter maydell, mst, dgilbert, mreitz, kraxel



----- Original Message -----
> From: "Peter Lieven" <pl@kamp.de>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org, kwolf@redhat.com, "peter maydell" <peter.maydell@linaro.org>, mst@redhat.com,
> dgilbert@redhat.com, mreitz@redhat.com, kraxel@redhat.com
> Sent: Tuesday, June 28, 2016 2:33:02 PM
> Subject: Re: [PATCH 00/15] optimize Qemu RSS usage
> 
> Am 28.06.2016 um 14:29 schrieb Paolo Bonzini:
> >> Am 28.06.2016 um 13:37 schrieb Paolo Bonzini:
> >>> On 28/06/2016 11:01, Peter Lieven wrote:
> >>>> I recently found that Qemu is using several hundred megabytes of RSS
> >>>> memory
> >>>> more than older versions such as Qemu 2.2.0. So I started tracing
> >>>> memory allocation and found 2 major reasons for this.
> >>>>
> >>>> 1) We changed the qemu coroutine pool to have a per thread and a global
> >>>> release
> >>>>      pool. The choosen poolsize and the changed algorithm could lead to
> >>>>      up
> >>>>      to
> >>>>      192 free coroutines with just a single iothread. Each of the
> >>>>      coroutines
> >>>>      in the pool each having 1MB of stack memory.
> >>> But the fix, as you correctly note, is to reduce the stack size.  It
> >>> would be nice to compile block-obj-y with -Wstack-usage=2048 too.
> >> To reveal if there are any big stack allocations in the block layer?
> > Yes.  Most should be fixed by now, but a handful are probably still there.
> > (definitely one in vvfat.c).
> >
> >> As it seems reducing to 64kB breaks live migration in some (non
> >> reproducible) cases.
> > Does it hit the guard page?
> 
> How would that look like? I get segfaults like this:
> 
> segfault at 7f91aa642b78 ip 0000555ab714ef7d sp 00007f91aa642b50 error 6 in
> qemu-system-x86_64[555ab6f2c000+794000]
> 
> most of the time error 6. Sometimes error 7. segfault is near the sp.

You can use "p ((CoroutineUContext*)current)->stack" from gdb
to check the stack base of the currently running coroutine (do it in the thread
that received the segfault).

You can also check the instruction with that ip and try to get a backtrace.
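
A concrete way to do that check from the faulting thread (the stack size here
is assumed to be the 64kB from patch 3):

(gdb) p ((CoroutineUContext *)current)->stack
(gdb) p/x (char *)$sp - (char *)((CoroutineUContext *)current)->stack

If the second value is not below 0x10000 (64kB), the stack pointer is already
outside the coroutine stack.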

Paolo


> >>>> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced which lead to delayed
> >>>> freeing
> >>>>      of memory. This lead to higher heap allocations which could not
> >>>>      effectively
> >>>>      be returned to kernel (most likely due to fragmentation).
> >>> I agree that some of the exec.c allocations need some care, but I would
> >>> prefer to use a custom free list or lazy allocation instead of mmap.
> >> This would only help if the elements from the free list would be allocated
> >> using mmap? The issue is that RCU delays the freeing so that the number of
> >> concurrent allocations is high and then a bunch is freed at once. If the
> >> memory
> >> was malloced it would still have caused trouble.
> > The free list should improve reuse and fragmentation.  I'll take a look at
> > lazy allocation of subpages, too.
> 
> Ok, that would be good. And for the PhsyPageMap we use mmap and try to avoid
> the realloc?

I think that with lazy allocation of subpages the PhysPageMap will be much
smaller, but I need to check.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-06-28 12:33       ` Peter Lieven
  2016-06-28 12:56         ` Paolo Bonzini
@ 2016-06-28 12:56         ` Dr. David Alan Gilbert
  2016-06-28 14:43           ` Peter Lieven
  1 sibling, 1 reply; 78+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-28 12:56 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Paolo Bonzini, qemu-devel, kwolf, peter maydell, mst, mreitz, kraxel

* Peter Lieven (pl@kamp.de) wrote:
> Am 28.06.2016 um 14:29 schrieb Paolo Bonzini:
> > > Am 28.06.2016 um 13:37 schrieb Paolo Bonzini:
> > > > On 28/06/2016 11:01, Peter Lieven wrote:
> > > > > I recently found that Qemu is using several hundred megabytes of RSS
> > > > > memory
> > > > > more than older versions such as Qemu 2.2.0. So I started tracing
> > > > > memory allocation and found 2 major reasons for this.
> > > > > 
> > > > > 1) We changed the qemu coroutine pool to have a per thread and a global
> > > > > release
> > > > >      pool. The choosen poolsize and the changed algorithm could lead to up
> > > > >      to
> > > > >      192 free coroutines with just a single iothread. Each of the
> > > > >      coroutines
> > > > >      in the pool each having 1MB of stack memory.
> > > > But the fix, as you correctly note, is to reduce the stack size.  It
> > > > would be nice to compile block-obj-y with -Wstack-usage=2048 too.
> > > To reveal if there are any big stack allocations in the block layer?
> > Yes.  Most should be fixed by now, but a handful are probably still there.
> > (definitely one in vvfat.c).
> > 
> > > As it seems reducing to 64kB breaks live migration in some (non reproducible) cases.
> > Does it hit the guard page?
> 
> How would that look like? I get segfaults like this:
> 
> segfault at 7f91aa642b78 ip 0000555ab714ef7d sp 00007f91aa642b50 error 6 in qemu-system-x86_64[555ab6f2c000+794000]
> 
> most of the time error 6. Sometimes error 7. segfault is near the sp.

A backtrace would be good.

Dave

> 
> 
> > 
> > > > > 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced which lead to delayed
> > > > > freeing
> > > > >      of memory. This lead to higher heap allocations which could not
> > > > >      effectively
> > > > >      be returned to kernel (most likely due to fragmentation).
> > > > I agree that some of the exec.c allocations need some care, but I would
> > > > prefer to use a custom free list or lazy allocation instead of mmap.
> > > This would only help if the elements from the free list would be allocated
> > > using mmap? The issue is that RCU delays the freeing so that the number of
> > > concurrent allocations is high and then a bunch is freed at once. If the memory
> > > was malloced it would still have caused trouble.
> > The free list should improve reuse and fragmentation.  I'll take a look at
> > lazy allocation of subpages, too.
> 
> Ok, that would be good. And for the PhsyPageMap we use mmap and try to avoid
> the realloc?
> 
> Peter
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28 10:17         ` Dr. David Alan Gilbert
  2016-06-28 10:21           ` Daniel P. Berrange
@ 2016-06-28 14:10           ` Eric Blake
  1 sibling, 0 replies; 78+ messages in thread
From: Eric Blake @ 2016-06-28 14:10 UTC (permalink / raw)
  To: Dr. David Alan Gilbert, Daniel P. Berrange
  Cc: kwolf, peter.maydell, mst, Peter Lieven, qemu-devel, mreitz,
	kraxel, pbonzini, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 813 bytes --]

On 06/28/2016 04:17 AM, Dr. David Alan Gilbert wrote:

>> QmpInputVisitor is used to parse all QMP monitor commands, so will
>> be used continuously throughout life of QEMU, often very frequently.
>> eg When migration is running many monitor commands per second are
>> expected
> 
> Does the same input visitor get reused by each command?

No; in fact commit f2ff429 changed things to intentionally prevent reuse
of a visitor (on the argument that it was easier to do that than to
think about corner cases of reset after a partial visit encountered
errors).  But we can revisit that decision if reusing a static
QmpInputVisitor would be wiser than allocating a fresh one for every visit.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28 11:36   ` Paolo Bonzini
@ 2016-06-28 14:14     ` Eric Blake
  0 siblings, 0 replies; 78+ messages in thread
From: Eric Blake @ 2016-06-28 14:14 UTC (permalink / raw)
  To: Paolo Bonzini, Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel, Markus Armbruster

[-- Attachment #1: Type: text/plain, Size: 1315 bytes --]

On 06/28/2016 05:36 AM, Paolo Bonzini wrote:
> 
> 
> On 28/06/2016 11:01, Peter Lieven wrote:
>> this struct is approx 75kB
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  qapi/qmp-input-visitor.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Can you change the stack to a QSLIST instead?  That's where most of the
> waste comes from.

QmpInputVisitor has:

struct QmpInputVisitor
{
    Visitor visitor;

    /* Root of visit at visitor creation. */
    QObject *root;

    /* Stack of objects being visited (all entries will be either
     * QDict or QList). */
    StackObject stack[QIV_STACK_SIZE];
...

while QmpOutputVisitor has:

struct QmpOutputVisitor
{
    Visitor visitor;
    QStack stack; /* Stack of containers that haven't yet been finished */
    QObject *root; /* Root of the output visit */
    QObject **result; /* User's storage location for result */
};

The extra layer of indirection to a QStack vs. a direct array has
tradeoffs, but both Markus and I have commented in the past that both
files' stacks are rather wasteful, and we just have not had a reason to
improve them.  This thread may be the reason :)
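
To make Paolo's QSLIST suggestion concrete, a rough sketch for the input
visitor could look like this (field and helper names are illustrative, not
taken from an actual patch):

#include "qemu/queue.h"

typedef struct StackObject {
    QObject *obj;                    /* the QDict or QList being visited */
    const QListEntry *entry;         /* next entry, if obj is a QList */
    QSLIST_ENTRY(StackObject) node;  /* link to the enclosing container */
} StackObject;

struct QmpInputVisitor {
    Visitor visitor;
    QObject *root;
    QSLIST_HEAD(, StackObject) stack;   /* replaces the fixed-size array */
};

static void qmp_input_push(QmpInputVisitor *qiv, QObject *obj)
{
    StackObject *tos = g_new0(StackObject, 1);

    tos->obj = obj;
    QSLIST_INSERT_HEAD(&qiv->stack, tos, node);
}

static void qmp_input_pop(QmpInputVisitor *qiv)
{
    StackObject *tos = QSLIST_FIRST(&qiv->stack);

    QSLIST_REMOVE_HEAD(&qiv->stack, node);
    g_free(tos);
}

The visitor itself then shrinks to a handful of pointers, at the cost of one
small allocation per nested container during a visit.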

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28 12:09           ` Peter Lieven
@ 2016-06-28 14:20             ` Dr. David Alan Gilbert
  2016-06-30  6:34               ` Peter Lieven
  0 siblings, 1 reply; 78+ messages in thread
From: Dr. David Alan Gilbert @ 2016-06-28 14:20 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Paolo Bonzini, qemu-devel, kwolf, peter.maydell, mst, mreitz, kraxel

* Peter Lieven (pl@kamp.de) wrote:
> Am 28.06.2016 um 13:35 schrieb Dr. David Alan Gilbert:
> > * Peter Lieven (pl@kamp.de) wrote:
> > > Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
> > > > * Paolo Bonzini (pbonzini@redhat.com) wrote:
> > > > > On 28/06/2016 11:01, Peter Lieven wrote:
> > > > > > evaluation with the recently introduced maximum stack size monitoring revealed
> > > > > > that the actual used stack size was never above 4kB so allocating 1MB stack
> > > > > > for each coroutine is a lot of wasted memory. So reduce the stack size to
> > > > > > 64kB which should still give enough head room.
> > > > > If we make the stack this much smaller, there is a non-zero chance of
> > > > > smashing it.  You must add a guard page if you do this (actually more
> > > > > than one because QEMU will happily have stack frames as big as 16 KB).
> > > > > The stack counts for RSS but it's not actually allocated memory, so why
> > > > > does it matter?
> > > > I think I'd be interested in seeing the /proc/.../smaps before and after this
> > > > change to see if anything is visible and if we can see the difference
> > > > in rss etc.
> > > Can you advise what in smaps should be especially looked at.
> > > 
> > > As for RSS I can report hat the long term usage is significantly lower.
> > > I had the strange observation that when the VM is running for some minutes
> > > the RSS suddenly increases to the whole stack size.
> > You can see the Rss of each mapping; if you knew where your stacks were
> > it would be easy to see if it was the stacks that were Rss and if
> > there was anything else odd about them.
> > If you set hte mapping as growsdown then you can see the area that has a 'gd'
> > in it's VmFlags.
> 
> Would you expect to see each 1MB allocation in smaps or is it possible that
> the kernel merges some mappings to bigger ones?
> 
> And more importantly if the regions are merged Paolos comment about we
> do not need a guard page would not be true because a coroutine stack could
> grow into annother coroutines stack. Looking at the commit from Linus it
> would also be good to have that guard page not having the gd flag.

Hmm I'm not sure; one for Paolo.

> Some of the regions above 1024kB have an RSS of exactly 4kB * (Size / 1024kB)
> which leads to the assumption that it is a corouine stack where exactly one page
> has been allocated.
> 
> I am asking because this is what I e.g. see for a Qemu VM with flags "gd":

However, what that does show is that if you add up all the Rss, it's still
near-enough nothing worth worrying about.

Maybe it looks different in the old world before you mmap'd it; you could
try going back to the g_malloc'd version but printf'ing the
address you get, then comparing that with smaps to see what the malloc'd
world ended up with mapped.
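
For the "add up all the Rss" part, extending the grep already used for the
dump does the trick, e.g.:

cat /proc/5031/smaps | grep -B18 gd | awk '$1 == "Rss:" { sum += $2 } END { print sum " kB" }'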

Dave

> cat /proc/5031/smaps | grep -B18 gd
> 7f808aee7000-7f808b9e6000 rw-p 00000000 00:00 0
> Size:              11264 kB
> Rss:                  44 kB
> Pss:                  44 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:        44 kB
> Referenced:           44 kB
> Anonymous:            44 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f808bb01000-7f8090000000 rw-p 00000000 00:00 0
> Size:              70656 kB
> Rss:                 276 kB
> Pss:                 276 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:       276 kB
> Referenced:          276 kB
> Anonymous:           276 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f80940ff000-7f80943fe000 rw-p 00000000 00:00 0
> Size:               3072 kB
> Rss:                  12 kB
> Pss:                  12 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:        12 kB
> Referenced:           12 kB
> Anonymous:            12 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f8095700000-7f80957ff000 rw-p 00000000 00:00 0
> Size:               1024 kB
> Rss:                   4 kB
> Pss:                   4 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         4 kB
> Referenced:            4 kB
> Anonymous:             4 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f8097301000-7f8097400000 rw-p 00000000 00:00 0
> Size:               1024 kB
> Rss:                   4 kB
> Pss:                   4 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         4 kB
> Referenced:            4 kB
> Anonymous:             4 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f80974df000-7f80975de000 rw-p 00000000 00:00 0
> Size:               1024 kB
> Rss:                   4 kB
> Pss:                   4 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         4 kB
> Referenced:            4 kB
> Anonymous:             4 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> 7f809760c000-7f809770b000 rw-p 00000000 00:00 0
> Size:               1024 kB
> Rss:                   4 kB
> Pss:                   4 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         4 kB
> Referenced:            4 kB
> Anonymous:             4 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f8097901000-7f8097a00000 rw-p 00000000 00:00 0
> Size:               1024 kB
> Rss:                   4 kB
> Pss:                   4 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         4 kB
> Referenced:            4 kB
> Anonymous:             4 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f8097b01000-7f8097c00000 rw-p 00000000 00:00 0
> Size:               1024 kB
> Rss:                   4 kB
> Pss:                   4 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         4 kB
> Referenced:            4 kB
> Anonymous:             4 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f8097d01000-7f8097e00000 rw-p 00000000 00:00 0
> Size:               1024 kB
> Rss:                   4 kB
> Pss:                   4 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         4 kB
> Referenced:            4 kB
> Anonymous:             4 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f8197f01000-7f8198000000 rw-p 00000000 00:00 0
> Size:               1024 kB
> Rss:                   4 kB
> Pss:                   4 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:         4 kB
> Referenced:            4 kB
> Anonymous:             4 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7f81b4001000-7f81b4200000 rw-p 00000000 00:00 0
> Size:               2048 kB
> Rss:                  20 kB
> Pss:                  20 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:        20 kB
> Referenced:           20 kB
> Anonymous:            20 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac sd
> --
> 7ffd337e2000-7ffd33805000 rw-p 00000000 00:00 0                          [stack]
> Size:                144 kB
> Rss:                  64 kB
> Pss:                  64 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:        64 kB
> Referenced:           64 kB
> Anonymous:            64 kB
> AnonHugePages:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Locked:                0 kB
> VmFlags: rd wr mr mw me gd ac
> 
> Peter
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-06-28 12:56         ` Dr. David Alan Gilbert
@ 2016-06-28 14:43           ` Peter Lieven
  2016-06-28 14:52             ` Peter Lieven
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 14:43 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Paolo Bonzini, qemu-devel, kwolf, peter maydell, mst, mreitz, kraxel

Am 28.06.2016 um 14:56 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> Am 28.06.2016 um 14:29 schrieb Paolo Bonzini:
>>>> Am 28.06.2016 um 13:37 schrieb Paolo Bonzini:
>>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>>> I recently found that Qemu is using several hundred megabytes of RSS
>>>>>> memory
>>>>>> more than older versions such as Qemu 2.2.0. So I started tracing
>>>>>> memory allocation and found 2 major reasons for this.
>>>>>>
>>>>>> 1) We changed the qemu coroutine pool to have a per thread and a global
>>>>>> release
>>>>>>       pool. The choosen poolsize and the changed algorithm could lead to up
>>>>>>       to
>>>>>>       192 free coroutines with just a single iothread. Each of the
>>>>>>       coroutines
>>>>>>       in the pool each having 1MB of stack memory.
>>>>> But the fix, as you correctly note, is to reduce the stack size.  It
>>>>> would be nice to compile block-obj-y with -Wstack-usage=2048 too.
>>>> To reveal if there are any big stack allocations in the block layer?
>>> Yes.  Most should be fixed by now, but a handful are probably still there.
>>> (definitely one in vvfat.c).
>>>
>>>> As it seems reducing to 64kB breaks live migration in some (non reproducible) cases.
>>> Does it hit the guard page?
>> How would that look like? I get segfaults like this:
>>
>> segfault at 7f91aa642b78 ip 0000555ab714ef7d sp 00007f91aa642b50 error 6 in qemu-system-x86_64[555ab6f2c000+794000]
>>
>> most of the time error 6. Sometimes error 7. segfault is near the sp.
> A backtrace would be good.

Here we go. My old friend nc_sendv_compat ;-)

Again the question: would you go for reducing the stack size and eliminating all stack eaters?

The static netbuf in nc_sendv_compat is no problem.

And: I would go for adding the guard page without MAP_GROWSDOWN and mmapping the rest of the
stack with this flag if available. That way we are safe on non-Linux systems, on Linux before 3.9, and with merged memory regions.

Peter

---

Program received signal SIGSEGV, Segmentation fault.
0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
     at net/net.c:701
(gdb) bt full
#0  0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
     at net/net.c:701
         buf = '\000' <repeats 65890 times>...
         buffer = 0x0
         offset = 0
#1  0x0000555555a2f058 in qemu_deliver_packet_iov (sender=0x5555565a46b0,
     flags=0, iov=0x7ffff7e98d20, iovcnt=1, opaque=0x555557802370)
     at net/net.c:745
         nc = 0x555557802370
         ret = 21845
#2  0x0000555555a3132d in qemu_net_queue_deliver (queue=0x555557802590,
     sender=0x5555565a46b0, flags=0, data=0x55555659e2a8 "", size=74)
     at net/queue.c:163
         ret = -1
         iov = {iov_base = 0x55555659e2a8, iov_len = 74}
#3  0x0000555555a3178b in qemu_net_queue_flush (queue=0x555557802590)
     at net/queue.c:260
         packet = 0x55555659e280
         ret = 21845
#4  0x0000555555a2eb7a in qemu_flush_or_purge_queued_packets (
     nc=0x555557802370, purge=false) at net/net.c:629
No locals.
#5  0x0000555555a2ebe4 in qemu_flush_queued_packets (nc=0x555557802370)
     at net/net.c:642
No locals.
#6  0x00005555557747b7 in virtio_net_set_status (vdev=0x555556fb32a8,
     status=7 '\a') at /usr/src/qemu-2.5.0/hw/net/virtio-net.c:178
         ncs = 0x555557802370
         queue_started = true
         n = 0x555556fb32a8
         __func__ = "virtio_net_set_status"
         q = 0x555557308b50
         i = 0
         queue_status = 7 '\a'
#7  0x0000555555795501 in virtio_set_status (vdev=0x555556fb32a8, val=7 '\a')
     at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:618
         k = 0x55555657eb40
         __func__ = "virtio_set_status"
#8  0x00005555557985e6 in virtio_vmstate_change (opaque=0x555556fb32a8,
     running=1, state=RUN_STATE_RUNNING)
     at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:1539
         vdev = 0x555556fb32a8
         qbus = 0x555556fb3240
         __func__ = "virtio_vmstate_change"
         k = 0x555556570420
         backend_run = true
#9  0x00005555558592ae in vm_state_notify (running=1, state=RUN_STATE_RUNNING)
     at vl.c:1601
         e = 0x555557320cf0
         next = 0x555557af4c40
#10 0x000055555585737d in vm_start () at vl.c:756
         requested = RUN_STATE_MAX
#11 0x0000555555a209ec in process_incoming_migration_co (opaque=0x5555566a1600)
     at migration/migration.c:392
         f = 0x5555566a1600
         local_err = 0x0
         mis = 0x5555575ab0e0
         ps = POSTCOPY_INCOMING_NONE
         ret = 0
#12 0x0000555555b61efd in coroutine_trampoline (i0=1465036928, i1=21845)
     at util/coroutine-ucontext.c:80
         arg = {p = 0x55555752b080, i = {1465036928, 21845}}
         self = 0x55555752b080
         co = 0x55555752b080
#13 0x00007ffff5cb7800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#14 0x00007fffffffcb40 in ?? ()
No symbol table info available.
#15 0x0000000000000000 in ?? ()
No symbol table info available.


>
> Dave
>
>>
>>>>>> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced which lead to delayed
>>>>>> freeing
>>>>>>       of memory. This lead to higher heap allocations which could not
>>>>>>       effectively
>>>>>>       be returned to kernel (most likely due to fragmentation).
>>>>> I agree that some of the exec.c allocations need some care, but I would
>>>>> prefer to use a custom free list or lazy allocation instead of mmap.
>>>> This would only help if the elements from the free list would be allocated
>>>> using mmap? The issue is that RCU delays the freeing so that the number of
>>>> concurrent allocations is high and then a bunch is freed at once. If the memory
>>>> was malloced it would still have caused trouble.
>>> The free list should improve reuse and fragmentation.  I'll take a look at
>>> lazy allocation of subpages, too.
>> Ok, that would be good. And for the PhsyPageMap we use mmap and try to avoid
>> the realloc?
>>
>> Peter
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


-- 

Mit freundlichen Grüßen

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Geschäftsführer: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   USt-Id-Nr.: DE 120607556

...........................................................

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-06-28 14:43           ` Peter Lieven
@ 2016-06-28 14:52             ` Peter Lieven
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-28 14:52 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Paolo Bonzini, qemu-devel, kwolf, peter maydell, mst, mreitz, kraxel

Am 28.06.2016 um 16:43 schrieb Peter Lieven:
> Am 28.06.2016 um 14:56 schrieb Dr. David Alan Gilbert:
>> * Peter Lieven (pl@kamp.de) wrote:
>>> Am 28.06.2016 um 14:29 schrieb Paolo Bonzini:
>>>>> Am 28.06.2016 um 13:37 schrieb Paolo Bonzini:
>>>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>>>> I recently found that Qemu is using several hundred megabytes of RSS
>>>>>>> memory
>>>>>>> more than older versions such as Qemu 2.2.0. So I started tracing
>>>>>>> memory allocation and found 2 major reasons for this.
>>>>>>>
>>>>>>> 1) We changed the qemu coroutine pool to have a per thread and a global
>>>>>>> release
>>>>>>>       pool. The choosen poolsize and the changed algorithm could lead to up
>>>>>>>       to
>>>>>>>       192 free coroutines with just a single iothread. Each of the
>>>>>>>       coroutines
>>>>>>>       in the pool each having 1MB of stack memory.
>>>>>> But the fix, as you correctly note, is to reduce the stack size.  It
>>>>>> would be nice to compile block-obj-y with -Wstack-usage=2048 too.
>>>>> To reveal if there are any big stack allocations in the block layer?
>>>> Yes.  Most should be fixed by now, but a handful are probably still there.
>>>> (definitely one in vvfat.c).
>>>>
>>>>> As it seems reducing to 64kB breaks live migration in some (non reproducible) cases.
>>>> Does it hit the guard page?
>>> How would that look like? I get segfaults like this:
>>>
>>> segfault at 7f91aa642b78 ip 0000555ab714ef7d sp 00007f91aa642b50 error 6 in qemu-system-x86_64[555ab6f2c000+794000]
>>>
>>> most of the time error 6. Sometimes error 7. segfault is near the sp.
>> A backtrace would be good.
>
> Here we go. My old friend nc_senv_compat ;-)

This has already been fixed in master. My test systems use an older Qemu ;-)

Peter

>
> Again the question: Would you go for reducing the stack size an eliminating all stack eaters ?
>
> The static netbuf in nc_sendv_compat is no problem.
>
> And: I would go for adding the guard page without MAP_GROWSDOWN and mmaping the rest of the
> stack with this flag if availble. So we are save on non Linux systems or Linux before 3.9 or merged memory regions.
>
> Peter
>
> ---
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
>     at net/net.c:701
> (gdb) bt full
> #0  0x0000555555a2ee35 in nc_sendv_compat (nc=0x0, iov=0x0, iovcnt=0, flags=0)
>     at net/net.c:701
>         buf = '\000' <repeats 65890 times>...
>         buffer = 0x0
>         offset = 0
> #1  0x0000555555a2f058 in qemu_deliver_packet_iov (sender=0x5555565a46b0,
>     flags=0, iov=0x7ffff7e98d20, iovcnt=1, opaque=0x555557802370)
>     at net/net.c:745
>         nc = 0x555557802370
>         ret = 21845
> #2  0x0000555555a3132d in qemu_net_queue_deliver (queue=0x555557802590,
>     sender=0x5555565a46b0, flags=0, data=0x55555659e2a8 "", size=74)
>     at net/queue.c:163
>         ret = -1
>         iov = {iov_base = 0x55555659e2a8, iov_len = 74}
> #3  0x0000555555a3178b in qemu_net_queue_flush (queue=0x555557802590)
>     at net/queue.c:260
>         packet = 0x55555659e280
>         ret = 21845
> #4  0x0000555555a2eb7a in qemu_flush_or_purge_queued_packets (
>     nc=0x555557802370, purge=false) at net/net.c:629
> No locals.
> #5  0x0000555555a2ebe4 in qemu_flush_queued_packets (nc=0x555557802370)
>     at net/net.c:642
> No locals.
> #6  0x00005555557747b7 in virtio_net_set_status (vdev=0x555556fb32a8,
>     status=7 '\a') at /usr/src/qemu-2.5.0/hw/net/virtio-net.c:178
>         ncs = 0x555557802370
>         queue_started = true
>         n = 0x555556fb32a8
>         __func__ = "virtio_net_set_status"
>         q = 0x555557308b50
>         i = 0
>         queue_status = 7 '\a'
> #7  0x0000555555795501 in virtio_set_status (vdev=0x555556fb32a8, val=7 '\a')
>     at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:618
>         k = 0x55555657eb40
>         __func__ = "virtio_set_status"
> #8  0x00005555557985e6 in virtio_vmstate_change (opaque=0x555556fb32a8,
>     running=1, state=RUN_STATE_RUNNING)
>     at /usr/src/qemu-2.5.0/hw/virtio/virtio.c:1539
>         vdev = 0x555556fb32a8
>         qbus = 0x555556fb3240
>         __func__ = "virtio_vmstate_change"
>         k = 0x555556570420
>         backend_run = true
> #9  0x00005555558592ae in vm_state_notify (running=1, state=RUN_STATE_RUNNING)
>     at vl.c:1601
>         e = 0x555557320cf0
>         next = 0x555557af4c40
> #10 0x000055555585737d in vm_start () at vl.c:756
>         requested = RUN_STATE_MAX
> #11 0x0000555555a209ec in process_incoming_migration_co (opaque=0x5555566a1600)
>     at migration/migration.c:392
>         f = 0x5555566a1600
>         local_err = 0x0
>         mis = 0x5555575ab0e0
>         ps = POSTCOPY_INCOMING_NONE
>         ret = 0
> #12 0x0000555555b61efd in coroutine_trampoline (i0=1465036928, i1=21845)
>     at util/coroutine-ucontext.c:80
>         arg = {p = 0x55555752b080, i = {1465036928, 21845}}
>         self = 0x55555752b080
>         co = 0x55555752b080
> #13 0x00007ffff5cb7800 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> No symbol table info available.
> #14 0x00007fffffffcb40 in ?? ()
> No symbol table info available.
> #15 0x0000000000000000 in ?? ()
> No symbol table info available.
>
>
>>
>> Dave
>>
>>>
>>>>>>> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced which lead to delayed
>>>>>>> freeing
>>>>>>>       of memory. This lead to higher heap allocations which could not
>>>>>>>       effectively
>>>>>>>       be returned to kernel (most likely due to fragmentation).
>>>>>> I agree that some of the exec.c allocations need some care, but I would
>>>>>> prefer to use a custom free list or lazy allocation instead of mmap.
>>>>> This would only help if the elements from the free list would be allocated
>>>>> using mmap? The issue is that RCU delays the freeing so that the number of
>>>>> concurrent allocations is high and then a bunch is freed at once. If the memory
>>>>> was malloced it would still have caused trouble.
>>>> The free list should improve reuse and fragmentation.  I'll take a look at
>>>> lazy allocation of subpages, too.
>>> Ok, that would be good. And for the PhsyPageMap we use mmap and try to avoid
>>> the realloc?
>>>
>>> Peter
>>>
>> -- 
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>
>


-- 

Mit freundlichen Grüßen

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Geschäftsführer: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   USt-Id-Nr.: DE 120607556

...........................................................

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB
  2016-06-28 14:20             ` Dr. David Alan Gilbert
@ 2016-06-30  6:34               ` Peter Lieven
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-06-30  6:34 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Paolo Bonzini, qemu-devel, kwolf, peter.maydell, mst, mreitz, kraxel

Am 28.06.2016 um 16:20 schrieb Dr. David Alan Gilbert:
> * Peter Lieven (pl@kamp.de) wrote:
>> Am 28.06.2016 um 13:35 schrieb Dr. David Alan Gilbert:
>>> * Peter Lieven (pl@kamp.de) wrote:
>>>> Am 28.06.2016 um 12:57 schrieb Dr. David Alan Gilbert:
>>>>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>>>> evaluation with the recently introduced maximum stack size monitoring revealed
>>>>>>> that the actual used stack size was never above 4kB so allocating 1MB stack
>>>>>>> for each coroutine is a lot of wasted memory. So reduce the stack size to
>>>>>>> 64kB which should still give enough head room.
>>>>>> If we make the stack this much smaller, there is a non-zero chance of
>>>>>> smashing it.  You must add a guard page if you do this (actually more
>>>>>> than one because QEMU will happily have stack frames as big as 16 KB).
>>>>>> The stack counts for RSS but it's not actually allocated memory, so why
>>>>>> does it matter?
>>>>> I think I'd be interested in seeing the /proc/.../smaps before and after this
>>>>> change to see if anything is visible and if we can see the difference
>>>>> in rss etc.
>>>> Can you advise what in smaps should be especially looked at.
>>>>
>>>> As for RSS I can report hat the long term usage is significantly lower.
>>>> I had the strange observation that when the VM is running for some minutes
>>>> the RSS suddenly increases to the whole stack size.
>>> You can see the Rss of each mapping; if you knew where your stacks were
>>> it would be easy to see if it was the stacks that were Rss and if
>>> there was anything else odd about them.
>>> If you set hte mapping as growsdown then you can see the area that has a 'gd'
>>> in it's VmFlags.
>> Would you expect to see each 1MB allocation in smaps or is it possible that
>> the kernel merges some mappings to bigger ones?
>>
>> And more importantly if the regions are merged Paolos comment about we
>> do not need a guard page would not be true because a coroutine stack could
>> grow into annother coroutines stack. Looking at the commit from Linus it
>> would also be good to have that guard page not having the gd flag.
> Hmm I'm not sure; one for Paolo.

My fault. The second mmap call with the pointer to the stack must carry
the MAP_FIXED flag.
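
For the record, a minimal sketch of that layout (illustrative only, not the
code from the series; error handling and the optional MAP_GROWSDOWN handling
are simplified):

#include <sys/mman.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate stack_size bytes of coroutine stack with one PROT_NONE guard page
 * below it; returns the usable stack base. */
static void *alloc_coroutine_stack(size_t stack_size)
{
    size_t pagesz = sysconf(_SC_PAGESIZE);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    void *guard, *stack;

    /* Reserve guard page plus stack in one mapping so nothing else can be
     * placed in between. */
    guard = mmap(NULL, pagesz + stack_size, PROT_NONE, flags, -1, 0);
    if (guard == MAP_FAILED) {
        abort();
    }

    /* Remap the usable part read/write.  MAP_FIXED is what makes the kernel
     * put this mapping exactly on top of the reservation; without it the
     * second mmap may be placed anywhere (the bug described above).
     * MAP_GROWSDOWN could additionally be passed here where available. */
    stack = mmap((char *)guard + pagesz, stack_size, PROT_READ | PROT_WRITE,
                 flags | MAP_FIXED, -1, 0);
    if (stack == MAP_FAILED) {
        abort();
    }
    return stack;
}

Freeing is then a single munmap of pagesz + stack_size starting one page below
the returned pointer.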

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor Peter Lieven
  2016-06-28  9:29   ` Dr. David Alan Gilbert
  2016-06-28 11:36   ` Paolo Bonzini
@ 2016-06-30 14:12   ` Markus Armbruster
  2016-07-04  9:02     ` Paolo Bonzini
  2 siblings, 1 reply; 78+ messages in thread
From: Markus Armbruster @ 2016-06-30 14:12 UTC (permalink / raw)
  To: Peter Lieven
  Cc: qemu-devel, kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel,
	pbonzini

Peter Lieven <pl@kamp.de> writes:

> this struct is approx 75kB

32KiB on x86_64, almost entirely eaten by member stack.

As Dan observed, we use many short-lived QmpInputVisitors.  However, we
should not be using many simultaneously.  That would be necessary for
32KiB objects to have an impact on memory use.  Are you sure these have
an impact?

Implementing a stack as "big enough" array can be wasteful.
Implementing it as dynamically allocated list is differently wasteful.
Saving several mallocs and frees can be worth "wasting" a few pages of
memory for a short time.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects
  2016-06-28 10:49     ` Peter Lieven
@ 2016-06-30 14:15       ` Markus Armbruster
  0 siblings, 0 replies; 78+ messages in thread
From: Markus Armbruster @ 2016-06-30 14:15 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Paolo Bonzini, qemu-devel, kwolf, peter.maydell, mst, dgilbert,
	mreitz, kraxel

Peter Lieven <pl@kamp.de> writes:

> Am 28.06.2016 um 12:42 schrieb Paolo Bonzini:
>>
>> On 28/06/2016 11:01, Peter Lieven wrote:
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> ---
>>>   include/qom/object.h |  1 +
>>>   qom/object.c         | 20 +++++++++++++++++---
>>>   2 files changed, 18 insertions(+), 3 deletions(-)
>> No, please---glibc should be fixed instead.
>
> The objects we allocate are sometimes as big as 70kB...

That's a rather pedestrian size, isn't it?  If malloc() sucks at
managing such sizes, it needs fixing very badly.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 09/15] loader: use mmap for ROMs
  2016-06-28 10:41   ` Paolo Bonzini
  2016-06-28 11:26     ` Peter Lieven
@ 2016-07-04  7:30     ` Peter Lieven
  1 sibling, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-07-04  7:30 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 12:41 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> a classic use for mmap here.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> They are never freed, why does mmap help?

Checked again: qemu_system_reset (which calls rom_reset) is invoked after pc_machine_init.
So the memory is indeed freed again.

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-06-30 14:12   ` Markus Armbruster
@ 2016-07-04  9:02     ` Paolo Bonzini
  2016-07-04 11:18       ` Markus Armbruster
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2016-07-04  9:02 UTC (permalink / raw)
  To: Markus Armbruster, Peter Lieven
  Cc: kwolf, peter.maydell, mst, qemu-devel, mreitz, kraxel, dgilbert



On 30/06/2016 16:12, Markus Armbruster wrote:
> Implementing a stack as "big enough" array can be wasteful.
> Implementing it as dynamically allocated list is differently wasteful.
> Saving several mallocs and frees can be worth "wasting" a few pages of
> memory for a short time.

Most usage of QmpInputVisitor at startup comes from
object_property_set_qobject, which only sets small scalar objects.  The
stack is entirely unused in this case.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-07-04  9:02     ` Paolo Bonzini
@ 2016-07-04 11:18       ` Markus Armbruster
  2016-07-04 11:36         ` Peter Lieven
  2016-07-04 11:42         ` Paolo Bonzini
  0 siblings, 2 replies; 78+ messages in thread
From: Markus Armbruster @ 2016-07-04 11:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Lieven, kwolf, peter.maydell, mst, qemu-devel, mreitz,
	kraxel, dgilbert

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 30/06/2016 16:12, Markus Armbruster wrote:
>> Implementing a stack as "big enough" array can be wasteful.
>> Implementing it as dynamically allocated list is differently wasteful.
>> Saving several mallocs and frees can be worth "wasting" a few pages of
>> memory for a short time.
>
> Most usage of QmpInputVisitor at startup comes from
> object_property_set_qobject, which only sets small scalar objects.  The
> stack is entirely unused in this case.

A quick test run shows ~300 qmp_input_visitor_new() calls during
startup, with at most two alive at the same time.

Why would it matter whether these are in the order of 150 bytes or 25000
bytes each?  How could this materially impact RSS?

There's one type of waste here that I understand: we zero the whole
QmpInputVisitor on allocation.

I'm not opposed to changing how the stack is implemented; I just want to
first understand why the current implementation behaves badly (assuming
it does).
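
For concreteness, the two shapes under discussion look roughly like this;
the field names and the 1024-entry depth are assumptions for illustration,
not the actual QmpInputVisitor definition.

#include <stddef.h>

typedef struct StackEntry {
    void *obj;                   /* the value being visited */
    struct StackEntry *next;     /* used only by the list variant */
} StackEntry;

/* (a) "big enough" array: a single allocation, but the visitor itself
 * becomes tens of kilobytes, and allocating it with g_malloc0() zeroes
 * all of it even when the stack stays empty (the small-scalar case). */
typedef struct ArrayStackVisitor {
    StackEntry stack[1024];
    size_t nb_stack;
} ArrayStackVisitor;

/* (b) dynamically allocated list: the visitor stays around a hundred
 * bytes, but every push costs a malloc and every pop a free, which is
 * the "differently wasteful" option. */
typedef struct ListStackVisitor {
    StackEntry *top;
} ListStackVisitor;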

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-07-04 11:18       ` Markus Armbruster
@ 2016-07-04 11:36         ` Peter Lieven
  2016-07-04 11:42         ` Paolo Bonzini
  1 sibling, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-07-04 11:36 UTC (permalink / raw)
  To: Markus Armbruster, Paolo Bonzini
  Cc: kwolf, peter.maydell, mst, qemu-devel, mreitz, kraxel, dgilbert

Am 04.07.2016 um 13:18 schrieb Markus Armbruster:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
>> On 30/06/2016 16:12, Markus Armbruster wrote:
>>> Implementing a stack as "big enough" array can be wasteful.
>>> Implementing it as dynamically allocated list is differently wasteful.
>>> Saving several mallocs and frees can be worth "wasting" a few pages of
>>> memory for a short time.
>> Most usage of QmpInputVisitor at startup comes from
>> object_property_set_qobject, which only sets small scalar objects.  The
>> stack is entirely unused in this case.
> A quick test run shows ~300 qmp_input_visitor_new() calls during
> startup, with at most two alive at the same time.
>
> Why would it matter whether these are in the order of 150 bytes or 25000
> bytes each?  How could this materially impact RSS?
>
> There's one type of waste here that I understand: we zero the whole
> QmpInputVisitor on allocation.
>
> I'm not opposed to changing how the stack is implemented, I just want to
> first understand why the current implmementation behaves badly (assuming
> it does).

The history behind this is that I observed that the RSS usage of Qemu
has dramatically increased between Qemu 2.2.0 and 2.5.0. I could see this
very clearly since we use hugetlbfs everywhere, so I can clearly
distinguish Qemu memory from VM memory. After having bisected one increase
in RSS usage to the introduction of RCU, the theory came up that the memory
gets fragmented because alloc and dealloc patterns have changed. So I started
to trace all malloc calls above 4kB and started to use mmap wherever it
was possible.

To give you an idea of the difference I observed, here is an example.
I have a blade with 22 vServers running on it. Including the OS, the
allocated memory with current master is approximately 6.5GB. With current
master and the following environment variable set:

MALLOC_MMAP_THRESHOLD_=32768

the allocated memory stays at approximately 2GB.
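
For completeness: the same threshold can also be set from inside the
program with glibc's mallopt(). This is only a sketch of that alternative
(it assumes glibc and is not part of this series); note that setting the
threshold explicitly also switches off glibc's dynamic adjustment of it.

#include <malloc.h>

static void lower_mmap_threshold(void)
{
    /* requests of 32kB and above that cannot be satisfied from the free
     * list are then serviced with mmap instead of growing the brk heap */
    mallopt(M_MMAP_THRESHOLD, 32 * 1024);
}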

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor
  2016-07-04 11:18       ` Markus Armbruster
  2016-07-04 11:36         ` Peter Lieven
@ 2016-07-04 11:42         ` Paolo Bonzini
  1 sibling, 0 replies; 78+ messages in thread
From: Paolo Bonzini @ 2016-07-04 11:42 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Peter Lieven, kwolf, peter.maydell, mst, qemu-devel, mreitz,
	kraxel, dgilbert



On 04/07/2016 13:18, Markus Armbruster wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 30/06/2016 16:12, Markus Armbruster wrote:
>>> Implementing a stack as "big enough" array can be wasteful.
>>> Implementing it as dynamically allocated list is differently wasteful.
>>> Saving several mallocs and frees can be worth "wasting" a few pages of
>>> memory for a short time.
>>
>> Most usage of QmpInputVisitor at startup comes from
>> object_property_set_qobject, which only sets small scalar objects.  The
>> stack is entirely unused in this case.
> 
> A quick test run shows ~300 qmp_input_visitor_new() calls during
> startup, with at most two alive at the same time.
> 
> Why would it matter whether these are in the order of 150 bytes or 25000
> bytes each?  How could this materially impact RSS?

I think we agree that it doesn't.  The question in the subthread is
whether we can improve QmpInputVisitor in general.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes
  2016-06-28 10:43   ` Paolo Bonzini
  2016-06-28 10:48     ` Peter Lieven
@ 2016-07-11  9:31     ` Peter Lieven
  2016-07-11  9:44       ` Peter Lieven
  2016-07-11 10:37       ` Paolo Bonzini
  1 sibling, 2 replies; 78+ messages in thread
From: Peter Lieven @ 2016-07-11  9:31 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 28.06.2016 um 12:43 schrieb Paolo Bonzini:
>
> On 28/06/2016 11:01, Peter Lieven wrote:
>> this was causing serious framentation in conjunction with the
>> subpages since RCU was introduced. The node space was allocated
>> at approx 32kB then reallocted to approx 75kB and this a few hundred
>> times at startup. And thanks to RCU the freeing was delayed.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> The size of the node from the previous as->dispatch could be used as a
> hint for the new one perhaps, avoiding the reallocation?

This here seems also to work:

diff --git a/exec.c b/exec.c
index 0122ef7..2691c0a 100644
--- a/exec.c
+++ b/exec.c
@@ -187,10 +187,12 @@ struct CPUAddressSpace {

  static void phys_map_node_reserve(PhysPageMap *map, unsigned nodes)
  {
+    static unsigned alloc_hint = 16;
      if (map->nodes_nb + nodes > map->nodes_nb_alloc) {
-        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc * 2, 16);
+        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, alloc_hint);
          map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, map->nodes_nb + nodes);
          map->nodes = g_renew(Node, map->nodes, map->nodes_nb_alloc);
+        alloc_hint = map->nodes_nb_alloc;
      }
  }


Question is still, mmap for this?

Peter

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes
  2016-07-11  9:31     ` Peter Lieven
@ 2016-07-11  9:44       ` Peter Lieven
  2016-07-11 10:37       ` Paolo Bonzini
  1 sibling, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-07-11  9:44 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 11.07.2016 um 11:31 schrieb Peter Lieven:
> Am 28.06.2016 um 12:43 schrieb Paolo Bonzini:
>>
>> On 28/06/2016 11:01, Peter Lieven wrote:
>>> this was causing serious framentation in conjunction with the
>>> subpages since RCU was introduced. The node space was allocated
>>> at approx 32kB then reallocted to approx 75kB and this a few hundred
>>> times at startup. And thanks to RCU the freeing was delayed.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> The size of the node from the previous as->dispatch could be used as a
>> hint for the new one perhaps, avoiding the reallocation?
>
> This here seems also to work:
>
> diff --git a/exec.c b/exec.c
> index 0122ef7..2691c0a 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -187,10 +187,12 @@ struct CPUAddressSpace {
>
>  static void phys_map_node_reserve(PhysPageMap *map, unsigned nodes)
>  {
> +    static unsigned alloc_hint = 16;
>      if (map->nodes_nb + nodes > map->nodes_nb_alloc) {
> -        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc * 2, 16);
> +        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, alloc_hint);
>          map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, map->nodes_nb + nodes);
>          map->nodes = g_renew(Node, map->nodes, map->nodes_nb_alloc);
> +        alloc_hint = map->nodes_nb_alloc;
>      }
>  }
>
>
> Question is still, mmap for this?

Side note: I added some counters and found that on my test system at maximum 453 allocations of 28 * sizeof(Node) bytes were active at
the same time. During runtime it's only 9. So this might explain why the alloc + realloc causes fragmentation of the brk heap.

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes
  2016-07-11  9:31     ` Peter Lieven
  2016-07-11  9:44       ` Peter Lieven
@ 2016-07-11 10:37       ` Paolo Bonzini
  2016-07-12 14:34         ` Peter Lieven
  1 sibling, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2016-07-11 10:37 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 11/07/2016 11:31, Peter Lieven wrote:
> Am 28.06.2016 um 12:43 schrieb Paolo Bonzini:
>>
>> On 28/06/2016 11:01, Peter Lieven wrote:
>>> this was causing serious framentation in conjunction with the
>>> subpages since RCU was introduced. The node space was allocated
>>> at approx 32kB then reallocted to approx 75kB and this a few hundred
>>> times at startup. And thanks to RCU the freeing was delayed.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> The size of the node from the previous as->dispatch could be used as a
>> hint for the new one perhaps, avoiding the reallocation?
> 
> This here seems also to work:
> 
> diff --git a/exec.c b/exec.c
> index 0122ef7..2691c0a 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -187,10 +187,12 @@ struct CPUAddressSpace {
> 
>  static void phys_map_node_reserve(PhysPageMap *map, unsigned nodes)
>  {
> +    static unsigned alloc_hint = 16;
>      if (map->nodes_nb + nodes > map->nodes_nb_alloc) {
> -        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc * 2, 16);
> +        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, alloc_hint);
>          map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, map->nodes_nb +
> nodes);
>          map->nodes = g_renew(Node, map->nodes, map->nodes_nb_alloc);
> +        alloc_hint = map->nodes_nb_alloc;
>      }
>  }
> 
> 
> Question is still, mmap for this?

Nice!  Can you submit a patch for this?

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes
  2016-07-11 10:37       ` Paolo Bonzini
@ 2016-07-12 14:34         ` Peter Lieven
  2016-07-13 10:27           ` Paolo Bonzini
  0 siblings, 1 reply; 78+ messages in thread
From: Peter Lieven @ 2016-07-12 14:34 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 11.07.2016 um 12:37 schrieb Paolo Bonzini:
>
> On 11/07/2016 11:31, Peter Lieven wrote:
>> Am 28.06.2016 um 12:43 schrieb Paolo Bonzini:
>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>> this was causing serious framentation in conjunction with the
>>>> subpages since RCU was introduced. The node space was allocated
>>>> at approx 32kB then reallocted to approx 75kB and this a few hundred
>>>> times at startup. And thanks to RCU the freeing was delayed.
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> The size of the node from the previous as->dispatch could be used as a
>>> hint for the new one perhaps, avoiding the reallocation?
>> This here seems also to work:
>>
>> diff --git a/exec.c b/exec.c
>> index 0122ef7..2691c0a 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -187,10 +187,12 @@ struct CPUAddressSpace {
>>
>>  static void phys_map_node_reserve(PhysPageMap *map, unsigned nodes)
>>  {
>> +    static unsigned alloc_hint = 16;
>>      if (map->nodes_nb + nodes > map->nodes_nb_alloc) {
>> -        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc * 2, 16);
>> +        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, alloc_hint);
>>          map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, map->nodes_nb +
>> nodes);
>>          map->nodes = g_renew(Node, map->nodes, map->nodes_nb_alloc);
>> +        alloc_hint = map->nodes_nb_alloc;
>>      }
>>  }
>>
>>
>> Question is still, mmap for this?
> Nice!  Can you submit a patch for this?

Of course, but please see my other comment. We should still consider mmap for this because we have close to 500 PhysPageMaps of about 70KB each
(roughly 35MB in total) which are all allocated at the same time - I think due to RCU.

Peter

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes
  2016-07-12 14:34         ` Peter Lieven
@ 2016-07-13 10:27           ` Paolo Bonzini
  2016-07-14 14:47             ` Peter Lieven
  0 siblings, 1 reply; 78+ messages in thread
From: Paolo Bonzini @ 2016-07-13 10:27 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel



On 12/07/2016 16:34, Peter Lieven wrote:
> Am 11.07.2016 um 12:37 schrieb Paolo Bonzini:
>>
>> On 11/07/2016 11:31, Peter Lieven wrote:
>>> Am 28.06.2016 um 12:43 schrieb Paolo Bonzini:
>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>> this was causing serious framentation in conjunction with the
>>>>> subpages since RCU was introduced. The node space was allocated
>>>>> at approx 32kB then reallocted to approx 75kB and this a few hundred
>>>>> times at startup. And thanks to RCU the freeing was delayed.
>>>>>
>>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> The size of the node from the previous as->dispatch could be used as a
>>>> hint for the new one perhaps, avoiding the reallocation?
>>> This here seems also to work:
>>>
>>> diff --git a/exec.c b/exec.c
>>> index 0122ef7..2691c0a 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -187,10 +187,12 @@ struct CPUAddressSpace {
>>>
>>>  static void phys_map_node_reserve(PhysPageMap *map, unsigned nodes)
>>>  {
>>> +    static unsigned alloc_hint = 16;
>>>      if (map->nodes_nb + nodes > map->nodes_nb_alloc) {
>>> -        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc * 2, 16);
>>> +        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, alloc_hint);
>>>          map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, map->nodes_nb +
>>> nodes);
>>>          map->nodes = g_renew(Node, map->nodes, map->nodes_nb_alloc);
>>> +        alloc_hint = map->nodes_nb_alloc;
>>>      }
>>>  }
>>>
>>>
>>> Question is still, mmap for this?
>> Nice!  Can you submit a patch for this?
> 
> Of course, but please see my other comment. We still should consider mmap for this cause we have close to 500 Physmaps about 70KB each which
> are all allocted at the same time - I think due to RCU.

That I think is not material for 2.7, and also I want to take a look at
creating nodes lazily.

Paolo

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes
  2016-07-13 10:27           ` Paolo Bonzini
@ 2016-07-14 14:47             ` Peter Lieven
  0 siblings, 0 replies; 78+ messages in thread
From: Peter Lieven @ 2016-07-14 14:47 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel

Am 13.07.2016 um 12:27 schrieb Paolo Bonzini:
>
> On 12/07/2016 16:34, Peter Lieven wrote:
>> Am 11.07.2016 um 12:37 schrieb Paolo Bonzini:
>>> On 11/07/2016 11:31, Peter Lieven wrote:
>>>> Am 28.06.2016 um 12:43 schrieb Paolo Bonzini:
>>>>> On 28/06/2016 11:01, Peter Lieven wrote:
>>>>>> this was causing serious framentation in conjunction with the
>>>>>> subpages since RCU was introduced. The node space was allocated
>>>>>> at approx 32kB then reallocted to approx 75kB and this a few hundred
>>>>>> times at startup. And thanks to RCU the freeing was delayed.
>>>>>>
>>>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>>> The size of the node from the previous as->dispatch could be used as a
>>>>> hint for the new one perhaps, avoiding the reallocation?
>>>> This here seems also to work:
>>>>
>>>> diff --git a/exec.c b/exec.c
>>>> index 0122ef7..2691c0a 100644
>>>> --- a/exec.c
>>>> +++ b/exec.c
>>>> @@ -187,10 +187,12 @@ struct CPUAddressSpace {
>>>>
>>>>   static void phys_map_node_reserve(PhysPageMap *map, unsigned nodes)
>>>>   {
>>>> +    static unsigned alloc_hint = 16;
>>>>       if (map->nodes_nb + nodes > map->nodes_nb_alloc) {
>>>> -        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc * 2, 16);
>>>> +        map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, alloc_hint);
>>>>           map->nodes_nb_alloc = MAX(map->nodes_nb_alloc, map->nodes_nb +
>>>> nodes);
>>>>           map->nodes = g_renew(Node, map->nodes, map->nodes_nb_alloc);
>>>> +        alloc_hint = map->nodes_nb_alloc;
>>>>       }
>>>>   }
>>>>
>>>>
>>>> Question is still, mmap for this?
>>> Nice!  Can you submit a patch for this?
>> Of course, but please see my other comment. We still should consider mmap for this cause we have close to 500 Physmaps about 70KB each which
>> are all allocted at the same time - I think due to RCU.
> That I think is not material for 2.7, and also I want to take a look at
> creating nodes lazily.

Okay, then I will send a patch to avoid the realloc and we can look at this later.

Peter


>
> Paolo


-- 

Mit freundlichen Grüßen

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Geschäftsführer: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   USt-Id-Nr.: DE 120607556

...........................................................

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
                   ` (15 preceding siblings ...)
  2016-06-28 11:37 ` [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Paolo Bonzini
@ 2016-10-12 21:18 ` Michael R. Hines
  2016-10-18 10:47   ` Peter Lieven
  16 siblings, 1 reply; 78+ messages in thread
From: Michael R. Hines @ 2016-10-12 21:18 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel, pbonzini,
	blemasurier, patrick

Peter,

Greetings from DigitalOcean. We're experiencing the same symptoms 
without this patch.
We have, collectively, many gigabytes of un-planned-for RSS being used 
per-hypervisor
that we would like to get rid of =).

Without explicitly trying this patch (will do that ASAP), we immediately 
noticed that the
192MB mentioned immediately melts away (Yay) when we disabled the 
coroutine thread pool explicitly,
with another ~100MB in additional stack usage that would likely also go 
away if we
applied the entirety of your patch.

Is there any chance you have revisited this or have a timeline for it?

- Michael

/*
  * Michael R. Hines
  * Senior Engineer, DigitalOcean.
  */

On 06/28/2016 04:01 AM, Peter Lieven wrote:
> I recently found that Qemu is using several hundred megabytes of RSS memory
> more than older versions such as Qemu 2.2.0. So I started tracing
> memory allocation and found 2 major reasons for this.
>
> 1) We changed the qemu coroutine pool to have a per thread and a global release
>     pool. The choosen poolsize and the changed algorithm could lead to up to
>     192 free coroutines with just a single iothread. Each of the coroutines
>     in the pool each having 1MB of stack memory.
>
> 2) Between Qemu 2.2.0 and 2.3.0 RCU was introduced which lead to delayed freeing
>     of memory. This lead to higher heap allocations which could not effectively
>     be returned to kernel (most likely due to fragmentation).
>
> The following series is what I came up with. Beside the coroutine patches I changed
> some allocations to forcibly use mmap. All these allocations are not repeatly made
> during runtime so the impact of using mmap should be neglectible.
>
> There are still some big malloced allocations left which cannot be easily changed
> (e.g. the pixman buffers in VNC). So it might an idea to set a lower mmap threshold for
> malloc since this threshold seems to be in the order of several Megabytes on modern systems.
>
> Peter Lieven (15):
>    coroutine-ucontext: mmap stack memory
>    coroutine-ucontext: add a switch to monitor maximum stack size
>    coroutine-ucontext: reduce stack size to 64kB
>    coroutine: add a knob to disable the shared release pool
>    util: add a helper to mmap private anonymous memory
>    exec: use mmap for subpages
>    qapi: use mmap for QmpInputVisitor
>    virtio: use mmap for VirtQueue
>    loader: use mmap for ROMs
>    vmware_svga: use mmap for scratch pad
>    qom: use mmap for bigger Objects
>    util: add a function to realloc mmapped memory
>    exec: use mmap for PhysPageMap->nodes
>    vnc-tight: make the encoding palette static
>    vnc: use mmap for VncState
>
>   configure                 | 33 ++++++++++++++++++--
>   exec.c                    | 11 ++++---
>   hw/core/loader.c          | 16 +++++-----
>   hw/display/vmware_vga.c   |  3 +-
>   hw/virtio/virtio.c        |  5 +--
>   include/qemu/mmap-alloc.h |  7 +++++
>   include/qom/object.h      |  1 +
>   qapi/qmp-input-visitor.c  |  5 +--
>   qom/object.c              | 20 ++++++++++--
>   ui/vnc-enc-tight.c        | 21 ++++++-------
>   ui/vnc.c                  |  5 +--
>   ui/vnc.h                  |  1 +
>   util/coroutine-ucontext.c | 66 +++++++++++++++++++++++++++++++++++++--
>   util/mmap-alloc.c         | 27 ++++++++++++++++
>   util/qemu-coroutine.c     | 79 ++++++++++++++++++++++++++---------------------
>   15 files changed, 225 insertions(+), 75 deletions(-)
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 05/15] util: add a helper to mmap private anonymous memory
  2016-06-28  9:01 ` [Qemu-devel] [PATCH 05/15] util: add a helper to mmap private anonymous memory Peter Lieven
@ 2016-10-16  2:10   ` Michael S. Tsirkin
  2016-10-18 13:50     ` Alex Bennée
  0 siblings, 1 reply; 78+ messages in thread
From: Michael S. Tsirkin @ 2016-10-16  2:10 UTC (permalink / raw)
  To: Peter Lieven
  Cc: qemu-devel, kwolf, mreitz, pbonzini, dgilbert, peter.maydell, kraxel

On Tue, Jun 28, 2016 at 11:01:29AM +0200, Peter Lieven wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  include/qemu/mmap-alloc.h |  6 ++++++
>  util/mmap-alloc.c         | 17 +++++++++++++++++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
> index 0899b2f..a457721 100644
> --- a/include/qemu/mmap-alloc.h
> +++ b/include/qemu/mmap-alloc.h
> @@ -9,4 +9,10 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
>  
>  void qemu_ram_munmap(void *ptr, size_t size);
>  
> +/* qemu_anon_ram_mmap maps private anonymous memory using mmap and
> + * aborts if the allocation fails. its meant to act as an replacement
> + * for g_malloc0 and friends. */

This needs better docs. When should one use g_malloc0 and when
qemu_anon_ram_munmap?



> +void *qemu_anon_ram_mmap(size_t size);
> +void qemu_anon_ram_munmap(void *ptr, size_t size);
> +

The names are confusing - this isn't guest RAM, this is
just internal QEMU memory, isn't it?

Just rename it qemu_malloc0 then ...

>  #endif
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index 629d97a..c099858 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -107,3 +107,20 @@ void qemu_ram_munmap(void *ptr, size_t size)
>          munmap(ptr, size + getpagesize());
>      }
>  }
> +
> +void *qemu_anon_ram_mmap(size_t size)
> +{
> +    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
> +                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +    if (ptr == MAP_FAILED) {
> +        abort();
> +    }
> +    return ptr;
> +}
> +
> +void qemu_anon_ram_munmap(void *ptr, size_t size)
> +{
> +    if (ptr) {
> +        munmap(ptr, size);
> +    }
> +}
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-10-12 21:18 ` Michael R. Hines
@ 2016-10-18 10:47   ` Peter Lieven
  2016-10-19 17:40     ` Michael R. Hines
  2016-10-31 22:00     ` Michael R. Hines
  0 siblings, 2 replies; 78+ messages in thread
From: Peter Lieven @ 2016-10-18 10:47 UTC (permalink / raw)
  To: Michael R. Hines, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel, pbonzini,
	blemasurier, patrick

Am 12.10.2016 um 23:18 schrieb Michael R. Hines:
> Peter,
>
> Greetings from DigitalOcean. We're experiencing the same symptoms without this patch.
> We have, collectively, many gigabytes of un-planned-for RSS being used per-hypervisor
> that we would like to get rid of =).
>
> Without explicitly trying this patch (will do that ASAP), we immediately noticed that the
> 192MB mentioned immediately melts away (Yay) when we disabled the coroutine thread pool explicitly,
> with another ~100MB in additional stack usage that would likely also go away if we
> applied the entirety of your patch.
>
> Is there any chance you have revisited this or have a timeline for it?

Hi Michael,

the current master already includes some of the patches of this original series. There are still some changes left, but
what works for me is the current master +

diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 5816702..3eaef68 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -25,8 +25,6 @@ enum {
  };

  /** Free list to speed up creation */
-static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int release_pool_size;
  static __thread QSLIST_HEAD(, Coroutine) alloc_pool = QSLIST_HEAD_INITIALIZER(pool);
  static __thread unsigned int alloc_pool_size;
  static __thread Notifier coroutine_pool_cleanup_notifier;
@@ -49,20 +47,10 @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry)
      if (CONFIG_COROUTINE_POOL) {
          co = QSLIST_FIRST(&alloc_pool);
          if (!co) {
-            if (release_pool_size > POOL_BATCH_SIZE) {
-                /* Slow path; a good place to register the destructor, too.  */
-                if (!coroutine_pool_cleanup_notifier.notify) {
-                    coroutine_pool_cleanup_notifier.notify = coroutine_pool_cleanup;
- qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
-                }
-
-                /* This is not exact; there could be a little skew between
-                 * release_pool_size and the actual size of release_pool.  But
-                 * it is just a heuristic, it does not need to be perfect.
-                 */
-                alloc_pool_size = atomic_xchg(&release_pool_size, 0);
-                QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
-                co = QSLIST_FIRST(&alloc_pool);
+            /* Slow path; a good place to register the destructor, too.  */
+            if (!coroutine_pool_cleanup_notifier.notify) {
+                coroutine_pool_cleanup_notifier.notify = coroutine_pool_cleanup;
+ qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
              }
          }
          if (co) {
@@ -85,11 +73,6 @@ static void coroutine_delete(Coroutine *co)
      co->caller = NULL;

      if (CONFIG_COROUTINE_POOL) {
-        if (release_pool_size < POOL_BATCH_SIZE * 2) {
-            QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
-            atomic_inc(&release_pool_size);
-            return;
-        }
          if (alloc_pool_size < POOL_BATCH_SIZE) {
              QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
              alloc_pool_size++;

+ invoking qemu with the following environment variable set:

MALLOC_MMAP_THRESHOLD_=32768 qemu-system-x86_64 ....

The last one makes glibc automatically use mmap when the malloc'd memory exceeds 32kByte.

Hope this helps,
Peter

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 05/15] util: add a helper to mmap private anonymous memory
  2016-10-16  2:10   ` Michael S. Tsirkin
@ 2016-10-18 13:50     ` Alex Bennée
  0 siblings, 0 replies; 78+ messages in thread
From: Alex Bennée @ 2016-10-18 13:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Lieven, kwolf, peter.maydell, qemu-devel, dgilbert, kraxel,
	pbonzini, mreitz


Michael S. Tsirkin <mst@redhat.com> writes:

> On Tue, Jun 28, 2016 at 11:01:29AM +0200, Peter Lieven wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  include/qemu/mmap-alloc.h |  6 ++++++
>>  util/mmap-alloc.c         | 17 +++++++++++++++++
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
>> index 0899b2f..a457721 100644
>> --- a/include/qemu/mmap-alloc.h
>> +++ b/include/qemu/mmap-alloc.h
>> @@ -9,4 +9,10 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
>>
>>  void qemu_ram_munmap(void *ptr, size_t size);
>>
>> +/* qemu_anon_ram_mmap maps private anonymous memory using mmap and
>> + * aborts if the allocation fails. its meant to act as an replacement
>> + * for g_malloc0 and friends. */
>
> This needs better docs. When should one use g_malloc0 and when
> qemu_anon_ram_munmap?

My concern is whether this breaks memory sanitizers, when we could just tweak
libc's allocation strategy to use mmap (which it should do for blocks
over a certain threshold).

>
>
>
>> +void *qemu_anon_ram_mmap(size_t size);
>> +void qemu_anon_ram_munmap(void *ptr, size_t size);
>> +
>
> The names are confusing - this isn't guest RAM, this is
> just internal QEMU memory, isn't it?
>
> Just rename it qemu_malloc0 then ...
>
>>  #endif
>> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
>> index 629d97a..c099858 100644
>> --- a/util/mmap-alloc.c
>> +++ b/util/mmap-alloc.c
>> @@ -107,3 +107,20 @@ void qemu_ram_munmap(void *ptr, size_t size)
>>          munmap(ptr, size + getpagesize());
>>      }
>>  }
>> +
>> +void *qemu_anon_ram_mmap(size_t size)
>> +{
>> +    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
>> +                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>> +    if (ptr == MAP_FAILED) {
>> +        abort();
>> +    }
>> +    return ptr;
>> +}
>> +
>> +void qemu_anon_ram_munmap(void *ptr, size_t size)
>> +{
>> +    if (ptr) {
>> +        munmap(ptr, size);
>> +    }
>> +}
>> --
>> 1.9.1


--
Alex Bennée

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-10-18 10:47   ` Peter Lieven
@ 2016-10-19 17:40     ` Michael R. Hines
  2016-10-31 22:00     ` Michael R. Hines
  1 sibling, 0 replies; 78+ messages in thread
From: Michael R. Hines @ 2016-10-19 17:40 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel, pbonzini,
	blemasurier, patrick

Thank you for the response! I'll run off and test that. =)

/*
  * Michael R. Hines
  * Senior Engineer, DigitalOcean.
  */

On 10/18/2016 05:47 AM, Peter Lieven wrote:
> Am 12.10.2016 um 23:18 schrieb Michael R. Hines:
>> Peter,
>>
>> Greetings from DigitalOcean. We're experiencing the same symptoms 
>> without this patch.
>> We have, collectively, many gigabytes of un-planned-for RSS being 
>> used per-hypervisor
>> that we would like to get rid of =).
>>
>> Without explicitly trying this patch (will do that ASAP), we 
>> immediately noticed that the
>> 192MB mentioned immediately melts away (Yay) when we disabled the 
>> coroutine thread pool explicitly,
>> with another ~100MB in additional stack usage that would likely also 
>> go away if we
>> applied the entirety of your patch.
>>
>> Is there any chance you have revisited this or have a timeline for it?
>
> Hi Michael,
>
> the current master already includes some of the patches of this 
> original series. There are still some changes left, but
> what works for me is the current master +
>
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index 5816702..3eaef68 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -25,8 +25,6 @@ enum {
>  };
>
>  /** Free list to speed up creation */
> -static QSLIST_HEAD(, Coroutine) release_pool = 
> QSLIST_HEAD_INITIALIZER(pool);
> -static unsigned int release_pool_size;
>  static __thread QSLIST_HEAD(, Coroutine) alloc_pool = 
> QSLIST_HEAD_INITIALIZER(pool);
>  static __thread unsigned int alloc_pool_size;
>  static __thread Notifier coroutine_pool_cleanup_notifier;
> @@ -49,20 +47,10 @@ Coroutine *qemu_coroutine_create(CoroutineEntry 
> *entry)
>      if (CONFIG_COROUTINE_POOL) {
>          co = QSLIST_FIRST(&alloc_pool);
>          if (!co) {
> -            if (release_pool_size > POOL_BATCH_SIZE) {
> -                /* Slow path; a good place to register the 
> destructor, too.  */
> -                if (!coroutine_pool_cleanup_notifier.notify) {
> -                    coroutine_pool_cleanup_notifier.notify = 
> coroutine_pool_cleanup;
> - qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
> -                }
> -
> -                /* This is not exact; there could be a little skew 
> between
> -                 * release_pool_size and the actual size of 
> release_pool.  But
> -                 * it is just a heuristic, it does not need to be 
> perfect.
> -                 */
> -                alloc_pool_size = atomic_xchg(&release_pool_size, 0);
> -                QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
> -                co = QSLIST_FIRST(&alloc_pool);
> +            /* Slow path; a good place to register the destructor, 
> too.  */
> +            if (!coroutine_pool_cleanup_notifier.notify) {
> +                coroutine_pool_cleanup_notifier.notify = 
> coroutine_pool_cleanup;
> + qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
>              }
>          }
>          if (co) {
> @@ -85,11 +73,6 @@ static void coroutine_delete(Coroutine *co)
>      co->caller = NULL;
>
>      if (CONFIG_COROUTINE_POOL) {
> -        if (release_pool_size < POOL_BATCH_SIZE * 2) {
> -            QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
> -            atomic_inc(&release_pool_size);
> -            return;
> -        }
>          if (alloc_pool_size < POOL_BATCH_SIZE) {
>              QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
>              alloc_pool_size++;
>
> + invoking qemu with the following environemnet variable set:
>
> MALLOC_MMAP_THRESHOLD_=32768 qemu-system-x86_64 ....
>
> The last one makes glibc automatically using mmap when the malloced 
> memory exceeds 32kByte.
>
> Hope this helps,
> Peter
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-10-18 10:47   ` Peter Lieven
  2016-10-19 17:40     ` Michael R. Hines
@ 2016-10-31 22:00     ` Michael R. Hines
  2016-11-01 22:02       ` Michael R. Hines
  1 sibling, 1 reply; 78+ messages in thread
From: Michael R. Hines @ 2016-10-31 22:00 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel, pbonzini,
	blemasurier, patrick

On 10/18/2016 05:47 AM, Peter Lieven wrote:
> Am 12.10.2016 um 23:18 schrieb Michael R. Hines:
>> Peter,
>>
>> Greetings from DigitalOcean. We're experiencing the same symptoms 
>> without this patch.
>> We have, collectively, many gigabytes of un-planned-for RSS being 
>> used per-hypervisor
>> that we would like to get rid of =).
>>
>> Without explicitly trying this patch (will do that ASAP), we 
>> immediately noticed that the
>> 192MB mentioned immediately melts away (Yay) when we disabled the 
>> coroutine thread pool explicitly,
>> with another ~100MB in additional stack usage that would likely also 
>> go away if we
>> applied the entirety of your patch.
>>
>> Is there any chance you have revisited this or have a timeline for it?
>
> Hi Michael,
>
> the current master already includes some of the patches of this 
> original series. There are still some changes left, but
> what works for me is the current master +
>
> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> index 5816702..3eaef68 100644
> --- a/util/qemu-coroutine.c
> +++ b/util/qemu-coroutine.c
> @@ -25,8 +25,6 @@ enum {
>  };
>
>  /** Free list to speed up creation */
> -static QSLIST_HEAD(, Coroutine) release_pool = 
> QSLIST_HEAD_INITIALIZER(pool);
> -static unsigned int release_pool_size;
>  static __thread QSLIST_HEAD(, Coroutine) alloc_pool = 
> QSLIST_HEAD_INITIALIZER(pool);
>  static __thread unsigned int alloc_pool_size;
>  static __thread Notifier coroutine_pool_cleanup_notifier;
> @@ -49,20 +47,10 @@ Coroutine *qemu_coroutine_create(CoroutineEntry 
> *entry)
>      if (CONFIG_COROUTINE_POOL) {
>          co = QSLIST_FIRST(&alloc_pool);
>          if (!co) {
> -            if (release_pool_size > POOL_BATCH_SIZE) {
> -                /* Slow path; a good place to register the 
> destructor, too.  */
> -                if (!coroutine_pool_cleanup_notifier.notify) {
> -                    coroutine_pool_cleanup_notifier.notify = 
> coroutine_pool_cleanup;
> - qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
> -                }
> -
> -                /* This is not exact; there could be a little skew 
> between
> -                 * release_pool_size and the actual size of 
> release_pool.  But
> -                 * it is just a heuristic, it does not need to be 
> perfect.
> -                 */
> -                alloc_pool_size = atomic_xchg(&release_pool_size, 0);
> -                QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
> -                co = QSLIST_FIRST(&alloc_pool);
> +            /* Slow path; a good place to register the destructor, 
> too.  */
> +            if (!coroutine_pool_cleanup_notifier.notify) {
> +                coroutine_pool_cleanup_notifier.notify = 
> coroutine_pool_cleanup;
> + qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
>              }
>          }
>          if (co) {
> @@ -85,11 +73,6 @@ static void coroutine_delete(Coroutine *co)
>      co->caller = NULL;
>
>      if (CONFIG_COROUTINE_POOL) {
> -        if (release_pool_size < POOL_BATCH_SIZE * 2) {
> -            QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
> -            atomic_inc(&release_pool_size);
> -            return;
> -        }
>          if (alloc_pool_size < POOL_BATCH_SIZE) {
>              QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
>              alloc_pool_size++;
>
> + invoking qemu with the following environemnet variable set:
>
> MALLOC_MMAP_THRESHOLD_=32768 qemu-system-x86_64 ....
>
> The last one makes glibc automatically using mmap when the malloced 
> memory exceeds 32kByte.
>

Peter,

I tested the above patch (and the environment variable); it doesn't
quite come close to as lean an RSS tally as the original patchset:
there's still about 70-80 MB of remaining RSS.

Any chance you could trim the remaining fat before merging this? =)


/*
  * Michael R. Hines
  * Senior Engineer, DigitalOcean.
  */

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage
  2016-10-31 22:00     ` Michael R. Hines
@ 2016-11-01 22:02       ` Michael R. Hines
  0 siblings, 0 replies; 78+ messages in thread
From: Michael R. Hines @ 2016-11-01 22:02 UTC (permalink / raw)
  To: Peter Lieven, qemu-devel
  Cc: kwolf, peter.maydell, mst, dgilbert, mreitz, kraxel, pbonzini,
	blemasurier, patrick

On 10/31/2016 05:00 PM, Michael R. Hines wrote:
> On 10/18/2016 05:47 AM, Peter Lieven wrote:
>> Am 12.10.2016 um 23:18 schrieb Michael R. Hines:
>>> Peter,
>>>
>>> Greetings from DigitalOcean. We're experiencing the same symptoms 
>>> without this patch.
>>> We have, collectively, many gigabytes of un-planned-for RSS being 
>>> used per-hypervisor
>>> that we would like to get rid of =).
>>>
>>> Without explicitly trying this patch (will do that ASAP), we 
>>> immediately noticed that the
>>> 192MB mentioned immediately melts away (Yay) when we disabled the 
>>> coroutine thread pool explicitly,
>>> with another ~100MB in additional stack usage that would likely also 
>>> go away if we
>>> applied the entirety of your patch.
>>>
>>> Is there any chance you have revisited this or have a timeline for it?
>>
>> Hi Michael,
>>
>> the current master already includes some of the patches of this 
>> original series. There are still some changes left, but
>> what works for me is the current master +
>>
>> diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
>> index 5816702..3eaef68 100644
>> --- a/util/qemu-coroutine.c
>> +++ b/util/qemu-coroutine.c
>> @@ -25,8 +25,6 @@ enum {
>>  };
>>
>>  /** Free list to speed up creation */
>> -static QSLIST_HEAD(, Coroutine) release_pool = 
>> QSLIST_HEAD_INITIALIZER(pool);
>> -static unsigned int release_pool_size;
>>  static __thread QSLIST_HEAD(, Coroutine) alloc_pool = 
>> QSLIST_HEAD_INITIALIZER(pool);
>>  static __thread unsigned int alloc_pool_size;
>>  static __thread Notifier coroutine_pool_cleanup_notifier;
>> @@ -49,20 +47,10 @@ Coroutine *qemu_coroutine_create(CoroutineEntry 
>> *entry)
>>      if (CONFIG_COROUTINE_POOL) {
>>          co = QSLIST_FIRST(&alloc_pool);
>>          if (!co) {
>> -            if (release_pool_size > POOL_BATCH_SIZE) {
>> -                /* Slow path; a good place to register the 
>> destructor, too.  */
>> -                if (!coroutine_pool_cleanup_notifier.notify) {
>> -                    coroutine_pool_cleanup_notifier.notify = 
>> coroutine_pool_cleanup;
>> - qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
>> -                }
>> -
>> -                /* This is not exact; there could be a little skew 
>> between
>> -                 * release_pool_size and the actual size of 
>> release_pool.  But
>> -                 * it is just a heuristic, it does not need to be 
>> perfect.
>> -                 */
>> -                alloc_pool_size = atomic_xchg(&release_pool_size, 0);
>> -                QSLIST_MOVE_ATOMIC(&alloc_pool, &release_pool);
>> -                co = QSLIST_FIRST(&alloc_pool);
>> +            /* Slow path; a good place to register the destructor, 
>> too.  */
>> +            if (!coroutine_pool_cleanup_notifier.notify) {
>> +                coroutine_pool_cleanup_notifier.notify = 
>> coroutine_pool_cleanup;
>> + qemu_thread_atexit_add(&coroutine_pool_cleanup_notifier);
>>              }
>>          }
>>          if (co) {
>> @@ -85,11 +73,6 @@ static void coroutine_delete(Coroutine *co)
>>      co->caller = NULL;
>>
>>      if (CONFIG_COROUTINE_POOL) {
>> -        if (release_pool_size < POOL_BATCH_SIZE * 2) {
>> -            QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
>> -            atomic_inc(&release_pool_size);
>> -            return;
>> -        }
>>          if (alloc_pool_size < POOL_BATCH_SIZE) {
>>              QSLIST_INSERT_HEAD(&alloc_pool, co, pool_next);
>>              alloc_pool_size++;
>>
>> + invoking qemu with the following environemnet variable set:
>>
>> MALLOC_MMAP_THRESHOLD_=32768 qemu-system-x86_64 ....
>>
>> The last one makes glibc automatically using mmap when the malloced 
>> memory exceeds 32kByte.
>>
>
> Peter,
>
> I tested the above patch (and the environment variable --- it doesn't 
> quite come close to as lean of
> an RSS tally as the original patchset -------- there's still about 
> 70-80 MB of remaining RSS.
>
> Any chance you could trim the remaining fat before merging this? =)
>
>

False alarm! I didn't set the MMAP threshold low enough. Now the results
are on par with the other patchset.

Thank you!

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2016-11-01 22:02 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-28  9:01 [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Peter Lieven
2016-06-28  9:01 ` [Qemu-devel] [PATCH 01/15] coroutine-ucontext: mmap stack memory Peter Lieven
2016-06-28 10:02   ` Peter Maydell
2016-06-28 10:21     ` Peter Lieven
2016-06-28 11:04   ` Paolo Bonzini
2016-06-28  9:01 ` [Qemu-devel] [PATCH 02/15] coroutine-ucontext: add a switch to monitor maximum stack size Peter Lieven
2016-06-28  9:01 ` [Qemu-devel] [PATCH 03/15] coroutine-ucontext: reduce stack size to 64kB Peter Lieven
2016-06-28 10:54   ` Paolo Bonzini
2016-06-28 10:57     ` Dr. David Alan Gilbert
2016-06-28 11:17       ` Peter Lieven
2016-06-28 11:35         ` Dr. David Alan Gilbert
2016-06-28 12:09           ` Peter Lieven
2016-06-28 14:20             ` Dr. David Alan Gilbert
2016-06-30  6:34               ` Peter Lieven
2016-06-28 11:13     ` Peter Lieven
2016-06-28 11:26       ` Paolo Bonzini
2016-06-28  9:01 ` [Qemu-devel] [PATCH 04/15] coroutine: add a knob to disable the shared release pool Peter Lieven
2016-06-28 10:41   ` Paolo Bonzini
2016-06-28 10:47     ` Peter Lieven
2016-06-28  9:01 ` [Qemu-devel] [PATCH 05/15] util: add a helper to mmap private anonymous memory Peter Lieven
2016-10-16  2:10   ` Michael S. Tsirkin
2016-10-18 13:50     ` Alex Bennée
2016-06-28  9:01 ` [Qemu-devel] [PATCH 06/15] exec: use mmap for subpages Peter Lieven
2016-06-28 10:48   ` Paolo Bonzini
2016-06-28  9:01 ` [Qemu-devel] [PATCH 07/15] qapi: use mmap for QmpInputVisitor Peter Lieven
2016-06-28  9:29   ` Dr. David Alan Gilbert
2016-06-28  9:39     ` Peter Lieven
2016-06-28 10:10       ` Daniel P. Berrange
2016-06-28 10:17         ` Dr. David Alan Gilbert
2016-06-28 10:21           ` Daniel P. Berrange
2016-06-28 14:10           ` Eric Blake
2016-06-28 11:36   ` Paolo Bonzini
2016-06-28 14:14     ` Eric Blake
2016-06-30 14:12   ` Markus Armbruster
2016-07-04  9:02     ` Paolo Bonzini
2016-07-04 11:18       ` Markus Armbruster
2016-07-04 11:36         ` Peter Lieven
2016-07-04 11:42         ` Paolo Bonzini
2016-06-28  9:01 ` [Qemu-devel] [PATCH 08/15] virtio: use mmap for VirtQueue Peter Lieven
2016-06-28  9:01 ` [Qemu-devel] [PATCH 09/15] loader: use mmap for ROMs Peter Lieven
2016-06-28 10:41   ` Paolo Bonzini
2016-06-28 11:26     ` Peter Lieven
2016-07-04  7:30     ` Peter Lieven
2016-06-28  9:01 ` [Qemu-devel] [PATCH 10/15] vmware_svga: use mmap for scratch pad Peter Lieven
2016-06-28  9:01 ` [Qemu-devel] [PATCH 11/15] qom: use mmap for bigger Objects Peter Lieven
2016-06-28 10:08   ` Daniel P. Berrange
2016-06-28 10:10   ` Peter Maydell
2016-06-28 10:19     ` Peter Lieven
2016-06-28 10:42   ` Paolo Bonzini
2016-06-28 10:49     ` Peter Lieven
2016-06-30 14:15       ` Markus Armbruster
2016-06-28  9:01 ` [Qemu-devel] [PATCH 12/15] util: add a function to realloc mmapped memory Peter Lieven
2016-06-28  9:01 ` [Qemu-devel] [PATCH 13/15] exec: use mmap for PhysPageMap->nodes Peter Lieven
2016-06-28 10:43   ` Paolo Bonzini
2016-06-28 10:48     ` Peter Lieven
2016-07-11  9:31     ` Peter Lieven
2016-07-11  9:44       ` Peter Lieven
2016-07-11 10:37       ` Paolo Bonzini
2016-07-12 14:34         ` Peter Lieven
2016-07-13 10:27           ` Paolo Bonzini
2016-07-14 14:47             ` Peter Lieven
2016-06-28  9:01 ` [Qemu-devel] [PATCH 14/15] vnc-tight: make the encoding palette static Peter Lieven
2016-06-28 11:12   ` Paolo Bonzini
2016-06-28 11:18     ` Peter Lieven
2016-06-28  9:01 ` [Qemu-devel] [PATCH 15/15] vnc: use mmap for VncState Peter Lieven
2016-06-28 11:37 ` [Qemu-devel] [PATCH 00/15] optimize Qemu RSS usage Paolo Bonzini
2016-06-28 12:14   ` Peter Lieven
2016-06-28 12:29     ` Paolo Bonzini
2016-06-28 12:33       ` Peter Lieven
2016-06-28 12:56         ` Paolo Bonzini
2016-06-28 12:56         ` Dr. David Alan Gilbert
2016-06-28 14:43           ` Peter Lieven
2016-06-28 14:52             ` Peter Lieven
2016-10-12 21:18 ` Michael R. Hines
2016-10-18 10:47   ` Peter Lieven
2016-10-19 17:40     ` Michael R. Hines
2016-10-31 22:00     ` Michael R. Hines
2016-11-01 22:02       ` Michael R. Hines
