* [Qemu-devel] [RFC] Host vs Guest memory allocation
@ 2010-04-05 22:45 Richard Henderson
  2010-04-05 23:18 ` Aurelien Jarno
  2010-04-12 11:25 ` Paul Brook
  0 siblings, 2 replies; 11+ messages in thread
From: Richard Henderson @ 2010-04-05 22:45 UTC (permalink / raw)
  To: qemu-devel Developers

The Problem:

CONFIG_USER_ONLY kinda sorta tries to manage the distinction between
qemu memory and guest memory. This can be seen in the PAGE_RESERVED
frobbing, qemu_malloc, etc.  However, it doesn't handle random malloc
calls, e.g. from libc itself or from other libraries in use.

Possible solutions:

There are several possible solutions as I see it, each depending on the
pairing of host and guest address space characteristics:

(0) Do nothing.  That is, don't even pretend to record host memory and
    validate guest access that may or may not overlap.  To my mind this
    is in fact an improvement over the kinda-sorta solution we have now.

(1) Enable softmmu for userland. This is of course the highest overhead,
    but will work for all combinations.

(2) Pre-allocate the entire guest address space in the host.  With
    Linux mmap w/ MAP_NORESERVE or Windows VirtualAlloc w/ MEM_RESERVE
    and a possibly reduced guest address space this doesn't seem so bad.

    Reducing the guest's virtual address space via a command-line switch
    seems like a fairly good idea to help this method apply when emulating
    a 32-bit guest on a 32-bit host.  With few exceptions, it seems like
    most guests could behave well under these conditions, as it isn't
    entirely dissimilar to RLIMIT_AS.

    (2a) Arrange the qemu binary itself so that this pre-allocation can
         happen in low-memory for qemu.

         For 64-bit hosts, this can include compiling the application such
         that it's placed in memory above 4GB.  Of course, this movement
         must be balanced against the performance hit that might be incurred
         beyond the setting of GUEST_BASE to a non-zero value, e.g:

         On x86-64, one could compile the application with -fpie to arrange
         for the compiler to access its code and data with rip-relative
         addressing instead of absolute addressing.  As far as I remember,
         this is a minor code size penalty, but doesn't affect the runtime
         performance at all.

         On sparc64, which has no pc-relative addressing mode, moving the
         application outside the low 4G would be a significant penalty to
         its code size and performance.  However, there's very little 
         overhead in a non-zero GUEST_BASE, since we store GUEST_BASE in
         a register during TCG-generated code and emit accesses to guest
         memory with a reg+reg addressing mode.

         On ia64 and alpha, the application is *always* placed in high
         memory somewhere, so nothing at all needs to change.

         On ppc64, code is compiled pic by default, and so there is no
         penalty for moving the base of the application out of low memory.

    (2b) On x86, use segmentation to allow the guest address space anywhere,
         and at the same time check that the guest memory accesses are in
         range for the guest.

         Clearly this can be done on Linux via modify_ldt(2), and on
         the BSDs via i386_set_ldt.  I can't find an MSDN documented
         method for this for Windows, although there seems to be plenty
         of security advisories that suggest that it's possible.  ;-P

(3) Provide an implementation of malloc for use in qemu which keeps the
    PAGE_RESERVED entries up-to-date all the time.

    Glibc, and probably most other libc implementations, are set up to
    allow malloc to be overridden by the main executable.  There are
    plenty of good malloc implementations out there which we could modify
    to record the bits we need.  For 64-bit hosts, it could likely also
    arrange for the allocated memory to not conflict with the guest address
    space most of the time.

(4) Implement all of 1-3 and select among them at startup.  For instance,
    begin by trying #2, and if the reservation of the guest address space
    fails then fall back to either #1 or #3 based on some command-line switch.

    This is of course my favorite option.  Hopefully a lot of the if-deffery
    concerning CONFIG_USER_ONLY and CONFIG_SOFTMMU can be eliminated in the
    process.

Comments?


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-05 22:45 [Qemu-devel] [RFC] Host vs Guest memory allocation Richard Henderson
@ 2010-04-05 23:18 ` Aurelien Jarno
  2010-04-12 11:48   ` Avi Kivity
  2010-04-12 11:25 ` Paul Brook
  1 sibling, 1 reply; 11+ messages in thread
From: Aurelien Jarno @ 2010-04-05 23:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel Developers

On Mon, Apr 05, 2010 at 03:45:23PM -0700, Richard Henderson wrote:
> The Problem:
> 
> CONFIG_USER_ONLY kinda sorta tries to manage the distinction between
> qemu memory and guest memory. This can be seen in the PAGE_RESERVED
> frobbing and qemu_malloc etc. However, it doesn't handle random malloc
> calls eg from libc itself or other libraries in use.
> 
> Possible solutions:
> 
> There are several possible solutions as I see it, each depending on the
> pairing of host and guest address space characteristics:
> 
> (0) Do nothing.  That is, don't even pretend to record host memory and
>     validate guest access that may or may not overlap.  To my mind this
>     is in fact an improvement over the kinda-sorta solution we have now.
> 
> (1) Enable softmmu for userland. This is of course the highest overhead,
>     but will work for all combinations.
> 

This option would solve a lot of problems and simplify a lot of code:
unaligned accesses on hosts requiring strict alignment, different page
sizes between host and guest, self-modifying code, 64-bit userland on a
32-bit host, etc.  That's just what comes to mind right now; it would
likely solve even more.

It will clearly have a high overhead, but it might be interesting to
quantify it, to see whether we should continue to add tons of code and
workarounds to linux-user instead of switching to softmmu for userland.
It could also be optimized, for example by increasing the TLB size
compared to system mode, and doing TLB preallocation on mmap calls.

IOW this is my favorite option ;-)

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-05 22:45 [Qemu-devel] [RFC] Host vs Guest memory allocation Richard Henderson
  2010-04-05 23:18 ` Aurelien Jarno
@ 2010-04-12 11:25 ` Paul Brook
  2010-04-12 14:48   ` Richard Henderson
  1 sibling, 1 reply; 11+ messages in thread
From: Paul Brook @ 2010-04-12 11:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

> The Problem:
> 
> CONFIG_USER_ONLY kinda sorta tries to manage the distinction between
> qemu memory and guest memory. This can be seen in the PAGE_RESERVED
> frobbing and qemu_malloc etc. However, it doesn't handle random malloc
> calls eg from libc itself or other libraries in use.
> 
> Possible solutions:
> 
> There are several possible solutions as I see it, each depending on the
> pairing of host and guest address space characteristics:
> 
> (0) Do nothing.  That is, don't even pretend to record host memory and
>     validate guest access that may or may not overlap.  To my mind this
>     is in fact an improvement over the kinda-sorta solution we have now.

This is effectively what we do now; the PAGE_RESERVED bits aren't used for
anything interesting.  Any guest application that makes assumptions about
address space availability (i.e. maps at hardcoded addresses) is already
likely to be broken on many native kernels.  target_mmap is implemented via
host mmap, so this should just work.

The only time we have a fixed guest address is when loading non-PIC
applications.  This is known at startup, so it can be fixed by setting
guest_base appropriately.  I have partial patches to fix this.
 
> (1) Enable softmmu for userland. This is of course the highest overhead,
>     but will work for all combinations.

This has a significant performance hit, and gets very tricky for things like 
mmaped files.

> (2) Pre-allocate the entire guest address space in the host.  With
>     Linux mmap w/ MAP_NORESERVE or Windows VirtualAlloc w/ MEM_RESERVE
>     and a possibly reduced guest address space this doesn't seem so bad.

This breaks if the host sets ulimit -v.
 
Paul


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-05 23:18 ` Aurelien Jarno
@ 2010-04-12 11:48   ` Avi Kivity
  2010-04-12 14:55     ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-04-12 11:48 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel Developers, Richard Henderson

On 04/06/2010 02:18 AM, Aurelien Jarno wrote:
>
>> (1) Enable softmmu for userland. This is of course the highest overhead,
>>      but will work for all combinations.
>>
>>      
> This option would solve a lot of problems and simplify a lot of code:
> unaligned access on hosts requiring strict alignements, different page
> size between host and guest, self-modifying code, 64-bit user land on
> 32-bit host, etc... That's what currently comes to my mind, but that
> would solve a lot more problems.
>
> It will clearly have a high overhead, but it might be interesting to
> quantify it, to see if we should continue to add tons of code/workaround
> to linux-user instead of switching to softmmu for userland. It could
> also be optimized, for example by increasing the TLB size compared to
> system mode, and doing TLB preallocation on mmap calls.
>
>    

You could reduce the overhead somewhat by using kvm for memory 
translation on hosts that support it.  Of course tcg translation and 
syscall costs will grow by the exit overhead.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 11:25 ` Paul Brook
@ 2010-04-12 14:48   ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2010-04-12 14:48 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 04/12/2010 04:25 AM, Paul Brook wrote:
>> (1) Enable softmmu for userland. This is of course the highest overhead,
>>      but will work for all combinations.
>
> This has a significant performance hit, and gets very tricky for things like
> mmaped files.

It has the advantage of actually working for several cases of 64-on-32
that simply don't work at the moment.  mmap is tricky, but no more than
usual.  We still have problems with partial pages mapped past the end
of the file when host page size > target page size.

>> (2) Pre-allocate the entire guest address space in the host.  With
>>      Linux mmap w/ MAP_NORESERVE or Windows VirtualAlloc w/ MEM_RESERVE
>>      and a possibly reduced guest address space this doesn't seem so bad.
>
> This breaks if the host sets ulimit -v.

Yes, but we'd know that immediately at startup.  This is why I 
recommended implementing multiple solutions and falling back from one to 
the other when we find they don't work.


r~


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 11:48   ` Avi Kivity
@ 2010-04-12 14:55     ` Richard Henderson
  2010-04-12 15:09       ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2010-04-12 14:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel Developers, Aurelien Jarno

On 04/12/2010 04:48 AM, Avi Kivity wrote:
>>> (1) Enable softmmu for userland. This is of course the highest overhead,
>>> but will work for all combinations.
>>>
...
> You could reduce the overhead somewhat by using kvm for memory
> translation on hosts that support it. Of course tcg translation and
> syscall costs will grow by the exit overhead.

I've thought about this a bit, and what seemed to be the stickler is
what is the environment that runs in the guest?  TCG generated code
is of course fine, but what about the helper functions?  How can we
tell whether a given helper function can run in the restricted 
environment of the guest or whether it needs to transition back to the 
environment of the host to do its work?

I suppose the obvious solution is some sort of flag on the function that 
well-maintained ports will set.  But the whole marshalling thing is 
still pretty tricky.


r~


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 14:55     ` Richard Henderson
@ 2010-04-12 15:09       ` Avi Kivity
  2010-04-12 15:39         ` Alexander Graf
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-04-12 15:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel Developers, Aurelien Jarno

On 04/12/2010 05:55 PM, Richard Henderson wrote:
>
>> You could reduce the overhead somewhat by using kvm for memory
>> translation on hosts that support it. Of course tcg translation and
>> syscall costs will grow by the exit overhead.
>
> I've thought about this a bit, and what seemed to be the stickler is
> what is the environment that runs in the guest?  TCG generated code
> is of course fine, but what about the helper functions?  How can we
> tell whether a given helper function can run in the restricted 
> environment of the guest or whether it needs to transition back to the 
> environment of the host to do its work?

I'd guess all helpers can run in guest context except those that cause a 
transition to target kernel mode.

> I suppose the obvious solution is some sort of flag on the function 
> that well-maintained ports will set.  But the whole marshalling thing 
> is still pretty tricky.

Pass everything through memory; will there be many transitions apart 
from trapping instructions and missing translations?

For extra points run the translator in guest context.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 15:09       ` Avi Kivity
@ 2010-04-12 15:39         ` Alexander Graf
  2010-04-12 15:49           ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Graf @ 2010-04-12 15:39 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel Developers, Aurelien Jarno, Richard Henderson


On 12.04.2010, at 17:09, Avi Kivity wrote:

> On 04/12/2010 05:55 PM, Richard Henderson wrote:
>> 
>>> You could reduce the overhead somewhat by using kvm for memory
>>> translation on hosts that support it. Of course tcg translation and
>>> syscall costs will grow by the exit overhead.
>> 
>> I've thought about this a bit, and what seemed to be the stickler is
>> what is the environment that runs in the guest?  TCG generated code
>> is of course fine, but what about the helper functions?  How can we
>> tell whether a given helper function can run in the restricted environment of the guest or whether it needs to transition back to the environment of the host to do its work?
> 
> I'd guess all helpers can run in guest context except those that cause a transition to target kernel mode.
> 
>> I suppose the obvious solution is some sort of flag on the function that well-maintained ports will set.  But the whole marshalling thing is still pretty tricky.
> 
> Pass everything through memory; will there be many transitions apart from trapping instructions and missing translations?

I don't see how that would help with the 64-on-32 issue. You still don't get a 64 bit address space from running inside KVM.

Alex


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 15:39         ` Alexander Graf
@ 2010-04-12 15:49           ` Avi Kivity
  2010-04-12 15:56             ` Alexander Graf
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-04-12 15:49 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel Developers, Aurelien Jarno, Richard Henderson

On 04/12/2010 06:39 PM, Alexander Graf wrote:
>
>> Pass everything through memory; will there be many transitions apart from trapping instructions and missing translations?
>>      
> I don't see how that would help with the 64-on-32 issue. You still don't get a 64 bit address space from running inside KVM.
>    

True.  Like the other options, it's just another tool in the toolbox and 
doesn't solve all problems.

You could cheat and have a 64-bit kernel under a 32-bit qemu.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 15:49           ` Avi Kivity
@ 2010-04-12 15:56             ` Alexander Graf
  2010-04-12 16:08               ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Graf @ 2010-04-12 15:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel Developers, Aurelien Jarno, Richard Henderson


On 12.04.2010, at 17:49, Avi Kivity wrote:

> On 04/12/2010 06:39 PM, Alexander Graf wrote:
>> 
>>> Pass everything through memory; will there be many transitions apart from trapping instructions and missing translations?
>>>     
>> I don't see how that would help with the 64-on-32 issue. You still don't get a 64 bit address space from running inside KVM.
>>   
> 
> True.  Like the other options, it's just another tool in the toolbox and doesn't solve all problems.
> 
> You could cheat and have a 64-bit kernel under a 32-bit qemu.

For full system emulation, on the other hand, I can imagine quite some nice tricks one could pull.

On PPC hosts you get a huge number of VSIDs that are basically like tags on the TLB. So if you'd give every x86 page table one VSID you'd potentially have really great and fast shadow PTEs.

On x86 hosts you can just keep several page tables around. You can then map for example every combination of guest VSIDs to one page table each.


I'm sure there are similar fun things you can do with the other supported archs. The hard part is to come up with something generic enough so it works on all hosts and guests with little effort. Oh well :)

Alex


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 15:56             ` Alexander Graf
@ 2010-04-12 16:08               ` Avi Kivity
  0 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2010-04-12 16:08 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel Developers, Aurelien Jarno, Richard Henderson

On 04/12/2010 06:56 PM, Alexander Graf wrote:
>
> For fully system emulation on the other hand I can imagine quite some nice tricks one could pull.
>
> On PPC hosts you get a huge number of VSIDs that are basically like tags on the TLB. So if you'd give every x86 page table one VSID you'd potentially have really great and fast shadow PTEs.
>    

You mean, if you have lots of ppc machines but no x86?

smp would be a problem because of the relaxed memory model (of course 
tcg needs a lot of work before it can do smp anyway).

> On x86 hosts you can just keep several page tables around. You can then map for example every combination of guest VSIDs to one page table each.
>    

Yeah.

> I'm sure there are similar fun things you can do with the other supported archs. The hard part is to come up with something generic enough so it works on all hosts and guests with little effort. Oh well :)
>    

Well, x86 page tables are pretty flexible, the memory model is strict, 
the atomics are rich, and you have both unaligned and trapping 
accesses.  So if you restrict yourself to x86 hosts I think you can do 
anything with page size >= 4k.

Run both user and kernel mode in guest user mode, do tcg and mmu in 
kernel mode.  Should be fun.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
