* [Qemu-devel] [RFC] Host vs Guest memory allocation
@ 2010-04-05 22:45 Richard Henderson
  2010-04-05 23:18 ` Aurelien Jarno
  2010-04-12 11:25 ` Paul Brook
  0 siblings, 2 replies; 11+ messages in thread
From: Richard Henderson @ 2010-04-05 22:45 UTC (permalink / raw)
  To: qemu-devel Developers

The Problem:

CONFIG_USER_ONLY kinda sorta tries to manage the distinction between
qemu memory and guest memory. This can be seen in the PAGE_RESERVED
frobbing, qemu_malloc, etc.  However, it doesn't handle random malloc
calls, e.g. from libc itself or from other libraries in use.

Possible solutions:

There are several possible solutions as I see it, each depending on the
pairing of host and guest address space characteristics:

(0) Do nothing.  That is, don't even pretend to record host memory and
    validate guest access that may or may not overlap.  To my mind this
    is in fact an improvement over the kinda-sorta solution we have now.

(1) Enable softmmu for userland. This is of course the highest overhead,
    but will work for all combinations.

(2) Pre-allocate the entire guest address space in the host.  With
    Linux mmap w/ MAP_NORESERVE or Windows VirtualAlloc w/ MEM_RESERVE
    and a possibly reduced guest address space this doesn't seem so bad.

    Reducing the guest's virtual address space via a command-line switch
    seems like a fairly good idea to help this method apply when emulating
    a 32-bit guest on a 32-bit host.  With few exceptions, it seems like
    most guests could behave well under these conditions, as it isn't
    entirely dissimilar to RLIMIT_AS.

    (2a) Arrange the qemu binary itself so that this pre-allocation can
         happen in low-memory for qemu.

         For 64-bit hosts, this can include compiling the application such
         that it's placed in memory above 4GB.  Of course, this movement
         must be balanced against the performance hit that might be incurred
         beyond the setting of GUEST_BASE to a non-zero value, e.g:

         On x86-64, one could compile the application with -fpie to arrange
         for the compiler to access its code and data with rip-relative
         addressing instead of absolute addressing.  As far as I remember,
         this is a minor code size penalty, but doesn't affect the runtime
         performance at all.

         On sparc64, which has no pc-relative addressing mode, moving the
         application outside the low 4G would be a significant penalty to
         its code size and performance.  However, there's very little 
         overhead in a non-zero GUEST_BASE, since we store GUEST_BASE in
         a register during TCG-generated code and emit accesses to guest
         memory with a reg+reg addressing mode.

         On ia64 and alpha, the application is *always* placed in high
         memory somewhere, so nothing at all needs to change.

         On ppc64, code is compiled pic by default, and so there is no
         penalty for moving the base of the application out of low memory.

    (2b) On x86, use segmentation to allow the guest address space anywhere,
         and at the same time check that the guest memory accesses are in
         range for the guest.

         Clearly this can be done on Linux via modify_ldt(2), and on
         the BSDs via i386_set_ldt.  I can't find an MSDN documented
         method for this for Windows, although there seems to be plenty
         of security advisories that suggest that it's possible.  ;-P

(3) Provide an implementation of malloc for use in qemu which keeps the
    PAGE_RESERVED entries up-to-date all the time.

    Glibc, and probably most other libc implementations, are set up to
    allow malloc to be overridden by the main executable.  There are
    plenty of good malloc implementations out there which we could modify
    to record the bits we need.  For 64-bit hosts, it could likely also
    arrange for the allocated memory to not conflict with the guest address
    space most of the time.

(4) Implement all of 1-3 and select among them at startup.  For instance,
    begin by trying #2, and if the reservation of the guest address space
    fails then fall back to either #1 or #3 based on some command-line switch.

    This is of course my favorite option.  Hopefully a lot of the if-deffery
    concerning CONFIG_USER_ONLY and CONFIG_SOFTMMU can be eliminated in the
    process.

Comments?


r~

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-05 22:45 [Qemu-devel] [RFC] Host vs Guest memory allocation Richard Henderson
@ 2010-04-05 23:18 ` Aurelien Jarno
  2010-04-12 11:48   ` Avi Kivity
  2010-04-12 11:25 ` Paul Brook
  1 sibling, 1 reply; 11+ messages in thread
From: Aurelien Jarno @ 2010-04-05 23:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel Developers

On Mon, Apr 05, 2010 at 03:45:23PM -0700, Richard Henderson wrote:
> The Problem:
> 
> CONFIG_USER_ONLY kinda sorta tries to manage the distinction between
> qemu memory and guest memory. This can be seen in the PAGE_RESERVED
> frobbing and qemu_malloc etc. However, it doesn't handle random malloc
> calls eg from libc itself or other libraries in use.
> 
> Possible solutions:
> 
> There are several possible solutions as I see it, each depending on the
> pairing of host and guest address space characteristics:
> 
> (0) Do nothing.  That is, don't even pretend to record host memory and
>     validate guest access that may or may not overlap.  To my mind this
>     is in fact an improvement over the kinda-sorta solution we have now.
> 
> (1) Enable softmmu for userland. This is of course the highest overhead,
>     but will work for all combinations.
> 

This option would solve a lot of problems and simplify a lot of code:
unaligned accesses on hosts requiring strict alignment, different page
sizes between host and guest, self-modifying code, 64-bit userland on a
32-bit host, etc.  That's just what comes to mind right now; it would
likely solve even more.

It will clearly have a high overhead, but it might be interesting to
quantify it, to see whether we should continue to add tons of code and
workarounds to linux-user instead of switching to softmmu for userland.
It could also be optimized, for example by increasing the TLB size
compared to system mode, and doing TLB preallocation on mmap calls.

IOW this is my favorite option ;-)

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-05 22:45 [Qemu-devel] [RFC] Host vs Guest memory allocation Richard Henderson
  2010-04-05 23:18 ` Aurelien Jarno
@ 2010-04-12 11:25 ` Paul Brook
  2010-04-12 14:48   ` Richard Henderson
  1 sibling, 1 reply; 11+ messages in thread
From: Paul Brook @ 2010-04-12 11:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

> The Problem:
> 
> CONFIG_USER_ONLY kinda sorta tries to manage the distinction between
> qemu memory and guest memory. This can be seen in the PAGE_RESERVED
> frobbing and qemu_malloc etc. However, it doesn't handle random malloc
> calls eg from libc itself or other libraries in use.
> 
> Possible solutions:
> 
> There are several possible solutions as I see it, each depending on the
> pairing of host and guest address space characteristics:
> 
> (0) Do nothing.  That is, don't even pretend to record host memory and
>     validate guest access that may or may not overlap.  To my mind this
>     is in fact an improvement over the kinda-sorta solution we have now.

This is effectively what we do now; the PAGE_RESERVED bits aren't used for
anything interesting.  Any guest application that makes assumptions about
address space availability (i.e. maps at hardcoded addresses) is already
likely to be broken on many native kernels.  target_mmap is implemented via
host mmap, so this should just work.

The only time we have a fixed guest address is when loading non-PIC
applications.  This is known at startup, so it can be fixed by setting
guest_base appropriately.  I have partial patches to fix this.
 
> (1) Enable softmmu for userland. This is of course the highest overhead,
>     but will work for all combinations.

This has a significant performance hit, and gets very tricky for things like 
mmaped files.

> (2) Pre-allocate the entire guest address space in the host.  With
>     Linux mmap w/ MAP_NORESERVE or Windows VirtualAlloc w/ MEM_RESERVE
>     and a possibly reduced guest address space this doesn't seem so bad.

This breaks if the host sets ulimit -v.
 
Paul


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-05 23:18 ` Aurelien Jarno
@ 2010-04-12 11:48   ` Avi Kivity
  2010-04-12 14:55     ` Richard Henderson
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-04-12 11:48 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel Developers, Richard Henderson

On 04/06/2010 02:18 AM, Aurelien Jarno wrote:
>
>> (1) Enable softmmu for userland. This is of course the highest overhead,
>>      but will work for all combinations.
>>
>>      
> This option would solve a lot of problems and simplify a lot of code:
> unaligned access on hosts requiring strict alignements, different page
> size between host and guest, self-modifying code, 64-bit user land on
> 32-bit host, etc... That's what currently comes to my mind, but that
> would solve a lot more problems.
>
> It will clearly have a high overhead, but it might be interesting to
> quantify it, to see if we should continue to add tons of code/workaround
> to linux-user instead of switching to softmmu for userland. It could
> also be optimized, for example by increasing the TLB size compared to
> system mode, and doing TLB preallocation on mmap calls.
>
>    

You could reduce the overhead somewhat by using kvm for memory 
translation on hosts that support it.  Of course tcg translation and 
syscall costs will grow by the exit overhead.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 11:25 ` Paul Brook
@ 2010-04-12 14:48   ` Richard Henderson
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Henderson @ 2010-04-12 14:48 UTC (permalink / raw)
  To: Paul Brook; +Cc: qemu-devel

On 04/12/2010 04:25 AM, Paul Brook wrote:
>> (1) Enable softmmu for userland. This is of course the highest overhead,
>>      but will work for all combinations.
>
> This has a significant performance hit, and gets very tricky for things like
> mmaped files.

It has the advantage of actually working for several cases of 64-on-32
that simply don't work at the moment.  mmap is tricky, but no more than
usual.  We still have problems with partial pages mapped past the end
of the file when host page size > target page size.

>> (2) Pre-allocate the entire guest address space in the host.  With
>>      Linux mmap w/ MAP_NORESERVE or Windows VirtualAlloc w/ MEM_RESERVE
>>      and a possibly reduced guest address space this doesn't seem so bad.
>
> This breaks if the host sets ulimit -v.

Yes, but we'd know that immediately at startup.  This is why I 
recommended implementing multiple solutions and falling back from one to 
the other when we find they don't work.


r~


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 11:48   ` Avi Kivity
@ 2010-04-12 14:55     ` Richard Henderson
  2010-04-12 15:09       ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Henderson @ 2010-04-12 14:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel Developers, Aurelien Jarno

On 04/12/2010 04:48 AM, Avi Kivity wrote:
>>> (1) Enable softmmu for userland. This is of course the highest overhead,
>>> but will work for all combinations.
>>>
...
> You could reduce the overhead somewhat by using kvm for memory
> translation on hosts that support it. Of course tcg translation and
> syscall costs will grow by the exit overhead.

I've thought about this a bit, and what seemed to be the stickler is
what is the environment that runs in the guest?  TCG generated code
is of course fine, but what about the helper functions?  How can we
tell whether a given helper function can run in the restricted 
environment of the guest or whether it needs to transition back to the 
environment of the host to do its work?

I suppose the obvious solution is some sort of flag on the function that 
well-maintained ports will set.  But the whole marshalling thing is 
still pretty tricky.


r~


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 14:55     ` Richard Henderson
@ 2010-04-12 15:09       ` Avi Kivity
  2010-04-12 15:39         ` Alexander Graf
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-04-12 15:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel Developers, Aurelien Jarno

On 04/12/2010 05:55 PM, Richard Henderson wrote:
>
>> You could reduce the overhead somewhat by using kvm for memory
>> translation on hosts that support it. Of course tcg translation and
>> syscall costs will grow by the exit overhead.
>
> I've thought about this a bit, and what seemed to be the stickler is
> what is the environment that runs in the guest?  TCG generated code
> is of course fine, but what about the helper functions?  How can we
> tell whether a given helper function can run in the restricted 
> environment of the guest or whether it needs to transition back to the 
> environment of the host to do its work?

I'd guess all helpers can run in guest context except those that cause a 
transition to target kernel mode.

> I suppose the obvious solution is some sort of flag on the function 
> that well-maintained ports will set.  But the whole marshalling thing 
> is still pretty tricky.

Pass everything through memory; will there be many transitions apart 
from trapping instructions and missing translations?

For extra points run the translator in guest context.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 15:09       ` Avi Kivity
@ 2010-04-12 15:39         ` Alexander Graf
  2010-04-12 15:49           ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Graf @ 2010-04-12 15:39 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel Developers, Aurelien Jarno, Richard Henderson


On 12.04.2010, at 17:09, Avi Kivity wrote:

> On 04/12/2010 05:55 PM, Richard Henderson wrote:
>> 
>>> You could reduce the overhead somewhat by using kvm for memory
>>> translation on hosts that support it. Of course tcg translation and
>>> syscall costs will grow by the exit overhead.
>> 
>> I've thought about this a bit, and what seemed to be the stickler is
>> what is the environment that runs in the guest?  TCG generated code
>> is of course fine, but what about the helper functions?  How can we
>> tell whether a given helper function can run in the restricted environment of the guest or whether it needs to transition back to the environment of the host to do its work?
> 
> I'd guess all helpers can run in guest context except those that cause a transition to target kernel mode.
> 
>> I suppose the obvious solution is some sort of flag on the function that well-maintained ports will set.  But the whole marshalling thing is still pretty tricky.
> 
> Pass everything through memory; will there be many transitions apart from trapping instructions and missing translations?

I don't see how that would help with the 64-on-32 issue. You still don't get a 64 bit address space from running inside KVM.

Alex


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 15:39         ` Alexander Graf
@ 2010-04-12 15:49           ` Avi Kivity
  2010-04-12 15:56             ` Alexander Graf
  0 siblings, 1 reply; 11+ messages in thread
From: Avi Kivity @ 2010-04-12 15:49 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel Developers, Aurelien Jarno, Richard Henderson

On 04/12/2010 06:39 PM, Alexander Graf wrote:
>
>> Pass everything through memory; will there be many transitions apart from trapping instructions and missing translations?
>>      
> I don't see how that would help with the 64-on-32 issue. You still don't get a 64 bit address space from running inside KVM.
>    

True.  Like the other options, it's just another tool in the toolbox and 
doesn't solve all problems.

You could cheat and have a 64-bit kernel under a 32-bit qemu.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 15:49           ` Avi Kivity
@ 2010-04-12 15:56             ` Alexander Graf
  2010-04-12 16:08               ` Avi Kivity
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Graf @ 2010-04-12 15:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel Developers, Aurelien Jarno, Richard Henderson


On 12.04.2010, at 17:49, Avi Kivity wrote:

> On 04/12/2010 06:39 PM, Alexander Graf wrote:
>> 
>>> Pass everything through memory; will there be many transitions apart from trapping instructions and missing translations?
>>>     
>> I don't see how that would help with the 64-on-32 issue. You still don't get a 64 bit address space from running inside KVM.
>>   
> 
> True.  Like the other options, it's just another tool in the toolbox and doesn't solve all problems.
> 
> You could cheat and have a 64-bit kernel under a 32-bit qemu.

For full system emulation, on the other hand, I can imagine quite some nice tricks one could pull.

On PPC hosts you get a huge number of VSIDs that are basically like tags on the TLB. So if you'd give every x86 page table one VSID you'd potentially have really great and fast shadow PTEs.

On x86 hosts you can just keep several page tables around. You can then map for example every combination of guest VSIDs to one page table each.


I'm sure there are similar fun things you can do with the other supported archs. The hard part is to come up with something generic enough so it works on all hosts and guests with little effort. Oh well :)

Alex


* Re: [Qemu-devel] [RFC] Host vs Guest memory allocation
  2010-04-12 15:56             ` Alexander Graf
@ 2010-04-12 16:08               ` Avi Kivity
  0 siblings, 0 replies; 11+ messages in thread
From: Avi Kivity @ 2010-04-12 16:08 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-devel Developers, Aurelien Jarno, Richard Henderson

On 04/12/2010 06:56 PM, Alexander Graf wrote:
>
> For fully system emulation on the other hand I can imagine quite some nice tricks one could pull.
>
> On PPC hosts you get a huge number of VSIDs that are basically like tags on the TLB. So if you'd give every x86 page table one VSID you'd potentially have really great and fast shadow PTEs.
>    

You mean, if you have lots of ppc machines but no x86?

smp would be a problem because of the relaxed memory model (of course 
tcg needs a lot of work before it can do smp anyway).

> On x86 hosts you can just keep several page tables around. You can then map for example every combination of guest VSIDs to one page table each.
>    

Yeah.

> I'm sure there are similar fun things you can do with the other supported archs. The hard part is to come up with something generic enough so it works on all hosts and guests with little effort. Oh well :)
>    

Well, x86 page tables are pretty flexible, the memory model is strict, 
the atomics are rich, and you have both unaligned and trapping 
accesses.  So if you restrict yourself to x86 hosts I think you can do 
anything with page size >= 4k.

Run both user and kernel mode in guest user mode, do tcg and mmu in 
kernel mode.  Should be fun.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
