* [Qemu-devel] simulated memory instead of host memory
@ 2003-06-09 18:31 Johan Rydberg
2003-06-09 19:09 ` Fabrice Bellard
0 siblings, 1 reply; 5+ messages in thread
From: Johan Rydberg @ 2003-06-09 18:31 UTC
To: qemu-devel
First question of the day;
First of all I would like to say that I really like the concept of QEMU.
Let GCC do most of the work and just glue it all together. Brilliant.
One downside of it though is all the tampering with flags to GCC.
To the question. How hard would it be to make QEMU a full-system
simulator? Or, more concretely: how hard would it be to use simulated
memory (on a per-page basis) for the simulated app instead of the host
memory?
--
Johan Rydberg, Free Software Developer, Sweden
http://rtmk.sf.net | http://www.nongnu.org/guss/
Listening to Tricky - Where I'm from
* Re: [Qemu-devel] simulated memory instead of host memory
2003-06-09 18:31 [Qemu-devel] simulated memory instead of host memory Johan Rydberg
@ 2003-06-09 19:09 ` Fabrice Bellard
2003-06-09 19:37 ` Johan Rydberg
0 siblings, 1 reply; 5+ messages in thread
From: Fabrice Bellard @ 2003-06-09 19:09 UTC
To: qemu-devel
Johan Rydberg wrote:
> First question of the day;
>
> First of all I would like to say that I really like the concept of QEMU.
> Let GCC do most of the work and just glue it all together. Brilliant.
> One downside of it though is all the tampering with flags to GCC.
Yes, every new GCC version may bring problems... One solution may be
to distribute binary-only versions of some of QEMU's object files.
I hope that someday someone will write a proper code generator, but it is
really a lot of work!
> To the question. How hard would it be to make QEMU a full-system
> simulator? Or, more concretely: how hard would it be to use simulated
> memory (on a per-page basis) for the simulated app instead of the host
> memory?
It would be possible. I spent a lot of time thinking about it, but I did
not implement it for lack of time and motivation. I see three solutions:
1) The very slow (but simplest) solution is to just modify the memory
access inline functions in 'cpu-i386.h' to emulate the x86 MMU.
2) A faster solution is to use 4MB tables containing the addresses of
each CPU page. One 4MB table would be used for read, one table for
write. The tables can be seen as big TLBs. Unmapped pages would have a
NULL entry in the tables so that a fault is generated on access to fill
the table.
3) An even faster solution is to use Linux memory mappings to emulate
the MMU. The Linux MM state of the process would be considered as a TLB
of the virtual x86 MMU state. It works only if the host has a page size
<= 4KB and if the guest OS doesn't map anything at addresses >= 0xc0000000.
With Linux as the guest it would work, as you can easily change the base
address of the kernel. The restriction on mappings >= 0xc0000000
could be lifted with a small (but tricky) kernel patch that would
allow mmap() at addresses >= 0xc0000000.
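Solution (2) could look roughly like the following C sketch. The names (`read_table`, `fill_read_entry`, `guest_ram`) are made up for illustration and are not taken from QEMU. The slow path here simply maps guest pages linearly onto a small RAM buffer, where a real implementation would walk the guest page tables, and the sketch uses an explicit NULL check rather than catching the host fault. Note that 2^20 entries only add up to 4MB when host pointers are 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES (1u << (32 - PAGE_BITS))  /* 2^20 guest pages */
#define RAM_PAGES 16u                       /* toy guest RAM: 64KB */

/* One host pointer per guest page; NULL means "not filled in yet". */
static uint8_t *read_table[NUM_PAGES];
static uint8_t  guest_ram[RAM_PAGES * PAGE_SIZE];
static unsigned fill_count;                 /* how often the slow path ran */

/* Hypothetical slow path: stands in for the real MMU emulation. */
static uint8_t *fill_read_entry(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_BITS;
    if (vpn >= RAM_PAGES)
        return NULL;                        /* would raise a guest page fault */
    fill_count++;
    read_table[vpn] = guest_ram + (size_t)vpn * PAGE_SIZE;
    return read_table[vpn];
}

/* Fast path: one table load plus a NULL check per access. */
static int guest_read8(uint32_t vaddr, uint8_t *out)
{
    uint8_t *page = read_table[vaddr >> PAGE_BITS];
    if (page == NULL && (page = fill_read_entry(vaddr)) == NULL)
        return -1;
    *out = page[vaddr & (PAGE_SIZE - 1)];
    return 0;
}
```

A write table would have the same shape, filled only for pages the guest is allowed to write.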
I wanted to implement solution (3) to be able to simulate an unpatched
Linux kernel (and call the project 'qplex86' !).
To run any OS you would also need precise segment limits and rights
emulation, at least for non user code.
Fabrice.
* Re: [Qemu-devel] simulated memory instead of host memory
2003-06-09 19:09 ` Fabrice Bellard
@ 2003-06-09 19:37 ` Johan Rydberg
2003-06-09 20:18 ` Fabrice Bellard
0 siblings, 1 reply; 5+ messages in thread
From: Johan Rydberg @ 2003-06-09 19:37 UTC
To: qemu-devel
On Mon, 09 Jun 2003 21:09:37 +0200
Fabrice Bellard <fabrice.bellard@free.fr> wrote:
: It would be possible. I spent a lot of time thinking about it, but I did
: not implement it for lack of time and motivation. I see three solutions:
: [...]
: 2) A faster solution is to use 4MB tables containing the addresses of
: each CPU page. One 4MB table would be used for read, one table for
: write. The tables can be seen as big TLBs. Unmapped pages would have a
: NULL entry in the tables so that a fault is generated on access to fill
: the table.
In the current version of GUSS I use a similar technique. I call them
mtcaches, which stands for memory translation caches. Each can be seen
as a direct-mapped cache with the virtual page number as index. The
tag is constructed from the virtual address, with the page offset masked off.
The cache contains <tag, diff> tuples. The diff is the difference between
the host memory address and the virtual address (host minus virtual). On an
mtcache hit, all that has to be done to get the host memory address is to add
the virtual address to the diff value.
When there is an mtcache miss, the full MMU emulation code is called.
It is up to that code to add entries to the mtcache (there are separate
mtcaches for reads and writes, and for user and supervisor mode).
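In C, the lookup described above might be sketched as follows. This is not the GUSS source: `mtcache_entry`, `mmu_slow_path`, `INVALID_TAG` and the flat `guest_ram` buffer are hypothetical, and the miss handler just maps guest addresses linearly where the real code would run the full MMU emulation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_MASK    0xfffff000u
#define MTCACHE_SIZE 256u
#define INVALID_TAG  1u   /* low bits set, so it never equals a page-aligned tag */

struct mtcache_entry {
    uint32_t  tag;    /* virtual address with the page offset masked off */
    uintptr_t diff;   /* host address minus virtual address */
};

static struct mtcache_entry mtcache[MTCACHE_SIZE];
static uint8_t guest_ram[1u << 20];          /* toy guest RAM: 1MB */
static unsigned hits, misses;

/* Invalidate every entry; must run before the first lookup. */
static void mtcache_flush(void)
{
    for (unsigned i = 0; i < MTCACHE_SIZE; i++)
        mtcache[i].tag = INVALID_TAG;
}

/* Hypothetical miss handler: returns 0 on success, -1 on a guest fault. */
static int mmu_slow_path(uint32_t vaddr, struct mtcache_entry *e)
{
    uint32_t page = vaddr & PAGE_MASK;
    if (page >= sizeof guest_ram)
        return -1;
    e->tag  = page;
    e->diff = (uintptr_t)guest_ram;          /* host = guest_ram + vaddr */
    return 0;
}

/* Direct-mapped lookup: index by virtual page number, compare the tag. */
static uint8_t *translate(uint32_t vaddr)
{
    struct mtcache_entry *e = &mtcache[(vaddr >> 12) & (MTCACHE_SIZE - 1)];
    if (e->tag != (vaddr & PAGE_MASK)) {
        misses++;
        if (mmu_slow_path(vaddr, e) != 0)
            return NULL;
    } else {
        hits++;
    }
    return (uint8_t *)(e->diff + vaddr);     /* hit: a single add */
}
```

The `hits` and `misses` counters are only there to make it easy to measure a hit rate.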
Some early testing (booting the Linux kernel on a simulated MIPS32 4Kc)
shows that you can get a 95% hit rate or more.
On SPARC and other RISC architectures which have bit-extraction insns
and register+register addressing, the test against the mtcache can be done
in 6-8 insns. The test on IA-32 is a bit more complex (12-14 insns),
mainly due to the limited number of general-purpose registers.
This is what my code generator emits for a memory store. The value to
be stored is in %ebx, the virtual address in %eax.
%ecx must be pushed on the stack to free a register.
40017160: 0000005b: push %ecx
40017161: 0000005c: mov 0x805cce4,%ebp pointer to mtcache
40017167: 00000062: mov %eax,%ecx
40017169: 00000064: shr $0xc,%ecx
4001716c: 00000067: and $0xff,%ecx 256 entries
40017172: 0000006d: lea 0x0(%ebp,%ecx,8),%esi mtcache entry at %esi
40017176: 00000071: mov %eax,%ecx
40017178: 00000073: and $0xfffff000,%ecx make tag
4001717e: 00000079: cmp %ecx,0x0(%esi) and compare
40017181: 0000007c: jne 0x00000439 miss -> slow way
40017187: 00000082: mov 0x4(%esi),%esi
4001718a: 00000085: add %eax,%esi
4001718c: 00000087: mov %ebx,0x0(%esi) do the store
4001718f: 0000008a: pop %ecx
Can you think of a faster way to do it? Note that I generate
the code by hand (not using GCC).
: 3) An even faster solution is to use Linux memory mappings to emulate
: the MMU. The Linux MM state of the process would be considered as a TLB
: of the virtual x86 MMU state. It works only if the host has a page size
: <= 4KB and if the guest OS doesn't map anything at addresses >= 0xc0000000.
: With Linux as the guest it would work, as you can easily change the base
: address of the kernel. The restriction on mappings >= 0xc0000000
: could be lifted with a small (but tricky) kernel patch that would
: allow mmap() at addresses >= 0xc0000000.
Since it isn't very portable I don't think it is an option.
: I wanted to implement solution (3) to be able to simulate an unpatched
: Linux kernel (and call the project 'qplex86' !).
:
: To run any OS you would also need precise segment limits and rights
: emulation, at least for non user code.
Of course. Everything has to be simulated. That is the challenge :)
--
Johan Rydberg, Free Software Developer, Sweden
http://rtmk.sf.net | http://www.nongnu.org/guss/
Listening to Her Majesty - F.U.N.E.R.A.L.
* Re: [Qemu-devel] simulated memory instead of host memory
2003-06-09 19:37 ` Johan Rydberg
@ 2003-06-09 20:18 ` Fabrice Bellard
2003-06-09 20:43 ` Johan Rydberg
0 siblings, 1 reply; 5+ messages in thread
From: Fabrice Bellard @ 2003-06-09 20:18 UTC
To: qemu-devel
Johan Rydberg wrote:
> This is what my code generator emits for a memory store. The value to
> be stored is in %ebx, the virtual address in %eax.
> %ecx must be pushed on the stack to free a register.
>
> 40017160: 0000005b: push %ecx
> 40017161: 0000005c: mov 0x805cce4,%ebp pointer to mtcache
> 40017167: 00000062: mov %eax,%ecx
> 40017169: 00000064: shr $0xc,%ecx
> 4001716c: 00000067: and $0xff,%ecx 256 entries
> 40017172: 0000006d: lea 0x0(%ebp,%ecx,8),%esi mtcache entry at %esi
> 40017176: 00000071: mov %eax,%ecx
> 40017178: 00000073: and $0xfffff000,%ecx make tag
> 4001717e: 00000079: cmp %ecx,0x0(%esi) and compare
> 40017181: 0000007c: jne 0x00000439 miss -> slow way
> 40017187: 00000082: mov 0x4(%esi),%esi
> 4001718a: 00000085: add %eax,%esi
> 4001718c: 00000087: mov %ebx,0x0(%esi) do the store
> 4001718f: 0000008a: pop %ecx
>
> Can you think of a faster way to do it? Note that I generate
> the code by hand (not using GCC).
Using a cache as you do is a good idea. You can save some insns, and
even more if you use different bits of the address (do a mask with 0x7f8),
but you would get fewer cache hits.
40017160: 0000005b: push %ecx
40017167: 00000062: mov %eax,%esi
40017169: 00000064: shr $0xc,%esi
movl %esi, %ecx
4001716c: 00000067: and $0xff,%esi 256 entries
4001717e: 00000079: cmp %ecx,0x805cee4(,%esi,8) compare
40017181: 0000007c: jne 0x00000439 miss -> slow way
40017187: 00000082: add 0x805cee8(,%esi,8),%eax
4001718c: 00000087: mov %ebx,0x0(%eax) do the store
4001718f: 0000008a: pop %ecx
I guess GCC should give nearly optimal code.
> : 3) An even faster solution is to use Linux memory mappings to emulate
> : the MMU. The Linux MM state of the process would be considered as a TLB
> : of the virtual x86 MMU state. It works only if the host has a page size
> : <= 4KB and if the guest OS doesn't map anything at addresses >= 0xc0000000.
> : With Linux as the guest it would work, as you can easily change the base
> : address of the kernel. The restriction on mappings >= 0xc0000000
> : could be lifted with a small (but tricky) kernel patch that would
> : allow mmap() at addresses >= 0xc0000000.
>
> Since it isn't very portable I don't think it is an option.
Well, if you generate code it is already not portable :-)
Fabrice.
* Re: [Qemu-devel] simulated memory instead of host memory
2003-06-09 20:18 ` Fabrice Bellard
@ 2003-06-09 20:43 ` Johan Rydberg
0 siblings, 0 replies; 5+ messages in thread
From: Johan Rydberg @ 2003-06-09 20:43 UTC
To: qemu-devel
Fabrice Bellard <fabrice.bellard@free.fr> wrote:
: Using a cache as you do is a good idea. You can save some insns, and
: even more if you use different bits of the address (do a mask with 0x7f8),
: but you would get fewer cache hits.
Since doing it the "slow way" is really slow, you should try to maximize
the hit rate. The ideal would be something like a 95-99% hit rate for
normal pages, so that it only has to escape into the slow path on
accesses to memory-mapped I/O devices. Well, one can always dream, I
guess.
: 40017160: 0000005b: push %ecx
: 40017167: 00000062: mov %eax,%esi
: 40017169: 00000064: shr $0xc,%esi
: movl %esi, %ecx
: 4001716c: 00000067: and $0xff,%esi 256 entries
: 4001717e: 00000079: cmp %ecx,0x805cee4(,%esi,8) compare
: 40017181: 0000007c: jne 0x00000439 miss -> slow way
: 40017187: 00000082: add 0x805cee8(,%esi,8),%eax
: 4001718c: 00000087: mov %ebx,0x0(%eax) do the store
: 4001718f: 0000008a: pop %ecx
Does this really work? 0x805cee4 is the address of _a pointer_ that holds
the address of the mtcache. The reason for having a pointer to the real
mtcache is that it is much faster to just change the pointer when switching
between user and supervisor mode (and the other way around).
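The pointer indirection could be sketched in C like this. All names (`mtcaches`, `cur_mtcache`, `set_cpu_mode`, `mtcache_slot`) are hypothetical and only illustrate the idea of one cache per privilege level with a single pointer selecting the active one.

```c
#include <assert.h>
#include <stdint.h>

#define MTCACHE_SIZE 256u

struct mtcache_entry {
    uint32_t  tag;
    uintptr_t diff;
};

enum cpu_mode { MODE_USER = 0, MODE_SUPERVISOR = 1 };

/* One cache per privilege level; entries survive mode switches. */
static struct mtcache_entry mtcaches[2][MTCACHE_SIZE];

/* Generated code loads this pointer before indexing the cache. */
static struct mtcache_entry *cur_mtcache = mtcaches[MODE_USER];

/* Mode switch is a single pointer store: no flush, no copying. */
static void set_cpu_mode(enum cpu_mode mode)
{
    cur_mtcache = mtcaches[mode];
}

/* Entry that a given virtual address maps to in the active cache. */
static struct mtcache_entry *mtcache_slot(uint32_t vaddr)
{
    return &cur_mtcache[(vaddr >> 12) & (MTCACHE_SIZE - 1)];
}
```

The price is one extra load per memory access compared with addressing a single cache at a fixed absolute address.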
Maybe it would be better to have a centralized mtcache, and copy the contents
of the per-cpu and per-state mtcaches into that one when the state changes.
The reason for masking the virtual address as I did, and using it as the tag
in the cache, is that you can also check for unaligned memory accesses.
This is not an issue when simulating IA-32, but you must detect it when
simulating machines that cannot cope with unaligned accesses.
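One way to fold the alignment check into the tag compare, sketched in C: keep the low address bits for the access size in the mask, so a misaligned address can never equal a page-aligned tag and falls through to the slow path. The function name and the exact mask are assumptions for illustration, not taken from GUSS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_MASK 0xfffff000u

/*
 * Returns true only if vaddr lies in the page named by tag AND is
 * aligned to the access size. size must be a power of two (1, 2, 4, 8).
 */
static bool mtcache_fast_hit(uint32_t tag, uint32_t vaddr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    /* Keeping the low (size - 1) bits makes any misaligned address
     * differ from the tag, whose low 12 bits are always zero. */
    uint32_t mask = PAGE_MASK | (size - 1);
    return (vaddr & mask) == tag;
}
```

For example, a 4-byte access to 0x4009 misses against tag 0x4000 even though the page matches, so the slow path gets the chance to raise the alignment exception.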
: I guess GCC should give nearly optimal code.
Most probably. I will put something together and see what it generates.
: [...]
: Well, if you generate code it is already not portable :-)
I meant between systems such as GNU/Linux and BSD.
--
Johan Rydberg, Free Software Developer, Sweden
http://rtmk.sf.net | http://www.nongnu.org/guss/
Listening to Her Majesty - Rules to follow