* [Qemu-devel] qemu-user performance
From: Etienne Dublé @ 2018-11-16 13:55 UTC
To: qemu-devel
Hello,
Please forgive my limited knowledge of qemu internals.
Some time ago I had an idea that might improve usage of qemu user mode
(I believe) and I would like to get your thoughts about it.
Context: qemu-user is used by more and more people to run containers
(e.g. Docker) based on a different CPU architecture (e.g. the OS of a
Raspberry Pi). With the Linux kernel module "binfmt_misc", the emulation is
handled transparently by qemu. Usually, a shell session will be run
first, and then many subprocesses. And of course, each of these
processes is actually a qemu process running in "user-mode". For
example, if one types "make" to compile some code, there will be a "qemu
make" process, then probably 10 or more "qemu gcc" processes, etc. Since
all of these are different qemu processes, they do not share any
knowledge, so each time a new one is spawned, it has to translate the
binary code of libc, ld-linux, any other library it uses, its own binary
code, etc. When it ends, all this work is lost, and new processes will
have to reprocess a big part of the very same code over and over again.
So the idea is: what if we could share the cache of code already
translated between all those processes?
There would be several ways to achieve this:
* use a shared memory area for the cache, and locking mechanisms.
* have a (maybe optional) daemon that would manage the cache of all
processes.
* Python-like model: the first time a binary or library is translated,
save this translated code in a cache file next to the original file,
with a different extension.
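
To make the third option concrete, here is a minimal sketch of the
caching logic only, written in Python for brevity. Everything in it is
an assumption made for illustration: the ".tcache" extension, the
load_or_translate() helper and the toy_translate() placeholder are
hypothetical and are not part of qemu. It only shows the "translate
once, reuse afterwards, invalidate when the file changes" behaviour.

```python
import hashlib
import os
import tempfile

CACHE_SUFFIX = ".tcache"  # hypothetical extension, not something qemu uses


def toy_translate(guest_code: bytes) -> bytes:
    """Placeholder for a real translator: it just reverses the bytes."""
    return guest_code[::-1]


def load_or_translate(binary_path, translate=toy_translate):
    """Return (translated_code, cache_hit) for the file at binary_path."""
    with open(binary_path, "rb") as f:
        guest_code = f.read()
    # Key the cache on the file contents, so a rebuilt binary invalidates it.
    digest = hashlib.sha256(guest_code).digest()
    cache_path = binary_path + CACHE_SUFFIX

    try:
        with open(cache_path, "rb") as f:
            blob = f.read()
        if blob[:len(digest)] == digest:
            return blob[len(digest):], True
    except FileNotFoundError:
        pass

    translated = translate(guest_code)
    # Write to a temporary file and rename it, so that concurrent
    # processes never observe a half-written cache file.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(digest + translated)
    os.replace(tmp_path, cache_path)
    return translated, False
```

The first call for a given file translates it and writes the cache file
next to it; later calls, from any process, return the cached bytes as
long as the original file is unchanged. A real implementation would
have more to deal with, for example the fact that qemu translates code
lazily, block by block, rather than a whole file at once.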
Please let me know what you think about it, if something similar has
already been studied, or if I am missing something obvious.
Thanks
Etienne
--
Etienne Dublé
CNRS / LIG - Bâtiment IMAG
700 avenue Centrale - 38401 St Martin d'Hères
Bureau 426 - Tel 0457421431
* Re: [Qemu-devel] qemu-user performance
From: Emilio G. Cota @ 2018-11-16 15:41 UTC
To: Etienne Dublé; +Cc: qemu-devel
On Fri, Nov 16, 2018 at 14:55:01 +0100, Etienne Dublé wrote:
(snip)
> So the idea is: what if we could share the cache of code already translated
> between all those processes?
> There would be several ways to achieve this:
> * use a shared memory area for the cache, and locking mechanisms.
> * have a (maybe optional) daemon that would manage the cache of all
> processes.
> * Python-like model: the first time a binary or library is translated, save
> this translated code in a cache file next to the original file, with
> a different extension.
> Please let me know what you think about it, if something similar has already
> been studied, or if I am missing something obvious.
There's a recent paper that implements something similar to what you
propose:
"A General Persistent Code Caching Framework for Dynamic Binary
Translation (DBT)", ATC'16
https://www.usenix.org/system/files/conference/atc16/atc16_paper-wang.pdf
Note that in that paper they compare against HQEMU, and not against
upstream QEMU. I presume they chose HQEMU because it spends
more effort than QEMU in trying to generate better code for hot
code paths (they use LLVM in a separate thread for those), which
means that code generation can be a bottleneck for some workloads
(e.g. SPEC's gcc or perlbench).
QEMU, on the other hand, generates much simpler code, and as a result
it is rare to find workloads where code generation is a bottleneck.
(You can measure this with perf top on your system; make sure you
configure QEMU with --disable-strip to keep the symbols after
"make install".)
So until QEMU gets some sort of "hot code optimization" that makes
translation more expensive, there's little point in implementing
persistent code caching for it.
As an aside, what QEMU version are you running? Performance has
improved quite a bit (particularly for integer workloads) in the
last couple of years, e.g. see the perf improvements from v2.6 to
v2.11 here:
https://imgur.com/a/5P5zj
Cheers,
Emilio
* Re: [Qemu-devel] qemu-user performance
From: Etienne Dublé @ 2018-11-16 16:46 UTC
To: Emilio G. Cota; +Cc: qemu-devel
On 16/11/2018 16:41, Emilio G. Cota wrote:
> There's a recent paper that implements something similar to what you
> propose:
>
> "A General Persistent Code Caching Framework for Dynamic Binary
> Translation (DBT)", ATC'16
> https://www.usenix.org/system/files/conference/atc16/atc16_paper-wang.pdf
Interesting. Thanks for the link.
> Note that in that paper they compare against HQEMU, and not against
> upstream QEMU. I presume they chose HQEMU because it spends
> more effort than QEMU in trying to generate better code for hot
> code paths (they use LLVM in a separate thread for those), which
> means that code generation can be a bottleneck for some workloads
> (e.g. SPEC's gcc or perlbench).
>
> QEMU, on the other hand, generates much simpler code, and as a result
> it is rare to find workloads where code generation is a bottleneck.
> (You can measure this with perf top on your system; make sure you
> configure QEMU with --disable-strip to keep the symbols after
> "make install".)
> So until QEMU gets some sort of "hot code optimization" that makes
> translation more expensive, there's little point in implementing
> persistent code caching for it.
I see.
> As an aside, what QEMU version are you running? Performance has
> improved quite a bit (particularly for integer workloads) in the
> last couple of years, e.g. see the perf improvements from v2.6 to
> v2.11 here:
> https://imgur.com/a/5P5zj
Various versions.
I am the tech lead of the WalT project (https://walt-project.liglab.fr/),
which lets users build network experimentation testbeds. Nodes are either
Raspberry Pi boards, PCs booted with our custom USB key, or virtual
devices (based on kvm). We package the OS of each node as a Docker image.
This makes the images easy to share (handy for experiment reproducibility)
and lets users easily create or modify such an OS. When a user wants to
modify the Docker image of a Raspberry Pi board, this is where qemu
comes in. So the qemu version we have in an image actually depends on
when that image was first built.
I did not know that performance had improved so much over the last
few years.
Thanks.
Etienne
--
Etienne Dublé
CNRS / LIG - Bâtiment IMAG
700 avenue Centrale - 38401 St Martin d'Hères
Bureau 426 - Tel 0457421431