* [Qemu-devel] qemu-user performance
@ 2018-11-16 13:55 Etienne Dublé
From: Etienne Dublé @ 2018-11-16 13:55 UTC
  To: qemu-devel

Hello,
Please forgive my limited knowledge of qemu internals.
Some time ago I had an idea that I believe might improve qemu user mode, 
and I would like to get your thoughts about it.
Context: qemu-user is used by more and more people to run containers 
(e.g. Docker) built for a different CPU architecture (e.g. the OS of a 
Raspberry Pi). With the Linux kernel module "binfmt_misc", the emulation 
is handled transparently by qemu. Usually, a shell session is run first, 
and then many subprocesses. And of course, each of these processes is 
actually a qemu process running in "user-mode". For example, if one 
types "make" to compile some code, there will be a "qemu make" process, 
then probably 10 or more "qemu gcc" processes, etc. Since all of these 
are different qemu processes, they do not share any knowledge, so each 
time a new one is spawned, it has to translate the binary code of libc, 
ld-linux, any other library it uses, its own binary code, etc. When it 
exits, all this work is lost, and new processes will have to re-translate 
a large part of the very same code over and over again.
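(As an aside, you can check which interpreters binfmt_misc has 
registered by reading the entries under /proc/sys/fs/binfmt_misc. The 
small Python sketch below assumes binfmt_misc is mounted at its usual 
place; the qemu-arm / qemu-arm-static names in the output are just 
examples.)

  from pathlib import Path

  BINFMT_DIR = Path("/proc/sys/fs/binfmt_misc")

  for entry in sorted(BINFMT_DIR.iterdir()):
      if entry.name in ("register", "status"):
          continue  # control files, not per-format entries
      for line in entry.read_text().splitlines():
          if line.startswith("interpreter"):
              # e.g. "qemu-arm -> /usr/bin/qemu-arm-static"
              print(f"{entry.name} -> {line.split(None, 1)[1]}")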
So the idea is: what if we could share the cache of code already 
translated between all those processes?
There would be several ways to achieve this:
* use a shared memory area for the cache, and locking mechanisms.
* have a (maybe optional) daemon that would manage the cache of all 
processes.
* python-like model: the first time a binary or library is translated, 
save this translated code in a cache file next to the original file, 
with a different extension (a rough sketch of this option follows below).
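To make that third option a bit more concrete, here is a rough Python 
sketch of the caching policy I have in mind; the names (QEMU_TB_CACHE, 
translate, .tbcache) are purely illustrative and do not correspond to 
any existing qemu interface:

  import hashlib
  import os
  from pathlib import Path

  CACHE_DIR = Path(os.environ.get("QEMU_TB_CACHE", "/var/cache/qemu-user"))

  def cache_path(binary: Path) -> Path:
      # Key the cache on the file contents, so a rebuilt binary never
      # reuses stale translated code.
      digest = hashlib.sha256(binary.read_bytes()).hexdigest()
      return CACHE_DIR / f"{binary.name}.{digest[:16]}.tbcache"

  def load_or_translate(binary: Path, translate) -> bytes:
      path = cache_path(binary)
      if path.exists():
          return path.read_bytes()   # reuse work done by an earlier process
      code = translate(binary)       # the expensive step: run the translator
      CACHE_DIR.mkdir(parents=True, exist_ok=True)
      path.write_bytes(code)         # persist for the next qemu process
      return code

The point is simply that the expensive translate() step would run once 
per binary content rather than once per process, and invalidation falls 
out of the content hash.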
Please let me know what you think about it, whether something similar 
has already been studied, or if I am missing something obvious.
Thanks
Etienne

-- 
Etienne Dublé
CNRS / LIG - Bâtiment IMAG
700 avenue Centrale - 38401 St Martin d'Hères
Bureau 426 - Tel 0457421431


* Re: [Qemu-devel] qemu-user performance
From: Emilio G. Cota @ 2018-11-16 15:41 UTC
  To: Etienne Dublé; +Cc: qemu-devel

On Fri, Nov 16, 2018 at 14:55:01 +0100, Etienne Dublé wrote:
(snip)
> So the idea is: what if we could share the cache of code already translated
> between all those processes?
> There would be several ways to achieve this:
> * use a shared memory area for the cache, and locking mechanisms.
> * have a (maybe optional) daemon that would manage the cache of all
> processes.
> * python-like model: the first time a binary or library is translated, save
> this translated code in a cache file next to the original file, with
> different extension.
> Please let me know what you think about it, whether something similar has
> already been studied, or if I am missing something obvious.

There's a recent paper that implements something similar to what you
propose:

  "A General Persistent Code Caching Framework for Dynamic Binary
  Translation (DBT)", ATC'16
  https://www.usenix.org/system/files/conference/atc16/atc16_paper-wang.pdf

Note that in that paper they compare against HQEMU, and not against
upstream QEMU. I presume they chose HQEMU because it spends
more effort than QEMU in trying to generate better code for hot
code paths (they use LLVM in a separate thread for those), which
means that code generation can be a bottleneck for some workloads
(e.g. SPEC's gcc or perlbench).

QEMU, on the other hand, generates much simpler code, and as a result
it is rare to find workloads where code generation is a bottleneck.
(You can measure this with perf top in your system; make sure you
configured QEMU with --disable-strip to keep the symbols after
"make install".)
So until QEMU gets some sort of "hot code optimization" that makes
translation more expensive, there's little point in implementing
persistent code caching for it.
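To check this on your own workload, something along these lines will 
show whether the translator (e.g. tcg_* symbols) is hot. This is just a 
sketch: it assumes perf and pgrep are installed and uses "qemu-arm" as 
an example process name.

  import subprocess

  # Attach perf to the most recently started qemu-arm process.
  # Translator symbols are only visible if QEMU was configured
  # with --disable-strip.
  pid = subprocess.check_output(["pgrep", "-n", "qemu-arm"], text=True).strip()
  subprocess.run(["perf", "top", "-p", pid])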

As an aside, what QEMU version are you running? Performance has
improved quite a bit (particularly for integer workloads) in the
last couple of years, e.g. see the perf improvements from v2.6 to
v2.11 here:
  https://imgur.com/a/5P5zj

Cheers,

		Emilio


* Re: [Qemu-devel] qemu-user performance
From: Etienne Dublé @ 2018-11-16 16:46 UTC
  To: Emilio G. Cota; +Cc: qemu-devel

On 16/11/2018 16:41, Emilio G. Cota wrote:
> There's a recent paper that implements something similar to what you
> propose:
>
>    "A General Persistent Code Caching Framework for Dynamic Binary
>    Translation (DBT)", ATC'16
>    https://www.usenix.org/system/files/conference/atc16/atc16_paper-wang.pdf
Interesting. Thanks for the link.

> Note that in that paper they compare against HQEMU, and not against
> upstream QEMU. I presume they chose HQEMU because it spends
> more effort than QEMU in trying to generate better code for hot
> code paths (they use LLVM in a separate thread for those), which
> means that code generation can be a bottleneck for some workloads
> (e.g. SPEC's gcc or perlbench).
>
> QEMU, on the other hand, generates much simpler code, and as a result
> it is rare to find workloads where code generation is a bottleneck.
> (You can measure this with perf top in your system; make sure you
> configured QEMU with --disable-strip to keep the symbols after
> "make install".)
> So until QEMU gets some sort of "hot code optimization" that makes
> translation more expensive, there's little point in implementing
> persistent code caching for it.
I see.

> As an aside, what QEMU version are you running? Performance has
> improved quite a bit (particularly for integer workloads) in the
> last couple of years, e.g. see the perf improvements from v2.6 to
> v2.11 here:
>    https://imgur.com/a/5P5zj
Various versions.
I am the technical leader of the WalT project (https://walt-project.liglab.fr/), 
which is used to build network experimentation testbeds. Nodes are either 
Raspberry Pi boards, PCs booted with our custom USB key, or virtual 
devices (based on kvm). We package the OS of the nodes as Docker images. 
This makes it easy to share them (handy for experiment reproducibility) 
and lets users easily create or modify such an OS. When a user wants to 
modify the Docker image of a Raspberry Pi board, that is where qemu 
comes in. So the qemu version we have in an image actually depends on 
when it was first built.
I did not realize the performance improvements of the last few years 
had been so significant.
Thanks.
Etienne

-- 
Etienne Dublé
CNRS / LIG - Bâtiment IMAG
700 avenue Centrale - 38401 St Martin d'Hères
Bureau 426 - Tel 0457421431

