[Qemu-devel] [PATCH 00/10] TCG optimizations for 2.10

* [Qemu-devel] [PATCH 00/10] TCG optimizations for 2.10
@ 2017-04-12  1:17 Emilio G. Cota
  2017-04-12  1:17 ` [Qemu-devel] [PATCH 01/10] exec-all: add tb_from_jmp_cache Emilio G. Cota
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: Emilio G. Cota @ 2017-04-12  1:17 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Claudio Fontana,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

Hi all,

This series is aimed at 2.10 or beyond. Its goal is to improve
TCG performance by optimizing:

1- Cross-page direct jumps (softmmu only, obviously). Patches 1-4.
2- Indirect branches (softmmu and user-mode). Patches 5-9.
3- tb_jmp_cache hashing in user-mode. Patch 10.

I decided to work on this after reading this paper [1] (code at [2]),
which among other optimizations it proposes solutions for 1 and 2.
I followed the same overall scheme they follow, that is to use helpers
to check whether the target vaddr is valid, and if so, jump to its
corresponding translated code (host address) without having to go back
to the exec loop. My implementation differs from that in the paper
in that it uses tb_jmp_cache instead of adding more caches,
which is simpler and probably more resilient in environments
where TLB invalidations are frequent (in the paper they acknowledge
that they limited background processes to a minimum, which isn't
realistic).

These changes require modifications on the targets and, for optimization
number 2, a new TCG opcode to jump to a host address contained in a register.

For now I only implemented this for the i386 and arm targets, and
the i386 TCG backend. Other targets/backends can easily opt-in.

The 3rd optimization is implemented in the last patch: it improves
tb_jmp_cache hashing for user-mode by removing the requirement of
being able to clear parts of the cache given a page number, since this
requirement only applies to softmmu.

The series applies cleanly on top of 95b31d709ba34.

The commit logs include many measurements, performed using SPECint06 and
NBench from dbt-bench[3].

Feedback welcome! Thanks,

		Emilio

[1] "Optimizing Control Transfer and Memory Virtualization
in Full System Emulators", Ding-Yong Hong, Chun-Chen Hsu, Cheng-Yi Chou,
Wei-Chung Hsu, Pangfeng Liu, Jan-Jan Wu. ACM TACO, Jan. 2016.
  http://www.iis.sinica.edu.tw/page/library/TechReport/tr2015/tr15002.pdf

[2] https://github.com/tkhsu/quick-android-emulator/tree/quick-qemu

[3] https://github.com/cota/dbt-bench

^ permalink raw reply	[flat|nested] 21+ messages in thread