* [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
@ 2012-10-09 12:37 Yeongkyoon Lee
  2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Yeongkyoon Lee @ 2012-10-09 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yeongkyoon Lee

Hi, all.

Here is the 5th version of the series optimizing TCG qemu_ld/st code generation.

v5:
  - Remove RFC tag

v4:
  - Remove CONFIG_SOFTMMU pre-condition from configure
  - Instead, add some CONFIG_SOFTMMU condition to TCG sources
  - Remove some unnecessary comments

v3:
  - Support CONFIG_TCG_PASS_AREG0
    (expected to get more performance enhancement than others)
  - Remove the configure option "--enable-ldst-optimization""
  - Make the optimization as default on i386 and x86_64 hosts
  - Fix some mistyping and apply checkpatch.pl before committing
  - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
  - Test linux-user-test-0.3

v2:
  - Follow the submit rule of qemu

v1:
  - Initial commit request

I think the code generated from qemu_ld/st IRs is relatively heavy: up to 12
instructions for the TLB hit case on an i386 host.
This patch series improves the quality of the code generated for TCG qemu_ld/st
IRs by removing a jump and improving locality.
The main idea is simple and has already been described in the comments in
tcg-target.c: separate the slow path (the TLB miss case) and generate it at the
end of the TB.

For example, the code generated for qemu_ld changes as follows; a rough sketch
of the mechanism appears after the two layouts.
Before:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (5)
(3) TLB hit case: Load value from host memory
(4) Jump to next code (6)
(5) TLB miss case: call MMU helper
(6) ... (next code)

After:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (7)
(3) TLB hit case: Load value from host memory
(4) ... (next code)
...
(7) TLB miss case: call MMU helper
(8) Return to next code (4)
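
As an illustration only, the following toy program sketches the "record now,
emit at the end" pattern the backend follows; every name and number in it is
made up, while the real implementation is add_qemu_ldst_label() and
tcg_out_qemu_ldst_slow_path() in patch 3/3.

#include <stdint.h>
#include <stdio.h>

#define MAX_LDST_LABELS 640               /* mirrors TCG_MAX_QEMU_LDST */

typedef struct {
    int opc;                              /* which qemu_ld/st variant */
    uintptr_t patch_site;                 /* "jne slow_path" left by the TLB check */
    uintptr_t raddr;                      /* next-code address to jump back to */
} LdstLabel;

static LdstLabel labels[MAX_LDST_LABELS];
static int nb_labels;

/* per qemu_ld/st: emit the TLB check and hit-case load only (steps 1-3),
   record what the deferred slow path will need, then fall through (step 4) */
static void emit_qemu_ld(int opc, uintptr_t code_ptr)
{
    if (nb_labels < MAX_LDST_LABELS) {
        labels[nb_labels++] = (LdstLabel){ opc, code_ptr + 4, code_ptr + 16 };
    }
}

/* once per TB, after every other IR: emit all recorded slow paths (steps 7-8) */
static void emit_slow_paths(void)
{
    for (int i = 0; i < nb_labels; i++) {
        printf("slow path %d: patch branch at %#lx, call MMU helper, "
               "jmp back to %#lx\n", i,
               (unsigned long)labels[i].patch_site,
               (unsigned long)labels[i].raddr);
    }
}

int main(void)
{
    emit_qemu_ld(2, 0x1000);              /* a 32-bit qemu_ld inside the TB */
    emit_qemu_ld(3, 0x1040);              /* a 64-bit qemu_ld later in the TB */
    emit_slow_paths();                    /* both miss cases land after the TB */
    return 0;
}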

Below are some performance results measured on qemu 1.0.
Although there was measurement error, the results were not negligible.

* EEMBC CoreMark (before -> after)
  - Guest: i386, Linux (Tizen platform)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results: 1135.6 -> 1179.9 (+3.9%)

* nbench (before -> after)
  - Guest: i386, Linux (linux-0.2.img included in QEMU source)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results
    . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
    . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
    . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)

Summarized features:
 - The changes are wrapped in the macro "CONFIG_QEMU_LDST_OPTIMIZATION" and
   are enabled by default on i386/x86_64 hosts
 - Forcibly removing the macro will cause a compilation error on i386/x86_64 hosts
 - There is no implementation for hosts other than i386/x86_64 yet

In addition, I have tried to remove the generated code that calls the MMU
helpers for the TLB miss case from the end of the TB, but have not found a good
solution yet.
In my opinion, TLB hit performance could degrade if that call code were removed,
because the runtime parameters, such as the data, mmu index and return address,
would have to be set up in registers or on the stack even though they are not
used in the TLB hit case.
This remains an open issue.

Yeongkyoon Lee (3):
  configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st
    optimization
  tcg: Add declarations and templates of extended MMU helpers
  tcg: Optimize qemu_ld/st by generating slow paths at the end of a
    block

 configure             |    6 +
 softmmu_defs.h        |   39 +++++
 softmmu_header.h      |   15 ++
 softmmu_template.h    |   41 ++++-
 tcg/i386/tcg-target.c |  420 ++++++++++++++++++++++++++++++++-----------------
 tcg/tcg.c             |   13 ++
 tcg/tcg.h             |   35 ++++
 7 files changed, 416 insertions(+), 153 deletions(-)

--
1.7.5.4


* [Qemu-devel] [PATCH v5 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  2012-10-09 12:37 [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
@ 2012-10-09 12:37 ` Yeongkyoon Lee
  2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 2/3] tcg: Add declarations and templates of extended MMU helpers Yeongkyoon Lee
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Yeongkyoon Lee @ 2012-10-09 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yeongkyoon Lee

Enable CONFIG_QEMU_LDST_OPTIMIZATION for the TCG qemu_ld/st optimization only
when the host is i386 or x86_64.

Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
---
 configure |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index e58846d..b02e079 100755
--- a/configure
+++ b/configure
@@ -3856,6 +3856,12 @@ upper() {
     echo "$@"| LC_ALL=C tr '[a-z]' '[A-Z]'
 }

+case "$cpu" in
+  i386|x86_64)
+    echo "CONFIG_QEMU_LDST_OPTIMIZATION=y" >> $config_target_mak
+  ;;
+esac
+
 echo "TARGET_SHORT_ALIGNMENT=$target_short_alignment" >> $config_target_mak
 echo "TARGET_INT_ALIGNMENT=$target_int_alignment" >> $config_target_mak
 echo "TARGET_LONG_ALIGNMENT=$target_long_alignment" >> $config_target_mak
--
1.7.5.4


* [Qemu-devel] [PATCH v5 2/3] tcg: Add declarations and templates of extended MMU helpers
  2012-10-09 12:37 [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
  2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
@ 2012-10-09 12:37 ` Yeongkyoon Lee
  2012-10-09 18:36   ` Richard Henderson
  2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block Yeongkyoon Lee
  2012-10-09 14:26 ` [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Aurelien Jarno
  3 siblings, 1 reply; 16+ messages in thread
From: Yeongkyoon Lee @ 2012-10-09 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yeongkyoon Lee

Add declarations and templates of extended MMU helpers.
An extended helper takes an additional argument: the address of the host code
accessing guest memory. This differs from the address of the helper call site,
because the call sites are located at the end of the generated code block.

Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
---
 softmmu_defs.h     |   39 +++++++++++++++++++++++++++++++++++++++
 softmmu_header.h   |   15 +++++++++++++++
 softmmu_template.h |   41 +++++++++++++++++++++++++++++++++--------
 3 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/softmmu_defs.h b/softmmu_defs.h
index 1f25e33..a93adf0 100644
--- a/softmmu_defs.h
+++ b/softmmu_defs.h
@@ -9,6 +9,7 @@
 #ifndef SOFTMMU_DEFS_H
 #define SOFTMMU_DEFS_H

+#ifndef CONFIG_QEMU_LDST_OPTIMIZATION
 uint8_t helper_ldb_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
 void helper_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
                     int mmu_idx);
@@ -34,4 +35,42 @@ void helper_stl_cmmu(CPUArchState *env, target_ulong addr, uint32_t val,
 uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
 void helper_stq_cmmu(CPUArchState *env, target_ulong addr, uint64_t val,
                      int mmu_idx);
+#else
+/* Extended versions of MMU helpers for qemu_ld/st optimization.
+   The additional argument is a host code address accessing guest memory */
+uint8_t ext_helper_ldb_mmu(CPUArchState *env, target_ulong addr, int mmu_idx,
+                           uintptr_t ra);
+void ext_helper_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
+                        int mmu_idx, uintptr_t ra);
+uint16_t ext_helper_ldw_mmu(CPUArchState *env, target_ulong addr, int mmu_idx,
+                            uintptr_t ra);
+void ext_helper_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                        int mmu_idx, uintptr_t ra);
+uint32_t ext_helper_ldl_mmu(CPUArchState *env, target_ulong addr, int mmu_idx,
+                            uintptr_t ra);
+void ext_helper_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                        int mmu_idx, uintptr_t ra);
+uint64_t ext_helper_ldq_mmu(CPUArchState *env, target_ulong addr, int mmu_idx,
+                            uintptr_t ra);
+void ext_helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                        int mmu_idx, uintptr_t ra);
+
+uint8_t ext_helper_ldb_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx,
+                            uintptr_t ra);
+void ext_helper_stb_cmmu(CPUArchState *env, target_ulong addr, uint8_t val,
+                         int mmu_idx, uintptr_t ra);
+uint16_t ext_helper_ldw_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx,
+                             uintptr_t ra);
+void ext_helper_stw_cmmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                         int mmu_idx, uintptr_t ra);
+uint32_t ext_helper_ldl_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx,
+                             uintptr_t ra);
+void ext_helper_stl_cmmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                         int mmu_idx, uintptr_t ra);
+uint64_t ext_helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx,
+                             uintptr_t ra);
+void ext_helper_stq_cmmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                         int mmu_idx, uintptr_t ra);
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 #endif
diff --git a/softmmu_header.h b/softmmu_header.h
index d8d9c81..d18c8f8 100644
--- a/softmmu_header.h
+++ b/softmmu_header.h
@@ -93,7 +93,12 @@ glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
     mmu_idx = CPU_MMU_INDEX;
     if (unlikely(env->tlb_table[mmu_idx][page_index].ADDR_READ !=
                  (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
+#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
+        res = glue(glue(ext_helper_ld, SUFFIX), MMUSUFFIX)(env, addr, mmu_idx,
+                                                           (uintptr_t)NULL);
+#else
         res = glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(env, addr, mmu_idx);
+#endif
     } else {
         uintptr_t hostaddr = addr + env->tlb_table[mmu_idx][page_index].addend;
         res = glue(glue(ld, USUFFIX), _raw)(hostaddr);
@@ -114,8 +119,13 @@ glue(glue(cpu_lds, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
     mmu_idx = CPU_MMU_INDEX;
     if (unlikely(env->tlb_table[mmu_idx][page_index].ADDR_READ !=
                  (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
+#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
+        res = (DATA_STYPE)glue(glue(ext_helper_ld, SUFFIX),
+                               MMUSUFFIX)(env, addr, mmu_idx, (uintptr_t)NULL);
+#else
         res = (DATA_STYPE)glue(glue(helper_ld, SUFFIX),
                                MMUSUFFIX)(env, addr, mmu_idx);
+#endif
     } else {
         uintptr_t hostaddr = addr + env->tlb_table[mmu_idx][page_index].addend;
         res = glue(glue(lds, SUFFIX), _raw)(hostaddr);
@@ -141,7 +151,12 @@ glue(glue(cpu_st, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr,
     mmu_idx = CPU_MMU_INDEX;
     if (unlikely(env->tlb_table[mmu_idx][page_index].addr_write !=
                  (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
+#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
+        glue(glue(ext_helper_st, SUFFIX), MMUSUFFIX)(env, addr, v, mmu_idx,
+                                                     (uintptr_t)NULL);
+#else
         glue(glue(helper_st, SUFFIX), MMUSUFFIX)(env, addr, v, mmu_idx);
+#endif
     } else {
         uintptr_t hostaddr = addr + env->tlb_table[mmu_idx][page_index].addend;
         glue(glue(st, SUFFIX), _raw)(hostaddr, v);
diff --git a/softmmu_template.h b/softmmu_template.h
index e2490f0..e40c060 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -54,6 +54,14 @@
 #define ADDR_READ addr_read
 #endif

+#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
+/* An extended MMU helper takes one more argument which is
+   a host address of generated code accessing guest memory */
+#define GET_RET_ADDR() ra
+#else
+#define GET_RET_ADDR() GETPC()
+#endif  /* CONFIG_QEMU_LDST_OPTIMIZATION */
+
 static DATA_TYPE glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
                                                         target_ulong addr,
                                                         int mmu_idx,
@@ -91,9 +99,17 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
 }

 /* handle all cases except unaligned access which span two pages */
+#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
+DATA_TYPE
+glue(glue(ext_helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
+                                             target_ulong addr,
+                                             int mmu_idx,
+                                             uintptr_t ra)
+#else
 DATA_TYPE
 glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
                                          int mmu_idx)
+#endif
 {
     DATA_TYPE res;
     int index;
@@ -111,13 +127,13 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
-            retaddr = GETPC();
+            retaddr = GET_RET_ADDR();
             ioaddr = env->iotlb[mmu_idx][index];
             res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
             /* slow unaligned access (it spans two pages or IO) */
         do_unaligned_access:
-            retaddr = GETPC();
+            retaddr = GET_RET_ADDR();
 #ifdef ALIGNED_ONLY
             do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
 #endif
@@ -128,7 +144,7 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
             uintptr_t addend;
 #ifdef ALIGNED_ONLY
             if ((addr & (DATA_SIZE - 1)) != 0) {
-                retaddr = GETPC();
+                retaddr = GET_RET_ADDR();
                 do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
             }
 #endif
@@ -138,7 +154,7 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
         }
     } else {
         /* the page is not in the TLB : fill it */
-        retaddr = GETPC();
+        retaddr = GET_RET_ADDR();
 #ifdef ALIGNED_ONLY
         if ((addr & (DATA_SIZE - 1)) != 0)
             do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
@@ -240,9 +256,17 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
 #endif /* SHIFT > 2 */
 }

+#ifdef CONFIG_QEMU_LDST_OPTIMIZATION
+void glue(glue(ext_helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
+                                                  target_ulong addr,
+                                                  DATA_TYPE val,
+                                                  int mmu_idx,
+                                                  uintptr_t ra)
+#else
 void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
                                               target_ulong addr, DATA_TYPE val,
                                               int mmu_idx)
+#endif
 {
     target_phys_addr_t ioaddr;
     target_ulong tlb_addr;
@@ -257,12 +281,12 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
-            retaddr = GETPC();
+            retaddr = GET_RET_ADDR();
             ioaddr = env->iotlb[mmu_idx][index];
             glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
         do_unaligned_access:
-            retaddr = GETPC();
+            retaddr = GET_RET_ADDR();
 #ifdef ALIGNED_ONLY
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
 #endif
@@ -273,7 +297,7 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
             uintptr_t addend;
 #ifdef ALIGNED_ONLY
             if ((addr & (DATA_SIZE - 1)) != 0) {
-                retaddr = GETPC();
+                retaddr = GET_RET_ADDR();
                 do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
             }
 #endif
@@ -283,7 +307,7 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
         }
     } else {
         /* the page is not in the TLB : fill it */
-        retaddr = GETPC();
+        retaddr = GET_RET_ADDR();
 #ifdef ALIGNED_ONLY
         if ((addr & (DATA_SIZE - 1)) != 0)
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
@@ -352,3 +376,4 @@ static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
 #undef USUFFIX
 #undef DATA_SIZE
 #undef ADDR_READ
+#undef GET_RET_ADDR
--
1.7.5.4


* [Qemu-devel] [PATCH v5 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
  2012-10-09 12:37 [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
  2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
  2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 2/3] tcg: Add declarations and templates of extended MMU helpers Yeongkyoon Lee
@ 2012-10-09 12:37 ` Yeongkyoon Lee
  2012-10-09 18:49   ` Richard Henderson
  2012-10-09 14:26 ` [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Aurelien Jarno
  3 siblings, 1 reply; 16+ messages in thread
From: Yeongkyoon Lee @ 2012-10-09 12:37 UTC (permalink / raw)
  To: qemu-devel; +Cc: Yeongkyoon Lee

Add optimized TCG qemu_ld/st generation which places the code for TLB miss
cases at the end of a block, after the other IRs have been generated.
Currently, this optimization supports only i386 and x86_64 hosts.

Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
---
 tcg/i386/tcg-target.c |  420 ++++++++++++++++++++++++++++++++-----------------
 tcg/tcg.c             |   13 ++
 tcg/tcg.h             |   35 ++++
 3 files changed, 323 insertions(+), 145 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 0e218c8..4c50542 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -983,24 +983,34 @@ static void tcg_out_jmp(TCGContext *s, tcg_target_long dest)

 #include "../../softmmu_defs.h"

-/* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
-   int mmu_idx) */
+/* extended helper signature: ext_helper_ld_mmu(CPUState *env,
+   target_ulong addr, int mmu_idx, uintptr_t raddr) */
 static const void *qemu_ld_helpers[4] = {
-    helper_ldb_mmu,
-    helper_ldw_mmu,
-    helper_ldl_mmu,
-    helper_ldq_mmu,
+    ext_helper_ldb_mmu,
+    ext_helper_ldw_mmu,
+    ext_helper_ldl_mmu,
+    ext_helper_ldq_mmu,
 };

-/* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
-   uintxx_t val, int mmu_idx) */
+/* extended helper signature: ext_helper_st_mmu(CPUState *env,
+   target_ulong addr, uintxx_t val, int mmu_idx, uintptr_t raddr) */
 static const void *qemu_st_helpers[4] = {
-    helper_stb_mmu,
-    helper_stw_mmu,
-    helper_stl_mmu,
-    helper_stq_mmu,
+    ext_helper_stb_mmu,
+    ext_helper_stw_mmu,
+    ext_helper_stl_mmu,
+    ext_helper_stq_mmu,
 };

+static void add_qemu_ldst_label(TCGContext *s,
+                                int opc_ext,
+                                int data_reg,
+                                int data_reg2,
+                                int addrlo_reg,
+                                int addrhi_reg,
+                                int mem_index,
+                                uint8_t *raddr,
+                                uint8_t **label_ptr);
+
 /* Perform the TLB load and compare.

    Inputs:
@@ -1059,19 +1069,21 @@ static inline void tcg_out_tlb_load(TCGContext *s, int addrlo_idx,

     tcg_out_mov(s, type, r0, addrlo);

-    /* jne label1 */
-    tcg_out8(s, OPC_JCC_short + JCC_JNE);
+    /* jne slow_path */
+    /* XXX: How to avoid using OPC_JCC_long for peephole optimization? */
+    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
     label_ptr[0] = s->code_ptr;
-    s->code_ptr++;
+    s->code_ptr += 4;

     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         /* cmp 4(r1), addrhi */
         tcg_out_modrm_offset(s, OPC_CMP_GvEv, args[addrlo_idx+1], r1, 4);

-        /* jne label1 */
-        tcg_out8(s, OPC_JCC_short + JCC_JNE);
+        /* jne slow_path */
+        /* XXX: How to avoid using OPC_JCC_long for peephole optimization? */
+        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
         label_ptr[1] = s->code_ptr;
-        s->code_ptr++;
+        s->code_ptr += 4;
     }

     /* TLB Hit.  */
@@ -1169,12 +1181,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
     int addrlo_idx;
 #if defined(CONFIG_SOFTMMU)
     int mem_index, s_bits;
-#if TCG_TARGET_REG_BITS == 64
-    int arg_idx;
-#else
-    int stack_adjust;
-#endif
-    uint8_t *label_ptr[3];
+    uint8_t *label_ptr[2];
 #endif

     data_reg = args[0];
@@ -1194,93 +1201,16 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
     /* TLB Hit.  */
     tcg_out_qemu_ld_direct(s, data_reg, data_reg2, TCG_REG_L0, 0, opc);

-    /* jmp label2 */
-    tcg_out8(s, OPC_JMP_short);
-    label_ptr[2] = s->code_ptr;
-    s->code_ptr++;
-
-    /* TLB Miss.  */
-
-    /* label1: */
-    *label_ptr[0] = s->code_ptr - label_ptr[0] - 1;
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        *label_ptr[1] = s->code_ptr - label_ptr[1] - 1;
-    }
-
-    /* XXX: move that code at the end of the TB */
-#if TCG_TARGET_REG_BITS == 32
-    tcg_out_pushi(s, mem_index);
-    stack_adjust = 4;
-    if (TARGET_LONG_BITS == 64) {
-        tcg_out_push(s, args[addrlo_idx + 1]);
-        stack_adjust += 4;
-    }
-    tcg_out_push(s, args[addrlo_idx]);
-    stack_adjust += 4;
-    tcg_out_push(s, TCG_AREG0);
-    stack_adjust += 4;
-#else
-    /* The first argument is already loaded with addrlo.  */
-    arg_idx = 1;
-    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[arg_idx],
-                 mem_index);
-    /* XXX/FIXME: suboptimal */
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3], TCG_REG_L2);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2], TCG_REG_L1);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[1], TCG_REG_L0);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
-#endif
-
-    tcg_out_calli(s, (tcg_target_long)qemu_ld_helpers[s_bits]);
-
-#if TCG_TARGET_REG_BITS == 32
-    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
-        /* Pop and discard.  This is 2 bytes smaller than the add.  */
-        tcg_out_pop(s, TCG_REG_ECX);
-    } else if (stack_adjust != 0) {
-        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
-    }
-#endif
-
-    switch(opc) {
-    case 0 | 4:
-        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-    case 1 | 4:
-        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-    case 0:
-        tcg_out_ext8u(s, data_reg, TCG_REG_EAX);
-        break;
-    case 1:
-        tcg_out_ext16u(s, data_reg, TCG_REG_EAX);
-        break;
-    case 2:
-        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-        break;
-#if TCG_TARGET_REG_BITS == 64
-    case 2 | 4:
-        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
-        break;
-#endif
-    case 3:
-        if (TCG_TARGET_REG_BITS == 64) {
-            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
-        } else if (data_reg == TCG_REG_EDX) {
-            /* xchg %edx, %eax */
-            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EAX);
-        } else {
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EDX);
-        }
-        break;
-    default:
-        tcg_abort();
-    }
-
-    /* label2: */
-    *label_ptr[2] = s->code_ptr - label_ptr[2] - 1;
+    /* Record the current context of a load into ldst label */
+    add_qemu_ldst_label(s,
+                        opc,
+                        data_reg,
+                        data_reg2,
+                        args[addrlo_idx],
+                        args[addrlo_idx + 1],
+                        mem_index,
+                        s->code_ptr,
+                        label_ptr);
 #else
     {
         int32_t offset = GUEST_BASE;
@@ -1372,8 +1302,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
     int addrlo_idx;
 #if defined(CONFIG_SOFTMMU)
     int mem_index, s_bits;
-    int stack_adjust;
-    uint8_t *label_ptr[3];
+    uint8_t *label_ptr[2];
 #endif

     data_reg = args[0];
@@ -1393,23 +1322,220 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
     /* TLB Hit.  */
     tcg_out_qemu_st_direct(s, data_reg, data_reg2, TCG_REG_L0, 0, opc);

-    /* jmp label2 */
-    tcg_out8(s, OPC_JMP_short);
-    label_ptr[2] = s->code_ptr;
-    s->code_ptr++;
+    /* Record the current context of a store into ldst label */
+    add_qemu_ldst_label(s,
+                        opc | HL_ST_MASK,
+                        data_reg,
+                        data_reg2,
+                        args[addrlo_idx],
+                        args[addrlo_idx + 1],
+                        mem_index,
+                        s->code_ptr,
+                        label_ptr);
+#else
+    {
+        int32_t offset = GUEST_BASE;
+        int base = args[addrlo_idx];

-    /* TLB Miss.  */
+        if (TCG_TARGET_REG_BITS == 64) {
+            /* ??? We assume all operations have left us with register
+               contents that are zero extended.  So far this appears to
+               be true.  If we want to enforce this, we can either do
+               an explicit zero-extension here, or (if GUEST_BASE == 0)
+               use the ADDR32 prefix.  For now, do nothing.  */

-    /* label1: */
-    *label_ptr[0] = s->code_ptr - label_ptr[0] - 1;
+            if (offset != GUEST_BASE) {
+                tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_L0, GUEST_BASE);
+                tgen_arithr(s, ARITH_ADD + P_REXW, TCG_REG_L0, base);
+                base = TCG_REG_L0;
+                offset = 0;
+            }
+        }
+
+        tcg_out_qemu_st_direct(s, data_reg, data_reg2, base, offset, opc);
+    }
+#endif
+}
+
+#if defined(CONFIG_SOFTMMU)
+/*
+ * Record the context of a call to the out of line helper code for the slow path
+ * for a load or store, so that we can later generate the correct helper code
+ */
+static void add_qemu_ldst_label(TCGContext *s,
+                                int opc_ext,
+                                int data_reg,
+                                int data_reg2,
+                                int addrlo_reg,
+                                int addrhi_reg,
+                                int mem_index,
+                                uint8_t *raddr,
+                                uint8_t **label_ptr)
+{
+    int idx;
+    TCGLabelQemuLdst *label;
+
+    if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST) {
+        tcg_abort();
+    }
+
+    idx = s->nb_qemu_ldst_labels++;
+    label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
+    label->opc_ext = opc_ext;
+    label->datalo_reg = data_reg;
+    label->datahi_reg = data_reg2;
+    label->addrlo_reg = addrlo_reg;
+    label->addrhi_reg = addrhi_reg;
+    label->mem_index = mem_index;
+    label->raddr = raddr;
+    label->label_ptr[0] = label_ptr[0];
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        *label_ptr[1] = s->code_ptr - label_ptr[1] - 1;
+        label->label_ptr[1] = label_ptr[1];
     }
+}

-    /* XXX: move that code at the end of the TB */
+/*
+ * Generate code for the slow path for a load at the end of block
+ */
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
+{
+    int s_bits;
+    int opc = label->opc_ext & HL_OPC_MASK;
+    int mem_index = label->mem_index;
 #if TCG_TARGET_REG_BITS == 32
+    int stack_adjust;
+    int addrlo_reg = label->addrlo_reg;
+    int addrhi_reg = label->addrhi_reg;
+#endif
+    int data_reg = label->datalo_reg;
+    int data_reg2 = label->datahi_reg;
+    uint8_t *raddr = label->raddr;
+    uint8_t **label_ptr = &label->label_ptr[0];
+
+    s_bits = opc & 3;
+
+    /* resolve label address */
+    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
+    }
+
+    /* extended helper signature: ext_helper_ld_mmu(CPUState *env,
+       target_ulong addr, int mmu_idx, uintptr_t raddr) */
+#if TCG_TARGET_REG_BITS == 32
+    /* The last arg is the generated code address corresponding to qemu_ld IR */
+    tcg_out_pushi(s, (tcg_target_ulong)(raddr - 1));
+    stack_adjust = 4;
     tcg_out_pushi(s, mem_index);
+    stack_adjust += 4;
+    if (TARGET_LONG_BITS == 64) {
+        tcg_out_push(s, addrhi_reg);
+        stack_adjust += 4;
+    }
+    tcg_out_push(s, addrlo_reg);
+    stack_adjust += 4;
+    tcg_out_push(s, TCG_AREG0);
+    stack_adjust += 4;
+#else
+    /* The first argument is already loaded with addrlo.  */
+    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[1],
+                 mem_index);
+    /* The last arg is the generated code address corresponding to qemu_ld IR */
+    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2],
+                 (tcg_target_ulong)(raddr - 1));
+    /* XXX/FIXME: suboptimal */
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3], TCG_REG_L2);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2], TCG_REG_L1);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[1], TCG_REG_L0);
+    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
+#endif
+
+    tcg_out_calli(s, (tcg_target_long)qemu_ld_helpers[s_bits]);
+
+#if TCG_TARGET_REG_BITS == 32
+    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
+        /* Pop and discard.  This is 2 bytes smaller than the add.  */
+        tcg_out_pop(s, TCG_REG_ECX);
+    } else if (stack_adjust != 0) {
+        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
+    }
+#endif
+
+    switch (opc) {
+    case 0 | 4:
+        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
+        break;
+    case 1 | 4:
+        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
+        break;
+    case 0:
+        tcg_out_ext8u(s, data_reg, TCG_REG_EAX);
+        break;
+    case 1:
+        tcg_out_ext16u(s, data_reg, TCG_REG_EAX);
+        break;
+    case 2:
+        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
+        break;
+#if TCG_TARGET_REG_BITS == 64
+    case 2 | 4:
+        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
+        break;
+#endif
+    case 3:
+        if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
+        } else if (data_reg == TCG_REG_EDX) {
+            /* xchg %edx, %eax */
+            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EAX);
+        } else {
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EDX);
+        }
+        break;
+    default:
+        tcg_abort();
+    }
+
+    /* Jump back to the original code accessing a guest memory */
+    tcg_out_jmp(s, (tcg_target_long) raddr);
+}
+
+/*
+ * Generate code for the slow path for a store at the end of block
+ */
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
+{
+    int s_bits;
+    int stack_adjust;
+    int opc = label->opc_ext & HL_OPC_MASK;
+    int mem_index = label->mem_index;
+    int data_reg = label->datalo_reg;
+#if TCG_TARGET_REG_BITS == 32
+    int data_reg2 = label->datahi_reg;
+    int addrlo_reg = label->addrlo_reg;
+    int addrhi_reg = label->addrhi_reg;
+#endif
+    uint8_t *raddr = label->raddr;
+    uint8_t **label_ptr = &label->label_ptr[0];
+
+    s_bits = opc & 3;
+
+    /* resolve label address */
+    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
+    }
+
+    /* extended helper signature: ext_helper_st_mmu(CPUState *env,
+       target_ulong addr, uintxx_t val, int mmu_idx, uintptr_t raddr) */
+#if TCG_TARGET_REG_BITS == 32
+    /* The last arg is the generated code address corresponding to qemu_st IR */
+    tcg_out_pushi(s, (tcg_target_ulong)(raddr - 1));
     stack_adjust = 4;
+    tcg_out_pushi(s, mem_index);
+    stack_adjust += 4;
     if (opc == 3) {
         tcg_out_push(s, data_reg2);
         stack_adjust += 4;
@@ -1417,10 +1543,10 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
     tcg_out_push(s, data_reg);
     stack_adjust += 4;
     if (TARGET_LONG_BITS == 64) {
-        tcg_out_push(s, args[addrlo_idx + 1]);
+        tcg_out_push(s, addrhi_reg);
         stack_adjust += 4;
     }
-    tcg_out_push(s, args[addrlo_idx]);
+    tcg_out_push(s, addrlo_reg);
     stack_adjust += 4;
     tcg_out_push(s, TCG_AREG0);
     stack_adjust += 4;
@@ -1429,6 +1555,14 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
                 TCG_REG_L1, data_reg);
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_L2, mem_index);
     stack_adjust = 0;
+    /* The last arg is the generated code address corresponding to qemu_st IR */
+#if defined(_WIN64)
+    tcg_out_pushi(s, (tcg_target_ulong)(raddr - 1));
+    stack_adjust += 8;
+#else
+    tcg_out_movi(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[4],
+                 (tcg_target_ulong)(raddr - 1));
+#endif
     /* XXX/FIXME: suboptimal */
     tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[3], TCG_REG_L2);
     tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2], TCG_REG_L1);
@@ -1445,32 +1579,28 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
         tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
     }

-    /* label2: */
-    *label_ptr[2] = s->code_ptr - label_ptr[2] - 1;
-#else
-    {
-        int32_t offset = GUEST_BASE;
-        int base = args[addrlo_idx];
+    /* Jump back to the original code accessing a guest memory */
+    tcg_out_jmp(s, (tcg_target_long) raddr);
+}

-        if (TCG_TARGET_REG_BITS == 64) {
-            /* ??? We assume all operations have left us with register
-               contents that are zero extended.  So far this appears to
-               be true.  If we want to enforce this, we can either do
-               an explicit zero-extension here, or (if GUEST_BASE == 0)
-               use the ADDR32 prefix.  For now, do nothing.  */
+/*
+ * Generate all of the slow paths of qemu_ld/st at the end of block
+ */
+void tcg_out_qemu_ldst_slow_path(TCGContext *s)
+{
+    int i;
+    TCGLabelQemuLdst *label;

-            if (offset != GUEST_BASE) {
-                tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_L0, GUEST_BASE);
-                tgen_arithr(s, ARITH_ADD + P_REXW, TCG_REG_L0, base);
-                base = TCG_REG_L0;
-                offset = 0;
+    for (i = 0; i < s->nb_qemu_ldst_labels; i++) {
+        label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[i];
+        if (IS_QEMU_LD_LABEL(label)) {
+            tcg_out_qemu_ld_slow_path(s, label);
+        } else {
+            tcg_out_qemu_st_slow_path(s, label);
             }
         }
-
-        tcg_out_qemu_st_direct(s, data_reg, data_reg2, base, offset, opc);
-    }
-#endif
 }
+#endif  /* CONFIG_SOFTMMU */

 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                               const TCGArg *args, const int *const_args)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index c069e44..c96b7f1 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -304,6 +304,14 @@ void tcg_func_start(TCGContext *s)

     gen_opc_ptr = gen_opc_buf;
     gen_opparam_ptr = gen_opparam_buf;
+
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
+    /* Initialize qemu_ld/st labels to assist code generation at the end of TB
+       for TLB miss cases at the end of TB */
+    s->qemu_ldst_labels = tcg_malloc(sizeof(TCGLabelQemuLdst) *
+                                     TCG_MAX_QEMU_LDST);
+    s->nb_qemu_ldst_labels = 0;
+#endif
 }

 static inline void tcg_temp_alloc(TCGContext *s, int n)
@@ -2163,6 +2171,11 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
 #endif
     }
  the_end:
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
+    /* Generate slow paths of qemu_ld/st IRs which call MMU helpers at
+       the end of block */
+    tcg_out_qemu_ldst_slow_path(s);
+#endif
     return -1;
 }

diff --git a/tcg/tcg.h b/tcg/tcg.h
index af7464a..b54884f 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -188,6 +188,29 @@ typedef tcg_target_ulong TCGArg;
    are aliases for target_ulong and host pointer sized values respectively.
  */

+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
+/* Macros/structures for qemu_ld/st IR code optimization:
+   TCG_MAX_HELPER_LABELS is defined as same as OPC_BUF_SIZE in exec-all.h. */
+#define TCG_MAX_QEMU_LDST       640
+#define HL_LDST_SHIFT           4
+#define HL_LDST_MASK            (1 << HL_LDST_SHIFT)
+#define HL_ST_MASK              HL_LDST_MASK
+#define HL_OPC_MASK             (HL_LDST_MASK - 1)
+#define IS_QEMU_LD_LABEL(L)     (!((L)->opc_ext & HL_LDST_MASK))
+#define IS_QEMU_ST_LABEL(L)     ((L)->opc_ext & HL_LDST_MASK)
+
+typedef struct TCGLabelQemuLdst {
+    int opc_ext;            /* | 27bit(reserved) | 1bit(ld/st) | 4bit(opc) | */
+    int addrlo_reg;         /* reg index for low word of guest virtual addr */
+    int addrhi_reg;         /* reg index for high word of guest virtual addr */
+    int datalo_reg;         /* reg index for low word to be loaded or stored */
+    int datahi_reg;         /* reg index for high word to be loaded or stored */
+    int mem_index;          /* soft MMU memory index */
+    uint8_t *raddr;         /* gen code addr of the next IR of qemu_ld/st IR */
+    uint8_t *label_ptr[2];  /* label pointers to be updated */
+} TCGLabelQemuLdst;
+#endif
+
 #ifdef CONFIG_DEBUG_TCG
 #define DEBUG_TCGV 1
 #endif
@@ -392,6 +415,13 @@ struct TCGContext {
     int temps_in_use;
     int goto_tb_issue_mask;
 #endif
+
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
+    /* labels info for qemu_ld/st IRs
+       The labels help to generate TLB miss case codes at the end of TB */
+    TCGLabelQemuLdst *qemu_ldst_labels;
+    int nb_qemu_ldst_labels;
+#endif
 };

 extern TCGContext tcg_ctx;
@@ -595,3 +625,8 @@ extern uint8_t code_gen_prologue[];
 #endif

 void tcg_register_jit(void *buf, size_t buf_size);
+
+#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
+/* Generate all of the slow paths of qemu_ld/st at the end of block */
+void tcg_out_qemu_ldst_slow_path(TCGContext *s);
+#endif
--
1.7.5.4


* Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
  2012-10-09 12:37 [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
                   ` (2 preceding siblings ...)
  2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block Yeongkyoon Lee
@ 2012-10-09 14:26 ` Aurelien Jarno
  2012-10-09 16:19   ` Aurelien Jarno
  3 siblings, 1 reply; 16+ messages in thread
From: Aurelien Jarno @ 2012-10-09 14:26 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: qemu-devel

On Tue, Oct 09, 2012 at 09:37:29PM +0900, Yeongkyoon Lee wrote:
> Hi, all.
> 
> Here is the 5th version of the series optimizing TCG qemu_ld/st code generation.
> 
> v5:
>   - Remove RFC tag
> 
> v4:
>   - Remove CONFIG_SOFTMMU pre-condition from configure
>   - Instead, add some CONFIG_SOFTMMU condition to TCG sources
>   - Remove some unnecessary comments
> 
> v3:
>   - Support CONFIG_TCG_PASS_AREG0
>     (expected to get more performance enhancement than others)
>   - Remove the configure option "--enable-ldst-optimization""
>   - Make the optimization as default on i386 and x86_64 hosts
>   - Fix some mistyping and apply checkpatch.pl before committing
>   - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
>   - Test linux-user-test-0.3
> 
> v2:
>   - Follow the submit rule of qemu
> 
> v1:
>   - Initial commit request
> 
> I think the generated codes from qemu_ld/st IRs are relatively heavy, which are
> up to 12 instructions for TLB hit case on i386 host.
> This patch series enhance the code quality of TCG qemu_ld/st IRs by reducing
> jump and enhancing locality.
> Main idea is simple and has been already described in the comments in
> tcg-target.c, which separates slow path (TLB miss case), and generates it at the
> end of TB.
> 
> For example, the generated code from qemu_ld changes as follow.
> Before:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (5)
> (3) TLB hit case: Load value from host memory
> (4) Jump to next code (6)
> (5) TLB miss case: call MMU helper
> (6) ... (next code)
> 
> After:
> (1) TLB check
> (2) If hit fall through, else jump to TLB miss case (7)
> (3) TLB hit case: Load value from host memory
> (4) ... (next code)
> ...
> (7) TLB miss case: call MMU helper
> (8) Return to next code (4)
> 

Instead of calling the MMU helper with an additional argument (7) and then
jumping back (8) to the next code (4), what about pushing the address of the
next code (4) on the stack and using a jmp instead of the call? In that case
you don't need the extra argument to the helpers.

Otherwise I haven't looked at the code, but in principle it looks fine
to me.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
  2012-10-09 14:26 ` [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Aurelien Jarno
@ 2012-10-09 16:19   ` Aurelien Jarno
  2012-10-09 16:55     ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Aurelien Jarno @ 2012-10-09 16:19 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: qemu-devel

On Tue, Oct 09, 2012 at 04:26:10PM +0200, Aurelien Jarno wrote:
> On Tue, Oct 09, 2012 at 09:37:29PM +0900, Yeongkyoon Lee wrote:
> > Hi, all.
> > 
> > Here is the 5th version of the series optimizing TCG qemu_ld/st code generation.
> > 
> > v5:
> >   - Remove RFC tag
> > 
> > v4:
> >   - Remove CONFIG_SOFTMMU pre-condition from configure
> >   - Instead, add some CONFIG_SOFTMMU condition to TCG sources
> >   - Remove some unnecessary comments
> > 
> > v3:
> >   - Support CONFIG_TCG_PASS_AREG0
> >     (expected to get more performance enhancement than others)
> >   - Remove the configure option "--enable-ldst-optimization""
> >   - Make the optimization as default on i386 and x86_64 hosts
> >   - Fix some mistyping and apply checkpatch.pl before committing
> >   - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
> >   - Test linux-user-test-0.3
> > 
> > v2:
> >   - Follow the submit rule of qemu
> > 
> > v1:
> >   - Initial commit request
> > 
> > I think the generated codes from qemu_ld/st IRs are relatively heavy, which are
> > up to 12 instructions for TLB hit case on i386 host.
> > This patch series enhance the code quality of TCG qemu_ld/st IRs by reducing
> > jump and enhancing locality.
> > Main idea is simple and has been already described in the comments in
> > tcg-target.c, which separates slow path (TLB miss case), and generates it at the
> > end of TB.
> > 
> > For example, the generated code from qemu_ld changes as follow.
> > Before:
> > (1) TLB check
> > (2) If hit fall through, else jump to TLB miss case (5)
> > (3) TLB hit case: Load value from host memory
> > (4) Jump to next code (6)
> > (5) TLB miss case: call MMU helper
> > (6) ... (next code)
> > 
> > After:
> > (1) TLB check
> > (2) If hit fall through, else jump to TLB miss case (7)
> > (3) TLB hit case: Load value from host memory
> > (4) ... (next code)
> > ...
> > (7) TLB miss case: call MMU helper
> > (8) Return to next code (4)
> > 
> 
> Instead of calling the MMU helper with an additional argument (7), and
> then jump back (8) to the next code (4), what about pushing the address
> of the next code (4) on the stack and use a jmp instead of the call. In
> that case you don't need the extra argument to the helpers.
> 

Maybe it wasn't very clear. This is based on the fact that a call is
basically push %rip + jmp. Therefore we can fake the return address by
pushing the value we want, here the address of the next code. This means
that we don't need to pass the extra argument to the helper for the
return address, as GET_PC() would work correctly (it basically reads the
return address on the stack).

For other architectures, it might not be a push, but rather a move to
the link register; basically, put the return address where the calling
convention asks for it.

OTOH I just realized it only works if the end of the slow path (moving
the value from the return value to the correct register) is taken care
of as well. It might be something doable.
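
To make the GET_PC() point concrete: GETPC() in QEMU boils down to
__builtin_return_address(0), so a helper can only ever recover the address of
the call site it was actually entered from. A tiny standalone illustration
(nothing here is TCG code):

#include <stdio.h>

/* GETPC()-style lookup: the callee recovers the address it was called from */
__attribute__((noinline)) static void helper_probe(void)
{
    printf("call site seen by the helper: %p\n", __builtin_return_address(0));
}

int main(void)
{
    /* With a normal call the helper sees the instruction after the call.
       With the push-then-jmp trick it would instead see whatever address
       was pushed, i.e. the address we want cpu_restore_state() to search. */
    helper_probe();
    helper_probe();   /* a second call site: a different address is printed */
    return 0;
}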

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
  2012-10-09 16:19   ` Aurelien Jarno
@ 2012-10-09 16:55     ` Paolo Bonzini
  2012-10-09 17:09       ` Aurelien Jarno
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2012-10-09 16:55 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, Yeongkyoon Lee

On 09/10/2012 18:19, Aurelien Jarno wrote:
>>> > > 
>> > 
>> > Instead of calling the MMU helper with an additional argument (7), and
>> > then jump back (8) to the next code (4), what about pushing the address
>> > of the next code (4) on the stack and use a jmp instead of the call. In
>> > that case you don't need the extra argument to the helpers.
>> > 
> Maybe it wasn't very clear. This is based on the fact that call is
> basically push %rip + jmp. Therefore we can fake the return address by
> putting the value we want, here the address of the next code. This mean
> that we don't need to pass the extra argument to the helper for the 
> return address, as GET_PC() would work correctly (it basically reads the
> return address on the stack).
> 
> For other architectures, it might not be a push, but rather a move to
> link register, basically put the return address where the calling
> convention asks for.
> 
> OTOH I just realized it only works if the end of the slow path (moving
> the value from the return address to the correct register). It might be
> something doable.

Branch predictors will not like old-school tricks like this one. :)

Paolo


* Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
  2012-10-09 16:55     ` Paolo Bonzini
@ 2012-10-09 17:09       ` Aurelien Jarno
  2012-10-10  4:17         ` Yeongkyoon Lee
  0 siblings, 1 reply; 16+ messages in thread
From: Aurelien Jarno @ 2012-10-09 17:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel, Yeongkyoon Lee

On Tue, Oct 09, 2012 at 06:55:58PM +0200, Paolo Bonzini wrote:
> On 09/10/2012 18:19, Aurelien Jarno wrote:
> >>> > > 
> >> > 
> >> > Instead of calling the MMU helper with an additional argument (7), and
> >> > then jump back (8) to the next code (4), what about pushing the address
> >> > of the next code (4) on the stack and use a jmp instead of the call. In
> >> > that case you don't need the extra argument to the helpers.
> >> > 
> > Maybe it wasn't very clear. This is based on the fact that call is
> > basically push %rip + jmp. Therefore we can fake the return address by
> > putting the value we want, here the address of the next code. This mean
> > that we don't need to pass the extra argument to the helper for the 
> > return address, as GET_PC() would work correctly (it basically reads the
> > return address on the stack).
> > 
> > For other architectures, it might not be a push, but rather a move to
> > link register, basically put the return address where the calling
> > convention asks for.
> > 
> > OTOH I just realized it only works if the end of the slow path (moving
> > the value from the return address to the correct register). It might be
> > something doable.
> 
> Branch predictors will not oldschool tricks like this one. :)
> 

Given it is only used in the slow path (i.e. the exception rather than the
rule), branch prediction isn't that important there.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v5 2/3] tcg: Add declarations and templates of extended MMU helpers
  2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 2/3] tcg: Add declarations and templates of extended MMU helpers Yeongkyoon Lee
@ 2012-10-09 18:36   ` Richard Henderson
  2012-10-10 11:04     ` Yeongkyoon Lee
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2012-10-09 18:36 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: qemu-devel

On 10/09/2012 05:37 AM, Yeongkyoon Lee wrote:
> Add declarations and templates of extended MMU helpers.
> An extended helper takes an additional argument of the host address accessing
> a guest memory which differs from the address of the call site to the helper
> because helper call sites locate at the end of a generated code block.
...
> +#ifndef CONFIG_QEMU_LDST_OPTIMIZATION


My feedback from the last round of review is that a version of the
helper functions that takes the return address should *always* be available.

There are existing issues in the target-*/foo_helper.c files where,
if a helper touches memory, we do not necessarily handle any
fault properly.  This is less true of system mode than user mode,
but it's still a problem.

The helper.c files ought to be changed to use these new "ra-enabled"
routines and pass GETPC().  That way a fault from a helper gets
treated *exactly* like it would if it were called from TCG generated code.

Thus, all this conditionalization should vanish.
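
For instance, a target helper could look like the hypothetical sketch below
(helper_example_ldl is a made-up name; the stub typedefs only make the snippet
self-contained, and in-tree code would instead include the usual headers plus
the softmmu_defs.h declarations from patch 2/3):

#include <stdint.h>

typedef struct CPUArchState CPUArchState;   /* stand-in for the real type */
typedef uint32_t target_ulong;              /* assume a 32-bit guest */
#define GETPC() ((uintptr_t)__builtin_return_address(0))

/* ra-enabled MMU helper, as declared in patch 2/3 */
uint32_t ext_helper_ldl_mmu(CPUArchState *env, target_ulong addr, int mmu_idx,
                            uintptr_t ra);

/* A target helper that touches guest memory simply forwards GETPC(), so a
   fault raised by the load is restored exactly as if the access had come
   from TCG-generated code. */
uint32_t helper_example_ldl(CPUArchState *env, target_ulong addr, int mmu_idx)
{
    return ext_helper_ldl_mmu(env, addr, mmu_idx, GETPC());
}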


r~


* Re: [Qemu-devel] [PATCH v5 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
  2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block Yeongkyoon Lee
@ 2012-10-09 18:49   ` Richard Henderson
  2012-10-10  4:41     ` Yeongkyoon Lee
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2012-10-09 18:49 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: qemu-devel

On 10/09/2012 05:37 AM, Yeongkyoon Lee wrote:
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
> +    /* Initialize qemu_ld/st labels to assist code generation at the end of TB
> +       for TLB miss cases at the end of TB */
> +    s->qemu_ldst_labels = tcg_malloc(sizeof(TCGLabelQemuLdst) *
> +                                     TCG_MAX_QEMU_LDST);
> +    s->nb_qemu_ldst_labels = 0;
> +#endif

I said before that I wasn't fond of this sort of "constant" dynamic allocation,
regardless of what the surrounding code does.  You could clean those up too,
as a separate patch...

> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
> +    /* Generate slow paths of qemu_ld/st IRs which call MMU helpers at
> +       the end of block */
> +    tcg_out_qemu_ldst_slow_path(s);
> +#endif

This interface is so close to "tcg_out_ldst_and_constant_pools(s)" that
I don't think the function should be specific to ldst.  Just call it
tcg_out_tb_finalize or something.

> +/* Macros/structures for qemu_ld/st IR code optimization:
> +   TCG_MAX_HELPER_LABELS is defined as same as OPC_BUF_SIZE in exec-all.h. */
> +#define TCG_MAX_QEMU_LDST       640
> +#define HL_LDST_SHIFT           4
> +#define HL_LDST_MASK            (1 << HL_LDST_SHIFT)
> +#define HL_ST_MASK              HL_LDST_MASK
> +#define HL_OPC_MASK             (HL_LDST_MASK - 1)
> +#define IS_QEMU_LD_LABEL(L)     (!((L)->opc_ext & HL_LDST_MASK))
> +#define IS_QEMU_ST_LABEL(L)     ((L)->opc_ext & HL_LDST_MASK)
> +
> +typedef struct TCGLabelQemuLdst {
> +    int opc_ext;            /* | 27bit(reserved) | 1bit(ld/st) | 4bit(opc) | */

Any good reason to use all these masks when the compiler can do it
for you with bitfields?


r~


* Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
  2012-10-09 17:09       ` Aurelien Jarno
@ 2012-10-10  4:17         ` Yeongkyoon Lee
  2012-10-10  6:45           ` Aurelien Jarno
  0 siblings, 1 reply; 16+ messages in thread
From: Yeongkyoon Lee @ 2012-10-10  4:17 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Paolo Bonzini, qemu-devel

On 2012-10-10 02:09, Aurelien Jarno wrote:
> On Tue, Oct 09, 2012 at 06:55:58PM +0200, Paolo Bonzini wrote:
>> On 09/10/2012 18:19, Aurelien Jarno wrote:
>>>>> Instead of calling the MMU helper with an additional argument (7), and
>>>>> then jump back (8) to the next code (4), what about pushing the address
>>>>> of the next code (4) on the stack and use a jmp instead of the call. In
>>>>> that case you don't need the extra argument to the helpers.
>>>>>
>>> Maybe it wasn't very clear. This is based on the fact that call is
>>> basically push %rip + jmp. Therefore we can fake the return address by
>>> putting the value we want, here the address of the next code. This mean
>>> that we don't need to pass the extra argument to the helper for the
>>> return address, as GET_PC() would work correctly (it basically reads the
>>> return address on the stack).
>>>
>>> For other architectures, it might not be a push, but rather a move to
>>> link register, basically put the return address where the calling
>>> convention asks for.
>>>
>>> OTOH I just realized it only works if the end of the slow path (moving
>>> the value from the return address to the correct register). It might be
>>> something doable.
>> Branch predictors will not oldschool tricks like this one. :)
>>
> Given it is only used in the slow path (ie the exception more than the
> rule), branch prediction isn't that important there.
>

I had already considered the approach of using a jmp and removing the extra
argument from the helper call.
However, the problem is that the helper needs the generated code address used
by tb_find_pc() and cpu_restore_state().
That means the code address needed in the helper is actually the address
corresponding to the qemu_ld/st IR rather than the return address.
In my LDST optimization, the helper call site is not in the IR's code but at
the end of the TB.
So, if the code address is not explicitly given as a helper argument, it is
too difficult to compute it in a GETPC()-like manner.


* Re: [Qemu-devel] [PATCH v5 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
  2012-10-09 18:49   ` Richard Henderson
@ 2012-10-10  4:41     ` Yeongkyoon Lee
  0 siblings, 0 replies; 16+ messages in thread
From: Yeongkyoon Lee @ 2012-10-10  4:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 2012-10-10 03:49, Richard Henderson wrote:
> On 10/09/2012 05:37 AM, Yeongkyoon Lee wrote:
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
>> +    /* Initialize qemu_ld/st labels to assist code generation at the end of TB
>> +       for TLB miss cases at the end of TB */
>> +    s->qemu_ldst_labels = tcg_malloc(sizeof(TCGLabelQemuLdst) *
>> +                                     TCG_MAX_QEMU_LDST);
>> +    s->nb_qemu_ldst_labels = 0;
>> +#endif
> I said before that I wasn't fond of this sort of "constant" dynamic allocation.
> Regardless of what surrounding code does.  You could clean those up too,
> as a separate patch...

I can change the dynamic allocation to a static one as you said; 
however, one concern is that we would waste memory in non-TCG 
environments such as KVM mode.
What's your opinion about this?
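
For reference, the static alternative would look roughly like the sketch 
below; the field names follow the posted patch, but the fixed-size array 
inside TCGContext is the hypothetical change:

/* Sketch only: keep the label records inside TCGContext instead of
 * tcg_malloc()ing them per TB.  TCGLabelQemuLdst and TCG_MAX_QEMU_LDST
 * are as defined in the posted patch.  The array then costs
 * sizeof(TCGLabelQemuLdst) * TCG_MAX_QEMU_LDST bytes even when TCG is
 * not used (e.g. KVM), which is the concern above. */
struct TCGContext {
    /* ... existing fields ... */
#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
    TCGLabelQemuLdst qemu_ldst_labels[TCG_MAX_QEMU_LDST];
    int nb_qemu_ldst_labels;
#endif
};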

>
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
>> +    /* Generate slow paths of qemu_ld/st IRs which call MMU helpers at
>> +       the end of block */
>> +    tcg_out_qemu_ldst_slow_path(s);
>> +#endif
> This interface is so close to "tcg_out_ldst_and_constant_pools(s)" that
> I don't think the function should be specific to ldst.  Just call it
> tcg_out_tb_finalize or something.

That looks good.
I'll refactor the function names accordingly later.

>
>> +/* Macros/structures for qemu_ld/st IR code optimization:
>> +   TCG_MAX_HELPER_LABELS is defined as same as OPC_BUF_SIZE in exec-all.h. */
>> +#define TCG_MAX_QEMU_LDST       640
>> +#define HL_LDST_SHIFT           4
>> +#define HL_LDST_MASK            (1 << HL_LDST_SHIFT)
>> +#define HL_ST_MASK              HL_LDST_MASK
>> +#define HL_OPC_MASK             (HL_LDST_MASK - 1)
>> +#define IS_QEMU_LD_LABEL(L)     (!((L)->opc_ext & HL_LDST_MASK))
>> +#define IS_QEMU_ST_LABEL(L)     ((L)->opc_ext & HL_LDST_MASK)
>> +
>> +typedef struct TCGLabelQemuLdst {
>> +    int opc_ext;            /* | 27bit(reserved) | 1bit(ld/st) | 4bit(opc) | */
> Any good reason to use all these masks when the compiler can do it
> for you with bitfields?

No, it is just my coding style.
There should be no compiler problems, and bitfields would look cleaner, 
so I'll switch to bitfields later.
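
As a rough illustration, the opc_ext field and its HL_* masks could 
collapse into bitfields along these lines (a sketch of the layout only, 
not the final code):

/* Sketch: encode what opc_ext and the HL_* masks encode today as bitfields,
 * so IS_QEMU_LD_LABEL()/IS_QEMU_ST_LABEL() become plain member tests. */
typedef struct TCGLabelQemuLdst {
    unsigned opc : 4;       /* qemu_ld/st opcode (was the HL_OPC_MASK bits) */
    unsigned is_store : 1;  /* 1 = qemu_st, 0 = qemu_ld (was HL_LDST_MASK) */
    /* ... address/data registers, label pointers, etc. unchanged ... */
} TCGLabelQemuLdst;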

>
>
> r~
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
  2012-10-10  4:17         ` Yeongkyoon Lee
@ 2012-10-10  6:45           ` Aurelien Jarno
  2012-10-10 10:34             ` Yeongkyoon Lee
  0 siblings, 1 reply; 16+ messages in thread
From: Aurelien Jarno @ 2012-10-10  6:45 UTC (permalink / raw)
  To: Yeongkyoon Lee; +Cc: Paolo Bonzini, qemu-devel

On Wed, Oct 10, 2012 at 01:17:36PM +0900, Yeongkyoon Lee wrote:
> On 2012-10-10 02:09, Aurelien Jarno wrote:
> >On Tue, Oct 09, 2012 at 06:55:58PM +0200, Paolo Bonzini wrote:
> >>On 09/10/2012 18:19, Aurelien Jarno wrote:
> >>>>>Instead of calling the MMU helper with an additional argument (7), and
> >>>>>then jump back (8) to the next code (4), what about pushing the address
> >>>>>of the next code (4) on the stack and use a jmp instead of the call. In
> >>>>>that case you don't need the extra argument to the helpers.
> >>>>>
> >>>Maybe it wasn't very clear. This is based on the fact that call is
> >>>basically push %rip + jmp. Therefore we can fake the return address by
> >>>putting the value we want, here the address of the next code. This means
> >>>that we don't need to pass the extra argument to the helper for the
> >>>return address, as GET_PC() would work correctly (it basically reads the
> >>>return address on the stack).
> >>>
> >>>For other architectures, it might not be a push, but rather a move to
> >>>link register, basically put the return address where the calling
> >>>convention asks for.
> >>>
> >>>OTOH I just realized it only works if the end of the slow path (moving
> >>>the value from the return address to the correct register). It might be
> >>>something doable.
> >>Branch predictors will not like oldschool tricks like this one. :)
> >>
> >Given it is only used in the slow path (ie the exception more than the
> >rule), branch prediction isn't that important there.
> >
> 
> I had already considered the approach of using jmp and removing the
> extra argument from the helper call.
> However, the problem is that the helper needs the generated code
> address used by tb_find_pc() and cpu_restore_state().
> That means the code address the helper needs is really the one
> corresponding to the qemu_ld/st IR, not the return address.
> In my LDST optimization, the helper call site is not in the code
> emitted for the IR but at the end of the TB.

GETPC() uses the return address to determine the call site, and as long
as the code at the end of the TB sets a return address corresponding to
one of the fast path instructions, tb_find_pc() will be able to find
the correct instruction.

That implies that at least one instruction at the end of the generated
code is shared between the slow path and the fast path, but on the other
hand it avoids having two different kinds of MMU helpers.
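
For reference, on most hosts GETPC() is essentially the compiler builtin
below (a paraphrase of the generic definition; host-specific variants
differ), which is why a faked return address is enough for tb_find_pc():

/* Generic form of GETPC(): read the caller's return address. */
#define GETPC() ((uintptr_t)__builtin_return_address(0))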

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
  2012-10-10  6:45           ` Aurelien Jarno
@ 2012-10-10 10:34             ` Yeongkyoon Lee
  2012-10-10 14:09               ` Yeongkyoon Lee
  0 siblings, 1 reply; 16+ messages in thread
From: Yeongkyoon Lee @ 2012-10-10 10:34 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Paolo Bonzini, qemu-devel

On 2012-10-10 15:45, Aurelien Jarno wrote:
> On Wed, Oct 10, 2012 at 01:17:36PM +0900, Yeongkyoon Lee wrote:
>> On 2012-10-10 02:09, Aurelien Jarno wrote:
>>> On Tue, Oct 09, 2012 at 06:55:58PM +0200, Paolo Bonzini wrote:
>>>> On 09/10/2012 18:19, Aurelien Jarno wrote:
>>>>>>> Instead of calling the MMU helper with an additional argument (7), and
>>>>>>> then jump back (8) to the next code (4), what about pushing the address
>>>>>>> of the next code (4) on the stack and use a jmp instead of the call. In
>>>>>>> that case you don't need the extra argument to the helpers.
>>>>>>>
>>>>> Maybe it wasn't very clear. This is based on the fact that call is
>>>>> basically push %rip + jmp. Therefore we can fake the return address by
>>>>> putting the value we want, here the address of the next code. This means
>>>>> that we don't need to pass the extra argument to the helper for the
>>>>> return address, as GET_PC() would work correctly (it basically reads the
>>>>> return address on the stack).
>>>>>
>>>>> For other architectures, it might not be a push, but rather a move to
>>>>> link register, basically put the return address where the calling
>>>>> convention asks for.
>>>>>
>>>>> OTOH I just realized it only works if the end of the slow path (moving
>>>>> the value from the return address to the correct register). It might be
>>>>> something doable.
>>>> Branch predictors will not like oldschool tricks like this one. :)
>>>>
>>> Given it is only used in the slow path (ie the exception more than the
>>> rule), branch prediction isn't that important there.
>>>
>> I had already considered the approach of using jmp and removing the
>> extra argument from the helper call.
>> However, the problem is that the helper needs the generated code
>> address used by tb_find_pc() and cpu_restore_state().
>> That means the code address the helper needs is really the one
>> corresponding to the qemu_ld/st IR, not the return address.
>> In my LDST optimization, the helper call site is not in the code
>> emitted for the IR but at the end of the TB.
> GETPC() uses the return address to determine the call site, and as long
> as the code at the end of the TB sets a return address corresponding to
> one of the fast path instructions, tb_find_pc() will be able to find
> the correct instruction.
>
> That implies that at least one instruction at the end of the generated
> code is shared between the slow path and the fast path, but on the other
> hand it avoids having two different kinds of MMU helpers.
>

How about using a nop instruction at the end of the fast path as the 
return address of the helper?
That means changing "call helper" to "push addr of nop" followed by 
"jmp helper".
Although I still need to check the feasibility, it is expected to avoid 
helper fragmentation (two variants of each MMU helper) and to keep the 
performance degradation minimal.
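
As a very rough sketch of that idea (the emitter names below are made 
up for illustration, not existing TCG backend functions), the slow path 
at the end of the TB could be emitted as:

#include <stdint.h>

/* Hypothetical emitter names, declared only to make the sketch
 * self-contained. */
typedef struct TCGContext TCGContext;
void tcg_out_push_imm(TCGContext *s, uintptr_t imm);
void tcg_out_jmp_addr(TCGContext *s, uintptr_t addr);

/* Instead of "call helper", push the address of a nop placed at the end
 * of the fast path and jump to the helper, so the helper's GETPC()
 * resolves back into the fast path. */
static void gen_slow_path_call(TCGContext *s, uint8_t *fast_path_nop,
                               uintptr_t helper_addr)
{
    tcg_out_push_imm(s, (uintptr_t)fast_path_nop);  /* fake return address */
    tcg_out_jmp_addr(s, helper_addr);               /* jmp, not call */
}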

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v5 2/3] tcg: Add declarations and templates of extended MMU helpers
  2012-10-09 18:36   ` Richard Henderson
@ 2012-10-10 11:04     ` Yeongkyoon Lee
  0 siblings, 0 replies; 16+ messages in thread
From: Yeongkyoon Lee @ 2012-10-10 11:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On 2012-10-10 03:36, Richard Henderson wrote:
> On 10/09/2012 05:37 AM, Yeongkyoon Lee wrote:
>> Add declarations and templates of extended MMU helpers.
>> An extended helper takes an additional argument of the host address accessing
>> a guest memory which differs from the address of the call site to the helper
>> because helper call sites locate at the end of a generated code block.
> ...
>> +#ifndef CONFIG_QEMU_LDST_OPTIMIZATION
>
> My feedback from the last round of review is that a version of the
> helper functions that take the return address should *always* be available.
>
> There are existing issues in the target-*/foo_helper.c files where
> if a helper touches memory that we do no necessarily handle any
> fault properly.  This is less true of system mode than user mode,
> but it's still a problem.
>
> The helper.c files ought to be changed to use these new "ra-enabled"
> routines and pass GETPC().  That way a fault from a helper gets
> treated *exactly* like it would if it were called from TCG generated code.
>
> Thus, all this conditionalization should vanish.

Do you mean that there are call sites in target-*/foo_helper.c which 
call the helpers of softmmu_defs.h?
As far as I know, those helpers are only reached through the functions 
in softmmu_header.h, where the extra argument is handled.

Anyway, I'll try an approach that avoids helper fragmentation, at the 
cost of a slight performance degradation of just one extra instruction 
in each fast path.

>
>
> r~
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs
  2012-10-10 10:34             ` Yeongkyoon Lee
@ 2012-10-10 14:09               ` Yeongkyoon Lee
  0 siblings, 0 replies; 16+ messages in thread
From: Yeongkyoon Lee @ 2012-10-10 14:09 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Paolo Bonzini, qemu-devel

On 2012-10-10 19:34, Yeongkyoon Lee wrote:
> On 2012-10-10 15:45, Aurelien Jarno wrote:
>> On Wed, Oct 10, 2012 at 01:17:36PM +0900, Yeongkyoon Lee wrote:
>>> On 2012-10-10 02:09, Aurelien Jarno wrote:
>>>> On Tue, Oct 09, 2012 at 06:55:58PM +0200, Paolo Bonzini wrote:
>>>>> On 09/10/2012 18:19, Aurelien Jarno wrote:
>>>>>>>> Instead of calling the MMU helper with an additional argument 
>>>>>>>> (7), and
>>>>>>>> then jump back (8) to the next code (4), what about pushing the 
>>>>>>>> address
>>>>>>>> of the next code (4) on the stack and use a jmp instead of the 
>>>>>>>> call. In
>>>>>>>> that case you don't need the extra argument to the helpers.
>>>>>>>>
>>>>>> Maybe it wasn't very clear. This is based on the fact that call is
>>>>>> basically push %rip + jmp. Therefore we can fake the return 
>>>>>> address by
>>>>>> putting the value we want, here the address of the next code. 
>>>>>> This means
>>>>>> that we don't need to pass the extra argument to the helper for the
>>>>>> return address, as GET_PC() would work correctly (it basically 
>>>>>> reads the
>>>>>> return address on the stack).
>>>>>>
>>>>>> For other architectures, it might not be a push, but rather a 
>>>>>> move to
>>>>>> link register, basically put the return address where the calling
>>>>>> convention asks for.
>>>>>>
>>>>>> OTOH I just realized it only works if the end of the slow path 
>>>>>> (moving
>>>>>> the value from the return address to the correct register). It 
>>>>>> might be
>>>>>> something doable.
>>>>> Branch predictors will not like oldschool tricks like this one. :)
>>>>>
>>>> Given it is only used in the slow path (ie the exception more than the
>>>> rule), branch prediction isn't that important there.
>>>>
>>> I had already considered the approach of using jmp and removing the
>>> extra argument from the helper call.
>>> However, the problem is that the helper needs the generated code
>>> address used by tb_find_pc() and cpu_restore_state().
>>> That means the code address the helper needs is really the one
>>> corresponding to the qemu_ld/st IR, not the return address.
>>> In my LDST optimization, the helper call site is not in the code
>>> emitted for the IR but at the end of the TB.
>> GETPC() uses the return address to determine the call site, and as long
>> as the code at the end of the TB sets a return address corresponding to
>> one of the fast path instructions, tb_find_pc() will be able to find
>> the correct instruction.
>>
>> That implies that at least one instruction at the end of the generated
>> code is shared between the slow path and the fast path, but on the other
>> hand it avoids having two different kinds of MMU helpers.
>>
>
> How about using a nop instruction at the end of the fast path as the 
> return address of the helper?
> That means changing "call helper" to "push addr of nop" followed by 
> "jmp helper".
> Although I still need to check the feasibility, it is expected to avoid 
> helper fragmentation (two variants of each MMU helper) and to keep the 
> performance degradation minimal.
>
>

I've done some tests on the performance degradation when a nop 
instruction is inserted into the qemu_ld/st fast path.
The result is fine: I did not find any notable performance degradation.
I'll post a new version of the patches that leaves the MMU helpers' 
declarations unchanged soon.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-10-10 14:09 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-09 12:37 [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Yeongkyoon Lee
2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 1/3] configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization Yeongkyoon Lee
2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 2/3] tcg: Add declarations and templates of extended MMU helpers Yeongkyoon Lee
2012-10-09 18:36   ` Richard Henderson
2012-10-10 11:04     ` Yeongkyoon Lee
2012-10-09 12:37 ` [Qemu-devel] [PATCH v5 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block Yeongkyoon Lee
2012-10-09 18:49   ` Richard Henderson
2012-10-10  4:41     ` Yeongkyoon Lee
2012-10-09 14:26 ` [Qemu-devel] [PATCH v5 0/3] tcg: enhance code generation quality for qemu_ld/st IRs Aurelien Jarno
2012-10-09 16:19   ` Aurelien Jarno
2012-10-09 16:55     ` Paolo Bonzini
2012-10-09 17:09       ` Aurelien Jarno
2012-10-10  4:17         ` Yeongkyoon Lee
2012-10-10  6:45           ` Aurelien Jarno
2012-10-10 10:34             ` Yeongkyoon Lee
2012-10-10 14:09               ` Yeongkyoon Lee
