From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org, iii@linux.ibm.com
Subject: [PATCH v2 00/33] accel/tcg + target/arm: pc-relative translation
Date: Tue, 16 Aug 2022 15:33:27 -0500
Message-ID: <20220816203400.161187-1-richard.henderson@linaro.org>

Supersedes: 20220812180806.2128593-1-richard.henderson@linaro.org
("accel/tcg: minimize tlb lookups during translate + user-only PROT_EXEC fixes")

A few changes to the PROT_EXEC work that I posted last week, and
then continuing to the main event.

My initial goal was to reduce the overhead of TB flushing, which
Alex Bennee identified as a significant issue with respect to
booting AArch64 kernels under avocado.  Our initial guess was that
we needed a more efficient data structure for walking TBs associated
with a physical page.

While I was looking at some of those numbers, I noted that we were
seeing up to 16000 TBs attached to a single page, which is far more
than I expected to see, and means that a new data structure isn't
going to help as much as simply reducing the number of translations.

It turns out the retranslation is due to the guest kernel's userland
address space randomization.  Each process gets e.g. libc mapped at
a different virtual address, which causes a new translation of the
same physical page.

This, then, introduces some infrastructure for writing "pc-relative"
translation blocks, in which the guest pc is treated as a variable
just like any other guest cpu register.  The hashing for these TBs
is adjusted to compare the physical address.  The target/arm backend
is adjusted to use the new feature.
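
To illustrate the lookup change described above, here is a minimal toy
model.  It is not the code from this series: ToyTB, toy_tb_match(),
toy_tb_lookup() and toy_count_translations() are hypothetical names
invented for the sketch, and the real comparison checks more state than
is shown here.  The assumption is that a pc-relative TB matches on
physical address and flags alone, while an ordinary TB must also match
the virtual pc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these types and names are hypothetical, not QEMU's. */
typedef struct ToyTB {
    uint64_t pc;       /* guest virtual pc; ignored when pcrel is set */
    uint64_t phys_pc;  /* guest physical address of the first insn */
    uint32_t flags;    /* cpu state the translation depends on */
    bool pcrel;        /* true if the TB does not bake in the virtual pc */
} ToyTB;

/* A pc-relative TB matches on (phys_pc, flags); others also need pc. */
static bool toy_tb_match(const ToyTB *tb, uint64_t pc,
                         uint64_t phys_pc, uint32_t flags)
{
    if (tb->phys_pc != phys_pc || tb->flags != flags) {
        return false;
    }
    return tb->pcrel || tb->pc == pc;
}

/* Linear scan standing in for the real hash table lookup. */
static const ToyTB *toy_tb_lookup(const ToyTB *tbs, size_t n, uint64_t pc,
                                  uint64_t phys_pc, uint32_t flags)
{
    for (size_t i = 0; i < n; i++) {
        if (toy_tb_match(&tbs[i], pc, phys_pc, flags)) {
            return &tbs[i];
        }
    }
    return NULL;
}

#define TOY_MAX_TBS 16

/*
 * Count how many translations are created when the same physical code
 * is executed from each of the n virtual addresses in vaddrs.
 */
static size_t toy_count_translations(bool pcrel, const uint64_t *vaddrs,
                                     size_t n, uint64_t phys_pc)
{
    ToyTB cache[TOY_MAX_TBS];
    size_t used = 0;

    assert(n <= TOY_MAX_TBS);
    for (size_t i = 0; i < n; i++) {
        if (!toy_tb_lookup(cache, used, vaddrs[i], phys_pc, 0)) {
            /* Miss: "translate" and remember the new TB. */
            cache[used++] = (ToyTB){ .pc = vaddrs[i], .phys_pc = phys_pc,
                                     .flags = 0, .pcrel = pcrel };
        }
    }
    return used;
}
```

With three processes mapping the same physical page at three different
virtual addresses, toy_count_translations(false, ...) returns 3 and
toy_count_translations(true, ...) returns 1, which is the effect the
series is after.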

This does result in a significant reduction in translation.  From the
BootLinuxAarch64.test_virt_tcg_gicv2 test, at the login prompt:

    Before:

    gen code size       160684739/1073736704
    TB count            289808
    TB flush count      1
    TB invalidate count 235143

    After:

    gen code size       277992547/1073736704
    TB count            503882
    TB flush count      0
    TB invalidate count 69282

Before TARGET_TB_PCREL, we generate approximately 1.1GB of TBs
(overflow the 1GB buffer, flush, and fill 153MB again).  Afterward,
we only generate 265MB of TBs.
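
The arithmetic behind these figures can be checked with a short
sketch.  It assumes, as the parenthetical above suggests, that each TB
flush discards one completely full code buffer, so the total generated
is flush count * buffer size + current gen code size.  The helper names
are hypothetical; the inputs are the numbers from the statistics above.

```c
#include <stdint.h>

/* Hypothetical helpers for checking the cover-letter arithmetic. */

/* Whole MiB in a byte count (rounds down). */
static uint64_t toy_bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}

/* Assumption: every flush threw away one completely full buffer. */
static uint64_t toy_total_generated(uint64_t flushes, uint64_t buf_size,
                                    uint64_t cur_size)
{
    return flushes * buf_size + cur_size;
}
```

Before: toy_bytes_to_mib(toy_total_generated(1, 1073736704, 160684739))
is 1177, i.e. roughly 1.1GB, of which the current fill is 153MB.
After: toy_bytes_to_mib(toy_total_generated(0, 1073736704, 277992547))
is 265.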

Surprisingly, this does not affect wall-clock times nearly as
much as I would have expected:

                                       before   after   change
 BootLinuxAarch64.test_virt_tcg_gicv2:  97.35    85.11   -12%
 BootLinuxAarch64.test_virt_tcg_gicv3: 102.75    96.87    -5%

Change in profile, top 10 entries before, matched up with after:

  before                                                           after
   9.01%  qemu-system-aar  [.] helper_lookup_tb_ptr                10.67%
   4.92%  qemu-system-aar  [.] qht_lookup_custom                    5.06%
   4.79%  qemu-system-aar  [.] get_phys_addr_lpae                   5.24%
   2.57%  qemu-system-aar  [.] address_space_ldq_le                 2.77%
   2.33%  qemu-system-aar  [.] liveness_pass_1                      0.60%
   2.24%  qemu-system-aar  [.] cpu_get_tb_cpu_state                 2.58%
   1.76%  qemu-system-aar  [.] address_space_translate_internal     1.75%
   1.71%  qemu-system-aar  [.] tb_lookup_cmp                        1.92%
   1.65%  qemu-system-aar  [.] tcg_gen_code                         0.44%
   1.64%  qemu-system-aar  [.] do_tb_phys_invalidate                0.09%


r~


Ilya Leoshkevich (1):
  accel/tcg: Introduce is_same_page()

Richard Henderson (32):
  linux-user/arm: Mark the commpage executable
  linux-user/hppa: Allocate page zero as a commpage
  linux-user/x86_64: Allocate vsyscall page as a commpage
  linux-user: Honor PT_GNU_STACK
  tests/tcg/i386: Move smc_code2 to an executable section
  accel/tcg: Remove PageDesc code_bitmap
  accel/tcg: Use bool for page_find_alloc
  accel/tcg: Make tb_htable_lookup static
  accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c
  accel/tcg: Properly implement get_page_addr_code for user-only
  accel/tcg: Use probe_access_internal for softmmu
    get_page_addr_code_hostp
  accel/tcg: Add nofault parameter to get_page_addr_code_hostp
  accel/tcg: Unlock mmap_lock after longjmp
  accel/tcg: Raise PROT_EXEC exception early
  accel/tcg: Remove translator_ldsw
  accel/tcg: Add pc and host_pc params to gen_intermediate_code
  accel/tcg: Add fast path for translator_ld*
  accel/tcg: Use DisasContextBase in plugin_gen_tb_start
  accel/tcg: Do not align tb->page_addr[0]
  include/hw/core: Create struct CPUJumpCache
  accel/tcg: Introduce tb_pc and tb_pc_log
  accel/tcg: Introduce TARGET_TB_PCREL
  accel/tcg: Split log_cpu_exec into inline and slow path
  target/arm: Introduce curr_insn_len
  target/arm: Change gen_goto_tb to work on displacements
  target/arm: Change gen_*set_pc_im to gen_*update_pc
  target/arm: Change gen_exception_insn* to work on displacements
  target/arm: Change gen_exception_internal to work on displacements
  target/arm: Change gen_jmp* to work on displacements
  target/arm: Introduce gen_pc_plus_diff for aarch64
  target/arm: Introduce gen_pc_plus_diff for aarch32
  target/arm: Enable TARGET_TB_PCREL

 include/elf.h                           |   1 +
 include/exec/cpu-common.h               |   1 +
 include/exec/cpu-defs.h                 |   3 +
 include/exec/exec-all.h                 | 138 +++++++-------
 include/exec/plugin-gen.h               |   7 +-
 include/exec/translator.h               |  85 +++++++--
 include/hw/core/cpu.h                   |   9 +-
 linux-user/arm/target_cpu.h             |   4 +-
 linux-user/qemu.h                       |   1 +
 target/arm/cpu-param.h                  |   2 +
 target/arm/translate-a32.h              |   2 +-
 target/arm/translate.h                  |  21 ++-
 accel/tcg/cpu-exec.c                    | 222 +++++++++++++---------
 accel/tcg/cputlb.c                      |  98 +++-------
 accel/tcg/plugin-gen.c                  |  23 +--
 accel/tcg/translate-all.c               | 197 +++++++-------------
 accel/tcg/translator.c                  | 122 +++++++++---
 accel/tcg/user-exec.c                   |  15 ++
 linux-user/elfload.c                    |  81 +++++++-
 softmmu/physmem.c                       |  12 ++
 target/alpha/translate.c                |   5 +-
 target/arm/cpu.c                        |  23 +--
 target/arm/translate-a64.c              | 174 ++++++++++-------
 target/arm/translate-m-nocp.c           |   6 +-
 target/arm/translate-mve.c              |   2 +-
 target/arm/translate-vfp.c              |  10 +-
 target/arm/translate.c                  | 237 +++++++++++++++---------
 target/avr/cpu.c                        |   2 +-
 target/avr/translate.c                  |   5 +-
 target/cris/translate.c                 |   5 +-
 target/hexagon/cpu.c                    |   2 +-
 target/hexagon/translate.c              |   6 +-
 target/hppa/cpu.c                       |   4 +-
 target/hppa/translate.c                 |   5 +-
 target/i386/tcg/tcg-cpu.c               |   2 +-
 target/i386/tcg/translate.c             |   7 +-
 target/loongarch/cpu.c                  |   2 +-
 target/loongarch/translate.c            |   6 +-
 target/m68k/translate.c                 |   5 +-
 target/microblaze/cpu.c                 |   2 +-
 target/microblaze/translate.c           |   5 +-
 target/mips/tcg/exception.c             |   2 +-
 target/mips/tcg/sysemu/special_helper.c |   2 +-
 target/mips/tcg/translate.c             |   5 +-
 target/nios2/translate.c                |   5 +-
 target/openrisc/cpu.c                   |   2 +-
 target/openrisc/translate.c             |   6 +-
 target/ppc/translate.c                  |   5 +-
 target/riscv/cpu.c                      |   4 +-
 target/riscv/translate.c                |   5 +-
 target/rx/cpu.c                         |   2 +-
 target/rx/translate.c                   |   5 +-
 target/s390x/tcg/translate.c            |   5 +-
 target/sh4/cpu.c                        |   4 +-
 target/sh4/translate.c                  |   5 +-
 target/sparc/cpu.c                      |   2 +-
 target/sparc/translate.c                |   5 +-
 target/tricore/cpu.c                    |   2 +-
 target/tricore/translate.c              |   6 +-
 target/xtensa/translate.c               |   6 +-
 tcg/tcg.c                               |   6 +-
 tests/tcg/i386/test-i386.c              |   2 +-
 62 files changed, 979 insertions(+), 666 deletions(-)

-- 
2.34.1



Thread overview: 36+ messages
2022-08-16 20:33 Richard Henderson [this message]
2022-08-16 20:33 ` [PATCH v2 01/33] linux-user/arm: Mark the commpage executable Richard Henderson
2022-08-16 20:33 ` [PATCH v2 02/33] linux-user/hppa: Allocate page zero as a commpage Richard Henderson
2022-08-16 20:33 ` [PATCH v2 03/33] linux-user/x86_64: Allocate vsyscall page as a commpage Richard Henderson
2022-08-17 11:50   ` Ilya Leoshkevich
2022-08-16 20:33 ` [PATCH v2 04/33] linux-user: Honor PT_GNU_STACK Richard Henderson
2022-08-16 20:33 ` [PATCH v2 05/33] tests/tcg/i386: Move smc_code2 to an executable section Richard Henderson
2022-08-16 20:33 ` [PATCH v2 06/33] accel/tcg: Remove PageDesc code_bitmap Richard Henderson
2022-08-16 20:33 ` [PATCH v2 07/33] accel/tcg: Use bool for page_find_alloc Richard Henderson
2022-08-16 20:33 ` [PATCH v2 08/33] accel/tcg: Make tb_htable_lookup static Richard Henderson
2022-08-16 20:33 ` [PATCH v2 09/33] accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c Richard Henderson
2022-08-16 20:33 ` [PATCH v2 10/33] accel/tcg: Properly implement get_page_addr_code for user-only Richard Henderson
2022-08-16 20:33 ` [PATCH v2 11/33] accel/tcg: Use probe_access_internal for softmmu get_page_addr_code_hostp Richard Henderson
2022-08-16 20:33 ` [PATCH v2 12/33] accel/tcg: Add nofault parameter to get_page_addr_code_hostp Richard Henderson
2022-08-16 20:33 ` [PATCH v2 13/33] accel/tcg: Unlock mmap_lock after longjmp Richard Henderson
2022-08-16 20:33 ` [PATCH v2 14/33] accel/tcg: Raise PROT_EXEC exception early Richard Henderson
2022-08-16 20:33 ` [PATCH v2 15/33] accel/tcg: Introduce is_same_page() Richard Henderson
2022-08-16 20:33 ` [PATCH v2 16/33] accel/tcg: Remove translator_ldsw Richard Henderson
2022-08-16 20:33 ` [PATCH v2 17/33] accel/tcg: Add pc and host_pc params to gen_intermediate_code Richard Henderson
2022-08-16 20:33 ` [PATCH v2 18/33] accel/tcg: Add fast path for translator_ld* Richard Henderson
2022-08-16 20:33 ` [PATCH v2 19/33] accel/tcg: Use DisasContextBase in plugin_gen_tb_start Richard Henderson
2022-08-16 20:33 ` [PATCH v2 20/33] accel/tcg: Do not align tb->page_addr[0] Richard Henderson
2022-08-16 20:33 ` [PATCH v2 21/33] include/hw/core: Create struct CPUJumpCache Richard Henderson
2022-08-16 20:33 ` [PATCH v2 22/33] accel/tcg: Introduce tb_pc and tb_pc_log Richard Henderson
2022-08-16 20:33 ` [PATCH v2 23/33] accel/tcg: Introduce TARGET_TB_PCREL Richard Henderson
2022-08-16 20:33 ` [PATCH v2 24/33] accel/tcg: Split log_cpu_exec into inline and slow path Richard Henderson
2022-08-16 20:33 ` [PATCH v2 25/33] target/arm: Introduce curr_insn_len Richard Henderson
2022-08-16 20:33 ` [PATCH v2 26/33] target/arm: Change gen_goto_tb to work on displacements Richard Henderson
2022-08-16 20:33 ` [PATCH v2 27/33] target/arm: Change gen_*set_pc_im to gen_*update_pc Richard Henderson
2022-08-16 20:33 ` [PATCH v2 28/33] target/arm: Change gen_exception_insn* to work on displacements Richard Henderson
2022-08-16 20:33 ` [PATCH v2 29/33] target/arm: Change gen_exception_internal to work on displacements Richard Henderson
2022-08-16 20:33 ` [PATCH v2 30/33] target/arm: Change gen_jmp* to work on displacements Richard Henderson
2022-08-16 20:33 ` [PATCH v2 31/33] target/arm: Introduce gen_pc_plus_diff for aarch64 Richard Henderson
2022-08-16 20:33 ` [PATCH v2 32/33] target/arm: Introduce gen_pc_plus_diff for aarch32 Richard Henderson
2022-08-16 20:34 ` [PATCH v2 33/33] target/arm: Enable TARGET_TB_PCREL Richard Henderson
2022-08-16 20:41   ` Richard Henderson
