From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org, alex.bennee@linaro.org, iii@linux.ibm.com
Subject: [PATCH v2 00/33] accel/tcg + target/arm: pc-relative translation
Date: Tue, 16 Aug 2022 15:33:27 -0500
Message-ID: <20220816203400.161187-1-richard.henderson@linaro.org>

Supersedes: 20220812180806.2128593-1-richard.henderson@linaro.org
("accel/tcg: minimize tlb lookups during translate + user-only PROT_EXEC fixes")

This contains a few changes to the PROT_EXEC work that I posted last
week, and then continues on to the main event.

My initial goal was to reduce the overhead of TB (translation block)
flushing, which Alex Bennee identified as a significant issue when
booting AArch64 kernels under Avocado. Our initial guess was that we
needed a more efficient data structure for walking the TBs associated
with a physical page.

While looking at some of those numbers, I noticed that we were seeing
up to 16000 TBs attached to a single page. That is far more than I
expected, and it means that a new data structure will not help nearly
as much as simply reducing the number of translations.
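
To see why so many TBs on one page hurts, consider a simplified model in which each physical page keeps a linked list of the TBs whose code lies on it. Invalidating the page must then touch every entry, so a page carrying 16000 TBs costs 16000 steps per invalidation however cleverly the TBs are found. This is a hypothetical sketch: `TB`, `PageTBList`, `page_add_tb`, `page_invalidate_all` and `demo_invalidate` are illustrative names, not QEMU's actual PageDesc code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified translation block: only what this sketch needs. */
typedef struct TB {
    unsigned long phys_pc;  /* physical address of the first guest insn */
    int valid;              /* cleared when the TB is invalidated */
    struct TB *page_next;   /* next TB whose code lies on the same page */
} TB;

/* Hypothetical per-page list head; QEMU's real PageDesc differs. */
typedef struct PageTBList {
    TB *first;
} PageTBList;

/* Linking a new TB onto its page is O(1). */
static void page_add_tb(PageTBList *page, TB *tb)
{
    assert(tb != NULL);
    tb->page_next = page->first;
    page->first = tb;
}

/*
 * Invalidating a page (e.g. after a guest write to code there) must
 * visit every TB on it, so the cost is linear in the list length.
 * Returns the number of TBs visited.
 */
static size_t page_invalidate_all(PageTBList *page)
{
    size_t visited = 0;

    for (TB *tb = page->first; tb != NULL; tb = tb->page_next) {
        tb->valid = 0;
        visited++;
    }
    page->first = NULL;
    return visited;
}

/* Usage: put n TBs on one 4 KiB page, then invalidate the page. */
static size_t demo_invalidate(size_t n)
{
    PageTBList page = { NULL };
    TB *tbs = calloc(n ? n : 1, sizeof(*tbs));
    size_t visited;

    if (tbs == NULL) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        /* Many TBs can start at the same address (one per alias). */
        tbs[i].phys_pc = 0x40000000UL + (i % 1024) * 4;
        tbs[i].valid = 1;
        page_add_tb(&page, &tbs[i]);
    }
    visited = page_invalidate_all(&page);
    free(tbs);
    return visited;
}
```

In this model a cheaper lookup structure would not change the fact that every one of those TBs has to be visited; only having fewer TBs does.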
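
It turns out that the retranslation is caused by the guest kernel's
userland address space randomization. Each process gets, e.g., libc
mapped at a different virtual address, and since each TB is tied to
the virtual pc at which it was translated, every new mapping causes a
new translation of the same code.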
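
The effect can be modeled with a hypothetical lookup key. When a TB's generated code embeds absolute guest pcs, it can only be reused at the same virtual pc, so libc mapped at a different address in each process never matches an existing TB even though the physical page is the same. A pc-relative TB can instead be matched on its physical address and flags, ignoring the virtual pc. `TBKey`, `tb_match_absolute` and `tb_match_pcrel` are illustrative names, not QEMU's tb_lookup_cmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical TB lookup key; QEMU's real hash and compare differ. */
typedef struct TBKey {
    uint64_t virt_pc;  /* guest virtual pc of the first instruction */
    uint64_t phys_pc;  /* guest physical address of that instruction */
    uint32_t flags;    /* cpu state bits that affect translation */
} TBKey;

/* A TB that embeds absolute pcs is only valid at its own virt_pc. */
static bool tb_match_absolute(const TBKey *tb, const TBKey *want)
{
    return tb->virt_pc == want->virt_pc
        && tb->phys_pc == want->phys_pc
        && tb->flags == want->flags;
}

/* A pc-relative TB reads the pc at run time, so virt_pc is ignored. */
static bool tb_match_pcrel(const TBKey *tb, const TBKey *want)
{
    return tb->phys_pc == want->phys_pc
        && tb->flags == want->flags;
}
```

For example, two processes mapping the same physical libc page at 0x7f12345671a0 and 0x7fabcde451a0 (randomization is page-granular, so the in-page offset 0x1a0 is the same) produce keys that differ only in virt_pc: the absolute match fails and forces a retranslation, while the pc-relative match succeeds.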

This series, then, introduces some infrastructure for writing
"pc-relative" translation blocks, in which the guest pc is treated as
a variable just like any other guest cpu register. The hashing for
these TBs is adjusted to compare the physical address rather than the
virtual pc. The target/arm backend is adjusted to use the new feature.
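
A toy model of the difference: an absolute translation of a branch bakes the destination address in at translation time, while a pc-relative translation stores only the displacement and adds it to the pc held in the cpu state at run time, exactly as it would read any other register. `GuestCPU`, `ToyBranch`, `translate_branch` and `exec_branch` are hypothetical names; this is not TCG code, though it is meant to mirror the idea behind the series' "work on displacements" and gen_pc_plus_diff patches for Arm.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guest cpu state: the pc is just another register. */
typedef struct GuestCPU {
    uint64_t pc;
} GuestCPU;

/* Hypothetical translated branch; which field is used depends on pcrel. */
typedef struct ToyBranch {
    int pcrel;          /* nonzero: pc-relative translation */
    uint64_t abs_dest;  /* absolute: destination fixed at translate time */
    int64_t disp;       /* pc-relative: displacement from the branch */
} ToyBranch;

/* "Translate" a branch found at virt_pc that jumps by disp bytes. */
static ToyBranch translate_branch(uint64_t virt_pc, int64_t disp, int pcrel)
{
    ToyBranch b = { pcrel, 0, 0 };

    if (pcrel) {
        b.disp = disp;                          /* no address baked in */
    } else {
        b.abs_dest = virt_pc + (uint64_t)disp;  /* only valid at virt_pc */
    }
    return b;
}

/* "Execute" the translated branch against the current cpu state. */
static void exec_branch(const ToyBranch *b, GuestCPU *cpu)
{
    if (b->pcrel) {
        cpu->pc += (uint64_t)b->disp;  /* pc read at run time; wraps mod 2^64 */
    } else {
        cpu->pc = b->abs_dest;
    }
}
```

If a block translated at 0x400000 is reused for a copy of the same code mapped at 0x7f0000400000, the pc-relative branch lands at 0x7f0000400040, while the absolute one still jumps to 0x400040; that is why absolute TBs must be retranslated for each mapping.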

This does result in a significant reduction in translation. These are
the statistics from the BootLinuxAarch64.test_virt_tcg_gicv2 test,
taken at the login prompt:

                         Before                  After
    gen code size        160684739/1073736704    277992547/1073736704
    TB count             289808                  503882
    TB flush count       1                       0
    TB invalidate count  235143                  69282

Before TARGET_TB_PCREL, we generate approximately 1.1GB of TBs: we
overflow the 1GB code buffer, flush it, and then fill another 153MB.
With TARGET_TB_PCREL, we generate only 265MB of TBs.
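
These sizes follow from the "gen code size" counters, which report bytes in use against the code buffer's capacity. A minimal sketch of the arithmetic, assuming the buffer was effectively full at the single flush (an approximation, since the flush happens once the next TB no longer fits):

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_MIB (1024.0 * 1024.0)

/* From the "gen code size" lines above: bytes used / buffer capacity. */
static const uint64_t code_buffer_size = 1073736704;  /* ~1 GiB */
static const uint64_t before_used = 160684739;        /* after 1 flush */
static const uint64_t after_used = 277992547;         /* no flush */

/*
 * Estimated total bytes generated: one full buffer per flush plus the
 * current fill.  Counting each flushed buffer as completely full is an
 * approximation.
 */
static uint64_t total_generated(uint64_t used, unsigned flushes)
{
    return (uint64_t)flushes * code_buffer_size + used;
}

/* Convert a byte count to MiB. */
static double to_mib(uint64_t bytes)
{
    return (double)bytes / BYTES_PER_MIB;
}
```

This gives about 153 MiB in use before, or an estimated 1177 MiB (roughly 1.15 GiB) in total once the flushed buffer is counted, against about 265 MiB after: roughly a 4.4x reduction in generated code.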

Surprisingly, this does not affect wall-clock times nearly as
much as I would have expected:

                                           before   after   change
    BootLinuxAarch64.test_virt_tcg_gicv2:   97.35   85.11     -12%
    BootLinuxAarch64.test_virt_tcg_gicv3:  102.75   96.87      -5%
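
The change column can be reproduced with simple arithmetic. The exact values are about -12.6% and -5.7%, which match the table's whole-percent figures when truncated toward zero. A minimal sketch:

```c
#include <assert.h>

/* Percentage change from before to after; negative means faster. */
static double pct_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For instance, pct_change(97.35, 85.11) is about -12.57 and pct_change(102.75, 96.87) is about -5.72.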

Change in profile, top 10 entries before, matched up with after:

    before   after  command          symbol
     9.01%  10.67%  qemu-system-aar  [.] helper_lookup_tb_ptr
     4.92%   5.06%  qemu-system-aar  [.] qht_lookup_custom
     4.79%   5.24%  qemu-system-aar  [.] get_phys_addr_lpae
     2.57%   2.77%  qemu-system-aar  [.] address_space_ldq_le
     2.33%   0.60%  qemu-system-aar  [.] liveness_pass_1
     2.24%   2.58%  qemu-system-aar  [.] cpu_get_tb_cpu_state
     1.76%   1.75%  qemu-system-aar  [.] address_space_translate_internal
     1.71%   1.92%  qemu-system-aar  [.] tb_lookup_cmp
     1.65%   0.44%  qemu-system-aar  [.] tcg_gen_code
     1.64%   0.09%  qemu-system-aar  [.] do_tb_phys_invalidate
r~
Ilya Leoshkevich (1):
accel/tcg: Introduce is_same_page()
Richard Henderson (32):
linux-user/arm: Mark the commpage executable
linux-user/hppa: Allocate page zero as a commpage
linux-user/x86_64: Allocate vsyscall page as a commpage
linux-user: Honor PT_GNU_STACK
tests/tcg/i386: Move smc_code2 to an executable section
accel/tcg: Remove PageDesc code_bitmap
accel/tcg: Use bool for page_find_alloc
accel/tcg: Make tb_htable_lookup static
accel/tcg: Move qemu_ram_addr_from_host_nofail to physmem.c
accel/tcg: Properly implement get_page_addr_code for user-only
accel/tcg: Use probe_access_internal for softmmu get_page_addr_code_hostp
accel/tcg: Add nofault parameter to get_page_addr_code_hostp
accel/tcg: Unlock mmap_lock after longjmp
accel/tcg: Raise PROT_EXEC exception early
accel/tcg: Remove translator_ldsw
accel/tcg: Add pc and host_pc params to gen_intermediate_code
accel/tcg: Add fast path for translator_ld*
accel/tcg: Use DisasContextBase in plugin_gen_tb_start
accel/tcg: Do not align tb->page_addr[0]
include/hw/core: Create struct CPUJumpCache
accel/tcg: Introduce tb_pc and tb_pc_log
accel/tcg: Introduce TARGET_TB_PCREL
accel/tcg: Split log_cpu_exec into inline and slow path
target/arm: Introduce curr_insn_len
target/arm: Change gen_goto_tb to work on displacements
target/arm: Change gen_*set_pc_im to gen_*update_pc
target/arm: Change gen_exception_insn* to work on displacements
target/arm: Change gen_exception_internal to work on displacements
target/arm: Change gen_jmp* to work on displacements
target/arm: Introduce gen_pc_plus_diff for aarch64
target/arm: Introduce gen_pc_plus_diff for aarch32
target/arm: Enable TARGET_TB_PCREL
include/elf.h | 1 +
include/exec/cpu-common.h | 1 +
include/exec/cpu-defs.h | 3 +
include/exec/exec-all.h | 138 +++++++-------
include/exec/plugin-gen.h | 7 +-
include/exec/translator.h | 85 +++++++--
include/hw/core/cpu.h | 9 +-
linux-user/arm/target_cpu.h | 4 +-
linux-user/qemu.h | 1 +
target/arm/cpu-param.h | 2 +
target/arm/translate-a32.h | 2 +-
target/arm/translate.h | 21 ++-
accel/tcg/cpu-exec.c | 222 +++++++++++++---------
accel/tcg/cputlb.c | 98 +++-------
accel/tcg/plugin-gen.c | 23 +--
accel/tcg/translate-all.c | 197 +++++++-------------
accel/tcg/translator.c | 122 +++++++++---
accel/tcg/user-exec.c | 15 ++
linux-user/elfload.c | 81 +++++++-
softmmu/physmem.c | 12 ++
target/alpha/translate.c | 5 +-
target/arm/cpu.c | 23 +--
target/arm/translate-a64.c | 174 ++++++++++-------
target/arm/translate-m-nocp.c | 6 +-
target/arm/translate-mve.c | 2 +-
target/arm/translate-vfp.c | 10 +-
target/arm/translate.c | 237 +++++++++++++++---------
target/avr/cpu.c | 2 +-
target/avr/translate.c | 5 +-
target/cris/translate.c | 5 +-
target/hexagon/cpu.c | 2 +-
target/hexagon/translate.c | 6 +-
target/hppa/cpu.c | 4 +-
target/hppa/translate.c | 5 +-
target/i386/tcg/tcg-cpu.c | 2 +-
target/i386/tcg/translate.c | 7 +-
target/loongarch/cpu.c | 2 +-
target/loongarch/translate.c | 6 +-
target/m68k/translate.c | 5 +-
target/microblaze/cpu.c | 2 +-
target/microblaze/translate.c | 5 +-
target/mips/tcg/exception.c | 2 +-
target/mips/tcg/sysemu/special_helper.c | 2 +-
target/mips/tcg/translate.c | 5 +-
target/nios2/translate.c | 5 +-
target/openrisc/cpu.c | 2 +-
target/openrisc/translate.c | 6 +-
target/ppc/translate.c | 5 +-
target/riscv/cpu.c | 4 +-
target/riscv/translate.c | 5 +-
target/rx/cpu.c | 2 +-
target/rx/translate.c | 5 +-
target/s390x/tcg/translate.c | 5 +-
target/sh4/cpu.c | 4 +-
target/sh4/translate.c | 5 +-
target/sparc/cpu.c | 2 +-
target/sparc/translate.c | 5 +-
target/tricore/cpu.c | 2 +-
target/tricore/translate.c | 6 +-
target/xtensa/translate.c | 6 +-
tcg/tcg.c | 6 +-
tests/tcg/i386/test-i386.c | 2 +-
62 files changed, 979 insertions(+), 666 deletions(-)
--
2.34.1