All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10
@ 2017-04-26  6:23 Emilio G. Cota
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper Emilio G. Cota
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

This is the v3 of this series. v2 for context:
  https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04342.html

Changes from v2 -- all due to Richard's comments:

- Inlined tb_from_jmp_cache into the TCG helper; now that this helper
  is common there is no point in having a separate function for this.

- Compile-tested for all targets.

- Dropped the TCG alignment patches for now.

- Added reviewed-by tags.

- Removed stray goto_tb from tcg_dump_ops. (That was a leftover
  that should have been removed long ago.)

- merged helper_lookup_tb_ptr and goto_ptr into gen_lookup_and_goto_ptr.
  This makes the calling code simpler and also prunes out the lookup
  when goto_ptr isn't implemented by the backend.

- tcg/i386: add the "pre-epilogue" instruction that sets eax to 0,
  instead of setting eax to 0 before every goto_ptr.

- return "r" instead of "ri" for goto_ptr's op def. (*ouch!*)

- target/arm: extend to tl before calling the lookup_and_goto_ptr
  helper, thereby fixing aarch64-* target compilation.

Also re-ran all measurements, and dropped the noinline measurements
since now there's only a single (inlined) caller.
You can inspect/fetch the changes at:
  https://github.com/cota/qemu/tree/tcg-opt-v3

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  2017-04-26  7:50   ` Paolo Bonzini
                     ` (2 more replies)
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 02/10] tcg: introduce goto_ptr opcode Emilio G. Cota
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

This paves the way for upcoming work.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg-runtime.c     | 21 +++++++++++++++++++++
 tcg/tcg-runtime.h |  2 ++
 tcg/tcg.h         |  1 +
 3 files changed, 24 insertions(+)

diff --git a/tcg-runtime.c b/tcg-runtime.c
index 4c60c96..90d2d4b 100644
--- a/tcg-runtime.c
+++ b/tcg-runtime.c
@@ -27,6 +27,7 @@
 #include "exec/helper-proto.h"
 #include "exec/cpu_ldst.h"
 #include "exec/exec-all.h"
+#include "exec/tb-hash.h"
 
 /* 32-bit helpers */
 
@@ -141,6 +142,26 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
     return ctpop64(arg);
 }
 
+void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
+{
+    CPUState *cpu = ENV_GET_CPU(env);
+    TranslationBlock *tb;
+    target_ulong cs_base, pc;
+    uint32_t flags;
+
+    if (unlikely(atomic_read(&cpu->exit_request))) {
+        goto out_epilogue;
+    }
+    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
+    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
+               tb->flags == flags)) {
+        return tb->tc_ptr;
+    }
+ out_epilogue:
+    return tcg_ctx.code_gen_epilogue;
+}
+
 void HELPER(exit_atomic)(CPUArchState *env)
 {
     cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
index 114ea6f..c41d38a 100644
--- a/tcg/tcg-runtime.h
+++ b/tcg/tcg-runtime.h
@@ -24,6 +24,8 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
 
+DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl)
+
 DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
 
 #ifdef CONFIG_SOFTMMU
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 6c216bb..5ec48d1 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -699,6 +699,7 @@ struct TCGContext {
        extension that allows arithmetic on void*.  */
     int code_gen_max_blocks;
     void *code_gen_prologue;
+    void *code_gen_epilogue;
     void *code_gen_buffer;
     size_t code_gen_buffer_size;
     void *code_gen_ptr;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 02/10] tcg: introduce goto_ptr opcode
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  2017-04-26  8:30   ` Richard Henderson
  2017-04-26 12:12   ` Richard Henderson
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 03/10] tcg: export tcg_gen_lookup_and_goto_ptr Emilio G. Cota
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/aarch64/tcg-target.h | 1 +
 tcg/arm/tcg-target.h     | 1 +
 tcg/i386/tcg-target.h    | 1 +
 tcg/ia64/tcg-target.h    | 1 +
 tcg/mips/tcg-target.h    | 1 +
 tcg/ppc/tcg-target.h     | 1 +
 tcg/s390/tcg-target.h    | 1 +
 tcg/sparc/tcg-target.h   | 1 +
 tcg/tcg-opc.h            | 1 +
 tcg/tci/tcg-target.h     | 1 +
 10 files changed, 10 insertions(+)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 1a5ea23..b82eac4 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -77,6 +77,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_extrl_i64_i32    0
 #define TCG_TARGET_HAS_extrh_i64_i32    0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #define TCG_TARGET_HAS_div_i64          1
 #define TCG_TARGET_HAS_rem_i64          1
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 09a19c6..2f3ecfd 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -123,6 +123,7 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_div_i32          use_idiv_instructions
 #define TCG_TARGET_HAS_rem_i32          0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 enum {
     TCG_AREG0 = TCG_REG_R6,
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 4275787..59d9835 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -107,6 +107,7 @@ extern bool have_popcnt;
 #define TCG_TARGET_HAS_muls2_i32        1
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_extrl_i64_i32    0
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 42aea03..901bb75 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -173,6 +173,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i64        0
 #define TCG_TARGET_HAS_extrl_i64_i32    0
 #define TCG_TARGET_HAS_extrh_i64_i32    0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index f46d64a..e3240cf 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -130,6 +130,7 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_muluh_i32        1
 #define TCG_TARGET_HAS_mulsh_i32        1
 #define TCG_TARGET_HAS_bswap32_i32      1
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index abd8b3d..a9aa974 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -82,6 +82,7 @@ extern bool have_isa_3_00;
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        1
 #define TCG_TARGET_HAS_mulsh_i32        1
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_add2_i32         0
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index cbdd2a6..6b7bcfb 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -92,6 +92,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_HAS_mulsh_i32      0
 #define TCG_TARGET_HAS_extrl_i64_i32  0
 #define TCG_TARGET_HAS_extrh_i64_i32  0
+#define TCG_TARGET_HAS_goto_ptr       0
 
 #define TCG_TARGET_HAS_div2_i64       1
 #define TCG_TARGET_HAS_rot_i64        1
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index b8b74f9..9348ddd 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -123,6 +123,7 @@ extern bool use_vis3_instructions;
 #define TCG_TARGET_HAS_muls2_i32        1
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #define TCG_TARGET_HAS_extrl_i64_i32    1
 #define TCG_TARGET_HAS_extrh_i64_i32    1
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index f06f894..c64b994 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -193,6 +193,7 @@ DEF(insn_start, 0, 0, TLADDR_ARGS * TARGET_INSN_START_WORDS,
     TCG_OPF_NOT_PRESENT)
 DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_END)
 DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_END)
+DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_END)
 
 DEF(qemu_ld_i32, 1, TLADDR_ARGS, 1,
     TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 838bf3a..0696328 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -85,6 +85,7 @@
 #define TCG_TARGET_HAS_muls2_i32        0
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
+#define TCG_TARGET_HAS_goto_ptr         0
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_extrl_i64_i32    0
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 03/10] tcg: export tcg_gen_lookup_and_goto_ptr
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper Emilio G. Cota
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 02/10] tcg: introduce goto_ptr opcode Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  2017-04-26  8:29   ` Richard Henderson
  2017-04-26 10:33   ` Alex Bennée
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 04/10] tcg/i386: implement goto_ptr op Emilio G. Cota
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

Instead of exporting goto_ptr directly to TCG frontends, export
tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
returned by the lookup_tb_ptr() helper. This is the only use case
we have for goto_ptr and lookup_tb_ptr, so having this function is
very convenient. Furthermore, it trivially allows us to avoid calling
the lookup helper if goto_ptr is not implemented by the backend.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/README   |  8 ++++++++
 tcg/tcg-op.c | 13 +++++++++++++
 tcg/tcg-op.h | 11 +++++++++++
 3 files changed, 32 insertions(+)

diff --git a/tcg/README b/tcg/README
index a9858c2..bf49e82 100644
--- a/tcg/README
+++ b/tcg/README
@@ -477,6 +477,14 @@ current TB was linked to this TB. Otherwise execute the next
 instructions. Only indices 0 and 1 are valid and tcg_gen_goto_tb may be issued
 at most once with each slot index per TB.
 
+* lookup_and_goto_ptr tb_addr
+
+Look up a TB address ('tb_addr') and jump to it if valid. If not valid,
+jump to the TCG epilogue to go back to the exec loop.
+
+This operation is optional. If the TCG backend does not implement the
+goto_ptr opcode, emitting this op is equivalent to emitting exit_tb(0).
+
 * qemu_ld_i32/i64 t0, t1, flags, memidx
 * qemu_st_i32/i64 t0, t1, flags, memidx
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 95a39b7..8ff1eaf 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2587,6 +2587,19 @@ void tcg_gen_goto_tb(unsigned idx)
     tcg_gen_op1i(INDEX_op_goto_tb, idx);
 }
 
+void tcg_gen_lookup_and_goto_ptr(TCGv addr)
+{
+    if (TCG_TARGET_HAS_goto_ptr) {
+        TCGv_ptr ptr = tcg_temp_new_ptr();
+
+        gen_helper_lookup_tb_ptr(ptr, tcg_ctx.tcg_env, addr);
+        tcg_gen_op1i(INDEX_op_goto_ptr, GET_TCGV_PTR(ptr));
+        tcg_temp_free_ptr(ptr);
+    } else {
+        tcg_gen_exit_tb(0);
+    }
+}
+
 static inline TCGMemOp tcg_canonicalize_memop(TCGMemOp op, bool is64, bool st)
 {
     /* Trigger the asserts within as early as possible.  */
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index c68e300..5d3278f 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -796,6 +796,17 @@ static inline void tcg_gen_exit_tb(uintptr_t val)
  */
 void tcg_gen_goto_tb(unsigned idx);
 
+/**
+ * tcg_gen_lookup_and_goto_ptr() - look up a TB and jump to it if valid
+ * @addr: Guest address of the target TB
+ *
+ * If the TB is not valid, jump to the epilogue.
+ *
+ * This operation is optional. If the TCG backend does not implement goto_ptr,
+ * this op is equivalent to calling tcg_gen_exit_tb() with 0 as the argument.
+ */
+void tcg_gen_lookup_and_goto_ptr(TCGv addr);
+
 #if TARGET_LONG_BITS == 32
 #define tcg_temp_new() tcg_temp_new_i32()
 #define tcg_global_reg_new tcg_global_reg_new_i32
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 04/10] tcg/i386: implement goto_ptr op
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
                   ` (2 preceding siblings ...)
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 03/10] tcg: export tcg_gen_lookup_and_goto_ptr Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  2017-04-26  8:28   ` Richard Henderson
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 05/10] target/arm: optimize cross-page direct jumps in softmmu Emilio G. Cota
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 tcg/i386/tcg-target.h     |  2 +-
 tcg/i386/tcg-target.inc.c | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 59d9835..73a15f7 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -107,7 +107,7 @@ extern bool have_popcnt;
 #define TCG_TARGET_HAS_muls2_i32        1
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
-#define TCG_TARGET_HAS_goto_ptr         0
+#define TCG_TARGET_HAS_goto_ptr         1
 
 #if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_HAS_extrl_i64_i32    0
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 5918008..d0bf53a 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1906,6 +1906,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         s->tb_jmp_reset_offset[a0] = tcg_current_code_size(s);
         break;
+    case INDEX_op_goto_ptr:
+        /* jmp to the given host address (could be epilogue) */
+        tcg_out_modrm(s, OPC_GRP5, EXT5_JMPN_Ev, a0);
+        break;
     case INDEX_op_br:
         tcg_out_jxx(s, JCC_JMP, arg_label(a0), 0);
         break;
@@ -2277,6 +2281,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
 static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 {
+    static const TCGTargetOpDef r = { .args_ct_str = { "r" } };
     static const TCGTargetOpDef ri_r = { .args_ct_str = { "ri", "r" } };
     static const TCGTargetOpDef re_r = { .args_ct_str = { "re", "r" } };
     static const TCGTargetOpDef qi_r = { .args_ct_str = { "qi", "r" } };
@@ -2299,6 +2304,9 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
         = { .args_ct_str = { "L", "L", "L", "L" } };
 
     switch (op) {
+    case INDEX_op_goto_ptr:
+        return &r;
+
     case INDEX_op_ld8u_i32:
     case INDEX_op_ld8u_i64:
     case INDEX_op_ld8s_i32:
@@ -2567,6 +2575,13 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out_modrm(s, OPC_GRP5, EXT5_JMPN_Ev, tcg_target_call_iarg_regs[1]);
 #endif
 
+    /*
+     * Return path for goto_ptr. Set return value to 0, a-la exit_tb,
+     * and fall through to the rest of the epilogue.
+     */
+    s->code_gen_epilogue = s->code_ptr;
+    tcg_out_movi(s, TCG_TYPE_REG, TCG_REG_EAX, 0);
+
     /* TB epilogue */
     tb_ret_addr = s->code_ptr;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 05/10] target/arm: optimize cross-page direct jumps in softmmu
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
                   ` (3 preceding siblings ...)
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 04/10] tcg/i386: implement goto_ptr op Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  2017-04-26  8:27   ` Richard Henderson
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 06/10] target/arm: optimize indirect branches Emilio G. Cota
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

Instead of unconditionally exiting to the exec loop, use the
lookup_and_goto_ptr helper to jump to the target if it is valid.

Perf impact: see next commit's log.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/arm/translate.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index e32e38c..02cad96 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4085,8 +4085,12 @@ static inline void gen_goto_tb(DisasContext *s, int n, target_ulong dest)
         gen_set_pc_im(s, dest);
         tcg_gen_exit_tb((uintptr_t)s->tb + n);
     } else {
+        TCGv addr = tcg_temp_new();
+
         gen_set_pc_im(s, dest);
-        tcg_gen_exit_tb(0);
+        tcg_gen_extu_i32_tl(addr, cpu_R[15]);
+        tcg_gen_lookup_and_goto_ptr(addr);
+        tcg_temp_free(addr);
     }
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 06/10] target/arm: optimize indirect branches
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
                   ` (4 preceding siblings ...)
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 05/10] target/arm: optimize cross-page direct jumps in softmmu Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  2017-04-26  7:54   ` Richard Henderson
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 07/10] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr Emilio G. Cota
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

Speed up indirect branches by jumping to the target if it is valid.

Softmmu measurements (see later commit for user-mode results):

Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.

- Impact on Boot time

| setup  | ARM debian jessie boot+shutdown time | stddev |
|--------+--------------------------------------+--------|
| v2.9.0 |                                 8.84 |   0.07 |
| +cross |                                 8.85 |   0.03 |
| +jr    |                                 8.83 |   0.06 |

-                            NBench, arm-softmmu (debian jessie guest). Host: Intel i7-4790K @ 4.00GHz

  1.3x +-+-------------------------------------------------------------------------------------------------------------+-+
       |                                                                                                                 |
       |   cross                                                          ####                                           |
 1.25x +cross+jr..........................................................#++#.........................................+-+
       |                                                        ####      #  #                                           |
       |                                                     +++#  #      #  #                                           |
       |                                      +++            ****  #      #  #                                           |
  1.2x +-+...................................####............*..*..#......#..#.........................................+-+
       |                                  ****  #            *  *  #      #  #     ####                                  |
       |                                  *  *  #            *  *  #      #  #     #  #                                  |
 1.15x +-+................................*..*..#............*..*..#......#..#.....#..#................................+-+
       |                                  *  *  #            *  *  #      #  #     #  #                                  |
       |                                  *  *  #      ####  *  *  #      #  #     #  #                                  |
       |                                  *  *  #      #  #  *  *  #      #  #     #  #                         ####     |
  1.1x +-+................................*..*..#......#..#..*..*..#......#..#.....#..#.........................#..#...+-+
       |                                  *  *  #      #  #  *  *  #      #  #     #  #                         #  #     |
       |                                  *  *  #      #  #  *  *  #      #  #     #  #                         #  #     |
 1.05x +-+..........................####..*..*..#......#..#..*..*..#......#..#.....#..#......+++............*****..#...+-+
       |                        *****  #  *  *  #      #  #  *  *  #  *****  #     #  #   +++ |    ****###  *   *  #     |
       |                        *+++*  #  *  *  #      #  #  *  *  #  *+++*  #  ****  #  *****###  *  *  #  *   *  #     |
       |     *****###  +++####  *   *  #  *  *  #  *****  #  *  *  #  *   *  #  *  *  #  * | *++#  *  *  #  *   *  #     |
    1x +-++-+*+++*-+#++****++#++*+-+*++#+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-++-+
       |     *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
       |     *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
 0.95x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
       ASSIGNMENT BITFIELD   FOURFP EMULATION   HUFFMAN   LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT     hmean
  png: http://imgur.com/eOLmZNR

NB. 'cross' represents the previous commit.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/arm/translate.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 02cad96..73595b4 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -65,6 +65,7 @@ static TCGv_i32 cpu_R[16];
 TCGv_i32 cpu_CF, cpu_NF, cpu_VF, cpu_ZF;
 TCGv_i64 cpu_exclusive_addr;
 TCGv_i64 cpu_exclusive_val;
+static bool gen_jr;
 
 /* FIXME:  These should be removed.  */
 static TCGv_i32 cpu_F0s, cpu_F1s;
@@ -221,6 +222,7 @@ static void store_reg(DisasContext *s, int reg, TCGv_i32 var)
          */
         tcg_gen_andi_i32(var, var, s->thumb ? ~1 : ~3);
         s->is_jmp = DISAS_JUMP;
+        gen_jr = true;
     }
     tcg_gen_mov_i32(cpu_R[reg], var);
     tcg_temp_free_i32(var);
@@ -893,6 +895,7 @@ static inline void gen_bx_im(DisasContext *s, uint32_t addr)
         tcg_temp_free_i32(tmp);
     }
     tcg_gen_movi_i32(cpu_R[15], addr & ~1);
+    gen_jr = true;
 }
 
 /* Set PC and Thumb state from var.  var is marked as dead.  */
@@ -902,6 +905,7 @@ static inline void gen_bx(DisasContext *s, TCGv_i32 var)
     tcg_gen_andi_i32(cpu_R[15], var, ~1);
     tcg_gen_andi_i32(var, var, 1);
     store_cpu_field(var, thumb);
+    gen_jr = true;
 }
 
 /* Variant of store_reg which uses branch&exchange logic when storing
@@ -12034,6 +12038,16 @@ void gen_intermediate_code(CPUARMState *env, TranslationBlock *tb)
             gen_set_pc_im(dc, dc->pc);
             /* fall through */
         case DISAS_JUMP:
+            if (gen_jr) {
+                TCGv addr = tcg_temp_new();
+
+                gen_jr = false;
+                tcg_gen_extu_i32_tl(addr, cpu_R[15]);
+                tcg_gen_lookup_and_goto_ptr(addr);
+                tcg_temp_free(addr);
+                break;
+            }
+            /* fall through */
         default:
             /* indicate that the hash table must be used to find the next TB */
             tcg_gen_exit_tb(0);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 07/10] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
                   ` (5 preceding siblings ...)
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 06/10] target/arm: optimize indirect branches Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  2017-04-26  8:26   ` Richard Henderson
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 08/10] target/i386: optimize cross-page direct jumps in softmmu Emilio G. Cota
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

This helper will be used by subsequent changes.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/i386/translate.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 1d1372f..59f6eaa 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -141,6 +141,7 @@ typedef struct DisasContext {
 } DisasContext;
 
 static void gen_eob(DisasContext *s);
+static void gen_jr(DisasContext *s, TCGv dest);
 static void gen_jmp(DisasContext *s, target_ulong eip);
 static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
 static void gen_op(DisasContext *s1, int op, TCGMemOp ot, int d);
@@ -2509,7 +2510,8 @@ static void gen_bnd_jmp(DisasContext *s)
    If INHIBIT, set HF_INHIBIT_IRQ_MASK if it isn't already set.
    If RECHECK_TF, emit a rechecking helper for #DB, ignoring the state of
    S->TF.  This is used by the syscall/sysret insns.  */
-static void gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
+static void
+gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
 {
     gen_update_cc_op(s);
 
@@ -2530,6 +2532,13 @@ static void gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
         tcg_gen_exit_tb(0);
     } else if (s->tf) {
         gen_helper_single_step(cpu_env);
+    } else if (jr) {
+        TCGv vaddr = tcg_temp_new();
+
+        tcg_gen_ld_tl(vaddr, cpu_env, offsetof(CPUX86State, segs[R_CS].base));
+        tcg_gen_add_tl(vaddr, vaddr, jr);
+        tcg_gen_lookup_and_goto_ptr(vaddr);
+        tcg_temp_free(vaddr);
     } else {
         tcg_gen_exit_tb(0);
     }
@@ -2540,13 +2549,19 @@ static void gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
    If INHIBIT, set HF_INHIBIT_IRQ_MASK if it isn't already set.  */
 static void gen_eob_inhibit_irq(DisasContext *s, bool inhibit)
 {
-    gen_eob_worker(s, inhibit, false);
+    gen_eob_worker(s, inhibit, false, NULL);
 }
 
 /* End of block, resetting the inhibit irq flag.  */
 static void gen_eob(DisasContext *s)
 {
-    gen_eob_worker(s, false, false);
+    gen_eob_worker(s, false, false, NULL);
+}
+
+/* Jump to register */
+static void gen_jr(DisasContext *s, TCGv dest)
+{
+    gen_eob_worker(s, false, false, dest);
 }
 
 /* generate a jump to eip. No segment change must happen before as a
@@ -7131,7 +7146,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         /* TF handling for the syscall insn is different. The TF bit is  checked
            after the syscall insn completes. This allows #DB to not be
            generated after one has entered CPL0 if TF is set in FMASK.  */
-        gen_eob_worker(s, false, true);
+        gen_eob_worker(s, false, true, NULL);
         break;
     case 0x107: /* sysret */
         if (!s->pe) {
@@ -7146,7 +7161,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                checked after the sysret insn completes. This allows #DB to be
                generated "as if" the syscall insn in userspace has just
                completed.  */
-            gen_eob_worker(s, false, true);
+            gen_eob_worker(s, false, true, NULL);
         }
         break;
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 08/10] target/i386: optimize cross-page direct jumps in softmmu
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
                   ` (6 preceding siblings ...)
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 07/10] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  2017-04-26  8:25   ` Richard Henderson
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 09/10] target/i386: optimize indirect branches Emilio G. Cota
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 10/10] tb-hash: improve tb_jmp_cache hash function in user mode Emilio G. Cota
  9 siblings, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

Instead of unconditionally exiting to the exec loop, use the
gen_jr helper to jump to the target if it is valid.

Perf impact: see next commit's log.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/i386/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index 59f6eaa..a065271 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2154,9 +2154,9 @@ static inline void gen_goto_tb(DisasContext *s, int tb_num, target_ulong eip)
         gen_jmp_im(eip);
         tcg_gen_exit_tb((uintptr_t)s->tb + tb_num);
     } else {
-        /* jump to another page: currently not optimized */
+        /* jump to another page */
         gen_jmp_im(eip);
-        gen_eob(s);
+        gen_jr(s, cpu_tmp0);
     }
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 09/10] target/i386: optimize indirect branches
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
                   ` (7 preceding siblings ...)
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 08/10] target/i386: optimize cross-page direct jumps in softmmu Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  2017-04-26  8:24   ` Richard Henderson
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 10/10] tb-hash: improve tb_jmp_cache hash function in user mode Emilio G. Cota
  9 siblings, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

Speed up indirect branches by jumping to the target if it is valid.

Softmmu measurements (see later commit for user-mode numbers):

Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.

-                  SPECint06 (test set), x86_64-softmmu (Ubuntu 16.04 guest). Host: Intel i7-4790K @ 4.00GHz

 2.4x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |   cross                                                                                                          |
 2.2x +cross+jr..........................................................................+++...........................+-+
      |                                                                                   |                              |
      |                                                                               +++ |                              |
   2x +-+..............................................................................|..|............................+-+
      |                                                                                |  |                              |
      |                                                                                |  |                              |
 1.8x +-+..............................................................................|####...........................+-+
      |                                                                                |# |#                             |
      |                                                                              **** |#                             |
 1.6x +-+............................................................................*.|*.|#...........................+-+
      |                                                                              * |* |#                             |
      |                                                                              * |* |#                             |
 1.4x +-+.......................................................................+++..*.|*.|#...........................+-+
      |                                                      ++++++             #### * |*++#             +++             |
      |                        +++                            |  |              #++# *++*  #          +++ |              |
 1.2x +-+......................###.....####....+++............|..|...........****..#.*..*..#....####...|.###.....####..+-+
      |        +++          **** #  ****  #    ####          ***###          *++*  # *  *  #    #++#  ****|#  +++#++#    |
      |    ****###     +++  *++* #  *++*  #  ++#  #    ####  *|* |#     +++  *  *  # *  *  #  ***  #  *| *|#  ****  #    |
   1x +-++-*++*++#++***###++*++*+#++*+-*++#+****++#++***++#+-*+*++#-+****##++*++*-+#+*++*-+#++*+*++#++*-+*+#++*++*++#-++-+
      |    *  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *|* |#  *++* #  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
      |    *  *  #  * *  #  *  * #  *  *  # *  *  #  * *  #  *+*++#  *  * #  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
 0.8x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+
         astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
  png: http://imgur.com/DU36YFU

NB. 'cross' represents the previous commit.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 target/i386/translate.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/target/i386/translate.c b/target/i386/translate.c
index a065271..3bd80c0 100644
--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -4988,7 +4988,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             gen_push_v(s, cpu_T1);
             gen_op_jmp_v(cpu_T0);
             gen_bnd_jmp(s);
-            gen_eob(s);
+            gen_jr(s, cpu_T0);
             break;
         case 3: /* lcall Ev */
             gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
@@ -5006,7 +5006,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                                       tcg_const_i32(dflag - 1),
                                       tcg_const_i32(s->pc - s->cs_base));
             }
-            gen_eob(s);
+            tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
+            gen_jr(s, cpu_tmp4);
             break;
         case 4: /* jmp Ev */
             if (dflag == MO_16) {
@@ -5014,7 +5015,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             }
             gen_op_jmp_v(cpu_T0);
             gen_bnd_jmp(s);
-            gen_eob(s);
+            gen_jr(s, cpu_T0);
             break;
         case 5: /* ljmp Ev */
             gen_op_ld_v(s, ot, cpu_T1, cpu_A0);
@@ -5029,7 +5030,8 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 gen_op_movl_seg_T0_vm(R_CS);
                 gen_op_jmp_v(cpu_T1);
             }
-            gen_eob(s);
+            tcg_gen_ld_tl(cpu_tmp4, cpu_env, offsetof(CPUX86State, eip));
+            gen_jr(s, cpu_tmp4);
             break;
         case 6: /* push Ev */
             gen_push_v(s, cpu_T0);
@@ -6409,7 +6411,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         /* Note that gen_pop_T0 uses a zero-extending load.  */
         gen_op_jmp_v(cpu_T0);
         gen_bnd_jmp(s);
-        gen_eob(s);
+        gen_jr(s, cpu_T0);
         break;
     case 0xc3: /* ret */
         ot = gen_pop_T0(s);
@@ -6417,7 +6419,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         /* Note that gen_pop_T0 uses a zero-extending load.  */
         gen_op_jmp_v(cpu_T0);
         gen_bnd_jmp(s);
-        gen_eob(s);
+        gen_jr(s, cpu_T0);
         break;
     case 0xca: /* lret im */
         val = cpu_ldsw_code(env, s->pc);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v3 10/10] tb-hash: improve tb_jmp_cache hash function in user mode
  2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
                   ` (8 preceding siblings ...)
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 09/10] target/i386: optimize indirect branches Emilio G. Cota
@ 2017-04-26  6:23 ` Emilio G. Cota
  9 siblings, 0 replies; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26  6:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	alex.bennee, Pranith Kumar

Optimizations to cross-page chaining and indirect branches make
performance more sensitive to the hit rate of tb_jmp_cache.
The constraint of reserving some bits for the page number
lowers the achievable quality of the hashing function.

However, user-mode does not have this requirement. Thus,
with this change we use for user-mode a hashing function that
is both faster and of better quality than the previous one.

Measurements:

Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.

-                           SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

 2.2x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |         jr                                                                                                       |
   2x +jr+multhash        +....................................................+++++...................................+-+
      |    jr+hash                                                              |$$$                                     |
      |                                                                         |$+$                                     |
      |                                                                        ### $                                     |
 1.8x +-+......................................................................#|#.$...................................+-+
      |                                                                      ++#+# $                                     |
      |                                                                       |# # $                                     |
 1.6x +-+....................................................................***.#.$....................++$$$..........+-+
      |                                         $$$                          *+* # $                     |$+$            |
      |                       ++$$$           ### $                          * * # $                  +++|$ $            |
      |                     ++###+$           # # $                          * * # $           ###   ****## $            |
 1.4x +-+...................***+#.$.........***.#.$..........................*.*.#.$...........#+#$$.*++*|#.$..........+-+
      |                     *+* # $         * * # $                          * * # $           # # $ *  *+# $            |
      |                     * * # $   +++++ * * # $                          * * # $         *** # $ *  * # $   ###$$    |
 1.2x +-+...................*.*.#.$.***##$$.*.*.#.$..........................*.*.#.$.........*.*.#.$.*..*.#.$.***+#+$..+-+
      |                     * * # $ *+* # $ * * # $   +++                    * * # $ ++###$$ * * # $ *  * # $ * * # $    |
      |    ***##$$          * * # $ * * # $ * * # $ ***##$$          ++###   * * # $ *** #+$ * * # $ *  * # $ * * # $    |
      |    *+*+#+$ ***##$$$ * * # $ * * # $ * * # $ *+* # $ ++####$$ ***+#   * * # $ * * # $ * * # $ *  * # $ * * # $    |
   1x +-++-*+*+#+$+*+*+#-+$+*+*-#+$+*+*+#+$+*+*+#+$+*-*+#+$+***++#+$+*+*+#$$+*+*+#+$+*+*+#+$+*+*-#+$+*+-*+#+$+*+*+#+$-++-+
      |    * * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ * * # $ * * # $ * * # $ *  * # $ * * # $    |
      |    * * # $ * * #  $ * * # $ * * # $ * * # $ * * # $ * *  # $ * * # $ * * # $ * * # $ * * # $ *  * # $ * * # $    |
 0.8x +-+--***##$$-***##$$$-***##$$-***##$$-***##$$-***##$$-***###$$-***##$$-***##$$-***##$$-***##$$-****##$$-***##$$--+-+
         astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
  png: http://imgur.com/4UXTrEc

Here I also tried the hash function suggested by Paolo ("multhash"):

  return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);

As you can see it is just as good as the other new function ("hash"),
which is what I ended up going with.

-                          SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

 2.6x +-+--------------------------------------------------------------------------------------------------------------+-+
      |                                                                                                                  |
      |     jr                                                                                           ###             |
 2.4x +jr+hash...........................................................................................#.#...........+-+
      |                                                                                                  # #             |
      |                                                                                                  # #             |
 2.2x +-+................................................................................................#.#...........+-+
      |                                                                                                  # #             |
      |                                                                                                  # #             |
   2x +-+................................................................................................#.#...........+-+
      |                                                                                               **** #             |
      |                                                                                               *  * #             |
 1.8x +-+.............................................................................................*..*.#...........+-+
      |                                                                         +++                   *  * #             |
      |                                                                         ####    ####          *  * #             |
 1.6x +-+......................................####.............................#..#.****..#..........*..*.#...........+-+
      |                        +++             #++#                          ****  # *  *  #    ####  *  * #             |
      |                        ###             #  #                          *  *  # *  *  #    #  #  *  * #             |
 1.4x +-+...................****+#..........****..#..........................*..*..#.*..*..#....#..#..*..*.#...........+-+
      |                     *++* #          *  *  #                          *  *  # *  *  #  ***  #  *  * #     ####    |
      |                     *  * #     #### *  *  #                          *  *  # *  *  #  * *  #  *  * #  ****  #    |
 1.2x +-+...................*..*.#..****++#.*..*..#..........................*..*..#.*..*..#..*.*..#..*..*.#..*..*..#..+-+
      |    ****###          *  * #  *  *  # *  *  #                          *  *  # *  *  #  * *  #  *  * #  *  *  #    |
      |    *  *  #  ***###  *  * #  *  *  # *  *  #                  ****##  *  *  # *  *  #  * *  #  *  * #  *  *  #    |
   1x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+
         astar   bzip2      gcc   gobmk h264ref   hmmlibquantum      mcf omnetpperlbench   sjengxalancbmk   hmean
  png: http://imgur.com/ArCbHqo

-                                    NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

 1.12x +-+-------------------------------------------------------------------------------------------------------------+-+
       |                                                                                                                 |
       |     jr                                                           +++                                            |
  1.1x +jr+hash...........................................................####.........................................+-+
       |                                                               +++#| #                                           |
       |                                                                | #++#                                           |
 1.08x +-+................................+++................+++.+++..*****..#.........................................+-+
       |                                   |  +++             |   |   * | *  #                                           |
       |                                   |   |              |   |   *+++*  #                                           |
 1.06x +-+................................****###.............|...|...*...*..#.........................+++.............+-+
       |                                  *| * |#            ****###  *   *  #                          |                |
       |                                  *| *++#            *| * |#  *   *  #                        ####               |
 1.04x +-+................................*++*..#............*|.*.|#..*...*..#........................#.|#.............+-+
       |                                  *  *  #            *++*++#  *   *  #                     +++#++#               |
       |                                  *  *  #            *  *  #  *   *  #                      | #  #   +++####     |
 1.02x +-+................................*..*..#......+++...*..*..#..*...*..#.....................****..#..*****++#...+-+
       |         +++                      *  *  #   +++ |    *  *  #  *   *  #  +++                *| *  #  *+++*  #     |
       |      +++ |    +++ +++   ++++++   *  *  #  *****###  *  *  #  *   *  #   |  +++   ++++++   *++*  #  *   *  #     |
    1x +-++-+++++####++****###++++-+####+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-+++####-+*****###++*++*++#++*+-+*++#+-++-+
       |     *****| #  *++* |#  *****| #  *  *  #  *   *++#  *  *  #  *   *  #  **** |#  *   *  #  *  *  #  *   *  #     |
       |     * | *| #  *  *++#  * | *++#  *  *  #  *   *  #  *  *  #  *   *  #  *| *++#  *   *  #  *  *  #  *   *  #     |
 0.98x +-+...*.|.*++#..*..*..#..*+++*..#..*..*..#..*...*..#..*..*..#..*...*..#..*++*..#..*...*..#..*..*..#..*...*..#...+-+
       |     *+++*  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
       |     *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
 0.96x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
       ASSIGNMENT BITFIELD   FOURFP EMULATION   HUFFMAN   LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT     hmean
  png: http://imgur.com/ZXFX0hJ

-                                   NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz

  1.3x +-+-------------------------------------------------------------------------------------------------------------+-+
       |                            ####                                                                                 |
       |     jr                     #  #                                            +++                                  |
 1.25x +jr+hash.....................#..#...........................................####................................+-+
       |                            #  #                                           #  #                                  |
       |                            #  #                                           #  #                                  |
  1.2x +-+..........................#..#...........................................#..#................................+-+
       |                            #  #                                           #  #                                  |
       |                            #  #                                           #  #                                  |
 1.15x +-+..........................#..#...........................................#..#................................+-+
       |                            #  #                                  ####     #  #                                  |
       |                            #  #                                  #  #     #  #                                  |
  1.1x +-+..........................#..#..................................#..#.....#..#................................+-+
       |                            #  #                                  #  #     #  #                         +++      |
       |                            #  #               ####               #  #     #  #                         ####     |
 1.05x +-+..........................#..#...............#..#.....####......#..#.....#..#.........................#..#...+-+
       |                            #  #               #  #     #  #      #  #     #  #                +++      #  #     |
       |                   +++  *****  #     ####  *****  #     #  #   +++#  #  ****  #            ****###      #  #     |
    1x +-++-+*****###++****+++++*+-+*++#+-****++#-+*+++*-+#+++++#++#++*****++#+-*++*++#-+*****-++++*++*++#++*****++#+-++-+
       |     *   *  #  *  * |   *   *  #  *  *  #  *   *  #  ****  #  *   *  #  *  *  #  *   *###  *  *++#  *   *  #     |
       |     *   *  #  *  *###  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
 0.95x +-+...*...*..#..*..*.|#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#...+-+
       |     *   *  #  *  * |#  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
       |     *   *  #  *  * |#  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #  *  *  #  *   *  #     |
  0.9x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
       ASSIGNMENT BITFIELD   FOURFP EMULATION   HUFFMAN   LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT     hmean
  png: http://imgur.com/FfD27ey

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/tb-hash.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 2c27490..b1fe2d0 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -22,6 +22,8 @@
 
 #include "exec/tb-hash-xx.h"
 
+#ifdef CONFIG_SOFTMMU
+
 /* Only the bottom TB_JMP_PAGE_BITS of the jump cache hash bits vary for
    addresses on the same page.  The top bits are the same.  This allows
    TLB invalidation to quickly clear a subset of the hash table.  */
@@ -45,6 +47,16 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
            | (tmp & TB_JMP_ADDR_MASK));
 }
 
+#else
+
+/* In user-mode we can get better hashing because we do not have a TLB */
+static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
+{
+    return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+}
+
+#endif /* CONFIG_SOFTMMU */
+
 static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper Emilio G. Cota
@ 2017-04-26  7:50   ` Paolo Bonzini
  2017-04-26  8:40   ` Richard Henderson
  2017-04-26 10:29   ` Alex Bennée
  2 siblings, 0 replies; 35+ messages in thread
From: Paolo Bonzini @ 2017-04-26  7:50 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Peter Crosthwaite, Richard Henderson, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, alex.bennee,
	Pranith Kumar



On 26/04/2017 08:23, Emilio G. Cota wrote:
> This paves the way for upcoming work.
> 
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  tcg-runtime.c     | 21 +++++++++++++++++++++
>  tcg/tcg-runtime.h |  2 ++
>  tcg/tcg.h         |  1 +
>  3 files changed, 24 insertions(+)
> 
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index 4c60c96..90d2d4b 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -27,6 +27,7 @@
>  #include "exec/helper-proto.h"
>  #include "exec/cpu_ldst.h"
>  #include "exec/exec-all.h"
> +#include "exec/tb-hash.h"
>  
>  /* 32-bit helpers */
>  
> @@ -141,6 +142,26 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
>      return ctpop64(arg);
>  }
>  
> +void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
> +{
> +    CPUState *cpu = ENV_GET_CPU(env);
> +    TranslationBlock *tb;
> +    target_ulong cs_base, pc;
> +    uint32_t flags;
> +
> +    if (unlikely(atomic_read(&cpu->exit_request))) {
> +        goto out_epilogue;
> +    }

This should not be necessary, because cpu->icount_decr will be checked
by the prologue of the destination tb.

Paolo

> +    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> +    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> +    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
> +               tb->flags == flags)) {
> +        return tb->tc_ptr;
> +    }
> + out_epilogue:
> +    return tcg_ctx.code_gen_epilogue;
> +}
> +
>  void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index 114ea6f..c41d38a 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -24,6 +24,8 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>  DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
>  DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>  
> +DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl)
> +
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>  
>  #ifdef CONFIG_SOFTMMU
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 6c216bb..5ec48d1 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -699,6 +699,7 @@ struct TCGContext {
>         extension that allows arithmetic on void*.  */
>      int code_gen_max_blocks;
>      void *code_gen_prologue;
> +    void *code_gen_epilogue;
>      void *code_gen_buffer;
>      size_t code_gen_buffer_size;
>      void *code_gen_ptr;
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/10] target/arm: optimize indirect branches
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 06/10] target/arm: optimize indirect branches Emilio G. Cota
@ 2017-04-26  7:54   ` Richard Henderson
  2017-04-27  3:20     ` Emilio G. Cota
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2017-04-26  7:54 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> +static bool gen_jr;...
>          case DISAS_JUMP:
> +            if (gen_jr) {

Why the variable?  Why not just try the goto_ptr for any DISAS_JUMP?


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 09/10] target/i386: optimize indirect branches
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 09/10] target/i386: optimize indirect branches Emilio G. Cota
@ 2017-04-26  8:24   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26  8:24 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> Speed up indirect branches by jumping to the target if it is valid.

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 08/10] target/i386: optimize cross-page direct jumps in softmmu
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 08/10] target/i386: optimize cross-page direct jumps in softmmu Emilio G. Cota
@ 2017-04-26  8:25   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26  8:25 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> Instead of unconditionally exiting to the exec loop, use the
> gen_jr helper to jump to the target if it is valid.

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 07/10] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 07/10] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr Emilio G. Cota
@ 2017-04-26  8:26   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26  8:26 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> +gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
>   {
>       gen_update_cc_op(s);
>   
> @@ -2530,6 +2532,13 @@ static void gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf)
>           tcg_gen_exit_tb(0);
>       } else if (s->tf) {
>           gen_helper_single_step(cpu_env);
> +    } else if (jr) {
...
> -        gen_eob_worker(s, false, true);
> +        gen_eob_worker(s, false, true, NULL);

You cannot use NULL.  While we abuse a pointer type for TCGv, that's because C 
doesn't have type verification for different integral types.

You need to use

     TCGv unused;
     TCGV_UNUSED(unused);

and

     if (TCGV_IS_UNUSED(foo))

(As it happens, I do have a branch that tries to clean this up, so that this 
kind of thing is less unwieldy.  Perhaps I'll get to it this cycle...)

> +        tcg_gen_ld_tl(vaddr, cpu_env, offsetof(CPUX86State, segs[R_CS].base));
> +        tcg_gen_add_tl(vaddr, vaddr, jr);

tcg_gen_add_tl(vaddr, jr, cpu_seg_base[R_CS]);


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 05/10] target/arm: optimize cross-page direct jumps in softmmu
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 05/10] target/arm: optimize cross-page direct jumps in softmmu Emilio G. Cota
@ 2017-04-26  8:27   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26  8:27 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> Instead of unconditionally exiting to the exec loop, use the
> lookup_and_goto_ptr helper to jump to the target if it is valid.
> 
> Perf impact: see next commit's log.
> 
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   target/arm/translate.c | 6 +++++-
>   1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 04/10] tcg/i386: implement goto_ptr op
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 04/10] tcg/i386: implement goto_ptr op Emilio G. Cota
@ 2017-04-26  8:28   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26  8:28 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   tcg/i386/tcg-target.h     |  2 +-
>   tcg/i386/tcg-target.inc.c | 15 +++++++++++++++
>   2 files changed, 16 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 03/10] tcg: export tcg_gen_lookup_and_goto_ptr
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 03/10] tcg: export tcg_gen_lookup_and_goto_ptr Emilio G. Cota
@ 2017-04-26  8:29   ` Richard Henderson
  2017-04-26 10:33   ` Alex Bennée
  1 sibling, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26  8:29 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> Instead of exporting goto_ptr directly to TCG frontends, export
> tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
> returned by the lookup_tb_ptr() helper. This is the only use case
> we have for goto_ptr and lookup_tb_ptr, so having this function is
> very convenient. Furthermore, it trivially allows us to avoid calling
> the lookup helper if goto_ptr is not implemented by the backend.
> 
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   tcg/README   |  8 ++++++++
>   tcg/tcg-op.c | 13 +++++++++++++
>   tcg/tcg-op.h | 11 +++++++++++
>   3 files changed, 32 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 02/10] tcg: introduce goto_ptr opcode
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 02/10] tcg: introduce goto_ptr opcode Emilio G. Cota
@ 2017-04-26  8:30   ` Richard Henderson
  2017-04-26 12:12   ` Richard Henderson
  1 sibling, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26  8:30 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> Signed-off-by: Emilio G. Cota<cota@braap.org>
> ---
>   tcg/aarch64/tcg-target.h | 1 +
>   tcg/arm/tcg-target.h     | 1 +
>   tcg/i386/tcg-target.h    | 1 +
>   tcg/ia64/tcg-target.h    | 1 +
>   tcg/mips/tcg-target.h    | 1 +
>   tcg/ppc/tcg-target.h     | 1 +
>   tcg/s390/tcg-target.h    | 1 +
>   tcg/sparc/tcg-target.h   | 1 +
>   tcg/tcg-opc.h            | 1 +
>   tcg/tci/tcg-target.h     | 1 +
>   10 files changed, 10 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper Emilio G. Cota
  2017-04-26  7:50   ` Paolo Bonzini
@ 2017-04-26  8:40   ` Richard Henderson
  2017-04-26 21:56     ` Emilio G. Cota
  2017-04-26 23:17     ` Emilio G. Cota
  2017-04-26 10:29   ` Alex Bennée
  2 siblings, 2 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26  8:40 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> This paves the way for upcoming work.
> 
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>   tcg-runtime.c     | 21 +++++++++++++++++++++
>   tcg/tcg-runtime.h |  2 ++
>   tcg/tcg.h         |  1 +
>   3 files changed, 24 insertions(+)
> 
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index 4c60c96..90d2d4b 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -27,6 +27,7 @@
>   #include "exec/helper-proto.h"
>   #include "exec/cpu_ldst.h"
>   #include "exec/exec-all.h"
> +#include "exec/tb-hash.h"
>   
>   /* 32-bit helpers */
>   
> @@ -141,6 +142,26 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
>       return ctpop64(arg);
>   }
>   
> +void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
> +{
> +    CPUState *cpu = ENV_GET_CPU(env);
> +    TranslationBlock *tb;
> +    target_ulong cs_base, pc;
> +    uint32_t flags;
> +
> +    if (unlikely(atomic_read(&cpu->exit_request))) {
> +        goto out_epilogue;
> +    }

Paolo is right.  This will also be checked by the first instructions of the TB 
and there's little point in repeating it here, especially if it is indeed unlikely.

> +    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> +    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> +    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
> +               tb->flags == flags)) {

This comparison is wrong.  It will incorrectly reject a TB for i386 guest when 
CS_BASE != 0.  You really want

   tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
   if (tb) {
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     if (tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags) {
       return tb->tc_ptr;
     }
   }
   return tcg_ctx.code_gen_epilogue;

where you don't even load the cpu state if there isn't a preliminary hit in the 
cache.  (Note to self: That minor optimization would also apply to tb_find.)

I also wonder, if we've gone this far, if we wouldn't go all the way and also 
check tb_htable_lookup.


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper Emilio G. Cota
  2017-04-26  7:50   ` Paolo Bonzini
  2017-04-26  8:40   ` Richard Henderson
@ 2017-04-26 10:29   ` Alex Bennée
  2017-04-26 10:43     ` Richard Henderson
  2017-04-26 15:11     ` Paolo Bonzini
  2 siblings, 2 replies; 35+ messages in thread
From: Alex Bennée @ 2017-04-26 10:29 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	Pranith Kumar


Emilio G. Cota <cota@braap.org> writes:

> This paves the way for upcoming work.
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> ---
>  tcg-runtime.c     | 21 +++++++++++++++++++++
>  tcg/tcg-runtime.h |  2 ++
>  tcg/tcg.h         |  1 +
>  3 files changed, 24 insertions(+)
>
> diff --git a/tcg-runtime.c b/tcg-runtime.c
> index 4c60c96..90d2d4b 100644
> --- a/tcg-runtime.c
> +++ b/tcg-runtime.c
> @@ -27,6 +27,7 @@
>  #include "exec/helper-proto.h"
>  #include "exec/cpu_ldst.h"
>  #include "exec/exec-all.h"
> +#include "exec/tb-hash.h"
>
>  /* 32-bit helpers */
>
> @@ -141,6 +142,26 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
>      return ctpop64(arg);
>  }
>
> +void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
> +{
> +    CPUState *cpu = ENV_GET_CPU(env);
> +    TranslationBlock *tb;
> +    target_ulong cs_base, pc;
> +    uint32_t flags;
> +
> +    if (unlikely(atomic_read(&cpu->exit_request))) {
> +        goto out_epilogue;
> +    }
> +    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> +    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> +    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
> +               tb->flags == flags)) {

Should we also not be checking the TB hasn't been invalidated: tb->invalid?

> +        return tb->tc_ptr;
> +    }
> + out_epilogue:
> +    return tcg_ctx.code_gen_epilogue;
> +}
> +
>  void HELPER(exit_atomic)(CPUArchState *env)
>  {
>      cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
> index 114ea6f..c41d38a 100644
> --- a/tcg/tcg-runtime.h
> +++ b/tcg/tcg-runtime.h
> @@ -24,6 +24,8 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>  DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
>  DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>
> +DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl)
> +
>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>
>  #ifdef CONFIG_SOFTMMU
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index 6c216bb..5ec48d1 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -699,6 +699,7 @@ struct TCGContext {
>         extension that allows arithmetic on void*.  */
>      int code_gen_max_blocks;
>      void *code_gen_prologue;
> +    void *code_gen_epilogue;
>      void *code_gen_buffer;
>      size_t code_gen_buffer_size;
>      void *code_gen_ptr;


--
Alex Bennée

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 03/10] tcg: export tcg_gen_lookup_and_goto_ptr
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 03/10] tcg: export tcg_gen_lookup_and_goto_ptr Emilio G. Cota
  2017-04-26  8:29   ` Richard Henderson
@ 2017-04-26 10:33   ` Alex Bennée
  1 sibling, 0 replies; 35+ messages in thread
From: Alex Bennée @ 2017-04-26 10:33 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	Pranith Kumar


Emilio G. Cota <cota@braap.org> writes:

> Instead of exporting goto_ptr directly to TCG frontends, export
> tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
> returned by the lookup_tb_ptr() helper. This is the only use case
> we have for goto_ptr and lookup_tb_ptr, so having this function is
> very convenient. Furthermore, it trivially allows us to avoid calling
> the lookup helper if goto_ptr is not implemented by the backend.
>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  tcg/README   |  8 ++++++++
>  tcg/tcg-op.c | 13 +++++++++++++
>  tcg/tcg-op.h | 11 +++++++++++
>  3 files changed, 32 insertions(+)
>
> diff --git a/tcg/README b/tcg/README
> index a9858c2..bf49e82 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -477,6 +477,14 @@ current TB was linked to this TB. Otherwise execute the next
>  instructions. Only indices 0 and 1 are valid and tcg_gen_goto_tb may be issued
>  at most once with each slot index per TB.
>
> +* lookup_and_goto_ptr tb_addr
> +
> +Look up a TB address ('tb_addr') and jump to it if valid. If not valid,
> +jump to the TCG epilogue to go back to the exec loop.
> +
> +This operation is optional. If the TCG backend does not implement the
> +goto_ptr opcode, emitting this op is equivalent to emitting exit_tb(0).
> +
>  * qemu_ld_i32/i64 t0, t1, flags, memidx
>  * qemu_st_i32/i64 t0, t1, flags, memidx
>
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 95a39b7..8ff1eaf 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -2587,6 +2587,19 @@ void tcg_gen_goto_tb(unsigned idx)
>      tcg_gen_op1i(INDEX_op_goto_tb, idx);
>  }
>
> +void tcg_gen_lookup_and_goto_ptr(TCGv addr)
> +{
> +    if (TCG_TARGET_HAS_goto_ptr) {
> +        TCGv_ptr ptr = tcg_temp_new_ptr();
> +
> +        gen_helper_lookup_tb_ptr(ptr, tcg_ctx.tcg_env, addr);
> +        tcg_gen_op1i(INDEX_op_goto_ptr, GET_TCGV_PTR(ptr));
> +        tcg_temp_free_ptr(ptr);
> +    } else {
> +        tcg_gen_exit_tb(0);
> +    }
> +}
> +
>  static inline TCGMemOp tcg_canonicalize_memop(TCGMemOp op, bool is64, bool st)
>  {
>      /* Trigger the asserts within as early as possible.  */
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index c68e300..5d3278f 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -796,6 +796,17 @@ static inline void tcg_gen_exit_tb(uintptr_t val)
>   */
>  void tcg_gen_goto_tb(unsigned idx);
>
> +/**
> + * tcg_gen_lookup_and_goto_ptr() - look up a TB and jump to it if valid
> + * @addr: Guest address of the target TB
> + *
> + * If the TB is not valid, jump to the epilogue.
> + *
> + * This operation is optional. If the TCG backend does not implement goto_ptr,
> + * this op is equivalent to calling tcg_gen_exit_tb() with 0 as the argument.
> + */
> +void tcg_gen_lookup_and_goto_ptr(TCGv addr);
> +
>  #if TARGET_LONG_BITS == 32
>  #define tcg_temp_new() tcg_temp_new_i32()
>  #define tcg_global_reg_new tcg_global_reg_new_i32


--
Alex Bennée

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26 10:29   ` Alex Bennée
@ 2017-04-26 10:43     ` Richard Henderson
  2017-04-26 15:11     ` Paolo Bonzini
  1 sibling, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26 10:43 UTC (permalink / raw)
  To: Alex Bennée, Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, Pranith Kumar

On 04/26/2017 12:29 PM, Alex Bennée wrote:
> 
> Emilio G. Cota <cota@braap.org> writes:
> 
>> This paves the way for upcoming work.
>>
>> Reviewed-by: Richard Henderson <rth@twiddle.net>
>> Signed-off-by: Emilio G. Cota <cota@braap.org>
>> ---
>>   tcg-runtime.c     | 21 +++++++++++++++++++++
>>   tcg/tcg-runtime.h |  2 ++
>>   tcg/tcg.h         |  1 +
>>   3 files changed, 24 insertions(+)
>>
>> diff --git a/tcg-runtime.c b/tcg-runtime.c
>> index 4c60c96..90d2d4b 100644
>> --- a/tcg-runtime.c
>> +++ b/tcg-runtime.c
>> @@ -27,6 +27,7 @@
>>   #include "exec/helper-proto.h"
>>   #include "exec/cpu_ldst.h"
>>   #include "exec/exec-all.h"
>> +#include "exec/tb-hash.h"
>>
>>   /* 32-bit helpers */
>>
>> @@ -141,6 +142,26 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
>>       return ctpop64(arg);
>>   }
>>
>> +void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
>> +{
>> +    CPUState *cpu = ENV_GET_CPU(env);
>> +    TranslationBlock *tb;
>> +    target_ulong cs_base, pc;
>> +    uint32_t flags;
>> +
>> +    if (unlikely(atomic_read(&cpu->exit_request))) {
>> +        goto out_epilogue;
>> +    }
>> +    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>> +    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
>> +    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
>> +               tb->flags == flags)) {
> 
> Should we also not be checking the TB hasn't been invalidated: tb->invalid?

We don't check in tb_find.

I guess we're assuming that such have been purged from the tb_jmp_cache.  That 
said, tb_phys_invalidate assumes tb_locked, and I don't immediately remember 
how all that is supposed to tie together.


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 02/10] tcg: introduce goto_ptr opcode
  2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 02/10] tcg: introduce goto_ptr opcode Emilio G. Cota
  2017-04-26  8:30   ` Richard Henderson
@ 2017-04-26 12:12   ` Richard Henderson
  1 sibling, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2017-04-26 12:12 UTC (permalink / raw)
  To: Emilio G. Cota, qemu-devel
  Cc: Paolo Bonzini, Peter Crosthwaite, Peter Maydell, Eduardo Habkost,
	Andrzej Zaborowski, Aurelien Jarno, Alexander Graf, Stefan Weil,
	qemu-arm, alex.bennee, Pranith Kumar

On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> +DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_END)

One more thing:

DEF(goto_ptr, 0, 1, 0, TCG_OPF_BB_END | IMPL(TCG_TARGET_HAS_goto_ptr))

You'd have noticed this if you'd have tested this particular patch, with 
--enable-debug-tcg, before applying patch 4.


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26 10:29   ` Alex Bennée
  2017-04-26 10:43     ` Richard Henderson
@ 2017-04-26 15:11     ` Paolo Bonzini
  2017-04-26 16:16       ` Alex Bennée
  1 sibling, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2017-04-26 15:11 UTC (permalink / raw)
  To: Alex Bennée, Emilio G. Cota
  Cc: qemu-devel, Peter Crosthwaite, Richard Henderson, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, Pranith Kumar



On 26/04/2017 12:29, Alex Bennée wrote:
> 
> Emilio G. Cota <cota@braap.org> writes:
> 
>> This paves the way for upcoming work.
>>
>> Reviewed-by: Richard Henderson <rth@twiddle.net>
>> Signed-off-by: Emilio G. Cota <cota@braap.org>
>> ---
>>  tcg-runtime.c     | 21 +++++++++++++++++++++
>>  tcg/tcg-runtime.h |  2 ++
>>  tcg/tcg.h         |  1 +
>>  3 files changed, 24 insertions(+)
>>
>> diff --git a/tcg-runtime.c b/tcg-runtime.c
>> index 4c60c96..90d2d4b 100644
>> --- a/tcg-runtime.c
>> +++ b/tcg-runtime.c
>> @@ -27,6 +27,7 @@
>>  #include "exec/helper-proto.h"
>>  #include "exec/cpu_ldst.h"
>>  #include "exec/exec-all.h"
>> +#include "exec/tb-hash.h"
>>
>>  /* 32-bit helpers */
>>
>> @@ -141,6 +142,26 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
>>      return ctpop64(arg);
>>  }
>>
>> +void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
>> +{
>> +    CPUState *cpu = ENV_GET_CPU(env);
>> +    TranslationBlock *tb;
>> +    target_ulong cs_base, pc;
>> +    uint32_t flags;
>> +
>> +    if (unlikely(atomic_read(&cpu->exit_request))) {
>> +        goto out_epilogue;
>> +    }
>> +    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>> +    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
>> +    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
>> +               tb->flags == flags)) {
> 
> Should we also not be checking the TB hasn't been invalidated: tb->invalid?

It's not needed because this lookup is (if I understand it right) once
only and is not reused later.  This is why tb_find doesn't check
tb->invalid, but uses it to avoid adding the TB to the chain.

Good:

	tb_find			tb_phys_invalidate
				  tb_lock
				  tb->invalid = true
	  lookup cache
	  cache hit
				  tb_unlock
	  tb_lock
	  tb->invalid?
	    yes, skip tb_add_jump
	  tb_unlock
	  execute tb once

Bad (doesn't happen):

	tb_find			tb_phys_invalidate
				  tb_lock
				  tb->invalid = true
	  lookup cache
	  cache hit
				  tb_unlock
	  tb_lock
	  tb_add_jump
	  tb_unlock
	  execute tb many times

Paolo

>> +        return tb->tc_ptr;
>> +    }
>> + out_epilogue:
>> +    return tcg_ctx.code_gen_epilogue;
>> +}
>> +
>>  void HELPER(exit_atomic)(CPUArchState *env)
>>  {
>>      cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
>> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
>> index 114ea6f..c41d38a 100644
>> --- a/tcg/tcg-runtime.h
>> +++ b/tcg/tcg-runtime.h
>> @@ -24,6 +24,8 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>>  DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
>>  DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>>
>> +DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl)
>> +
>>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>>
>>  #ifdef CONFIG_SOFTMMU
>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>> index 6c216bb..5ec48d1 100644
>> --- a/tcg/tcg.h
>> +++ b/tcg/tcg.h
>> @@ -699,6 +699,7 @@ struct TCGContext {
>>         extension that allows arithmetic on void*.  */
>>      int code_gen_max_blocks;
>>      void *code_gen_prologue;
>> +    void *code_gen_epilogue;
>>      void *code_gen_buffer;
>>      size_t code_gen_buffer_size;
>>      void *code_gen_ptr;
> 
> 
> --
> Alex Bennée
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26 15:11     ` Paolo Bonzini
@ 2017-04-26 16:16       ` Alex Bennée
  0 siblings, 0 replies; 35+ messages in thread
From: Alex Bennée @ 2017-04-26 16:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Emilio G. Cota, qemu-devel, Peter Crosthwaite, Richard Henderson,
	Peter Maydell, Eduardo Habkost, Andrzej Zaborowski,
	Aurelien Jarno, Alexander Graf, Stefan Weil, qemu-arm,
	Pranith Kumar


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 26/04/2017 12:29, Alex Bennée wrote:
>>
>> Emilio G. Cota <cota@braap.org> writes:
>>
>>> This paves the way for upcoming work.
>>>
>>> Reviewed-by: Richard Henderson <rth@twiddle.net>
>>> Signed-off-by: Emilio G. Cota <cota@braap.org>
>>> ---
>>>  tcg-runtime.c     | 21 +++++++++++++++++++++
>>>  tcg/tcg-runtime.h |  2 ++
>>>  tcg/tcg.h         |  1 +
>>>  3 files changed, 24 insertions(+)
>>>
>>> diff --git a/tcg-runtime.c b/tcg-runtime.c
>>> index 4c60c96..90d2d4b 100644
>>> --- a/tcg-runtime.c
>>> +++ b/tcg-runtime.c
>>> @@ -27,6 +27,7 @@
>>>  #include "exec/helper-proto.h"
>>>  #include "exec/cpu_ldst.h"
>>>  #include "exec/exec-all.h"
>>> +#include "exec/tb-hash.h"
>>>
>>>  /* 32-bit helpers */
>>>
>>> @@ -141,6 +142,26 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
>>>      return ctpop64(arg);
>>>  }
>>>
>>> +void *HELPER(lookup_tb_ptr)(CPUArchState *env, target_ulong addr)
>>> +{
>>> +    CPUState *cpu = ENV_GET_CPU(env);
>>> +    TranslationBlock *tb;
>>> +    target_ulong cs_base, pc;
>>> +    uint32_t flags;
>>> +
>>> +    if (unlikely(atomic_read(&cpu->exit_request))) {
>>> +        goto out_epilogue;
>>> +    }
>>> +    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>> +    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
>>> +    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
>>> +               tb->flags == flags)) {
>>
>> Should we also not be checking the TB hasn't been invalidated: tb->invalid?
>
> It's not needed because this lookup is (if I understand it right) once
> only and is not reused later.  This is why tb_find doesn't check
> tb->invalid, but uses it to avoid adding the TB to the chain.

Right. And when tb->invalid = true is set we then flush it from the
jump cache so it will never be found by the helper after.


OK nothing to see here ;-)

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

>
> Good:
>
> 	tb_find			tb_phys_invalidate
> 				  tb_lock
> 				  tb->invalid = true
> 	  lookup cache
> 	  cache hit
> 				  tb_unlock
> 	  tb_lock
> 	  tb->invalid?
> 	    yes, skip tb_add_jump
> 	  tb_unlock
> 	  execute tb once
>
> Bad (doesn't happen):
>
> 	tb_find			tb_phys_invalidate
> 				  tb_lock
> 				  tb->invalid = true
> 	  lookup cache
> 	  cache hit
> 				  tb_unlock
> 	  tb_lock
> 	  tb_add_jump
> 	  tb_unlock
> 	  execute tb many times
>
> Paolo
>
>>> +        return tb->tc_ptr;
>>> +    }
>>> + out_epilogue:
>>> +    return tcg_ctx.code_gen_epilogue;
>>> +}
>>> +
>>>  void HELPER(exit_atomic)(CPUArchState *env)
>>>  {
>>>      cpu_loop_exit_atomic(ENV_GET_CPU(env), GETPC());
>>> diff --git a/tcg/tcg-runtime.h b/tcg/tcg-runtime.h
>>> index 114ea6f..c41d38a 100644
>>> --- a/tcg/tcg-runtime.h
>>> +++ b/tcg/tcg-runtime.h
>>> @@ -24,6 +24,8 @@ DEF_HELPER_FLAGS_1(clrsb_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>>>  DEF_HELPER_FLAGS_1(ctpop_i32, TCG_CALL_NO_RWG_SE, i32, i32)
>>>  DEF_HELPER_FLAGS_1(ctpop_i64, TCG_CALL_NO_RWG_SE, i64, i64)
>>>
>>> +DEF_HELPER_FLAGS_2(lookup_tb_ptr, TCG_CALL_NO_WG_SE, ptr, env, tl)
>>> +
>>>  DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
>>>
>>>  #ifdef CONFIG_SOFTMMU
>>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>>> index 6c216bb..5ec48d1 100644
>>> --- a/tcg/tcg.h
>>> +++ b/tcg/tcg.h
>>> @@ -699,6 +699,7 @@ struct TCGContext {
>>>         extension that allows arithmetic on void*.  */
>>>      int code_gen_max_blocks;
>>>      void *code_gen_prologue;
>>> +    void *code_gen_epilogue;
>>>      void *code_gen_buffer;
>>>      size_t code_gen_buffer_size;
>>>      void *code_gen_ptr;
>>
>>
>> --
>> Alex Bennée
>>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26  8:40   ` Richard Henderson
@ 2017-04-26 21:56     ` Emilio G. Cota
  2017-04-26 22:29       ` Richard Henderson
  2017-04-26 23:17     ` Emilio G. Cota
  1 sibling, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26 21:56 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, alex.bennee,
	Pranith Kumar

On Wed, Apr 26, 2017 at 10:40:45 +0200, Richard Henderson wrote:
> On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
(snip)
> >+    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> >+    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
> >+               tb->flags == flags)) {
> 
> This comparison is wrong.  It will incorrectly reject a TB for i386 guest
> when CS_BASE != 0.  You really want
> 
>   tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
>   if (tb) {
>     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>     if (tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags) {
>       return tb->tc_ptr;
>     }
>   }
>   return tcg_ctx.code_gen_epilogue;

wrt the comparison, the only change I notice in your suggested change is
  tb->pc == pc

instead of
  tb->pc == addr

, which seems innocuous to me (since tb->pc == addr).

I fail to see how this relates to your "CS_BASE != 0" comment.
What am I missing?

		E.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26 21:56     ` Emilio G. Cota
@ 2017-04-26 22:29       ` Richard Henderson
  2017-04-26 22:45         ` Emilio G. Cota
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2017-04-26 22:29 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, alex.bennee,
	Pranith Kumar

On 04/26/2017 11:56 PM, Emilio G. Cota wrote:
> On Wed, Apr 26, 2017 at 10:40:45 +0200, Richard Henderson wrote:
>> On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> (snip)
>>> +    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>> +    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
>>> +    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
>>> +               tb->flags == flags)) {
>>
>> This comparison is wrong.  It will incorrectly reject a TB for i386 guest
>> when CS_BASE != 0.  You really want
>>
>>    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
>>    if (tb) {
>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>      if (tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags) {
>>        return tb->tc_ptr;
>>      }
>>    }
>>    return tcg_ctx.code_gen_epilogue;
> 
> wrt the comparison, the only change I notice in your suggested change is
>    tb->pc == pc
> 
> instead of
>    tb->pc == addr
> 
> , which seems innocuous to me (since tb->pc == addr).
> 
> I fail to see how this relates to your "CS_BASE != 0" comment.
> What am I missing?

Recall how you computed vaddr for target/i386:

   addr = pc + cs_base


r~

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26 22:29       ` Richard Henderson
@ 2017-04-26 22:45         ` Emilio G. Cota
  2017-04-26 23:11           ` Emilio G. Cota
  0 siblings, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26 22:45 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, alex.bennee,
	Pranith Kumar

On Thu, Apr 27, 2017 at 00:29:49 +0200, Richard Henderson wrote:
> On 04/26/2017 11:56 PM, Emilio G. Cota wrote:
> >On Wed, Apr 26, 2017 at 10:40:45 +0200, Richard Henderson wrote:
> >>On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> >(snip)
> >>>+    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >>>+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> >>>+    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
> >>>+               tb->flags == flags)) {
> >>
> >>This comparison is wrong.  It will incorrectly reject a TB for i386 guest
> >>when CS_BASE != 0.  You really want
> >>
> >>   tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> >>   if (tb) {
> >>     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >>     if (tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags) {
> >>       return tb->tc_ptr;
> >>     }
> >>   }
> >>   return tcg_ctx.code_gen_epilogue;
> >
> >wrt the comparison, the only change I notice in your suggested change is
> >   tb->pc == pc
> >
> >instead of
> >   tb->pc == addr
> >
> >, which seems innocuous to me (since tb->pc == addr).
> >
> >I fail to see how this relates to your "CS_BASE != 0" comment.
> >What am I missing?
> 
> Recall how you computed vaddr for target/i386:
> 
>   addr = pc + cs_base

I see, thanks!

		Emilio

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26 22:45         ` Emilio G. Cota
@ 2017-04-26 23:11           ` Emilio G. Cota
  2017-04-26 23:25             ` Emilio G. Cota
  0 siblings, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26 23:11 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, alex.bennee,
	Pranith Kumar

On Wed, Apr 26, 2017 at 18:45:31 -0400, Emilio G. Cota wrote:
> On Thu, Apr 27, 2017 at 00:29:49 +0200, Richard Henderson wrote:
> > On 04/26/2017 11:56 PM, Emilio G. Cota wrote:
> > >On Wed, Apr 26, 2017 at 10:40:45 +0200, Richard Henderson wrote:
> > >>On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> > >(snip)
> > >>>+    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> > >>>+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> > >>>+    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
> > >>>+               tb->flags == flags)) {
> > >>
> > >>This comparison is wrong.  It will incorrectly reject a TB for i386 guest
> > >>when CS_BASE != 0.  You really want
> > >>
> > >>   tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> > >>   if (tb) {
> > >>     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> > >>     if (tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags) {
> > >>       return tb->tc_ptr;
> > >>     }
> > >>   }
> > >>   return tcg_ctx.code_gen_epilogue;
> > >
> > >wrt the comparison, the only change I notice in your suggested change is
> > >   tb->pc == pc
> > >
> > >instead of
> > >   tb->pc == addr
> > >
> > >, which seems innocuous to me (since tb->pc == addr).
> > >
> > >I fail to see how this relates to your "CS_BASE != 0" comment.
> > >What am I missing?
> > 
> > Recall how you computed vaddr for target/i386:
> > 
> >   addr = pc + cs_base
> 
> I see, thanks!

Hmm TB's are added to tb_jmp_cache by pc, not by pc + cs_base:

  atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);

Shouldn't we then pass just the pc (without adding cs_base) to
lookup_ptr, then? i.e.

--- a/target/i386/translate.c
+++ b/target/i386/translate.c
@@ -2533,11 +2533,7 @@ gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
     } else if (s->tf) {
         gen_helper_single_step(cpu_env);
     } else if (!TCGV_IS_UNUSED(jr)) {
-        TCGv vaddr = tcg_temp_new();
-
-        tcg_gen_add_tl(vaddr, jr, cpu_seg_base[R_CS]);
-        tcg_gen_lookup_and_goto_ptr(vaddr);
-        tcg_temp_free(vaddr);
+        tcg_gen_lookup_and_goto_ptr(jr);
     } else {
         tcg_gen_exit_tb(0);
     }

And while at it, rename the "addr" argument in lookup_ptr to "pc". Hmm?

		E.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26  8:40   ` Richard Henderson
  2017-04-26 21:56     ` Emilio G. Cota
@ 2017-04-26 23:17     ` Emilio G. Cota
  1 sibling, 0 replies; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26 23:17 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, alex.bennee,
	Pranith Kumar

On Wed, Apr 26, 2017 at 10:40:45 +0200, Richard Henderson wrote:
> On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> >This paves the way for upcoming work.
> >
> >Reviewed-by: Richard Henderson <rth@twiddle.net>
> >Signed-off-by: Emilio G. Cota <cota@braap.org>
> >---
> >  tcg-runtime.c     | 21 +++++++++++++++++++++
> >  tcg/tcg-runtime.h |  2 ++
> >  tcg/tcg.h         |  1 +
> >  3 files changed, 24 insertions(+)
> >
> >diff --git a/tcg-runtime.c b/tcg-runtime.c
> >index 4c60c96..90d2d4b 100644
> >--- a/tcg-runtime.c
> >+++ b/tcg-runtime.c
> >@@ -27,6 +27,7 @@
> >  #include "exec/helper-proto.h"
> >  #include "exec/cpu_ldst.h"
> >  #include "exec/exec-all.h"
> >+#include "exec/tb-hash.h"
> >  /* 32-bit helpers */
> >@@ -141,6 +142,26 @@ uint64_t HELPER(ctpop_i64)(uint64_t arg)
> >      return ctpop64(arg);
> >  }
> >+    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> >+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> >+    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
> >+               tb->flags == flags)) {
> 
> This comparison is wrong.  It will incorrectly reject a TB for i386 guest
> when CS_BASE != 0.  You really want
> 
>   tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
>   if (tb) {
>     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>     if (tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags) {
>       return tb->tc_ptr;
>     }
>   }
>   return tcg_ctx.code_gen_epilogue;
> 
> where you don't even load the cpu state if there isn't a preliminary hit in
> the cache.

Yes, I like this.

> (Note to self: That minor optimization would also apply to tb_find.)

FWIW I looked at tb_find -- you need the pc though, which comes from
loading the CPU state:

    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
                               ^^
    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
                                                                   ^^

If we wanted to really avoid getting all the state I guess we'd have to add
another function that returned just the pc.

		E.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper
  2017-04-26 23:11           ` Emilio G. Cota
@ 2017-04-26 23:25             ` Emilio G. Cota
  0 siblings, 0 replies; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-26 23:25 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, alex.bennee,
	Pranith Kumar

On Wed, Apr 26, 2017 at 19:11:32 -0400, Emilio G. Cota wrote:
> On Wed, Apr 26, 2017 at 18:45:31 -0400, Emilio G. Cota wrote:
> > On Thu, Apr 27, 2017 at 00:29:49 +0200, Richard Henderson wrote:
> > > On 04/26/2017 11:56 PM, Emilio G. Cota wrote:
> > > >On Wed, Apr 26, 2017 at 10:40:45 +0200, Richard Henderson wrote:
> > > >>On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> > > >(snip)
> > > >>>+    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> > > >>>+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> > > >>>+    if (likely(tb && tb->pc == addr && tb->cs_base == cs_base &&
> > > >>>+               tb->flags == flags)) {
> > > >>
> > > >>This comparison is wrong.  It will incorrectly reject a TB for i386 guest
> > > >>when CS_BASE != 0.  You really want
> > > >>
> > > >>   tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(addr)]);
> > > >>   if (tb) {
> > > >>     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> > > >>     if (tb->pc == pc && tb->cs_base == cs_base && tb->flags == flags) {
> > > >>       return tb->tc_ptr;
> > > >>     }
> > > >>   }
> > > >>   return tcg_ctx.code_gen_epilogue;
> > > >
> > > >wrt the comparison, the only change I notice in your suggested change is
> > > >   tb->pc == pc
> > > >
> > > >instead of
> > > >   tb->pc == addr
> > > >
> > > >, which seems innocuous to me (since tb->pc == addr).
> > > >
> > > >I fail to see how this relates to your "CS_BASE != 0" comment.
> > > >What am I missing?
> > > 
> > > Recall how you computed vaddr for target/i386:
> > > 
> > >   addr = pc + cs_base
> > 
> > I see, thanks!
> 
> Hmm TB's are added to tb_jmp_cache by pc, not by pc + cs_base:
> 
>   atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
> 
> Shouldn't we then pass just the pc (without adding cs_base) to
> lookup_ptr, then? i.e.
> 
> --- a/target/i386/translate.c
> +++ b/target/i386/translate.c
> @@ -2533,11 +2533,7 @@ gen_eob_worker(DisasContext *s, bool inhibit, bool recheck_tf, TCGv jr)
>      } else if (s->tf) {
>          gen_helper_single_step(cpu_env);
>      } else if (!TCGV_IS_UNUSED(jr)) {
> -        TCGv vaddr = tcg_temp_new();
> -
> -        tcg_gen_add_tl(vaddr, jr, cpu_seg_base[R_CS]);
> -        tcg_gen_lookup_and_goto_ptr(vaddr);
> -        tcg_temp_free(vaddr);
> +        tcg_gen_lookup_and_goto_ptr(jr);
>      } else {
>          tcg_gen_exit_tb(0);
>      }
> 
> And while at it, rename the "addr" argument in lookup_ptr to "pc". Hmm?

Answering to myself again..

target/i386/cpu.c:
static inline void cpu_get_tb_cpu_state(CPUX86State *env, target_ulong *pc,
                                        target_ulong *cs_base, uint32_t *flags)
{
    *cs_base = env->segs[R_CS].base;
    *pc = *cs_base + env->eip;
    *flags = env->hflags |
        (env->eflags & (IOPL_MASK | TF_MASK | RF_MASK | VM_MASK | AC_MASK));
}

cpu-exec.c:
        /* We add the TB in the virtual pc hash table for the fast lookup */
        atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);

So in lookup_and_goto_ptr, checking tb->pc == pc or tb->pc == addr,
where addr was passed from 'jr + cpu_seg_base[R_CS]', are both correct.
FWIW, I just checked with an assertion in full-system mode.

		E.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/10] target/arm: optimize indirect branches
  2017-04-26  7:54   ` Richard Henderson
@ 2017-04-27  3:20     ` Emilio G. Cota
  2017-04-27 10:25       ` Aurelien Jarno
  0 siblings, 1 reply; 35+ messages in thread
From: Emilio G. Cota @ 2017-04-27  3:20 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, Paolo Bonzini, Peter Crosthwaite, Peter Maydell,
	Eduardo Habkost, Andrzej Zaborowski, Aurelien Jarno,
	Alexander Graf, Stefan Weil, qemu-arm, alex.bennee,
	Pranith Kumar

On Wed, Apr 26, 2017 at 09:54:07 +0200, Richard Henderson wrote:
> On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> >+static bool gen_jr;...
> >         case DISAS_JUMP:
> >+            if (gen_jr) {
> 
> Why the variable?  Why not just try the goto_ptr for any DISAS_JUMP?

We have code that assumes DISAS_JUMP implies "go to exec loop", e.g.:
            case 6: /* isb */
                /* We need to break the TB after this insn to execute
                 * self-modifying code correctly and also to take
                 * any pending interrupts immediately.
                 */
                gen_lookup_tb(s);
where gen_lookup_tb does:

/* Force a TB lookup after an instruction that changes the CPU state.  */
static inline void gen_lookup_tb(DisasContext *s)
{
    tcg_gen_movi_i32(cpu_R[15], s->pc & ~1);
    s->is_jmp = DISAS_JUMP;
}

Also, the gen_exception_* functions set DISAS_JUMP. I suspect we want to go
to the exec loop with those as well.

Testing shows that I'm onto something; if I remove the variable,
and note that I make sure DISAS_UPDATE is not falling through, I get
easily reproducible (~1 out of 5) freezes and other instability
(e.g. RCU lockup warnings) when booting + shutting down debian jessie
in the guest.

In v4 I've added a comment about this.

Thanks,

		E.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v3 06/10] target/arm: optimize indirect branches
  2017-04-27  3:20     ` Emilio G. Cota
@ 2017-04-27 10:25       ` Aurelien Jarno
  0 siblings, 0 replies; 35+ messages in thread
From: Aurelien Jarno @ 2017-04-27 10:25 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: Richard Henderson, Peter Maydell, Eduardo Habkost,
	Peter Crosthwaite, Stefan Weil, qemu-devel, Alexander Graf,
	alex.bennee, qemu-arm, Pranith Kumar, Paolo Bonzini

On 2017-04-26 23:20, Emilio G. Cota wrote:
> On Wed, Apr 26, 2017 at 09:54:07 +0200, Richard Henderson wrote:
> > On 04/26/2017 08:23 AM, Emilio G. Cota wrote:
> > >+static bool gen_jr;...
> > >         case DISAS_JUMP:
> > >+            if (gen_jr) {
> > 
> > Why the variable?  Why not just try the goto_ptr for any DISAS_JUMP?
> 
> We have code that assumes DISAS_JUMP implies "go to exec loop", e.g.:
>             case 6: /* isb */
>                 /* We need to break the TB after this insn to execute
>                  * self-modifying code correctly and also to take
>                  * any pending interrupts immediately.
>                  */
>                 gen_lookup_tb(s);
> where gen_lookup_tb does:
> 
> /* Force a TB lookup after an instruction that changes the CPU state.  */
> static inline void gen_lookup_tb(DisasContext *s)
> {
>     tcg_gen_movi_i32(cpu_R[15], s->pc & ~1);
>     s->is_jmp = DISAS_JUMP;
> }

Changing the CPU state should already be taken into account in
lookup_tb_ptr through the cpu_get_tb_cpu_state function. Also interrupts
should be checked at the beginning of each TB.

> Also, the gen_exception_* functions set DISAS_JUMP. I suspect we want to go
> to the exec loop with those as well.

Technically all the code after the one generated by gen_exception_*
function is not executed, including the tcg_gen_exit_tb(0). Therefore
generating code using tcg_gen_exit_tb or or tcg_gen_lookup_and_goto_ptr
should no change the behaviour.

> Testing shows that I'm onto something; if I remove the variable,
> and note that I make sure DISAS_UPDATE is not falling through, I get
> easily reproducible (~1 out of 5) freezes and other instability
> (e.g. RCU lockup warnings) when booting + shutting down debian jessie
> in the guest.

I agree that always calling lookup_tb_ptr might be suboptimal in case we
know for sure that the lookup will fail or that there will be an exit
request at the beginning of the next TB. That said I found strange that
it causes issues, and I wonder if it just expose a bug somewhere.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2017-04-27 10:26 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-26  6:23 [Qemu-devel] [PATCH v3 00/10] TCG optimizations for 2.10 Emilio G. Cota
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 01/10] tcg-runtime: add lookup_tb_ptr helper Emilio G. Cota
2017-04-26  7:50   ` Paolo Bonzini
2017-04-26  8:40   ` Richard Henderson
2017-04-26 21:56     ` Emilio G. Cota
2017-04-26 22:29       ` Richard Henderson
2017-04-26 22:45         ` Emilio G. Cota
2017-04-26 23:11           ` Emilio G. Cota
2017-04-26 23:25             ` Emilio G. Cota
2017-04-26 23:17     ` Emilio G. Cota
2017-04-26 10:29   ` Alex Bennée
2017-04-26 10:43     ` Richard Henderson
2017-04-26 15:11     ` Paolo Bonzini
2017-04-26 16:16       ` Alex Bennée
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 02/10] tcg: introduce goto_ptr opcode Emilio G. Cota
2017-04-26  8:30   ` Richard Henderson
2017-04-26 12:12   ` Richard Henderson
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 03/10] tcg: export tcg_gen_lookup_and_goto_ptr Emilio G. Cota
2017-04-26  8:29   ` Richard Henderson
2017-04-26 10:33   ` Alex Bennée
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 04/10] tcg/i386: implement goto_ptr op Emilio G. Cota
2017-04-26  8:28   ` Richard Henderson
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 05/10] target/arm: optimize cross-page direct jumps in softmmu Emilio G. Cota
2017-04-26  8:27   ` Richard Henderson
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 06/10] target/arm: optimize indirect branches Emilio G. Cota
2017-04-26  7:54   ` Richard Henderson
2017-04-27  3:20     ` Emilio G. Cota
2017-04-27 10:25       ` Aurelien Jarno
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 07/10] target/i386: introduce gen_jr helper to generate lookup_and_goto_ptr Emilio G. Cota
2017-04-26  8:26   ` Richard Henderson
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 08/10] target/i386: optimize cross-page direct jumps in softmmu Emilio G. Cota
2017-04-26  8:25   ` Richard Henderson
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 09/10] target/i386: optimize indirect branches Emilio G. Cota
2017-04-26  8:24   ` Richard Henderson
2017-04-26  6:23 ` [Qemu-devel] [PATCH v3 10/10] tb-hash: improve tb_jmp_cache hash function in user mode Emilio G. Cota

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.